Someone has created a framework for evaluating LLMs' ability
to write COBOL.
https://bloop.ai/blog/evaluating-llms-on-cobol
For those who do not want to read the entire article,
the conclusion at the bottom is:
<quote>
GPT-4 - the best-performing model - generates a correct solution for
10.27% of problems. Compare this to HumanEval, where it solves 67% of
problems. CodeLlama, one of the best open-source coding models, fares
even worse, with the 34b variant only clocking 2%. COBOLEval is hard.
Looking at the failure cases, we can see that state-of-the-art LLMs
struggle to generate COBOL that even compiles. Only 47.94% of GPT-4
generated solutions compile with GnuCOBOL.
</quote>
Arne