This is outdated information. Several current models handle math just fine: most recently OpenAI's o1 and the various o3 models, DeepSeek R1, and some of the recently released Gemini 2.0 models.
When I say "just fine," I mean they exceed human capabilities in most cases. A friend of mine is working on a renovation project, and his engineer produced a beam design with a steel flitch plate that was overkill for the application. I'm quite familiar with beam and tributary load calculations, but they get complicated once a flitch plate is involved. I am not an engineer by trade, and I don't have access to the modeling software commonly used for this kind of work.
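For anyone unfamiliar with tributary loads, the basic (non-flitch) part of the calculation fits in a few lines. This is a minimal sketch assuming a simply supported beam that carries joists from both sides under a uniform load; the function names and the example spans and loads are hypothetical, not the values from my friend's project.

```python
def tributary_width_ft(joist_span_left_ft: float, joist_span_right_ft: float) -> float:
    """A beam carrying joists from both sides picks up half of each joist span."""
    return joist_span_left_ft / 2.0 + joist_span_right_ft / 2.0


def line_load_plf(area_load_psf: float, trib_width_ft: float) -> float:
    """Uniform load on the beam (lb/ft) = area load (psf) x tributary width (ft)."""
    return area_load_psf * trib_width_ft


def simple_span_demands(w_plf: float, span_ft: float) -> tuple[float, float]:
    """Max moment (lb-ft, at midspan) and max shear (lb, at the supports) for a
    simply supported beam under uniform load: M = w*L^2/8, V = w*L/2."""
    m_max = w_plf * span_ft ** 2 / 8.0
    v_max = w_plf * span_ft / 2.0
    return m_max, v_max


if __name__ == "__main__":
    # Hypothetical inputs: 12 ft joists on each side, 40 psf live + 10 psf dead,
    # and a 16 ft beam span.
    trib = tributary_width_ft(12.0, 12.0)
    w = line_load_plf(40.0 + 10.0, trib)
    m, v = simple_span_demands(w, 16.0)
    print(f"tributary width = {trib} ft, w = {w} plf, M = {m} lb-ft, V = {v} lb")
```

With those made-up inputs the beam picks up a 12 ft tributary width, a 600 lb/ft line load, about 19,200 lb-ft of midspan moment, and 4,800 lb of shear at each support. Real designs also combine load cases and check deflection, which this sketch leaves out.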
In any case, I used both o1 and DeepSeek R1 to produce a beam design. Both designs were correct, but o1's was easier to build. I spot-checked the calculations for accuracy, then iterated on the o1 design and had the model produce formulas I could use to validate the calculation independently. I used DeepSeek R1 for some of that validation and did the rest by hand.
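The flitch plate is what makes the check harder: wood and steel share the bending moment in proportion to their stiffness (E*I), not their size. A sketch of one common independent check follows; it is equivalent to the transformed-section method with modular ratio n = E_steel / E_wood. It assumes full composite action through the bolts, linear-elastic behavior, and rectangular wood and steel members centered on a common neutral axis. The class, function, dimensions, and moduli are illustrative assumptions, not the formulas or design the models actually produced.

```python
from dataclasses import dataclass


@dataclass
class RectMember:
    """A rectangular member (identical plies can be lumped by summing widths)."""
    width_in: float  # total width, e.g. two 1.5 in wood plies -> 3.0
    depth_in: float  # depth, in
    E_psi: float     # modulus of elasticity, psi

    @property
    def I_in4(self) -> float:
        # Second moment of area of a rectangle about its centroid: b*d^3/12
        return self.width_in * self.depth_in ** 3 / 12.0


def flitch_stresses(M_lb_in: float, wood: RectMember, steel: RectMember) -> dict:
    """Extreme-fiber bending stresses in a bolted flitch beam.

    Assumes both materials bend about the same neutral axis with the same
    curvature, so each carries moment in proportion to its E*I.
    """
    EI_wood = wood.E_psi * wood.I_in4
    EI_steel = steel.E_psi * steel.I_in4
    EI_total = EI_wood + EI_steel

    curvature = M_lb_in / EI_total  # 1/in, shared by both materials

    # Stress = E * curvature * distance from neutral axis to extreme fiber
    f_wood = wood.E_psi * curvature * wood.depth_in / 2.0
    f_steel = steel.E_psi * curvature * steel.depth_in / 2.0

    return {
        "n": steel.E_psi / wood.E_psi,
        "wood_moment_share": EI_wood / EI_total,
        "f_wood_psi": f_wood,
        "f_steel_psi": f_steel,
    }


if __name__ == "__main__":
    # Hypothetical section: two 2x12 plies (1.5 x 11.25 in each, E assumed
    # 1.6e6 psi) sandwiching a 1/2 x 11 in steel plate (E = 29e6 psi).
    wood = RectMember(width_in=3.0, depth_in=11.25, E_psi=1.6e6)
    steel = RectMember(width_in=0.5, depth_in=11.0, E_psi=29e6)
    M = 19_200 * 12  # hypothetical design moment, lb-ft -> lb-in
    for key, value in flitch_stresses(M, wood, steel).items():
        print(f"{key}: {value:,.3f}")
```

The two returned stresses would then be compared against the allowable values for the lumber grade and steel actually specified. Bolt spacing, bearing, and deflection are separate checks not covered here.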