
In your example it fails at self-knowledge, not math. Here’s a description of the capability and how it is distinct from the 4o model:


In my experience it works quite well. Like any of these tools, it doesn’t absolve one from reasoning and verification, but it certainly accelerates certain activities.

It is quite good at statistics and I am not. In that sense I find it to be beneficial.
To me it seems Juice's argument is that it's not doing math the way we do math. It has a vast amount of knowledge, so it can predict that 1+1 is equal to 2; it's not adding 1 on top of 1. However, it has access to models that can help it do math and pull formulae, so, same difference. I can see where, if it hiccups a 'prediction', it will just throw a garbage result out there.
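To make the 'calls tools to help it do math' idea concrete, here's a rough sketch of the plugin side of that handoff: a tiny evaluator that actually computes arithmetic instead of guessing it. All the names here are made up for illustration; this isn't any real plugin API, just what the computing half of "same difference" looks like.

```python
import ast
import operator

# Map AST operator node types to real arithmetic. Anything not listed
# here (names, calls, attribute access) is rejected outright.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculate(expression: str):
    """Compute a pure-arithmetic expression rather than 'predicting' it."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("only plain arithmetic is allowed")
    return walk(ast.parse(expression, mode="eval"))

print(calculate("1 + 1"))         # 2 -- computed, not predicted
print(calculate("12345 * 6789"))  # 83810205 -- no chance of a garbage result
```

The point of walking the AST (rather than a bare eval) is that a hiccuping model can emit anything, so the tool only accepts expressions it can deterministically compute.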
 
Don't worry, I'll research for at least fifteen minutes before I order steroids.


Yep, I've been lurking since QSC went MIA. Created an account so I could react to posts and was sorely disappointed.
Give it some time and you'll be able to react.
I waited some time for that; I could react in other forum sections but NOT in the underground one. Then one day it happened.
 
To me it seems Juice's argument is that it's not doing math the way we do math. It has a vast amount of knowledge, so it can predict that 1+1 is equal to 2; it's not adding 1 on top of 1. However, it has access to models that can help it do math and pull formulae, so, same difference. I can see where, if it hiccups a 'prediction', it will just throw a garbage result out there.

He is not wrong, in that sense. The only issue I have with that assertion is that each new model handles things differently. As he mentioned, some models would hand math off to a Python interpreter, which is less the case recently. 4o, I think, does math in the way that you and he described: by guessing at the result and doing pretty well. o1 implements “chain of thought” and has also undergone reinforcement learning specifically for math. As such, it’s quite good, pragmatically speaking, and is unlikely to produce a hallucination in response to a mathematical question.
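For what it’s worth, that hand-off pattern is exposed directly in the API these days as tool calling: you describe a calculator function and the model emits a structured call to it instead of a guessed number. Here’s a rough sketch assuming the current openai Python SDK; the calculate tool name and schema are mine (matching the toy evaluator above), not anything official.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Advertise a calculator tool. The schema is illustrative; the model only
# sees this JSON description, not the implementation.
tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Exactly evaluate an arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 12345 * 6789?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:
    # The model chose to compute: route the arguments to a real calculator.
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(call.function.name, args["expression"])
else:
    # The model answered directly, i.e. it predicted rather than computed.
    print(msg.content)
```

Whether the model actually uses the tool is up to the model, which is exactly the guess-versus-hand-it-off behavior described above.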
 
I remember when I was a lurker not able to post. And then one day some drunk lunatic came on here, randomly picking my name and saying I was his wife and a rep for Tracy. Got me banned by Tracy and I couldn't even defend myself. I'm assuming he just looked at all the users currently online and picked my name, but I was sitting here fuming, not able to even say a word lol. Luckily I was able to get myself unbanned the next day.
 
I remember when I was a lurker not able to post. And then one day some drunk lunatic came on here, randomly picking my name and saying I was his wife and a rep for Tracy. Got me banned by Tracy and I couldn't even defend myself. I'm assuming he just looked at all the users currently online and picked my name, but I was sitting here fuming, not able to even say a word lol. Luckily I was able to get myself unbanned the next day.
That was definitely a unique, unforgettable introduction. Glad it was resolved quickly (I totally didn't believe you were a separate person for about 15 minutes after your explanation...just sobered up from the meth...lol)
 
In your example it fails at self-knowledge, not math. Here’s a description of the capability and how it is distinct from the 4o model:


In my experience it works quite well. Like any of these tools, it doesn’t absolve one from reasoning and verification, but it certainly accelerates certain activities.

It is quite good at statistics and I am not. In that sense I find it to be beneficial.
We may have a different understanding of the term 'reasoning'. If you read the article you just quoted (or have the chat summarize it), you'll find that it has nothing to do with mathematical computations. It's all about chain of thought and validating its conclusions. I still stand by my original statement: LLMs can't count. They may call plugins or guess the result, but they cannot count.
 
you'll find that it has nothing to do with mathematical computations. It's all about chain of thought and validating its conclusions

Nothing? It describes how it improved performance over 4o on a math benchmark. It wasn’t intended to prove anything other than the distinction between it and previous models. I’ve read elsewhere that some of the RL done was specifically intended to increase performance in mathematics.

It's all about chain of thought and validating its conclusions. I still stand by my original statement: LLMs can't count. They may call plugins or guess the result, but they cannot count.

It seems you want to stand on semantics. My original assertion was that o1 is good at math. Your counter-argument to that was that it can’t count, which is true in a sense, but that doesn’t change the fact that it’s still really good at math.
 