I know it’s popular to hate on China right now, but can we acknowledge that Chinese companies and research groups have done more for us hackers than US companies and research groups have, in terms of making amazing models available with open weights for free?
Recent research results from many groups suggest otherwise. The lag between private models and competitive open models has been shrinking, and so have the resources required to train and run them.
The people who are spending billions on AI infra build-outs want you to believe it's necessary, because frontier mega models are supposedly so much better. China has been showing us otherwise, even while handicapped by export controls, demonstrating that you can do more with less.
> The lag between private models and competitive open models has been shrinking
It really hasn't. It's the opposite, actually. The latest breakthroughs in RL from the big 4 labs haven't been replicated yet in any open model (including the latest K2 Thinking). Even Gemini 2.5 still delivers on generalisation in a way that no open model does today (almost a year later). The general consensus was that "open" models were 6-8 months behind SotA, but with the RL stuff we can see they've fallen further behind.
I don't know exactly what it is, whether it's simply RL scale, or data plus scale, or better secret sauce (rewards, masking, something else), but the way these new models generalise is leagues ahead of open models, sadly.
Don't be fooled by benchmarks alone. You have to test them on problems that you own and that you can be fairly sure no one is targeting for benchmark scores. Recently there was a Python golfing competition on Kaggle, and I tested some models on that task. While the top 4 models were chugging along, in both agentic and zero-shot regimes, the open models (coding-specific ones or older "thinking" models) were really bad at it. 480B coding-specific models would go in circles, get lost on one example, and so on. It was night and day between the open models and GPT-5/Claude/Gemini 2.5. Even Grok Fast solved a lot of tasks in agentic mode.
While I agree with your comments here, I will note that the big 4 models were only released this year (summer-ish), so we are not yet at a point where you can claim the open models are more than a year behind something that isn't even a year old.
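To make the "test them on problems you own" point from the comment above concrete, here is a minimal sketch of a private zero-shot eval loop against any OpenAI-compatible endpoint. The model names, tasks.jsonl file, and the scoring check are all placeholders for illustration, not anything from the thread; it only shows the shape of the loop.

    # Minimal sketch of a private, zero-shot eval loop against an
    # OpenAI-compatible endpoint. Model names, tasks.jsonl, and the
    # checker below are placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()  # point base_url/api_key at whatever server hosts the model

    def solve_zero_shot(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content or ""

    def passes(task: dict, answer: str) -> bool:
        # Replace with your own checker, e.g. run a golfed solution against
        # hidden test cases. This placeholder just substring-matches.
        return task["expected"] in answer

    # tasks.jsonl: one {"prompt": ..., "expected": ...} object per line (your data).
    tasks = [json.loads(line) for line in open("tasks.jsonl")]

    for model in ("frontier-model", "open-model"):  # hypothetical names
        solved = sum(passes(t, solve_zero_shot(model, t["prompt"])) for t in tasks)
        print(f"{model}: {solved}/{len(tasks)}")

Keeping the task set offline and the checker in your own code is what makes this kind of comparison resistant to benchmark targeting.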
HF link: https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking
> The model, dubbed ERNIE-4.5-VL-28B-A3B-Thinking
No way at so few parameters
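For anyone who wants to poke at the weights from the HF link above, here is a minimal text-only smoke test. It assumes the repo exposes standard transformers-compatible classes via trust_remote_code; the actual vision/processor API and chat template may differ, so treat the model card as authoritative. (The A3B suffix usually means only about 3B of the 28B parameters are active per token, which is the MoE trick behind the "so few parameters" reaction.)

    # Minimal text-only smoke test; a sketch assuming the repo works with the
    # standard transformers AutoModel/AutoTokenizer path via trust_remote_code.
    # The multimodal (image) path needs the model's own processor (see the model card).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "baidu/ERNIE-4.5-VL-28B-A3B-Thinking"

    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype="auto",   # let transformers pick bf16/fp16 where available
        device_map="auto",    # spread the weights across available GPUs
    )

    prompt = tok.apply_chat_template(
        [{"role": "user", "content": "In two sentences, what does 'A3B' mean in a MoE model name?"}],
        add_generation_prompt=True,
        tokenize=False,
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))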