sub:assertion {
sub:assertion dcterms:creator <https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts> ;
a schema:Claim ,
schema:Observation ,
schema:Question ;
rdfs:comment """ Scaling laws don't care about the scale of the \"train\" models?
Did anyone else get this?
When I fit a scaling law, the scale of the largest model matters, but the number of models used for fitting matters much, much more.
Initial results, scaling error by #models starting from the largest https://twitter.com/LChoshen/status/1803401845626511568/photo/1
Maybe more simply put:
You can fit a scaling law with 8 small models, and it would be better than with 3 large ones (which cost a lot).
Has anyone else seen this?
""" ;
schema:keywords "AI" , "cost" , "initialresults" , "models" , "modelscale" , "scalinglaws" , "training" .
}