The ARC-AGI Benchmark

The steady decline in the cost of computing power, roughly two orders of magnitude per decade, has driven the rise of deep learning since 2010. Ever larger networks and ever more data seemed to deliver higher and higher scores on standard benchmarks, and fueled the hope that scaling alone would lead to AGI. In 2019, François Chollet introduced the ARC-AGI benchmark to measure intelligence itself.


Benchmarks such as MMLU or HELM mostly measure stored knowledge. What was missing was an indicator of fluid intelligence: the ability to understand and solve a completely new problem ad hoc. ARC-AGI-1 (the "Abstraction and Reasoning Corpus for Artificial General Intelligence") contains 1,000 unique tasks whose solutions cannot be memorized in advance.

Every puzzle is new, requires only everyday core knowledge (objects, counting, simple geometry), and sits below kindergarten level, at least for humans. Even after a 50,000-fold scale-up of standard LLMs, the solve rate stayed barely above 0%. Besides the leaderboard, the official website lets you try the puzzles yourself.

Only in 2024 did a new approach break the deadlock: Test-Time Adaptation (TTA), in which models modify their own weights or synthesize a program at run time. With a fine-tuned o3, OpenAI reported human-level performance for the first time. Since then, the successful ARC approaches have used some form of TTA, from program search to on-the-fly fine-tuning.
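To make the program-search flavor of TTA more concrete, here is a minimal sketch. It is not the method of any particular ARC entry: it assumes a tiny hypothetical DSL of three grid primitives (`flip_h`, `flip_v`, `transpose`) and an invented toy task. At test time it searches for a short composition of primitives that reproduces every demonstration pair, then applies that composition to the test input.

```python
from itertools import product

# Hypothetical mini-DSL: each primitive maps a grid (a list of rows) to a new grid.
def flip_h(grid):
    """Mirror the grid left to right."""
    return [row[::-1] for row in grid]

def flip_v(grid):
    """Mirror the grid top to bottom."""
    return [row[:] for row in grid[::-1]]

def transpose(grid):
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def run(program, grid):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def search_program(train_pairs, max_depth=3):
    """Return the first shortest program consistent with all demonstration pairs, or None."""
    for depth in range(max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, inp) == out for inp, out in train_pairs):
                return program
    return None

# Toy task invented for illustration: rotate the grid 90 degrees clockwise.
train_pairs = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[0, 5, 0], [0, 0, 0]], [[0, 0], [0, 5], [0, 0]]),
]
test_input = [[7, 0], [0, 8]]

program = search_program(train_pairs)   # the adaptation step, done per task at test time
prediction = run(program, test_input)
print(program, prediction)
```

Exhaustive enumeration like this only works for very small DSLs and shallow programs; real systems need some way to guide the search.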

With ARC-AGI-1 rapidly approaching saturation at human-level performance, ARC-AGI-2 followed. It keeps the same I/O format but increases the difficulty of each task. In a study with 400 participants in San Diego, every task was solved; ten randomly selected people, scored by majority vote, would reach 100%. LLMs without TTA sit at 0-2%, and TTA systems still perform below human level.
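For readers who have not seen the I/O format, the sketch below shows its general shape as used in the public ARC task files: a JSON object with `train` and `test` lists of input/output grid pairs, where each grid is a list of rows of integers from 0 to 9. The concrete task is invented for illustration, and the helpers `is_valid_grid` and `score_task` are hypothetical names. The way attempts are passed in is an assumption of this sketch, not the official evaluation harness.

```python
import json

# Invented toy task in the ARC JSON layout: mirror each grid left to right.
TASK_JSON = """
{
  "train": [
    {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
    {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 0]], "output": [[0, 3], [0, 0]]}
  ]
}
"""

def is_valid_grid(grid):
    """Check that a grid is non-empty, rectangular, and holds integers 0-9."""
    if not grid or not grid[0]:
        return False
    width = len(grid[0])
    return all(
        len(row) == width and all(isinstance(c, int) and 0 <= c <= 9 for c in row)
        for row in grid
    )

def score_task(task, attempts_per_test):
    """Fraction of test inputs for which at least one attempt matches the expected grid exactly."""
    solved = 0
    for test_pair, attempts in zip(task["test"], attempts_per_test):
        if any(attempt == test_pair["output"] for attempt in attempts):
            solved += 1
    return solved / len(task["test"])

task = json.loads(TASK_JSON)

# Two attempts for the single test input: the first is wrong, the second is exact.
attempts = [[[[3, 0], [0, 0]], [[0, 3], [0, 0]]]]
score = score_task(task, attempts)
print(score)
```

Scoring is all-or-nothing per grid: an output that is wrong in a single cell counts as a miss, which is part of why partial pattern matching earns so little on this benchmark.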

ARC-AGI-3 goes a step further: the model is dropped into unknown, interactive environments and must work out its goal, the controls, and the physics on its own, while remaining efficient in both time and actions. A developer preview is scheduled for release in July 2025. To master compositional generalization, systems must combine both types of thinking: fast, intuitive Type 1 pattern recognition and slow, deliberate Type 2 reasoning. The key lies in speed, with Type 1 heuristics taming the combinatorial explosion.
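A back-of-the-envelope calculation shows why a fast heuristic matters. The sketch below compares how many candidate programs must be evaluated under exhaustive enumeration versus a beam search that keeps only the highest-ranked partial programs at each depth. The numbers (100 primitives, depth 6, beam width 10) and both function names are illustrative assumptions, not figures from the benchmark.

```python
def brute_force_candidates(num_primitives, max_depth):
    """Count programs of length 1..max_depth when every composition is enumerated."""
    return sum(num_primitives ** d for d in range(1, max_depth + 1))

def beam_candidates(num_primitives, max_depth, beam_width):
    """Count programs scored when only the best `beam_width` partial programs survive each depth."""
    total, frontier = 0, 1  # start from the single empty program
    for _ in range(max_depth):
        expanded = frontier * num_primitives   # extend every survivor by one primitive
        total += expanded
        frontier = min(beam_width, expanded)   # the heuristic keeps the top candidates
    return total

exhaustive = brute_force_candidates(100, 6)
guided = beam_candidates(100, 6, beam_width=10)
print(exhaustive)  # 1010101010100
print(guided)      # 5100
```

The pruning is not free: a beam can discard the one correct program, so the gain depends on how well the heuristic ranks partial candidates.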

ARC is not meant as a final goal but as a signpost: as long as humans can easily design tasks that the best LLMs cannot solve, AGI has not been achieved. Progress on ARC-AGI-2, and soon ARC-AGI-3, will show whether hybrid architectures that combine deep learning with program search reach the required level of fluid, data-efficient intelligence.
