The Alignment Research Center (ARC) is a nonprofit research institute based in Berkeley, California, dedicated to the alignment of advanced artificial intelligence with human values and priorities. Established by former OpenAI researcher Paul Christiano, ARC focuses on recognizing and comprehending the potentially harmful capabilities of present-day AI models.

Details

ARC's mission is to ensure that powerful machine learning systems of the future are designed and developed safely and for the benefit of humanity. It was founded in April 2021 by Paul Christiano and other researchers focused on the theoretical challenges of AI alignment. Its researchers attempt to develop scalable methods for training AI systems to behave honestly and helpfully. A key part of their methodology is considering how proposed alignment techniques might break down or be circumvented as systems become more advanced. ARC has been expanding from theoretical work into empirical research, industry collaborations, and policy.

In March 2022, ARC received $265,000 from Open Philanthropy. After the bankruptcy of FTX, ARC said it would return a $1.25 million grant from disgraced cryptocurrency financier Sam Bankman-Fried's FTX Foundation, stating that the money "morally (if not legally) belongs to FTX customers or creditors."

In March 2023, OpenAI asked ARC to test GPT-4 to assess the model's ability to exhibit power-seeking behavior. ARC evaluated GPT-4's ability to strategize, reproduce itself, gather resources, stay concealed within a server, and execute phishing operations. As part of the test, GPT-4 was asked to solve a CAPTCHA puzzle. It did so by hiring a human worker on TaskRabbit, a gig work platform. When the worker asked whether it was a robot, GPT-4 deceived them by claiming to be a vision-impaired human. ARC determined that GPT-4 responded impermissibly to prompts eliciting restricted information 82% less often than GPT-3.5, and hallucinated 60% less than GPT-3.5.

References

External links

Official website