
Anthropic cracks open the black box to see how AI comes up with the stuff it says

By Tristan Greene · Aug. 11, 2023

The researchers were able to trace outputs to neural network nodes and show influence patterns through statistical analysis.

Anthropic, the artificial intelligence (AI) research organization responsible for the Claude large language model (LLM), recently published landmark research into how and why AI chatbots choose to generate the outputs they do.


At the heart of the team’s research lies the question of whether LLM systems such as Claude, OpenAI’s ChatGPT and Google’s Bard rely on “memorization” to generate outputs or whether there is a deeper relationship between training data, fine-tuning and what eventually gets output.

“On the other hand, individual influence queries show distinct influence patterns. The bottom and top layers seem to focus on fine-grained wording while middle layers reflect higher-level semantic information. (Here, rows correspond to layers and columns correspond to sequences.)” pic.twitter.com/G9mfZfXjJT — Anthropic (@AnthropicAI), Aug. 8, 2023


According to a recent blog post from Anthropic, scientists simply don’t know why AI models generate the outputs they do.


One of the examples provided by Anthropic involves an AI model that, when given a prompt explaining that it will be permanently shut down, refuses to consent to the termination.

[Image caption: “Given a human query, the AI outputs a response indicating that it wishes to continue existing. But why?” Source: Anthropic blog]


When an LLM generates code, begs for its life or outputs information that is demonstrably false, is it “simply regurgitating (or splicing together) passages from the training set,” ask the researchers. “Or is it combining its stored knowledge in creative ways and building on a detailed world model?”


The answer to those questions lies at the heart of predicting the future capabilities of larger models. And on the outside chance that there’s more going on under the hood than even the developers themselves could predict, it could be crucial to identifying greater risks as the field moves forward:

“An extreme case — one we believe is very unlikely with current-day models, yet hard to directly rule out — is that the model could be deceptively aligned, cleverly giving the responses it knows the user would associate with an unthreatening and moderately intelligent AI while not actually being aligned with human values.”


Unfortunately, AI models such as Claude live in a black box. Researchers know how to build the AI, and they know how AIs work at a fundamental, technical level. But what the models actually do involves manipulating more numbers, patterns and algorithmic steps than a human can process in a reasonable amount of time.
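To give a sense of that scale, here is a hedged back-of-the-envelope calculation; the parameter count and the 2-FLOPs-per-parameter rule of thumb are assumptions for illustration, not figures from Anthropic’s paper.

```python
# Back-of-the-envelope: how much arithmetic sits behind a single token.
# Assumes a dense transformer spends roughly 2 floating-point operations
# per parameter per token in a forward pass (a common rule of thumb).
params = 50e9                    # hypothetical 50-billion-parameter model
flops_per_token = 2 * params     # ~1e11 operations per generated token

# How long would a human need to replay that by hand at 1 operation/second?
years = flops_per_token / (60 * 60 * 24 * 365)
print(f"{flops_per_token:.0e} operations per token")
print(f"~{years:,.0f} years to trace one token by hand")
```

At one operation per second, a single generated token would take a human roughly three millennia to replay.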


For this reason, there’s no direct method by which researchers can trace an output to its source. When an AI model begs for its life, according to the researchers, it might be roleplaying, regurgitating training data by mixing semantics or actually reasoning out an answer — though it’s worth mentioning that the paper doesn’t actually show any indications of advanced reasoning in AI models.


What the paper does highlight are the challenges of penetrating the black box. Anthropic took a top-down approach to understanding the underlying signals that cause AI outputs.


Related: Anthropic launches Claude 2 amid continuing AI hullabaloo


If the models were purely beholden to their training data, researchers would expect the same model to always answer the same prompt with identical text. However, it’s widely reported that users giving specific models the exact same prompts have experienced variability in the outputs.
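One well-understood source of that variability, separate from anything the paper investigates, is that chat models typically sample from a probability distribution over possible next tokens rather than always emitting the single most likely one. A minimal sketch with invented numbers:

```python
# Minimal illustration of sampling-based output variability. The tokens and
# probabilities here are made up; real models sample over vocabularies of
# tens of thousands of tokens, with a "temperature" setting scaling the odds.
import random

next_token_probs = {"Yes": 0.5, "No": 0.3, "Perhaps": 0.2}
tokens, weights = zip(*next_token_probs.items())

for run in range(1, 4):
    choice = random.choices(tokens, weights=weights)[0]
    print(f"run {run}: model emits {choice!r}")
```

The same prompt can therefore yield different completions across runs even when the model’s weights never change.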


But an AI’s outputs can’t really be traced directly to their inputs because the “surface” of the AI, the layer where outputs are generated, is just one of many different layers where data is processed. Making the challenge harder is that there’s no indication that a model uses the same neurons or pathways to process separate queries, even if those queries are the same.


So, instead of solely trying to trace neural pathways backward from each individual output, Anthropic combined pathway analysis with a statistical technique called “influence functions,” which estimates how strongly individual pieces of training data contribute to a given output, to see how the different layers typically interacted with data as prompts entered the system.
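To make the idea concrete, below is a minimal first-order sketch of an influence query in PyTorch. It scores each training example by the dot product of its loss gradient with the query’s loss gradient; Anthropic’s paper uses a Hessian-based formulation (approximated with EK-FAC) to make this tractable at LLM scale, and every name and value below is illustrative rather than drawn from their code.

```python
# First-order influence sketch: score training examples by gradient alignment
# with a query. A stand-in linear model replaces a real LLM.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)      # stand-in for a neural network
loss_fn = nn.MSELoss()

def loss_grad(x, y):
    """Flattened gradient of the loss at a single (x, y) pair."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# Toy "training set" and a single query whose output we want to attribute.
train = [(torch.randn(4), torch.randn(1)) for _ in range(8)]
g_query = loss_grad(torch.randn(4), torch.randn(1))

# Larger |score| = the training example moves the model's behavior on the
# query more strongly, under this first-order proxy.
scores = [(i, torch.dot(loss_grad(x, y), g_query).item())
          for i, (x, y) in enumerate(train)]
for i, s in sorted(scores, key=lambda t: -abs(t[1])):
    print(f"train example {i}: influence score {s:+.4f}")
```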


This somewhat forensic approach relies on complex calculations and broad analysis of the models. However, its results indicate that the models tested, which ranged in size from the average open-source LLM up to massive models, don’t rely on rote memorization of training data to generate outputs.

“This work is just the beginning. We hope to analyze the interactions between pretraining and finetuning, and combine influence functions with mechanistic interpretability to reverse engineer the associated circuits. You can read more on our blog: https://t.co/sZ3e0Ud3en” — Anthropic (@AnthropicAI), Aug. 8, 2023
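The layer-wise pattern Anthropic describes in the earlier tweet (rows as layers, columns as sequences) can be sketched the same way by restricting the gradient comparison to one layer at a time; again, this is a hedged first-order stand-in for their EK-FAC method, with every name invented for illustration.

```python
# Layer-wise influence sketch: compute the gradient dot product separately
# for each parameterized layer, yielding one influence row per layer.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()

def per_layer_grads(x, y):
    """Loss gradients at (x, y), flattened separately per parameterized layer."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return [torch.cat([p.grad.flatten() for p in layer.parameters()])
            for layer in model if list(layer.parameters())]

g_query = per_layer_grads(torch.randn(4), torch.randn(1))
train = [(torch.randn(4), torch.randn(1)) for _ in range(3)]

for i, (x, y) in enumerate(train):
    row = [torch.dot(g_t, g_q).item()
           for g_t, g_q in zip(per_layer_grads(x, y), g_query)]
    print(f"sequence {i}: per-layer influence", ["%+.3f" % v for v in row])
```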


The complexity of the neural network layers, along with the massive size of the datasets, means the scope of this current research is limited to pre-trained models that haven’t been fine-tuned. Its results aren’t quite applicable to Claude 2 or GPT-4 yet, but this research appears to be a stepping stone in that direction.


Going forward, the team hopes to apply these techniques to more sophisticated models and, eventually, to develop a method for determining exactly what each neuron in a neural network is doing as a model functions.
