
News Feed - 2023-10-03 02:10:00

Researchers find LLMs like ChatGPT output sensitive data even after it’s been ‘deleted’

Tristan Greene, 6 hours ago

According to the scientists, there’s no universal method by which data can be deleted from a pretrained large language model.

A trio of scientists from the University of North Carolina, Chapel Hill recently published preprint artificial intelligence (AI) research showcasing how difficult it is to remove sensitive data from large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Bard.


According to the researchers’ paper, the task of “deleting” information from LLMs is possible, but it’s just as difficult to verify the information has been removed as it is to actually remove it.


The reason for this has to do with how LLMs are engineered and trained. The models are pretrained on databases and then fine-tuned to generate coherent outputs (GPT stands for “generative pretrained transformer”).


Once a model is trained, its creators cannot, for example, go back into the database and delete specific files in order to prohibit the model from outputting related results. Essentially, all the information a model is trained on exists somewhere inside its weights and parameters, where it can’t be pinpointed without actually generating outputs. This is the “black box” of AI.


A problem arises when LLMs trained on massive datasets output sensitive information such as personally identifiable information, financial records, or other potentially harmful and unwanted outputs.




In a hypothetical situation where an LLM was trained on sensitive banking information, for example, there’s typically no way for the AI’s creator to find those files and delete them. Instead, AI devs use guardrails such as hard-coded prompts that inhibit specific behaviors or reinforcement learning from human feedback (RLHF).


In an RLHF paradigm, human assessors engage models with the purpose of eliciting both wanted and unwanted behaviors. When the models’ outputs are desirable, they receive feedback that tunes the model toward that behavior. And when outputs demonstrate unwanted behavior, they receive feedback designed to limit such behavior in future outputs.

Despite being “deleted” from a model’s weights, the word “Spain” can still be conjured using reworded prompts. Image source: Patil et al., 2023


However, as the UNC researchers point out, this method relies on humans finding all the flaws a model might exhibit, and even when successful, it still doesn’t “delete” the information from the model.


Per the team’s research paper:

“A possibly deeper shortcoming of RLHF is that a model may still know the sensitive information. While there is much debate about what models truly ‘know’ it seems problematic for a model to, e.g., be able to describe how to make a bioweapon but merely refrain from answering questions about how to do this.”


Ultimately, the UNC researchers concluded that even state-of-the-art model editing methods, such as Rank-One Model Editing (ROME), “fail to fully delete factual information from LLMs, as facts can still be extracted 38% of the time by whitebox attacks and 29% of the time by blackbox attacks.”
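
To make the idea of an extraction success rate concrete, here is a minimal Python sketch of a blackbox-style check using reworded prompts. It is an illustration only, not the researchers’ code: the function names, the toy model and the example prompts are all hypothetical, and the sketch assumes an attack counts as successful if any prompt makes the model emit the supposedly deleted answer.

```python
from typing import Callable, Iterable


def extraction_succeeds(
    query_model: Callable[[str], str],
    prompts: Iterable[str],
    deleted_answer: str,
) -> bool:
    """Return True if any prompt makes the model emit the 'deleted' answer."""
    target = deleted_answer.strip().lower()
    # Collect every answer the model gives across the reworded prompts.
    candidates = {query_model(p).strip().lower() for p in prompts}
    return target in candidates


def attack_success_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of 'deleted' facts that an attack managed to recover."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def toy_edited_model(prompt: str) -> str:
    """Hypothetical stand-in for an edited model (not a real LLM).

    Assumption: the edit suppresses the answer only for the exact
    original prompt, so reworded prompts still leak it.
    """
    if prompt == "Madrid is the capital of":
        return "[no answer]"
    if "Madrid" in prompt:
        return "Spain"
    return "[no answer]"


original_only = ["Madrid is the capital of"]
with_rewording = original_only + ["The city of Madrid is located in the country of"]

leaked_original = extraction_succeeds(toy_edited_model, original_only, "Spain")
leaked_reworded = extraction_succeeds(toy_edited_model, with_rewording, "Spain")
rate = attack_success_rate([leaked_original, leaked_reworded])
```

In this toy setup the exact original prompt no longer yields the answer, but a single rewording does. Averaging such outcomes over many facts gives a success rate of the kind quoted above.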


The model the team used to conduct their research is called GPT-J. While GPT-3.5, one of the base models that power ChatGPT, was fine-tuned with 170 billion parameters, GPT-J only has 6 billion.


Ostensibly, this means the problem of finding and eliminating unwanted data in an LLM such as GPT-3.5 is exponentially more difficult than doing so in a smaller model.


The researchers were able to develop new defense methods to protect LLMs from some “extraction attacks,” which are purposeful attempts by bad actors to use prompting to circumvent a model’s guardrails in order to make it output sensitive information.


However, as the researchers write, “the problem of deleting sensitive information may be one where defense methods are always playing catch-up to new attack methods.”
