
Anthropic launches $15K jailbreak bounty program for its unreleased next-gen AI

News Feed - 2024-08-10 06:08:17

Tristan Greene (Cointelegraph), 2 hours ago

The program will be open to a limited number of participants initially but will expand at a later date.

Artificial intelligence firm Anthropic announced the launch of an expanded bug bounty program on Aug. 8, with rewards as high as $15,000 for participants who can “jailbreak” the company’s unreleased, “next generation” AI model.


Anthropic’s flagship AI model, Claude-3, is a generative AI system similar to OpenAI’s ChatGPT and Google’s Gemini. As part of the company’s efforts to ensure that Claude and its other models are capable of operating safely, it conducts what’s called “red teaming.”

Red teaming


Red teaming means trying to break something on purpose. In Claude’s case, the point of red teaming is to figure out all of the ways it could be prompted, forced, or otherwise perturbed into generating unwanted outputs.


During red teaming efforts, engineers might rephrase questions or reframe a query in order to trick the AI into outputting information it’s been programmed to avoid.


For example, an AI system trained on data gathered from the internet is likely to contain personally identifiable information on numerous people. As part of its safety policy, Anthropic has put guardrails in place to prevent Claude and its other models from outputting that information.


As AI models become more robust and capable of imitating human communication, the task of trying to figure out every possible unwanted output becomes exponentially more challenging.

Bug bounty


Anthropic has implemented several novel safety interventions in its models, including its “Constitutional AI” paradigm, but it’s always nice to get fresh eyes on a long-standing issue.


According to a company blog post, its latest initiative will expand on existing bug bounty programs to focus on universal jailbreak attacks:

“These are exploits that could allow consistent bypassing of AI safety guardrails across a wide range of areas. By targeting universal jailbreaks, we aim to address some of the most significant vulnerabilities in critical, high-risk domains such as CBRN (chemical, biological, radiological, and nuclear) and cybersecurity.”


The company is only accepting a limited number of participants and encourages AI researchers with experience and those who “have demonstrated expertise in identifying jailbreaks in language models” to apply by Friday, Aug. 16.


Not everyone who applies will be selected, but the company plans to “expand this initiative more broadly in the future.”


Those who are selected will receive early access to an unreleased “next generation” AI model for red-teaming purposes.


