
News Feed - 2023-08-09 05:08:00

ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

Tristan Greene

The scientists developed a tool called "AgentBench" to benchmark LLMs as agents.

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.


LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.




Typically, these models are benchmarked on their ability to output text perceived as humanlike or on their scores on plain-language tests designed for humans. By comparison, far fewer papers have been published on the subject of LLMs as agents.


Artificial intelligence (AI) agents perform specific tasks, such as following a set of instructions within a specific environment. For example, researchers will often train an AI agent to navigate a complex digital environment as a method for studying the use of machine learning to develop autonomous robots safely.


Traditional machine learning agents of this kind aren’t typically built as LLMs because of the prohibitive cost of training models such as ChatGPT and Claude. However, the largest LLMs have shown promise as agents.


The team from Tsinghua, Ohio State and UC Berkeley developed a tool called AgentBench to evaluate and measure LLMs’ capabilities as real-world agents, something the team claims is the first of its kind.


According to the researchers’ preprint paper, the main challenge in creating AgentBench was going beyond traditional AI learning environments (video games and physics simulators) and finding ways to apply LLM abilities to real-world problems so they could be effectively measured.

Flowchart of AgentBench’s evaluation method. Source: Liu et al.


What they came up with was a multidimensional set of tests that measures a model’s ability to perform challenging tasks in a variety of environments.


These include having models perform functions in an SQL database, working within an operating system, planning and performing household cleaning functions, shopping online, and several other high-level tasks that require step-by-step problem-solving.
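To make the idea of a database task concrete, here is a minimal sketch of how one such test could be scored. It is not AgentBench's actual harness: the `users` table, the prompt, the `scripted_agent` stub and the `success_rate` helper are all hypothetical, and the "agent" is a fixed function standing in for a call to an LLM.

```python
import sqlite3
from typing import Callable, List

# Hypothetical stand-in for an LLM: maps a task prompt to a SQL string.
Agent = Callable[[str], str]


def scripted_agent(prompt: str) -> str:
    # A real agent would query a model here; this stub returns a fixed answer.
    return "SELECT name FROM users WHERE age > 30 ORDER BY name"


def run_sql_task(agent: Agent) -> bool:
    """Run one toy database task and report whether the agent solved it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
        conn.executemany(
            "INSERT INTO users VALUES (?, ?)",
            [("Ada", 36), ("Bob", 25), ("Cy", 41)],
        )
        prompt = (
            "Table users(name, age). "
            "List the names of users older than 30, sorted by name."
        )
        try:
            rows = conn.execute(agent(prompt)).fetchall()
        except sqlite3.Error:
            return False  # an invalid query counts as a failed task
        return rows == [("Ada",), ("Cy",)]
    finally:
        conn.close()


def success_rate(agent: Agent, tasks: List[Callable[[Agent], bool]]) -> float:
    """Fraction of tasks the agent completes correctly."""
    results = [task(agent) for task in tasks]
    return sum(results) / len(results)


if __name__ == "__main__":
    print(success_rate(scripted_agent, [run_sql_task]))
```

A fuller benchmark would add more task functions, one per environment, and report a success rate for each; the single task here only illustrates the pattern of running an agent's output against an environment and checking the result.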


Per the paper, the largest, most expensive models outperformed open-source models by a significant amount:

“[W]e have conducted a comprehensive evaluation of 25 different LLMs using AgentBench, including both API-based and open-source models. Our results reveal that top-tier models like GPT-4 are capable of handling a wide array of real-world tasks, indicating the potential for developing a potent, continuously learning agent.”


The researchers went so far as to claim that “top LLMs are becoming capable of tackling complex real-world missions” but added that open-sourced competitors still have a “long way to go.”
