
News Feed - 2023-08-09 05:08:00

ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

Tristan Greene

The scientists developed a tool called "AgentBench" to benchmark LLMs as agents.

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.


LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.




Typically, these models are benchmarked on their ability to output text perceived as humanlike or on their scores on plain-language tests designed for humans. By comparison, far fewer papers have been published on the subject of LLMs as agents.


Artificial intelligence (AI) agents perform specific tasks, such as following a set of instructions within a specific environment. For example, researchers will often train an AI agent to navigate a complex digital environment as a method for studying the use of machine learning to develop autonomous robots safely.


Traditional machine learning agents aren’t typically built as LLMs due to the prohibitive costs involved with training models such as ChatGPT and Claude. However, the largest LLMs have shown promise as agents.


The team from Tsinghua, Ohio State and UC Berkeley developed a tool called AgentBench to evaluate and measure LLMs’ capabilities as real-world agents, something the team claims is the first of its kind.


According to the researchers’ preprint paper, the main challenge in creating AgentBench was going beyond traditional AI learning environments (video games and physics simulators) and finding ways to apply LLM abilities to real-world problems so they could be effectively measured.

Flowchart of AgentBench’s evaluation method. Source: Liu, et al


What they came up with was a multidimensional set of tests that measures a model’s ability to perform challenging tasks in a variety of environments.


These include having models perform functions in an SQL database, working within an operating system, planning and performing household cleaning functions, shopping online, and several other high-level tasks that require step-by-step problem-solving.
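To make the idea of a step-by-step agent test concrete, here is a minimal sketch of what one database-style task could look like. It is not AgentBench's code: the function names (`run_sql_episode`, `make_scripted_agent`, `success_rate`), the toy `accounts` table, the task itself and the `DONE` stop signal are all hypothetical, and a scripted stand-in replaces the LLM so the example runs without any API.

```python
import sqlite3
from typing import Callable, List

Agent = Callable[[str], str]  # takes an observation, returns the next action


def run_sql_episode(agent: Agent, max_steps: int = 5) -> bool:
    """Run one toy episode: the agent must set bob's balance to 100."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 50), ("bob", 20)]
    )
    observation = "Task: set bob's balance to 100. Table: accounts(name, balance)."
    for _ in range(max_steps):
        action = agent(observation)
        if action == "DONE":  # hypothetical stop signal
            break
        try:
            rows = conn.execute(action).fetchall()
            observation = f"OK: {rows}"
        except sqlite3.Error as exc:
            # Errors are fed back so the agent can try to recover.
            observation = f"ERROR: {exc}"
    # Success is judged on the final database state, not on the agent's text.
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE name = 'bob'"
    ).fetchone()
    conn.close()
    return balance == 100


def make_scripted_agent(actions: List[str]) -> Agent:
    """Stand-in for an LLM: replays fixed actions, then answers DONE."""
    queue = list(actions)

    def agent(observation: str) -> str:
        return queue.pop(0) if queue else "DONE"

    return agent


def success_rate(agents: List[Agent]) -> float:
    """Fraction of agents that complete the toy task."""
    results = [run_sql_episode(a) for a in agents]
    return sum(results) / len(results)
```

In this sketch the score depends on whether the environment ends up in the right state after several rounds of action and feedback, which is the general shape of the multi-step tasks the article describes. How AgentBench itself scores each environment is defined in the researchers' paper, not here.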


Per the paper, the largest, most expensive models outperformed open-source models by a significant amount:

“[W]e have conducted a comprehensive evaluation of 25 different LLMs using AgentBench, including both API-based and open-source models. Our results reveal that top-tier models like GPT-4 are capable of handling a wide array of real-world tasks, indicating the potential for developing a potent, continuously learning agent.”


The researchers went so far as to claim that “top LLMs are becoming capable of tackling complex real-world missions” but added that open-sourced competitors still have a “long way to go.”
