
News Feed - 2023-08-09 05:08:00

ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

The scientists developed a tool called "AgentBench" to benchmark LLMs as agents.

By Tristan Greene

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.


LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.




Typically, these models are benchmarked on their ability to output text perceived as humanlike or on their scores on plain-language tests designed for humans. By comparison, far fewer papers have been published on the subject of LLMs as agents.


Artificial intelligence (AI) agents perform specific tasks, such as following a set of instructions within a specific environment. For example, researchers often train an AI agent to navigate a complex digital environment as a way to study how machine learning can be used to develop autonomous robots safely.


Traditional machine learning agents like these aren’t typically built as LLMs because of the prohibitive cost of training models such as ChatGPT and Claude. However, the largest LLMs have shown promise as agents.


The team from Tsinghua, Ohio State and UC Berkeley developed a tool called AgentBench to evaluate and measure LLMs’ capabilities as real-world agents, which the team claims is the first tool of its kind.


According to the researchers’ preprint paper, the main challenge in creating AgentBench was going beyond traditional AI learning environments, such as video games and physics simulators, and finding ways to apply LLM abilities to real-world problems so they could be effectively measured.

Flowchart of AgentBench’s evaluation method. Source: Liu et al.


What they came up with was a multidimensional set of tests that measures a model’s ability to perform challenging tasks in a variety of environments.


These include having models carry out operations on an SQL database, work within an operating system, plan and perform household cleaning tasks, shop online, and complete several other high-level tasks that require step-by-step problem-solving.


Per the paper, the largest, most expensive models outperformed open-source models by a significant margin:

“[W]e have conducted a comprehensive evaluation of 25 different LLMs using AgentBench, including both API-based and open-source models. Our results reveal that top-tier models like GPT-4 are capable of handling a wide array of real-world tasks, indicating the potential for developing a potent, continuously learning agent.”


The researchers went so far as to claim that “top LLMs are becoming capable of tackling complex real-world missions” but added that open-source competitors still have a “long way to go.”
