News Feed - 2023-08-09 05:08:00

ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

Tristan Greene, 3 hours ago

The scientists developed a tool called "AgentBench" to benchmark LLMs as agents.

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.


LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.


Typically, these models are benchmarked on their ability to output text perceived as humanlike or on their scores on plain-language tests designed for humans. By comparison, far fewer papers have been published on LLMs as agents.


Artificial intelligence (AI) agents perform specific tasks, such as following a set of instructions within a specific environment. For example, researchers will often train an AI agent to navigate a complex digital environment as a method for studying the use of machine learning to develop autonomous robots safely.


Traditional machine learning agents aren’t typically built as LLMs due to the prohibitive costs involved with training models such as ChatGPT and Claude. However, the largest LLMs have shown promise as agents.


The team from Tsinghua, Ohio State and UC Berkeley developed a tool called AgentBench to evaluate and measure LLMs’ capabilities as real-world agents, something the team claims is the first of its kind.


According to the researchers’ preprint paper, the main challenge in creating AgentBench was going beyond traditional AI learning environments (video games and physics simulators) and finding ways to apply LLM abilities to real-world problems so they could be effectively measured.


Flowchart of AgentBench’s evaluation method. Source: Liu, et al.


What they came up with was a multidimensional set of tests that measures a model’s ability to perform challenging tasks in a variety of environments.


These include having models perform functions in an SQL database, working within an operating system, planning and performing household cleaning functions, shopping online, and several other high-level tasks that require step-by-step problem-solving.
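

As a rough illustration of what scoring one model across several environments can look like, here is a minimal Python sketch. It is not AgentBench's code: the `Task` type, the `evaluate` function, the toy environments and the lookup-table "agent" are all hypothetical, and each task is reduced to a single step, whereas the tasks described above require step-by-step problem-solving.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration; none of these names come from AgentBench.
Agent = Callable[[str], str]  # maps an instruction to an action


@dataclass
class Task:
    instruction: str  # what the agent is asked to do
    expected: str     # the action that counts as success


def success_rate(agent: Agent, tasks: List[Task]) -> float:
    """Fraction of tasks in one environment that the agent completes."""
    if not tasks:
        return 0.0
    solved = sum(1 for t in tasks if agent(t.instruction) == t.expected)
    return solved / len(tasks)


def evaluate(agent: Agent, environments: Dict[str, List[Task]]) -> Dict[str, float]:
    """Score the agent separately in each environment."""
    return {name: success_rate(agent, tasks) for name, tasks in environments.items()}


# Toy environments loosely named after task types mentioned in the article.
ENVIRONMENTS = {
    "database": [Task("count rows in users", "SELECT COUNT(*) FROM users;")],
    "operating_system": [
        Task("list files", "ls"),
        Task("show current directory", "pwd"),
    ],
}


def toy_agent(instruction: str) -> str:
    """A stand-in for an LLM: a fixed lookup table, not a real model."""
    known = {
        "list files": "ls",
        "count rows in users": "SELECT COUNT(*) FROM users;",
    }
    return known.get(instruction, "")


scores = evaluate(toy_agent, ENVIRONMENTS)
print(scores)
```

Reporting one score per environment, rather than a single total, keeps a model's weak areas visible.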


Per the paper, the largest, most expensive models outperformed open-source models by a significant amount:


“[W]e have conducted a comprehensive evaluation of 25 different LLMs using AgentBench, including both API-based and open-source models. Our results reveal that top-tier models like GPT-4 are capable of handling a wide array of real-world tasks, indicating the potential for developing a potent, continuously learning agent.”


The researchers went so far as to claim that “top LLMs are becoming capable of tackling complex real-world missions” but added that open-sourced competitors still have a “long way to go.”
