News Feed - 2023-08-09 05:08:00

ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

Tristan Greene, 3 hours ago

The scientists developed a tool called "AgentBench" to benchmark LLM models as agents.

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.


LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.


Typically, these models are benchmarked on their ability to output text perceived as humanlike or on their scores on plain-language tests designed for humans. By comparison, far fewer papers have been published on the subject of LLMs as agents.


Artificial intelligence (AI) agents perform specific tasks, such as following a set of instructions within a specific environment. For example, researchers will often train an AI agent to navigate a complex digital environment as a method for studying the use of machine learning to develop autonomous robots safely.


Traditional machine learning agents aren’t typically built as LLMs because of the prohibitive cost of training models such as ChatGPT and Claude. However, the largest LLMs have shown promise as agents.


The team from Tsinghua, Ohio State and UC Berkeley developed a tool called AgentBench to evaluate and measure LLMs’ capabilities as real-world agents, something the team claims is the first of its kind.


According to the researchers’ preprint paper, the main challenge in creating AgentBench was going beyond traditional AI learning environments (video games and physics simulators) and finding ways to apply LLM abilities to real-world problems so they could be effectively measured.

Flowchart of AgentBench’s evaluation method. Source: Liu et al.


What they came up with was a multidimensional set of tests that measures a model’s ability to perform challenging tasks in a variety of environments.


These include having models perform functions in an SQL database, working within an operating system, planning and performing household cleaning functions, shopping online, and several other high-level tasks that require step-by-step problem-solving.


Per the paper, the largest, most expensive models outperformed open-source models by a significant amount:

“[W]e have conducted a comprehensive evaluation of 25 different LLMs using AgentBench, including both API-based and open-source models. Our results reveal that top-tier models like GPT-4 are capable of handling a wide array of real-world tasks, indicating the potential for developing a potent, continuously learning agent.”


The researchers went so far as to claim that “top LLMs are becoming capable of tackling complex real-world missions” but added that open-sourced competitors still have a “long way to go.”
