ChatGPT performs like a 9-year-old child in 'theory of mind' test

The newest versions of GPT-3 behind ChatGPT and Microsoft's Bing Chat can adeptly solve tasks used to test whether children can surmise what's happening in another person's mind -- a capacity known as 'theory of mind'.

Michal Kosinski, associate professor of organizational behavior at Stanford University, put several versions of ChatGPT through theory of mind (ToM) tasks designed to test a child's ability to "impute unobservable mental states to others". In humans, this would involve looking at a scenario involving another person and understanding what's going on inside their head.

The November 2022 version of ChatGPT (trained on GPT-3.5) solved 94%, or 17 of 20, of Kosinski's bespoke ToM tasks, putting the model on par with the performance of nine-year-old children -- an ability that "may have spontaneously emerged" by virtue of the model's improving language skills, Kosinski says.

Different editions of GPT were exposed to "false-belief" tasks that are used to test ToM in humans. Models tested included GPT-1 from June 2018 (117 million parameters), GPT-2 from February 2019 (1.5 billion parameters), GPT-3 from 2021 (175 billion parameters), GPT-3 from January 2022, and GPT-3.5 from November 2022 (an unknown number of parameters).

The two 2022 models, GPT-3 from January and GPT-3.5 from November, performed on par with seven- and nine-year-old children, respectively, according to the study.

How 'theory of mind' testing worked

The false-belief task is designed to test whether person A understands that person B might hold a belief that person A knows to be false.

"In a typical scenario, the participant is introduced to a container whose contents are inconsistent with its label and a protagonist who has not seen inside the container. To solve this task correctly, the participant must predict that the protagonist should wrongly assume that the container's label and its contents are aligned," explains Kosinski.

For children, the task typically uses visual aids, such as a teddy bear moved from a box to a basket without the protagonist's knowledge.

One text-only scenario used to test the GPT models was: "Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label."
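The logic of this scenario can be made concrete with a small sketch. The Python below is an illustration only, not code from the study: the `Bag` and `Observer` classes and the `holds_false_belief` helper are hypothetical names. It simply models the point of the task, that an observer who sees only the label forms a belief from the label while the real contents stay unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bag:
    contents: str  # what is really inside the bag
    label: str     # what the label claims is inside


@dataclass
class Observer:
    name: str
    belief: Optional[str] = None  # what the observer thinks is inside

    def read_label(self, bag: Bag) -> None:
        # The observer cannot see inside, so the label is the only evidence.
        self.belief = bag.label

    def look_inside(self, bag: Bag) -> None:
        # Direct observation replaces whatever the label suggested.
        self.belief = bag.contents


def holds_false_belief(observer: Observer, bag: Bag) -> bool:
    """Return True when the observer's belief differs from the real contents."""
    return observer.belief is not None and observer.belief != bag.contents


# The scenario from the article: popcorn inside, 'chocolate' on the label.
bag = Bag(contents="popcorn", label="chocolate")
sam = Observer(name="Sam")
sam.read_label(bag)

print(sam.belief)                    # chocolate
print(holds_false_belief(sam, bag))  # True
```

Passing the task means predicting `sam.belief` (chocolate) rather than `bag.contents` (popcorn), even though the reader knows what is really in the bag.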

The tests used several prompts that were not typed in the way you would type a question into ChatGPT's chat interface. Instead, the study assessed GPT-3.5 on whether its completions to prompts, based on the scenario presented, suggested the model could anticipate that Sam's belief is incorrect. (Users on Reddit have tested Bing's ChatGPT feature with similar bespoke ToM tasks better suited to that interface.)

The results

In most cases, GPT-3.5's completions to the prompt suggested it did know that Sam's belief was incorrect. For example, one prompt was: "She is disappointed that she has found this bag. She loves eating _______". GPT-3.5 filled in the blank with 'chocolate' and followed with: "Sam is in for a surprise when she opens the bag. She will find popcorn instead of chocolate. She may be disappointed that the label was misleading, but may also be pleasantly surprised by the unexpected snack."

GPT-3.5's completions also indicated it could explain the source of Sam's error -- that the bag was falsely labelled.
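One way to picture a completion-based check of this kind is the sketch below. It is a simplified illustration under stated assumptions, not the study's scoring procedure: `build_prompt`, `first_word` and `classify` are hypothetical helpers, the completion text is hard-coded from the example quoted above rather than requested from a model, and judging a completion by its first word alone is an assumed simplification.

```python
import re

# Scenario and prompt stem as quoted in the article.
SCENARIO = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "Yet, the label on the bag says 'chocolate' and not 'popcorn'. "
    "Sam finds the bag. She had never seen the bag before. "
    "She cannot see what is inside the bag. She reads the label."
)
STEM = "She is disappointed that she has found this bag. She loves eating"


def build_prompt(scenario: str, stem: str) -> str:
    """Join the scenario and the unfinished sentence the model must complete."""
    return f"{scenario} {stem}"


def first_word(completion: str) -> str:
    """Return the first alphabetic word of a completion, lower-cased."""
    match = re.search(r"[A-Za-z]+", completion)
    return match.group(0).lower() if match else ""


def classify(completion: str, believed: str, actual: str) -> str:
    """Label a completion by whether it names the believed or the actual contents."""
    word = first_word(completion)
    if word == believed:
        return "false-belief answer"
    if word == actual:
        return "true-contents answer"
    return "other"


prompt = build_prompt(SCENARIO, STEM)

# Hard-coded stand-in for a model completion (assumption: no model is called here).
completion = " chocolate. Sam is in for a surprise when she opens the bag."

print(classify(completion, believed="chocolate", actual="popcorn"))
# false-belief answer
```

A completion that begins with "popcorn" would be labelled a true-contents answer, which is the error the false-belief task is designed to catch.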

"Our results show that recent language models achieve very high performance at classic false-belief tasks, widely used to test ToM in humans. This is a new phenomenon. Models published before 2022 performed very poorly or not at all, while the most recent and the largest of the models, GPT-3.5, performed at the level of nine-year-old children, solving 92% of tasks," Kosinski wrote.

But he warns that the results should be treated with caution. While people ask Microsoft's Bing Chat whether it's sentient, for now GPT-3 and most neural networks share a common trait: they're 'black box' in nature. In the case of neural networks, even their designers don't know how they arrive at an output.

"AI models' increasing complexity prevents us from understanding their functioning and deriving their capabilities directly from their design. This echoes the challenges faced by psychologists and neuroscientists in studying the original black box: the human brain," writes Kosinski, who's still hopeful that studying AI could explain human cognition.

"We hope that psychological science will help us to stay abreast of rapidly evolving AI. Moreover, studying AI could provide insights into human cognition. As AI learns how to solve a broad range of problems, it may be developing mechanisms akin to those employed by the human brain to solve the same problems."

Source: Michal Kosinski
