Meta, owner of Facebook, Instagram, and WhatsApp, on Tuesday unveiled its latest effort in machine translation, this one geared toward speech translation.
The program, SeamlessM4T, surpasses existing models that are trained specifically for speech-to-speech translation between languages, as well as models that convert between speech and text in multiple language pairs. Hence, SeamlessM4T is an example not just of generality but of what is called multi-modality -- the ability for one program to operate on multiple data types, in this case, both speech and text data.
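To make the idea of multi-modality concrete, here is a toy sketch -- illustrative only, not Meta's actual code or API -- of what a single multi-modal translation interface looks like: one object that accepts either a text string or a raw waveform, and can return both translated text and translated speech.

```python
# A toy sketch of a single multi-modal interface: one object accepts
# either a text string or a raw waveform and can return translated text
# plus translated speech. Illustrative only, not Meta's API.
from dataclasses import dataclass
from typing import Optional, Sequence, Union

Waveform = Sequence[float]  # assumed: mono audio samples


@dataclass
class TranslationResult:
    text: str                    # translated text (always produced)
    speech: Optional[Waveform]   # translated audio, if requested


class MultiModalTranslator:
    """Stand-in for a speech+text model in the spirit of SeamlessM4T."""

    def translate(self, source: Union[str, Waveform], tgt_lang: str,
                  output_speech: bool = True) -> TranslationResult:
        # A real model would route speech through a speech encoder and
        # text through a text encoder, sharing one decoder; here we stub it.
        kind = "text" if isinstance(source, str) else "speech"
        text = f"[{tgt_lang} translation of {kind} input]"
        speech = [0.0] * 16000 if output_speech else None
        return TranslationResult(text=text, speech=speech)


translator = MultiModalTranslator()
print(translator.translate("Hello, world", tgt_lang="fra").text)
print(translator.translate([0.0] * 32000, tgt_lang="fra").text)
```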
Also: Meta to release open-source commercial AI model to compete with OpenAI and Google
Previously, Meta focused on large language models that can translate text between 200 different languages. That focus on text is a problem, say lead author Loïc Barrault and colleagues at both Meta and the University of California at Berkeley.
"While single, unimodal models such as No Language Left Behind (NLLB) push text-to-text translation (T2TT) coverage to more than 200 languages, unified S2ST [speech-to-speech-to-text] models are far from achieving similar scope or performance," write Barrault and team.
The formal paper, "SeamlessM4T -- Massively Multilingual & Multimodal Machine Translation," is posted on Meta's dedicated site for the overall project, Seamless Communication. There is also a companion GitHub site.
Speech has been left behind partly because less speech data is readily available in the public domain to train neural networks, write the authors. But there's a deeper point: Speech data is fundamentally richer as a signal for neural networks.
"The very challenge around why speech is harder to tackle from a machine translation standpoint -- that it encodes more information and expressive components -- is also why it is superior at conveying intent and forging stronger social bonds between interlocutors," they write.
The goal of SeamlessM4T is to create one program that is trained on both speech data and text data at the same time. The "M4T" stands for "Massively Multilingual & Multimodal Machine Translation." Multi-modality is an explicit part of the program.
Also: Meta's latest AI model will make content available in hundreds of languages
Such a program is sometimes referred to as an "end-to-end" program because it doesn't break up the parts that handle text and the parts that handle speech into separate functions, as in the case of "cascaded models," where separate programs are each trained on one step -- such as turning speech into text -- and then chained together to produce speech in the target language.
As the program's authors put it, "most S2ST [speech-to-speech translation] systems today rely heavily on cascaded systems composed of multiple subsystems that perform translation progressively -- e.g., from automatic speech recognition (ASR) to T2TT [text-to-text translation], and subsequently text-to-speech (TTS) synthesis in a 3-stage system."
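The difference is easy to see in miniature. The sketch below -- with hypothetical stub functions, not any real system's API -- contrasts a three-stage cascade, where each subsystem's errors feed into the next, with an end-to-end model that performs the whole mapping in one trained system.

```python
# Hypothetical stubs contrasting a 3-stage cascade with an end-to-end
# model. Each cascade stage passes its (possibly erroneous) output to
# the next; the end-to-end model is trained on the whole mapping.

def asr(speech):                     # stage 1: automatic speech recognition
    return "guten tag"               # stub transcript

def t2tt(text, tgt_lang="eng"):      # stage 2: text-to-text translation
    return "good day"                # stub translation

def tts(text):                       # stage 3: text-to-speech synthesis
    return [0.0] * 16000             # stub waveform


def cascaded_s2st(speech):
    """Speech-to-speech translation as three chained subsystems."""
    return tts(t2tt(asr(speech)))


def end_to_end_s2st(speech):
    """One jointly trained model, SeamlessM4T-style: no hand-off between
    separately trained subsystems at inference time."""
    return [0.0] * 16000             # stub waveform


german_audio = [0.0] * 32000
print(len(cascaded_s2st(german_audio)), len(end_to_end_s2st(german_audio)))
```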
Instead, the authors built a program that combines multiple existing parts trained together. They included "SeamlessM4T-NLLB, a massively multilingual T2TT model," plus a program called w2v-BERT 2.0, "a speech representation learning model that leverages unlabeled speech audio data," plus T2U, "a text-to-unit sequence-to-sequence model," and multilingual HiFi-GAN, a "unit vocoder for synthesizing speech from units."
Also: Meta's 'data2vec' is a step toward One Neural Network to Rule Them All
All four components are plugged together like a Lego set into a single program, also introduced this year by Meta, called UnitY, which can be described as "a two-pass modeling framework that first generates text and subsequently predicts discrete acoustic units."
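Roughly, the assembly looks like the sketch below. All the names in it are placeholders chosen for illustration, not identifiers from Meta's code: the input is encoded by a speech or text encoder, a first pass decodes translated text, a second pass converts that text into discrete acoustic units, and the vocoder renders the units as audio.

```python
# Placeholder names only (not identifiers from Meta's repository),
# showing how the four pieces slot into UnitY's two-pass design:
# encode speech (w2v-BERT 2.0) or text (the NLLB encoder), decode
# translated text first, then map that text to discrete acoustic units
# (T2U), and finally render audio with the HiFi-GAN unit vocoder.

class UnitYSketch:
    def __init__(self, speech_encoder, text_encoder, text_decoder,
                 t2u_model, unit_vocoder):
        self.speech_encoder = speech_encoder  # w2v-BERT 2.0 (speech input)
        self.text_encoder = text_encoder      # NLLB encoder (text input)
        self.text_decoder = text_decoder      # first pass: generate text
        self.t2u_model = t2u_model            # second pass: text -> units
        self.unit_vocoder = unit_vocoder      # units -> waveform

    def translate(self, source, tgt_lang, is_speech=True, want_speech=True):
        hidden = (self.speech_encoder(source) if is_speech
                  else self.text_encoder(source))
        text = self.text_decoder(hidden, tgt_lang)   # pass 1
        if not want_speech:
            return text, None
        units = self.t2u_model(text)                 # pass 2
        return text, self.unit_vocoder(units)


# Wiring with trivial stubs just to show the data flow:
model = UnitYSketch(
    speech_encoder=lambda wav: "hidden",
    text_encoder=lambda txt: "hidden",
    text_decoder=lambda hidden, lang: f"[{lang} text]",
    t2u_model=lambda text: [7, 12, 3],
    unit_vocoder=lambda units: [0.0] * 16000,
)
text, speech = model.translate([0.0] * 32000, "spa")
print(text, len(speech))
```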
The whole organization is visible in the diagram below.
The authors built a program that combines multiple existing parts trained together, all of which are plugged together like a Lego set in a single program. (Image: Meta AI Research, 2023)

The program manages to do better than multiple other kinds of programs on tests of speech recognition, speech-to-text translation, and speech-to-speech translation, the authors report. That includes beating both programs that are also end-to-end and cascaded programs designed explicitly for speech:
We find that SeamlessM4T-Large, the larger model of the two we release, outperforms the previous state-of-the-art (SOTA) end-to-end S2TT model (AudioPaLM-2-8B-AST [Rubenstein et al., 2023]) by 4.2 BLEU points on Fleurs [Conneau et al., 2022] when translating into English (i.e., an improvement of 20%). Compared to cascaded models, SeamlessM4T-Large improves translation accuracy by over 2 BLEU points. When translating from English, SeamlessM4T-Large improves on the previous SOTA (XLS-R-2B-S2T [Babu et al., 2022]) by 2.8 BLEU points on CoVoST 2 [Wang et al., 2021c], and its performance is on par with cascaded systems on Fleurs. On the S2ST task, SeamlessM4T-Large outperforms strong 3-stage cascaded models (ASR, T2TT and TTS) by 2.6 ASR-BLEU points on Fleurs. On CVSS, SeamlessM4T-Large outperforms a 2-stage cascaded model (Whisper-Large-v2 + YourTTS [Casanova et al., 2022]) by a large margin of 8.5 ASR-BLEU points (a 50% improvement). Preliminary human evaluations of S2TT outputs evinced similarly impressive results. For translations from English, XSTS scores for 24 evaluated languages are consistently above 4 (out of 5); for into English directions, we see significant improvement over Whisper-Large-v2's baseline for 7 out of 24 languages.
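For context on those numbers: BLEU scores a system's text translation against a reference translation, and ASR-BLEU first runs the system's speech output through a separate speech recognizer so the transcript can be scored the same way. Here is a minimal example using the sacrebleu library; the transcribe() stub is a placeholder for a real ASR system, not an actual API call.

```python
# BLEU compares a system translation against reference text; ASR-BLEU
# first transcribes generated speech with a separate recognizer, then
# scores the transcript the same way. transcribe() is a placeholder,
# not a real ASR call.
import sacrebleu

hypotheses = ["the cat sits on the mat"]
references = [["the cat sat on the mat"]]   # one reference per hypothesis

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")

def transcribe(waveform):
    # Placeholder for an ASR system such as Whisper, used so generated
    # audio can be scored against reference text.
    return "the cat sits on the mat"

generated_speech = [0.0] * 16000
asr_bleu = sacrebleu.corpus_bleu([transcribe(generated_speech)], references)
print(f"ASR-BLEU: {asr_bleu.score:.1f}")
```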
Also: Google's 'translation glasses' were actually at I/O 2023, and right in front of our eyes
The companion GitHub site offers not just the program code but also SONAR, a new technology for "embedding" multi-modal data, and BLASER 2.0, a new version of a metric for automatically evaluating multi-modal tasks.
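Conceptually, SONAR maps text or speech in many languages into one shared vector space, and a metric like BLASER 2.0 judges a translation by how its embedding relates to the source's (and, optionally, a reference's) in that space. The toy below illustrates that idea with made-up embeddings and plain cosine similarity; it is not the SONAR or BLASER API, and BLASER 2.0 itself is a trained scorer rather than raw similarity.

```python
# Made-up embeddings and plain cosine similarity, only to illustrate the
# idea of scoring translations in a shared embedding space; this is not
# the SONAR or BLASER API.
import numpy as np

def embed(sentence):
    # Stand-in for a multilingual, multi-modal encoder that maps text or
    # speech into one shared vector space. A real encoder would place
    # semantically similar inputs near each other.
    rng = np.random.default_rng(sum(map(ord, sentence)))
    vec = rng.normal(size=256)
    return vec / np.linalg.norm(vec)

def cosine(a, b):
    return float(np.dot(a, b))

src = embed("Bonjour tout le monde")   # source (text or speech)
hyp = embed("Hello everyone")          # system translation
ref = embed("Hello, everybody")        # human reference

# Reference-free scoring compares source and hypothesis directly;
# reference-based scoring also uses the human translation.
print("src-hyp similarity:", round(cosine(src, hyp), 3))
print("ref-hyp similarity:", round(cosine(ref, hyp), 3))
```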