Real-time speech-to-speech translation system that streams translated audio while preserving speaker voice characteristics and prosody.

Google's real-time speech-to-speech (S2S) translation is a research-backed system that converts spoken input in one language into natural-sounding spoken output in another language with low latency. The pipeline combines speech recognition, translation, and a custom text-to-speech generation engine to produce translated audio that preserves the speaker's voice characteristics and prosody. Separately, Google Research has developed a direct end-to-end model trained on paired speech data (Translatotron 2) and an unsupervised approach trained on monolingual data (Translatotron 3). The technology is designed for streaming use, on-device deployment, and product integration. It has been demonstrated in live translation experiences (e.g., the headphone live translation beta) and is integrated with Google’s broader speech and TTS capabilities.
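
The cascaded flow described above (recognize, translate, synthesize, emitting audio as each piece is ready) can be sketched in a few lines of Python. This is only an illustration of the streaming pattern, not Google's implementation: `stream_translate`, the three stage functions, and the toy lexicon are all hypothetical names invented for this example, and the toy stages operate on text bytes rather than real audio.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical stage signatures (assumptions for illustration only).
Recognize = Callable[[bytes], str]    # audio chunk -> source-language text
Translate = Callable[[str], str]      # source-language text -> target-language text
Synthesize = Callable[[str], bytes]   # target-language text -> audio


def stream_translate(
    chunks: Iterable[bytes],
    recognize: Recognize,
    translate: Translate,
    synthesize: Synthesize,
) -> Iterator[bytes]:
    """Lazily yield translated audio for each input chunk, one at a time."""
    for chunk in chunks:
        text = recognize(chunk)
        if not text:
            continue  # nothing recognized (e.g. silence), so emit nothing
        yield synthesize(translate(text))


# Toy stand-ins so the sketch runs without any speech models.
TOY_LEXICON: Dict[str, str] = {"hello": "hola", "world": "mundo"}


def toy_recognize(chunk: bytes) -> str:
    # Pretend the "audio" is UTF-8 text; whitespace-only chunks count as silence.
    return chunk.decode("utf-8").strip()


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


if __name__ == "__main__":
    incoming = [b"hello world", b"  ", b"hello"]
    for audio in stream_translate(incoming, toy_recognize, toy_translate, toy_synthesize):
        print(audio)
```

Because `stream_translate` is a generator, output for the first chunk is available before later chunks are processed, which is the property a low-latency streaming system depends on. A real system would also need to carry voice and prosody information through to synthesis, which this sketch does not attempt.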





