Generates realistic, lip-synchronized talking videos from a single photo and audio with natural motion and consistent identity.
LongCat Avatar, part of the LongCat-Video family, generates realistic, audio-driven talking-head and character videos from a reference image and an audio track. It emphasizes near-perfect lip synchronization, natural head and face motion, and identity consistency across frames. The model supports single- and multi-character scenes and natively handles three tasks: Audio-Text-to-Video, Audio-Image-to-Video, and Video Continuation. The project provides downloadable model weights and demo scripts, hosted on GitHub and Hugging Face, for local or cloud execution. It also includes optimizations such as configurable attention backends (FlashAttention variants) to improve runtime performance.
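
As a rough illustration of what a configurable attention backend can look like in practice, the sketch below picks the fastest backend whose Python package is installed and falls back to a slower one otherwise. The backend labels, the preference order, and the `pick_attention_backend` helper are assumptions made for this example. They are not LongCat's actual option names or code, so check the project's own demo scripts for the real flags.

```python
from importlib.util import find_spec

# Hypothetical preference order: (backend label, module that must be importable).
# The labels are illustrative only and are not taken from the LongCat codebase.
BACKEND_MODULES = [
    ("flash_attn_3", "flash_attn_interface"),  # assumed module name for FlashAttention-3
    ("flash_attn_2", "flash_attn"),            # FlashAttention-2 package
    ("sdpa", "torch"),                         # PyTorch built-in attention as a fallback
]


def pick_attention_backend(requested=None, is_available=None):
    """Return the label of the attention backend to use.

    requested:    optional backend label chosen by the user.
    is_available: optional callable(module_name) -> bool, injectable for testing.
                  By default it checks whether the module can be found on this machine.
    """
    if is_available is None:
        is_available = lambda module: find_spec(module) is not None

    known = dict(BACKEND_MODULES)

    # An explicit request must name a known backend that is actually installed.
    if requested is not None:
        if requested not in known:
            raise ValueError(f"unknown attention backend: {requested!r}")
        if not is_available(known[requested]):
            raise RuntimeError(f"backend {requested!r} requires module {known[requested]!r}")
        return requested

    # Otherwise take the first installed backend in preference order.
    for label, module in BACKEND_MODULES:
        if is_available(module):
            return label
    raise RuntimeError("no supported attention backend is installed")
```

Calling `pick_attention_backend()` with no arguments returns the best installed option, while passing a label such as `"sdpa"` forces that backend and fails loudly if its package is missing.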