1. Project overview: helps video creators quickly translate Chinese videos into multilingual video assets.
2. Technical areas: ASR, NLP, TTS
Translate video from Chinese (zh) to English (en)
The pipeline for the video translation task includes the following steps:
First, extract the audio track from the video with ffmpeg.
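A minimal sketch of the extraction step, assuming the ffmpeg binary is on PATH. The helper names `extract_audio_cmd` and `extract_audio` are hypothetical, and the output settings (16-bit PCM WAV, 44.1 kHz, two channels, picked to suit spleeter's pretrained models) are assumptions rather than the project's actual parameters.

```python
import subprocess


def extract_audio_cmd(video_path, wav_path, sample_rate=44100):
    """Build an ffmpeg argument list that drops the video and writes PCM WAV audio."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file without asking
        "-i", video_path,         # input video
        "-vn",                    # discard the video stream
        "-acodec", "pcm_s16le",   # uncompressed 16-bit little-endian PCM
        "-ar", str(sample_rate),  # resample; 44.1 kHz is the rate spleeter's models use
        "-ac", "2",               # keep two channels
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(extract_audio_cmd(video_path, wav_path), check=True)
```

Building the argument list separately from running it keeps the command easy to log or test without invoking ffmpeg.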
Use spleeter to separate the human voice from the background audio. *I expect this to improve the accuracy of the downstream ASR step.
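The separation step might look like this with spleeter's Python API (`Separator` and `separate_to_file`) and the pretrained 2-stem model, which splits audio into vocals and accompaniment. The wrapper names are hypothetical. The `duration` argument matters because spleeter only processes the first 600 seconds of input by default.

```python
from pathlib import Path


def stem_paths(wav_path, out_dir="separated", codec="wav"):
    """Predict where spleeter writes the stems under its default
    filename_format "{filename}/{instrument}.{codec}"."""
    base = Path(out_dir) / Path(wav_path).stem
    return {
        "vocals": base / f"vocals.{codec}",
        "accompaniment": base / f"accompaniment.{codec}",
    }


def separate_vocals(wav_path, out_dir="separated", duration=600.0):
    """Split wav_path into vocals and accompaniment with the pretrained 2-stem model.

    spleeter only processes `duration` seconds (600 by default), so pass the
    video length for longer inputs.
    """
    # Imported lazily because spleeter pulls in TensorFlow.
    from spleeter.separator import Separator

    separator = Separator("spleeter:2stems")
    separator.separate_to_file(wav_path, out_dir, duration=duration)
    return stem_paths(wav_path, out_dir)
```

The vocals stem feeds ASR; the accompaniment stem can be kept for the final mix.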
Use the Whisper encoder-decoder model for speech recognition (ASR) and write the transcript to an SRT subtitle file. *The example uses the "base" model.
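A sketch of the ASR step with the openai-whisper package: `whisper.load_model` and `model.transcribe`, whose result holds `segments` with `start`, `end` and `text` fields. The SRT helpers are hypothetical, written for illustration. Forcing `language="zh"` is an assumption that skips Whisper's language detection.

```python
from pathlib import Path


def format_timestamp(seconds):
    """Convert seconds to the SRT time format HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path, srt_path, model_name="base"):
    """Transcribe Chinese speech with Whisper and save the result as an SRT file."""
    import whisper  # openai-whisper; imported lazily because it loads PyTorch

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language="zh")
    srt = segments_to_srt(result["segments"])
    Path(srt_path).write_text(srt, encoding="utf-8")
    return srt
```

The whisper command-line tool can also write SRT directly, for example `whisper vocals.wav --model base --language zh --output_format srt`.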
Translate the SRT file from Chinese to English. *The Helsinki-NLP/opus-mt-zh-en model handles the translation.
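One way to translate subtitles without disturbing their timing is to translate only each cue's text and copy the index and timestamp lines through unchanged. In this sketch `translate_srt` accepts any batch translation function, so it can be exercised without downloading a model. `opus_mt_zh_en` wraps the Hugging Face `transformers` translation pipeline with Helsinki-NLP/opus-mt-zh-en. All function names are hypothetical.

```python
import re


def parse_srt(srt_text):
    """Split SRT text into (index, timing, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:  # skip malformed or empty cues
            continue
        # Chinese needs no spaces, so multi-line cue text is joined directly.
        cues.append((lines[0], lines[1], "".join(lines[2:])))
    return cues


def translate_srt(srt_text, translate_batch):
    """Translate only the cue text; indices and timestamps are copied unchanged."""
    cues = parse_srt(srt_text)
    translated = translate_batch([text for _, _, text in cues])
    blocks = [
        f"{index}\n{timing}\n{new_text.strip()}\n"
        for (index, timing, _), new_text in zip(cues, translated)
    ]
    return "\n".join(blocks)


def opus_mt_zh_en(texts):
    """Batch zh->en translation with Helsinki-NLP/opus-mt-zh-en via transformers."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
    return [out["translation_text"] for out in translator(texts)]
```

Usage: `english_srt = translate_srt(chinese_srt, opus_mt_zh_en)`. Translating cue by cue keeps the timing intact, but a sentence that Whisper split across two cues is translated without its full context.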
After translation, generate the English voice with speecht5_tts.
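A sketch of the TTS step, following the usage pattern in the Hugging Face `transformers` SpeechT5 documentation: `microsoft/speecht5_tts` with the `microsoft/speecht5_hifigan` vocoder and a speaker x-vector. Several parts are assumptions made for illustration. These are the speaker embedding (row 7306 of the `Matthijs/cmu-arctic-xvectors` dataset), per-cue synthesis, and the hypothetical helpers `place_clips`, `write_wav` and `build_dub_track`. Each translated cue is synthesized separately and placed on a silent 16 kHz track at the cue's start time, so the dub stays roughly in sync with the video.

```python
import sys
import wave
from array import array
from functools import lru_cache

SAMPLE_RATE = 16_000  # SpeechT5's HiFi-GAN vocoder produces 16 kHz audio


def srt_time_to_seconds(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to seconds (62.5)."""
    hms, ms = stamp.split(",")
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s + int(ms) / 1000


@lru_cache(maxsize=1)
def _load_speecht5():
    """Load the model, vocoder and an (assumed) speaker embedding once."""
    import torch
    from datasets import load_dataset
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    # Assumed speaker choice from the public CMU ARCTIC x-vector set.
    xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)
    return processor, model, vocoder, speaker


def synthesize(text):
    """Return mono 16 kHz float samples for one English sentence."""
    processor, model, vocoder, speaker = _load_speecht5()
    inputs = processor(text=text, return_tensors="pt")
    speech = model.generate_speech(inputs["input_ids"], speaker, vocoder=vocoder)
    return speech.numpy().tolist()


def place_clips(clips, total_seconds, sample_rate=SAMPLE_RATE):
    """Mix (start_seconds, samples) clips into one silent mono track.

    Overlapping clips are summed and samples past the end are dropped.
    """
    track = [0.0] * int(round(total_seconds * sample_rate))
    for start, samples in clips:
        offset = int(round(start * sample_rate))
        for i, value in enumerate(samples):
            if offset + i >= len(track):
                break
            track[offset + i] += value
    return track


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] as a 16-bit mono WAV file."""
    pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(pcm.tobytes())


def build_dub_track(cues, total_seconds, wav_path="dub.wav"):
    """cues: (index, 'start --> end', english_text) tuples from the translated SRT."""
    clips = []
    for _, timing, text in cues:
        start = srt_time_to_seconds(timing.split(" --> ")[0])
        clips.append((start, synthesize(text)))
    write_wav(wav_path, place_clips(clips, total_seconds))
```

Overlapping clips are simply summed here. When an English clip runs longer than its subtitle slot, time-stretching it would be the natural refinement. The pure-Python sample loop is slow for long videos, and NumPy arrays would be the usual choice in practice.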
Finally, merge the outputs of the preceding steps.
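The merge step could be a single ffmpeg call that keeps the original video stream and replaces its audio. This sketch assumes "merge" also covers mixing the dubbed voice with spleeter's accompaniment stem, to keep the original background music. `merge_cmd` and `merge` are hypothetical helpers, while `amix`, `-map`, `-c:v copy` and `-shortest` are standard ffmpeg options.

```python
import subprocess


def merge_cmd(video_path, dub_wav, out_path, accompaniment_wav=None):
    """Build an ffmpeg command that swaps the video's audio for the dubbed track."""
    cmd = ["ffmpeg", "-y", "-i", video_path, "-i", dub_wav]
    audio_map = "1:a"  # input 1: dubbed English voice
    if accompaniment_wav is not None:
        # Input 2: spleeter's accompaniment stem, mixed under the new voice.
        cmd += [
            "-i", accompaniment_wav,
            "-filter_complex", "[1:a][2:a]amix=inputs=2:duration=longest[aout]",
        ]
        audio_map = "[aout]"
    cmd += [
        "-map", "0:v",     # video from the original file
        "-map", audio_map,
        "-c:v", "copy",    # no video re-encode
        "-c:a", "aac",
        "-shortest",       # stop at the shorter of video and audio
        out_path,
    ]
    return cmd


def merge(video_path, dub_wav, out_path, accompaniment_wav=None):
    """Run the merge; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(merge_cmd(video_path, dub_wav, out_path, accompaniment_wav), check=True)
```

Copying the video stream avoids a slow and lossy re-encode, since only the audio changes.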
Main purpose: to demonstrate the end-to-end process of a video translation task. Room for optimization: