Inspiration
There are content creators, lecturers, and educators in non-English-speaking countries who make great videos. What if they want to reach a wider audience? Some add English closed captions (CC), but not everyone likes reading captions. Some even remake their videos in English, which is very costly.
With this service, they can generate localized videos in their own voice, with matching lip movements, giving the audience a smooth, enjoyable viewing experience.
What it does
- Accurate translation
- Dubbing in the YouTuber's own voice, not a random AI voice
- Lip syncing so the result looks natural
How we built it
For the model:
- Speech to text and translation: OpenAI Whisper + OpenAI ChatGPT
- Voice clone: CoquiAI
- Lip Syncing: Wav2lip
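The model stages above chain into one pipeline: transcribe, translate, clone the voice, then lip-sync. Below is a minimal sketch of that orchestration, assuming each model is wrapped behind a plain function. The names `Segment`, `DubbingPipeline`, and `translate_segments` are hypothetical. Carrying the transcript's per-segment timestamps through translation is our assumption about how the dubbed audio gets aligned, not necessarily what the project does. Toy stand-ins replace the real models so the control flow can run anywhere.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


def translate_segments(
    segments: List[Segment], translate_text: Callable[[str], str]
) -> List[Segment]:
    # Translate each segment's text but keep its timestamps, so the cloned
    # speech can be placed back at the same points in the video.
    return [replace(seg, text=translate_text(seg.text)) for seg in segments]


@dataclass
class DubbingPipeline:
    # Each stage is injected as a plain function (hypothetical wrappers).
    transcribe: Callable[[str], List[Segment]]        # video -> timed segments (Whisper)
    translate_text: Callable[[str], str]              # source text -> English (ChatGPT)
    clone_voice: Callable[[List[Segment], str], str]  # (segments, voice sample) -> audio path (Coqui)
    lip_sync: Callable[[str, str], str]               # (video, dubbed audio) -> output video (Wav2Lip)

    def run(self, video_path: str, voice_sample: str) -> str:
        segments = self.transcribe(video_path)
        translated = translate_segments(segments, self.translate_text)
        dubbed_audio = self.clone_voice(translated, voice_sample)
        return self.lip_sync(video_path, dubbed_audio)


# Toy stand-ins so the orchestration runs without any model installed.
captured: Dict[str, List[Segment]] = {}


def fake_transcribe(video_path: str) -> List[Segment]:
    return [Segment(0.0, 1.5, "hola"), Segment(1.5, 3.0, "gracias")]


def fake_translate(text: str) -> str:
    return {"hola": "hello", "gracias": "thank you"}[text]


def fake_clone_voice(segments: List[Segment], voice_sample: str) -> str:
    captured["segments"] = segments  # record what the voice stage received
    return "dubbed.wav"


def fake_lip_sync(video_path: str, audio_path: str) -> str:
    return video_path.replace(".mp4", "_dubbed.mp4")


pipeline = DubbingPipeline(fake_transcribe, fake_translate, fake_clone_voice, fake_lip_sync)
output = pipeline.run("lecture.mp4", "speaker_sample.wav")
print(output)  # lecture_dubbed.mp4
```

In a real deployment, the transcribe stage could wrap openai-whisper's `model.transcribe(path)`. Its result dict has a `"segments"` list whose entries carry `start`, `end`, and `text`. Injecting the stages as functions keeps each model swappable and lets the orchestration be tested without a GPU.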
For the API service:
- API served with FastAPI
- API and model services are containerized with Docker
- Deployed on Lambda GPU servers.
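Dubbing a video runs several GPU models back to back, so an HTTP request that waits for the finished video could easily time out. One common pattern, assumed here rather than taken from the project, is for the API to accept a job and let clients poll for its status. Below is a minimal standard-library sketch of that job layer, which FastAPI route handlers could call into. `DubbingJobService` and `Job` are hypothetical names, and `run_pipeline` stands in for the model stages.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    status: str = "queued"  # queued -> running -> done | failed
    result: Optional[str] = None
    error: Optional[str] = None


class DubbingJobService:
    """Hypothetical in-memory job layer behind the HTTP API."""

    def __init__(self, run_pipeline: Callable[[str, str], str], max_workers: int = 1):
        # One worker by default, assuming one GPU processes one video at a time.
        self._run = run_pipeline
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Job] = {}
        self._futures: Dict[str, Future] = {}

    def submit(self, video_path: str, target_lang: str) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job()  # register before the worker can look it up
        self._futures[job_id] = self._executor.submit(
            self._work, job_id, video_path, target_lang
        )
        return job_id

    def _work(self, job_id: str, video_path: str, target_lang: str) -> None:
        job = self._jobs[job_id]
        job.status = "running"
        try:
            job.result = self._run(video_path, target_lang)
            job.status = "done"
        except Exception as exc:  # record the failure instead of crashing the worker
            job.error = str(exc)
            job.status = "failed"

    def status(self, job_id: str) -> Job:
        return self._jobs[job_id]

    def wait(self, job_id: str, timeout: Optional[float] = None) -> Job:
        # Block until the job finishes (useful in tests; clients would poll status()).
        self._futures[job_id].result(timeout=timeout)
        return self._jobs[job_id]


service = DubbingJobService(lambda path, lang: path.replace(".mp4", f"_{lang}.mp4"))
job = service.wait(service.submit("lecture.mp4", "en"), timeout=5)
print(job.status, job.result)  # done lecture_en.mp4
```

A `POST` route could call `submit` and return the job ID, and a `GET` route could return `status(job_id)`. Because the job dict lives in memory, it is lost when the container restarts, so a longer-lived deployment would likely need a persistent queue.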
Challenges we ran into
- Customizing the models and open-source projects for this specific use case.
- Containerizing the service with Docker, which surfaced many environment issues.
Accomplishments that we're proud of
The output videos already look good without any fine-tuning.
What's next for RecordOnce,WatchEverywhere
- It's just an MVP; the models still need fine-tuning
- Build a UI, then market it to a mass audience
- Explore uses in education and entertainment: movies, conference recordings, lectures, and more
Project page
GitHub
BerkeleyAIHackathon
iamGeoWat • Updated Mar 13, 2024