Berkeley AI Hackathon - Top 15

Inspiration

Content creators, lecturers, and educators in non-English-speaking countries make great videos. What if they want to reach a wider audience? Some add English closed captions (CC), but not everyone likes reading captions. Some even remake their videos in English, which costs too much.

With this service, they can generate localized videos with their own voice and matching lip movements, giving the audience a smooth, enjoyable viewing experience.

What it does

Accurate translation

Dubbed in the YouTuber's own voice, not a random AI voice

Lip sync to make it look natural

How we built it

For the model:

Speech to text and translation: OpenAI Whisper + OpenAI ChatGPT

Voice clone: CoquiAI

Lip syncing: Wav2Lip
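
A minimal sketch of how the three model stages above might be chained together. This is illustrative only, not the project's actual code: `Segment`, `dub_video`, and `wav2lip_command` are hypothetical names, and each stage is passed in as a plain callable so that Whisper, ChatGPT, Coqui, and Wav2Lip can be plugged in (or replaced by stubs) without changing the flow. The Wav2Lip flags shown are the ones documented for its `inference.py` script; check them against your checkout, and treat all file paths as placeholders.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Segment:
    """One transcribed span of speech, with timestamps in seconds."""
    start: float
    end: float
    text: str


# Stage signatures (all hypothetical wrappers around the real tools):
Transcribe = Callable[[str], List[Segment]]   # video path -> timed segments
Translate = Callable[[str, str], str]         # (text, target language) -> text
CloneVoice = Callable[[str, str, str], str]   # (text, reference media, wav out) -> wav path
LipSync = Callable[[str, str, str], str]      # (video, audio, video out) -> video path


def dub_video(
    video_path: str,
    target_lang: str,
    out_path: str,
    transcribe: Transcribe,
    translate: Translate,
    clone_voice: CloneVoice,
    lip_sync: LipSync,
) -> str:
    """Run speech-to-text, translation, voice cloning, then lip sync."""
    # 1. Speech to text (e.g. a wrapper around Whisper).
    segments = transcribe(video_path)

    # 2. Translate each segment, keeping its original timestamps.
    translated = [
        Segment(s.start, s.end, translate(s.text, target_lang)) for s in segments
    ]

    # 3. Synthesize the translated script in the speaker's own voice,
    #    using the original recording as the voice reference (an assumption).
    script = " ".join(s.text for s in translated)
    wav_path = str(Path(out_path).with_suffix(".wav"))
    dubbed_audio = clone_voice(script, video_path, wav_path)

    # 4. Re-render the speaker's lips to match the new audio.
    return lip_sync(video_path, dubbed_audio, out_path)


def wav2lip_command(checkpoint: str, face: str, audio: str, outfile: str) -> List[str]:
    """Build the argument list for Wav2Lip's inference.py script."""
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face,
        "--audio", audio,
        "--outfile", outfile,
    ]
```

A `lip_sync` implementation could pass the list from `wav2lip_command` to `subprocess.run` from inside the Wav2Lip checkout. Keeping per-segment timestamps through translation leaves room to align the dubbed audio with the original timing later, which this sketch does not attempt.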

For the API service:

API provided with FastAPI

API and model services are containerized with Docker

Deployed on Lambda GPU servers.

Challenges we ran into

Customizing the models and open-source projects for this specific use case.

Containerizing the service with Docker, which surfaced many environment issues.

Accomplishments that we're proud of

The output video already looks good without any fine-tuning.

What's next for RecordOnce,WatchAnywhere

It's just an MVP and still needs fine-tuning

Build a UI, then market it to a mass audience

Explore possible uses in education and entertainment: movies, conference recordings, lectures...

Project page

RecordOnce,WatchAnywhere

Make your videos look like you are speaking natively in all languages.

https://devpost.com/software/youcanspeakalllangs

GitHub

BerkeleyAIHackathon

iamGeoWat • Updated Mar 13, 2024