Skip to content

Latest commit

 

History

History
88 lines (60 loc) · 2.93 KB

README.md

File metadata and controls

88 lines (60 loc) · 2.93 KB

Video Translator

Repository with helper functions for translating videos from one language to another.

Providing a stramlit UI with a web toolkit for downloading, translating, transcribing, generating video.

Created to disseminate educational resources on bitcoin.

## Screenshot

ScreenShot

Run

Using python environment (Docker in future)

$ pip install -r requirements.txt
$ streamlit run start_streamlit.py

Possible Outputs

  • "original_video: .mp4 -> Download video from url (youtube converter) or other libraries,
  • "original_captions_text": str -> Download caption from url or use Speech2Text AI model,
  • "original_audio": .mp3 -> Download from url or other libraries,
  • "translated_video_captions: .mp4" -> Video file with translated captions overwritten,
  • "translated_video: .mp4" -> Video file with new generated audio translated from original, "translated_captions: str" -> New translated captions "translated_audio: mp3", -> Translate audio using S2T -> Transltor -> T2S }

Work in Progress -

Features

  • Extract original captions from a video and save them in JSON or txt (full-text).
  • Download a video from Youtube and save it in mp4 format.
  • Translate captions from one language to another

Installation

An installation of Poetry is required. Poetry is a tool for dependency management and packaging in Python. Take a look to installations.

For OSx / linux / bashonwindows install:

$ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -

Dependencies used:

Install dependencies with:

$ poetry install

Run the app

Modify the config.json file to set the correct values for the following parameters:

  • videos: Youtube video IDs and video names
  • formatter: Formatter for the captions.
    • json (with video timeslots reference)
    • txt (full-text)
  • output_folder: Folder where the captions or videos will be saved.
  • language_from: Language of the captions.
  • language_from: Translation language.

Run script with:

$ cd videotranslate
$ poetry run python main.py

TODO

  • Use pre-trained AI model insted of youtube_transcript_api to translate video captions. Options: local environment (CPU) / cloud-services API, P2P GPU power.

  • Try to generate a speech for translated captions using Text2Speech models. References: DeepSpeech-Italian-Model, Woord, TTS

  • Adapt generated speech to original video timeslots