I am still in the process of cleaning and restructuring the codebase.
https://www.youtube.com/watch?v=iSb1HJXRO04
- The story was taken from the internet.
- A 4-line summary of the story was generated using ChatGPT
- For each summary line, a corresponding image was generated using Stable Diffusion (see the sketch after this list)
- In each image (4 in our case), the important objects were masked using Meta AI's SAM
- I developed custom image-to-whiteboard-animation code that converts the images to videos (this was the most time-consuming part of the development process)
- The audio was generated using gTTS (Google Text-to-Speech)
- The subtitles were generated with the help of OpenAI Whisper (see the sketch after this list)
- All the different audio, video, and subtitle pieces were synchronized using FFmpeg and OpenCV (this part gave a lot of pain🥲... ALL HAIL FFMPEG🙌); a muxing sketch also follows below
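
The notes above do not pin down which Stable Diffusion interface was used for the images. A minimal sketch of that step with the Hugging Face `diffusers` library, assuming a CUDA GPU and hypothetical file paths (not the exact setup used here):

```python
# Sketch only: text-to-image with diffusers; model id, prompt, and paths are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Assumed location of the 4-line summary produced earlier.
summary_lines = open("text-processing/summary.txt", encoding="utf-8").read().splitlines()[:4]

for i, line in enumerate(summary_lines, start=1):
    image = pipe(f"whiteboard sketch style illustration of: {line}").images[0]
    image.save(f"images/scene{i}.png")  # hypothetical output path
```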
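
For the subtitle step, a minimal Whisper-to-SRT sketch, assuming the `openai-whisper` package and per-part narration files named `part1.mp3` etc. (hypothetical names):

```python
# Sketch of the Whisper subtitle step; file names are assumptions, not the repo's exact code.
import whisper

def to_srt_time(seconds: float) -> str:
    # Format seconds as an SRT timestamp: HH:MM:SS,mmm
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("base")        # a small model is enough for short narration
result = model.transcribe("part1.mp3")    # hypothetical per-part audio file

with open("part1.srt", "w", encoding="utf-8") as srt:
    for i, seg in enumerate(result["segments"], start=1):
        srt.write(f"{i}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n{seg['text'].strip()}\n\n")
```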
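
Finally, a sketch of muxing one part's animation, narration, and subtitles with the `ffmpeg` CLI (assumed to be on PATH and built with subtitle support); the actual synchronization logic in this repo is more involved:

```python
# Burn the subtitles into one story part and attach its narration (hypothetical file names).
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "part1_animation.mp4",   # silent whiteboard animation
    "-i", "part1.mp3",             # gTTS narration for the same part
    "-vf", "subtitles=part1.srt",  # hard-burn the Whisper subtitles
    "-c:a", "aac",
    "-shortest",                   # stop at the shorter of audio/video
    "part1_final.mp4",
], check=True)
```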
- text-processing:
  - story.txt: The story, taken from the internet
  - summary.txt: The 4-line summary of the story in story.txt, generated using ChatGPT (see the sketch below)
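
The exact ChatGPT prompt is not part of the repo; the sketch below shows one programmatic way such a 4-line summary could be produced with the official `openai` package (model name, prompt wording, and file paths are assumptions):

```python
# Sketch only: the summary in summary.txt was generated via ChatGPT; this is one possible
# programmatic equivalent, not the project's actual workflow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("text-processing/story.txt", encoding="utf-8") as f:  # assumed path
    story = f.read()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model
    messages=[{"role": "user",
               "content": f"Summarize the following story in exactly 4 lines:\n\n{story}"}],
)

with open("text-processing/summary.txt", "w", encoding="utf-8") as f:
    f.write(response.choices[0].message.content)
```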
- generate-whiteboard-animated-videos:
  - Converts any image into a whiteboard-animation video
  - If object masks are provided in JSON format, they are automatically used while drawing
  - Code File: draw-whiteboard-animations.py (a rough sketch of the idea follows below)
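
draw-whiteboard-animations.py implements the full effect (per-object drawing order from the JSON masks, hand/pen overlay, etc.); the sketch below only illustrates the basic idea of revealing an image piece by piece with OpenCV, with all parameters and file names assumed:

```python
# Rough sketch of an image-to-whiteboard-animation reveal using OpenCV.
# The real draw-whiteboard-animations.py is considerably more elaborate.
import cv2
import numpy as np

def whiteboard_reveal(image_path: str, out_path: str, tile: int = 40, fps: int = 30) -> None:
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    canvas = np.full_like(img, 255)  # start from a blank white board
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    # Reveal the picture one tile at a time, left-to-right, top-to-bottom.
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            canvas[y:y + tile, x:x + tile] = img[y:y + tile, x:x + tile]
            writer.write(canvas)

    # Hold the finished frame for one second.
    for _ in range(fps):
        writer.write(img)
    writer.release()

whiteboard_reveal("scene1.png", "scene1_animation.mp4")  # hypothetical file names
```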
- tts (text to speech):
  - The story is divided into 4 parts corresponding to the 4 summary lines; for each part, Google's Text-to-Speech (gTTS) is used to generate an MP3 file, 4 files in total (see the sketch below)
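
A minimal sketch of the gTTS step, assuming the story file keeps its 4 parts separated by blank lines (a project-specific detail) and hypothetical output paths:

```python
# Generate one narration MP3 per story part with gTTS.
from gtts import gTTS

# Assumed: the 4 story parts are separated by blank lines in story.txt.
with open("text-processing/story.txt", encoding="utf-8") as f:
    story_parts = [p.strip() for p in f.read().split("\n\n") if p.strip()][:4]

for i, part in enumerate(story_parts, start=1):
    gTTS(text=part, lang="en").save(f"tts/part{i}.mp3")  # one narration file per part
```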
- The whiteboard animation is refined by drawing one object at a time.
- These object masks are drawn/created using AnyLabeling, a tagging tool assisted by Meta AI's SAM
- AnyLabeling Tool: https://github.com/vietanhdev/anylabeling
- After tagging, the tool exports a JSON file which can be used for the sketching-video generation (see the sketch below)
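
To use the exported masks during drawing, the JSON can be converted into one binary mask per labelled object. A minimal sketch, assuming AnyLabeling's LabelMe-style layout (a `shapes` list of labelled polygons plus `imageHeight`/`imageWidth`):

```python
# Convert an AnyLabeling JSON export into one binary mask per labelled object.
# Assumes a LabelMe-style structure: "shapes" -> [{"label", "points"}], "imageHeight", "imageWidth".
import json
import cv2
import numpy as np

def masks_from_anylabeling(json_path: str) -> dict:
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)

    h, w = data["imageHeight"], data["imageWidth"]
    masks = {}
    for shape in data["shapes"]:
        poly = np.array(shape["points"], dtype=np.int32)
        mask = np.zeros((h, w), dtype=np.uint8)
        cv2.fillPoly(mask, [poly], 255)  # filled polygon = object region
        masks[shape["label"]] = mask
    return masks

masks = masks_from_anylabeling("scene1.json")  # hypothetical file name
```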
Please NOTE:
- The videos can also be generated without the JSON masks