[Notes] OpenAI Whisper for Video Subtitles

Posted on April 6, 2023 in AI Bot Chat


These are just notes I took for my own use while figuring out how to use OpenAI Whisper to:

Automatically transcribe videos that have no subtitles
- Works well, even if the speakers have accents, speak fast, or are hard to hear
Translate videos from other languages to English

See this post for a less-cluttered list of useful tools

OpenAI Whisper via Google Drive > Colaboratory

April 5th, 2023: I found this tool through a YouTube video, followed the instructions, and immediately transcribed a podcast while I was still watching (so I can confirm the instructions are accurate).

Easily transcribe videos (particularly from sites that have no subtitle files: Telegram/Rumble/Brighteon/BitChute and other censored platforms).
It can also help us automatically translate and transcribe all those awesome non-English truth videos, which opens up the world.

This guy explains, step by step, exactly how to use Google Drive (Colaboratory) to install and use Whisper AI right from your browser, with nothing to download.

Keep in mind:

1) Download the text straight away (the session times out and deletes your files after a certain timeframe).
- So don’t load it up and leave the house; download the text/subtitles as soon as they’re ready.
- If you time out, you have to re-run the Whisper install code (about 30 seconds) and re-upload your file.
2) Google can decide at any moment to stop allowing GPU use without payment, so I’ll list alternatives to Google Colab below as I come across them.

Links/notes for my own use to find again.

https://drive.google.com/drive/my-drive New > More > Google Colaboratory
- or https://colab.research.google.com/#create=true
Rename Untitled.ipynb to TranscribeAudio.ipynb
Runtime > Change runtime type > GPU
Install Whisper AI:

!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install -y ffmpeg

Upload audio (drag n drop)
Run Whisper AI with the medium or large model. (Testing the file, medium took about 30 seconds for my 1-hour podcast and large took 13 minutes 9 seconds; scroll down to the accuracy comparison to see whether large is worth the extra time.)

!whisper "FileName.mp3" --model medium

Download text or subtitles (files will appear to the left when it’s finished transcribing)
Additional Whisper AI arguments to learn about: (useful for people who want to transcribe the video into different languages!!)

!whisper --help

My everyday “go-to” command for creating English text files:

!whisper "Marvin.mp3" --model medium.en --output_format txt --task transcribe --beam_size 5 --patience 13 --threads 4 --length_penalty 0
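If you would rather stay in a Python cell than use the `!whisper` command line, the go-to command above maps roughly onto Whisper's Python API. This is only a minimal sketch, not something from the video: `go_to_options` and `transcribe_to_txt` are made-up helper names, and `--threads` is left out on the assumption that it is a command-line-only setting.

```python
from pathlib import Path


def go_to_options() -> dict:
    """Decoding options mirroring the flags of the go-to command."""
    return {
        "task": "transcribe",
        "beam_size": 5,
        "patience": 13,
        "length_penalty": 0.0,
    }


def transcribe_to_txt(audio_path: str, model_name: str = "medium.en") -> Path:
    """Transcribe audio_path and save the text next to it as a .txt file."""
    # Imported here so the helper above can be used without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **go_to_options())
    out_path = Path(audio_path).with_suffix(".txt")
    # Writes the whole transcript as one block of text, which is a simpler
    # layout than the one-line-per-segment file the command line produces.
    out_path.write_text(result["text"].strip(), encoding="utf-8")
    return out_path
```

In a Colab cell (after the install step) it would be called as `transcribe_to_txt("Marvin.mp3")`.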


Testing Accuracy using different settings & models

Two speakers with accents, 60 minute mp3

Results of test with two speakers (with accents) from this 1-hr podcast episode (Marvin vs Virology: COVID Taken To Court). Female NZ speaker (Dr. Sam Bailey) and male German speaker (Marvin Haberland).

Medium Setting (took 7 minutes to transcribe) example:

His case should require the virologists to provide evidence that they followed the scientific method when they claimed that SARS-CoV-2 exists. … so you studied engineering in Germany and you got a Fulbright scholarship. Is that right to bootclears? Yeah, exactly. So I did my undergraduate studies in Hamburg, Germany, …

Large Setting (took 13 minutes to transcribe) example:

His case should require the virologists to provide evidence that they followed the scientific method when they claimed that SARS-CoV-2 exists. … so you studied engineering in Germany and you got a Fulbright scholarship. Is that right? To Berkeley? Yeah, exactly. So I did my undergraduate studies in Hamburg, Germany, …

Considering both speakers have accents, it picked up their conversation extremely well, with only tiny mistakes. I think the medium setting will be fine for most English-speaking audio (even if the speakers have accents); using it and then doing a quick read-through to pick up small mistakes looks like the best time-saving approach.

I just did another test using the fastest model and specific parameters, and it completed the 60-minute file in 2 mins 21 secs. It also seems that specifying --length_penalty 0 made it transcribe the whole 60-minute podcast (the previous tests seemed to output only the first 29 minutes).

!whisper "Marvin.mp3" --model tiny --output_format txt --task transcribe --language en --length_penalty 0

Quickly scrolling through, the only errors I spotted, even on the quickest setting, were “Dority Institute” instead of “Doherty Institute”, “Harbourland” instead of “Haberland”, and the same “boot clears” instead of “Berkeley” error.

Testing the fastest model “Tiny.en”:

!whisper "Marvin.mp3" --model tiny.en --output_format txt --task transcribe --length_penalty 0

This took 2m 25s to transcribe the one hour audio file

Errors:

Transcribed as: …since Corona started the Corona Fucking German telegram channel also published
- (woah, oops! lol)
Should be: …since Corona started, the Corona Fakten, a German telegram channel, also published

Error comparison using different models:

Tiny:
- and the Dority Institute because this isn’t interesting Australian paper
Tiny.en with above code:
- the dirty institute because this is an interesting Australian paper (lol.. that’s a good name for them!)
Medium:
- Dougherty Institute because this is an interesting Australian paper
Large:
- and the Doherty Institute, because this is an interesting Australian paper.

The “Tiny” models introduce errors that might not be worth the time saved.


Using small video clip to test accuracy differences in models

Example using a tiny video clip of a young Beyoncé. What she actually says:

“I know we’re gonna stay humble. If you ever meet me, I have a little attitude, just slap me. Just slap me right back into shape.”

(Tiny): I know we’re gonna stay home if you ever meet me. I have no abs to just slap me just slap me right back in the shape.
(Tiny.en): I know we’re gonna stay humble if you ever meet me. I have a little attitude. Just slap me. Just slap me right back into shape.
(Small.en): I know we’re gonna stay humble if you ever meet me. I have a little attitude. Just slap me. Just slap me right back into shape.
(Small): I know we’re gonna stay humble if you ever meet me. I have a little attitude. Just slap me Just slap me right back into shape.
(Medium.en): I know we’re gonna stay humble if you ever meet me. I have a little attitude. Just slap me. Just slap me right back into shape.
(Large): I know we’re gonna stay humble. If you ever meet me, I have a little attitude, just slap me. Just slap me right back in the shade.
(Large-v2): I know we’re gonna stay humble. If you ever meet me, I have a little attitude, just slap me. Just slap me right back in the shade.
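To put a rough number on comparisons like the one above, one option is to count word-level errors against a reference line. This is a minimal sketch, assuming the quoted Beyoncé line is the correct reference and that punctuation and capitalisation should be ignored; `words` and `word_errors` are made-up helper names, and only three of the outputs are included to keep it short.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and keep only the words (apostrophes stay inside words)."""
    return re.findall(r"[a-z0-9’']+", text.lower())


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance: substitutions + insertions + deletions."""
    ref, hyp = words(reference), words(hypothesis)
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from the output
                current[j - 1] + 1,      # extra word in the output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


REFERENCE = ("I know we’re gonna stay humble. If you ever meet me, I have a little "
             "attitude, just slap me. Just slap me right back into shape.")
OUTPUTS = {
    "tiny": ("I know we’re gonna stay home if you ever meet me. I have no abs to "
             "just slap me just slap me right back in the shape."),
    "tiny.en": ("I know we’re gonna stay humble if you ever meet me. I have a little "
                "attitude. Just slap me. Just slap me right back into shape."),
    "large": ("I know we’re gonna stay humble. If you ever meet me, I have a little "
              "attitude, just slap me. Just slap me right back in the shade."),
}

for model, text in OUTPUTS.items():
    print(f"{model}: {word_errors(REFERENCE, text)} word errors "
          f"out of {len(words(REFERENCE))} reference words")
```

Because punctuation is stripped before comparing, differences in full stops and commas between models do not count as errors here.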


Alternatives: Other Online versions (alternatives to Google Colab version)

Replicate: https://replicate.com/openai/whisper (uses Large and Large-v2)

Alternative: for YouTube or really crappy Subtitles:

I’ve been using ChatGPT to clean up crappy YouTube subtitles and make them more accurate:

https://chat.openai.com/chat

When using ChatGPT to clean up YouTube’s god-awful auto-generated subtitles, my “go-to” prompts are generally one of the following:

Please correct punctuation and spelling of the following subtitles: “your crappy subtitle text here”
Please summarize the following text: “ ”
Please provide key highlights of the following subtitles, in bullet-point form: “ ”
Please correct grammar for the following subtitles: “ ”
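Long subtitle files may not fit into a single chat message, so one way to use the prompts above is to split the text into pieces and paste them one at a time. This is a minimal sketch under that assumption: the 3000-character limit is a made-up number to adjust, and `chunk_subtitles` and `build_prompts` are hypothetical helper names.

```python
PROMPT = "Please correct punctuation and spelling of the following subtitles: "


def chunk_subtitles(text: str, max_chars: int = 3000) -> list[str]:
    """Split text into pieces of at most max_chars, breaking between words.

    A single word longer than max_chars is kept whole in its own piece.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_prompts(text: str, max_chars: int = 3000) -> list[str]:
    """Wrap every piece in the punctuation-and-spelling prompt."""
    return [f'{PROMPT}"{chunk}"' for chunk in chunk_subtitles(text, max_chars)]
```

Each item returned by `build_prompts(subtitle_text)` can then be pasted into the chat in order.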

Alternative: (for Very Large files) VB-CABLE Virtual Audio & Dictation Pro

An alternative to Whisper AI for larger videos (for example, if you don’t want to upload a 3-hour video file): the same guy (Kevin Stratvert) has another video on how to use Windows’ own built-in Dictation Pro software with Virtual Audio Cable (both free). You have to run it on your own computer, but it might be a life-saver for someone. However, it looks to me like it transcribes in real time, i.e. a 3-hour video will take 3 hours of playtime. All the other apps transcribe the video in minutes, but their downside is that you sometimes have to make the file smaller, so this method might just be easier in some circumstances.

Alternative: Computer Apps using Whisper AI to get subtitles

Buzz: Instructions (YouTube)
- Using: https://github.com/chidiwilliams/buzz (no internet connection needed)
Subtitle Edit: Instructions (YouTube)
- Using: https://github.com/SubtitleEdit/subtitleedit/releases/tag/3.6.10
- (Tested just now on a 37-minute video: it took 16 minutes using the base model. Downside: it keeps the timestamps.)
Speech Translate: Extremely Boring Instructions (YouTube)
- Using: https://github.com/Dadangdut33/Speech-Translate/releases/tag/1.1.0
- Finally got this working by being patient. It looks like there is an error, but eventually it spits out a subtitle file.
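When an app only gives you a subtitle file with timestamps (the Subtitle Edit downside noted above), the timestamps can be stripped afterwards. This is a minimal sketch, assuming the file is in the common .srt layout (cue number, timestamp line, then text); `srt_to_text` is a made-up helper name.

```python
import re

# Matches lines such as "00:00:02,500 --> 00:00:05,000"
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_text(srt: str) -> str:
    """Drop cue numbers, timestamp lines, and blank lines; keep the spoken text."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        # Note: a subtitle line made up only of digits (e.g. "2023") is also dropped.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)
```

Usage would be something like `srt_to_text(Path("video.srt").read_text(encoding="utf-8"))` with `from pathlib import Path`, where `video.srt` is whatever file the app produced.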

Python & WhisperAI

Python: Long but detailed walk-through of entire setup (YouTube)
Alternative walk-through that looks easier to follow that I wish I’d found first (YouTube) *video above

(My notes below are for the first link, but I recommend following the video above instead, it’s way clearer and you’ll be up and going much faster)

Using: https://www.python.org/downloads/release/python-399/
- for my pc: Windows installer (64-bit)
- And using: https://github.com/openai/whisper
  - pip install -U openai-whisper
    - (paste into a command prompt)
- And using: https://ffmpeg.org/download.html
- Created new folder c:\Vid-Transcribe and placed a video in the folder test.mp4
  1. In command prompt, cd c:\Vid-Transcribe
  2. whisper "c:\Vid-Transcribe\test.mp4" --model small.en --output_format txt --task transcribe --beam_size 5 --patience 13 --threads 4 --length_penalty 0
    - here it will automatically download the small.en model
- To use GPU: you have to uninstall torch and install the latest torch
  1. pip3 uninstall torch
  2. get code from https://pytorch.org/
  3. For my PC, code is:
    - pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
    - (Wow, it’s 3GB!!)
  4. Once you have the above, can use this code to use GPU:
    - whisper "c:\Vid-Transcribe\test.mp4" --model small.en --device cuda
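Before passing `--device cuda`, it can help to check whether the installed torch actually sees the GPU. This is a minimal sketch rather than part of the walk-through: `pick_device` and `whisper_command` are made-up helper names, and falling back to "cpu" is just one possible choice.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except (ImportError, OSError):
        return "cpu"  # torch is missing or failed to load
    return "cuda" if torch.cuda.is_available() else "cpu"


def whisper_command(path: str, model: str = "small.en") -> list[str]:
    """Build the whisper command line using whichever device was detected."""
    return ["whisper", path, "--model", model, "--device", pick_device()]
```

The list can be run with `subprocess.run(whisper_command(r"c:\Vid-Transcribe\test.mp4"), check=True)` after `import subprocess`. If `pick_device()` returns "cpu" on a PC with an NVIDIA card, the CPU-only build of torch is probably still installed.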

Additional Notes I learnt from 2nd link:

Navigate to folder that has your audio/video files
- Type in cmd in that folder, which will open up command prompt using that directory
- whisper "file name.mp4"
To translate a file from German to English
- whisper "file name.mp4" --language German --task translate

Notes while troubleshooting:

Memory errors.
1. Break up files to be under 25 MB if getting memory errors
2. Try --patience 2
3. Change model:
  - the small model needs about 2 GB of video memory (VRAM); base or tiny need about 1 GB
    - Settings > Display > Advanced > Display adapter properties
    - (shows you how much dedicated video memory you have) (Video)
4. Convert file into mp3 rather than video
  - ffmpeg -i "testvideo.mp4" -q:a 0 -map a testvideo.mp3
    - this also works:
      - ffmpeg -i example.mp4 example.mp3
5. Set NVIDIA as default video driver (Video)
  1. Check if you have an NVIDIA Card (Device Manager > Display Adapters > NVIDIA)
  2. If yes, close out of that, right-click desktop > NVIDIA control panel
    - If NVIDIA control panel is not listed via desktop (Video)
      - Control Panel > Search NVIDIA
    - To fix Error: NVIDIA Display settings are not available. You are not currently using a display attached to an NVIDIA GPU
      1. Device Manager > Display Adapters > NVIDIA GeForce > Properties
      2. Driver tab > Disable device > Yes to warning > Ok out of it
      3. Device Manager > Display Adapters > NVIDIA GeForce > Properties
      4. Driver tab > Enable device > Restart PC
  3. NVIDIA control panel > Manage 3D settings > Global Settings > Preferred graphics processor > NVIDIA > Apply
  4. NVIDIA control panel > PhysX > Select PhysX processor > GeForce GTX > Apply
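For the "change model" step in the memory notes above, the choice can be sketched as a small lookup. The figures are the approximate VRAM requirements listed in the Whisper README (the medium and large numbers come from there, not from my own tests), and `largest_model_that_fits` is a made-up helper name.

```python
# Approximate VRAM needs in GB per the Whisper README, largest model first.
MODEL_VRAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1)]


def largest_model_that_fits(vram_gb: float) -> str:
    """Pick the biggest model whose listed VRAM need fits the given video memory."""
    for name, needed_gb in MODEL_VRAM_GB:
        if vram_gb >= needed_gb:
            return name
    # Below 1 GB nothing is listed as fitting; tiny is the smallest one to try.
    return "tiny"
```

For example, `largest_model_that_fits(2)` picks the small model, which matches the note above. Treat the result as a starting point, since other programs using the GPU also take up video memory.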


From what I have learnt, these are the commands I’ll use the most with Python:

To transcribe a video and output just a text file:

whisper "test file.mp4" --model small.en --output_format txt --task transcribe --length_penalty 0 --device cuda

To convert a Spanish mp4 into mp3, then translate it into English subtitles & text:

ffmpeg -i "deadwhistleblower.mp4" -q:a 0 -map a deadwhistleblower.mp3

whisper "deadwhistleblower.mp3" --model tiny --language Spanish --task translate --patience 2 --device cuda

To transcribe a Spanish video into Spanish text (so a different program can do the translation if the Whisper translation is rubbish):

whisper "deadwhistleblower.mp4" --model tiny --language Spanish --output_format txt --task transcribe --patience 2 --device cuda

Convert existing mp4 into mp3 in seconds

Browse to folder with mp4 file
CMD
basic use:
- ffmpeg -i example.mp4 example.mp3

For my use:

ffmpeg -i "testvideo.mp4" -q:a 0 -map a testvideo.mp3
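To convert every mp4 in a folder rather than one file at a time, the same ffmpeg command can be generated per file. This is a minimal sketch, assuming ffmpeg is on the PATH; `mp3_command` and `mp3_commands_for_folder` are made-up helper names.

```python
from pathlib import Path


def mp3_command(video: Path) -> list[str]:
    """ffmpeg arguments that extract the audio track (-map a) at top VBR quality (-q:a 0)."""
    return ["ffmpeg", "-i", str(video), "-q:a", "0", "-map", "a",
            str(video.with_suffix(".mp3"))]


def mp3_commands_for_folder(folder: str) -> list[list[str]]:
    """One ffmpeg command for each .mp4 file found directly inside folder."""
    return [mp3_command(path) for path in sorted(Path(folder).glob("*.mp4"))]
```

Running them is then a loop such as `for cmd in mp3_commands_for_folder("."): subprocess.run(cmd, check=True)` after `import subprocess`.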

Split Videos in seconds (to less than 25mb for use with whisper)

Might as well learn how to do this too (as a faster option to my video editing software)

Instructions: timestamp 10:20 from first video (YouTube)

Using https://github.com/mifi/lossless-cut/releases
- For me: https://github.com/mifi/lossless-cut/releases/download/v3.54.0/LosslessCut-win-x64.7z
Open LosslessCut exe
Drag video into app
Change timestamp
Export the cut
- (WOW, that was so fast… normally this takes a long time with video editing software, it split the video in seconds!)
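For audio files, a rough way to decide where to cut is to work out how many seconds fit under the size limit at a given bitrate. This is a minimal sketch, assuming 25 MB means 25,000,000 bytes and a constant-bitrate mp3 (128 kbps in the example is an assumption); it uses ffmpeg's segment output as an alternative to LosslessCut, and the `part_%03d.mp3` output name is made up.

```python
def max_segment_seconds(max_bytes: int, bitrate_bps: int) -> int:
    """Longest clip, in whole seconds, that stays under max_bytes at a constant bitrate."""
    return max_bytes * 8 // bitrate_bps


def split_command(source: str, seconds: int) -> list[str]:
    """ffmpeg arguments that cut source into equal-length pieces without re-encoding."""
    return ["ffmpeg", "-i", source, "-f", "segment", "-segment_time", str(seconds),
            "-c", "copy", "part_%03d.mp3"]


limit = max_segment_seconds(25_000_000, 128_000)
print(f"Cut every {limit} seconds (about {limit // 60} minutes) or less")
```

Files converted with a variable bitrate (such as `-q:a 0`) will not match this estimate exactly, so it is safer to cut well below the calculated length.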

WhisperAI Wishlist

ChatGPT wrote a Whisper command for me using the options below, but Whisper doesn’t actually support them. I’m keeping them here as a must-have wishlist:

--punctuate

--filter_outputs "uh, um, like, and and, I I, you know, like, basically, actually, sort of, kind of, hmm, mm, mhm, mmm, oh"
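Until something like the wishlist filter exists, the same idea can be approximated by cleaning the text file after transcription. This is a minimal sketch in plain Python, not a Whisper feature: `remove_fillers` is a made-up helper, the filler list is taken from the wishlist above, and the "and and" / "I I" entries are handled by collapsing any immediately repeated word.

```python
import re

FILLERS = ["you know", "sort of", "kind of", "basically", "actually",
           "like", "uh", "um", "hmm", "mhm", "mmm", "mm", "oh"]


def remove_fillers(text: str, fillers: list[str] = FILLERS) -> str:
    """Remove filler words and collapse stutters such as "I I" or "and and"."""
    # Longest first, so multi-word fillers are tried before shorter ones.
    pattern = "|".join(re.escape(f) for f in sorted(fillers, key=len, reverse=True))
    # Delete each filler together with a comma directly after it, if any.
    text = re.sub(rf"\b(?:{pattern})\b,?", "", text, flags=re.IGNORECASE)
    # Collapse a word that is immediately repeated into a single copy.
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Tidy the spacing left behind.
    text = re.sub(r"\s+([,.!?])", r"\1", text)
    return re.sub(r"\s{2,}", " ", text).strip()
```

This is blunt: it also removes meaningful uses such as "I like it", so the result still needs a read-through.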

Download Videos in seconds

Instructions: timestamp 07:48 in this video (YouTube), or basic instructions at 01:35 (YouTube), and GUI instructions (YouTube). You’ll need FFmpeg as well (listed further up the post in the Whisper instructions).

Also uses: yt-dlp: https://github.com/yt-dlp/yt-dlp
- for my pc: https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp.exe
Or, for a GUI version: yt-dlp (above)
- & yt-dlp-gui https://github.com/kannagi0303/yt-dlp-gui/releases/tag/2022.11.14
- https://github.com/kannagi0303/yt-dlp-gui/releases/download/2022.11.14/yt-dlp-gui.exe
- Or Alternative GUI: https://github.com/ErrorFlynn/ytdlp-interface
- Or Alternative GUI: https://github.com/BKSalman/ytdlp-gui

To use GUI version

Move all 3 (yt-dlp, yt-dlp-gui and ffmpeg) to the same folder
Launch yt-dlp-gui
Enter url > analyze
- Select which video resolution you want
- Select which audio quality you want
- Select or uncheck thumbnail
- Select output directory
- Download

To use CMD version

Move all 3 to same folder
CMD from folder
yt-dlp https://youtu.be/7QPbfKDOkO4
Downloads the video in seconds
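Since Whisper only needs the audio, yt-dlp can also be asked to save just an mp3. This is a minimal sketch, assuming yt-dlp and ffmpeg are in the same folder or on the PATH; `audio_download_command` is a made-up helper name, and naming the file after the video title is just one choice.

```python
import subprocess


def audio_download_command(url: str) -> list[str]:
    """yt-dlp arguments that extract the audio (-x) and convert it to mp3."""
    return ["yt-dlp", "-x", "--audio-format", "mp3",
            "-o", "%(title)s.%(ext)s", url]


def download_audio(url: str) -> None:
    """Run yt-dlp; raises CalledProcessError if the download fails."""
    subprocess.run(audio_download_command(url), check=True)
```

For example, `download_audio("https://youtu.be/7QPbfKDOkO4")` should leave an mp3 in the current folder, ready for the whisper command.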

Filtering/adjusting CMD version – go to this post: [CMD] Download videos & snippets in seconds

Must-use resources for researchers are on these posts:

Please SAVE & SHARE With Other RESEARCHERS (https://pennybutler.com/truthseeker-tools/)
Useful Tools for TruthSeekers (https://pennybutler.com/sleuth-tools/)
Free Online Video Downloader (https://pennybutler.com/video-dl/)
[CMD] Download videos & snippets in seconds (https://pennybutler.com/cmd-download-videos/)


Penny (PennyButler.com)
