The software you can use is Vosk-api, a modern speech recognition toolkit based on neural networks. It supports 7+ languages and works on a variety of platforms, including RPi and mobile.

First you convert the file to the required format and then you recognize it:

    ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav

Then install vosk-api with pip:

    pip3 install vosk

The result will be stored in JSON format.

The same directory also contains an srt subtitle output example, which is easier to evaluate and can be directly useful to some users:

    python3 -m pip install srt

The example given in the repository contains three sentences, spoken in a perfect American English accent with perfect sound quality, which I transcribe as:

    one zero zero zero one

The "nine oh two one oh" is said very fast, but is still clear. The "z" of the second-to-last "zero" sounds a bit like an "s".

So we can see that several mistakes were made, presumably in part because we, unlike the recognizer, have the understanding that all the words are numbers to help us.

Next I also tried vosk-model-en-us-aspire-0.2, which was a 1.4GB download compared to 36MB for vosk-model-small-en-us-0.3. To switch models, I first moved the small one aside:

    mv model model.vosk-model-small-en-us-0.3
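To make the convert-then-recognize steps concrete, here is a minimal Python sketch. It is an illustration and not the repository's own example script: the helper names `ffmpeg_command`, `text_from_results` and `transcribe` are made up for this post, the model directory name `model` is assumed from the `mv` command above, and the 4000-frame read size is an arbitrary choice.

```python
import json
import wave


def ffmpeg_command(src, dst, rate=16000):
    """Build the conversion command from the text: 16 kHz, mono WAV."""
    return ["ffmpeg", "-i", src, "-ar", str(rate), "-ac", "1", dst]


def text_from_results(results):
    """Join the "text" field of each JSON result string returned by Vosk."""
    parts = [json.loads(r).get("text", "") for r in results]
    return " ".join(p for p in parts if p)


def transcribe(wav_path, model_dir="model"):
    """Recognize a converted WAV file (assumes a model unpacked in model_dir)."""
    # Imported here so the helpers above also work without vosk installed.
    from vosk import Model, KaldiRecognizer

    results = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(4000)  # arbitrary chunk size
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        results.append(rec.FinalResult())
    return text_from_results(results)
```

After running the command built by `ffmpeg_command("file.mp3", "file.wav")` (for instance with `subprocess.run`), calling `transcribe("file.wav")` should return the recognized words as one string. Each result from Vosk is a JSON string, which is why the text is pulled out with `json.loads`.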