[See English translation after the video]
One of the most significant brakes on the spread of video transcripts and subtitles on the Web is the work required to convert speech into text. It is a very time-consuming and rather uninteresting job. You have to listen to every single sentence and stop the video to transcribe it, often having to listen to it several times because of poor audio quality or ...of the speaker.
Speech recognition technologies are developing rapidly and promise to solve the problem, perhaps in the near future, by automating the transcription of speech. Their spread could reduce license costs, which are currently exorbitant, and finally allow transcription and subtitling of videos published on the Web to become widespread practices.
To get an overview of the current state of knowledge and of the possibilities offered by this technology, today and in the future, we talk with Raj Rishi Purohit, a researcher in speech recognition systems who is carrying out his research at the Dhirubhai Ambani Institute of Information and Communication Technology in Gandhinagar (India).
Read a survey of the currently available technologies, by Raj Rishi Purohit, on Webmultimediale.
Raj Rishi Purohit is a Systems Engineer at Infosys Technologies Ltd. You can contact him by writing to rajrishipurohit[at]gmail.com.
Raj Rishi Purohit is a Systems Engineer at Infosys Technologies Ltd. His research areas include speech recognition and filterbank design using wavelet methods for speech recognition. He is currently working on Sphinx for speech recognition projects.
You can email your queries and suggestions to Raj Rishi at rajrishipurohit[at]gmail.com, so that the work can be improved and made more robust.
One of the greatest obstacles to the diffusion of transcripts and captioning for video on the Web is the work required to convert speech to text. This activity is very time-consuming, mechanical and boring: you must listen to each sentence and stop the video to transcribe it. Often, you need to listen to the same passage repeatedly, because the audio quality is poor, or because the speaker mumbles.
Voice recognition technologies are developing rapidly and promise to allow automated speech transcription soon. The spread of these systems would reduce the cost of licenses, which is currently exorbitant, and finally make transcription and subtitling of videos published on the Web a widespread practice.
To get an overview of the current state of knowledge and of the possibilities offered by this technology, now and in the future, in the video above we speak with Raj Rishi Purohit, a researcher in voice recognition systems who is conducting his research at the Dhirubhai Ambani Institute of Information and Communication Technology in Gandhinagar (India).
Read a history of Automatic Speech Recognition Systems, by Raj Rishi Purohit.
The table below shows how the different fields of speech recognition compare:
| Field | Mode of Speech | Vocabulary Size (words) | Task-specific Information | Language | Speaker | Environment |
|---|---|---|---|---|---|---|
| Word recognition-isolated (WR) | Isolated words | 10-300 | Limited use | - | Cooperative | - |
| Connected speech recognition | Connected speech | 30-300 | Limited use | Restricted command language | Cooperative | Quiet room |
| Speech understanding-restricted (SU) | Connected speech | 100-2000 | Full use | English-like | Not cooperative | - |
| Dictation machine-restricted (DM) | Connected speech | 1000-10000 | Limited use | English-like | Cooperative | Quiet room |
| Unrestricted speech understanding (USU) | Connected speech | Unlimited | Full use | English | Not cooperative | - |
| Unrestricted connected speech recognition (UCSR) | Connected speech | Unlimited | None | English | Not cooperative | Quiet room |
Page published on 25 September 2009.



