History of Automatic Speech Recognition Systems

Problems and Opportunities

The projects in speech recognition are supported by the Advanced Research Projects Agency (ARPA), and since 1971 many aspects of acoustic-phonetics, syntax, semantics, and context have become more clearly understood. For now, systems can handle relatively small vocabularies of a few hundred words, which are trained a single time. With unrestricted vocabularies and many speakers, we are still far from the desired results. Recognizing the speech of an arbitrary person picked at random will, it seems, require intensive research across many different domains.

The primary sources of information in this field are the IEEE Transactions on Acoustics, Speech, and Signal Processing (pertinent special issues: vol. 21, June) and the Journal of the Acoustical Society of America (in particular the semiannual conference abstracts, which appear with the January and July issues each year).

Authors who have written on the various branches of speech communication include:

1. Flanagan

2. Fant

3. Lehiste

Other useful sources are the groups (listed with their researchers) currently working in the field of speech recognition:

1. Bell Telephone Laboratories (Denes, Flanagan)

2. Carnegie-Mellon University (Erman, Newell, Reddy)

3. Research Laboratory of Electronics, M.I.T. (Klatt)

4. System Development Corporation (Barnett, Ritea)

5. University of California, Berkeley (O'Malley)

6. Haskins Laboratories (Cooper, Mermelstein)

7. Bolt Beranek and Newman, Inc. (Makhoul, Wolf, Woods)

8. Xerox Palo Alto Research Center (White)

9. Threshold Technologies (Martin)

10. Stanford Research Institute (Walker)

11. Speech Communication Research Laboratories (Broad, Markel, Shoup)

12. IBM Research Laboratories (Bahl, Dixon, Jelinek)

13. Department of Speech Communication, KTH, Stockholm (Fant)

Speech recognition, I feel, can be better understood if we know the fundamental structure of each individual unit of speech, i.e., the phoneme sounds of each language. Although work has now begun on recognizing continuously spoken words, even phoneme-level recognition remains an important and difficult task when the environment is noisy or when the phonemes are spoken by people from different geographical regions with different native languages.

To make a system speaker-independent and able to handle a very large vocabulary, research must address issues from several fields, such as knowledge of the domain, i.e., the area the speaker comes from, and the different environments in which the speech is recorded, which greatly affect audio quality and hamper recognition.

Reference:

1. R. Reddy, "Speech Recognition by Machine: A Review", Proceedings of the IEEE 64(4), April 1976, pp. 502-531

Hi everyone, I am Raj Rishi Purohit from Gandhinagar (Gujarat). I have just completed my B.Tech in ICT. My research areas include speech recognition and filterbank design using wavelet methods for speech recognition. At present I am working with Sphinx for my speech recognition projects.

For further details on the topic covered, you can contact me at http://www.webmultimediale.org by posting your views on this short article.

You can also mail your queries and valuable suggestions to rajrishipurohit[at]gmail.com so that we can improve the work and make it more robust.

Raj Rishi Purohit is a Systems Engineer at Infosys Technologies Ltd. You can contact him at rajrishipurohit[at]gmail.com.

25 September 2009

Total or partial reproduction, or remixing, of the layout and contents (texts, images, videos) on paper, on technological media (CD-ROM, DVD, websites, ...) or otherwise, whether for profit, for distribution as one's own, or for any other reason, is prohibited without the written consent of Claude Almansi, editorial manager of the site.
It is prohibited to reproduce, modify, adapt or otherwise use any of the images, movies and information in any manner or form without the written permission of the website owner. Email: claude.almansi@bluewin.ch