Turkish Dictation
Ebru Arısoy, Levent Arslan, Murat Saraçlar
We are working on the design of a Turkish dictation system. Dictation is one of the most challenging areas in automatic speech recognition. There is a large demand for speech-to-text systems because speaking is faster than typing in most languages. However, most dictation systems today do not reach the desired recognition rates, since the vocabulary can be huge for any given language. Turkish is particularly challenging for speech recognition applications: it is an agglutinative language with free word order. These characteristics lead to vocabulary explosion and complicate the N-gram language models used in speech recognition. To alleviate this problem, we first propose a task-specific system, the Radiology Dictation System. Using words as recognition units, we achieve 87.06% recognition performance with a small vocabulary in a speaker-independent system. Secondly, we try a large-vocabulary dictation system, Dictation for Newspaper Content. In that case we face the problems caused by the agglutinative nature of the language. Therefore, rather than words, we are searching for new recognition units that cover most of the language and achieve better recognition performance. This project is supported by the SIMILAR Network of Excellence within the EU's 6th Framework Programme, WP 9.
One common example of a task-specific dictation system is dictation for radiologists, who are often eyes-busy and hands-busy at work. In most hospitals in Turkey, radiologists record their diagnosis of a patient's X-ray or MRI, and a secretary then converts these recordings into written form. A dictation system can therefore make the radiologist's work easier. In contrast to the difficulties caused by the agglutinative nature of Turkish, the specialized vocabulary of radiological terminology and the systematic arrangement of words in sentences make radiology well suited to dictation applications. In the Turkish radiological dictation system, the vocabulary can be reduced to only several thousand words, and the perplexity can be very small. Below is the GUI of our radiological dictation system. HTK is used for the speech recognition system. A Radiology Text and Speech Corpus has also been collected.

Newspaper Content Transcription System:
In this research, we focus on the selection of base recognition units for Large Vocabulary Continuous Speech Recognition (LVCSR) applications, especially for agglutinative languages. Words are the most common choice of recognition unit. However, the choice has to be adapted to the characteristics of the language. For English, words are a good choice, but for agglutinative languages, words fail as recognition units because of the productive morphology of the language. Appropriate base recognition units must satisfy two criteria: they have to be long enough, in terms of acoustic information, to allow a reliable decision, and they have to cover the language with a moderate vocabulary size. Our research and experiments on different recognition units are for Turkish; however, the findings can be generalized to other agglutinative languages, such as Finnish and Korean.
First, we try a combined model in which words, stems and endings, and morphemes are used together as recognition units. This model takes advantage of the strengths of each unit and compensates for their drawbacks by using all of them together. In this model, the most frequent words are left unsplit, as stems, so that they have a better chance of being recognized correctly. The model solves the problems of a large number of out-of-vocabulary (OOV) words and high perplexity; however, no significant improvement in recognition performance is achieved.
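The coverage criterion can be illustrated with a small sketch. The Python code below is not the project's implementation: it assumes a simplified syllabification rule (one vowel per syllable, with a single consonant before a vowel opening that syllable) and a toy corpus invented for illustration. The names `syllabify` and `coverage` are hypothetical.

```python
from collections import Counter

VOWELS = set("aeıioöuü")

def syllabify(word):
    """Split a lowercase Turkish word into syllables (simplified rule).

    Each syllable holds exactly one vowel; a consonant directly before a
    vowel opens that vowel's syllable. Loanwords with consonant clusters
    at the start of a syllable are not handled.
    """
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_positions) < 2:
        return [word]
    pieces, prev = [], 0
    for pos in vowel_positions[1:]:
        # Start the new syllable at the preceding consonant, if there is one.
        start = pos - 1 if word[pos - 1] not in VOWELS else pos
        pieces.append(word[prev:start])
        prev = start
    pieces.append(word[prev:])
    return pieces

def coverage(train_tokens, test_tokens, vocab_size):
    """Fraction of test tokens found among the vocab_size most frequent training tokens."""
    vocab = {unit for unit, _ in Counter(train_tokens).most_common(vocab_size)}
    hits = sum(1 for token in test_tokens if token in vocab)
    return hits / len(test_tokens)

# Toy data, invented for illustration only.
train_words = "evde kitap okudum evler kitaplar".split()
test_words = "evlerde kitaplarda".split()

train_syllables = [s for w in train_words for s in syllabify(w)]
test_syllables = [s for w in test_words for s in syllabify(w)]

word_cov = coverage(train_words, test_words, vocab_size=100)
syll_cov = coverage(train_syllables, test_syllables, vocab_size=100)
print(f"word coverage: {word_cov:.2f}, syllable coverage: {syll_cov:.2f}")
```

On this toy data neither test word was seen in training, so word coverage is zero, while six of the seven test syllables were seen. This mirrors, in miniature, why sub-word units can cover an agglutinative language with a smaller vocabulary.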
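A minimal sketch of how such a combined unit inventory might be produced is given below. It is an illustration under stated assumptions, not the system's actual procedure: the stem list, the longest-prefix split, the `+` marker on endings, and the function names are all hypothetical, and a real system would rely on a morphological parser rather than a prefix match.

```python
from collections import Counter

def build_frequent_set(corpus_words, top_k):
    """Return the top_k most frequent words; these are kept whole."""
    return {w for w, _ in Counter(corpus_words).most_common(top_k)}

def split_stem_ending(word, stems):
    """Split a word at the longest prefix found in the (hypothetical) stem list.

    Returns [stem, '+ending'], or [word] when no proper prefix is a known stem.
    """
    for cut in range(len(word) - 1, 0, -1):
        if word[:cut] in stems:
            return [word[:cut], "+" + word[cut:]]
    return [word]

def to_units(words, frequent, stems):
    """Map a word sequence to mixed recognition units."""
    units = []
    for w in words:
        if w in frequent:
            units.append(w)  # frequent word: keep as a single unit
        else:
            units.extend(split_stem_ending(w, stems))
    return units

# Toy data, invented for illustration only.
corpus = "bu ev bu kitap bu evlerde kitaplar".split()
frequent = build_frequent_set(corpus, top_k=1)
stems = {"ev", "kitap"}

units = to_units(["bu", "evlerde", "kitaplar", "ev"], frequent, stems)
print(units)
```

Marking endings with a prefix such as `+` is one way to let the recognizer output be joined back into words; other conventions would work equally well.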
Previous Research: "Statistical Language Models For Large Vocabulary Turkish Speech Recognition"
Helin Dutağacı, Levent Arslan
In this project, we have compared four statistical language models for large vocabulary Turkish speech recognition. Turkish is an agglutinative language with productive morphotactics. This property results in vocabulary explosion and misestimation of N-gram probabilities when designing speech recognition systems. The solution is to parse the words in order to obtain smaller base units that are capable of covering the language with a relatively small vocabulary. Three different ways of decomposing words into base units are described: a morpheme-based model, a stem-ending-based model and a syllable-based model. These models are compared with the word-based model with respect to vocabulary size, text coverage, bigram perplexity and speech recognition performance. We have constructed a Turkish Text Corpus of 10 million words, containing various texts collected from the Web. These texts have been parsed into their morphemes, stems, endings and syllables, and statistics of these base units have been estimated. Finally, we have performed speech recognition experiments with models constructed from these base units.
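For readers unfamiliar with bigram perplexity, the sketch below shows one way to compute it for any choice of base unit. It is illustrative only: add-one smoothing and the `<unk>` mapping are assumptions made to keep the example short, and the text above does not state which smoothing the project used. The function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect history and bigram counts from tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Units that can be predicted: training units, sentence end, unknown.
    vocab = {t for sent in sentences for t in sent} | {"</s>", "<unk>"}
    return histories, bigrams, vocab

def perplexity(sentences, model):
    """Per-unit bigram perplexity with add-one smoothing."""
    histories, bigrams, vocab = model
    log_prob, n_predictions = 0.0, 0
    for sent in sentences:
        mapped = [t if t in vocab else "<unk>" for t in sent]
        tokens = ["<s>"] + mapped + ["</s>"]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab))
            log_prob += math.log2(p)
            n_predictions += 1
    return 2 ** (-log_prob / n_predictions)

# Toy data, invented for illustration only.
model = train_bigram([["a", "b"]])
pp = perplexity([["a", "b"]], model)
print(f"perplexity: {pp:.2f}")
```

The same functions accept sentences tokenized into words, morphemes, stems and endings, or syllables. Note that per-unit perplexities of models built on different units are not directly comparable, because the same text yields a different number of tokens under each segmentation.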
Publications
1. Arisoy, E., Dutagaci, H. and Arslan, L. M., 2006, "A Unified Language Model for Large Vocabulary Continuous Speech Recognition of Turkish", Signal Processing, 86(10):2844-2862, October 2006.
2. Arisoy, E., and Arslan, L. M., 2005, "Turkish Dictation System for Broadcast News Applications", 13th European Signal Processing Conference - EUSIPCO 2005, Antalya, Turkey.
3. Arısoy, E., and Arslan, L. M., 2004, "Turkish Radiology Dictation System", 9th International Conference Speech and Computer - SPECOM 2004, St. Petersburg, Russia.
4. H. Dutagaci and L.M. Arslan “Comparison of Statistical Language Models for Turkish Speech Recognition” in Proceedings of the 7th International Conference On Spoken Language Processing (ICSLP-2002), Denver, Colorado, USA, Sep. 2002, pp. 729-732.
Publications (in Turkish)
1. Arisoy, E., and Arslan, L. M., 2005, "Türkçe Gazete Haberleri Dikte Sistemi", SIU 2005 (IEEE 13. Sinyal İşleme ve İletişim Uygulamaları Kurultayı), Kayseri, Turkey.
2. H. Dutagaci and L.M. Arslan “Türkçe Konuşma Tanıma için İstatistiksel Dil Modelleri”, 2002 Sinyal İşleme ve İletişim Uygulamaları Kurultayı (SIU-2002), Denizli, Türkiye, pp. 64-69.
Theses
1. “Turkish Dictation System for Radiology and Broadcast News Applications”, M.S. Thesis, by Ebru Arısoy, 2004
2. “Statistical Language Models For Large Vocabulary Turkish Speech Recognition”, M.S. Thesis, by Helin Dutağacı, 2002