Databases of BUSIM SPG
Contact us to obtain the databases.
Radiology Text Corpus: 507 radiological reports were collected from the Hacettepe University Radiology Department. All of them are ultrasonography reports covering 28 different areas (number of words: 91,469; number of distinct words: 1,562). In our experiments we use 463 reports as the training text corpus and 60 reports for testing.
Newspaper Text Corpus: The text material consists of articles from the Milliyet newspaper in different domains such as World News, Economics, Contemporary News, Politics, and Daily Life, collected over a one-month period (number of words: 355,497; number of distinct words: 55,931).
General Text Corpus: The training corpus contains texts from literature (novels, stories, poems, essays), law, politics, social sciences (history, sociology, economy), popular science, information technology, medicine, newspapers, and magazines (number of words: 9,142,750; number of distinct words: 457,684). The test corpus contains texts from literature, news, law, medicine, politics, sociology, history, newspapers, newspaper articles, and economy (number of words: 1,091,804). See Text_corpus.pdf.
Radiology Speech Database: 95 sentences covering the most frequent triphones in radiology reports were selected, and speech data for these sentences were recorded from 16 different speakers. In addition, 60 radiological reports were recorded by six female and four male speakers; only two of the speakers are doctors.
Turkish Voice Conversion Database #1: 20 speakers (10 male, 10 female) uttering identical material (20 naturally spoken utterances describing a recording studio, 9 read utterances, 2 repetitions of sustained Turkish vowels /a/, /i/, and /u/) with simultaneous EGG recordings.
Turkish Voice Conversion Database #2: 17 speakers (9 male, 8 female) uttering identical material (36 read utterances, 60 isolated words).
Turkish Voice Conversion Database #3: 5 speakers (3 male, 2 female) uttering identical material (paragraphs from a novel containing 20 utterances).
Turkish Voice Conversion Database #4: 5 speakers (4 male, 1 female) uttering identical material (157 read utterances from Internet news). Two of the male speakers recorded the material at three different speaking rates: fast, normal, and slow.
Turkish Database of Children Voices (Isolated Words): 23 speakers (16 male, 7 female) uttering identical material (49 isolated words).
Turkish Database of Adult Voices (Isolated Words): 11 speakers (7 male, 4 female) uttering identical material (126 isolated words, 14 isolated Turkish phonemes including vowels, /s/, /S1/, /j/, /z/, /f/, and /v/). The database covers all words in the Children Voices database.
An emotional data corpus of 484 utterances is used, with 121 utterances per emotional state. 11 different speakers (8 female, 3 male) recorded 11 different Turkish sentences, and each sentence was recorded four times, each time with a different emotional state: neutral (or normal), sad, angry, and happy. Each utterance was recorded at 16 kHz and 256 kbps.
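The figures quoted for the emotional corpus can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are ours, and the 16-bit mono PCM format is an assumption (the description gives only the sampling rate and the bit rate), chosen because it is the format that matches 256 kbps at 16 kHz.

```python
# Cross-check of the emotional corpus figures given in the description.
speakers = 11
sentences = 11
emotions = ["neutral", "sad", "angry", "happy"]

# Every speaker records every sentence once per emotional state.
utterances_per_emotion = speakers * sentences
total_utterances = utterances_per_emotion * len(emotions)

# ASSUMPTION (not stated in the source): uncompressed mono PCM, 16 bits per sample.
sample_rate_hz = 16_000
bits_per_sample = 16
channels = 1
bit_rate_kbps = sample_rate_hz * bits_per_sample * channels / 1000

print(f"utterances per emotion: {utterances_per_emotion}")
print(f"total utterances:       {total_utterances}")
print(f"bit rate (kbps):        {bit_rate_kbps}")
```

Under these assumptions the counts agree with the description: 121 utterances per emotional state, 484 in total, and a bit rate of 256 kbps.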
Turkish Speech and Text Driven 3D Face Synthesis Databases:
Audio-visual database: One female speaker, 200 utterances from a phonetically balanced Turkish speech corpus; audio data: 16 bits, 16 kHz; visual data: 29 3-D face points, 30 Hz.
Audio database: 7 male and 7 female speakers, same utterances and format.