The data are organized in three folders: Recording_1, all_sounds_session_1, and Recording_2. In addition, two CSV spreadsheets are provided: a metadata table and a sentence list.
Folder Recording_1 contains all automatically and manually processed files. It is organized by speaker, with one folder for each speaker (N = 59). Each speaker's folder contains all sentences as wav and TextGrid files (N = 396). This folder additionally contains a further folder, all_sounds_session_1, with all the extracted fricatives of the first recording session (N = 17287). A second folder, Recording_2, contains 18 recordings of the second session, which are only automatically processed.
To protect the speakers’ identities, the recordings were anonymized. The folder names, e.g. 01_F_22_1, are coded as follows: the first two digits give the speaker id (01-59). The characters F and M indicate the speaker's gender. The following two digits indicate the age (18-30) of the participant at the time of the recording (2018). The last number indicates the recording session (1 or 2). The filenames of the sentences, e.g. 01_F_22_1_001, carry the same information as the folder names plus the sentence number in the last three digits. The single sound files, e.g. 01_F_1_001_11_v, contain information on speaker id, gender, recording session, sentence number, interval number on the third tier, and the fricative label in SAMPA.
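The naming scheme described above can be decoded programmatically. The following is a minimal Python sketch, not part of the dataset itself: the function name parse_name, the regular expressions, and the handling of the .wav and .TextGrid extensions are assumptions derived only from the three example names given in the text.

```python
import re

# Patterns inferred from the examples 01_F_22_1, 01_F_22_1_001 and
# 01_F_1_001_11_v; they are assumptions, not an official specification.
_BASE = r"(?P<speaker>\d{2})_(?P<gender>[FM])"
PATTERNS = [
    ("folder", re.compile(_BASE + r"_(?P<age>\d{2})_(?P<session>[12])$")),
    ("sentence", re.compile(
        _BASE + r"_(?P<age>\d{2})_(?P<session>[12])_(?P<sentence>\d{3})$")),
    ("sound", re.compile(
        _BASE + r"_(?P<session>[12])_(?P<sentence>\d{3})"
        r"_(?P<interval>\d+)_(?P<label>.+)$")),
]
_NUMERIC = ("speaker", "age", "session", "sentence", "interval")


def parse_name(name: str) -> dict:
    """Split a folder, sentence, or single-sound name into its fields."""
    stem = name
    # Strip only the two extensions mentioned in the description.
    for ext in (".wav", ".TextGrid"):
        if stem.endswith(ext):
            stem = stem[: -len(ext)]
            break
    for kind, pattern in PATTERNS:
        match = pattern.match(stem)
        if match:
            fields = dict(match.groupdict())
            for key in _NUMERIC:
                if key in fields:
                    fields[key] = int(fields[key])
            fields["kind"] = kind
            return fields
    raise ValueError(f"name does not follow the documented scheme: {name!r}")


folder = parse_name("01_F_22_1")
sentence = parse_name("01_F_22_1_001.wav")
sound = parse_name("01_F_1_001_11_v")
```

Note that the single sound files carry no age field, so the age of a speaker has to be looked up from the corresponding folder name.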
The sentence list contains all recorded sentences. The first column, SentenceNo, corresponds to the sentence number in the file name. The second column presents the sentence in Russian and the third gives a transcription in IPA. A fourth column lists, in IPA, the target words from which the fricatives were extracted. The last column, comments, contains remarks on the sentences omitted from the fricative extraction. More specifically, some of the sentences turned out to be incorrectly constructed and are excluded from the fricative database. A few sentences were exchanged in the middle of the experiment and therefore differ between the participants (N=3). These sentences are marked in the sentence list and have so far been excluded from the investigations (see next section). Some of the natural sentences turned out to be unsuitable for the defined research purposes and were also excluded from the database (N=11). To fill the resulting gaps in fricative examples from words embedded in natural sentences, these fricatives were extracted from other words. These cases are also marked in the sentence list. Moreover, in the process of automatically generating the TextGrid tiers and the transcription, some Russian vowels were wrongly transcribed.
Only some vowels in words containing a target fricative have been corrected.
The metadata table gives information on the target fricatives. Columns SentenceNo and IntervalNo indicate the sentence and interval number by which the fricatives can be identified. SentenceTyp refers to one of the two types of sentences: CS stands for carrier sentence and NC for non-carrier sentence. The column Position specifies the location of the target fricative in a carrier sentence: 1 stands for the "X" position and 2 for the "Y" position. Column fricative position shows the position of the fricative within the word: B stands for word-initial, M for word-medial, and E for word-final position. Columns Sampa and IPA give the transcription in the respective format. The columns PrecedingSound and FollowingSound contain, as the column names indicate, the sounds preceding and following the target fricative. The columns Voicing and Palatalization show whether the target fricative is voiced or palatalized. The sentence number, interval number and fricative label of each fricative filename correspond to the columns SentenceNo, IntervalNo and Sampa in the metadata table.
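The correspondence between a fricative filename and its row in the metadata table can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names key_from_filename and index_metadata are hypothetical, the two sample rows are invented for the example and are not taken from the actual table, and sentence and interval numbers are compared as integers because the text does not say whether the table stores them zero-padded.

```python
import csv
import io


def key_from_filename(name: str) -> tuple:
    """Return (sentence number, interval number, Sampa label) for a sound file."""
    stem = name[:-4] if name.endswith(".wav") else name
    # Expected fields: speaker, gender, session, sentence, interval, label.
    parts = stem.split("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected sound file name: {name!r}")
    _, _, _, sentence, interval, label = parts
    return int(sentence), int(interval), label


def index_metadata(rows) -> dict:
    """Index metadata rows by the columns SentenceNo, IntervalNo and Sampa."""
    return {
        (int(row["SentenceNo"]), int(row["IntervalNo"]), row["Sampa"]): row
        for row in rows
    }


# Invented sample rows standing in for the real metadata table.
SAMPLE_CSV = "SentenceNo,IntervalNo,Sampa,IPA\n1,11,v,v\n1,4,s,s\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
table = index_metadata(rows)
row = table[key_from_filename("01_F_1_001_11_v")]
```

With the real data, the in-memory sample would be replaced by opening the metadata spreadsheet and passing the resulting reader to index_metadata.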