Author dc.contributor.author | Sztahó Dávid | |
Author dc.contributor.author | Fejes Attila | |
Date accessioned dc.date.accessioned | 2023-06-23T08:52:22Z | |
Date available dc.date.available | 2023-06-23T08:52:22Z | |
Date issued dc.date.issued | 2023 | |
ISSN dc.identifier.issn | 1556-4029 | |
URI dc.identifier.uri | http://hdl.handle.net/20.500.12944/20635 | |
Abstract dc.description.abstract | Deep learning has recently become widely used in forensic voice comparison, mainly to learn speaker representations called embeddings or embedding vectors. Speaker embeddings are often trained on corpora that mostly contain widely spoken languages. Language dependency is therefore an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different from the language the model was trained on. For a low-resource language, developing a forensic corpus with enough speakers to train deep learning models is costly. This study investigates whether a model pre-trained on a multilingual (mostly English) corpus can be used on a low-resource target language (here, Hungarian) that is not represented in the model. Multiple samples from the offender (unknown speaker) are often not available. Samples are therefore compared pairwise, with and without speaker enrollment for suspect (known) speakers. Two corpora developed specifically for forensic purposes are used, along with a third intended for traditional speaker verification. Speaker embedding vectors are extracted with the x-vector and ECAPA-TDNN techniques. Speaker verification is evaluated in the likelihood-ratio (LR) framework. The language combinations used for modeling, LR calibration, and evaluation are compared. Results are evaluated with the Cllrmin (minimum log-likelihood-ratio cost) and EER (equal error rate) metrics. A model pre-trained on a different language, but on a corpus with a large number of speakers, was found to be usable on samples with language mismatch. Sample duration and speaking style also appear to affect performance. | |
Language dc.language | en | |
Keyword dc.subject | AusEng | |
Keyword dc.subject | ECAPA | |
Keyword dc.subject | forensic voice comparison | |
Keyword dc.subject | ForVoice120 | |
Keyword dc.subject | language dependency | |
Keyword dc.subject | speaker verification | |
Keyword dc.subject | speaking style | |
Keyword dc.subject | VoxCeleb | |
Keyword dc.subject | x-vector | |
Title dc.title | Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings | |
Type dc.type | journal article | |
Date updated dc.date.updated | 2023-06-20T11:34:08Z | |
Version dc.description.version | publisher's version | |
Access dc.rights.accessRights | open access | |
DOI dc.identifier.doi | 10.1111/1556-4029.15250 | |
Discipline dc.subject.discipline | Social sciences | |
Branch of science dc.subject.sciencebranch | Law enforcement sciences | |
MTMT identifier dc.identifier.mtmt | 33754183 | |
Journal dc.identifier.journalTitle | Journal of Forensic Sciences | |
Volume dc.identifier.journalVolume | 68 | |
Issue number dc.identifier.journalIssueNumber | 3 | |
Pages dc.format.page | 871-883 | |
Scopus identifier dc.identifier.scopus | 85152039991 | |
Abbreviated journal title dc.identifier.journalAbbreviatedTitle | J FORENSIC SCI | |
Year of issue dc.description.issuedate | 2023 | |
Author's department dc.contributor.department | Távközlési és Médiainformatikai Tanszék (Department of Telecommunications and Media Informatics) | |
Author's department dc.contributor.department | Rendészettudományi Doktori Iskola (Doctoral School of Law Enforcement) | |