Citation Suggestion
Please use the following Persistent Identifier (PID) to cite this document:
https://doi.org/10.5117/CCR2021.3.001.LIND
Greasing the wheels for comparative communication research: Supervised text classification for multilingual corpora
[journal article]
Abstract
Employing supervised machine learning for text classification is already a resource-intensive endeavor in a monolingual setting. For a multilingual corpus, however, the cost of producing the required annotated documents quickly exceeds even generous time and financial constraints. We show how tools like automated annotation and machine translation can be employed both efficiently and effectively for the classification of a multilingual corpus with supervised machine learning. Our findings demonstrate that good results can already be achieved with the machine translation of about 250 to 350 documents per category class and language and a dictionary in just one language, which we perceive as a realistic scenario for many projects. The methodological strategy is applied to study migration frames in seven languages (news discourse in seven European countries) and is discussed and evaluated for its usability in comparative communication research.
Keywords
content analysis; communication research; classification; translation
Classification
Other Fields of the Science of Communication
Free Keywords
comparative communication research; machine translation; multilingual content analysis; supervised machine learning; text classification
Document language
English
Publication Year
2021
Page/Pages
p. 1-30
Journal
Computational Communication Research, 3 (2021) 3
ISSN
2665-9085
Status
Published Version; peer reviewed