Citation note
When citing this document, please always refer to the following Persistent Identifier (PID):
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-91666-3
Optimized Dictionaries: A Semi-Automated Workflow of Concept Identification in Text-Data
Optimierte Wörterbücher: Ein teilautomatisierter Arbeitsablauf zur Identifizierung von Konzepten in Textdaten
[Working paper]
Abstract
Identifying social science concepts and measuring their prevalence and framing in text data has long been a key task for social scientists. Whereas debates about text classification typically contrast different approaches with each other, we propose a workflow that generates optimized dictionaries based on the complementary use of expert dictionaries, machine learning, and topic modeling. We demonstrate the workflow by identifying the concept of "territorial politics" in leading newspapers vis-à-vis parliamentary speeches in Spain (1976-2018) and the UK (1900-2018). We show that our optimized dictionaries outperform individual text-identification techniques, with F1-scores around 0.9 for unseen data, even if the unseen data comes from a different political domain (media vs. parliaments). Optimized dictionaries have increasing returns and should be developed as a common good for researchers, overcoming costly particularism.
Thesaurus keywords
text analysis; mass media; parliamentary debate; dictionary; attention; political agenda
Classification
Types of social research
Free keywords
text-as-data; agenda-setting; salience
Document language
English
Publication year
2024
Pages
24, 18 pp.
Status
Preprint; not peer-reviewed