Citation note
When citing this document, please always refer to the following Persistent Identifier (PID):
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-93576-2
Large language models as a substitute for human experts in annotating political text
[Journal article]
Abstract
Large-scale text analysis has grown rapidly as a method in political science and beyond. To date, text-as-data methods rely on large volumes of human-annotated training examples, which place a premium on researcher resources. However, advances in large language models (LLMs) may make automated annotation increasingly viable. This paper tests the performance of GPT-4 across a range of scenarios relevant for analysis of political text. We compare GPT-4 coding with human expert coding of tweets and news articles across four variables (whether text is political, its negativity, its sentiment, and its ideology) and across four countries (the United States, Chile, Germany, and Italy). GPT-4 coding is highly accurate, especially for shorter texts such as tweets, correctly classifying texts up to 95% of the time. Performance drops for longer news articles, and very slightly for non-English text. We introduce a 'hybrid' coding approach, in which disagreements of multiple GPT-4 runs are adjudicated by a human expert, which boosts accuracy. Finally, we explore downstream effects, finding that transformer models trained on hand-coded or GPT-4-coded data yield almost identical outcomes. Our results suggest that LLM-assisted coding is a viable and cost-efficient approach, although consideration should be given to task complexity.
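The 'hybrid' coding approach mentioned in the abstract can be pictured with a minimal sketch. It assumes that every GPT-4 run assigns one label per text and that a text counts as agreed only when all runs give the same label; the abstract does not state the agreement rule, the number of runs, or how the expert adjudicates. The names `hybrid_code` and `adjudicate` are hypothetical, and this is an illustration of the idea, not the authors' implementation.

```python
from __future__ import annotations

from typing import Callable, Hashable, Sequence

Label = Hashable


def hybrid_code(
    runs: Sequence[Sequence[Label]],
    adjudicate: Callable[[int, list], Label],
) -> tuple[list, list[int]]:
    """Merge several LLM coding runs and send disagreements to a human.

    runs[r][i] is the label that run r assigned to text i.
    Assumption: only unanimous agreement is accepted automatically.
    Returns the final labels and the indices of adjudicated texts.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_items = len(runs[0])
    if any(len(run) != n_items for run in runs):
        raise ValueError("all runs must label the same texts")

    final: list = []
    escalated: list[int] = []
    for i in range(n_items):
        labels = [run[i] for run in runs]
        if len(set(labels)) == 1:
            # All runs agree: accept the model label without human review.
            final.append(labels[0])
        else:
            # Runs disagree: defer to the (hypothetical) human adjudicator.
            final.append(adjudicate(i, labels))
            escalated.append(i)
    return final, escalated


# Toy example: three runs coding three texts as political or not.
runs = [
    ["political", "political", "not"],
    ["political", "not", "not"],
    ["political", "political", "not"],
]
# Stand-in for a human expert; a real workflow would show the text to a coder.
final, escalated = hybrid_code(runs, lambda i, labels: "political")
print(final, escalated)  # ['political', 'political', 'not'] [1]
```

Only text 1 is escalated here, which illustrates why such a design can save expert time: human effort goes only to the items on which the model runs disagree.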
Thesaurus keywords
text analysis; automation; artificial intelligence; language; model; coding
Classification
Data collection and analysis techniques in the social sciences
Free keywords
GPT; Large language models; machine learning; text-as-data
Document language
English
Publication year
2024
Journal title
Research and Politics, 11 (2024) 1
DOI
https://doi.org/10.1177/20531680241236239
ISSN
2053-1680
Status
Published version; peer reviewed