Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources
Nguyen HL, Tsolak D, Karmann A, Knauff S, Kühne S (2022)
Frontiers in Sociology 7: 910111.
Journal article | Published | English
Download
fsoc-07-910111.pdf
1.47 MB
Abstract / Note
More and more, social scientists are using (big) digital behavioral data for their research. In this context, the social network and microblogging platform Twitter is one of the most widely used data sources. In particular, geospatial analyses of Twitter data are proving to be fruitful for examining regional differences in user behavior and attitudes. However, ready-to-use spatial information in the form of GPS coordinates is only available for a tiny fraction of Twitter data, limiting research potential and making it difficult to link with data from other sources (e.g., official statistics and survey data) for regional analyses. We address this problem by using the free text locations provided by Twitter users in their profiles to determine the corresponding real-world locations. Since users can enter any text as a profile location, automated identification of geographic locations based on this information is highly complicated. With our method, we are able to assign over a quarter of the more than 866 million German tweets collected to real locations in Germany. This represents a vast improvement over the 0.18% of tweets in our corpus to which Twitter assigns geographic coordinates. Based on the geocoding results, we are not only able to determine a corresponding place for users with valid profile locations, but also the administrative level to which the place belongs. Enriching Twitter data with this information ensures that they can be directly linked to external data sources at different levels of aggregation. We show possible use cases for the fine-grained spatial data generated by our method and how it can be used to answer previously inaccessible research questions in the social sciences. We also provide a companion R package, `nutscoder`, to facilitate reuse of the geocoding method in this paper.
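The idea described in the abstract, mapping free-text profile locations to places with a known administrative level and then aggregating them for linkage, can be illustrated with a minimal Python sketch. This is not the authors' method and not the interface of the `nutscoder` R package. The gazetteer, the function names (`normalize`, `geocode`, `counts_at_level`), and the simple token lookup are assumptions made purely for illustration; the NUTS codes shown are illustrative entries and should be checked against an official NUTS table before use.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical mini-gazetteer (assumption, for illustration only):
# normalized place name -> (NUTS code, NUTS level).
GAZETTEER = {
    "berlin": ("DE300", 3),
    "hamburg": ("DE600", 3),
    "munchen": ("DE212", 3),
    "munich": ("DE212", 3),
    "koln": ("DEA23", 3),
    "bayern": ("DE2", 1),
}


def normalize(text):
    """Lowercase, strip diacritics, and replace non-letter runs with spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z]+", " ", no_marks).strip()


def geocode(profile_location):
    """Return (nuts_code, level) for the first token found in the gazetteer."""
    for token in normalize(profile_location).split():
        if token in GAZETTEER:
            return GAZETTEER[token]
    return None  # free text that matches no known place


def counts_at_level(profile_locations, level):
    """Count geocoded locations per NUTS region at the requested level.

    NUTS codes are hierarchical: a level-n code is the country prefix
    plus n characters, so coarser regions are prefixes of finer ones.
    Places known only at a coarser level than requested are skipped.
    """
    counts = Counter()
    for location in profile_locations:
        match = geocode(location)
        if match is None:
            continue
        code, place_level = match
        if place_level >= level:
            counts[code[: 2 + level]] += 1
    return counts


if __name__ == "__main__":
    sample = ["München, Deutschland", "Bayern", "BERLIN!!", "somewhere over the rainbow"]
    print(counts_at_level(sample, level=1))
    print(counts_at_level(sample, level=3))
```

The resulting region codes could then serve as join keys against official statistics published at the same NUTS level. A realistic pipeline would need a full gazetteer and a way to resolve ambiguous or multi-word place names, which this sketch does not attempt.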
Publication year
2022
Journal title
Frontiers in Sociology
Volume
7
Article no.
910111
Copyright / Licenses
eISSN
2297-7775
Funding information
Open-access publication costs were funded by Bielefeld University.
Page URI
https://pub.uni-bielefeld.de/record/2964027
Cite
Nguyen HL, Tsolak D, Karmann A, Knauff S, Kühne S. Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources. Frontiers in Sociology. 2022;7: 910111.
Nguyen, H. L., Tsolak, D., Karmann, A., Knauff, S., & Kühne, S. (2022). Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources. Frontiers in Sociology, 7, 910111. https://doi.org/10.3389/fsoc.2022.910111
Nguyen, Hoang Long, Tsolak, Dorian, Karmann, Anna, Knauff, Stefan, and Kühne, Simon. 2022. “Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources”. Frontiers in Sociology 7: 910111.
Nguyen, H. L., Tsolak, D., Karmann, A., Knauff, S., and Kühne, S. (2022). Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources. Frontiers in Sociology 7:910111.
Nguyen, H.L., et al., 2022. Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources. Frontiers in Sociology, 7: 910111.
H.L. Nguyen, et al., “Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources”, Frontiers in Sociology, vol. 7, 2022, 910111.
Nguyen, H.L., Tsolak, D., Karmann, A., Knauff, S., Kühne, S.: Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources. Frontiers in Sociology. 7, 910111 (2022).
Nguyen, Hoang Long, Tsolak, Dorian, Karmann, Anna, Knauff, Stefan, and Kühne, Simon. “Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources”. Frontiers in Sociology 7 (2022): 910111.
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0)
Full text(s)
Name
fsoc-07-910111.pdf
1.47 MB
Access Level
Open Access
Last uploaded
2022-07-06T17:04:52Z
MD5 checksum
c880357b6c84441bf81ce6b5e6672c6a
Material in PUB:
In other relation
Corrigendum: Efficient and reliable geocoding of German Twitter data to enable spatial data linkage to official statistics and other data sources
Nguyen HL, Tsolak D, Karmann A, Knauff S, Kühne S (2022)
Frontiers in sociology 7: 995770.
Sources
PMID: 35755485