Publications: Corpus and Computational Linguistics
2019
- Proisl, Thomas. The cooccurrence of linguistic structures. Erlangen: FAU University Press, 2019.
URL: https://nbn-resolving.org/urn:nbn:de:bvb:29-opus4-111251
2015
- Kabashi, Besim. Automatische Verarbeitung der Morphologie des Albanischen. Erlangen: FAU University Press, 2015.
URL: https://opus4.kobv.de/opus4-fau/files/6859/Dissertation_Besim_Kabashi_OPUS.pdf
2021
- Peters, Joachim, et al. "Presenting palliative care units and specialized outpatient palliative care teams on the internet – a corpus-based meta-analysis of websites." Zeitschrift für Palliativmedizin 21.1 (2021).
2020
- Dykes, Natalie, et al. "Reconstructing Arguments from Noisy Text." Datenbank-Spektrum 20 (2020): 123-129.
URL: https://link.springer.com/article/10.1007/s13222-020-00342-y
- Dykes, Natalie, and Joachim Peters. "Reconstructing argumentation patterns in German newspaper articles on multidrug-resistant pathogens: a multi-measure keyword approach." Journal of Corpora and Discourse Studies 3 (2020): 51-74.
URL: https://jcads.cardiffuniversitypress.org/articles/abstract/35/
2019
- Peters, Joachim, et al. "A Linguistic Model of Communication Types in Palliative Medicine: Effects of Multidrug-Resistant Organisms (MDRO) Colonization or Infection and Isolation Measures in End of Life on Family Caregivers’ Knowledge, Attitude and Practices." Journal of Palliative Medicine 22.8 (2019).
URL: https://www.liebertpub.com/doi/pdf/10.1089/jpm.2019.0027
- Evert, Stefan, et al. "Combining Machine Learning and Semantic Features in the Classification of Corporate Disclosures." Journal of Logic, Language and Information (2019): 309-330.
- Peters, Joachim, et al. "Metaphors for multidrug-resistant bacteria in German newspaper articles, 1995-2015. A computer-assisted qualitative study." Metaphor and the Social World 9.2 (2019): 221-241.
2017
- Büttner, Andreas, et al. "»Delta« in der stilometrischen Autorschaftsattribution." Zeitschrift für digitale Geisteswissenschaften (2017).
URL: http://www.zfdg.de/2017_006
- Evert, Stefan, et al. "Understanding and explaining Delta measures for authorship attribution." Digital Scholarship in the Humanities 32.suppl_2 (2017): ii4–ii16.
- Schäfer, Fabian, Stefan Evert, and Philipp Heinrich. "Japan's 2014 General Election: Political Bots, Right-Wing Internet Activism and PM Abe Shinzō’s Hidden Nationalist Agenda." Big Data 5.4 (2017): 1 - 16.
2016
- Evert, Stefan, et al. "A Distributional Approach to Open Questions in Market Research." Computers in Industry 78 (2016): 16-28.
2021
- Keuchen, Michael, et al. "Anonymisierung von Gerichtsurteilen – Eine wesentliche Voraussetzung für E-Justice –." Cybergovernance - Tagungsband des 24. Internationalen Rechtsinformatik Symposions IRIS 2021. Ed. Schweighofer E, Eder S, Hanke P, Kummer F, Saarenpää A, Editions Weblaw, 2021. 137 - 149.
2019
- Dimpel, Friedrich Michael, and Thomas Proisl. "Gute Wörter für Delta: Verbesserung der Autorschaftsattribution durch autorspezifische distinktive Wörter." DHd 2019. Digital Humanities: multimedial & multimodal. Konferenzabstracts. Ed. Patrick Sahle, 2019. 296–299.
URL: https://zenodo.org/record/2596095
2018
- Uhrig, Peter, Stefan Evert, and Thomas Proisl. "Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes." Lexical Collocation Analysis: Advances and Applications. Ed. Cantos-Gómez P, Almela-Sánchez M, Cham: Springer International Publishing, 2018. 111–140.
2017
- Evert, Stefan, and Stella Neumann. "The impact of translation direction on characteristics of translated texts. A multivariate analysis for English and German." Empirical Translation Studies. New Theoretical and Methodological Traditions. Ed. De Sutter G, Lefer M, Delaere I, Berlin: Mouton de Gruyter, 2017. 47-80.
URL: http://www.stefan-evert.de/PUB/EvertNeumann2017/
2020
- Proisl, Thomas, and Gabriella Lapesa. "KLUMSy@KIPoS: Experiments on Part-of-Speech Tagging of Spoken Italian." Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online Ed. Basile V, Croce D, Di Maro M, Passaro L, CEUR-WS.org, 2020.
URL: http://ceur-ws.org/Vol-2765/paper140.pdf
- Blombach, Andreas, et al. "A Corpus of German Reddit Exchanges (GeRedE)." Proceedings of the 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille Ed. Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, European Language Resources Association (ELRA), 2020. 6310-6316.
URL: https://www.aclweb.org/anthology/2020.lrec-1.774
- Proisl, Thomas, et al. "EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus." Proceedings of the 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille Ed. Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, European Language Resources Association (ELRA), 2020. 6142-6148.
URL: https://www.aclweb.org/anthology/2020.lrec-1.754
- Evert, Stefan, et al. "Corpus query lingua franca part II: Ontology." Proceedings of the 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille Ed. Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, European Language Resources Association (ELRA), 2020. 3346-3352.
2019
- Dykes, Natalie, Philipp Heinrich, and Stefan Evert. "Reconstructing Twitter arguments with corpus linguistics." Presented at ICAME40: Language in Time, Time in Language, Neuchâtel 2019.
- Dykes, Natalie, Philipp Heinrich, and Stefan Evert. "Arguing Brexit on Twitter. A corpus linguistic study." Presented at European Conference on Argumentation 2019, Groningen 2019.
- Gracia, Jorge, et al. "Results of the translation inference across dictionaries 2019 shared task." Proceedings of the 2nd TIAD Shared Task - Translation Inference Across Dictionaries, TIAD 2019, Leipzig Ed. Jorge Gracia, Besim Kabashi, Ilan Kernerman, CEUR-WS, 2019. 1-12.
- Proisl, Thomas, et al. "The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet." Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019), Trento Association for Computational Linguistics, 2019. 73-79.
URL: https://www.aclweb.org/anthology/2019.nsurl-1.11
- Kabashi, Besim. "Collecting collocations for the Albanian language." Proceedings of the 6th Biennial Conference on Electronic Lexicography in the 21st Century: Smart Lexicography, eLex 2019, Sintra Ed. Iztok Kosem, Tanara Zingano Kuhn, Margarita Correia, Jose Pedro Ferreira, Maarten Jansen, Isabel Pereira, Jelena Kallas, Milos Jakubicek, Simon Krek, Carole Tiberius, Lexical Computing CZ s.r.o., 2019. 478-489.
2018
- Proisl, Thomas, et al. "EmotiKLUE at IEST 2018: Topic-Informed Classification of Implicit Emotions." Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brussels Ed. Balahur A, Mohammad SM, Hoste V, Klinger R, Brussels: Association for Computational Linguistics, 2018. 235–242.
URL: http://aclweb.org/anthology/W18-6234
- Heinrich, Philipp. "Stylistic Features in Corporate Disclosures and their Predictive Power." Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC2018), Takamatsu Ed. Yukio Tono & Hitoshi Isahara, 2018. 129 - 134.
- Heinrich, Philipp, and Fabian Schäfer. "Extending Corpus-Based Discourse Analysis for Exploring Japanese Social Media." Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC2018), Takamatsu Ed. Yukio Tono & Hitoshi Isahara, 2018. 135 - 140.
- Proisl, Thomas. "SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts." Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki Ed. Calzolari N, Choukri K, Cieri C, Declerck T, Goggi S, Hasida K, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S, Tokunaga T, Miyazaki: European Language Resources Association, 2018. 665–670.
URL: http://www.lrec-conf.org/proceedings/lrec2018/pdf/49.pdf
- Kabashi, Besim, and Thomas Proisl. "Albanian Part-of-Speech Tagging: Gold Standard and Evaluation." Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki Ed. Calzolari N, Choukri K, Cieri C, Declerck T, Goggi S, Hasida K, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S, Tokunaga T, Miyazaki: European Language Resources Association, 2018. 2593–2599.
URL: http://www.lrec-conf.org/proceedings/lrec2018/pdf/89.pdf
- Heinrich, Philipp, et al. "A Transnational Analysis of News and Tweets about Nuclear Phase-Out in the Aftermath of the Fukushima Incident." Proceedings of the Workshop on Computational Impact Detection from Text Data, Miyazaki Ed. Andreas Witt, Jana Diesner, Georg Rehm, Paris: ELRA, 2018. 8 - 16.
- Proisl, Thomas, et al. "Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods." Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki Ed. Calzolari N, Choukri K, Cieri C, Declerck T, Goggi S, Hasida K, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S, Tokunaga T, Miyazaki: European Language Resources Association, 2018. 3309–3314.
URL: http://www.lrec-conf.org/proceedings/lrec2018/pdf/835.pdf
- Evert, Stefan, Natalie Dykes, and Joachim Peters. "A quantitative evaluation of keyword measures for corpus-based discourse analysis." 2018.
URL: http://www.stefan-evert.de/PUB/EvertEtc2018_CAD_slides.pdf
- Peters, Joachim, and Natalie Dykes. "From keywords to discourse - towards a keyword operationalisation model in discourse linguistics." Proceedings of the Corpora and Discourse International Conference Lancaster, 2018.
2017
- Evert, Stefan, et al. "Combining Machine Learning and Semantic Features in the Classification of Corporate Disclosures." Proceedings of the Logic and Algorithms in Computational Linguistics 2017 (LACompLing2017), Stockholm Ed. Loukanova R, Liefke K, Stockholm: Stockholm University, 2017. 47 - 62.
URL: http://su.diva-portal.org/smash/get/diva2:1140018/FULLTEXT03.pdf
- Proisl, Thomas, et al. "Translation Inference across Dictionaries via a Combination of Graph-based Methods and Co-occurrence Statistics." Proceedings of the Shared Task on Translation Inference Across Dictionaries, Galway Ed. McCrae J, Bond F, Buitelaar P, Cimiano P, Declerck T, Gracia J, Kernerman I, Ponsoda E, Ordan N, Piasecki M, CEUR, 2017. 94–102.
URL: http://ceur-ws.org/Vol-1899/TIAD17_paper_1.pdf
- Lapesa, Gabriella, and Stefan Evert. "Large-scale evaluation of dependency-based DSMs: Are they worth the effort?" Proceedings of the 15th Annual Meeting of the European Association for Computational Linguistics (EACL 2017): Volume 2, Short Papers Valencia, Spain, 2017. 394-400.
URL: http://www.linguistik.fau.de/dsmeval/
- Evert, Stefan, et al. "E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification." Proceedings of the eLex 2017, Leiden Ed. Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B, Brno: Lexical Computing, 2017. 531–549.
URL: https://elex.link/elex2017/wp-content/uploads/2017/09/paper32.pdf
- Evert, Stefan, Sebastian Wankerl, and Elmar Nöth. "Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch." Presented at the Corpus Linguistics 2017 Conference, Birmingham, UK, 2017.
URL: http://purl.org/stefan.evert/PUB/EvertWankerlNoeth2017.pdf
2016
- Evert, Stefan, et al. "EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany, 2016. 44-56.
URL: https://sites.google.com/site/empirist2015/
- Wankerl, Sebastian, Elmar Nöth, and Stefan Evert. "An Analysis of Perplexity to Reveal the Effects of Alzheimer's Disease on Language." Proceedings of the ITG-Fachbericht 267: Speech Communication Paderborn, Germany, 2016. 254-259.
- Kabashi, Besim, and Thomas Proisl. "A Proposal for a Part-of-Speech Tagset for the Albanian Language." Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož Ed. Calzolari Nicoletta, Choukri Khalid, Declerck Thierry, Grobelnik Marko, Maegaard Bente, Mariani Joseph, Moreno Asuncion, Odijk Jan, Piperidis Stelios, Paris: European Language Resources Association (ELRA), 2016. 4305–4310.
URL: http://www.lrec-conf.org/proceedings/lrec2016/pdf/1066_Paper.pdf
- Proisl, Thomas, and Peter Uhrig. "SoMaJo: State-of-the-art tokenization for German web and social media texts." Proceedings of the 10th Web as Corpus Workshop (WAC-X), Berlin Ed. Cook P, Evert S, Schäfer R, Stemle E, Berlin: Association for Computational Linguistics (ACL), 2016. 57-62.
URL: http://aclweb.org/anthology/W16-26
- Evert, Stefan. "CogALex-V Shared Task: Mach5 – A traditional DSM approach to semantic relatedness." Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V) Osaka, Japan, 2016. 92-97.
URL: http://www.collocations.de/data/#mach5
- Santus, Enrico, et al. "The CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations." Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V) Osaka, Japan, 2016. 69-79.
URL: https://sites.google.com/site/cogalex2016/home/shared-task
- Evert, Stefan, et al. "„Delta“ in der stilometrischen Autorschaftsattribution." Presented at DHd 2016, Leipzig. Leipzig: Nisaba, 2016.
URL: http://www.dhd2016.de/abstracts/sektionen-002.html
2015
- Evert, Stefan, and Antti Arppe. "Some theoretical and experimental observations on naïve discriminative learning." Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics (QITL-6) Tübingen, Germany, 2015.
- Evert, Stefan, and Andrew Hardie. "Ziggurat: A new data model and indexing format for large annotated text corpora." Proceedings of the 3rd Workshop on the Challenges in the Management of Large Corpora (CMLC-3) Lancaster, UK, 2015. 21–27.
- Evert, Stefan, et al. "Towards a better understanding of Burrows's Delta in literary authorship attribution." Proceedings of the Fourth Workshop on Computational Linguistics for Literature Denver, CO, 2015. 79–88.
URL: http://www.aclweb.org/anthology/W15-0709
- Plotnikova, Nataliia, et al. "KLUEless: Polarity Classification and Association." Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) Denver, Colorado, 2015. 619–625.
URL: http://www.aclweb.org/anthology/S15-2103
- Plotnikova, Nataliia, et al. "SemantiKLUE: Semantic Textual Similarity with Maximum Weight Matching." Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) Denver, Colorado, 2015. 111–116.
URL: http://www.aclweb.org/anthology/S15-2020