A Pipeline for Manual Annotations of Risk Factor Mentions in the COVID-19 Open Research Dataset
Institute for Language and Folklore, Språkrådet. ORCID iD: 0000-0001-6573-4636
Institute for Language and Folklore, Språkrådet. ORCID iD: 0000-0001-6949-6380
Institute for Language and Folklore, Språkrådet.
2021 (English)
In: Selected Papers from the CLARIN Annual Conference 2020, 2021
Conference paper, Published paper (Refereed)
Abstract [en]

We demonstrate how a set of tools maintained and further developed within the Språkbanken Sam and SWE-CLARIN infrastructures can be used to create manually labelled training data in a low-resource setting. As example text, we used the “COVID-19 Open Research Dataset” and created manually annotated training data for its associated Kaggle task, “What do we know about COVID-19 risk factors?”. We first used our topic modelling tool to i) select a text set for manual annotation, ii) classify the texts into preliminary classification categories, and iii) analyse the texts in search of potential refinements of the annotation categories. We then annotated the text set at a more granular level by labelling the token sequences that indicated the presence of the refined categories in the text. Finally, we used the granularly annotated text set as a seed set and applied our active learning tool to select additional texts for annotation. For the token-sequence annotations, we used our text annotation tool, which supports incorporating automatic pre-annotations.

Place, publisher, year, edition, pages
2021.
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:sprakochfolkminnen:diva-2075
OAI: oai:DiVA.org:sprakochfolkminnen-2075
DiVA, id: diva2:1604777
Conference
CLARIN Annual Conference 2020
Funder
Swedish Research Council, 2017-00626
Available from: 2021-10-21 Created: 2021-10-21 Last updated: 2023-12-01 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

https://ecp.ep.liu.se/index.php/clarin/article/view/23/23

Authority records

Skeppstedt, Maria
Ahltorp, Magnus
Eriksson, Gunnar
Domeij, Rickard
