Domeij, Rickard
Publications (10 of 28)
Ahltorp, M., Hessel, J., Eriksson, G., Skeppstedt, M. & Domeij, R. (2022). A Digital Swedish–Yiddish/Yiddish–Swedish Dictionary: A Web-Based Dictionary that is also Available Offline. In: Proceedings of the EURALI Workshop @LREC2022. Paper presented at LREC 2022.
2022 (English). In: Proceedings of the EURALI Workshop @LREC2022, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Yiddish is one of the national minority languages of Sweden, and one of the languages for which the Swedish Institute for Language and Folklore is responsible for developing useful language resources. We here describe the web-based version of a Swedish–Yiddish/Yiddish–Swedish dictionary. The single search field of the web-based dictionary is used for incrementally searching all three components of the dictionary entries (the word in Swedish, the word in Yiddish with Hebrew characters and the transliteration in Latin script). When the user accesses the dictionary in an online mode, the dictionary is saved in the web browser, which makes it possible to also use the dictionary offline.
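The single-field incremental search described in the abstract can be illustrated with a minimal sketch. It assumes prefix matching on each of the three entry components; the `Entry` class, the `incremental_search` function and the three sample entries are hypothetical and are not taken from the dictionary or its implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary entry with the three searchable components."""
    swedish: str
    yiddish: str          # Yiddish in Hebrew characters
    transliteration: str  # Yiddish in Latin script


def incremental_search(entries, query):
    """Return entries where any component starts with the text typed so far.

    Assumption: prefix matching, case-insensitive. The real dictionary may
    match differently.
    """
    q = query.strip().casefold()
    if not q:
        return []
    return [
        e for e in entries
        if any(field.casefold().startswith(q)
               for field in (e.swedish, e.yiddish, e.transliteration))
    ]


# Illustrative entries only (book, house, bread).
ENTRIES = [
    Entry("bok", "בוך", "bukh"),
    Entry("hus", "הויז", "hoyz"),
    Entry("bröd", "ברויט", "broyt"),
]

# Typing "bu" matches the transliteration "bukh" but not "broyt".
print([e.swedish for e in incremental_search(ENTRIES, "bu")])  # ['bok']
```

Calling the function again after every keystroke gives the incremental behaviour: the same field accepts Swedish, Hebrew-script or Latin-script input without the user choosing a search mode.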

National Category
Specific Languages
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-2471 (URN)
Conference
LREC 2022
Funder
Swedish Research Council, 2017-00626
Available from: 2022-07-15 Created: 2022-07-15 Last updated: 2023-12-01 Bibliographically approved
Skeppstedt, M., Mattson, M., Ahltorp, M. & Domeij, R. (2022). Converting from the Nordic Terminological Record Format to the TBX Format. In: Proceedings of the TERM21 Workshop, Language Resources and Evaluation Conference (LREC 2022). Paper presented at Language Resources and Evaluation Conference (LREC 2022).
2022 (English). In: Proceedings of the TERM21 Workshop, Language Resources and Evaluation Conference (LREC 2022), 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Rikstermbanken (Sweden’s National Term Bank), which was launched in 2009, uses the Nordic Terminological Record Format (NTRF) for organising its terminological data. Since then, new terminology formats have been established as standards, e.g., the Termbase eXchange format (TBX). We here describe work carried out by the Institute for Language and Folklore within the Federated eTranslation TermBank Network Action. This network develops a technical infrastructure for facilitating sharing of terminology resources throughout Europe. To be able to share some of the term collections of Rikstermbanken within this network and export them to Eurotermbank, we have implemented a conversion from the Nordic Terminological Record Format, as used in Rikstermbanken, to the TBX format.

National Category
Languages and Literature
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-2472 (URN)
Conference
Language Resources and Evaluation Conference (LREC 2022)
Available from: 2022-07-15 Created: 2022-07-15 Last updated: 2023-12-01 Bibliographically approved
Skeppstedt, M., Domeij, R., Eriksson, G. & Öqvist, J. (2022). Digital humanities for the spreadsheet nerd: Presenting the output of a topic modelling tool as tabular data. In: DHNB 2022 Conference: Book of Abstracts. Paper presented at Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022).
2022 (English). In: DHNB 2022 Conference: Book of Abstracts, 2022. Conference paper, Oral presentation with published abstract (Refereed)
National Category
Other Humanities not elsewhere specified
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-2470 (URN)
Conference
Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022)
Projects
Tilltal
Available from: 2022-07-15 Created: 2022-07-15 Last updated: 2022-07-29 Bibliographically approved
Skeppstedt, M., Ahltorp, M., Eriksson, G. & Domeij, R. (2021). A Pipeline for Manual Annotations of Risk Factor Mentions in the COVID-19 Open Research Dataset. In: Selected Papers from the CLARIN Annual Conference 2020. Paper presented at CLARIN Annual Conference 2020.
2021 (English). In: Selected Papers from the CLARIN Annual Conference 2020, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

We here demonstrate how a set of tools that are being maintained and further developed within the Språkbanken Sam and SWE-CLARIN infrastructures can be employed for creating manually labelled training data in a low-resource setting. As example text, we used the “COVID-19 Open Research Dataset”, and created manually annotated training data for its associated Kaggle task, “What do we know about COVID-19 risk factors?”. We first used our topic modelling tool to i) select a text set for manual annotation, ii) classify the texts into preliminary classification categories, and iii) analyse the texts in search for potential refinements of the annotation categories. We then annotated the text set on a more granular level by labelling the token sequences that indicated the existence of the refined categories in the text. Finally, we used the granularly annotated text set as a seed set, and applied our active learning tool for actively selecting additional texts for annotation. For the token-sequence annotations, we used our text annotation tool, which includes support for incorporating automatic pre-annotations.

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-2075 (URN)
Conference
CLARIN Annual Conference 2020
Funder
Swedish Research Council, 2017-00626
Available from: 2021-10-21 Created: 2021-10-21 Last updated: 2023-12-01 Bibliographically approved
Skeppstedt, M., Ahltorp, M., Domeij, R., Eriksson, G. & Öqvist, J. (2021). Mining for Recurring Themes in Speech Recording Descriptions. Paper presented at The 9th Swedish Workshop on Data Science.
2021 (English). Conference paper, Poster (with or without abstract) (Refereed)
National Category
Language Technology (Computational Linguistics)
Research subject
Language Technology
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-2217 (URN)
Conference
The 9th Swedish Workshop on Data Science
Projects
Tilltal; Nationella språkbanken
Funder
Riksbankens Jubileumsfond, SAF16-0917:1
Available from: 2021-12-12 Created: 2021-12-12 Last updated: 2023-12-01 Bibliographically approved
Skeppstedt, M., Domeij, R. & Skott, F. (2021). Snippets of Folk Legends: Adapting a Text Mining Tool to a Collection of Folk Legends. In: Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020). Paper presented at 5th Conference Digital Humanities in the Nordic Countries (DHN 2020).
2021 (English). In: Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

A topic modelling tool was adapted to requirements for a collection of Swedish folk legends. To offer an overview of a list of folk legend texts, which had been automatically extracted by the topic modelling tool, snippet text versions of the folk legends were displayed. The snippets were constructed from the full-text versions of the legends using the sentences most relevant to the topics extracted by the topic modelling algorithm. In addition, collection-adapted data was constructed for performing a pre-processing of the folk legend texts, before they were submitted to the topic modelling algorithm. This data consisted of a collection-adapted stop word list and word lists for improving the quality of clusters of semantically similar words.
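The snippet construction described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: sentence relevance is approximated here by counting occurrences of topic terms, which is not necessarily the scoring used by the tool, and `make_snippet` and its parameters are hypothetical names.

```python
import re


def make_snippet(text, topic_terms, max_sentences=2):
    """Build a snippet from the sentences that mention the most topic terms.

    Assumption: relevance = number of topic-term tokens in the sentence.
    Selected sentences are returned in their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    terms = {t.casefold() for t in topic_terms}

    def score(sentence):
        words = re.findall(r"\w+", sentence.casefold())
        return sum(1 for w in words if w in terms)

    # Rank sentence indices by score; ties keep their original order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top sentences that mention at least one topic term.
    keep = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return " ".join(sentences[i] for i in keep)


legend = ("The troll lived in the mountain. It was a cold winter. "
          "The farmer met the troll by the mountain lake. He went home.")
print(make_snippet(legend, ["troll", "mountain"]))
```

Keeping the chosen sentences in document order lets the snippet read as a shortened version of the legend rather than as a ranked list.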

National Category
Language Technology (Computational Linguistics)
Research subject
Language Technology
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-2074 (URN)
Conference
5th Conference Digital Humanities in the Nordic Countries (DHN 2020)
Projects
Nationella språkbanken
Funder
Swedish Research Council, 2017-00626
Available from: 2021-10-21 Created: 2021-10-21 Last updated: 2021-12-29 Bibliographically approved
Skeppstedt, M., Domeij, R. & Skott, F. (2020). Adapting a Topic Modelling Tool to the Task of Finding Recurring Themes in Folk Legends. In: Reinsone et al. (Ed.), Proceedings of the Digital Humanities in the Nordic Countries 5th Conference (DHN 2020). Paper presented at Digital Humanities in the Nordic Countries 5th Conference (DHN 2020) (pp. 388-392).
2020 (English). In: Proceedings of the Digital Humanities in the Nordic Countries 5th Conference (DHN 2020) / [ed] Reinsone et al., 2020, p. 388-392. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

A topic modelling tool, which was originally developed for performing text analysis on very short texts written in English, was adapted to the text genre of Swedish folk legends. The topic modelling tool was configured to use a word space model trained on a Swedish corpus, as well as a Swedish stop word list. The stop word list consisted of standard Swedish stop words, as well as 380 additional stop words that were tailored to the content of the corpus and therefore also included older spelling versions and grammatical forms of Swedish words. The adapted version of the tool was applied on a corpus consisting of around 10,000 Swedish folk legends, which resulted in the automatic extraction of 20 topics. Future versions of the tool will be extended with text summarisation functionality, in order to retain the text overview provided by the tool also when it is applied on longer folk legends.
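Combining a standard stop word list with a corpus-tailored one, as the abstract describes, might look like the sketch below. The function names and the few sample words are illustrative assumptions; they are not drawn from the 380-word list used in the paper.

```python
def build_stop_words(standard, collection_specific):
    """Merge a standard list with collection-adapted additions (case-insensitive)."""
    return ({w.casefold() for w in standard}
            | {w.casefold() for w in collection_specific})


def remove_stop_words(tokens, stop_words):
    """Drop tokens found in the stop word set before topic modelling."""
    return [t for t in tokens if t.casefold() not in stop_words]


# Hypothetical examples: a few modern Swedish stop words plus older
# spellings and forms ("af" for "av", "hvad" for "vad", plural "voro").
STANDARD = ["och", "att", "det", "av"]
TAILORED = ["af", "hvad", "voro"]

stop_words = build_stop_words(STANDARD, TAILORED)
tokens = ["Trollet", "och", "bonden", "voro", "af", "samma", "by"]
print(remove_stop_words(tokens, stop_words))
# ['Trollet', 'bonden', 'samma', 'by']
```

Without the tailored additions, older forms such as "af" would pass through a modern stop word list and could dominate the extracted topics.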

National Category
Languages and Literature
Research subject
Language Technology
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-1813 (URN)
Conference
Digital Humanities in the Nordic Countries 5th Conference (DHN 2020)
Available from: 2020-12-10 Created: 2020-12-10 Last updated: 2021-12-29 Bibliographically approved
Skeppstedt, M., Ahltorp, M., Eriksson, G. & Domeij, R. (2020). Annotating risk factor mentions in the COVID-19 Open Research Dataset. In: Costanza Navarretta and Maria Eskevich (Ed.), Proceedings of CLARIN Annual Conference 2020. Paper presented at CLARIN Annual Conference (pp. 52-55).
2020 (English). In: Proceedings of CLARIN Annual Conference 2020 / [ed] Costanza Navarretta and Maria Eskevich, 2020, p. 52-55. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

We here describe the creation of manually annotated training data for the Kaggle task “What do we know about COVID-19 risk factors?”. We applied our text mining tool on the “COVID-19 Open Research Dataset” to i) select data for manual annotation, ii) classify the data into initially established classification categories, and iii) analyse our data set in search for potential refinements of the annotation categories. The process resulted in a corpus consisting of 50,000 tokens, for which each token is annotated as to whether it is part of an expression that functions as a “risk factor trigger”. Two types of risk factor triggers were annotated, those indicating that the text describes a risk factor, and those indicating that something could not be shown to be a risk factor.

National Category
Languages and Literature
Research subject
Language Technology
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-1817 (URN)
Conference
CLARIN Annual Conference
Available from: 2020-12-17 Created: 2020-12-17 Last updated: 2023-12-01 Bibliographically approved
Domeij, R., Edlund, J., Eriksson, G., Fallgren, P., David, H., Lindström, E., . . . Öqvist, J. (2020). Exploring the archives for textual entry points to speech: Experiences of interdisciplinary collaboration in making cultural heritage accessible for research. In: Steven Krauwer & Darja Fišer (Ed.), Proceedings of the Twin Talks 2 and 3 Workshops at DHN 2020 and DH 2020. Paper presented at DHN 2020 (pp. 45-55). Riga, Vol. 2717.
2020 (English). In: Proceedings of the Twin Talks 2 and 3 Workshops at DHN 2020 and DH 2020 / [ed] Steven Krauwer & Darja Fišer, Riga, 2020, Vol. 2717, p. 45-55. Conference paper, Published paper (Other academic)
Abstract [en]

Tilltal (Tillgängligt kulturarv för forskning i tal, ‘Accessible cultural heritage for speech research’) is a multidisciplinary and methodological project undertaken by the Institute of Language and Folklore, KTH Royal Institute of Technology, and The Swedish National Archives in cooperation with the National Language Bank and SWE-CLARIN [1]. It aims to provide researchers better access to archival audio recordings using methods from language technology. The project comprises three case studies and one activity and usage study. In the case studies, actual research agendas from three different fields (ethnology, sociolinguistics, and interaction analysis) serve as a basis for identifying procedures that may be simplified with the aid of digital tools. In the activity and usage study, we are applying an activity-theoretical approach with the aim of involving researchers and investigating how they use – and would like to be able to use – the archival resources at ISOF. Involving researchers in participatory design ensures that digital solutions are suggested and evaluated in relation to the requirements expressed by researchers engaged in specific research tasks [2]. In this paper, we focus on one of the case studies, which investigates the process by which personal experience narratives are transformed into cultural heritage [3], and account for our results in exploring how different types of text material from the archives can be used to find relevant sections of the audio recordings. Finally, we discuss what lessons can be learned, and what conclusions can be drawn, from our experiences of interdisciplinary collaboration in the project.

Place, publisher, year, edition, pages
Riga, 2020
Series
CEUR Workshop Proceedings, ISSN 1613-0073
National Category
Humanities and the Arts; Engineering and Technology
Research subject
Language Technology; Folklore; Dialectology
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-1816 (URN)
Conference
DHN 2020
Funder
Riksbankens Jubileumsfond, SAF16-0917:1
Available from: 2020-12-17 Created: 2020-12-17 Last updated: 2022-06-07 Bibliographically approved
Skeppstedt, M., Ahltorp, M., Eriksson, G. & Domeij, R. (2020). Line-a-line: A Tool for Annotating Word-Alignment. In: Reinhard Rapp, Pierre Zweigenbaum and Serge Sharoff (Ed.), Proceedings of the 13th Workshop on Building and Using Comparable Corpora. Paper presented at 13th Workshop on Building and Using Comparable Corpora, LREC (pp. 1-5).
2020 (English). In: Proceedings of the 13th Workshop on Building and Using Comparable Corpora / [ed] Reinhard Rapp, Pierre Zweigenbaum and Serge Sharoff, 2020, p. 1-5. Conference paper, Published paper (Refereed)
Abstract [en]

We here describe line-a-line, a web-based tool for manual annotation of word-alignments in sentence-aligned parallel corpora. The graphical user interface, which builds on a design template from the Jigsaw system for investigative analysis, displays the words from each sentence pair that is to be annotated as elements in two vertical lists. An alignment between two words is annotated by drag-and-drop, i.e. by dragging an element from the left-hand list and dropping it on an element in the right-hand list. The tool indicates that two words are aligned by lines that connect them and by highlighting associated words when the mouse is hovered over them. Line-a-line uses the efmaral library for producing pre-annotated alignments, on which the user can base the manual annotation. The tool is mainly planned to be used on moderately under-resourced languages, for which resources in the form of parallel corpora are scarce. The automatic word-alignment functionality therefore also incorporates information derived from non-parallel resources, in the form of pre-trained multilingual word embeddings from the MUSE library.

National Category
Languages and Literature
Research subject
Language Technology
Identifiers
urn:nbn:se:sprakochfolkminnen:diva-1812 (URN)
Conference
13th Workshop on Building and Using Comparable Corpora, LREC
Available from: 2020-12-10 Created: 2020-12-10 Last updated: 2023-12-01 Bibliographically approved