1 - 11 of 11
  • 1.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Dürlich, Luise
    Uppsala universitet.
    Skeppstedt, Maria
    Textual Contexts for "Democracy": Using Topic- and Word-Models for Exploring Swedish Government Official Reports. 2021. Conference paper (Refereed)
    Abstract [en]

    We here demonstrate how two types of NLP models - a topic model and a word2vec model - can be combined for exploring the content of a collection of Swedish Government Reports. We investigate if there are topics that frequently occur in paragraphs mentioning the word "democracy". Using the word2vec model, 530 clusters of semantically similar words were created, which were then applied in the pre-processing step when creating a topic model. This model detected 15 reoccurring topics among the paragraphs containing "democracy". Among these topics, 13 had closely associated paragraphs with a coherent content relating to some aspect of democracy.

  • 2.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Hessel, Jean
    Institutet för språk och folkminnen, Språkrådet.
    Eriksson, Gunnar
    Institutet för språk och folkminnen, Språkrådet.
    Skeppstedt, Maria
    Domeij, Rickard
    Institutet för språk och folkminnen, Språkrådet.
    A Digital Swedish–Yiddish/Yiddish–Swedish Dictionary: A Web-Based Dictionary that is also Available Offline. 2022. In: Proceedings of the EURALI Workshop @LREC2022, 2022. Conference paper (Refereed)
    Abstract [en]

    Yiddish is one of the national minority languages of Sweden, and one of the languages for which the Swedish Institute for Language and Folklore is responsible for developing useful language resources. We here describe the web-based version of a Swedish–Yiddish/Yiddish–Swedish dictionary. The single search field of the web-based dictionary is used for incrementally searching all three components of the dictionary entries (the word in Swedish, the word in Yiddish with Hebrew characters and the transliteration in Latin script). When the user accesses the dictionary in an online mode, the dictionary is saved in the web browser, which makes it possible to also use the dictionary offline.

  • 3.
    Mattson, Marie
    Institutet för språk och folkminnen, Språkrådet.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Lexikografiska resursers betydelse i utvecklingen av språkteknologiska verktyg för minoritetsspråk [The importance of lexicographic resources in the development of language technology tools for minority languages]. 2023. In: LexicoNordica, ISSN 0805-2735, E-ISSN 1891-2206, Vol. 30, pp. 75-94. Journal article (Refereed)
    Abstract [en]

    The Swedish Language Act states that the public sector is responsible for protecting the Swedish national minority languages. Since these languages typically have few speakers, commercial actors do not create the language technology necessary in an increasingly digitalised society. One way for the public sector to facilitate creation of such technology is by making lexicographic resources available to the public. In this article, we present a survey of existing resources and guidelines for new resources, such as the early inclusion of a computational linguistic perspective.

  • 4.
    Skeppstedt, Maria
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Kerren, Andreas
    Department of Computer Science and Media Technology, Linnaeus University.
    Rzepka, Rafal
    Graduate School of Information Science and Technology, Hokkaido University.
    Araki, Kenji
    Graduate School of Information Science and Technology, Hokkaido University.
    Application of a Topic Model Visualisation Tool to a Second Language. 2019. In: Book of Abstracts of the CLARIN Annual Conference, 2019. Conference paper (Refereed)
    Abstract [en]

    We explored adaptions required for applying a topic modelling tool to a language that is very different from the one for which the tool was originally developed. The tool, which enables text analysis on the output of topic modelling, was developed for English, and we here applied it on Japanese texts. As white space is not used for indicating word boundaries in Japanese, the texts had to be pre-tokenised and white space inserted to indicate a token segmentation, before the texts could be imported into the tool. The tool was also extended by the addition of word translations and phonetic readings to support users who are second-language speakers of Japanese.

  • 5.
    Skeppstedt, Maria
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Domeij, Rickard
    Institutet för språk och folkminnen, Språkrådet.
    Eriksson, Gunnar
    Institutet för språk och folkminnen, Språkrådet.
    Öqvist, Jenny
    Institutet för språk och folkminnen, Avdelningen för arkiv och forskning i Uppsala (AFU).
    Mining for Recurring Themes in Speech Recording Descriptions. 2021. Conference paper (Refereed)
  • 6.
    Skeppstedt, Maria
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Eriksson, Gunnar
    Institutet för språk och folkminnen, Språkrådet.
    Domeij, Rickard
    Institutet för språk och folkminnen, Språkrådet.
    A Pipeline for Manual Annotations of Risk Factor Mentions in the COVID-19 Open Research Dataset. 2021. In: Selected Papers from the CLARIN Annual Conference 2020, 2021. Conference paper (Refereed)
    Abstract [en]

    We here demonstrate how a set of tools that are being maintained and further developed within the Språkbanken Sam and SWE-CLARIN infrastructures can be employed for creating manually labelled training data in a low-resource setting. As example text, we used the “COVID-19 Open Research Dataset”, and created manually annotated training data for its associated Kaggle task, “What do we know about COVID-19 risk factors?”. We first used our topic modelling tool to i) select a text set for manual annotation, ii) classify the texts into preliminary classification categories, and iii) analyse the texts in search for potential refinements of the annotation categories. We then annotated the text set on a more granular level by labelling the token sequences that indicated the existence of the refined categories in the text. Finally, we used the granularly annotated text set as a seed set, and applied our active learning tool for actively selecting additional texts for annotation. For the token-sequence annotations, we used our text annotation tool, which includes support for incorporating automatic pre-annotations.

  • 7.
    Skeppstedt, Maria
    Institutet för språk och folkminnen.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Eriksson, Gunnar
    Institutet för språk och folkminnen, Språkrådet.
    Domeij, Rickard
    Institutet för språk och folkminnen, Språkrådet.
    Annotating risk factor mentions in the COVID-19 Open Research Dataset. 2020. In: Proceedings of CLARIN Annual Conference 2020 / [ed] Costanza Navarretta and Maria Eskevich, 2020, pp. 52-55. Conference paper (Refereed)
    Abstract [en]

    We here describe the creation of manually annotated training data for the Kaggle task “What do we know about COVID-19 risk factors?”. We applied our text mining tool on the “COVID-19 Open Research Dataset” to i) select data for manual annotation, ii) classify the data into initially established classification categories, and iii) analyse our data set in search for potential refinements of the annotation categories. The process resulted in a corpus consisting of 50,000 tokens, for which each token is annotated as to whether it is part of an expression that functions as a “risk factor trigger”. Two types of risk factor triggers were annotated, those indicating that the text describes a risk factor, and those indicating that something could not be shown to be a risk factor.

  • 8.
    Skeppstedt, Maria
    Institutet för språk och folkminnen.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Eriksson, Gunnar
    Institutet för språk och folkminnen, Språkrådet.
    Domeij, Rickard
    Institutet för språk och folkminnen, Språkrådet.
    Line-a-line: A Tool for Annotating Word-Alignment. 2020. In: Proceedings of the 13th Workshop on Building and Using Comparable Corpora / [ed] Reinhard Rapp, Pierre Zweigenbaum and Serge Sharoff, 2020, pp. 1-5. Conference paper (Refereed)
    Abstract [en]

    We here describe line-a-line, a web-based tool for manual annotation of word-alignments in sentence-aligned parallel corpora. The graphical user interface, which builds on a design template from the Jigsaw system for investigative analysis, displays the words from each sentence pair that is to be annotated as elements in two vertical lists. An alignment between two words is annotated by drag-and-drop, i.e. by dragging an element from the left-hand list and dropping it on an element in the right-hand list. The tool indicates that two words are aligned by lines that connect them and by highlighting associated words when the mouse is hovered over them. Line-a-line uses the efmaral library for producing pre-annotated alignments, on which the user can base the manual annotation. The tool is mainly planned to be used on moderately under-resourced languages, for which resources in the form of parallel corpora are scarce. The automatic word-alignment functionality therefore also incorporates information derived from non-parallel resources, in the form of pre-trained multilingual word embeddings from the MUSE library.

  • 9.
    Skeppstedt, Maria
    Institutet för språk och folkminnen.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Kucher, Kostiantyn
    Kerren, Andreas
    Rzepka, Rafal
    Araki, Kenji
    Topic modelling applied to a second language: A language adaption and tool evaluation study. 2020. In: Selected Papers from the CLARIN Annual Conference 2019 / [ed] Kiril Simov and Maria Eskevich, 2020, pp. 145-156. Conference paper (Refereed)
    Abstract [en]

    The Topics2Themes tool, which enables text analysis on the output of topic modelling, was originally developed for the English language. In this study, we explored and evaluated adaptations required for applying the tool to Japanese texts. That is, we adapted Topics2Themes to a language that is very different from the one for which the tool was originally developed. To apply Topics2Themes to Japanese texts, in which white space is not used for indicating word boundaries, the texts had to be pre-tokenised and white space inserted to indicate a token segmentation. Topics2Themes was also extended by the addition of word translations and phonetic readings to support users who are second-language speakers of Japanese. To evaluate the adaptation to a second language, as well as the reading support, we applied the tool to a corpus consisting of short Japanese texts. Twelve different topics were automatically identified, and a total of 183 texts representative for the twelve topics were extracted. A learner of Japanese carried out a manual analysis of these representative texts, and identified 35 reoccurring, fine-grained themes.

  • 10.
    Skeppstedt, Maria
    Centre for Digital Humanities and Social Sciences Uppsala, Department of ALM, Uppsala University, Uppsala, Sweden.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet. Language Council of Sweden, Institute for Language and Folklore, Stockholm, Sweden.
    Kucher, Kostiantyn
    Department of Science and Technology, Linköping University, Norrköping, Sweden.
    Lindström, Matts
    Centre for Digital Humanities and Social Sciences Uppsala, Department of ALM, Uppsala University, Uppsala, Sweden.
    From word clouds to Word Rain: Revisiting the classic word cloud to visualize climate change texts. 2024. In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724. Journal article (Refereed)
    Abstract [en]

    Word Rain is a development of the classic word cloud. It addresses some of the limitations of word clouds, in particular the lack of a semantically motivated positioning of the words, and the use of font size as a sole indicator of word prominence. Word Rain uses the semantic information encoded in a distributional semantics-based language model – reduced into one dimension – to position the words along the x-axis. Thereby, the horizontal positioning of the words reflects semantic similarity. Font size is still used to signal word prominence, but this signal is supplemented with a bar chart, as well as with the position of the words on the y-axis. We exemplify the use of Word Rain by three concrete visualization tasks, applied on different real-world texts and document collections on climate change. In these case studies, word2vec models, reduced to one dimension with t-SNE, are used to encode semantic similarity, and TF-IDF is used for measuring word prominence. We evaluate the technique further by carrying out domain expert reviews.

  • 11.
    Skeppstedt, Maria
    Mattson, Marie
    Institutet för språk och folkminnen, Språkrådet.
    Ahltorp, Magnus
    Institutet för språk och folkminnen, Språkrådet.
    Domeij, Rickard
    Institutet för språk och folkminnen, Språkrådet.
    Converting from the Nordic Terminological Record Format to the TBX Format. 2022. In: Proceedings of the TERM21 Workshop, Language Resources and Evaluation Conference (LREC 2022), 2022. Conference paper (Refereed)
    Abstract [en]

    Rikstermbanken (Sweden’s National Term Bank), which was launched in 2009, uses the Nordic Terminological Record Format (NTRF) for organising its terminological data. Since then, new terminology formats have been established as standards, e.g., the Termbase eXchange format (TBX). We here describe work carried out by the Institute for Language and Folklore within the Federated eTranslation TermBank Network Action. This network develops a technical infrastructure for facilitating sharing of terminology resources throughout Europe. To be able to share some of the term collections of Rikstermbanken within this network and export them to Eurotermbank, we have implemented a conversion from the Nordic Terminological Record Format, as used in Rikstermbanken, to the TBX format.
