Topic modelling applied to a second language: A language adaption and tool evaluation study
Institute for Language and Folklore.
Institute for Language and Folklore, Språkrådet. ORCID iD: 0000-0001-6573-4636
2020 (English). In: Selected Papers from the CLARIN Annual Conference 2019 / [ed] Kiril Simov and Maria Eskevich, 2020, p. 145-156. Conference paper, Published paper (Refereed)
Abstract [en]

The Topics2Themes tool, which enables text analysis on the output of topic modelling, was originally developed for the English language. In this study, we explored and evaluated adaptations required for applying the tool to Japanese texts. That is, we adapted Topics2Themes to a language that is very different from the one for which the tool was originally developed. To apply Topics2Themes to Japanese texts, in which white space is not used for indicating word boundaries, the texts had to be pre-tokenised and white space inserted to indicate the token segmentation. Topics2Themes was also extended by the addition of word translations and phonetic readings to support users who are second-language speakers of Japanese. To evaluate the adaptation to a second language, as well as the reading support, we applied the tool to a corpus consisting of short Japanese texts. Twelve different topics were automatically identified, and a total of 183 texts representative of the twelve topics were extracted. A learner of Japanese carried out a manual analysis of these representative texts, and identified 35 recurring, fine-grained themes.

Place, publisher, year, edition, pages
2020. p. 145-156
National Category
Languages and Literature
Research subject
Language Technology
Identifiers
URN: urn:nbn:se:sprakochfolkminnen:diva-1810
OAI: oai:DiVA.org:sprakochfolkminnen-1810
DiVA, id: diva2:1508599
Conference
CLARIN Annual Conference
Available from: 2020-12-10 Created: 2020-12-10 Last updated: 2023-12-01 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Topic Modelling Applied to a Second Language: A Language Adaptation and Tool Evaluation Study

Authority records

Skeppstedt, Maria; Ahltorp, Magnus
