Nasution, Arbi Haza and Murakami, Yohei (2019) Visualizing Language Lexical Similarity Clusters: A Case Study of Indonesian Ethnic Languages. Journal of Data Science and Its Applications (JDSA), 2 (2). pp. 45-59. ISSN 2614-7408
Abstract
Language similarity clusters are useful for computational linguistics research that relies on language similarity or cognate recognition. The existing language similarity clustering approach, which uses hierarchical clustering and k-means clustering, has difficulty creating clusters with a middle range of language similarity. Moreover, it lacks an interactive visualization that users can explore. To address these issues, we formalize a graph-based approach to creating and visualizing language lexical similarity clusters: we use the ASJP database to generate the language similarity matrix, then formalize the data as an undirected graph. To create the clusters, we apply a connected components algorithm with a threshold on the language similarity range. Our interactive online tool allows a user to dynamically create new clusters by changing the threshold of the language similarity range and to explore the data based on language similarity range and number of speakers. We provide an implementation example of our approach for 119 Indonesian ethnic languages. The experimental results show that under low system execution burden, system performance was quite stable. Under high system execution burden, despite fluctuating performance, response times were still below 25 seconds, which is considered acceptable.
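The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `similarity_clusters`, the placeholder language labels, and the similarity scores are all hypothetical, and treating the threshold as an inclusive range `[low, high]` is an assumption. The idea is to keep only the edges whose similarity falls inside the chosen range and then collect the connected components of the resulting undirected graph.

```python
from collections import defaultdict


def similarity_clusters(languages, similarity, low, high):
    """Return connected components of the graph whose edges are the
    language pairs with a similarity score inside [low, high]."""
    # Build an undirected adjacency list from the pairs that pass the threshold.
    adjacency = defaultdict(set)
    for (a, b), score in similarity.items():
        if low <= score <= high:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for lang in languages:
        if lang in seen:
            continue
        # Depth-first traversal collects every language reachable from `lang`.
        seen.add(lang)
        stack = [lang]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical labels and scores, for illustration only (not ASJP data).
LANGS = ["A", "B", "C", "D"]
SCORES = {
    ("A", "B"): 0.82,
    ("B", "C"): 0.75,
    ("A", "C"): 0.40,
    ("C", "D"): 0.30,
    ("A", "D"): 0.20,
    ("B", "D"): 0.25,
}

high_range = similarity_clusters(LANGS, SCORES, 0.7, 1.0)
mid_range = similarity_clusters(LANGS, SCORES, 0.3, 0.5)
```

Changing `low` and `high` regroups the same languages without recomputing the similarity matrix, which matches the abstract's description of a user creating new clusters by adjusting the threshold. Languages with no edge in the selected range end up as single-member clusters.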
| Item Type: | Article |
|---|---|
| Subjects: | T Technology > T Technology (General) |
| Divisions: | Teknik Informatika |
| Depositing User: | Monika Winda Monika |
| Date Deposited: | 28 Mar 2023 07:03 |
| Last Modified: | 28 Mar 2023 07:03 |
| URI: | http://repository.uir.ac.id/id/eprint/21150 |