Similarity Cluster of Indonesian Ethnic Languages

Nasution, Arbi Haza and Murakami, Yohei and Ishida, Toru (2017) Similarity Cluster of Indonesian Ethnic Languages. In: International Conference on Science Engineering and Technology (ICoSET) and International Conference on Social Economic Education and Humaniora (ICoSEEH), 08 - 10 November 2017, Pekanbaru, Indonesia. (In Press)



Lexicostatistic and language similarity clusters are useful for computational linguistics research that depends on language similarity or cognate recognition. Nevertheless, no published lexicostatistic or language similarity clusters of Indonesian ethnic languages are available. We formulate an approach to creating language similarity clusters: we use the ASJP database to generate a language similarity matrix, generate hierarchical clusters with complete linkage and mean linkage clustering, and then extract two stable clusters with high language similarity. We introduce an extended k-means clustering semi-supervised learning approach to evaluate how stably the two hierarchical stable clusters remain grouped together as the number of clusters changes. The higher the number of trials, the more likely we are to distinctly find the two hierarchical stable clusters in the generated k clusters. However, in all five experiments, the stability level of the two hierarchical stable clusters is highest with 5 clusters. Therefore, we take the 5 clusters as the best clustering of Indonesian ethnic languages. Finally, we plot the generated 5 clusters on a geographical map.
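
The hierarchical step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the labels and similarity scores below are invented placeholders rather than ASJP values, the helper names (linkage_distance, agglomerate, named) are hypothetical, and "mean linkage" is assumed to mean what is commonly called average linkage. Similarity is converted to distance as 1 - similarity, and the closest pair of clusters is merged repeatedly until k clusters remain.

```python
from itertools import combinations

# Hypothetical similarity scores in [0, 1] between five placeholder languages.
# These numbers are invented for illustration; they are not ASJP values.
LABELS = ["A", "B", "C", "D", "E"]
SIMILARITY = [
    [1.00, 0.80, 0.75, 0.20, 0.25],
    [0.80, 1.00, 0.70, 0.30, 0.15],
    [0.75, 0.70, 1.00, 0.10, 0.35],
    [0.20, 0.30, 0.10, 1.00, 0.85],
    [0.25, 0.15, 0.35, 0.85, 1.00],
]


def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b (lists of item indices)."""
    pairwise = [dist[i][j] for i in a for j in b]
    if method == "complete":
        # Complete linkage: the farthest pair decides the distance.
        return max(pairwise)
    if method == "average":
        # Mean (average) linkage: the mean over all cross-cluster pairs.
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown method: {method}")


def agglomerate(similarity, k, method="complete"):
    """Merge the two closest clusters repeatedly until only k remain."""
    n = len(similarity)
    # Convert similarity to distance (assumption: distance = 1 - similarity).
    dist = [[1.0 - similarity[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(
                clusters[p[0]], clusters[p[1]], dist, method
            ),
        )
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


def named(clusters):
    """Replace item indices with their placeholder labels."""
    return [[LABELS[i] for i in c] for c in clusters]


if __name__ == "__main__":
    for method in ("complete", "average"):
        print(method, named(agglomerate(SIMILARITY, 2, method)))
```

On this toy matrix both linkage criteria yield the same two groups, which loosely mirrors the idea of a cluster that stays stable across clustering methods. The k-means stability evaluation from the abstract is not covered by this sketch.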

Item Type: Conference or Workshop Item (Paper)
Subjects: T Technology > T Technology (General)
Divisions: > Teknik Informatika
Depositing User: Admin Adm PerpusUIR
Date Deposited: 31 Oct 2019 03:47
Last Modified: 31 Oct 2019 03:47
