Abstract:
Quantifying the semantic relatedness of two words is a fundamental sub-task in many natural language processing systems. While there is a large body of research on measuring semantic relatedness in English, the literature lacks a detailed analysis of these methods for agglutinative languages. In this research, two new evaluation resources for the Turkish language are constructed. An extensive set of experiments covering multiple tasks (word association, semantic categorization, and automatic WordNet relationship discovery) is performed to evaluate different semantic relatedness measures in Turkish. Because Turkish is an agglutinative language, the morphological processing component is important for distributional similarity algorithms. For languages with rich morphological variation and productivity, available methods range from simple stemming strategies to morphological disambiguation. In our experiments, different morphological processing methods for Turkish are investigated. © Springer International Publishing AG, part of Springer Nature 2018.