Calculation of textual similarity using semantic relatedness function

Kairaldeen, Ammar Riadh

Link: http://hdl.handle.net/20.500.12416/363

Date: 2015-12

Abstract:

Finding the similarity between two sentences is an essential task in fields such as natural language processing (NLP) and information retrieval (IR). Semantic similarity between two sentences measures the extent to which they share the same meaning. Over the last decade, different methods for measuring sentence similarity have been proposed in the literature. Some of these methods use a word semantic relatedness function in their sentence similarity calculations. This thesis aims to compare these methods using four data sets selected from different fields, providing a test bed with a wide range of writing styles to challenge the selected methods. The results show that sentence similarity methods using a corpus-based word semantic similarity function significantly outperform those using a WordNet-based word semantic similarity function. Moreover, we propose a new sentence similarity measure by extending an existing method in the literature called Overall similarity. The results show that the proposed method significantly improves on the performance of the Overall method. All the selected methods are tested and compared with other state-of-the-art methods.

Finding the similarities between two sentences is an important task in various fields such as Natural Language Processing (NLP) and Information Retrieval (IR). Semantic similarity is concerned with measuring how two sentences share the same meaning. Over the last 10 years, various sentence similarity measurement methods have been proposed in the literature. Some methods use a word semantic similarity function in sentence similarity calculations. This thesis aims to compare these methods using four data sets selected from different fields, providing a testable and varied range of writing expressions against which the selected methods can be compared. The results show that, among word similarity methods, the corpus-based word similarity function performs better than the WordNet-based word semantic similarity function. In addition, a new word similarity measurement method is proposed by extending the Overall Similarity method available in the literature. Furthermore, the results show that this proposed method improves the performance of the existing Overall Similarity method. All selected methods were thus tested and compared with other state-of-the-art methods.
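
The abstract mentions that some sentence similarity methods are built on a word semantic relatedness function. As a rough illustration of that general idea only, the sketch below averages each word's best match in the other sentence, in both directions. It is not the thesis's Overall similarity method or its proposed extension; the names `toy_relatedness`, `TOY_SCORES`, `directed_score` and `sentence_similarity` are hypothetical, and the scores in the lookup table are made up for the example. A real system would plug in a corpus-based or WordNet-based relatedness function instead.

```python
from typing import Callable, List

# Hypothetical, hand-written relatedness scores used only for illustration.
# A real system would replace this with a corpus-based or WordNet-based function.
TOY_SCORES = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("road", "street")): 0.8,
}


def toy_relatedness(w1: str, w2: str) -> float:
    """Return a relatedness score in [0, 1] for two words (assumed toy function)."""
    if w1 == w2:
        return 1.0
    return TOY_SCORES.get(frozenset((w1, w2)), 0.0)


def directed_score(src: List[str], tgt: List[str],
                   rel: Callable[[str, str], float]) -> float:
    """Average, over words in src, of the best relatedness to any word in tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(rel(w, t) for t in tgt) for w in src) / len(src)


def sentence_similarity(s1: str, s2: str,
                        rel: Callable[[str, str], float] = toy_relatedness) -> float:
    """Symmetric sentence similarity: mean of the two directed best-match scores."""
    t1 = s1.lower().split()  # naive whitespace tokenisation (an assumption)
    t2 = s2.lower().split()
    return 0.5 * (directed_score(t1, t2, rel) + directed_score(t2, t1, rel))


if __name__ == "__main__":
    print(sentence_similarity("the car", "the automobile"))
```

Taking the mean of the two directions keeps the score symmetric, so the order in which the sentences are passed does not matter. Swapping `rel` for a different word relatedness function is the kind of substitution the abstract compares (corpus-based versus WordNet-based).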
