Stance detection in Turkish dataset on Russia-Ukraine war

Fırat, Eray

dc.contributor.author	Fırat, Eray
dc.date.accessioned	2024-02-14T07:50:48Z
dc.date.available	2024-02-14T07:50:48Z
dc.date.issued	2023
dc.identifier.citation	Fırat, Eray (2023). Stance detection in Turkish dataset on Russia-Ukraine war / Rusya-Ukrayna savaşı hakkında Türkçe verisetinde duruş tespiti. Yayımlanmış yüksek lisans tezi. Ankara: Çankaya Üniversitesi, Fen Bilimleri Enstitüsü.	tr_TR
dc.identifier.uri	http://hdl.handle.net/20.500.12416/7212
dc.description.abstract	In recent years, social media has become a primary source of information for understanding public opinion on various topics. Consequently, automatic information extraction from social media data has grown in importance. Stance detection, one of the subtasks of natural language processing, is also an important topic for automatic information extraction. Stance detection automatically determines a user's attitude towards a particular topic, event, or person. In this study, a labelled Turkish data set focused on detecting social media users' stances on the Russia-Ukraine War was created, and various machine learning methods were tested on it. For this study, a new data set of 8215 text-target pairs was built from Turkish texts collected from Twitter and labelled for two targets, Russia and Ukraine. Support Vector Machines, Random Forest, k-Nearest Neighbours, XGBoost, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models were applied to this data set with GloVe and fastText word embeddings. Because the data set is imbalanced between the targets, undersampling and oversampling were also used with these algorithms. The best results were obtained with Support Vector Machines, with F1 scores of 0.73 for Russia and 0.81 for Ukraine. In addition, the results of the LSTM and GRU methods are quite close to those of the Support Vector Machines algorithm. This newly created Turkish data set can be regarded as a valuable resource for stance detection research, and transformer-based approaches can be used with it in future work. Overall, this study contributes to stance detection research using Turkish text.	tr_TR
dc.description.abstract	In recent years, social media has become a crucial source of information for understanding public opinion on various issues. The importance of automatically extracting information from social media data has therefore increased. Stance detection, a subtask of natural language processing, is an important part of automatic information extraction: it automatically determines a user's position towards a particular subject, event, or person. In this study, a labelled Turkish data set for detecting social media users' stances on the Russia-Ukraine war was created, and various machine learning methods were evaluated on it. For this study, 8215 tweets were collected from Twitter and cleaned. The data set was then labelled for two targets, Russia and Ukraine. Support Vector Machines, Random Forest, k-Nearest Neighbours, XGBoost, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models were applied with GloVe and fastText word embeddings. Because the data set is imbalanced between the targets, undersampling and oversampling were also used with these algorithms. Support Vector Machines produced the best results, with F1 scores of 0.73 for Russia and 0.81 for Ukraine. The LSTM and GRU results were very close to those of the Support Vector Machines algorithm. The newly created Turkish corpus can be regarded as a valuable resource for this research area, and transformer-based approaches can be applied to it in future work. Overall, this study contributes to stance detection research on Turkish text.	tr_TR
dc.language.iso	eng	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Stance Detection	tr_TR
dc.subject	Natural Language Processing	tr_TR
dc.subject	Turkish Dataset	tr_TR
dc.subject	Duruş Tespiti	tr_TR
dc.subject	Doğal Dil İşleme	tr_TR
dc.subject	Türkçe Veri Seti	tr_TR
dc.title	Stance detection in Turkish dataset on Russia-Ukraine war	tr_TR
dc.title.alternative	Rusya-Ukrayna savaşı hakkında Türkçe verisetinde duruş tespiti	tr_TR
dc.type	masterThesis	tr_TR
dc.identifier.startpage	1	tr_TR
dc.identifier.endpage	50	tr_TR
dc.contributor.department	Çankaya Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Bölümü	tr_TR
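
The abstract mentions oversampling to counter class imbalance and reports results as F1 scores. The following is a minimal, dependency-free sketch of those two ideas, not the thesis's actual implementation. The function names, the stance labels "FAVOR" and "AGAINST", and the toy inputs are all assumptions made for illustration; the record does not state which labels or which oversampling variant were used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items of smaller classes until every
    class has as many items as the largest class (random oversampling)."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_label.values())
    out_samples, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra items with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def f1_score(y_true, y_pred, positive):
    """F1 = harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three "FAVOR" items and one "AGAINST" item.
texts = ["t1", "t2", "t3", "t4"]
stances = ["FAVOR", "FAVOR", "FAVOR", "AGAINST"]
balanced_texts, balanced_stances = random_oversample(texts, stances)
class_counts = Counter(balanced_stances)

# Hypothetical predictions for a single target.
gold = ["FAVOR", "FAVOR", "AGAINST", "AGAINST"]
pred = ["FAVOR", "AGAINST", "AGAINST", "AGAINST"]
favor_f1 = f1_score(gold, pred, "FAVOR")
```

In practice, oversampling would be applied only to the training split so that duplicated items cannot leak into the evaluation data, and one F1 score would be computed per target (Russia, Ukraine), as in the reported results.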