Application of BiLSTM-CRF model with different embeddings for product name extraction in unstructured Turkish text

Arslan, Serdar

dc.contributor.author	Arslan, Serdar
dc.date.accessioned	2024-05-28T13:28:20Z
dc.date.available	2024-05-28T13:28:20Z
dc.date.issued	2024-05
dc.identifier.citation	Arslan, Serdar (2024). "Application of BiLSTM-CRF model with different embeddings for product name extraction in unstructured Turkish text", Neural Computing and Applications, Vol. 36, No. 15, pp. 8371-8382.	tr_TR
dc.identifier.issn	0941-0643
dc.identifier.uri	http://hdl.handle.net/20.500.12416/8424
dc.description.abstract	Named entity recognition (NER) plays a pivotal role in Natural Language Processing by identifying and classifying entities within textual data. While NER methodologies have seen significant advancements, driven by pretrained word embeddings and deep neural networks, the majority of these studies have focused on text with well-defined grammar and structure. A significant research gap exists concerning NER in informal or unstructured text, where traditional grammar rules and sentence structure are absent. This research addresses this gap by focusing on the detection of product names within unstructured Turkish text. To accomplish this, we propose a deep learning-based NER model which combines a Bidirectional Long Short-Term Memory (BiLSTM) architecture with a Conditional Random Field (CRF) layer, further enhanced by FastText embeddings. To evaluate and compare our model’s performance, we explore different embedding approaches, including Word2Vec and GloVe, in conjunction with the BiLSTM-CRF model. Furthermore, we conduct comparisons against BERT to assess the efficacy of our approach. Our experimentation utilizes a Turkish e-commerce dataset gathered from the internet, where traditional grammatical and structural rules may not apply. The BiLSTM-CRF model with FastText embeddings achieved an F1 score of 57.40%, a precision of 55.78%, and a recall of 59.12%. These results indicate promising performance, outperforming the other baseline techniques. This research contributes to the field of NER by addressing the unique challenges posed by unstructured Turkish text and opens avenues for improved entity recognition in informal language settings, with potential applications across various domains.	tr_TR
dc.language.iso	eng	tr_TR
dc.relation.isversionof	10.1007/s00521-024-09532-1	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	BERT	tr_TR
dc.subject	BiLSTM-CRF	tr_TR
dc.subject	Deep Learning	tr_TR
dc.subject	FastText	tr_TR
dc.subject	Named Entity Recognition	tr_TR
dc.title	Application of BiLSTM-CRF model with different embeddings for product name extraction in unstructured Turkish text	tr_TR
dc.type	article	tr_TR
dc.relation.journal	Neural Computing and Applications	tr_TR
dc.contributor.authorID	325411	tr_TR
dc.identifier.volume	36	tr_TR
dc.identifier.issue	15	tr_TR
dc.identifier.startpage	8371	tr_TR
dc.identifier.endpage	8382	tr_TR
dc.contributor.department	Çankaya University, Faculty of Engineering, Department of Computer Engineering	tr_TR
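
Note on the reported metrics: the abstract gives precision (55.78%), recall (59.12%), and F1 (57.40%) for the BiLSTM-CRF model with FastText embeddings. As a quick consistency check, the minimal sketch below recomputes F1 as the harmonic mean of the other two values. It assumes the standard F1 definition; the abstract does not say how the scores were averaged across entities or classes, and the function name `f1_score` is illustrative only, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
precision = 55.78
recall = 59.12

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 57.40%"
```

The recomputed value matches the F1 score stated in the abstract to two decimal places.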