Language Identification of Text

Aishwary; Mahajan, Ruhi [Guided by]

Please use this identifier to cite or link to this item: http://www.ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634

Full metadata record

DC Field	Value	Language
dc.contributor.author	Aishwary	-
dc.contributor.author	Mahajan, Ruhi [Guided by]	-
dc.date.accessioned	2022-09-24T07:10:23Z	-
dc.date.available	2022-09-24T07:10:23Z	-
dc.date.issued	2017	-
dc.identifier.uri	http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634	-
dc.description.abstract	Language identification is the process of detecting the language(s) of the text in a document, based on the script used for writing and on the diacritics particular to a language. The area has attracted researchers since as early as 1970 because of its varied applications and growing demand. In this work, I address the problem of detecting the language of textual documents. I introduce a method that detects the language of a text efficiently and accurately by estimating the proportion of each candidate language in the text and selecting the language with the greatest proportion. I compare the performance of three approaches: word-level n-grams, character-level n-grams, and a combination of word search and stop-word detection. The project currently contains language models for 4 languages. On average, the accuracy of the program is about 96.5%.	en_US
dc.language.iso	en	en_US
dc.publisher	Jaypee University of Information Technology, Solan, H.P.	en_US
dc.subject	Pluricentric languages	en_US
dc.title	Language Identification of Text	en_US
dc.type	Project Report	en_US
Appears in Collections:	B.Tech. Project Reports
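
The abstract describes picking the language with the greatest proportion, with stop-word detection as one of the compared approaches. A minimal sketch of that idea follows. It is not the report's implementation: the four languages, the tiny stop-word lists, and the function names language_proportions and detect_language are all assumptions made for illustration, since the record does not name the languages or the models used.

```python
import re

# Hypothetical, deliberately tiny stop-word lists. The record does not say
# which four languages the project covers; these are illustrative only.
STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "french": {"le", "la", "et", "est", "de", "les"},
    "german": {"der", "die", "und", "ist", "das", "nicht"},
    "spanish": {"el", "los", "y", "es", "de", "que"},
}


def language_proportions(text):
    """Return each language's share of all stop-word hits found in text."""
    words = re.findall(r"\w+", text.lower())
    hits = {
        lang: sum(1 for word in words if word in stop_words)
        for lang, stop_words in STOP_WORDS.items()
    }
    total = sum(hits.values())
    if total == 0:
        return {}  # no stop word from any list was found
    return {lang: count / total for lang, count in hits.items()}


def detect_language(text):
    """Return the language with the greatest proportion, or None if no hits."""
    proportions = language_proportions(text)
    if not proportions:
        return None
    return max(proportions, key=proportions.get)


if __name__ == "__main__":
    print(detect_language("the cat is in the garden and the dog is out"))
    print(detect_language("der Hund ist nicht in das Haus und die Katze"))
```

Returning the full proportion table rather than only the winner makes it possible to inspect how close the runner-up was, which matters for short or mixed-language inputs where a few shared words (such as "de" or "in" above) can blur the result.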

Files in This Item:

File	Description	Size	Format
Language Identification of Text.pdf		1.27 MB	Adobe PDF
