Please use this identifier to cite or link to this item: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634
Full metadata record
DC Field | Value | Language
dc.contributor.author | Aishwary | -
dc.contributor.author | Mahajan, Ruhi [Guided by] | -
dc.date.accessioned | 2022-09-24T07:10:23Z | -
dc.date.available | 2022-09-24T07:10:23Z | -
dc.date.issued | 2017 | -
dc.identifier.uri | http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634 | -
dc.description.abstract | Language identification refers to the process of detecting the language(s) of the text in a document, based on the script used for writing and on the diacritics particular to a language. This research area has interested researchers from as early as 1970 to the present, owing to its varied applications and growing demand. In this work, I address the problem of detecting the language of textual documents. I introduce a method that detects the language of a text more efficiently and accurately by determining the proportion of the text attributable to each candidate language and selecting the language with the greatest proportion. I compare the performance of three approaches: word-wise n-grams, character-wise n-grams, and a combination of word search and stop-word detection. My project currently contains language models for 4 languages. On average, the accuracy of my program is about 96.5%. | en_US
dc.language.iso | en | en_US
dc.publisher | Jaypee University of Information Technology, Solan, H.P. | en_US
dc.subject | Pluricentric languages | en_US
dc.title | Language Identification of Text | en_US
dc.type | Project Report | en_US
Appears in Collections:B.Tech. Project Reports
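
The abstract describes identifying a language by computing a proportion for each candidate language and taking the greatest. A minimal sketch of that idea for the stop-word approach is given here for illustration only. It is not the report's implementation: the four languages, the tiny stop-word lists, and the names STOP_WORDS, language_proportions, and identify_language are all assumptions, since this record does not name the languages or the word lists used.

```python
import re

# Hypothetical, deliberately tiny stop-word lists. The report's four
# languages and its actual word lists are not given in this record.
STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "french": {"le", "la", "les", "et", "est", "de", "un", "une", "dans"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "spanish": {"el", "los", "las", "y", "es", "que", "una", "en"},
}


def language_proportions(text):
    """Return, per language, the fraction of tokens found in its stop-word list."""
    # Lowercase the text and keep runs of letters (Unicode-aware) as tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return {lang: 0.0 for lang in STOP_WORDS}
    return {
        lang: sum(token in words for token in tokens) / len(tokens)
        for lang, words in STOP_WORDS.items()
    }


def identify_language(text):
    """Pick the language with the greatest proportion, or None if nothing matched."""
    proportions = language_proportions(text)
    best = max(proportions, key=proportions.get)
    return best if proportions[best] > 0 else None
```

For example, identify_language("The cat is in the house and it is happy") selects "english" under these assumed lists, because 7 of the 10 tokens are in the English list and none are in the others. A real system would need far larger word lists and a tie-breaking rule.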

Files in This Item:
File | Description | Size | Format
Language Identification of Text.pdf | | 1.27 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.