Please use this identifier to cite or link to this item: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6742
Title: N Gram Based Algorithm for Distinguishing Between Hindi and Sanskrit Texts
Authors: Kumar, Manoj
Agnihotri, Radha
Mohana, Rajni [Guided by]
Keywords: Sanskrit text
N-gram
Issue Date: 2017
Publisher: Jaypee University of Information Technology, Solan, H.P.
Abstract: Language identification (LI) is an essential and integral part of natural language processing, and several machine learning approaches have been proposed to address it. Language identification can be defined as the process of automatically determining the language(s) in which the content of a document (a web page, a text document) is written. Because of the widespread use of the internet, language identification has become a necessary pre-processing step for a variety of applications such as machine translation, linguistic corpus creation, part-of-speech tagging, making social media and other user-generated content accessible, search engines, supporting low-density languages, and information extraction, in addition to processing multilingual documents. In a multilingual country like India, language identification has wide scope to bridge the digital divide between users of different languages. This project presents a brief overview of the challenges involved in automatic language identification, as well as existing methodologies and some of the tools available for it. Text categorization is a fundamental task in document processing that allows the automated handling of large streams of documents in electronic form. It must work reliably on all inputs and must therefore tolerate problems of automatic identification to some extent. Here, we describe an N-gram-based approach to text categorization that is capable of distinguishing between Hindi and Sanskrit words. The system is small, fast, and robust. It has worked well for language classification, achieving an accuracy of 94.8%.
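The abstract does not spell out the algorithm. One common way to realize N-gram-based text categorization is to build a ranked profile of character N-grams for each language and assign a text to the language whose profile is closest under an "out-of-place" rank distance, as in Cavnar and Trenkle's N-gram text categorization. The Python sketch that follows assumes that scheme. The function names (`ngrams`, `profile`, `out_of_place`, `classify`), the 1- to 3-gram range, the 300-entry profile size, and the two tiny training samples are hypothetical illustrations, not details taken from the report.

```python
from collections import Counter

TOP_K = 300  # assumed profile size (hypothetical choice, not from the report)


def ngrams(word, n_max=3):
    """Yield the character 1- to n_max-grams of a word, padded with '_' to mark word boundaries."""
    padded = f"_{word}_"
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]


def profile(text, n_max=3, top_k=TOP_K):
    """Map each of the top_k most frequent n-grams in text to its rank (0 = most frequent)."""
    counts = Counter()
    for word in text.split():
        counts.update(ngrams(word, n_max))
    # Sort by descending count; break ties alphabetically so ranks are deterministic
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top_k]
    return {g: rank for rank, g in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty=TOP_K):
    """Sum of rank differences; n-grams absent from the language profile cost max_penalty."""
    return sum(
        abs(rank - lang_profile[g]) if g in lang_profile else max_penalty
        for g, rank in doc_profile.items()
    )


def classify(text, lang_profiles):
    """Return the language whose profile has the smallest out-of-place distance to the text."""
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Hypothetical toy training samples, NOT the project's corpus
HINDI_SAMPLE = "मैं घर जा रहा हूँ यह किताब बहुत अच्छी है हम कल बाजार गए थे"
SANSKRIT_SAMPLE = "अहं गृहं गच्छामि रामः वनं गच्छति बालकाः पाठं पठन्ति सः फलं खादति"

profiles = {"hindi": profile(HINDI_SAMPLE), "sanskrit": profile(SANSKRIT_SAMPLE)}

if __name__ == "__main__":
    # Classify an unseen sentence against the two toy profiles
    print(classify("सः गृहं गच्छति", profiles))
```

Padding with `_` lets word-initial and word-final n-grams (for example, verb endings) count as separate features. In practice each profile would be built from a much larger Hindi or Sanskrit corpus; with samples this small, the label for an unseen sentence is only illustrative, and this sketch makes no attempt to reproduce the reported 94.8% accuracy.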
Appears in Collections: B.Tech. Project Reports

Files in This Item:
File: N Gram Based Algorithm for Distinguishing Between Hindi and Sanskrit Texts.pdf
Size: 1.1 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.