Hash Tag Prediction

Thakur, Shruti; Singh, Sanjana [Guided by]

Please use this identifier to cite or link to this item: http://www.ir.juit.ac.in:8080/jspui/jspui/handle/123456789/7346

Title:	Hash Tag Prediction
Authors:	Thakur, Shruti; Singh, Sanjana [Guided by]
Keywords:	Prediction; Hash tag
Issue Date:	2016
Publisher:	Jaypee University of Information Technology, Solan, H.P.
Abstract:	Social media has grown quickly, both as one of the most popular activities on the internet and as a subject for researchers seeking better insight into the underlying sociology. Real-time micro-blogging sites such as Twitter use tags as an alternative to traditional forms of navigation and hypertext browsing. The tag systems of these sites are distinctive in that tags change so frequently that it is hard to identify the number of clusters, and therefore hard to classify effectively, when new tags can appear at any time. We take Twitter as the example for discussion. Twitter is a popular web application that allows users to classify their tweets with “hash tags”. It is called a micro-blog because people can post short, quasi-public messages of up to 140 characters. People create lists of others and are shown all of the posts of those people. The substantive nature of the social tie on Twitter is attention-based. In addition to paying attention to one another by “following,” Twitter users can address tweets to other users and can mention others obliquely in their tweets. Another common practice is “retweeting,” or rebroadcasting someone else’s message (with attribution) so as to direct attention toward that person’s tweets. Twitter differs from other online social networking services in that ties are asymmetric. Consider friendship ties on LinkedIn, Facebook, or MySpace: in these services, when two people share a friendship tie, the tie is symmetric, so A being friends with B implies that B is friends with A. This is not the case on Twitter: A can “follow” B, but B need not follow A. Popular people, such as basketball players or actors, can be followed by millions of people but can hardly pay attention to all of those who follow them.
Hashtags (single tokens often composed of natural language n-grams or abbreviations, prefixed with the character ‘#’) are ubiquitous on social networking services, particularly in short textual documents (a.k.a. posts). Authors use hashtags to diverse ends, many of which can be seen as labels for classical tasks: disambiguation (chips #futurism vs. chips #junkfood); identification of named entities (#sf49ers); sentiment (#dislike); and topic annotation (#yoga). The hash tag enables Twitter users to create searchable subject groups and so to navigate the hypertext structures of the whole site. The power of the hash tag is that it creates very specific sets of content. If you want to know what other people think of the Super Bowl that has just aired, it is easier to find out by searching for the hash tag than by searching for something similar in an ordinary search engine. Every day many new hash tags are formed, and this process can happen right before your eyes. The frequent creation of new tags makes the prediction of tags challenging. Hashtag prediction is the task of mapping text to its accompanying hashtags. It differs from ordinary text classification in that we do not know in advance how many clusters we need to find. In addition, the tag set changes so frequently that it is almost impossible to carry out classification or clustering effectively, since each new tag would force us to establish a new class and a new classification rule. Our intuition is that if we can measure the correlation between tweets as a mathematical metric, we can treat the collected tweets as points in a high-dimensional space and construct a network using the latent space model. We show that simple techniques are sufficient to extract key semantic content from tags and to filter out extraneous noise. We demonstrate the efficacy of this approach by comparing it with other classification functions and show that our model maintains a false positive rate lower than 15%.
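The idea of treating tweets as points in a high-dimensional space and comparing them with a metric can be illustrated with a minimal Python sketch. It is not the latent space model of the report: it assumes a plain bag-of-words representation, cosine similarity as the metric, and a nearest-neighbour rule for suggesting tags. The function names (extract_hashtags, bag_of_words, cosine_similarity, predict_hashtags) and the sample tweets are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def extract_hashtags(tweet):
    """Return the hashtags of a tweet, lowercased, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


def bag_of_words(tweet):
    """Map a tweet to a sparse term-count vector, leaving its hashtags out."""
    text = HASHTAG_RE.sub(" ", tweet.lower())
    return Counter(TOKEN_RE.findall(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_hashtags(new_tweet, labelled_tweets, k=1):
    """Suggest the hashtags carried by the k most similar labelled tweets."""
    query = bag_of_words(new_tweet)
    ranked = sorted(
        labelled_tweets,
        key=lambda tweet: cosine_similarity(query, bag_of_words(tweet)),
        reverse=True,
    )
    suggestions = []
    for tweet in ranked[:k]:
        for tag in extract_hashtags(tweet):
            if tag not in suggestions:
                suggestions.append(tag)
    return suggestions


# Hypothetical labelled tweets, echoing the "chips" disambiguation example.
corpus = [
    "salty chips and soda for lunch #junkfood",
    "new chips power faster phones #futurism",
]
print(predict_hashtags("which chips go in the next phones", corpus))
```

Because the suggestions come from whichever stored tweets are nearest, a newly coined tag becomes available as soon as one tweet carrying it is added to the collection, without defining a new class or retraining a classifier.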
URI:	http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/7346
Appears in Collections:	B.Tech. Project Reports

Files in This Item:

File	Description	Size	Format
Hash Tag Prediction.pdf		1.32 MB	Adobe PDF
