Apache Hadoop on Openstack Cloud

Sharma, Himanshi; Ghrera, Satya Prakash [Guided by]

Please use this identifier to cite or link to this item: http://www.ir.juit.ac.in:8080/jspui/jspui/handle/123456789/5847

Title:	Apache Hadoop on Openstack Cloud
Authors:	Sharma, Himanshi; Ghrera, Satya Prakash [Guided by]
Keywords:	Big data; Apache Hadoop; OpenStack cloud; Cluster; OpenStack Swift; Container
Issue Date:	2015
Publisher:	Jaypee University of Information Technology, Solan, H.P.
Abstract:	Big Data, at its core, is a way of describing problems that cannot be solved with traditional tools. It is defined by its three Vs: Volume, Velocity and Variety. Big Data coming from a wide variety of sources can help make transformational decisions, and to make these decisions on data of any scale it is important to have access to the right kind of tools. Apache Hadoop is a framework designed to store, process and manage such data across a cluster of computers. It is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and can scale without limits. With Hadoop, no data is too big. At its core, Apache Hadoop is a framework for storing data on large clusters of commodity hardware (everyday computer hardware that is affordable and easily available) and running applications against that data. A cluster is a group of interconnected computers (known as nodes) that can work together on the same problem. Using networks of affordable compute resources to acquire business insight is the key value proposition of Hadoop. However, a business is generally split across various dimensions, and each dimension may have its own Hadoop cluster. This means deploying separate storage for each cluster, even though some data may be duplicated across clusters. Moreover, if two dimensions need to be merged for some analysis, the combined data must be loaded into a new cluster, which is again a time-consuming and expensive task. This is where the OpenStack Swift container comes into the picture: Swift provides object storage, which allows files or objects to be stored.
Swift's architecture is built in such a way that it can be used directly with Hadoop clusters, providing a 'centralized' storage that any cluster can access directly and use to store data. Projects such as Pig and Hive that are used with Hadoop can also access the Swift storage directly to process the data. This removes the need to store redundant data and helps encourage innovation because of its low cost and low complexity. Both Apache Hadoop and OpenStack Swift are open source projects. Swift serves as
URI:	http://ir.juit.ac.in:8080/jspui//xmlui/handle/123456789/5847
Appears in Collections:	B.Tech. Project Reports

Files in This Item:

File	Description	Size	Format
Apache Hadoop on Openstack Cloud.pdf		1.71 MB	Adobe PDF
