Please use this identifier to cite or link to this item: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/9935
Title: Healthcare Data Pipeline
Authors: Dabral, Shivam
Mohana, Rajni [Guided by]
Keywords: Python programming
Apache Spark
Pseudocode
Healthcare
Issue Date: 2023
Publisher: Jaypee University of Information Technology, Solan, H.P.
Abstract: Big data processing is of interest to many companies around the globe as they try to harness the power of their data. Nference Labs Private Limited, for example, is trying to use healthcare data to provide people with better medical support. This project explores techniques that employ engines and frameworks to generate useful data from raw data effectively and efficiently. Several techniques were examined and compared on the basis of published research papers. The results suggested using Apache Spark as the computation engine. The data files were stored in Parquet format with Snappy compression so that the data occupies less space. The aim was an efficient data generation pipeline that can handle terabytes of data.
Description: Enrolment No. 191273
URI: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/9935
Appears in Collections: B.Tech. Project Reports

Files in This Item:
File: Healthcare Data Pipeline.pdf
Size: 1.64 MB
Format: Adobe PDF