of x

Processing of Real Time Big Data For Using High Availability of Hadoop NameNode PDF

Published on 5 days ago | Categories: Documents | Downloads: 0 | Comments: 0




 International Journal of Computer Systems (ISSN: 2394-1065), Volume 03 –  Issue  Issue 05, May, 2016  Available at http://www.ijcsonline.com/


Processing of Real Time Big Data for Using High Availability of Hadoop NameNode Prashant V. Dhakad, Krishnakant Kishore Department of Computer Science & Engineering, Jagadguru Dattatray College of Technology, Technology, Indore, Madhya Pradesh, India

 Absttract  Abs ract  Property of real-time a digital world day-to-day crank out substantial level of real-time info (mainly referred to the definition of “Big Data”), exactly where perception info features a probable importance in case accumulated as well as aggregated correctly. correctly. In today’s age, there's a whole lot included with real -time -time remote control realizing Big Files when compared with it appears initially, as well as removing the valuable info in the efficient fashion prospects something towards a serious Your computational troubles, like to evaluate, combination, as well as shop, exactly where info are usually remotely accumulated. Preserving because the aforementioned components, there's a requirement for building  something architectural mastery which embraces equ equally ally real time, and also real world iinfo nfo finalizing. Thus, within this  paper, many of us recommend real-time Big Files analytical architectural mastery pertaining to finalizing this kind of info environment.

K eywo yworr ds:  Big Data, Hadoop, HDFS, Cloud Computing, Data analysis.



Recently, a great deal of interest in neuro-scientific Large Facts as well as evaluation provides increased,  primarily powered through considerable number of exploration problems strappingly related to confide  programs, such as modeling, running, querying, exploration, and also circulating large-scale repositories. The phrase “Big Data” classifies specific types of files units composed formless files, which usually very well throughout files coating regarding technological  processing. Applications and also the World-wide-web the information located within the root coating coming from all these kind of technological processing software examples  possess some correct individuality in common, such as 1. Substantial range: files, which usually describes this dimension and also the files manufacturing facility. A couple of Scalability difficulties: which usually consider this application’s probably be jogging with substantial range (e. gary., Large Data). 3. Maintain extraction change for better running (ETL) approach: through low, fresh files to very well thought-out files approximately a number of magnitudes. 5. Advancement regarding straightforward interpretable: analytical over Large Facts warehouses that has a seem to deliver a sensible and also important expertise for them. Large Info are generally earned simply by on the web exchange, video/audio, video/audio, mail, along with quantity of mouse clicks, fire wood, threads, facebook and myspace information, scientific information, out of the way admittance sensory, mobile phones, along with the  purposes. These kinds of information are generally accumulated in directories in which develop immensely and be complicated to limit, form, store, deal with, reveal, method, evaluate, along with imagine through common database computer software equipment. Advancement in

Large Info sensing along with personal computer technology revolutionizes how out of the way information gathered, ready-made, studied, along with managed. These kinds of programs are available in various shapes via computer software and then analytical services in which run in third-party hosted natural environment. With out of the way admittance networks, in which the repository including receptors could, generate a good frustrating volume of uncooked information. All of us send the item to the first task, i. age., information order, by which high of the info are generally connected with simply no interest that could be TV or even squeezed simply by orders connected with specifications. Which has a view to employing this sort of filter systems; they just don't discard valuable details. In particular, in concern connected with  brand-new reviews, will it be adequate and keep in which details which is pointed out while using organization  brand? Additionally Additionally,, will it be important that we could need your entire report, or simply a little portion round the  pointed out brand? The second concern is usually automatically era connected with precise metadata in which identify the actual make up connected with information  plus the method it had been gathered along along with studied. In this challenge, many of us referred the actual excessive  pace ongoing steady stream connected with information or even excessive quantity real world information to Large Info that's primary you to some “new world " connected with challenges. This sort of consequences connected with change for better connected with remotely sensed information for the scientific knowing are a critical activity. For this reason the actual price of which level of the actual out of the way admittance information is usually growing, many individual end users along with agencies are actually strenuous a simple yet effective mechanism to recover, method, along with evaluate, along with store these kind of information and its particular means.

419 | International Journal of Computer Systems, ISSN-(2394-1065), Vol. 03, 03 , Issue 05, May, 2016


 Prasahnt V. Dhakad Dhakad et al


Processing of of Real Time Big Data for Using High Ava Availability ilability of Hadoop Hadoop NameNode


Your improve in the info rates generated around the electronic digital whole world will be escalating significantly.. Using a look at inside hiring current tools and significantly engineering to research and retailer, a massive amount of info are not up to the mark [2], since they are struggling to get essential small sample info models. Thus, we must  pattern a great industrial industrial system regarding study studying ing the two out of the way access real-time and traditional info. If a  business may pull-out every one of the practical data to be found in the Massive Data rather than small sample involving the info set, if that's the case, they have a great important profit within the industry competitors. Massive Data analytics assists people to achieve perception and make better selections. Thus, while using objectives involving employing Massive Data, improvements inside  paradigms are in highest. To guide our inspirations, we have identified several regions in which Massive Data may engage in a crucial purpose. Inside healthcare situations, medical practitioners collect significant amount of info regarding sufferers, history, prescription drugs, along with particulars. Your above-mentioned info are generally gathered inside drugmanufacturing businesses. The character of this info is extremely intricate, and often the professionals can't show some sort of connection together with different data, that ends in missing out on involving important information. Using a look at inside hiring enhance analytic process processes es for  planning and taking out practical data via Massive Data ends in customized drugs, the enhance Massive Data analytic strategies allow perception in to hereditarily factors that cause the disease. III.  PROBLEM DEFINITION Big Info evaluation is in some manner any complicated undertaking compared to finding, determining, being familiar with, in addition to citing facts [3]. Which has a large-scale facts, doing this should transpire in the mechanized way mainly because it involves varied facts structure together with semantics to get articulated with sorts of computer-readable formatting. Nevertheless, through considering simple facts obtaining just one fact fixed, anytoprocess necessary associated tips on how style becomes any databases. There can bewith choice strategies to keep all of the very same facts. Such disorders, your pointed out style probably have an advantage over some others without a doubt process in addition to probable drawbacks for most various other functions. So as to deal with most of these requirements, a variety of analytical tools are provided by relational directories sellers [4]. These kinds of tools can be found in a variety of shapes through software program simply to analytical services that manage with third-party published surroundings. Within rural entry sites, the location where the data  bank such as sensors may make a great overwhelming number of natural facts. All of us send the item for you to the first task, my partner and I age Facts exchange, where much of the results are associated with not any attention which can be blocked as well as squeezed through order  placed associated associated with magnitude. That has a check out for you to using like filters, they don't discard helpful facts. As an illustration, with concern associated with new accounts, is it adequate to maintain that a fact that’s pointed out

while using firm identify? Additionally, is it necessary that individuals could need the entire record, or simply just a small item throughout the pointed out identify? Your second concern is automatically era associated with appropriate metadata that summarize your composition associated with facts and also the means it turned out collected in addition to examined. Like type of metadata is tricky to investigate because all of us might need to recognize the source for every single facts with rural entry. Generally, the results collected through rural regions are certainly not in the formatting completely ready intended for evaluation. For that reason, your second move refers you for you to facts removal, which usually drags away your helpful facts from the actual places in addition to provides the item in the set up formation well suited for evaluation. As an illustration, the result fixed is decreased for you to single-class brand for you to aid evaluation, while the very first thing that individual applied to take into consideration Big Info as always expounding on the fact.  Nevertheless, this can be far through simple fact; sometimes all of us have to endure invalid facts too, as well as a lot of the facts could be imprecise. To deal with these requirements, this specific document  provides a real time period Big Info analytical architecture, architecture, which is used to analyze real time, together with offline facts. the results arefrom remotely preprocessed, which is afterInitially, that understandable the equipment. Later, this specific helpful fact is transported towards the foundation method intended for even more facts running. The beds  base Technique functions functions a pair of sorts of runnin running, g, such as running associated with real-time in addition to offline facts. In case there is your offline fact, the results are transported for you to offline data-storage product. This incorporation associated with offline data-storage product allows with in the future using the results, although your real-time facts is specifically transported towards the filtration in addition to heap balancer server, where by filtration criteria is required, which usually removes your helpful facts from the Big Info. However, the load balancer amounts your running power through equal submission from the real-time facts towards the hosts. This filtration in addition to load-balancing server not just filters in addition to amounts load, although it is usually used to facts improve the method the proficiency. Moreover, your blocked are after that refined from the parallel hosts and they are delivered to facts aggregation system (if necessary, they can keep your refined facts inside outcome safe-keeping device) intended for evaluation functions from the decision in addition to considering server. This proposed architecture welcomes welcomes rural entry facts together with direct access multilevel facts (e. h., GPRS, 3G, xDSL, as well as WAN). This proposed architecture and also the algorithms are executed with Hadoop using MapReduce development  by utilizing real real time fa facts cts sensing.

IV.  PROPOSED SOLUTION We have divided real time Big Data processing architecture into three parts, i.e., 1) data acquisition unit 2) data processing unit and 3) data analysis and decision unit. The functionalities and working of the said parts are described as below.

420 | International Journal of Computer Systems, ISSN-(2394-1065), Vol. 03, 03 , Issue 05, May, 2016


 Prasahnt V. Dhakad Dhakad et al

Processing of of Real Time Big Data for Using High Ava Availability ilability of Hadoop Hadoop NameNode

 A.   Data Acquisition Unit

C.   Data Analysis and Decision Unit.

Your need for parallel control with the significant variety of files had been essential, that could proficiently evaluate the Huge Data. Consequently, the suggested device can be released within the real time Huge Data control structures in which gathers the results via various readily available files accumulating device world wide. All of us assume which the files acquiring device can easily correct the incorrect files. Pertaining to powerful files analysis, the base Program preprocesses files below many conditions to help incorporate the results via diverse options, which in turn besides reduces storage cost, but in addition helps analysis accuracy. Some relational files  preprocessing methods are usually files integration, files cleaning, as well as redundancy removal. The information should be fixed in numerous ways to take out distortions triggered as a result of motion with the software. All of us separated the results control course of action straight into a  pair of actions, including real-time Huge Data control as well as not online Huge Data control. When it comes to not online files control, the base Program transfers the results towards the files center for storage. That files can be and then useful for foreseeable future studies. However, in realtime files control, the results are usually directly sent towards the purification as well as fill balancer server, considering that keeping of inward real-time files degrades the performance of real-time r eal-time control.  B.   Data Processing Unit Throughout data processing device, the particula particularr filters as well as weight balancer server include a couple of basic tasks, such as filter of data as well as weight evening out of  processing power. Filtering identifies the particular  beneficial data for investigation because it just permits  beneficial details, although other data are impeded and therefore are dumped. That's why; the item brings about improving the particular efficiency with the full suggested system. Unsurprisingly, the particular load-balancing  perhaps the servers afford the ability of splitting the complete television data straight into components as well as assign them to numerous processing servers. The filter as well as load-balancing protocol can vary from investigation for you to investigation; electronic. h., if you experience simply a requirement of investigation of marine trend as well as temperature data, the particular measurement of such defined data is usually television available, and it is segmented straight into components. Each processing server has their protocol execution for  processing incoming segment of data from weight  balancer. Each processing server helps make record data, almost any measurements, as well as functions various other mathematical or even plausible chores to get advanced effects versus just about every segment of data. Because these kinds of servers carry out chores separately as well as throughout parallel, the particular efficiency suggested system is usually significantly superior, and the effects versus just about every segment are produced instantly. The outcomes produced by means of just about every server are and then delivered to the particular aggregation server for compilation, firm, as well as stocking for more processing.

Figure 1: Flowchart of the real time Big Data Processing architectu architecture. re.

This unit contains three major portions, such as aggregation and compilation server, results storage server(s), and decision making server. When the results are ready for compilation, the processing servers in data  processing unit send the partial results to the aggregation and compilation server, since the aggregated results are not in organized and compiled form. Therefore, there is a need to aggregate the related results and organized them into a  proper form for further processing and to store them. them. In the  proposed architecture architecture,, aggregation and compilation server is supported by various algorithms that compile, organize, store, and transmit the results. Again, the algorithm varies from requirement to requirement and depends on the analysis needs. Aggregation server stores the compiled and organized results into the result’s storage with the intention that any server can use it as it can process at any time. The aggregation server also sends the same copy of that result to the decision-making server to process that result for making decision. The decision-making server is supported  by the decision algorithm, which inquire Aggregation server stores the compiled and organized results into the result’s storage with the intention that any server can use it as it can process at any time. The aggregation server also sends the same copy of that result to the decision-making server to process that result for making decision. The decision-making server is supported by the decision algorithm, which inquire different things from the result, and then make various decisions.

421 | International Journal of Computer Systems, ISSN-(2394-1065), Vol. 03, 03 , Issue 05, May, 2016


 Prasahnt V. Dhakad Dhakad et al

Processing of of Real Time Big Data for Using High Ava Availability ilability of Hadoop Hadoop NameNode

The decision algorithm must be strong and correct enough that efficiently produce results to discover hidden things and make decisions. The decision part of the architecturee is significant since any small error in decisionarchitectur making can degrade the efficiency of the whole analysis. The flowchart supporting the working of the proposed architecturee is depicted in Fig. 1. architectur

[7]  Feng Wang, Jie Qiu, Jie Yang, Bo Dong, Xinhui Li and Ying Li,“Hadoop High Availability through Metadata Replication” CloudDB'09 Proceedings of the first international workshop on Cloud data management. [8]  Ekpe Okorafor1 and Mensah Kwabena Patrick,”Availability of Jobtracker machine in hadoop/mapreduce zookeeper coordinated clusters”, Advanced Computing: An International Journal (ACIJ ),Vol.3, No.3, May 2012. [9]  Tarunpreet Chawla, Neeraj Mangalani, Tarannum Sheikh, “Map Reducing in Big Data Using Hadoop”, In International Journal of Computer Systems (ISSN: 2394-1065), Volume 02- Issue 05,

V.  Sr No

Hash tags (#)


No of Tweets

Positive tweets (%)

Negative tweets (%)


#narendramodi #narendram odi















#madhyapradesh #madhyaprade sh









 pages: 201-205, May, 2015. [10]  Dhruba Borthakur, “The Hadoop Distributed File System:Architecture and Design,” in Apache Software foundation,http:/hadoop.apache.o foundation,htt p:/hadoop.apache.org/common/do rg/common/docs/r0.18.0/hdfs cs/r0.18.0/hdfs  _design.pdf. [11]  Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo!, “The Hadoop Distributed File System,” IEEE  NASA storage conference, http:// storageconference.org/2010/Papers/M storageconference.o rg/2010/Papers/MSST/Shvachko SST/Shvachko.pdf. .pdf. [12]  Aaron Myers, “High Availability for the Hadoop Distributed File System (HDFS),” Article at Cloudera, March 07, 2012.  2012. 

VI.  CONCLUSION In this paper, we proposed architecture for real-time Big Data Processing. The proposed architecture efficiently  processed and analyzed real-time and offline Big Data for decision-making. The architecture of real-time Big is generic (application independent) that is used for any type of real time Big Data processing. Furthermore, the capabilitiess of filtering, dividing, and parallel processing of capabilitie only useful information are performed by discarding all other extra data. These processes make a better choice for real-time Big Data analysis. The technique proposed in this  paper for each unit and subunits are used to analyze real time data sets, which helps in better understanding of data. The proposed architecture welcomes researchers and organizations organizati ons for any ttype ype of real time Big Data analysis by developing algorithms for each level of the architecture depending on their analysis requireme requirement. nt.

R EFERENCES EFERENCES   [1]  Muhammad Mazhar Ullah Rathore, Anand Paul, Awais Ahmad, Bo-Wei Bo-Wei Chen, Bormin Huang, and Wen Ji, “Real -Time Big Data Analytical Architecture for Remote Sensing Application”, IEEE  IEEE   [2]  Wikibon Blog. (Oct. 14, 2014). [2310]. Big Data Statistics [Online]. Available: wikibon.org/b wikibon.org/blog/big-data-statistics/ log/big-data-statistics/ [3]  M. Clerc, “The Swarm and the Queen: Towards a Deterministic and Adaptive Particle Swarm Optimization,” In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), pp. 19511957, 1999. [4]  A. Labrinidis and H. V. Jagadish, “Challenges and opportunities with Big Data,” in Proc. 38th Int. Conf. Very Large Data Bases Endowment, Endowme nt, Istanbul, Turkey, Aug. 27 – 31, 31, 2012, vol. 5, no. 12, pp. 2032 – 2033. 2033. [5]  P. Chandarana and M. Vijayalakshmi, “Big Data analytics frameworks,” in  in  Proc. Int. Conf. Circuits Syst. Commun. Inf. Technol. Appl. (CSCITA), 2014, pp. 430 – 434. 434. [6]  Mohammad Asif Khan, Zulfiqar A. Memon and Sajid Khan “Highly Available Hadoop NameNode Architecture” in 2012 International Conference on Advanced Computer Science Applications and Technologies. Technologies.

422 | International Journal of Computer Systems, ISSN-(2394-1065), Vol. 03, 03 , Issue 05, May, 2016

Sponsor Documents


Forgot your password?

Or register your new account on INBA.INFO


Lost your password? Please enter your email address. You will receive a link to create a new password.

Back to log-in