Next Big Thing on Data: Big Data

Published June 2016

Manoranjan Kr. Singh
(Department of Mathematics, Magadh University, Bodh Gaya)
Deepak Mitra
(Department of Computer Applications, Gaya College Gaya, Bihar)
Introduction
We continue to create 2.5 quintillion bytes of data each day, and 90% of the data in the world today
has been created in the last two years alone. This data comes from many sources: sensors that gather
climate information, posts to social media sites, digital pictures and videos, purchase transaction
records, and cell phone GPS signals, to name a few. This data is Big Data[1].
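To put the daily figure in perspective, the sketch below converts 2.5 quintillion bytes into larger units. It assumes the short-scale quintillion (10^18) and decimal SI prefixes (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes); the helper name `to_si_units` is hypothetical, and the only input taken from the text is the 2.5 quintillion bytes per day.

```python
# A minimal sketch: express the daily data volume quoted in the text in
# decimal (SI) units. Assumes "quintillion" means 10**18 (short scale).
BYTES_PER_DAY = 2.5 * 10**18  # 2.5 quintillion bytes, from the text

# Decimal SI prefixes for bytes, largest first.
SI_UNITS = [
    ("ZB", 10**21),
    ("EB", 10**18),
    ("PB", 10**15),
    ("TB", 10**12),
    ("GB", 10**9),
]


def to_si_units(num_bytes: float) -> str:
    """Format num_bytes with the largest SI unit that keeps the value >= 1."""
    for name, size in SI_UNITS:
        if num_bytes >= size:
            return f"{num_bytes / size:.2f} {name}"
    return f"{num_bytes:.0f} B"


print("per day: ", to_si_units(BYTES_PER_DAY))        # 2.50 EB
print("per year:", to_si_units(BYTES_PER_DAY * 365))  # 912.50 EB, i.e. almost 1 ZB
```

At that rate, a single year adds close to a zettabyte of new data.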
The term big data describes a massive volume of both structured and unstructured data that is so
large it is difficult to process using traditional database and software techniques. In most enterprise
cases, the data is too big, moves too fast, or exceeds current processing capacity. Big data also refers
to the technology (tools and processes) that an organization requires to handle large amounts of data,
along with the associated storage facilities.
Big Data Technology [2]

Big data technology must support search, development, governance and analytics services for
all data types—from transaction and application data to machine and sensor data to social, image and
geospatial data, and more.
Systems
Infrastructure must capitalize on real-time information flowing through the organization. It must be
optimized for analytics to respond dynamically—with automated business processes, better agility and
improved economics—to the increasing demands of big data.
Privacy
To protect an organization’s reputation and brand, the platform must implement strict policies and
practices around privacy and data protection, safeguarding all of the data and insights on which the
business relies.
Governance
Governance controls how information is created, shared, cleansed, consolidated, protected, maintained,
retired and integrated within the enterprise.

Storage
To achieve economies and efficiencies, certain analytics must run close to the data while it is in
motion. For data that is stored, however, the infrastructure must embody a defensible disposal strategy
that reduces the run rate of storage, legal expense and risk.
Security
As analytics is infused into an organization, data security becomes more central. Infrastructure
must have strong security measures built in to guard the organization against internal and external threats.
Cloud
To relieve the pressure that big data is placing on IT infrastructure, big data and analytics
solutions can be hosted on the cloud to achieve the scalability, flexibility, expandability and economics
that will provide competitive advantage into the future.
Difference between Big Data and Open Data [3]

Big data and the newer phenomenon of open data are closely related, but they are not the same.
Open data brings a perspective that can make big data more useful, more democratic, and less
threatening.
While big data is defined by size, open data is defined by its use. Big data is the term used to
describe very large, complex, rapidly-changing datasets. But those judgments are subjective and
dependent on technology: today's big data may not seem so big in a few years when data analysis and
computing technology improve.
Open data is accessible public data that people, companies, and organizations can use to
launch new ventures, analyze patterns and trends, make data-driven decisions, and solve complex
problems. All definitions of open data include two basic features: the data must be publicly available for
anyone to use, and it must be licensed in a way that allows for its reuse. Open data should also be
relatively easy to use, although there are gradations of "openness". And there's general agreement that
open data should be available free of charge or at minimal cost.

Figure: The relationship between big data and open data

This Venn diagram maps the relationship between big data and open data, and how they relate
to the broad concept of open government.
Both big data and open data can transform business, government, and society – and a
combination of the two is especially potent. Big data gives us unprecedented power to understand,
analyse, and ultimately change the world we live in. Open data ensures that power will be shared – and
that the world we change will, with luck, become a fairer and more democratic one.
As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now-
mainstream definition of big data as the three V’s: volume, velocity and variety [4].

Volume: Many factors contribute to the increase in data volume. A typical PC might have had
10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day; a
Boeing 737 will generate 240 terabytes of flight data during a single flight across the US; smartphones
are proliferating, along with the data they create and consume; and sensors embedded into everyday
objects will soon result in billions of new, constantly updated data feeds containing environmental,
location, and other information, including video.
Velocity: Clickstreams and ad impressions capture user behavior at millions of events per
second; high-frequency stock trading algorithms reflect market changes within microseconds; machine-
to-machine processes exchange data between billions of devices; infrastructure and sensors generate
massive log data in real time; and online gaming systems support millions of concurrent users, each
producing multiple inputs per second.
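Coping with velocity usually means aggregating events as they arrive instead of storing everything first and querying later. The sketch below counts clickstream events in fixed, non-overlapping (tumbling) time windows while reading the stream one event at a time. The `(timestamp, event_type)` format, the one-second window and the function name `count_per_window` are illustrative assumptions; at millions of events per second a distributed stream processor such as Storm would do this work, not a single Python loop.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event format: (timestamp_in_seconds, event_type)
Event = Tuple[float, str]


def count_per_window(
    events: Iterable[Event], window_seconds: float = 1.0
) -> Dict[Tuple[float, str], int]:
    """Count events per (window_start, event_type) using tumbling windows.

    Events are consumed one at a time from any iterable (e.g. a generator
    reading a socket), so only the running counts are kept in memory,
    never the raw events themselves.
    """
    counts: Counter = Counter()
    for ts, kind in events:
        # Every timestamp falls into exactly one window [start, start + size).
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, kind)] += 1
    return dict(counts)


stream = [
    (0.10, "click"),
    (0.45, "impression"),
    (0.90, "click"),
    (1.20, "click"),
    (1.75, "impression"),
]
print(count_per_window(stream))
# {(0.0, 'click'): 2, (0.0, 'impression'): 1, (1.0, 'click'): 1, (1.0, 'impression'): 1}
```

Tumbling windows are the simplest choice; sliding windows give smoother rates at the cost of counting each event in several windows.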

Variety: Data today comes in all types of formats. Big data isn't just numbers, dates, and
strings; it is also geospatial data, 3D data, audio and video, and unstructured text, including log
files and social media. Traditional database systems were designed for smaller volumes of
structured data, fewer updates, and a predictable, consistent data structure. Big data databases, such as
MongoDB, solve these problems and provide companies with the means to create tremendous business
value.
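Document databases such as MongoDB store each record as a self-describing document, so records of different shapes can sit side by side without a fixed schema. The plain-Python sketch below imitates that idea with dictionaries. It does not use MongoDB's API; the records, field names and the helper `find` are hypothetical and serve only to show how heterogeneous data can still be queried together.

```python
from typing import Any, Dict, List

# Records of different shapes, as might arrive from different sources.
# All field names and values are hypothetical.
records: List[Dict[str, Any]] = [
    {"type": "purchase", "customer": "c1", "amount": 42.5, "date": "2016-06-01"},
    {"type": "gps", "device": "d7", "lat": 24.79, "lon": 85.0},
    {"type": "post", "customer": "c1", "text": "Great service!", "tags": ["support"]},
    {"type": "log", "level": "ERROR", "message": "disk full"},
]


def find(docs: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose fields equal all given criteria.

    A document that lacks a field simply does not match, so no schema
    has to be declared up front.
    """
    return [
        d for d in docs
        if all(k in d and d[k] == v for k, v in criteria.items())
    ]


# Everything known about customer c1, whatever shape each record has.
for doc in find(records, customer="c1"):
    print(doc["type"], sorted(doc))
```

A relational design would instead need either one wide table full of empty columns or a separate table per record shape.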
SAS considers two additional dimensions of big data:
Variability: In addition to the increasing velocities and varieties of data, data flows can
be highly inconsistent, with periodic peaks. Daily, seasonal and event-triggered peak data
loads can be challenging to manage, even more so when unstructured data is involved.
Complexity: Today's data comes from multiple sources, and it is still an undertaking to
link, match, cleanse and transform data across systems. However, it is necessary to connect
and correlate relationships, hierarchies and multiple data linkages, or your data can quickly
spiral out of control.
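Linking data across systems usually starts with cleansing and transforming the fields used for matching. The sketch below joins customer records from two hypothetical systems (a CRM and a billing system) on a normalized email address and converts text amounts into numbers. The data, field names and normalization rules are assumptions for illustration; real record linkage often needs fuzzy matching rather than exact keys.

```python
import re
from typing import Dict, List

# Two hypothetical source systems describing the same customers differently.
crm = [
    {"name": "Asha Kumari", "email": " Asha.Kumari@Example.com "},
    {"name": "Ravi Prasad", "email": "ravi@example.com"},
]
billing = [
    {"customer_email": "asha.kumari@example.com", "balance": "1,250.00"},
    {"customer_email": "RAVI@EXAMPLE.COM", "balance": "0"},
    {"customer_email": "unknown@example.com", "balance": "75.50"},
]


def clean_email(raw: str) -> str:
    """Cleanse: trim whitespace and lower-case so equal addresses compare equal."""
    return raw.strip().lower()


def clean_amount(raw: str) -> float:
    """Transform: drop thousands separators and spaces, then convert to a number."""
    return float(re.sub(r"[,\s]", "", raw))


def link(crm_rows: List[Dict], billing_rows: List[Dict]) -> List[Dict]:
    """Match billing rows to CRM rows on cleansed email.

    Unmatched billing rows are kept, with name None, so they stay visible
    instead of silently disappearing from the result.
    """
    by_email = {clean_email(r["email"]): r for r in crm_rows}
    linked = []
    for row in billing_rows:
        key = clean_email(row["customer_email"])
        person = by_email.get(key)
        linked.append({
            "email": key,
            "name": person["name"] if person else None,
            "balance": clean_amount(row["balance"]),
        })
    return linked


for rec in link(crm, billing):
    print(rec)
```

Keeping unmatched rows, rather than dropping them, is what stops unlinked data from quietly "spiraling out of control".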
Importance of Big Data [5]

Organizations will be able to take data from any source, harness relevant data and analyze it to
find answers that enable:
• cost reductions,
• time reductions,
• new product development and optimized offerings, and
• smarter business decision making.
Big Data for the Enterprise
With Big Data databases, enterprises can save money, grow revenue, and achieve
many other business objectives, in any vertical.
Build new applications: Big data might allow a company to collect billions of real-time data points on
its products, resources, or customers – and then repackage that data instantaneously to optimize
customer experience or resource utilization.
Improve the effectiveness and lower the cost of existing applications: Big data technologies can
replace highly-customized, expensive legacy systems with a standard solution that runs on commodity
hardware. And because many big data technologies are open source, they can be implemented far
more cheaply than proprietary technologies.

Realize new sources of competitive advantage: Big data can help businesses act more nimbly,
allowing them to adapt to changes faster than their competitors.
Increase customer loyalty: Increasing the amount of data shared within the organization – and the
speed with which it is updated – allows businesses and other organizations to more rapidly and
accurately respond to customer demand.
Big Data is Big Business for Commerce [6]

Three ways big data can benefit your business
Detect, prevent and remediate financial fraud
Across consumer and B2B industries, every day around the world, criminals are busily at work
trying to defraud companies through a constantly evolving portfolio of schemes and strategies.
As the volume and sophistication of these schemes increase, many organizations are turning
to powerful analytics to sift through massive data volumes and uncover hidden patterns, trends and
suspicious events that can indicate criminal fraud.
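One simple way to surface suspicious events is to flag values that lie far from an account's typical behavior. The sketch below uses the modified z-score (based on the median and the median absolute deviation, MAD), a standard robust outlier test chosen here purely for illustration; it is not the specific analytics the source refers to. The transaction amounts and the function name `flag_suspicious` are hypothetical, and production fraud systems combine many more signals and models.

```python
from statistics import median
from typing import List, Tuple


def flag_suspicious(amounts: List[float], threshold: float = 3.5) -> List[Tuple[int, float]]:
    """Return (index, amount) pairs whose modified z-score exceeds threshold.

    The median and MAD, unlike the mean and standard deviation, are not
    pulled toward the very outliers we are trying to find. The constant
    0.6745 scales the MAD to match a standard deviation for normally
    distributed data; 3.5 is a commonly used cut-off.
    """
    if not amounts:
        return []
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to compare against; this sketch flags nothing
    return [
        (i, a) for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical card transactions for one account.
history = [120.0, 95.0, 110.0, 130.0, 105.0, 98.0, 5000.0]
print(flag_suspicious(history))  # [(6, 5000.0)]
```

A plain mean/standard-deviation z-score would miss the 5000.0 here, because that single value inflates the standard deviation enough to hide itself.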
Calculate risk on a large portfolio of loans
An industry-wide failure to properly assess the latent risks lurking in thousands of substandard
loans led to billions of dollars of losses.
Execute high-value marketing campaigns
Companies face big data challenges in their marketing operations as well. One company, for example,
operates a sophisticated marketing operation, running campaigns to millions of targets. However, as its
data volumes grew and its campaigns began to target 10 million to 15 million recipients, it could not
physically process the data, which prevented it from maximizing customer lifetime value and executing
more efficient and effective cross-sell/up-sell campaigns.
Using high-performance analytics, the company has achieved tremendous gains in the
throughput of its database marketing – as much as 215 times faster – dramatically compressing the
model development life cycle and enabling its teams to test and validate additional variables for greater
reliability in their models.
Big Data and the Market in India [8]

IDC predicts that this year the Big Data market will reach $16.1 billion and grow six times faster than
the overall IT market. India is also moving at a phenomenal pace, with escalating numbers of consumers
moving online.

According to IDC, by 2020 the world is set to generate 50 times the amount of information and
75 times the number of information containers, while new information-taming technologies will aim to
drive down the cost of creating, capturing, managing, and storing information.
Abhijit Potnis, Director – Technology Services, EMC India & SAARC, highlights that the Digital
Universe in India alone is set to grow to 2.9 zettabytes by 2020 and that enterprises bear the liability
for 84 percent of this digital universe. This percentage gives a fair idea of how much enterprise storage
infrastructure across industry verticals will need continual overhaul to accommodate escalating
enterprise data requirements.
Big Data Tools [1]

Open source tools for big data can be divided into four arenas: data stores, development platforms,
development tools, and integration, analytics and reporting tools.
Data Stores
• Apache Hadoop – Cloud Foundry (VMware), Hortonworks, Hadapt
• NoSQL databases – MongoDB, Cassandra, HBase
• SQL databases – MySQL (Oracle), MariaDB, PostgreSQL, TokuDB
Development Platforms
• On Apache Hadoop – Impala (a massively parallel processing (MPP) query engine that runs
natively on Hadoop); Lingual (ANSI SQL); Pattern (analytics); Cascading (an application framework for
Java developers building data analytics and data management apps)
• On Apache Lucene and Solr – Search from LucidWorks and Elasticsearch
• OpenStack (open source software for building private and public clouds)
• Red Hat (a standard Linux distribution for Hadoop servers)
• REEF (Microsoft’s Hadoop development platform)
• Storm (integrates with any queuing system and any database system)
Development Tools
• Apache Mahout (a machine learning library)
• Python and R (programming languages for predictive analytics)
Integration, Analytics and Reporting Tools
• Jaspersoft (reporting and analytics server)
• Pentaho (data integration and business analytics)
• Splunk (platform for IT analytics)
• Talend (big data integration, data management and application integration)

References:
1. www.webopedia.com
2. http://www.ibm.com/big-data/us/en/technology/
3. http://www.theguardian.com
4. www.sas.com
5. http://www.mongodb.com/big-data-explained
6. http://www.ibm.com/big-data/us/en/big-data-and-analytics/
7. http://www.idgconnect.com/abstract/5680/indian-startups-big-data-analytics-ex-oracle-staff
8. Doug Levin, January 30, 2014. Posted in: Business, Community.
