
Available online at www.sciencedirect.com

Procedia Engineering 29 (2012) 133 – 137

www.elsevier.com/locate/procedia

2012 International Workshop on Information and Electronics Engineering (IWIEE)

Research on Cloud Data Storage Technology and Its
Architecture Implementation
Kun Liu (a), Long-jiang Dong (a,b)*

(a) College of Oriental Application & Technology, Beijing Union University, Beijing 102200, China
(b) Software Development Department, Adobe (Beijing) Corporation, Beijing 100085, China

Abstract
The concept of cloud computing has become increasingly popular in recent years, and data storage is an important and valuable research field within it. This paper first introduces the concepts of cloud computing and cloud storage as well as the architecture of cloud storage. It then analyzes the cloud data storage technologies GFS (Google File System) and HDFS (Hadoop Distributed File System) through concrete enterprise examples. Finally, it shows how to improve the traditional file storage method of the eyeOS web operating system, realizing distributed file storage and fault-tolerant control through the HDFS technology of Hadoop.

© 2011 Published by Elsevier Ltd. Open access under CC BY-NC-ND license.
Keywords: Cloud Computing; Cloud Storage; Web Operating System; Distributed File System

1. Introduction
In recent years, the concept of cloud computing has become increasingly popular. Cloud computing is a new business model developed from distributed processing, parallel processing and grid computing. At present, Google, Amazon, IBM, Microsoft, Sun and other IT giants are all seeking to develop cloud computing technologies and products. For example, Google has been dedicated to promoting application engines based on techniques such as GFS [1] (Google File System), MapReduce [2] and BigTable [3], which provide users with methods and means to process massive data. In this paper, we first introduce the concepts of cloud computing and cloud storage as well as the architecture of cloud storage, then analyze the cloud data storage technologies GFS and HDFS (Hadoop Distributed File System) under the specific cases of enterprises, and finally build the cloud storage architecture through the eyeOS web operating system on our own computers.

* Corresponding author. Tel.: +86-135-8168-3162. E-mail address: [email protected].

1877-7058 © 2011 Published by Elsevier Ltd. Open access under CC BY-NC-ND license.
doi:10.1016/j.proeng.2011.12.682
2. Cloud computing and cloud storage
2.1. Cloud computing definition
Cloud computing arises from the combination of traditional computer technology and network technology, such as grid computing, distributed computing, parallel computing, utility computing and virtualization. One of the core concepts of cloud computing is to reduce the processing burden on users' terminals by continuously enhancing the cloud's handling capacity, until the terminals are simplified into simple input and output devices. Users can then exploit the powerful computing and processing functions of the cloud and order services from it according to their own needs.
2.2. Cloud storage definition and its architecture
Cloud storage is a system that provides functions such as data storage and business access. Through application software built on cluster applications, grid techniques, distributed file systems, etc., it assembles a large number of different types of storage devices. Cloud storage can be simply understood as the storage part of cloud computing, and can also be considered a cloud computing system equipped with large-capacity storage. A cloud storage system architecture mainly includes a storage layer, a basic management layer, an application interface layer and an access layer.
3. Cloud storage technology of enterprises
3.1. GFS [1]
1) System Architecture
A GFS cluster consists of a single master, multiple chunkservers and multiple clients, as shown in Figure 1(a). Each of these is typically a commodity Linux machine [1].
• GFS Master: the master manages all file system metadata and the file directory structure. GFS uses a single-master policy, meaning that at any given time only one master provides services; this avoids the extra cost of synchronously coordinating multiple masters. A client interacts with the master only for metadata, and interacts with the chunkservers directly for all other data.
• Chunkserver: GFS files are divided into fixed-size chunks stored on the chunkservers, and the default chunk size is 64 MB. Each chunk is identified by an immutable and globally unique 64-bit chunk handle assigned by the master when the chunk is created. Each chunk is replicated on three chunkservers by default, and users can set different replication levels for different regions of the file namespace. As shown in Figure 1(a), there are four chunkservers and five chunks, C0-C4, and each chunk is saved on three chunkservers.
• Client: GFS client code linked into each application implements the file system API and communicates with the master and chunkservers; it contacts the master for metadata operations, but all data-bearing communication goes directly to the chunkservers [1].
2) Workflow
As shown in Figure 1(a), thin solid lines represent the control information between the clients and the master or between the master and the chunkservers, thick solid lines represent the data communication between the chunkservers and the clients, and dashed lines indicate the control information between the clients and the chunkservers.

Kun Liu Author
and Long-jiang
Dong / Procedia
Engineering
(2012) 133 – 137
name / Procedia
Engineering
00 (2011)29000–000

First, the client computes the chunk index from the file structure and the chunk size, then sends the file name and chunk index to the master (mark ①). Second, the master sends the chunk handle and chunk locations to the client (mark ②). Third, the client sends the chunk handle and byte range to the nearest chunkserver (mark ⑤). Finally, the chunkserver sends the data to the client (mark ⑥). Once the client gets the chunk locations from the master, it no longer interacts with the master. The master does not permanently save the mapping from chunkservers to chunks; instead, it asks each chunkserver about its chunks at master startup or whenever a chunkserver joins the cluster (③④). The master also periodically communicates with each chunkserver through HeartBeat messages to give it instructions and collect its state (③④).
[Figure 1 appears here. (a) GFS architecture: clients, a single master and chunkservers 1 to N, with chunks C0-C4 each replicated across three chunkservers; marks ①-⑥ label the control and data messages described above. (b) System architecture: WebOS (eyeOS) clients, the master (NameNode) and Hadoop DataNodes.]

Fig. 1. (a) GFS Architecture; (b) System Architecture
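
The numbered read path above can be made concrete with a short sketch. GFS is proprietary and has no public API, so the types below (MasterStub, ChunkserverStub, ChunkLocation) are hypothetical stand-ins for its RPC layer; only the 64 MB chunk size and the separation of control and data flow are taken from the description above.

```java
// Illustrative model of the GFS read protocol -- not a real GFS client.
public class GfsReadSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // default 64 MB chunk size

    // Step 1: the client derives the chunk index from the byte offset.
    static long chunkIndex(long byteOffset) {
        return byteOffset / CHUNK_SIZE;
    }

    static byte[] read(MasterStub master, String fileName, long offset, int length) {
        long index = chunkIndex(offset);                    // mark 1: file name + chunk index
        ChunkLocation loc = master.lookup(fileName, index); // mark 2: chunk handle + locations
        ChunkserverStub nearest = loc.nearestChunkserver(); // client picks the closest replica
        long withinChunk = offset % CHUNK_SIZE;             // byte range inside the chunk
        return nearest.readChunk(loc.chunkHandle(), withinChunk, length); // marks 5 and 6
    }

    // Hypothetical stand-ins for the RPC layer.
    interface MasterStub { ChunkLocation lookup(String fileName, long chunkIndex); }
    interface ChunkserverStub { byte[] readChunk(long chunkHandle, long offset, int length); }
    interface ChunkLocation { long chunkHandle(); ChunkserverStub nearestChunkserver(); }
}
```

Note that, as in the text, the master appears only in the metadata lookup; all file bytes travel directly between the client and a chunkserver.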

3.2. HDFS
Hadoop is hosted by the Apache Software Foundation, which provides support for a community of open source software projects. Although Hadoop is best known for MapReduce and its distributed file system (HDFS), the other subprojects provide complementary services or build on the core to add higher-level abstractions. For details, refer to document [4].
The full name of HDFS [5] is Hadoop Distributed File System. HDFS runs on large clusters of commodity hardware and resembles Google's GFS. The architecture of HDFS is master/slave: an HDFS cluster has one namenode and multiple datanodes. The namenode is the central server, equivalent to the master in GFS, and is responsible for the namespace operations of the file system. A datanode is similar to a chunkserver in GFS and is responsible for managing its local storage: creating, deleting and replicating blocks. Files in HDFS are divided into one or more blocks, which are stored on the datanodes. The namenode and datanodes can run on low-cost Linux computers. HDFS is developed in the Java language.
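
To make the master/slave division concrete, below is a minimal sketch of reading a file through the HDFS Java API. The org.apache.hadoop.fs classes are the real ones; the namenode address hdfs://namenode:9000 and the file path are placeholder assumptions, and Hadoop releases contemporary with this paper use the configuration key fs.default.name rather than the later fs.defaultFS.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // namenode = central server (assumed address)
        try (FileSystem fs = FileSystem.get(conf)) {
            try (FSDataInputStream in = fs.open(new Path("/user/demo/FileX"))) {
                // The namenode resolves the file into blocks; the bytes
                // themselves stream from the datanodes that hold them.
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) > 0) {
                    System.out.write(buf, 0, n);
                }
            }
        }
    }
}
```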
4. Cloud storage architecture based on Hadoop
4.1. EyeOS
EyeOS is a web desktop environment with office software and personal information management systems; it enables online storage and mobile office work. Document management in eyeOS simply stores files on a single server, without a fault-tolerant backup feature, so reliability is poor. File access is
single-threaded, so access performance is low. In this paper, we improve the traditional file storage method and achieve distributed file storage as well as fault-tolerant control using HDFS technology.
4.2. System implementation
1) Architecture
The storage system we designed is shown in Figure 1(b); it includes the clients, the web operating system eyeOS, the cloud server (NameNode) and the cloud storage center (DataNodes). A hypothetical sketch of this division of labour follows the list below.
• Clients: each client is pre-installed only with a web browser, through which users log in to the cloud storage. The clients are the interface between users and the cloud storage system.
• Web operating system: the web operating system receives users' access requests, verifies the users' validity, and interacts directly with the clients. It is based on eyeOS, which offers a large number of applications; users can download the applications they need and build a personalized system. EyeOS is also the users' file access interface, through which files are saved into the cloud storage cluster.
• Cloud server (Cloud NameNode): the cloud storage cluster based on Hadoop includes the cloud server (NameNode) and the cloud storage center (DataNodes). The cloud server is the namenode in Hadoop: it manages the file system namespace, computes the mapping from files to datanodes, allocates datanodes to save file blocks, and controls external clients' access.
• Cloud storage center (Cloud DataNode): the cloud storage center consists of the datanodes in Hadoop. It is in charge of saving files, realizing distributed file storage, and ensuring load balancing, file fault tolerance, etc.
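
The paper does not publish the bridge between eyeOS and Hadoop, so the following sketch is only a hypothetical illustration of the division of labour described above: the web OS layer verifies the user and then delegates file operations to the cluster through a FileSystem handle pointed at the cloud server (NameNode). The per-user directory layout is an assumption of this sketch.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CloudStorageGateway {
    private final FileSystem fs; // handle to the Hadoop cloud storage cluster

    public CloudStorageGateway(String namenodeUri) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", namenodeUri); // cloud server (NameNode) address
        this.fs = FileSystem.get(conf);
    }

    // Called after the web operating system has verified the user's validity;
    // this is a metadata query answered by the namenode, no datanode is contacted.
    public FileStatus[] listUserFiles(String user) throws IOException {
        return fs.listStatus(new Path("/user/" + user)); // assumed per-user layout
    }
}
```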
2) Operation Process
Users' operations based on eyeOS are writing files and reading files. To read a file, the system downloads it to the local computer and then handles or displays it with the application software in the web operating system. When files are modified and saved, the web operating system uploads them from the local computer back to the cloud storage system. Both operations are sketched after the list below.
• Reading files: ① A user logs in to the web OS through the client's browser and double-clicks a file icon; eyeOS then requests the file from the Hadoop namenode. ② The namenode finds the file's metadata and computes the file's location, and the datanodes holding the file's blocks send them to the client. ③ The client downloads the file blocks from the datanodes and merges them into a file. ④ The application associated with the file in the web operating system starts automatically and displays the file.
• Writing files: ① A user logged in to the web OS through the client's browser modifies and saves files with the selected application, and eyeOS requests the upload from the Hadoop namenode. ② After receiving the upload request, the namenode allocates storage space on datanodes according to the file size and the datanodes' storage conditions. ③ The client uploads the file, which is divided into one or more blocks and saved on the allocated datanodes.
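
The two operations can be sketched as follows, under the assumption that eyeOS stages files on the web server's local disk: reading copies the file out of HDFS to the local machine, and writing copies the saved file back. copyToLocalFile and copyFromLocalFile are real HDFS API calls; the /eyeos/files path layout is hypothetical, and fs is obtained as in the earlier example.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EyeOsFileOps {
    // Reading: fetch the file's blocks from the datanodes and merge them
    // into a local copy that the associated application can display.
    static void openFile(FileSystem fs, String name) throws IOException {
        fs.copyToLocalFile(new Path("/eyeos/files/" + name), // assumed layout
                           new Path("/tmp/" + name));
    }

    // Writing: upload the modified file; it is split into blocks and the
    // replicas are placed on the datanodes allocated by the namenode.
    static void saveFile(FileSystem fs, String name) throws IOException {
        fs.copyFromLocalFile(new Path("/tmp/" + name),
                             new Path("/eyeos/files/" + name));
    }
}
```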
5. Experiments
These experiments use five computers: three serve as the client, eyeOS and the namenode respectively, and the other two serve as datanodes, denoted Da and Db. Da and Db hold files named FileX and FileY.
The first experiments are done while Da and Db are both normal. As shown in Table 1, when File1 is created, it is saved on Da and Db at the same time; when FileY is deleted, it is deleted from both Da and Db. The remaining rows of Table 1 show that data on an invalid datanode is updated automatically once the datanode recovers, so the data on the datanodes is always the latest.


We also run the experiments when Db stays normal but Da is invalid; the results are shown in the "Da (invalid)" row of Table 1. Da then recovers, and the results are shown in the "Da (recovers after invalid)" row of Table 1.
• Creating files: when File1 is created, it can be found on Db but not on Da. If Da recovers at this moment, File1 can then also be found on Da.
• Deleting files: when FileY is deleted, it can no longer be found on Db but can still be found on Da. If Da recovers now, FileY is deleted from Da immediately.
Table 1. Experiment results

Datanode state                           | Creating File1                                           | Deleting FileY
Da (normal), Db (normal)                 | File1 can be found on Da and Db.                         | FileY is deleted from Da and Db.
Da (invalid), Db (normal)                | File1 is saved on Db, but is not saved on Da.            | FileY is deleted from Db, but is not deleted from Da.
Da (recovers after invalid), Db (normal) | File1 is saved as a duplicate file on Da automatically.  | FileY is deleted from Da.
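
The replication behaviour recorded in Table 1 can also be inspected programmatically: for every block of a file, the namenode reports which datanodes currently hold a replica. The sketch below is illustrative; the namenode address and file path are assumptions.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicaCheck {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed address
        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus st = fs.getFileStatus(new Path("/eyeos/files/File1"));
            BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
            for (BlockLocation b : blocks) {
                // With Da and Db both healthy, each block should list two hosts.
                System.out.println(Arrays.toString(b.getHosts()));
            }
        }
    }
}
```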

6. Conclusions
Cloud computing is an inevitable product of the development of the internet, and it also brings richer applications to the internet. Cloud data storage technology is the core area of cloud computing and determines how data is stored in the cloud environment. In this paper, we introduced the related concepts of cloud computing and cloud storage, and then proposed a cloud storage architecture based on the eyeOS web operating system, built on our own computers. Experiments verified that the system works well.
Acknowledgements
This work is supported by a grant from the Research Program of Beijing Union University (No. ZK2009606).
References
[1] Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. The Google file system [C]. Proceedings of the 19th ACM Symposium on Operating Systems Principles. New York: ACM Press, 2003: 29-43.
[2] Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters [C]. Proceedings of the 6th Symposium on Operating System Design and Implementation. New York: ACM Press, 2004: 137-150.
[3] Fay Chang, Jeffrey Dean, et al. Bigtable: A distributed storage system for structured data [J]. ACM Transactions on Computer Systems, 2008, 26(2): 1-26.
[4] Tom White. Hadoop: The Definitive Guide [M]. United States of America: O'Reilly Media, Inc., 2009.
[5] Dhruba Borthakur. The Hadoop Distributed File System: Architecture and Design [EB/OL]. (2008-09-02) [2010-08-25]. http://hadoop.apache.org/common/docs/r0.16.0/hdfs_design.html.
[6] HBase Development Team. HBase: Bigtable-like structured storage for Hadoop HDFS [EB/OL]. (2010-08-10) [2010-08-25]. http://wiki.apache.org/hadoop/Hbase.
[7] Mike Burrows. The Chubby lock service for loosely-coupled distributed systems [C]. Proceedings of the 7th Symposium on Operating Systems Design and Implementation, 2006.

