SecCSIE: A Secure Cloud Storage Integrator for
Enterprises
Ronny Seiger
T-Systems Multimedia Solutions Dresden and
Dresden University of Technology
01062 Dresden, Germany
Stephan Groß and Alexander Schill
Faculty of Computer Science
Dresden University of Technology
01062 Dresden, Germany
{Stephan.Gross, Alexander.Schill}@tu-dresden.de
Abstract—Cloud computing services eliminate the need for
local storage, thereby lowering operational and maintenance
costs. However, security and privacy concerns regarding the
out-sourced data prevail. Especially in enterprise environments,
sensitive internal and customer data accumulate, which are
usually subject to strong legal regulations. Therefore, all files
and information need to be protected when leaving a company’s
intranet. In this work, we describe a work in progress and
propose a flexible system architecture for integrating various
types of cloud storage providers into an employee’s desktop
computer without giving up data security. The system is centered
around a proxy server which will apply encryption and information
dispersion to all out-sourced files before they leave the
internal network. This architecture turns out to be very versatile
and provides high levels of data confidentiality, integrity, and
availability.
I. INTRODUCTION
Recent developments in cloud service technology have
shown that cloud computing is much more than just hype.
An increasing number of enterprises are moving parts of
their businesses “into the cloud” with the goal of increasing
revenue, lowering operational costs, and improving the quality
of their services. One major concern, though, lies in the
outsourcing of in-house and customer data to external cloud
storage providers because this usually means a loss of control
over data security and privacy. To address this problem,
many online storage services, such as Dropbox, promise to
encrypt their clients’ data and store it at heavily secured locations.
Nevertheless, recurring news reports (e.g., [1], [2]) about
security breaches, unauthorized access to private files, and
other forms of data leakage have undermined the trust in cloud
service providers. Therefore, we propose a system architecture
that allows the company-wide integration of external cloud
storage resources, requiring only a minimum level of trust but
guaranteeing high confidentiality, integrity, and availability of
data. In addition, we leverage the heterogeneity of the current
cloud storage market and decrease the probability of getting
locked in with a specific storage vendor, as is mostly the
case nowadays.
This paper describes a joint research project of the
FlexCloud research group¹ at Dresden University of
Technology and the T-Systems Multimedia Solutions GmbH².
The current status presented is a work in progress.
¹ http://www.flexcloud.eu/
The main objective of this work is to seamlessly extend
internal enterprise IT resources by highly scalable, easily
accessible, and durable external storage services as they are
widely offered by current cloud computing providers. The
focus will be put on achieving high security properties as well
as good usability and extensibility.
The rest of this paper is structured as follows: Section II
briefly discusses some of the terms, technologies, and
algorithms applied in the system proposed in Section
III, as well as the main issues and scope of this research
work. Section III presents a general overview of the system’s
architecture, followed by specific implementation details and
a short evaluation of its security properties. Section IV introduces
related work for further reading, including papers on
theoretical foundations and practical implementations. Section
V briefly discusses open questions concerning theoretical and
technical aspects as well as future work. Section VI concludes
this paper.
II. BASICS
A. Cloud Computing
Despite a large variety of definitions, the term “Cloud Computing”
usually comprises a distributed system architecture
featuring virtualized and dynamically scalable resources, e.g.,
computing power, storage, platforms, and services, which are
delivered on demand to external customers over the Internet
[3]. Regarding the services offered to the clients, the trend is
clearly towards the “Everything as a Service” model, i.e., the
three standard infrastructure, platform, and software service
categories [3] are extended to more fine-grained provisioning
models such as “Database as a Service”, “Security as a
Service”, “Storage as a Service”, etc.
Two major cloud deployment models can be found nowadays.
On the one hand, there are public clouds which allow
paying customers to access their services via common Internet
protocols, web applications, or application programming interfaces
(APIs). Private clouds, on the other hand, offer services
only to a limited number of clients by restricting the access
methods, e.g., only within a company’s intranet. SecCSIE
tries to merge both models into what is known as a hybrid
cloud. Public storage services are combined with enterprise-wide
storage space. The resulting resources are made available
to all clients connected to the corresponding intranet (including
VPN users).
² http://www.t-systems-mms.com
2011 IEEE Conference on Commerce and Enterprise Computing
978-0-7695-4535-6/11 $26.00 © 2011 IEEE
DOI 10.1109/CEC.2011.45
B. Cloud Storage
Cloud storage services provide virtual online disk space that
can be used like a normal hard drive for storing all types of data.
Access to these external resources is usually provided by (i)
standard network/file transfer protocols, (ii) proprietary APIs,
or (iii) vendor-specific client software. Normally, cloud disk
space offered to customers is drawn from a vast pool of virtualized
hard drives which are redundantly distributed across
several data centers. Due to the heavy use of virtualization
in the field of cloud computing/storage, it may happen that a
coherent file, when stored in the cloud, will be scattered across
multiple hard drives in multiple global storage locations. The
user should nevertheless experience this cloud storage procedure
as if it were done locally on his/her client computer, without any
notable latencies or additional user interaction requirements.
C. Current Issues with Cloud Storage
Off-site data storage raises several security and privacy
concerns. Due to the virtualized nature of the provided disk
space, files are distributed among a cluster of machines often
spanning national boundaries. Therefore, it is not always
possible to say under which jurisdiction and data protection
laws the out-sourced information falls, and who will consequently
be able to access it. Particularly sensitive customer
and personal data need to be protected and are subject to
heavy constraints that cannot always be met by current
cloud storage solutions. Thus, cloud computing still raises
many open questions concerning compliance with privacy and
security laws. A recent media report [4] shows, for example,
that cloud data access even across transcontinental boundaries
can be enforced by governments and other federal institutions
belonging to a completely different jurisdiction.
Although a lot of cloud storage providers employ encryption
algorithms for customer data nowadays, they usually do the
key management themselves. This is the most convenient
way of providing easy data access for their customers from
everywhere and also of allowing them to share their files
with others. As a consequence, users have no influence on
the file encryption process and lose control over who may
have access to their data. Customers, therefore, have to put a
high level of trust in the cloud storage supplier. Usually, this is
not compliant with national regulation policies for enterprises
handling sensitive data. The aforementioned media reports
about unsafe encryption methods and data leakage at well-known
storage providers have confirmed that the problem of
secure off-site (cloud) data storage is still an important, yet
not completely solved issue.
D. Information Dispersal Algorithms
Information dispersal algorithms (IDAs) enhance both the
confidentiality and the availability of data without requiring
much additional storage space. They go back to an idea of
Adi Shamir published in 1979 [5]. He proposed a scheme
to share a secret among several entities who will have to
co-operate in order to reconstruct the secret’s content. The
information is split into n parts and distributed across several
locations. To reassemble the message, a previously defined
threshold number m (m ≤ n) of data fragments needs to be
available. Possessing fewer than m data slices yields no
information about the secret. Michael O. Rabin picked up this
idea in 1989 and presented an initial scheme for efficient
and secure distributed storage of data, called “Information
Dispersal Algorithm” [6].
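The threshold idea can be illustrated with a minimal sketch of Shamir's (m, n) scheme over a prime field. The prime modulus, the function names, and the restriction to integer secrets are assumptions made for this illustration; they are not taken from the SecCSIE prototype.

```python
from __future__ import annotations

import secrets

# Assumed field modulus: a Mersenne prime larger than any secret shared here.
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, m: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any m of them reconstruct it."""
    if not (1 <= m <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= m <= n and 0 <= secret < PRIME")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With n = 5 and m = 3, any three of the five shares passed to `recover_secret` return the original value, while two shares alone determine nothing about it. Note that each share here is as large as the secret itself; Rabin's IDA trades the information-theoretic secrecy of this scheme for much smaller fragments.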
In our system architecture we will be using current developments
of IDAs based on erasure coding to split files
into multiple data slices which will be redundantly stored on
several storage nodes. By employing these techniques, we will
see a large gain in availability because only a subset of data
fragments is necessary to reconstruct the original information.
Compared to full data replication, this approach requires only
minimal storage overhead.
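To make the overhead argument concrete, the following toy sketch uses a single XOR parity slice (n = 3, m = 2). It is a deliberately simple stand-in for the Reed-Solomon style codes such a system would use, and all function names are hypothetical. It provides availability only: the two data slices remain plaintext halves of the file.

```python
from __future__ import annotations

from typing import Optional


def encode_2_of_3(data: bytes) -> list[bytes]:
    """Split data into two halves plus one XOR parity slice (n=3, m=2)."""
    if len(data) % 2:
        data += b"\x00"  # pad to even length; the caller keeps the true length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def decode_2_of_3(slices: list[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data from any two slices; a lost slice is passed as None."""
    if sum(s is None for s in slices) > 1:
        raise ValueError("at least two of the three slices are required")
    a, b, p = slices
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, p))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, p))
    return (a + b)[:length]


def storage_overhead(n: int, m: int) -> float:
    """Stored bytes per original byte when each slice has size 1/m."""
    return n / m
```

Surviving the loss of one storage node costs a factor of `storage_overhead(3, 2)`, i.e. 1.5, with this code, whereas full replication needs two complete copies (a factor of 2.0) for the same guarantee. The gap widens for larger n and m.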
III. SYSTEM ARCHITECTURE
A. Overview
To overcome the security and privacy issues with storing
data in the cloud discussed in the previous sections,
we propose a system architecture for securing off-site data
storage, depicted in Fig. 1. The key component is a proxy
server which is responsible for integrating the external storage
services from the Internet, offering the new resources to the
client computers on the intranet, and securing all data transfers
as soon as they leave the trusted enterprise-network zone.
One part of the proxy server is an adapter for common
file transfer protocols which allows the integration and homogenization
of multiple cloud storage services. These newly
gained resources will be presented in combination with locally
attached storage space as a coherent network drive for storing
and retrieving files in the well-known manner. The process
of saving data will proceed as follows: the user – usually a
company employee – copies a file to a desired folder on the
network drive; this file will be cached on the proxy and then
split by the server into several parts using information dispersal
algorithms (erasure coding). The resulting data slices will then
be redundantly stored either on locally attached storage, e.g.,
on a NAS, or on one of the online cloud drives using the
protocol adapter. In the latter case, the data fragments will
additionally be encrypted to enhance the confidentiality of
information leaving the intranet.
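The placement step of this storage procedure might be sketched as follows. The `StorageNode` and `Placement` types and the round-robin policy are assumptions chosen for illustration; the paper leaves the distribution strategy open. Only the rule that slices bound for untrusted nodes are encrypted is taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNode:
    name: str
    trusted: bool  # True for intranet storage such as a NAS


@dataclass(frozen=True)
class Placement:
    slice_index: int
    node: StorageNode
    encrypt: bool  # slices leaving the intranet are encrypted first


def plan_placement(num_slices: int, nodes: list[StorageNode]) -> list[Placement]:
    """Assign each slice to a node and decide whether it must be encrypted."""
    if not nodes:
        raise ValueError("at least one storage node is required")
    plan = []
    for i in range(num_slices):
        node = nodes[i % len(nodes)]  # round-robin: an assumed policy
        plan.append(Placement(i, node, encrypt=not node.trusted))
    return plan
```

For one trusted NAS and two cloud providers, four slices would be placed on NAS, cloud, cloud, NAS, and only the two cloud-bound slices would be flagged for encryption.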
During the whole process additional information and metadata
belonging to the out-sourced file will be stored in a
database, which allows the cached file to be deleted from
the proxy server after the storage procedure has completed
successfully. With the help of this database information, a
reliable retrieval and reconstruction of the original data file
is possible. Additional measures for protecting the database
need to be taken, though, in order to offer high availability.
Fig. 1. System architecture
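The kind of bookkeeping the metadata database has to perform can be sketched with a two-table layout. The schema, column names, and helper functions below are hypothetical, and an in-memory SQLite database is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema: one row per out-sourced file, one row per stored slice.
SCHEMA = """
CREATE TABLE files (
    file_id   INTEGER PRIMARY KEY,
    path      TEXT NOT NULL UNIQUE,
    size      INTEGER NOT NULL,
    n_slices  INTEGER NOT NULL,
    threshold INTEGER NOT NULL
);
CREATE TABLE slices (
    file_id     INTEGER NOT NULL REFERENCES files(file_id),
    slice_index INTEGER NOT NULL,
    node        TEXT NOT NULL,
    encrypted   INTEGER NOT NULL,
    PRIMARY KEY (file_id, slice_index)
);
"""


def open_metadata_db() -> sqlite3.Connection:
    """Create an empty in-memory metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register_file(conn, path, size, threshold, placements):
    """Record a file; placements is a list of (node_name, encrypted) per slice."""
    cur = conn.execute(
        "INSERT INTO files (path, size, n_slices, threshold) VALUES (?, ?, ?, ?)",
        (path, size, len(placements), threshold),
    )
    file_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO slices VALUES (?, ?, ?, ?)",
        [(file_id, i, node, int(enc)) for i, (node, enc) in enumerate(placements)],
    )
    conn.commit()
    return file_id


def locate_slices(conn, path):
    """Return (slice_index, node, encrypted) rows needed to rebuild a file."""
    rows = conn.execute(
        "SELECT s.slice_index, s.node, s.encrypted FROM slices s "
        "JOIN files f ON f.file_id = s.file_id WHERE f.path = ? "
        "ORDER BY s.slice_index",
        (path,),
    )
    return rows.fetchall()
```

Once `register_file` has committed, the cached copy on the proxy is no longer needed, because `locate_slices` returns everything required to fetch and reassemble the file. This also shows why the database itself becomes a critical asset that must be replicated and protected.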
B. Technical Details
The proposed proxy server will be a Linux-based system
usually located within the trusted zone of a company’s intranet.
One of the major goals is to seamlessly integrate the external
cloud storage services into an employee’s desktop workspace
using the proxy server as a mediator. Therefore, we do not
want users to require any additional software components to
store files in the cloud. We will offer the additional storage
resources as a network drive which can be mounted via CIFS
on the client computers. This network protocol allows the best
interoperability between heterogeneous system platforms and
seems to be the best choice in the mostly Windows-dominated
world of office computers.
In order to start the dispersion and encryption algorithms on
the server, we need a customized file system that enables us
to “overwrite” the standard file system operations. The well-known
Filesystem in Userspace (FUSE) will be used here to
implement the necessary functionality. An appropriate erasure
code for file dispersion shall be taken from the Jerasure
library by Plank et al. [7]. Depending on whether the storage
node a data slice should be stored on is trusted or not,
additional encryption of the slice will be performed using AES
as implemented by the Bouncy Castle cryptography library.
The back-end database for storing additional information on
the proxy is going to be a MySQL database, which should be replicated,
distributed, and protected against attacks and failures
according to best practices in the field of database security.
A web application for managing different cloud storage
providers supports the integration of external storage space
using SMB, NFS, WebDAV, and (secure) FTP; it can easily
be extended by proprietary protocols and applications as long
as there is a possibility to mount the storage as a folder/drive
on the proxy server. Today, a lot of FUSE modules exist that
allow different types of storage services to be mounted as local
drives. By using these modules, the server’s storage resources
can be further extended, e.g., via the Amazon S3, Dropbox,
or GmailFS FUSE modules. The management application also
provides the possibility to integrate network-internal hard disk
space and to mark a storage provider as trustworthy. In order
to increase the server’s performance, data fragments will not
be encrypted on “trustworthy” (local) storage locations.
Up to this point, we achieve high availability of data by
using IDAs because only a subset of all file slices stored in
the cloud is necessary to reconstruct the original information.
In case a storage server does not respond or a file fragment has
been manipulated, we can omit this particular fragment and
use others to restore the correct data. High confidentiality is
achieved by combining symmetric encryption with IDAs. The
third protection goal, integrity, will be reached by using the
AES-CMAC operation mode for encryption which produces an
additional message authentication code (MAC) for each
data fragment. This allows us to check the state of a slice and
replace it by a healthy one in case of an integrity violation.
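The check-and-skip logic can be sketched as follows. Because AES-CMAC is not available in the Python standard library, the sketch substitutes HMAC-SHA256 as the MAC. This is a swapped-in technique rather than the one named above, and the function names and the choice to bind the slice index into the tag are assumptions.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import Optional


def tag_slice(key: bytes, index: int, data: bytes) -> bytes:
    """MAC over slice index and content (HMAC-SHA256 stands in for AES-CMAC)."""
    message = index.to_bytes(4, "big") + data
    return hmac.new(key, message, hashlib.sha256).digest()


def healthy_slices(
    key: bytes,
    stored: list[tuple[int, Optional[bytes], Optional[bytes]]],
    m: int,
) -> list[tuple[int, bytes]]:
    """Return the first m slices whose MAC verifies.

    `stored` holds (index, data, tag) triples; data is None when the
    storage node did not respond.
    """
    good = []
    for index, data, tag in stored:
        if data is None or tag is None:
            continue  # unreachable node: skip this fragment
        if hmac.compare_digest(tag_slice(key, index, data), tag):
            good.append((index, data))
        if len(good) == m:
            return good
    raise ValueError("fewer than m intact slices available")
```

A manipulated or missing fragment is simply passed over, and reconstruction proceeds as long as m intact slices remain. The slices returned here could then be handed to the erasure decoder, and any rejected slice could be regenerated and stored again.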
IV. RELATED WORK
Although the term cloud computing as used nowadays has
been around since the middle of the last decade, it took some
time until academia adopted the new research field. Thus, the
earliest work on cloud computing in general and cloud storage
in particular dates back to 2008/09.
The vast majority of the rather theoretical publications are
concerned with integrity and availability. They all apply
existing schemes and mechanisms from cryptography, peer-to-peer
networking, or coding theory and refine them for the
cloud computing setup (e.g., [8], [9]).
AONT-RS by James Plank and Jason Resch [10] is one of
the most recent works in the field of information dispersal
algorithms. It combines modern techniques from coding
theory to securely disperse information and achieves high
performance without relying on external encryption. We are
using concepts of AONT-RS as basic building blocks for our
storage gateway.
Further work tries to predict the required storage space to
optimize the resource allocation [11]. Recently, there have also
been several proposals for system architectures to integrate
cloud storage solutions with existing IT landscapes [12]. Wang
et al. even propose a middleware architecture that considers
quality of service [13]. Since none of these works has yet
presented a usable prototype implementation, our intention is
to fill this gap with SecCSIE.
One of the few publications that has already evolved
into a practical system is the Wuala storage service. It offers
cloud storage space with the main focus on data security and
availability, employing client-side encryption and information
dispersal. In order to be able to share files with other users and
still maintain data privacy, a sophisticated key exchange and
derivation protocol called “Cryptree” [14] has been developed.
However, Wuala is clearly designed for home users whereas
our approach addresses commercial businesses.
A further example of a ready-to-use system is presented
in [15]. TAHOE-LAFS is a distributed file system that
can be used on top of a storage grid. It employs a
mix of symmetric and asymmetric cryptography as well as
information dispersal to reach high data security. A gateway
distributes all information among available TAHOE storage
locations. Our approach is somewhat similar. However, due to
our modularized system architecture we aim at enhancing our
storage gateway by additional sophisticated mechanisms, e.g.,
integrating existing access control services of a customer’s IT
landscape.
V. DISCUSSION & FUTURE WORK
Compared to similar research projects, our approach avoids
any kind of vendor lock-in. In fact, it leverages the heterogeneity
of the current cloud storage market by supporting
various common network protocols as well as proprietary
storage solutions. Therefore, the system architecture presented
in this paper turns out to be very flexible and extensible. It is a
decent solution for outsourcing internal storage resources in a
user-friendly way, without giving up data security or privacy.
As this is still a work in progress, several issues need to
be addressed and evaluated in future work. Our goal is to
provide a first functional prototype by the end of August
2011 and to conduct a comprehensive evaluation and performance
measurement in autumn 2011. In particular, the combination
of encryption, dispersion, and integrity checking on the
server may pose a bottleneck to the whole system. We will
investigate the impact of different parameters and algorithms
on computational and storage costs. Caching and prefetching
of cloud data on the proxy server will also have a large influence
on performance.
Choosing the storage providers and distributing the data
fragments appropriately will also be part of further research. A
possible solution for this problem may include the monitoring
of available storage services and dynamically adjusting the
distribution algorithm according to the resulting QoS data.
Sharing files among several entities raises additional questions
concerning extended access control methods and security
policies. Currently, clients need to be within the company
network to use the cloud storage resources. Changing access
rights or updating data requires the complete reassembly,
modification, and redistribution of the corresponding files.
Delta update functionality and public key cryptography may
come in handy at these points, as may consistency and
concurrent access control.
Last but not least, the interaction between SecCSIE and a
cloud-based database (+ web server) should be investigated.
The execution of queries and other operations on data stored
by SecCSIE poses the most challenging research question.
VI. CONCLUSION
In this work, we proposed a system architecture for securely
extending enterprise-wide storage resources by highly scalable
and flexible cloud services. The combination of state-of-the-art
technologies from the fields of cryptography, networking,
and operating systems strengthens the security properties of
previous approaches and allows an easy and seamless integration
into the users’ desktop workstations. The system’s
core component is a proxy server which is responsible for encryption,
data distribution, and the unification of the different
cloud storage localities and services. Because all
operations are executed in the trusted local intranet and all data
leave the system only encrypted, the usual security concerns
and privacy issues with common cloud storage services should
be mitigated.
Acknowledgements
This work has received funding under project number
080949277 by means of the European Regional Development
Fund (ERDF), the European Social Fund (ESF) and the Ger-
man Free State of Saxony. The information in this document
is provided as is, and no guarantee or warranty is given that
the information is fit for any particular purpose.
REFERENCES
[1] C. Soghoian, “How dropbox sacrifices user privacy for cost savings,”
Slight Paranoia Blog, Apr. 2011. [Online]. Available:
http://paranoia.dubfire.net/2011/04/how-dropbox-sacrifices-user-privacy-for.html
[2] “Dropbox was accessible with no password, oops,” TekGoblin Blog,
Jun. 2011. [Online]. Available:
http://www.tekgoblin.com/2011/06/20/dropbox-was-accessible-with-no-password-oops/
[3] P. Mell and T. Grance, “The NIST definition of cloud
computing,” Recommendations of the National Institute of Standards
and Technology (NIST), Special Publication 800-145 (Draft),
Jan. 2011. [Online]. Available:
http://csrc.nist.gov/publications/drafts/800-145/Draft-SP-800-145_cloud-definition.pdf
[4] Z. Whittaker, “Microsoft admits patriot act can access
eu-based cloud data,” ZDNet iGeneration Blog, Jun.
2011. [Online]. Available:
http://www.zdnet.com/blog/igeneration/microsoft-admits-patriot-act-can-access-eu-based-cloud-data/11225
[5] A. Shamir, “How to share a secret,” Commun. ACM, vol. 22, no. 11,
pp. 612–613, 1979.
[6] M. O. Rabin, “Efficient dispersal of information for security, load
balancing, and fault tolerance,” J. ACM, vol. 36, pp. 335–348, April
1989. [Online]. Available: http://doi.acm.org/10.1145/62044.62050
[7] J. S. Plank, S. Simmerman, and C. D. Schuman, “Jerasure: A library
in C/C++ facilitating erasure coding for storage applications – Version
1.2,” University of Tennessee, Tech. Rep. CS-08-627, August 2008.
[8] C. Wang, Q. Wang, K. Ren, and W. Lou, “Ensuring data storage security
in cloud computing,” in Proceedings of the 17th International Workshop
on Quality of Service, Charleston, SC, USA, 2009.
[9] Q. He, Z. Li, and X. Zhang, “Study on cloud storage system based
on distributed storage systems,” in 2010 International Conference on
Computational and Information Sciences (ICCIS), Dec. 2010.
[10] J. K. Resch and J. S. Plank, “AONT-RS: blending security and performance
in dispersed storage systems,” in FAST-2011: 9th Usenix
Conference on File and Storage Technologies, February 2011.
[11] N. Bonvin, T. G. Papaioannou, and K. Aberer, “A self-organized,
fault-tolerant and scalable replication scheme for cloud storage,” in
Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC ’10).
New York, NY, USA: ACM, 2010, pp. 205–216.
[12] P. Xu, W. Zheng, Y. Wu, X. Huang, and C. Xu, “Enabling cloud
storage to support traditional applications,” in 5th Annual ChinaGrid
Conference, 2010.
[13] J. Wang, P. Varman, and C. Xie, “Middleware enabled data sharing
on cloud storage services,” in Proceedings of the 5th International
Workshop on Middleware for Service Oriented Computing (MW4SOC
’10). New York, NY, USA: ACM, 2010, pp. 33–38.
[14] D. Grolimund, L. Meisser, S. Schmid, and R. Wattenhofer, “Cryptree:
A folder tree structure for cryptographic file systems,” Department of
Computer Science Purdue University, West Lafayette, IN, Tech. Rep.,
2006.
[15] Z. Wilcox-O’Hearn and B. Warner, “Tahoe: the least-authority
filesystem,” in Proceedings of the 4th ACM international workshop
on Storage security and survivability, ser. StorageSS ’08. New
York, NY, USA: ACM, 2008, pp. 21–26. [Online]. Available:
http://doi.acm.org/10.1145/1456469.1456474