CSE 710 – Wide Area Distributed File Systems
Spring 2014 – Project Ideas

Project-1: FuseDLS: Design and Implementation of a FUSE-based
file system interface to a Cloud-hosted Directory Listing Service:
The Cloud-hosted Directory Listing Service (DLS) prefetches and caches remote
directory metadata in the Cloud to minimize response time for thin clients
(such as smartphones and Web clients), enabling efficient directory traversal
before a remote third-party data transfer request is issued. Conceptually, DLS
is an intermediate layer between the thin clients and the remote servers (such
as FTP, GridFTP, and SCP servers) that provides access to directory listings as
well as other metadata. In that sense, DLS acts as a centralized metadata
server hosted in the Cloud. When a thin client wants to list a directory or
access file metadata on a remote server, it sends a request containing the
necessary information (i.e., the URL of the top directory at which to start the
traversal, along with the credentials required for authentication and
authorization) to DLS, and DLS responds to the client with the requested
metadata.
During this process, DLS first checks whether the requested metadata is
available in its disk cache. If it is (and the provided credentials match the
associated cached credentials), DLS sends the cached information directly to
the client without connecting to the remote server. Otherwise, it connects to
the remote server, retrieves the requested metadata, and sends it to the
client. Meanwhile, several levels of subdirectories are prefetched in the
background in case the user wants to visit a subdirectory. Any metadata cached
on the DLS server is periodically checked against the remote server to ensure
its freshness. Clients also have the option to refresh/update the DLS cache on
demand to make sure they are accessing the server directly, bypassing the
cached metadata. DLS's caching mechanism can be integrated with several
optimization techniques to improve cache consistency and access performance.
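The cache-check/fetch/prefetch cycle described above can be sketched as
follows. This is a minimal illustration, not the DLS implementation: the
`fetch_listing` callback stands in for the real remote-server protocols
(FTP/GridFTP/SCP), and a simple time-to-live models the periodic freshness
check.

```python
import time

class DLSCache:
    """Sketch of the DLS lookup path: check the cache first, fetch from
    the remote server on a miss, then prefetch subdirectories."""

    def __init__(self, fetch_listing, ttl=60.0, prefetch_depth=2):
        self.fetch_listing = fetch_listing   # callable: url -> list of (name, is_dir)
        self.ttl = ttl                       # seconds before an entry is re-checked
        self.prefetch_depth = prefetch_depth # levels of subdirectories to prefetch
        self.cache = {}                      # url -> (timestamp, entries)

    def list_dir(self, url, force_refresh=False):
        entry = self.cache.get(url)
        if entry and not force_refresh and time.time() - entry[0] < self.ttl:
            return entry[1]                  # cache hit: no remote connection
        entries = self.fetch_listing(url)    # cache miss: contact remote server
        self.cache[url] = (time.time(), entries)
        self._prefetch(url, entries, self.prefetch_depth - 1)
        return entries

    def _prefetch(self, url, entries, depth):
        # Fetch subdirectories in advance in case the user descends into them.
        if depth < 0:
            return
        for name, is_dir in entries:
            if is_dir:
                child = url.rstrip("/") + "/" + name
                if child not in self.cache:
                    child_entries = self.fetch_listing(child)
                    self.cache[child] = (time.time(), child_entries)
                    self._prefetch(child, child_entries, depth - 1)
```

Passing `force_refresh=True` corresponds to the on-demand refresh option
mentioned above, bypassing the cached copy.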
FuseDLS will be a virtual file system interface that allows users to access
the Cloud-hosted DLS service and browse remote storage server contents as
conveniently as accessing the local file system. It will enable mounting
remote storage servers on the users' local hosts. Although filesystem mounting
normally requires root privileges, FuseDLS will allow non-root users to mount
remote file systems locally. FuseDLS will be based on FUSE (Filesystem in
Userspace), a simple interface for exporting a virtual file system from a
user-space program to the Linux kernel. Whenever system I/O calls are made
toward a mounted FuseDLS resource, FUSE will capture these calls in the kernel
and forward them to the user-space library libfuse. FuseDLS will map these
local I/O calls into remote storage I/O calls. The FUSE library is available
in most Linux distributions today and is a very practical way of implementing
a user-level file system. The students will be able to use this convenient
tool to develop the client side of a wide-area file system. Access to the
Cloud-hosted DLS service and the necessary API will be provided to the
students.
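The mapping from local I/O calls to DLS calls might look like the sketch
below. In a real FuseDLS these methods would be registered as libfuse
callbacks (e.g., through a Python FUSE binding's operations class); here they
are plain methods so the path-to-URL mapping can be shown in isolation. The
`dls_client` object and its `stat`/`listdir` methods are hypothetical
stand-ins for the DLS API that will be provided.

```python
import errno
import stat

class FuseDLSOps:
    """Sketch of FuseDLS callbacks: translate local paths under the
    mount point into DLS URLs and DLS metadata into POSIX attributes.
    `dls_client.stat(url)` is assumed to return a dict like
    {"is_dir": bool, "size": int} or None; `dls_client.listdir(url)`
    a list of entry names."""

    def __init__(self, dls_client, root_url):
        self.dls = dls_client
        self.root = root_url.rstrip("/")

    def _to_url(self, path):
        # Map a mount-relative path (e.g. "/sub/f.txt") to a remote URL.
        return self.root + path.rstrip("/")

    def getattr(self, path):
        meta = self.dls.stat(self._to_url(path))
        if meta is None:
            return -errno.ENOENT   # python-fuse style: negative errno on failure
        mode = (stat.S_IFDIR | 0o555) if meta["is_dir"] else (stat.S_IFREG | 0o444)
        return {"st_mode": mode, "st_size": meta.get("size", 0)}

    def readdir(self, path):
        # "." and ".." plus the names from the cached DLS listing.
        return [".", ".."] + self.dls.listdir(self._to_url(path))
```

A read-only attribute mode is used here because DLS serves metadata and
listings; write-path calls would go through the third-party transfer channel
instead.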
Project-2: MDS: Design and Implementation of a
Distributed Metadata Server for Global Name Space in a Wide-area File System:
One of the important features of a distributed storage or file system is a
global unified name space across all participating sites/servers, which
enables easy data sharing without knowledge of the actual physical location of
the data. This feature depends on the "location metadata" of all
files/datasets in the system being available to all participating sites. The
design and implementation of such a metadata service, one that provides high
consistency, scalability, availability, and performance at the same time, is a
major challenge.
A central metadata server is generally easy to implement and ensures
consistency, but it is also a single point of failure, leading to low
availability, low scalability, and, in many cases, low performance. Ensuring
high availability requires replicating the metadata server across local sites.
Synchronously replicated metadata servers provide high consistency but
introduce a large synchronization overhead, which especially degrades the
write performance of metadata operations. Asynchronously replicated metadata
servers provide high performance but introduce conflicts and consistency
issues across replicas. Fully distributed approaches can be more scalable but
may suffer from performance and consistency problems.
In this project, the students will study different metadata server (MDS)
layouts in terms of high availability, scalability, consistency, and
performance. They will design a distributed, replicated, or hybrid metadata
approach that achieves all four of these properties with minimal sacrifice.
This approach will be implemented as part of the Ori and GlusterFS file
systems.
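To make the asynchronous-replication trade-off concrete, the toy sketch below
shows one way replicas can apply each other's updates and resolve concurrent
writes deterministically (last-writer-wins with a Lamport clock and a site-id
tie-break). All names are illustrative assumptions; this is not part of Ori or
GlusterFS.

```python
class MetadataReplica:
    """Toy asynchronously replicated MDS: writes apply locally first,
    replicas exchange updates later, and concurrent writes to the same
    path are resolved by (Lamport clock, site id) ordering."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.clock = 0
        self.entries = {}   # path -> (clock, site_id, location)

    def local_write(self, path, location):
        # Fast local write: no synchronization with other replicas.
        self.clock += 1
        self.entries[path] = (self.clock, self.site_id, location)
        return (path, self.clock, self.site_id, location)

    def apply_remote(self, update):
        path, clk, site, location = update
        self.clock = max(self.clock, clk)   # Lamport clock merge
        current = self.entries.get(path)
        if current is None or (clk, site) > (current[0], current[1]):
            self.entries[path] = (clk, site, location)
            return "applied"
        if (clk, site) == (current[0], current[1]):
            return "duplicate"
        return "stale"   # older/losing concurrent write: keep the local winner
```

Because the tie-break is deterministic, all replicas converge to the same
winner, but a losing write is silently overwritten, which is exactly the
consistency issue the project asks students to address.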
Project-3: SmartFS: Design and Implementation of a Serverless
Distributed File System for Smartphones:
In this project, the students will develop a distributed file system (SmartFS)
for file access and sharing across multiple Android smartphones. This will be
a serverless file system: it will not require any external server component,
nor will any of the participating phones act as a server. In that sense, this
will be a peer-to-peer (p2p) distributed file system with a POSIX interface.
Each phone will be able to export certain portions of its local file system to
other users (i.e., enable data sharing), and other phones will be able to
locate and import/mount those remote files/directories into their local file
systems. Performance and scalability will be the major design considerations.
The authorization and authentication of remote clients will also be an
important component of the project. The connectivity between SmartFS
participating phones can be either through WiFi or through 4G. Android phones
will be provided to the students to test their implementations.
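The export/locate step of such a serverless design can be sketched as below:
each phone keeps only its own export table, and a lookup queries peers
directly, since no node holds a global index. The class names and the
query-every-peer strategy are illustrative assumptions (a real SmartFS would
use network RPCs over WiFi/4G and likely a smarter discovery scheme), not a
prescribed design.

```python
class SmartFSPeer:
    """Toy p2p export/lookup: no server, no global index; each phone
    answers from its own export table, and lookups ask reachable peers."""

    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.exports = {}   # share name -> local path on this phone
        self.peers = []     # other reachable SmartFSPeer objects

    def export(self, share_name, local_path):
        # Make a portion of the local file system visible to others.
        self.exports[share_name] = local_path

    def lookup(self, share_name):
        # Check locally first, then query each peer directly (p2p).
        # In a real system this loop would be a network request per peer.
        if share_name in self.exports:
            return (self.peer_id, self.exports[share_name])
        for peer in self.peers:
            if share_name in peer.exports:
                return (peer.peer_id, peer.exports[share_name])
        return None
```

A successful lookup returns which phone holds the share and where, which is
the information a client needs before mounting the remote directory locally.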
Project-4: PowerFS: Energy-Aware File System Design
Reducing power consumption has become a major design consideration across a
spectrum of computing solutions, from supercomputers and datacenters to
handhelds and mobile computers. The servers that run Google's data centers
have been estimated to cost millions of dollars per year in electricity, and
total worldwide spending on power management for enterprises has been a
staggering $40 billion. There has been a large body of work on managing power
and improving energy efficiency in computing at different levels, including
computer architecture, operating systems, file systems, and application-level
tools and schedulers. In this project, the students will design an
energy-aware file system. They will take an existing file system such as HDFS
or GlusterFS, analyze its current power consumption, and then modify the file
system to reduce its power consumption. The changes can target a single
component, such as the CPU scheduler, the I/O scheduler, or the memory
management unit, any of which can potentially have a great impact on power
consumption.
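As one concrete example of the kind of single-component change mentioned
above, the sketch below models an I/O scheduler that batches writes so the
disk wakes up once per burst instead of once per request, letting it stay in a
low-power state between bursts. Everything here is a toy assumption
(`submit_to_disk` stands in for the real block layer, and `disk_wakeups` is
only a proxy metric for energy cost), not an HDFS or GlusterFS mechanism.

```python
class BatchingIOScheduler:
    """Toy energy-aware I/O scheduler: queue writes and flush them in
    bursts, so one disk spin-up serves many requests."""

    def __init__(self, submit_to_disk, batch_size=8):
        self.submit_to_disk = submit_to_disk  # callable: list of requests -> None
        self.batch_size = batch_size          # requests coalesced per wakeup
        self.queue = []
        self.disk_wakeups = 0                 # proxy metric for energy cost

    def write(self, request):
        # Buffer the request; only a full batch touches the disk.
        self.queue.append(request)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        # One spin-up serves the whole queued batch.
        if self.queue:
            self.disk_wakeups += 1
            self.submit_to_disk(self.queue)
            self.queue = []
```

The trade-off students would need to analyze is latency and durability: data
sits in the queue until a flush, so a crash window opens in exchange for fewer
wakeups.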