December 3, 2015

DMS: Data Management Services

Distributed computing in wide-area, cross-domain grid environments presents unique data management challenges because applications and resources are heterogeneous and dynamic. In these environments, data should be delivered with performance optimizations and reliability techniques tailored to each application, and data provisioning should be managed automatically, according to application requirements and in response to changing conditions.

A set of data management services is proposed to provide control and configuration of application-tailored data sessions using the Grid Virtual File System (GVFS), GridFTP, and Secure Copy (SCP). GVFS employs proxies that virtualize Network File System (NFS) sessions by intercepting and modifying remote procedure calls. In our data management architecture, a File System Service (FSS) runs on every client and server and controls the local file system proxies; a Data Scheduler Service (DSS) provides centralized scheduling and control of data sessions by interacting with the FSSs; and a Data Replication Service (DRS) manages datasets and their replicas for fault tolerance and load balancing. Using these services, GVFS-based data sessions can be created dynamically on a per-application basis, and application-tailored customizations can be applied, including the selection of block-based or whole-file data transfer, the configuration of cache parameters and consistency protocols, the use of copy-on-write-based checkpointing and replication-based failover, and the configuration of security mechanisms. The services are based on the Web Services Resource Framework (WSRF) standards, which supports interoperability with other grid middleware, and they use web service security standards to provide secure interactions, grid authentication, and access control.
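The division of labor among the services can be illustrated with a minimal Python sketch, assuming a greatly simplified model in which each FSS only records which proxy it runs for a session and the DSS configures the server-side and client-side proxies in turn. Every name here (`SessionConfig`, `FileSystemService`, `DataSchedulerService`, `start_proxy`, `create_session`) and every option label is hypothetical; this is not the actual WSRF interface of the services.

```python
from dataclasses import dataclass
from enum import Enum


class TransferMode(Enum):
    BLOCK = "block"            # block-based access through the NFS proxies
    WHOLE_FILE = "whole-file"  # whole-file transfer, e.g. via GridFTP or SCP


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical per-application customizations for one GVFS data session."""
    app_id: str
    export_path: str
    transfer_mode: TransferMode = TransferMode.BLOCK
    cache_size_mb: int = 512                   # hypothetical cache parameter
    consistency: str = "invalidation-polling"  # hypothetical protocol label
    cow_checkpointing: bool = False            # copy-on-write checkpointing on/off
    failover_replicas: tuple = ()              # replica servers for failover
    security: str = "ssh-tunnel"               # hypothetical security label


class FileSystemService:
    """Stand-in for the FSS on one host; it only records proxy state."""

    def __init__(self, host: str):
        self.host = host
        self.proxies = {}  # session id -> (role, config)

    def start_proxy(self, session_id: str, role: str, config: SessionConfig) -> None:
        if session_id in self.proxies:
            raise ValueError(f"session {session_id} already exists on {self.host}")
        self.proxies[session_id] = (role, config)

    def stop_proxy(self, session_id: str) -> None:
        self.proxies.pop(session_id, None)


class DataSchedulerService:
    """Stand-in for the centralized DSS, which drives FSSs to build sessions."""

    def __init__(self, fss_by_host: dict):
        self.fss_by_host = fss_by_host
        self.sessions = {}  # session id -> (client, server, config)
        self._next_id = 0

    def create_session(self, client: str, server: str, config: SessionConfig) -> str:
        self._next_id += 1
        sid = f"{config.app_id}-{self._next_id}"
        # Configure the server-side proxy first so the client proxy has a peer.
        self.fss_by_host[server].start_proxy(sid, "server", config)
        try:
            self.fss_by_host[client].start_proxy(sid, "client", config)
        except Exception:
            # Roll back the server side so no half-built session is left behind.
            self.fss_by_host[server].stop_proxy(sid)
            raise
        self.sessions[sid] = (client, server, config)
        return sid


if __name__ == "__main__":
    hosts = {h: FileSystemService(h) for h in ("compute1", "storage1")}
    dss = DataSchedulerService(hosts)
    cfg = SessionConfig("blast", "/data/genomes", cache_size_mb=2048, cow_checkpointing=True)
    print(dss.create_session("compute1", "storage1", cfg))  # prints blast-1
```

Keeping all customizations in one immutable configuration object means the same settings reach both ends of a session, and the rollback in `create_session` reflects the idea that the scheduler, not the individual FSSs, is responsible for a session being set up completely or not at all.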

To manage the complexity of many concurrent data sessions in a large-scale system, and to adapt those sessions promptly as the environment changes, an autonomic data management system is proposed that enhances the data management services into self-managing elements. Autonomic functions (monitor, analyze, plan, and execute) are integrated into the services to automatically control the distributed entities of GVFS sessions in accordance with high-level objectives, and the services operate together to achieve the desired data provisioning behavior for each application. Key autonomic features include cache configuration, replication configuration and replica generation, and server selection with session redirection. Experiments demonstrate that the system can automatically and substantially improve both the performance and the reliability of grid-wide data access.
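The monitor, analyze, plan, and execute functions can be illustrated with a minimal Python sketch of server selection and session redirection for a single session. It assumes one simple high-level objective, a hypothetical latency threshold `max_latency_ms` applied to a sliding-window mean of probe latencies; the class, its methods, and the policy itself are illustrative and are not the services' actual algorithms or interfaces.

```python
from collections import deque


class ServerSelectionLoop:
    """Hypothetical MAPE loop that redirects one data session between replicas.

    Assumed objective: keep the mean observed latency of the current server
    at or below max_latency_ms.
    """

    def __init__(self, servers, current, max_latency_ms, window=5):
        self.max_latency_ms = max_latency_ms
        self.current = current
        self.samples = {s: deque(maxlen=window) for s in servers}
        self.redirections = []  # history of (old_server, new_server) pairs

    def _avg(self, server):
        window = self.samples[server]
        # A server with no samples is treated as unusable.
        return sum(window) / len(window) if window else float("inf")

    # Monitor: record a latency probe; None means the server did not respond.
    def monitor(self, server, latency_ms):
        self.samples[server].append(float("inf") if latency_ms is None else latency_ms)

    # Analyze: is the current server violating the objective?
    def analyze(self):
        return self._avg(self.current) > self.max_latency_ms

    # Plan: choose the replica with the lowest mean latency, if it is better.
    def plan(self):
        best = min(self.samples, key=self._avg)
        if best != self.current and self._avg(best) < self._avg(self.current):
            return best
        return None

    # Execute: redirect the session (here, only local state is updated).
    def execute(self, target):
        self.redirections.append((self.current, target))
        self.current = target

    def step(self):
        """Run one pass of the loop and return the server now in use."""
        if self.analyze():
            target = self.plan()
            if target is not None:
                self.execute(target)
        return self.current


if __name__ == "__main__":
    loop = ServerSelectionLoop(["primary", "replica"], "primary", max_latency_ms=100)
    for _ in range(3):
        loop.monitor("primary", 250)
        loop.monitor("replica", 40)
    print(loop.step())  # prints replica
```

Averaging over a short window keeps a single slow probe from triggering a redirection, while a missed probe (recorded as infinite latency) violates the objective immediately. In a full system, the execute step would involve reconfiguring the session's proxies through the services; this sketch only updates local state.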

Publications

  • M. Zhao, J. Xu, and R. J. Figueiredo, “Towards Autonomic Grid Data Management with Virtualized Distributed File Systems,” in Proceedings of the 3rd IEEE International Conference on Autonomic Computing (ICAC 2006), pp. 209-218, June 2006.
  • M. Zhao, V. Chadha, and R. J. Figueiredo, “Supporting Application-tailored Grid File System Sessions with WSRF-based Services,” in Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC 2005), pp. 24-33, July 2005.