The Google File System (Chinese Version)

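Abstract: The Google File System is a scalable distributed file system designed for large distributed data-intensive applications. Although it runs on inexpensive commodity hardware, it provides fault tolerance and delivers high performance to a large number of clients.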

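While sharing many of the same goals as previous distributed file systems, our design has also been shaped by observations of our application workloads and technological environment, both current and anticipated, which depart markedly from the assumptions of earlier file systems. We therefore reexamined traditional choices and explored radically different design points.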

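GFS is widely deployed within Google as a storage platform. It is used to generate and process the data behind our services and to support research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines and serves hundreds of clients concurrently. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real-world use.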

Keywords: fault tolerance, scalability, data storage, clustered storage

ABSTRACT: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

GFS is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

Keywords: Fault tolerance, scalability, data storage, clustered storage

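1. Introduction
The Google File System (GFS) was designed to meet the rapidly growing demands of Google's data processing needs. GFS shares many of the same goals as previous distributed file systems, such as performance, scalability, reliability, and availability. However, its design has also been driven by observations of our application workloads and technological environment, both current and anticipated, that differ markedly from the assumptions of earlier file systems. We therefore reexamined traditional choices and adopted radically different design points.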