Architecture
HDFS
- Designed to reliably store large files across machines in a large cluster.
- Stores each file as a sequence of blocks; the blocks of a file are replicated for fault tolerance.
- The block size and replication factor are configurable per file (see the commands after this list).
- Follows a Write Once Read Many (WORM) model: only one client can write to a file at a time.
- Writes are always made at the end of the file, in append-only fashion.
- Block size is 128 MB by default.
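As a sketch of those per-file tunables, a standard Hadoop client lets you override dfs.blocksize and dfs.replication when writing a file (the paths and values here are illustrative):

```bash
# Write a file with a 256 MB block size and a replication factor of 2
# (per-file overrides of the cluster defaults)
hdfs dfs -D dfs.blocksize=268435456 -D dfs.replication=2 \
  -put bigfile.dat /data/bigfile.dat

# Change the replication factor of an existing file and wait for it to apply
hdfs dfs -setrep -w 3 /data/bigfile.dat
```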
NameNode (master)
- The limit to the number of files in a filesystem is governed by the amount of memory on the NameNode.
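To make that limit concrete: a widely cited rule of thumb (e.g., in Hadoop: The Definitive Guide) is that each file, directory, and block object costs roughly 150 bytes of NameNode heap. A back-of-the-envelope estimate, with illustrative numbers:

```bash
# Assumes the ~150 bytes/object rule of thumb; ignores directories
FILES=1000000          # one million files
BLOCKS_PER_FILE=1      # each file small enough to fit in a single block
OBJECTS=$(( FILES + FILES * BLOCKS_PER_FILE ))
echo "$(( OBJECTS * 150 / 1024 / 1024 )) MB of NameNode heap"   # ~286 MB
```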
DataNode (worker)
- Stores the actual file blocks on local disk and serves read/write requests from clients; reports its blocks to the NameNode through periodic heartbeats and block reports.
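A quick way to see the DataNodes in a running cluster, along with their capacity and last-heartbeat status, is the standard dfsadmin report:

```bash
# Summarize cluster capacity and list each DataNode's status
hdfs dfsadmin -report
```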
HDFS - Cheatsheet
List the blocks of a file
hdfs fsck $FILE_PATH -files -blocks
Copy a file from local to HDFS
hdfs dfs -copyFromLocal $LOCAL_FILE_PATH $HDFS_FILE_PATH
List files in an HDFS directory
hdfs dfs -ls $HDFS_DIR_PATH
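Putting the cheatsheet together, a minimal end-to-end check might look like this (paths and file names are illustrative):

```bash
# Copy a local file into HDFS, list the directory, then inspect its blocks
hdfs dfs -mkdir -p /user/alice
hdfs dfs -copyFromLocal report.csv /user/alice/report.csv
hdfs dfs -ls /user/alice
hdfs fsck /user/alice/report.csv -files -blocks
```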
Deployment
Docker
- apache/hadoop - Docker Image | Docker Hub
  Official Apache Hadoop Docker image, with the latest stable release
- cloudera/quickstart - Docker Image | Docker Hub
  Cloudera Quickstart Docker image, with a pre-configured Hadoop cluster
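As a sketch of the Docker route, assuming the apache/hadoop image ships the Hadoop binaries on the PATH (check the image docs on Docker Hub for the current tags):

```bash
# Pull the official image and open a shell inside it
docker pull apache/hadoop:3
docker run -it apache/hadoop:3 bash

# Inside the container, verify the HDFS client is available
hdfs version
```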