File Protocol Module

This lab compares the performance of different file transfer utilities when copying data from EBS to EFS.

  1. In the CLI, run the following command to verify that the EBS volume on the EC2 instance holds approximately 2 GB of data.
du -csh /ebs/tutorial/data-1m/

  1. Run the following command to verify that the EBS volume contains 2000 files.
find /ebs/tutorial/data-1m/. -type f | wc -l
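
The two checks above can be wrapped in one small helper and reused later on each EFS destination directory to confirm that a transfer copied everything. This is a minimal sketch: the function names count_files and dataset_summary are hypothetical and not part of the lab.

```shell
# count_files DIR: print the number of regular files under DIR.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# dataset_summary DIR: print the total size and file count of DIR on one line.
dataset_summary() {
    dir=$1
    size=$(du -sh "$dir" | cut -f1)   # du prints "SIZE<TAB>PATH"; keep SIZE
    files=$(count_files "$dir")
    printf '%s: %s in %s files\n' "$dir" "$size" "$files"
}

# Example calls (paths taken from the lab steps):
#   dataset_summary /ebs/tutorial/data-1m/
#   dataset_summary /efs/tutorial/rsync/
```

After a transfer, the file count reported for the destination should match the count reported for the source.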

  1. Run the following commands to transfer files from EBS to EFS using rsync. Record the completion time. The command takes a few minutes to complete, so do not worry if it appears to hang.
sudo su
sync && echo 3 > /proc/sys/vm/drop_caches
exit
time rsync -r /ebs/tutorial/data-1m/ /efs/tutorial/rsync/

  1. Run the following commands to transfer files from EBS to EFS using the cp command. Record the completion time. The command takes a few minutes to complete, so do not worry if it appears to hang.
sudo su
sync && echo 3 > /proc/sys/vm/drop_caches
exit
time cp -r /ebs/tutorial/data-1m/* /efs/tutorial/cp/

  1. Run the following commands to set the $threads variable to four threads per CPU.
threads=$(($(nproc --all) * 4))
echo $threads
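
As a worked example of the arithmetic above, an instance with 2 CPUs gets 2 * 4 = 8 threads. The sketch below expresses the same calculation as a function; threads_for is a hypothetical name, and the default multiplier of 4 is taken from this step.

```shell
# threads_for CPUS [PER_CPU]: print CPUS * PER_CPU (PER_CPU defaults to 4).
threads_for() {
    cpus=$1
    per_cpu=${2:-4}
    echo $(( cpus * per_cpu ))
}

threads_for 2       # prints 8
threads_for 16 2    # prints 32

# On the lab instance, the same value as in the step above:
if command -v nproc >/dev/null 2>&1; then
    threads=$(threads_for "$(nproc --all)")
    echo "$threads"
fi
```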

  1. Run the following commands to transfer files from EBS to EFS using fpsync. Record the completion time. The command will take a few minutes to complete.
sudo su
sync && echo 3 > /proc/sys/vm/drop_caches
exit
time fpsync -n ${threads} -v /ebs/tutorial/data-1m/ /efs/tutorial/fpsync/

  1. Run the following commands to transfer files from EBS to EFS using cp + GNU Parallel. Record the completion time. The command will take a few minutes to complete.
sudo su
sync && echo 3 > /proc/sys/vm/drop_caches
exit
time find /ebs/tutorial/data-1m/. -type f | parallel --will-cite -j ${threads} cp {} /efs/tutorial/parallelcp/
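
The four timed transfers follow the same pattern: drop the caches, then time one command. To make the recorded times easier to compare, they could be collected in a single file. The following is a rough sketch under stated assumptions, not part of the lab: RESULTS, drop_caches, and time_transfer are hypothetical names, and the timing uses whole seconds from date rather than the output of time.

```shell
# File that collects one "LABEL SECONDS" line per transfer (hypothetical path).
RESULTS=${RESULTS:-./transfer-times.txt}

# Flush dirty pages and drop the page cache so each run starts cold.
# Equivalent to the sudo su / sync / exit sequence in the steps; needs sudo.
drop_caches() {
    sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
}

# time_transfer LABEL COMMAND [ARGS...]
# Runs COMMAND, appends "LABEL SECONDS" to $RESULTS, returns COMMAND's status.
time_transfer() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    printf '%s %s\n' "$label" "$(( end - start ))" >> "$RESULTS"
    return "$status"
}

# Example calls (commands taken from the lab steps):
#   drop_caches && time_transfer rsync rsync -r /ebs/tutorial/data-1m/ /efs/tutorial/rsync/
#   drop_caches && time_transfer cp cp -r /ebs/tutorial/data-1m/* /efs/tutorial/cp/
#   sort -k2 -n "$RESULTS"    # fastest transfer first
```

A pipeline such as the find and parallel step cannot be passed as plain arguments; it would need to be wrapped, for example in sh -c '...'.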

Not all file transfer utilities perform the same. An EFS file system is distributed across an unconstrained number of storage servers, and this distributed data storage design means that multithreaded applications such as fpsync, mcp, and GNU Parallel can drive significantly higher levels of throughput and IOPS to EFS than single-threaded applications.