
Text manipulation on the command line

Ok! This is ridiculous but fun! I woke up this morning with the goal of cleaning up my s3 buckets. Storing waste is expensive! My AWS bill is eating into my beer budget! It's time to do some housekeeping!

So here it goes:

The output of my s3cmd listing on the bucket comes out in the format below:

DIR   s3://tempbin/LM/LM/videos/
2018-12-24 06:12 500222497   s3://tempbin/LM/LM/BARRE A 720p.mp4
2018-12-24 06:12 570432475   s3://tempbin/LM/LM/BARRE B 720p.mp4
2018-12-24 06:12 195763537   s3://tempbin/LM/LM/BARRE TUTORIAL Beginner 720p.mp4
2018-12-24 06:12 1263819291   s3://tempbin/LM/LM/BODYPUMP 100 720p.mp4
2018-12-24 06:12 1252501516   s3://tempbin/LM/LM/BODYPUMP 101 720p.mp4
2018-12-24 06:12 625581873   s3://tempbin/LM/LM/BODYPUMP 101 Express 720p.mp4
2018-12-24 06:12 863296736   s3://tempbin/LM/LM/BODYPUMP 101 Upper Body 720p.mp4
2018-12-24 06:12 1141034117   s3://tempbin/LM/LM/BODYPUMP 102 720p.mp4
2018-12-24 06:12 353971345   s3://tempbin/LM/LM/BODYPUMP 102 Arm Focus 720p.mp4

I piped the output of my s3cmd ls to a file called lm-1.txt. Surely there are ways to list exactly what I want directly, but that isn't the point of today's experiment.
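
For reference, getting the listings into files is just s3cmd ls redirected to a file, something along these lines:

s3cmd ls s3://tempbin/LM/LM/ > lm-1.txt
s3cmd ls s3://tempbin/LM/LM/videos/ > lm-2.txt

(lm-2.txt would be the matching listing for the videos/ subfolder; it shows up again in the script at the end.)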

I wanted to parse through the ls -l style listing of files, extract only the file names, and then check those filenames against my local storage before deleting the s3 files!

Getting just the filenames was the challenge and here is how I managed to solve it!

First, remove the first three columns from the original output:

awk '{$1=""; $2=""; $3=""; print}' lm-1.txt > lm-1-files.txt
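
After this step each line still carries the full s3 path; blanking the first three fields leaves a few stray leading spaces, so lm-1-files.txt looks roughly like this:

   s3://tempbin/LM/LM/BARRE A 720p.mp4
   s3://tempbin/LM/LM/BARRE B 720p.mp4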

Next, strip the path prefix so only the filenames are left!

awk -F"s3://tempbin/LM/LM/" '{ print $2}' lm-1-files.txt > lm-2-files.txt

That's it! I've got my output using just the terminal!


BARRE A 720p.mp4
BARRE B 720p.mp4
BARRE TUTORIAL Beginner 720p.mp4
BODYPUMP 100 720p.mp4
BODYPUMP 101 720p.mp4
BODYPUMP 101 Express 720p.mp4
BODYPUMP 101 Upper Body 720p.mp4
BODYPUMP 102 720p.mp4
BODYPUMP 102 Arm Focus 720p.mp4
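
For what it's worth, both awk steps can be collapsed into a single pass by splitting on the prefix directly and skipping the DIR line; a one-liner sketch of the same idea:

awk -F"s3://tempbin/LM/LM/" '!/^DIR/ { print $2 }' lm-1.txt > lm-2-files.txt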

Saves me from my typical go-to solutions of R or Python!

Complete script to check and delete the files from s3!

#!/bin/bash
# Strip the first three columns, then the bucket prefix, from the main listing
awk '{$1=""; $2=""; $3=""; print}' lm-1.txt > lm-1-files.txt
awk -F"s3://tempbin/LM/LM/" '{ print $2}' lm-1-files.txt > lm-final.txt

# Same again for lm-2.txt, this time stripping the videos/ prefix
awk '{$1=""; $2=""; $3=""; print}' lm-2.txt > lm-2-files.txt
awk -F"s3://tempbin/LM/LM/videos/" '{ print $2}' lm-2-files.txt >> lm-final.txt

filename='lm-final.txt'

# Credit: https://unix.stackexchange.com/questions/7011/how-to-loop-over-the-lines-of-a-file
IFS=$'\n'       # make newlines the only separator
set -f          # disable globbing

# First pass (dry run): just report any s3 files that are missing from local storage
for file in $(cat < "$filename"); do
  # echo "File is: $file"

  # Check if file exists
  # Credit: https://www.cyberciti.biz/faq/unix-linux-test-existence-of-file-in-bash/
  # https://stackoverflow.com/a/47788203
   if [[ ! -e "/Volumes/Seagate Backup Plus Drive/Les Mills/$file" ]]
   then
      echo "/Volumes/Seagate Backup Plus Drive/Les Mills/$file not found."
   fi
done


# Second pass: delete from s3 only the files that do exist locally
for file in $(cat < "$filename"); do
   if [[ ! -e "/Volumes/Seagate Backup Plus Drive/Les Mills/$file" ]]
   then
      echo "/Volumes/Seagate Backup Plus Drive/Les Mills/$file not found."
   else
      echo "/Volumes/Seagate Backup Plus Drive/Les Mills/$file found."
      # The file lives under one of these two prefixes, so remove from both
      s3cmd rm "s3://tempbin/LM/LM/$file"
      s3cmd rm "s3://tempbin/LM/LM/videos/$file"
   fi
done
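
Side note: the same loop could also be written with while read, which sidesteps the IFS and set -f dance entirely; a quick sketch of the local-storage check in that style:

while IFS= read -r file; do
   if [[ -e "/Volumes/Seagate Backup Plus Drive/Les Mills/$file" ]]; then
      echo "$file found locally."
   else
      echo "$file not found locally."
   fi
done < "$filename"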