
Processing JSON data on the command line

Originally posted at: https://blog.mypad.in/processing-json-data-on-the-command-line/

JSON has become the de facto standard for data transfer across much of the internet. With mobile apps now ubiquitous, a large share of that data moves as JSON.

Processing JSON quickly is a recurring need in data work.

Multiple libraries exist in Python and R for processing JSON. However, I strongly prefer to process JSON on the command line, with tools built specifically for that purpose.

jq is my favorite command-line JSON processor. It makes processing JSON a pleasure and a lot of fun. It's widely used, and plenty of material is available on the web to help with data wrangling.

I'd like to demonstrate by processing data from my favorite Spinning provider, Peloton.

I've got all the rides to date from Peloton in a file called consolidated-od-classes.json. Typically, for data analysis in R or Python, I need the data transformed to CSV so it can be loaded into a data frame for manipulation with data.table.

Here is a simple command that does the job:

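# input file, generated from other scripts
# download at https://l.mypad.in/peloton-od-classes
output="./data/peloton/consolidated-od-classes.json"

# for tracking performance (whole seconds since the epoch)
start=$(date +%s)

result="./data/peloton/result.csv"

# write the header row, overwriting any previous result
echo 'title,id,scheduled_start_time' > "$result"

# -r emits raw strings so each @csv row is written as a plain CSV line
jq -r '[.title, .id, .scheduled_start_time] | @csv' "$output" >> "$result"

end=$(date +%s)

# elapsed time in seconds
runtime=$((end - start))
echo "$runtime"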

The CSV extract is ready!
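To see what the `@csv` filter does to each record, here is a minimal sketch that runs the same kind of filter on two made-up records. The titles, ids, and timestamps are hypothetical, not taken from the real Peloton file. It assumes the input holds one JSON object per line, which is the shape the command above expects. As a variation, the header row is produced inside the same jq call instead of a separate `echo`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data: two invented records, one JSON object per line.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
{"title": "45 min Pop Ride", "id": "abc123", "scheduled_start_time": 1577900000}
{"title": "20 min \"HIIT\" Ride, Low Impact", "id": "def456", "scheduled_start_time": 1577903600}
EOF

# -n starts jq without reading input; `inputs` then streams every object
# from the file. The header array is emitted first, then one array per
# record, and @csv formats each array as a single CSV line.
csv="$(jq -rn '
  ["title", "id", "scheduled_start_time"],
  (inputs | [.title, .id, .scheduled_start_time])
  | @csv
' "$sample")"

echo "$csv"

rm -f "$sample"
```

`@csv` wraps strings in double quotes and doubles any embedded quotes, so a title containing commas or quotes still lands in a single column, while numbers are written bare. If the file were a single top-level JSON array rather than one object per line, the per-record part of the filter would need `.[] |` in front of it.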


References