Using jq to Consume JSON in the Shell


This article is a tutorial on using jq as a JSON parser and fetching information about the weather from different cities.

JSON has become the most prevalent format for consuming web APIs. If you look up the API documentation of a popular service, chances are that the API responds in JSON. Many mainstream languages ship with a JSON parser in their standard library. The shell, however, has no built-in JSON parser, and the usual workaround of combining awk and sed is painful to write and fragile. jq, a command-line JSON processor, fills this gap.

There are many JSON parsers apart from jq but, in this article, we will focus only on this option.

Installation

jq is a single binary with no runtime dependencies, so installation is as simple as downloading the binary from https://stedolan.github.io/jq/, copying it to /bin or /usr/bin, and making it executable. Many Linux distributions also provide jq in their repositories, in which case installing it is as easy as running one of the following commands:

sudo apt install jq

…or:

sudo pacman -S jq

Installation instructions may vary depending upon the distribution. Detailed instructions are available at https://stedolan.github.io/jq/download/.

Usage

For this demonstration, version 1.5 of jq was used. All the code examples are available at https://github.com/jatindhankhar/jq-tutorial. jq can read JSON piped from other tools such as cat and curl, or read it directly from a file; piping is the more common approach in practice. Two resources are particularly useful when working with jq: the manual at https://stedolan.github.io/jq/manual/, and the online playground at https://jqplay.org/, where one can experiment with jq and share snippets.

Throughout this article, we will use different API endpoints of the MetaWeather API (https://www.metaweather.com/api). The simplest use of jq is to pretty-print JSON data.

Let’s fetch the list of cities that contain the word ‘new’ in them, and then use this information to further fetch details of a particular city, as follows:

curl -sS https://www.metaweather.com/api/location/search/?query=new

The above command will fetch all cities containing ‘new’ in their name. At this point, the output is not formatted.

[{"title":"New York","location_type":"City","woeid":2459115,"latt_long":"40.71455,-74.007118"},{"title":"New Delhi","location_type":"City","woeid":28743736,"latt_long":"28.643999,77.091003"},{"title":"New Orleans","location_type":"City","woeid":2458833,"latt_long":"29.953690,-90.077713"},{"title":"Newcastle","location_type":"City","woeid":30079,"latt_long":"54.977940,-1.611620"},{"title":"Newark","location_type":"City","woeid":2459269,"latt_long":"40.731972,-74.174179"}]

Let’s pretty format by piping the curl output to jq as follows:

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.'

The screenshot shown in Figure 1 compares the output of both commands.

Now that we have some data to work with, we can use jq filters to pick out parts of it. The simplest filter is '.', the identity filter, which passes the whole document through unchanged. Filters are passed to jq inside single quotes so that the shell does not interpret them. Looking at the output, we can see that all the objects are wrapped inside a JSON array. To iterate over the array, we use .[], which emits each item in turn. To target a specific item, we place its index inside the brackets; for example, .[0] selects the first item.

To display the first item, use the following code:

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[0]'

{
  "title": "New York",
  "location_type": "City",
  "woeid": 2459115,
  "latt_long": "40.71455,-74.007118"
}

To display only the available cities, we add another filter, which is the key name itself (in our case, .title). We can combine multiple filters using the | (pipe) operator.

Here we combine the .[] filter with .title in this way: .[] | .title . For simple queries, we can avoid the | operator and rewrite it as .[] .title, but we will use the | operator to combine queries.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | .title'

"New York"
"New Delhi"
"New Orleans"
"Newcastle"
"Newark"
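The equivalence of the piped form and the shorthand form can be checked offline. The following is a minimal sketch that assumes a trimmed, hand-written copy of the search response (the cities variable is sample data, not a live API call):

```shell
# Sample data: a trimmed copy of the search response shown earlier.
cities='[{"title":"New York","woeid":2459115},{"title":"New Delhi","woeid":28743736}]'

# Explicit pipe between the two filters.
piped=$(printf '%s' "$cities" | jq '.[] | .title')

# Shorthand path form of the same query.
short=$(printf '%s' "$cities" | jq '.[].title')

printf '%s\n' "$piped"
[ "$piped" = "$short" ] && echo "both forms agree"
```

Both commands should print the same list of titles, so the choice between them is mostly a matter of readability.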

But what if we want to display multiple keys together? Just separate them by ‘,’.

Now, let’s display the city along with its ID (woeid):

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | .title,.woeid'

"New York"
2459115
"New Delhi"
28743736
"New Orleans"
2458833
"Newcastle"
30079
"Newark"
2459269

The output looks good, but what if we want to format it and print each city on a single line? For that we can use string interpolation. Inside a string, an expression wrapped in \( and ) is evaluated and its result is inserted into the string.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | "For \(.title) code is \(.woeid)"'

"For New York code is 2459115"
"For New Delhi code is 28743736"
"For New Orleans code is 2458833"
"For Newcastle code is 30079"
"For Newark code is 2459269"

In our case the JSON is small, but with a large document we may need to filter items by the value of a key (for example, to display only the information for New Delhi). jq provides the select function for this; it passes an item through only if the given condition is true.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | select(.title == "New Delhi")'

{
  "title": "New Delhi",
  "location_type": "City",
  "woeid": 28743736,
  "latt_long": "28.643999,77.091003"
}
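In a shell script, the city name usually lives in a variable rather than in the filter itself. The sketch below shows one way to pass it in with jq's --arg option and capture the resulting ID; the cities sample and the city variable are illustrative assumptions, not part of the original examples:

```shell
# Trimmed copy of the search response shown earlier (sample data, not a live call).
cities='[{"title":"New York","woeid":2459115},{"title":"New Delhi","woeid":28743736}]'

city="New Delhi"   # hypothetical shell variable holding the city to look up

# --arg binds the shell value to the jq variable $city as a string;
# -r prints the result without JSON quoting.
woeid=$(printf '%s' "$cities" | jq -r --arg city "$city" '.[] | select(.title == $city) | .woeid')

# Build the location URL from the captured ID.
echo "https://www.metaweather.com/api/location/$woeid/"
```

Passing the value through --arg avoids splicing shell variables into the filter text, which tends to break as soon as the value contains quotes.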

Now that we have the Where on Earth ID (woeid) for New Delhi, we can retrieve more information about the city from the endpoint https://www.metaweather.com/api/location/woeid/, replacing woeid with the actual ID.

The JSON structure for this endpoint looks like what’s shown in Figure 3.

The consolidated_weather key contains an array of JSON objects with weather information, and the sources key contains an array of JSON objects describing the sources from which that weather information was fetched.

This time, let’s store JSON in a file named weather.json instead of directly piping data. This will help us avoid making an API call every time we want to perform an operation and, instead, we can use the saved JSON.

curl -sS https://www.metaweather.com/api/location/28743736/ > weather.json

Now we can run jq in the form jq 'filters' weather.json. We can also load filters from a file with the -f option, as in jq -f filters.txt weather.json, but for now we will pass the filters on the command line.

Let's list each forecast next to a source name. Since sources and consolidated_weather are of the same length (which can be checked with the length filter), we can use range to generate indexes and use string interpolation to combine the two. jq also has transpose and map built in, among many others; covering all of them is not possible in a single article.

jq 'range(0;([.sources[]] | length)) as $i | " \(.sources[$i] .title) predicts \(.consolidated_weather[$i] .weather_state_name)"' weather.json

" BBC predicts Light Cloud"
" Forecast.io predicts Clear"
" Met Office predicts Clear"
" OpenWeatherMap predicts Clear"
" World Weather Online predicts Clear"
" Yahoo predicts Clear"
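As an alternative to indexing with range, the same pairing can be sketched with transpose. This is only an illustration: the sample document below is a hypothetical two-entry stand-in that borrows the key names of weather.json, not real API output.

```shell
# Hypothetical two-entry sample with the same keys as weather.json.
sample='{"sources":[{"title":"BBC"},{"title":"Met Office"}],"consolidated_weather":[{"weather_state_name":"Light Cloud"},{"weather_state_name":"Clear"}]}'

# Put both arrays side by side, transpose them into [source, forecast] pairs,
# then format each pair with string interpolation.
paired=$(printf '%s' "$sample" | jq -r '[.sources, .consolidated_weather] | transpose | .[] | "\(.[0].title) predicts \(.[1].weather_state_name)"')

printf '%s\n' "$paired"
```

With transpose there is no index variable to manage, at the cost of addressing the two halves of each pair as .[0] and .[1].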

jq has many more functions and filters than we can cover here. We will use sort_by and the date functions, and end this article by printing the forecast for each day in ascending order of date.

# Format a date
# Takes the date as an argument x, for example format_date(.applicable_date)
def format_date(x):
  x | strptime("%Y-%m-%d") | mktime | strftime("%a - %d, %B");

# Print the location name and coordinates of the current input
def print_location:
  . | "
Location: \(.title)
Coordinates : \(.latt_long) ";

# Print one forecast entry as a small text box
def print_data:
  . | "
------------------------------------------------
| \(format_date(.applicable_date))\t\t |
| Humidity : .\(.humidity)\t\t |
| Weather State: \(.weather_state_name)\t\t\t |
------------------------------------------------";

# Sort the forecasts by date and print each one
def process_weather_data:
  . | sort_by(.applicable_date)[] | print_data;

. as $root | print_location, (.consolidated_weather | process_weather_data)

Save the above code as filter.txt.

sort_by sorts the array by date. format_date takes a date as a parameter and extracts the short day name, the day of the month and the month name. print_location and print_data take no parameters and can be applied after the pipe operator; a function without parameters simply operates on its input, '.'.
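The sorting and date formatting steps can be tried in isolation before running the full filter file. The following sketch uses a hypothetical two-entry sample that borrows the applicable_date and weather_state_name keys; the dates are made up for illustration:

```shell
# Hypothetical sample: two forecasts listed out of date order.
sample='[{"applicable_date":"2018-05-24","weather_state_name":"Clear"},{"applicable_date":"2018-05-23","weather_state_name":"Light Cloud"}]'

# sort_by orders the array by date; each date string is then parsed and
# re-rendered as "short day name - day of month, month name".
days=$(printf '%s' "$sample" | jq -r 'sort_by(.applicable_date)[] | .applicable_date | strptime("%Y-%m-%d") | mktime | strftime("%a - %d, %B")')

printf '%s\n' "$days"
```

Because the dates are in year-month-day form, sorting them as plain strings already yields chronological order, which is why sort_by can work on the raw field.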

jq -f filter.txt weather.json -r

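The -r option prints raw strings, that is, without the surrounding JSON quotes and with escape sequences such as \t rendered as actual characters. The output is shown in Figure 4.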
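The effect of -r is easiest to see on a single string. A minimal sketch, assuming a one-key sample document (doc is illustrative only):

```shell
# One-key sample document for illustration.
doc='{"title":"New Delhi"}'

# Without -r the result is a JSON string, quotes included.
quoted=$(printf '%s' "$doc" | jq '.title')

# With -r the same string is printed as plain text.
raw=$(printf '%s' "$doc" | jq -r '.title')

printf 'without -r: %s\nwith -r:    %s\n' "$quoted" "$raw"
```

Raw output is usually what a shell script wants when the value is going to be stored in a variable or passed to another command.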

I hope this article has given you an overview of all that jq can achieve. If you are looking for a tool that is easy to use in shell scripts, jq can help you out; so give it a try.

2 COMMENTS

  1. Hi Jatin

    If you don't mind, I'd like to ask you a question related to this post.

    How do you pretty-print the JSON body if the curl command is run with the "-i" argument?

    -i, --include Include protocol headers in the output (H/F)

    Example:

    curl -sS -i https://www.metaweather.com/api/location/search/\?query\=new | jq '.[0]'
    parse error: Invalid numeric literal at line 1, column 9

    In this case I'd like to print both the headers and the pretty-printed JSON, something like the example below:

    HTTP/1.1 200 OK
    x-xss-protection: 1; mode=block
    Content-Language: en
    x-content-type-options: nosniff
    strict-transport-security: max-age=2592000; includeSubDomains
    Vary: Accept-Language, Cookie
    Allow: GET, HEAD, OPTIONS
    x-frame-options: DENY
    Content-Type: application/json
    X-Cloud-Trace-Context: 1e8ed110bc408086a359d5fc8d074ced
    Date: Thu, 17 May 2018 10:51:33 GMT
    Server: Google Frontend
    Content-Length: 475
    {
      "title": "New York",
      "location_type": "City",
      "woeid": 2459115,
      "latt_long": "40.71455,-74.007118"
    }

    Could you please help me with this question?

    KR/Airam

  2. Hi Airam,

    I am afraid there is no one-liner to accomplish this.
    You can save the output to a variable and then print the headers and the content separately.
    A big thanks to this script: https://gist.github.com/cirla/1c0411c1dc1bb2fe0e9f
    To print only the pretty-printed JSON, use this:

    curl -sS -i 'https://www.metaweather.com/api/location/search/?query=new' | sed "1,/^\s*$(printf '\r')*$/d" | jq '.[0]'
    {
      "title": "New York",
      "location_type": "City",
      "woeid": 2459115,
      "latt_long": "40.71455,-74.007118"
    }

    To print only the headers:

    curl -sS -i 'https://www.metaweather.com/api/location/search/?query=new' | sed "/^\s*$(printf '\r')*$/q"
    HTTP/2 200
    x-xss-protection: 1; mode=block
    content-language: en
    x-content-type-options: nosniff
    strict-transport-security: max-age=2592000; includeSubDomains
    vary: Accept-Language, Cookie
    allow: GET, HEAD, OPTIONS
    x-frame-options: DENY
    content-type: application/json
    x-cloud-trace-context: ece3c63b6070f0b0d545fba8f51d2cf7
    date: Wed, 23 May 2018 12:42:10 GMT
    server: Google Frontend
    content-length: 475

    To print both, you can use a short script.

    Here is a modified version of the original gist: https://gist.github.com/jatindhankhar/0c1d6c5b2d3f7c7520c4b4afba916cfc
    You can modify it and make an alias that takes the URL as a parameter; when I tried that, I ran into URL-encoding issues and the script did not work.

    response=$(curl -sS -i 'https://www.metaweather.com/api/location/search/?query=new')

    echo "$response" | sed "/^\s*$(printf '\r')*$/q"

    echo "$response" | sed "1,/^\s*$(printf '\r')*$/d" | jq '.[0]'

    If you have more queries, let me know. 
    Happy to help :)
    Thanks,

    Jatin Dhankhar
