The Complete Magazine on Open Source

Visualisation Tools in Python



The value of data is enhanced by visualisation. With numerous methods of visualisation available, one would like to choose an option that gives the best results. The versatile Python programming language offers one such option.

With new domains emerging daily in computer science, data has come to play a major role in everyday life. Several emerging fields are dedicated to data-related activities, such as data wrangling, data mining, Web analytics and data analysis. Data visualisation has grown steeply as well, because data also needs to be represented graphically, in a way that people can understand. Many tools now offer a range of graphical presentation styles that make the task of data visualisation much easier.

Python has become established in many areas of data search and analysis. It also has many tools and libraries that make visualisation much easier to implement.

The Python tools and libraries available for data visualisation include Matplotlib, Pandas, Seaborn, ggplot, Bokeh, Pygal and Plotly, with NumPy and SciPy handling the numerical work underneath. Some of the visualisations that can be produced with these tools and libraries are histograms, box plots, violin plots, bar charts, line charts, stacked column charts, scatter plots, bubble plots, pie charts and heat maps.
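To give a feel for three of the chart types listed above, here is a minimal sketch that draws a histogram, a box plot and a violin plot of the same values with Matplotlib alone. The function name draw_distribution_plots is hypothetical, and the sample values are randomly generated for illustration, not taken from any real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_distribution_plots(values):
    """Draw a histogram, a box plot and a violin plot of the same values side by side."""
    fig, (ax_hist, ax_box, ax_violin) = plt.subplots(1, 3, figsize=(12, 4))
    ax_hist.hist(values, bins=20)      # histogram: how many values fall in each of 20 bins
    ax_hist.set_title("Histogram")
    ax_box.boxplot(values)             # box plot: median, quartiles and outliers
    ax_box.set_title("Box plot")
    ax_violin.violinplot(values)       # violin plot: a smoothed density outline of the values
    ax_violin.set_title("Violin plot")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    # Hypothetical sample: 500 normally distributed values from a seeded generator.
    rng = np.random.default_rng(seed=0)
    sample = rng.normal(loc=50, scale=10, size=500)
    draw_distribution_plots(sample).savefig("distribution_plots.png")
```

The script saves a PNG file; calling plt.show() instead of savefig opens the figure in a window, as the article's own scripts do.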


Figure 1: Aluminium traded contracts (in lots) for 2014, 2015, 2016


Figure 2: Bar chart of traded aluminium contracts

The tools mentioned above can be used from Python alone, but there are other ways to achieve visualisation. Various JavaScript libraries and Web frameworks can serve the same purpose. For example, D3.js or Chart.js can be used in combination with Python, and the Django Web framework is another good option, which can also be used to build complete Web apps in Python. These days, data scientists and analysts make extensive use of these tools in combination with others to achieve the visualisation they want, with JavaScript, HTML5 and CSS adding styling to suit one’s taste. Let’s look at some examples using these tools.
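Before turning to those examples, here is a brief sketch of how data is commonly handed from Python to a JavaScript charting library: serialise it as JSON. This sketch builds a dict in the shape of a Chart.js bar chart configuration (type, data.labels, data.datasets). The function name to_chartjs_bar_config and the numbers in the demo are hypothetical placeholders, and the HTML page that loads Chart.js is not shown.

```python
import json


def to_chartjs_bar_config(labels, values, series_label):
    """Build a dict in the shape of a Chart.js bar chart configuration."""
    return {
        "type": "bar",
        "data": {
            # Convert labels (e.g. years) to strings so they serialise as category names.
            "labels": [str(label) for label in labels],
            # float() also converts NumPy number types, which json cannot serialise.
            "datasets": [{"label": series_label, "data": [float(v) for v in values]}],
        },
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real trading data.
    config = to_chartjs_bar_config([2014, 2015, 2016], [1, 2, 3], "Traded contracts")
    print(json.dumps(config, indent=2))
```

On the Web page, the resulting object would be passed as the second argument to new Chart(canvas, config).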
First, make sure your system has the following setup:

OS: Windows 7
Python version: 3.5.1
Editor: Notepad++
Now, take the following steps.
1. Install the required tools and libraries.
2. Use Pip to install Pandas, Seaborn and the other libraries.
3. The command to use is python -m pip install XXXX. Replace XXXX with the name of each library.
4. Create a folder named pydev and, inside it, an empty file named python_visualization.py.
5. Place the CSV (or other data) file that you want to visualise in the pydev folder.
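Before running the scripts, it can help to confirm that the libraries can be imported by the Python interpreter you plan to use. This small sketch relies only on the standard library’s importlib.util.find_spec; the list of module names is an assumption based on the imports in this article.

```python
import importlib.util

# Module names as imported by this article's scripts (assumed list).
REQUIRED_MODULES = ["pandas", "seaborn", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Install with 'python -m pip install <name>':", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

For these three libraries, the Pip package name and the import name are the same, so the names printed can be passed straight to pip.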

Note: Some libraries may need to be downloaded manually before Pip can install them. If you use Anaconda, it comes with a large suite of libraries already installed.

In our first example, we use a CSV file containing the records to be visualised and generate a bar chart from it.
The CSV file was downloaded from one of the commodity trading indices and shows the volume of contracts traded in each of three financial years: 2014, 2015 and 2016. We will plot the year against the number of contracts traded, as follows:

File: python_visualization.py
import pandas as pd                  # line 1
import seaborn as sns                # line 2
import matplotlib.pyplot as plt      # line 3
commodity = pd.read_csv("Historical_Quantity_Data.csv")    # line 4: load the CSV into a DataFrame
commodity = commodity.sort_values("Traded Contracts (in Lots)", ascending=False)    # line 5: largest volume first

sns.set_style("darkgrid")    # line 6
bar_plot = sns.barplot(x=commodity["Year"], y=commodity["Traded Contracts (in Lots)"],
                       palette="muted",
                       order=commodity["Year"].tolist())    # line 7: bars follow the sorted order
plt.xticks(rotation=90)    # line 8: vertical x-axis tick labels
plt.show()    # line 9

We import the necessary libraries in Lines 1, 2 and 3. Here we are using Pandas, Seaborn and Matplotlib.

In Line 4, we declare a variable commodity and assign it the contents of the CSV file, which Pandas (imported under the alias pd) reads into a DataFrame. Next, in Line 5, we sort the rows by the ‘Traded Contracts (in Lots)’ column in descending order. In Line 6, we set the grid style to ‘darkgrid’. In Line 7, we draw the bar plot, with the years on the x-axis and the traded contracts on the y-axis, and store the returned Matplotlib Axes object in bar_plot. The palette argument selects Seaborn’s ‘muted’ colour palette, and the order argument lists the years in their sorted order, so the bars run from the highest trading volume to the lowest. In Line 8, we rotate the x-axis tick labels by 90 degrees so that they are drawn vertically. Finally, in Line 9, we call the show function to display the bar plot in a window.

Now run the program and you will get the image shown in Figure 2.
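For comparison, roughly the same chart can be drawn with Pandas and Matplotlib alone. This is a sketch, not the method used above: it assumes the same two column names, sorts with DataFrame.sort_values (which replaced the DataFrame.sort method removed from newer Pandas releases), and the function name plot_traded_contracts is hypothetical.

```python
import matplotlib.pyplot as plt

VALUE_COLUMN = "Traded Contracts (in Lots)"


def plot_traded_contracts(commodity):
    """Draw one bar per year, tallest first, and return the Matplotlib Axes."""
    # Sort rows so the year with the most traded contracts comes first.
    ordered = commodity.sort_values(VALUE_COLUMN, ascending=False)
    fig, ax = plt.subplots()
    # String labels make Matplotlib treat the years as categories, kept in sorted order.
    ax.bar(ordered["Year"].astype(str), ordered[VALUE_COLUMN])
    ax.set_xlabel("Year")
    ax.set_ylabel(VALUE_COLUMN)
    ax.tick_params(axis="x", labelrotation=90)  # vertical tick labels, like plt.xticks(rotation=90)
    return ax
```

A typical call would be plot_traded_contracts(pd.read_csv("Historical_Quantity_Data.csv")) followed by plt.show(). Seaborn adds the dark grid and palette for free; plain Matplotlib gives finer control with fewer dependencies.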

This is one example of how we can generate bar charts easily.
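The bar chart script depends on the column names ‘Year’ and ‘Traded Contracts (in Lots)’; if they differ in your copy of the file, even by a stray space, Pandas raises a KeyError. Here is a minimal sketch of a loader that checks the columns up front and reports what it found. The helper name load_commodity_data is hypothetical, and stripping header whitespace is an added assumption about how such files may be malformed.

```python
import pandas as pd

# Column names used by python_visualization.py.
EXPECTED_COLUMNS = ["Year", "Traded Contracts (in Lots)"]


def load_commodity_data(source):
    """Read the CSV and raise a descriptive error if an expected column is missing."""
    frame = pd.read_csv(source)
    # Strip stray spaces around header names, a common cause of KeyError.
    frame.columns = frame.columns.str.strip()
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing column(s) {missing}; found {list(frame.columns)}")
    return frame
```

It can replace Line 4 of the script: commodity = load_commodity_data("Historical_Quantity_Data.csv").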


Figure 3: Sample list of private airlines


Figure 4: Scatter plot for ID vs private airlines based on length of name

Next, let’s look at an example of a scatter plot visualisation in Python. For this example, we have sample data listing private airlines in CSV format (Figure 3). We will plot a scatter diagram of the length of each airline’s name against its ID.

File: Scatter_visualization.py

# Import the libraries.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Read in the airlines data (no header row; every value is read as text).
airlines = pd.read_csv("airlines.csv", header=None, dtype=str)
# Provide column header names.
airlines.columns = ["id", "name"]

# Length of each airline name, in characters.
name_lengths = airlines["name"].apply(lambda x: len(str(x)))
# Plot a scatter diagram: x axis shows the ids, y axis the length of the names.
data = pd.DataFrame({"lengths": name_lengths, "ids": airlines["id"].astype(int)})
sns.jointplot(x="ids", y="lengths", data=data)
plt.show()

Once you run the above program, you will get the result shown in Figure 4.
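The apply call with a lambda works, but Pandas also offers the vectorised Series.str.len() for the same job. This sketch computes the name lengths that way and draws a plain Matplotlib scatter plot, which corresponds to the central panel of Seaborn’s jointplot without the marginal histograms. It assumes the same two-column, header-less airlines.csv, and both function names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def load_airline_name_lengths(source):
    """Read a header-less id,name CSV and return a DataFrame of ids and name lengths."""
    airlines = pd.read_csv(source, header=None, names=["id", "name"], dtype=str)
    return pd.DataFrame({
        "ids": airlines["id"].astype(int),
        # str.len() counts the characters in each name without a Python-level loop.
        "lengths": airlines["name"].astype(str).str.len(),
    })


def plot_name_lengths(data):
    """Scatter plot of name length (y) against airline id (x); returns the Axes."""
    fig, ax = plt.subplots()
    ax.scatter(data["ids"], data["lengths"], s=10)
    ax.set_xlabel("ids")
    ax.set_ylabel("lengths")
    return ax
```

Usage: plot_name_lengths(load_airline_name_lengths("airlines.csv")), then plt.show().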

Hint: Many sites offer datasets in various formats. One is the Data.gov catalogue at https://catalog.data.gov/dataset?res_format=CSV. The catalogue lists data in formats such as JSON, XML and CSV, and the res_format=CSV parameter in this link narrows the list to CSV files.
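Because such catalogues publish data in several formats, it can be convenient to pick a Pandas reader by file extension. This is a minimal sketch; the helper name read_dataset and the extension mapping are assumptions. Note that pd.read_xml needs Pandas 1.3 or later and, with its default parser, the lxml package.

```python
from pathlib import Path

import pandas as pd

# Pandas reader to use for each file extension (an assumed mapping).
# Names are looked up lazily because pd.read_xml only exists in Pandas 1.3+.
READER_NAMES = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".xml": "read_xml",
}


def read_dataset(path):
    """Load a CSV, JSON or XML file into a DataFrame, choosing the reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READER_NAMES:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    reader = getattr(pd, READER_NAMES[suffix])
    return reader(path)
```

pd.read_csv also accepts an http(s) URL directly, so a downloaded copy is not strictly needed for CSV files.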

Python provides a lot of tools that are simple to use and can generate different kinds of data visualisation. Enjoy coding.