Getting the Most Out of PostgreSQL

0
4011

The amazing thing about the PostgreSQL RDBMS is that it has been developed entirely by a worldwide team of volunteers. It is not owned by any corporation or entity, and is totally free of cost. Its source code is available for anyone to modify and use as one pleases. Let’s take a look at how to use it to our advantage.

PostgreSQL is a relational database management system (RDBMS). It is free and open source software, and its source code is available under the PostgreSQL licence, which is a liberal open source licence. Users are free to use, modify and distribute PostgreSQL in any form.

This article is logically divided into three sections. The first section is PostgreSQL specific and covers its installation, data types and DB related operations. The second section covers basic SQL, and the third discusses advanced SQL.

An overview of PostgreSQL and its salient features

PostgreSQL was designed to run on UNIX-like platforms. However, it was also designed to be portable so that it could run on various platforms such as Mac OS X, Solaris and Windows.

The salient features of PostgreSQL are:

1) Supports user-defined data types

2) Supports sophisticated locking mechanisms

3) Supports nested transactions (checkpoints)

4) Supports multi-version concurrency control

5) Supports asynchronous replication

Environment setup

You need to perform the following steps to set up the PostgreSQL development environment.

1) Visit https://www.enterprisedb.com/downloads/postgres-postgresql-downloads link.

2) Select the appropriate version and OS, and start downloading. I have downloaded ‘postgresql-10.1-3-linux-x64.run’.

3) Make the downloaded script executable and execute it as a root user to begin installation:

$ chmod +x postgresql-10.1-3-linux-x64.run

$ sudo ./postgresql-10.1-3-linux-x64.run

4) Follow the on-screen instructions to complete the installation.

Basic database operations

In my environment, PostgreSQL is installed in the /opt/PostgreSQL directory. All its utilities are located under the bin directory. Include this directory under the PATH variable using the following command:

$ export PATH=$PATH:/opt/PostgreSQL/10/bin

Let us create a PostgreSQL database using the createdb command:

$ createdb -h localhost -p 5432 -U postgres testdb

In the above command:

  • The -h option indicates the host’s FQDN/IP address
  • The – p option indicates port number
  • The – U option indicates the database user
  • – testdb indicates the database’s name

Note: Hereafter, we’ll use the psql utility, which is the command line PostgreSQL client, to perform DB operations. psql supports SQL queries as well its own commands. All psql commands start with a backslash character—for instance, \h, \q and so on. Also, the psql prompt is denoted by ‘postgres=#’ when it is not connected to any database, or ‘testdb=#’ when it is connected to ‘testdb’.

To access or connect to a database, we can use the psql SQL prompt. To connect to the database, execute the following command:

$ psql -U postgres

To list all available DBs, execute the following command in the psql prompt:

postgres=# \l

List of databases

Name | Owner | Encoding | Collate | Ctype | Access privileges

-----+-------+----------+---------+-------+------------------

testdb | postgres | UTF8 | en_IN | en_IN |

This command lists our ‘testdb’ database.

We can create multiple databases according to our requirements. To perform any operation on the database, we first have to establish a connection with it. Execute the command given below with the psql prompt to do this:

postgres=# \c testdb

Upon successful connection, the following messages will be shown in the terminal:

You are now connected to database “testdb” as user “postgres”. Also prompt will change to ‘testdb=#’.

To delete a database, we can use the DROP DATABASE statement as follows:

postgres=# DROP DATABASE testdb;

Note: Please ensure that each SQL query is terminated by a semicolon when using the psql prompt

PostgreSQL data types

Along with basic data types PostgreSQL supports many advanced data types. In this section, we’ll discuss some of the important ones.

1) Boolean: Valid value for this data type is true or false.

2) Character: This data type is further divided into three sub-categories:

a) CHAR(n): This is a fixed-length data type. If you insert a string that is shorter than the length of the column, PostgreSQL pads spaces to it.

b) VARCHAR(n): This is a variable-length data type. PostgreSQL does not pad spaces when the stored string is shorter than the length of the column.

c) TEXT: This is also a variable-length data type. Theoretically, text data is a character string with unlimited length.

3) Numeric: This data type is further divided into two sub-categories:

a) Integer: Integers can be 2- or 4-byte long. 2-byte integers are denoted by SMALLINT and the valid range is -32768 to 32767. 4-byte integers are denoted by INT and the valid range is -214783648 to +214783647.

b) Floating point number: Single precision floating point numbers are denoted by float, whereas double precision floating point numbers are denoted by real or float8.

4) Temporal data types: Temporal data types allow you to store date and time. PostgreSQL supports the following temporal data types:

a) DATE: Stores date

b) TIME: Stores time

c) TIMESTAMP: Stores date and time

d) TIMESTAMPZ: Stores both timestamp and time zone data

e) INTERVAL: Stores periods of time

5) Arrays: PostgreSQL can store arrays of strings, integers and so on.

6) JSON: This data type is further divided into two sub-categories:

a) JSON: This is plain JSON and requires re-parsing for each operation.

b) JSONB: This stores JSON data in binary format. Unlike JSON, it supports indexing.

7) UUID: This data type allows you to store Universal Unique Identifiers defined by RFC-4122.

8) Special data types: Besides the primitive data types, PostgreSQL also provides several special geometric and network related data types as listed below:

a) box: A rectangular box

b) line: A set of points

c) point: A geometric pair of numbers

d) lseg: A line segment

e) polygon: A closed geometric

f) inet: IP4 address

g) macaddr: A MAC address

SQL queries in PostgreSQL

In this section, we’ll discuss CRUD operations on tables. We’ll discuss various SQL statements with relevant examples. SQL is a case-insensitive language. However, in this article, all SQL keywords are denoted using uppercase letters.

Managing tables

We can use the CREATE TABLE statement to create a new table. Its syntax is as follows:

CREATE TABLE table_name (

column_name1 column_data_type,

column_name2 column_data_type,

...

...

column_name2 column_data_type,

);

Let us create a table to store student information, as follows:

testdb=# CREATE TABLE student_table (

id SMALLINT,

name VARCHAR(32),

age SMALLINT,

address VARCHAR(128),

department VARCHAR(128)

);

To list a table, use the psql command as follows:

testdb=# \d

We can use the DROP TABLE statement to remove a table and its data from the database permanently. The syntax is as follows:

DROP TABLE table_name;

For instance, to drop student_table, execute the command given below:

testdb=# DROP TABLE student_table;

We can use the ALTER TABLE statement to update table properties, like its name, columns, data type and so on. To rename a table, use the RENAME TO statement as follows:

testdb=# ALTER TABLE student_table RENAME TO new_student_table;

To remove a column from the table definition, use the DROP COLUMN statement as follows:

testdb=# ALTER TABLE student_table DROP COLUMN AGE;

To add a new column to the table definition, use the ADD COLUMN statement as follows:

testdb=# ALTER TABLE student_table ADD COLUMN AGE SMALLINT;

To remove all data from a table, use the TRUNCATE TABLE statement as follows:

testdb=# TRUNCATE TABLE student_table;

Working with rows

We can use the INSERT INTO statement to insert data into a table. Its syntax is as follows:

INSERT INTO table_name (column1, column2, ... columnN) VALUE(value1, value2, ... valueN);

For instance, the statement below inserts a new row into the table:

testdb=# INSERT INTO student_table (id, name, age, address, department) VALUES(1

, ‘Jarvis’, 21, ‘Earth’, ‘CS’);

To update an existing record, use the UPDATE statement. Its syntax is as follows:

UPDATE table_name

SET column1 = value1, column2 = value2...., columnN = valueN

WHERE [condition];

For instance, the statement below updates the age of the student whose ID is ‘1’:

testdb=# UPDATE student_table

SET age=22

WHERE id=1;

To delete a record, we can use the DELETE FROM statement. Its syntax is as follows:

DELETE FROM table_name

where [condition];

For instance, the statement below deletes the row with the ID ‘1’:

testdb=# DELETE FROM student_table WHERE id=1;

Querying data

We can use the SELECT statement to query data from the table. Its syntax is as follows:

SELECT column1, column2 ... columnN FROM table_name

WHERE [condition];

For instance, the statement below lists the names of the students aged ‘21’:

testdb=# SELECT name FROM student_table WHERE age=21;

We can use the ORDER BY clause to sort records. To sort by ascending order, we have to use the ASC keyword and to sort by descending order, we have to use the DESC keyword. Its syntax is as follows:

SELECT column1, column2...columnN

FROM table_name

[WHERE condition]

[ORDER BY column1, column2, .. columnN] [ASC | DESC];

For instance, the statement below sorts a column in the ascending order by age:

testdb=# SELECT * FROM student_table ORDER BY age ASC;

We can use the DISTINCT keyword to eliminate duplicate records. For instance, the following statement lists only unique values:

testdb=# SELECT DISTINCT * FROM student_table;

Advanced SQL

In this section, we’ll discuss some of the advanced features of PostgreSQL like built-in functions, transactions, the import/export feature and programming language interfaces.

Built-in functions

PostgreSQL has many useful built-in functions and this section discusses some of the more important among them.

1. COUNT: As the name suggests, this is used to count the number of records in a table. The statement below shows the usage:

testdb=# SELECT COUNT(*) FROM student_table;

2. MAX: This is used to find out the records with the maximum value. For instance, the statement below finds the record with the maximum age:

testdb=# SELECT MAX(age) FROM student_table;

3. MIN: This is used to find out the records with the minimum value. For instance, the statement below finds the record with the minimum age:

testdb=# SELECT MIN(age) FROM student_table;

4. AVG: This is used to calculate the average of a record set. For instance, the statement below calculates the average of the ages:

testdb=# SELECT AVG(age) FROM student_table;

5. SUM: This function is used to calculate the sum of the record set. For instance, the statement below calculates the sum of the ages:

testdb=# SELECT SUM(age) FROM student_table;

6. ARRAY_AGG: The array aggregate function is used to concatenate the input values into an array as shown below:

testdb=# SELECT ARRAY_AGG(age) FROM student_table;

This statement will show the aggregate values as follows:

{21,22,22,21}

In addition to this, PostgreSQL supports various numeric and string related functions. You can refer to the official documentation to know more details.

Transactions

Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. Transactions ensure data integrity.

PostgreSQL supports the following transaction control commands:

1) BEGIN: Starts a new transaction

2) COMMIT: Saves changes to databases

3) ROLLBACK: Rolls back the changes

Note that transactional control commands are only used with the DML statements like INSERT, UPDATE and DELETE. They cannot be used while creating tables or dropping them, because these operations are automatically committed in the database.

Let us look at examples of rollback first. In this example, we’ll perform the following actions:

1) Begin the transaction

2) Delete a record from the database

3) Using the SELECT query, ensure that the record is deleted

4) Roll back a transaction

5) Using the SELECT query, ensure that the record is added back

testdb=# BEGIN; # Step-1

testdb=# DELETE FROM student_table WHERE id = 1; # Step-2

testdb=# SELECT * FROM student_table; # Step-3

testdb=# ROLLBACK; # Step-4

testdb=# SELECT * FROM student_table; # Step-5

Let us look at an example of commit. In this example, we’ll perform the following actions:

1) First, begin the transaction

2) Delete a record from the database

3) Commit the transaction

testdb=# BEGIN; # Step-1

testdb=# DELETE FROM student_table WHERE id = 1; # Step-2

testdb=# COMMIT; # Step-3

Import and export databases

We can export table data into a CSV file. The example given below shows this.

To export the entire table, execute the following command:

testdb=# COPY student_table TO ‘/tmp/student_table.csv’ DELIMITER ‘,’ CSV HEADER;

We can also export only certain columns from the table. For instance, the command given below exports name and age columns only:

testdb=# COPY student_table(name, age) TO ‘/tmp/student_table.csv’ DELIMITER ‘,’ CSV HEADER;

We can export a table without a header, i.e., column names. To achieve this, exclude the HEADER keyword in the SQL command, as shown below:

testdb=# COPY student_table TO ‘/tmp/student_table.csv’ DELIMITER ‘,’ CSV;

In addition to this, we can also import data from CSV files. The statement below shows this:

testdb=# COPY student_table FROM ‘/tmp/student_table.csv’ DELIMITER ‘,’ CSV HEADER;

Python language interface

PostgreSQL supports programming interfaces in C/C++, Java, Python, Perl and PHP. Using these interfaces, we can perform operations on databases. This section discusses the Python language interface. PostgreSQL can be accessed in Python using the psycopg2 module. To install it, execute the following command in a terminal:

$ sudo apt-get install python-psycopg2

Now, let us write a Python program to display the table of contents:

#!/usr/bin/python

import psycopg2

# Step-1: Establish connection with DB

conn = psycopg2.connect(database = “testdb”,

user = “postgres”,

password = “my-password”,

host = “127.0.0.1”,

port = “5432”)

# Step-2: Create a cursor to perform operation on DB

cur = conn.cursor()

# Step-3: Execute select query

cur.execute(“SELECT id, name from student_table”)

cnt=1

rows = cur.fetchall()

for row in rows:

print “Record-%d:” %cnt

print “\tID : “, row[0]

print “\tNAME: “, row[1]

cnt += 1

# Step-4: Close DB connection

conn.close()

Note that this program doesn’t modify the database; hence, the commit operation is not required. If any operation modifies the database, like INSERT, UPDATE, DELETE and so on, then we have to execute the commit() method after modification; otherwise, the changes won’t be reflected in the database. The Python program given below shows the usage of the commit method:

#!/usr/bin/python

import psycopg2

# Step-1: Establish connection with DB

conn = psycopg2.connect(database = “testdb”,

user = “postgres”,

password = “my-password”,

host = “127.0.0.1”,

port = “5432”)

# Step-2: Create a cursor to perform operation on DB

cur = conn.cursor()

# Step-3: Execute insert query

cur.execute(‘’’INSERT INTO student_table VALUES(

4, ‘Twitty’, 20, ‘Venus’, ‘CS’)

This article has just given a few tips on how best to use PostgreSQL. There is a lot more that can be done with this excellent database.

LEAVE A REPLY

Please enter your comment!
Please enter your name here