The ability to store data persistently and retrieve it when required is one of the most important aspects of software applications. Data can be stored persistently in various ways, such as in files or databases. Flat files can store data, but retrieving it selectively using filters is cumbersome. Databases came into existence to address this problem of data storage and retrieval. Database management systems are built on different data models, which include:
- Relational data model
- Hierarchical data model
- Network data model
- Object based data model, etc.
Relational database management systems (RDBMS) have dominated the other models mentioned above owing to advantages such as their tabular structure, linked data, integrity constraints and a standardised, uniform language, SQL (Structured Query Language), which makes retrieving the relevant data relatively effortless. A key feature of an RDBMS is its support for the ACID properties of transactions (atomicity, consistency, isolation and durability). However, the rigid structure of an RDBMS, normally considered an advantage, can also be a restriction when mapping heterogeneous real-world data.
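Both RDBMS traits described above, atomic transactions and a fixed schema, can be seen in a minimal sketch using Python's built-in sqlite3 module. SQLite simply stands in for any RDBMS here, and the `bills` table and its columns are hypothetical. A transaction that fails part-way leaves no partial writes, and a field that was never declared in the schema is rejected outright.

```python
import sqlite3


def demo() -> tuple[int, str]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical fixed schema: every row must fit these declared columns.
    conn.execute("CREATE TABLE bills (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

    # Atomicity: both inserts belong to one transaction. The second violates
    # NOT NULL, so the context manager rolls the whole transaction back.
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("INSERT INTO bills (id, amount) VALUES (1, 250.0)")
            conn.execute("INSERT INTO bills (id, amount) VALUES (2, NULL)")
    except sqlite3.IntegrityError:
        pass
    rows = conn.execute("SELECT COUNT(*) FROM bills").fetchone()[0]

    # Rigid structure: a field that is not part of the schema is refused.
    try:
        conn.execute("INSERT INTO bills (id, amount, discount) VALUES (3, 99.0, 5)")
        error = ""
    except sqlite3.OperationalError as exc:
        error = str(exc)
    conn.close()
    return rows, error


rows, error = demo()
print(rows)  # 0: the first insert was rolled back together with the second
print(error)
```

A document store such as CouchDB, discussed below, would accept the extra `discount` field without any schema change.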
NoSQL
NoSQL, which stands for "Not Only SQL", marks a significant shift from the traditional RDBMS approach. The term refers to databases that do not need to follow a fixed schema. NoSQL databases are primarily designed as distributed data stores, which makes them well suited for Web scale applications. Instead of a rigid predefined schema, they store data in more flexible forms such as key-value pairs, documents, columns or graphs. Their design is shaped by the CAP theorem, which states that a distributed data store cannot simultaneously guarantee consistency, availability and partition tolerance; many NoSQL stores therefore relax strong consistency in favour of high availability, scalability and high performance. These databases fall into the following categories:
- Key-value pairs
- Column-oriented
- Graph based
- Document based
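The difference between the first and last of these categories can be sketched in a few lines of Python, with plain dictionaries as hypothetical stand-ins for real stores and made-up sample records. A key-value store treats each value as an opaque blob that can only be fetched by its key, whereas a document store understands the fields inside each value and can filter on them, even when documents carry different fields.

```python
import json

# Key-value store (hypothetical stand-in): values are opaque strings,
# so the only supported query is lookup by key.
kv_store: dict[str, str] = {
    "user:1": json.dumps({"name": "Asha", "city": "Chennai"}),
    "user:2": json.dumps({"name": "Ravi", "city": "Pune"}),
}

# Document store (hypothetical stand-in): values are structured documents
# whose fields can be inspected; the second one has an extra "phone" field.
doc_store: dict[str, dict] = {
    "user:1": {"name": "Asha", "city": "Chennai"},
    "user:2": {"name": "Ravi", "city": "Pune", "phone": "555-0101"},
}


def kv_get(key: str) -> str:
    """Fetch a value by key: all a pure key-value store offers."""
    return kv_store[key]


def doc_find(field: str, value: object) -> list[str]:
    """Return the keys of documents whose `field` equals `value`."""
    return [key for key, doc in doc_store.items() if doc.get(field) == value]


print(kv_get("user:2"))
print(doc_find("city", "Pune"))  # ['user:2']
```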
Many databases belong to the NoSQL family, as illustrated in Figure 1.
CouchDB
It can be observed from Figure 1 that Apache CouchDB is classified as a NoSQL database in the document based category. Development of CouchDB, started by Damien Katz, dates back to 2005, and the project is now managed by the Apache Software Foundation. CouchDB is written in the Erlang programming language. It stores data as JSON documents that are not restricted by a rigid structure, and clients interact with it over HTTP.
CouchDB's motto is "relax". The design principle behind it is that developers can relax because so many features are built in. When you start Apache CouchDB, you will be greeted with the message, "Apache CouchDB started. It's time to relax."
CouchDB is well suited for applications that handle self-contained data. A traditional RDBMS spreads related values across several tables that are linked by keys. In a product bill, for example, the product details might be stored in a separate table linked by the product ID field, and the vendor details in another table linked by the vendor ID field. SQL queries then join these related values and present them to the user. Although this normalised design has advantages such as eliminating redundancy and enforcing integrity, not every application needs it.
In CouchDB, data is stored as self-contained documents, which mimic real-world documents such as bills, tickets and business cards. A movie ticket, for example, carries all the relevant fields on the ticket itself, so its meaning is clear without linking to any external data. Such documents are called self-contained, and CouchDB builds its data store as a collection of them, represented in JSON (JavaScript Object Notation).
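The contrast between the two approaches can be sketched with Python's built-in sqlite3 and json modules. The table, column and vendor names below are hypothetical, chosen to mirror the product bill example: the relational version needs a join across three tables to reassemble one bill line, while the document version carries everything in a single JSON object, the way CouchDB would store it.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalised schema mirroring the product bill example.
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           vendor_id INTEGER REFERENCES vendors(vendor_id));
    CREATE TABLE bill_items (bill_id INTEGER,
                             product_id INTEGER REFERENCES products(product_id),
                             qty INTEGER);
    INSERT INTO vendors VALUES (10, 'Acme Traders');
    INSERT INTO products VALUES (1, 'Keyboard', 10);
    INSERT INTO bill_items VALUES (500, 1, 2);
""")

# Relational: the bill line is scattered over three tables and rebuilt by joins.
row = conn.execute("""
    SELECT b.bill_id, p.product_name, v.vendor_name, b.qty
    FROM bill_items AS b
    JOIN products AS p ON p.product_id = b.product_id
    JOIN vendors AS v ON v.vendor_id = p.vendor_id
""").fetchone()
conn.close()

# Document: the same bill as one self-contained JSON document.
bill_doc = {
    "type": "bill",
    "bill_id": 500,
    "items": [{"product": "Keyboard", "vendor": "Acme Traders", "qty": 2}],
}

print(row)  # (500, 'Keyboard', 'Acme Traders', 2)
print(json.dumps(bill_doc))
```

The trade-off is the one the text describes: the document repeats the vendor name in every bill that mentions it, but it can be read without any joins.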
CouchDB installation
At the time of writing, the latest stable version, CouchDB 1.6.1, can be downloaded from the official CouchDB website, http://couchdb.apache.org/. CouchDB can be installed on all major operating systems, such as Linux, Windows and Mac. As CouchDB is open source, its source code is also available.
Building CouchDB from source requires the following dependencies:
- Erlang OTP (>=R14B01, =<R17)
- ICU
- OpenSSL
- Mozilla SpiderMonkey (1.8.5)
- GNU Make
- GNU Compiler Collection
- libcurl
- help2man
- Python (>=2.7) for docs
- Python Sphinx (>=1.1.3)
On Debian or Ubuntu systems, the main dependencies can be installed with the following commands:
sudo apt-get install build-essential
sudo apt-get install erlang-base-hipe
sudo apt-get install erlang-dev
sudo apt-get install erlang-manpages
sudo apt-get install erlang-eunit
sudo apt-get install erlang-nox
sudo apt-get install libicu-dev
sudo apt-get install libmozjs-dev
sudo apt-get install libcurl4-openssl-dev
After installing the dependencies, run the following commands from the unpacked CouchDB source directory:
./configure
make && sudo make install
If the build and installation succeed, you will be greeted with the following message:
“You have installed Apache CouchDB; it’s time to relax.”
To run CouchDB as the couchdb user, execute the following command:
sudo -i -u couchdb couchdb
On a Windows PC, you can download the .exe installer and run it.
The CouchDB interface
CouchDB can be accessed either from the terminal or through the Web based interface. Both approaches are explored in this section. To access it from the terminal, send HTTP requests with curl, as shown below:
curl http://127.0.0.1:5984/
This will give you the following response:
{"couchdb":"Welcome","uuid":"d9b0cf5a3deee84afadb52f9a30664a2","version":"1.6.1","vendor":{"version":"1.6.1-1","name":"Homebrew"}}
To list all the databases, enter the following command:
curl -X GET http://127.0.0.1:5984/_all_dbs
The response is a JSON array of database names:
["_replicator","_users","baseball","mycontacts","test_suite_db","test_suite_db2"]
To create a new database named ‘osfy’, execute the following command:
curl -X PUT http://127.0.0.1:5984/osfy
If the database is created successfully, then the response will be as follows:
{"ok":true}
If you list all the databases again, you will notice a new entry named ‘osfy’:
["_replicator","_users","baseball","mycontacts","osfy","test_suite_db","test_suite_db2"]
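The same three calls can also be made from a program. Below is a minimal sketch using only Python's standard urllib module; it assumes a CouchDB server at http://127.0.0.1:5984 with no authentication configured, and the helper names `couch_request` and `send` are hypothetical. Each request object is only built here; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5984"  # assumed address of a local CouchDB server


def couch_request(method: str, path: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a request to the CouchDB API."""
    return urllib.request.Request(f"{BASE_URL}{path}", method=method)


def send(req: urllib.request.Request) -> object:
    """Hypothetical helper: send a request and decode CouchDB's JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


welcome = couch_request("GET", "/")           # like: curl http://127.0.0.1:5984/
list_dbs = couch_request("GET", "/_all_dbs")  # like: curl -X GET .../_all_dbs
create_db = couch_request("PUT", "/osfy")     # like: curl -X PUT .../osfy

# With a running server, for example:
#   print(send(list_dbs))
#   print(send(create_db))
```

If the database already exists, CouchDB answers the PUT with HTTP 412 (Precondition Failed), which `urlopen` raises as `urllib.error.HTTPError`.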
Futon
Futon is the Web based interface for Apache CouchDB. It can be accessed with the URL http://127.0.0.1:5984/_utils/index.html, as illustrated in Figure 2.
New documents can be added to the database by clicking the ‘New document’ link, and fields can be added by clicking ‘Add Fields’, as shown in Figure 3.
The JSON source of a document, with its fields, is shown below:
{ "_id": "15e67c3cb021f0d1c7619bf66703164a", "_rev": "3-7121a64ddac1b604ee27681ff6b7a2a7", "month": "Dec 2015", "article title": "Deep Learning", "pages": "76-79" }
Note that documents need not share an identical field structure. For example, the second document below has an additional field, ‘image count’:
{ "_id": "15e67c3cb021f0d1c7619bf6670318aa", "_rev": "1-b02bb3077cd60496142d4c38ed2353b7", "month": "Jan 2016", "article": "Ionic", "pages": "80-84", "image count": 5 }
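The `_rev` field is what lets CouchDB accept such free-form documents safely. Each revision begins with a generation number (the `3-` in the first document indicates its third saved revision), and an update is accepted only if the client sends the document's current `_rev`; otherwise CouchDB reports a conflict. The in-memory class below is a hypothetical Python sketch of that rule, not CouchDB's implementation; in particular, the hash after the dash is computed here with MD5 over the JSON body purely for illustration.

```python
import hashlib
import json
import uuid


class ConflictError(Exception):
    """Raised when an update carries a stale or missing _rev (CouchDB answers 409)."""


class RevisionedStore:
    """Hypothetical in-memory sketch of CouchDB-style revision checking."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def save(self, doc: dict) -> dict:
        doc = dict(doc)  # work on a copy, never the caller's dict
        doc_id = doc.setdefault("_id", uuid.uuid4().hex)
        current = self.docs.get(doc_id)
        if current is None:
            generation = 1
        else:
            # An update must name the revision it was based on.
            if doc.get("_rev") != current["_rev"]:
                raise ConflictError(f"document update conflict for {doc_id}")
            generation = int(current["_rev"].split("-", 1)[0]) + 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        # Illustrative hash only; CouchDB computes its own revision hash.
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        doc["_rev"] = f"{generation}-{digest}"
        self.docs[doc_id] = doc
        return doc


store = RevisionedStore()
# Documents need not share fields: the second one has an extra "image count".
a = store.save({"month": "Dec 2015", "article title": "Deep Learning", "pages": "76-79"})
b = store.save({"month": "Jan 2016", "article": "Ionic", "pages": "80-84", "image count": 5})
a2 = store.save({**a, "pages": "76-80"})  # based on the current _rev: accepted
try:
    store.save({**a, "pages": "76-81"})   # still based on the old _rev: rejected
except ConflictError as exc:
    print(exc)
print(a2["_rev"].split("-")[0])  # 2
```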
In addition to adding textual fields, files can be uploaded as attachments. The attachments can be added by clicking the “Upload Attachment” link in the document section.
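Outside Futon, an attachment is uploaded by sending a PUT request with the file's bytes to /{db}/{docid}/{attachment name}, passing the document's current revision as the `rev` query parameter. The sketch below only builds such a request with Python's urllib. The database, document ID and revision are taken from the second document above, while the file name `cover.png`, its content and the helper name `attachment_request` are hypothetical; a CouchDB server at 127.0.0.1:5984 is assumed for actually sending it.

```python
import urllib.parse
import urllib.request


def attachment_request(db: str, doc_id: str, rev: str, name: str,
                       data: bytes, content_type: str) -> urllib.request.Request:
    """Hypothetical helper: build a PUT that attaches `data` to an existing document."""
    url = (f"http://127.0.0.1:5984/{urllib.parse.quote(db, safe='')}"
           f"/{urllib.parse.quote(doc_id, safe='')}"
           f"/{urllib.parse.quote(name, safe='')}"
           f"?{urllib.parse.urlencode({'rev': rev})}")
    return urllib.request.Request(url, data=data, method="PUT",
                                  headers={"Content-Type": content_type})


req = attachment_request(
    "osfy",
    "15e67c3cb021f0d1c7619bf6670318aa",
    "1-b02bb3077cd60496142d4c38ed2353b7",
    "cover.png",                 # hypothetical file name
    b"placeholder image bytes",  # placeholder content, not a real PNG
    "image/png",
)
# urllib.request.urlopen(req) would send it; CouchDB replies with the new _rev.
```

Because adding an attachment creates a new revision of the document, any later update must use the `_rev` returned by this request.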
CouchDB – Data retrieval
Retrieving data is as important as storing it, if not more so. As CouchDB documents have no predefined structure, data is not retrieved by firing SQL-style queries; instead, CouchDB uses MapReduce. A map function is applied to every document and emits key-value pairs, and an optional reduce function aggregates the emitted values. The code is usually written in JavaScript, though other options such as CoffeeScript are available.
To write the code for MapReduce, go to “View -> Temporary View” as shown in Figure 4.
For example, to navigate through the data store, the code shown in Figure 5 can be used. After entering the code in the map area, click on the “Run” button.
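As the code in Figure 5 is not reproduced in the text, here is a hedged Python sketch of what a view computes. It emulates the model only and is not CouchDB's view engine. In Futon the map function would be JavaScript, for instance `function(doc) { if (doc.month) { emit(doc.month, doc.pages); } }`, optionally paired with the built-in `_count` reduce. The emulation applies the map function to every document, sorts the emitted rows by key, and then reduces the values per key, which corresponds to querying the view with `group=true`. The names `run_view`, `by_month` and `count` are hypothetical.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, Iterator, Optional

Row = tuple[Any, Any]  # one (key, value) pair emitted by a map function


def run_view(docs: Iterable[dict],
             map_fn: Callable[[dict], Iterable[Row]],
             reduce_fn: Optional[Callable[[list], Any]] = None) -> list[Row]:
    """Emulate a CouchDB view: map every doc, sort rows by key, reduce per key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    # CouchDB orders view rows by key; Python's ordering stands in for its collation.
    rows.sort(key=lambda row: row[0])
    if reduce_fn is None:
        return rows
    # Grouping by key mirrors querying a reduce view with group=true.
    return [(key, reduce_fn([value for _, value in group]))
            for key, group in groupby(rows, key=lambda row: row[0])]


def by_month(doc: dict) -> Iterator[Row]:
    # Python analogue of the JavaScript map function in the lead-in.
    if "month" in doc:
        yield doc["month"], doc.get("pages")


def count(values: list) -> int:
    # Analogue of CouchDB's built-in _count reduce for a grouped query.
    return len(values)


docs = [
    {"month": "Dec 2015", "article title": "Deep Learning", "pages": "76-79"},
    {"month": "Jan 2016", "article": "Ionic", "pages": "80-84", "image count": 5},
]

print(run_view(docs, by_month))         # [('Dec 2015', '76-79'), ('Jan 2016', '80-84')]
print(run_view(docs, by_month, count))  # [('Dec 2015', 1), ('Jan 2016', 1)]
```

Because the map function checks for the field before emitting, documents with a different structure are simply skipped rather than causing an error.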
Replication management
CouchDB is primarily used with Web based applications. One of its major features is replication management, which copies data from a source database to a target database. For example, a local database can be linked with a centralised server and kept in sync through replication. In Futon, replication is carried out through “Tools -> Replicator”; the replication interface is shown in Figure 6.
Replication is a powerful core feature of CouchDB and supports master-slave replication, master-to-master (bidirectional) replication, filtered replication, incremental replication and conflict management.
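Incremental replication rests on each database's changes feed: every write receives an increasing sequence number, and the replicator checkpoints the last sequence it has copied, so the next run transfers only newer changes. The Python sketch below models that idea with in-memory classes; the class and method names are hypothetical, and filtering and conflict handling are omitted. Against a real CouchDB 1.x server, a one-off replication can instead be started by POSTing a JSON body such as {"source": "osfy", "target": "http://example.org:5984/osfy"} to /_replicate, where the target host is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory database with a sequence-numbered changes feed."""
    docs: dict[str, dict] = field(default_factory=dict)
    changes: list[tuple[int, str]] = field(default_factory=list)  # (seq, doc_id)

    def put(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc
        self.changes.append((len(self.changes) + 1, doc_id))

    def changes_since(self, since: int) -> list[tuple[int, str]]:
        return [(seq, doc_id) for seq, doc_id in self.changes if seq > since]


@dataclass
class Replicator:
    """Hypothetical one-way replicator that resumes from its last checkpoint."""
    source: Database
    target: Database
    checkpoint: int = 0

    def run(self) -> int:
        copied = 0
        for seq, doc_id in self.source.changes_since(self.checkpoint):
            self.target.put(doc_id, dict(self.source.docs[doc_id]))
            self.checkpoint = seq  # remember progress after each change
            copied += 1
        return copied


local, central = Database(), Database()
repl = Replicator(local, central)
local.put("a", {"month": "Dec 2015"})
local.put("b", {"month": "Jan 2016"})
print(repl.run())  # 2: the first run copies everything
local.put("c", {"month": "Feb 2016"})
print(repl.run())  # 1: the next run copies only the new change
```

Bidirectional replication, in this model, would simply be two such replicators running in opposite directions, which is where CouchDB's conflict management becomes necessary.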
This article presents only the tip of the iceberg in the world of Apache CouchDB. The detailed tutorials provided by the official documentation will serve as further reading.