Python Requests: Interacting with the Web Made Easy

'Requests' is an Apache2-licensed HTTP library written in Python. Delve deeper into the topic and learn how it can be installed, and how Python Requests can be used to your advantage.

Python contains libraries that make it easy to interact with websites and perform tasks like logging into Gmail, viewing Web pages, filling in forms, and viewing and saving cookies, all with nothing more than a few lines of code. Once a request is sent to a server, a Python script retrieves the information in almost the same way a browser does; hence, the work done by a Python script is more or less similar to that of a Web browser. Some reasons why Python is preferred for accessing the Web are:

  • Scripts can be written to automate interaction with a Web page.
  • RSS feeds can be obtained and parsed.
  • A Web spider to test your site or search other sites can be written.
  • BeautifulSoup (a Python module) can be used for parsing HTML and XML files.

These are some simple tasks that can be accomplished using Python. Django, a Web framework, and Scrapy, an open source Web crawler framework, are both written in Python.
Urllib/Urllib2
Urllib is the default Python module used for opening HTTP URLs. It can accomplish other tasks as well, such as basic authentication, getting cookies, serving GET/POST requests, error handling and viewing headers. Urllib2 is an improved Python module that provides additional functionality to several methods; hence, some urllib methods have been replaced by urllib2 methods. One such method is urllib.urlopen(), which has been replaced by urllib2.urlopen() in later versions of Python. urllib2.urlopen() can accept an instance of the Request class as an argument, which allows you to set the headers for a URL request, and it is capable of fetching URLs using a variety of protocols, like HTTP or FTP. On the other hand, urllib.urlopen() accepts only a URL.
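A minimal sketch of this difference (Python 2; example.com is a placeholder URL and the User-Agent string is purely illustrative):

import urllib
import urllib2

# urllib.urlopen() accepts only a URL string
response = urllib.urlopen('http://www.example.com')

# urllib2.urlopen() also accepts a Request object, which can carry custom headers
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
request = urllib2.Request('http://www.example.com', headers=headers)
response = urllib2.urlopen(request)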
In spite of urllib2's additional features, urllib cannot be completely replaced by it, since urllib provides important methods (e.g., urlencode(), used for generating GET query strings) that are absent in urllib2. Hence, methods from both urllib and urllib2 have to be combined to accomplish common tasks. Even when both these modules are used together, there are various drawbacks:

  • First and foremost, it is unclear which module – urllib or urllib2 – should be used, and this is confusing, especially for beginners.
  • In spite of urllib2 being an improved module, it does not provide all the functionalities.
  • The documentation for both urllib and urllib2 is extremely difficult to understand, and the modules themselves are heavily over-engineered.
  • Even for a simple GET request, it is difficult to write a short script using urllib2.

Here is a sample of the code required to perform a simple login:

import urllib  
import urllib2  
import re  
import cookielib  
 
jar = cookielib.FileCookieJar("cookie")  
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))  
 
url = 'http://example.com/login.php'  
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  
 
data = {
    "Submit": " ",
    "username": "x",
    "password": "x",
}
 
data = urllib.urlencode(data)  
login_request = urllib2.Request(url, data)  
login_reply = opener.open(login_request)  
login_reply_data = login_reply.read()  
 
login_success_msg = re.compile("Login Successful")  
 
if login_success_msg.search(login_reply_data) is not None:
    pass  # Proceed with the logged-in session
else:
    print "Check whether you have given the right credentials"

Performing a simple login operation requires importing four different modules and writing a large volume of complex code. HTTP is simple, and so should be the scripts that access it. Hence, it is vital to have simpler and more efficient modules for accessing the Web from Python.

Python Requests
‘Requests’ is a simple, easy-to-use HTTP library written in Python. The lead developer is Kenneth Reitz, who is also a member of the Python Software Foundation. The current version is 2.2.1, released on January 23, 2014, and it is compatible with Python versions 2.6.8 and above. Requests makes interacting with Web services seamless, and it overcomes most of the difficulties faced with urllib/urllib2.
Installation: Installing Python Requests is very simple and can be done using either of the two methods mentioned below:

  • Pip: This works with Python versions 2.6, 2.7, 3.1, 3.2 and 3.3. If pip is not already installed, first install it with:
$ sudo apt-get install python-pip
                or
$ sudo apt-get install python-pip python-dev build-essential
Then install Requests itself:
$ pip install requests
  • Download from the source code (these commands build Python 3 itself from source, after which Requests can be installed with pip as above):
$ sudo apt-get install build-essential libncursesw5-dev libreadline5-dev libssl-dev libgdbm-dev libc6-dev libsqlite3-dev tk-dev

$ wget http://www.python.org/ftp/python/3.x/Python-3.x.tar.bz2
$ tar -xjf Python-3.x.tar.bz2
$ cd Python-3.x
$ ./configure --prefix=/opt/python3
$ make
$ sudo make install

Parsing JSON
JSON (JavaScript Object Notation) is used for transmitting data between client and server. Web pages often have JSON embedded in their code, so when making requests we often get a response in the JSON format, which needs to be decoded. Python Requests has a built-in JSON decoder, which helps in parsing JSON responses; alternatively, you can just import the standard json module in your code.

How to know if the response is in the JSON format: After making a GET request, there is a response object 'r' from which we can get information like the status code, headers, etc.:

import requests
 
r = requests.get("http://www.example.com")
print r.status_code
print r.headers['content-type']
 
Output:
200
'application/json'

If the content-type in the header is 'application/json', then the response is in the JSON format.
How to parse using the JSON built-in module and Requests:

  • json.loads(response.text) is used for decoding the response body

json.dumps(data) is used for encoding data for a request

import json
import requests
response = requests.get(url=url, params=params)
data = json.loads(response.text)
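Alternatively, the built-in JSON decoder mentioned above avoids importing the json module altogether. A minimal sketch (http://httpbin.org/get is simply a convenient test endpoint that returns JSON):

import requests

r = requests.get('http://httpbin.org/get')
data = r.json()  # built-in decoder; raises ValueError if the body is not valid JSON
print data['url']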

Differences while decoding with JSON
The data we get back after decoding JSON-encoded data can differ from the original data, as shown below (strings become Unicode strings, and tuples become lists):

data = [{'a': 'A', 'b': (2, 4), 'c': 3.0}]
encoded_data = json.dumps(data)
decoded_data = json.loads(encoded_data)
print 'Original data: ', data
print 'Decoded data: ', decoded_data
 
Output:
Original data: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
Decoded data: [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]

Features of Python Requests

  • Connection pooling: Requests maintains a pool of connections, and a connection is released back into the pool only once all its data has been read.
  • Sessions with cookie persistence: You can make a session object and set certain parameters and cookie values on it. These parameters and cookies then persist across all requests made from the session instance, as shown in the sketch below.
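A minimal sketch of a session (httpbin.org is used here only as a placeholder test server):

import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})  # set once; sent with every request from this session
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')  # the cookie set above persists across requests
print r.text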

  • Browser-style SSL verification: We can verify the SSL certificates for HTTPS sites, as follows:

requests.get('https://example.com', verify=True)

For a wrong certificate, an SSLError is raised. If you don’t want to verify the certificate, then:

requests.get('https://example.com', verify=False)
  • Proxies: You can make use of proxies for sending individual requests:
    proxy = {
        "http": "http://10.10.1.10:3128"
    }
    requests.get("http://example.com", proxies=proxy)
  • Cookies: We can get the cookies set by the server from the response object ‘r’:
    url = 'http://example.com/cookie'
     r = requests.get(url)
     r.cookies['cookie_name']

    We can also send cookies to the server, as follows:

    url = 'http://example2.com/cookies'
    cookies = dict(cookie1='This_is_a_cookie')
    r = requests.get(url, cookies=cookies)
  • Response content: Requests can automatically decode the response based on the header values. Using r.encoding, you can also change the encoding type:
  r.encoding = 'utf-8'
  • Exceptions: The various types of exceptions that are handled are:
      • Connection error: DNS failure or connection refused
      • HTTP error: An invalid HTTP response
      • Too many redirects: The maximum number of allowed redirects is exceeded
  • Connection timeouts: You can make the request stop waiting for a response after a certain time interval, after which the connection can close, as shown in the sketch below:
    r = requests.get("http://example.com", timeout=1)
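A minimal sketch of catching the exceptions listed above (example.com is a placeholder):

import requests

try:
    r = requests.get("http://example.com", timeout=1)
    r.raise_for_status()  # raises an HTTPError for 4xx/5xx status codes
except requests.exceptions.ConnectionError:
    print "DNS failure or connection refused"
except requests.exceptions.Timeout:
    print "The request timed out"
except requests.exceptions.TooManyRedirects:
    print "The maximum number of redirects was exceeded"
except requests.exceptions.HTTPError as e:
    print "Invalid HTTP response:", e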

Advantages of Python Requests
Here are the advantages of Python Requests over urllib/urllib2:

  • Python Requests encodes the parameters automatically, so you just pass them as simple arguments, unlike in the case of urllib, where you need to use the method urllib.urlencode() to encode the parameters before passing them.
  • Python Requests automatically decodes the response into Unicode.
  • Python Requests automatically saves the contents, enabling you to access it multiple times, unlike the read-once file-like object returned by urllib2.urlopen().
  • Python Requests handles multi-part file uploads, as well as automatic form-encoding (see the sketch after this list).
  • In Requests, get() is a method and auth is an optional parameter, since we may or may not require authentication.
  • Python Requests supports the entire RESTful API, i.e., all its methods – PUT, GET, DELETE, POST.
  • Python Requests has a built-in JSON decoder.
  • Unlike with urllib/urllib2, there is no confusion caused by Requests, as a single module can do the entire task.
  • You can write easier and shorter code.
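As an illustration of the automatic parameter encoding and the multi-part upload support mentioned above, here is a minimal sketch (the URLs, the 'q' parameter and report.csv are hypothetical):

import requests

# Parameters are encoded into the query string automatically
r = requests.get('http://example.com/search', params={'q': 'python requests'})
print r.url  # http://example.com/search?q=python+requests

# Multi-part file upload: Requests builds the multipart body itself
files = {'file': open('report.csv', 'rb')}
r = requests.post('http://example.com/upload', files=files)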
A comparison of Python Requests and urllib/urllib2
Here are some simple examples that make it easy to compare Python Requests with urllib/urllib2.
Example 1: A simple HTTP GET request and authentication
Using urllib2: In this example, to make a simple HTTP GET request we need to call a lot of methods, and remembering their names can be difficult:

import urllib2

url = 'https://www.example.com'
username = 'user'
password = 'pass'

request = urllib2.Request(url)

password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, url, username, password)

auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)

urllib2.install_opener(opener)

handler = urllib2.urlopen(request)

print handler.getcode()
print handler.headers.getheader('content-type')

Using Requests: The task of making a simple HTTP GET request can be accomplished in a single line, compared to the large amount of code written using urllib2:

import requests

r = requests.get('https://www.example.com', auth=('user', 'pass'))

print r.status_code
print r.headers['content-type']

Example 2: Making a POST request
Using urllib2/urllib: Note that in this example we had to make use of both the urllib and urllib2 modules in order to write a script for a simple POST request:

import urllib
import urllib2

url = "http://www.example.com"
values = {"firstname": "abc", "lastname": "xyz"}

header = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}

values = urllib.urlencode(values)
request = urllib2.Request(url, values, header)

response = urllib2.urlopen(request)
html_content = response.read()

Using Requests: Here we do not need to import multiple modules; a single requests module can accomplish the entire task:

import requests

values = {"firstname": "abc", "lastname": "xyz"}
r = requests.post('https://www.example.com', data=values)
