3. Modules

Modules are reusable libraries of code in Python. Python comes with many standard library modules.

A module is imported using the import statement.

>>> import time
>>> print(time.asctime())
'Fri Mar 30 12:59:21 2012'

In this example, we’ve imported the time module and called the asctime function from that module, which returns current time as a string.

There is also another way to use the import statement.

>>> from time import asctime
>>> asctime()
'Fri Mar 30 13:01:37 2012'

Here were imported just the asctime function from the time module.

The pydoc command provides help on any module or a function.

$ pydoc time
Help on module time:

NAME
    time - This module provides various functions to manipulate time values.
...

$ pydoc time.asctime
Help on built-in function asctime in time:

time.asctime = asctime(...)
    asctime([tuple]) -> string
...

On Windows, the pydoc command is not available. The work-around is to use, the built-in help function.

>>> help('time')
Help on module time:

NAME
    time - This module provides various functions to manipulate time values.
...

Writing our own modules is very simple.

For example, create a file called num.py with the following content.

def square(x):
    return x * x

def cube(x):
    return x * x * x

Now open Python interterter:

>>> import num
>>> num.square(3)
9
>>> num.cube(3)
27

Thats all we’ve written a python library.

Try pydoc num (pydoc.bat numbers on Windows) to see documentation for this numbers modules. It won’t have any documentation as we haven’t providied anything yet.

In Python, it is possible to associate documentation for each module, function using docstrings. Docstrings are strings written at the top of the module or at the beginning of a function.

Lets try to document our num module by changing the contents of num.py

"""The num module provides utilties to work on numbers.

Current it provides square and cube.
"""

def square(x):
    """Computes square of a number."""
    return x * x

def cube(x):
    """Computes cube of a number."""
    return x * x

The pydoc command will now show us the doumentation nicely formatted.

Help on module num:

NAME
    num - The num module provides utilties to work on numbers.

FILE
    /Users/anand/num.py

DESCRIPTION
    Current it provides square and cube.

FUNCTIONS
    cube(x)
        Computes cube of a number.

    square(x)
        Computes square of a number.

Under the hood, python stores the documentation as a special field called __doc__.

>>> import os
>>> print(os.getcwd.__doc__)
getcwd() -> path

Return a string representing the current working directory.

3.1. Standard Library

Python comes with many standard library modules. Lets look at some of the most commonly used ones.

3.1.1. os module

The os and os.path modules provides functionality to work with files, directories etc.

Problem 1: Write a program to list all files in the given directory.

Problem 2: Write a program extcount.py to count number of files for each extension in the given directory. The program should take a directory name as argument and print count and extension for each available file extension.

$ python extcount.py src/
py
txt
csv

Problem 3: Write a program to list all the files in the given directory along with their length and last modification time. The output should contain one line for each file containing filename, length and modification date separated by tabs.

Hint: see help for os.stat.

Problem 4: Write a program to print directory tree. The program should take path of a directory as argument and print all the files in it recursively as a tree.

$ python dirtree.py foo
foo
|-- a.txt
|-- b.txt
|-- code
|   |-- a.py
|   |-- b.py
|   |-- docs
|   |   |-- a.txt
|   |   \-- b.txt
|   \-- x.py
\-- z.txt

3.1.2. urllib module

The urllib module provides functionality to download webpages.

>>> from urllib.request import urlopen
>>> response = urlopen("http://python.org/")
>>> print(response.headers)
Date: Fri, 30 Mar 2012 09:24:55 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Fri, 30 Mar 2012 08:42:25 GMT
ETag: "105800d-4b7b-4bc71d1db9e40"
Accept-Ranges: bytes
Content-Length: 19323
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug

>>> response.header['Content-Type']
'text/html'

>>> content = request.read()

Problem 5: Write a program wget.py to download a given URL. The program should accept a URL as argument, download it and save it with the basename of the URL. If the URL ends with a /, consider the basename as index.html.

$ python wget.py http://docs.python.org/tutorial/interpreter.html
saving http://docs.python.org/tutorial/interpreter.html as interpreter.html.

$ python wget.py http://docs.python.org/tutorial/
saving http://docs.python.org/tutorial/ as index.html.

3.1.3. re module

Problem 6: Write a program antihtml.py that takes a URL as argument, downloads the html from web and print it after stripping html tags.

$ python antihtml.py index.html
...
The Python interpreter is usually installed as /usr/local/bin/python on
those machines where it is available; putting /usr/local/bin in your
...

Problem 7: Write a function make_slug that takes a name converts it into a slug. A slug is a string where spaces and special characters are replaced by a hyphen, typically used to create blog post URL from post title. It should also make sure there are no more than one hyphen in any place and there are no hyphens at the biginning and end of the slug.

>>> make_slug("hello world")
'hello-world'
>>> make_slug("hello  world!")
'hello-world'
>>> make_slug(" --hello-  world--")
'hello-world'

Problem 8: Write a program links.py that takes URL of a webpage as argument and prints all the URLs linked from that webpage.

Problem 9: Write a regular expression to validate a phone number.

3.1.4. json module

Problem 10: Write a program myip.py to print the external IP address of the machine. Use the response from http://httpbin.org/get and read the IP address from there. The program should print only the IP address and nothing else.

3.1.5. zipfile module

The zipfile module provides interface to read and write zip files.

Here are some examples to demonstate the power of zipfile module.

The following example prints names of all the files in a zip archive.

import zipfile
z = zipfile.ZipFile("a.zip")
for name in z.namelist():
    print(name)

The following example prints each file in the zip archive.

import zipfile
z = zipfile.ZipFile("a.zip")
for name in z.namelist():
    print()
    print("FILE:", name)
    print()
    print(z.read(name))

Problem 11: Write a python program zip.py to create a zip file. The program should take name of zip file as first argument and files to add as rest of the arguments.

$ python zip.py foo.zip file1.txt file2.txt

Problem 12: Write a program mydoc.py to implement the functionality of pydoc. The program should take the module name as argument and print documentation for the module and each of the functions defined in that module.

$ python mydoc.py os
Help on module os:

DESCRIPTION

os - OS routines for Mac, NT, or Posix depending on what system we're on.
...

FUNCTIONS

getcwd()
    ...

Hints:

The dir function to get all entries of a module
The inspect.isfunction function can be used to test if given object is a function
x.__doc__ gives the docstring for x.
The __import__ function can be used to import a module by name

3.2. Installing third-party modules

PyPI, The Python Package Index maintains the list of Python packages available. The third-party module developers usually register at PyPI and uploads their packages there.

The standard way to installing a python module is using pip or easy_install. Pip is more modern and perferred.

Lets start with installing easy_install.

Download the easy_install install script ez_setup.py.
Run it using Python.

That will install easy_install, the script used to install third-party python packages.

Before installing new packages, lets understand how to manage virtual environments for installing python packages.

Earlier the only way of installing python packages was system wide. When used this way, packages installed for one project can conflict with other and create trouble. So people invented a way to create isolated Python environment to install packages. This tool is called virtualenv.

To install virtualenv:

$ easy_install virtualenv

Installing virtualenv also installs the pip command, a better replace for easy_install.

Once it is installed, create a new virtual env by running the virtualenv command.

$ virtualenv testenv

Now to switch to that env.

On UNIX/Mac OS X:

$ source testenv/bin/activate

On Windows:

> testenv\Scripts\activate

Now the virtualenv testenv is activated.

Now all the packages installed will be limited to this virtualenv. Lets try to install a third-party package.

$ pip install tablib

This installs a third-party library called tablib.

The tablib library is a small little library to work with tabular data and write csv and Excel files.

Here is a simple example.

# create a dataset
data = tablib.Dataset()

# Add rows
data.append(["A", 1])
data.append(["B", 2])
data.append(["C", 3])

# save as csv
with open('test.csv', 'wb') as f:
    f.write(data.csv)

# save as Excel
with open('test.xls', 'wb') as f:
    f.write(data.xls)

# save as Excel 07+
with open('test.xlsx', 'wb') as f:
    f.write(data.xlsx)

It is even possible to create multi-sheet excel files.

sheet1 = tablib.Dataset()
sheet1.append(["A1", 1])
sheet1.append(["A2", 2])

sheet2 = tablib.Dataset()
sheet2.append(["B1", 1])
sheet2.append(["B2", 2])


book = tablib.Databook([data1, data2])
with open('book.xlsx', 'wb') as f:
    f.write(book.xlsx)

Problem 13: Write a program csv2xls.py that reads a csv file and exports it as Excel file. The prigram should take two arguments. The name of the csv file to read as first argument and the name of the Excel file to write as the second argument.

Problem 14: Create a new virtualenv and install BeautifulSoup. BeautifulSoup is very good library for parsing HTML. Try using it to extract all HTML links from a webpage.

Read the BeautifulSoup documentation to get started.