Introduction to Python Programming¶

Getting Started¶

Python comes with an interactive interpreter. When you type python in your shell or command prompt, the python interpreter becomes active with a >>> prompt and waits for your commands.

$ python
Python 3.5.1 |Anaconda 2.5.0 (x86_64)| (default, Dec  7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

In the class we are going to use the Jupyter/IPython notebook. It provides the Python interpreter in a web page.

Try any simple mathematical expressions and Python will evaluate them and print the response back to you.

1 + 2

3

2 * 7

14

# 2 raised to the power of 1000
2 ** 1000

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

The print function can be used to print something explicitly.

print("Hello Python!")

Hello Python!

Running Scripts¶

%%file hello.py
print("hello world")

Overwriting hello.py

The %%file in the first line tells the Jupyter notebook to save the contents after the first line as a file instead of executing them as Python code.

Now we can run that python file as a script.

!python hello.py

hello world

Again, the ! is a marker that tells the Jupyter notebook that we are executing an external command and not Python code.

Unlike in the interpreter, we need to explicitly use the print function to print any value.

Problem: Write a program to print the value of 123 * 456.

Variables¶

Creating a variable is as simple as assigning a value to it.

x = 4
print(x*x)

16

If we assign a new value to an existing variable, its value will get replaced.

x = 5
print(x)

5

In Python, variables don't have any types. They are place holders that can hold any type of value.

x = "python"
print(x)

python
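
To make this concrete, here is a minimal sketch using the built-in type function, which reports the type of the value a variable currently holds. The names first_type and second_type are only for illustration.

```python
# The type belongs to the value, not to the variable name.
x = 4
first_type = type(x).__name__    # type of the value currently held by x
x = "python"
second_type = type(x).__name__   # same variable, now holding a string
print(first_type, second_type)
```

The same name x refers to an integer first and a string later, without any declaration.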

Datatypes¶

Python has integers.

1 + 2

3

Python has floating point numbers.

1.2 + 2.3

3.5

Python has strings. Strings can be enclosed either in single quotes or double quotes.

"hello world"

'hello world'

x = "hello"

x + " world"

'hello world'

print(x, "python")

hello python

Multi-line strings are written using three double quotes or three single quotes.

x = """This is a multi line string.
one
two
three
"""

print(x)

This is a multi line string.
one
two
three

Strings in Python support the usual escape codes.

print("a\nb\nc")

a
b
c

print("a\tb\tc")

a	b	c

Python has lists.

x = ["a", "b", "c"]

x

['a', 'b', 'c']

The len function can be used to find the length of a list.

len(x)

3

The [] syntax can be used to get the element at an index in the list.

x[0]

'a'

x[1]

'b'

x[2]

'c'

It is perfectly okay for a list to have elements of different datatypes.

x = [1, 2, 'hello', [3, 4]]

Python has a type called tuple to represent fixed length records.

x = (1, 2)
a, b = x

a

1

b

2

x = (1, 2)
print(x[0], x[1])

1 2

Python has boolean values. The built-in constants True and False represent the boolean values true and false.

True

True

False

False

There are more datatypes like dictionary, set etc. We'll see them later.

Functions¶

Python has many built-in functions. We've already seen print and len functions.

print("hello", "python")

hello python

len(["a", "b", "c"])

3

len("hello world")

11

Python usually doesn't allow operations on incompatible datatypes. For example, trying to add a number to a string will result in an error.

1 + '2'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-d79ad31fc60d> in <module>()
----> 1 1 + '2'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Unlike some other programming languages, Python doesn't try to implicitly convert the datatypes.

Python provides the built-in function int to convert a string into an integer and the built-in function str to convert a value of any datatype to a string.

int("2")

2

1 + int("2")

3

str(1234)

'1234'
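
As a small sketch of converting explicitly in both directions, the snippet below builds a string from a number and a number from a string. The variable names count, label, message and total are made up for this illustration.

```python
count = 3
label = "apples"

# Concatenation needs strings on both sides, so convert the number.
message = str(count) + " " + label

# Arithmetic needs numbers on both sides, so convert the string.
total = count + int("2")

print(message)
print(total)
```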

Example: counting number of digits¶

len(str(1234))

4

len(str(2**100))

31

2**1000

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

len(str(2**1000))

302

Writing Custom Functions¶

Let's see how to write our own functions.

def square(x):
    return x*x

print(square(4))

16

The important thing to notice here is that the body of the square function is indented.

def square(x):
    return x*x

print(square(4))

Python uses indentation, the leading spaces, to identify the body of the function. Indentation is very important in Python and it is used for all kinds of compound statements like if, for etc.

Function body can have multiple lines as well.

def square(x):
    y = x*x
    return y

print(square(4))

16

By default, the variables defined in a function are local to that function.

y = 0

def square(x):
    y = x*x
    return y

print(square(4))
print(y)

16
0

The y inside square function is different from the global y, even though they have the same name.

Problem: Write a function cube to compute cube of a number.

    >>> cube(2)
    8
    >>> cube(3)
    27

Problem: Write a function count_digits that takes a number as argument and returns the number of digits it has.

>>> count_digits(12345)
5
>>> count_digits(2**100)
31

`print` vs. `return`¶

Consider the following two functions:

def square1(x):
    return x*x

def square2(x):
    print(x*x)
    
print(square1(4))
square2(4)

16
16

One of them is returning the value and the other is printing the value. Which one of these functions is better and why?

The function square1 is better because it returns the value, and that value can be used in other computations. That is not possible with square2.

1 + square1(2)

5

square1(square1(2))

16

print("square of 2 is", square1(2))

square of 2 is 4
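
To see why a printing function cannot take part in further computation, here is a minimal sketch that captures what a call to square2 evaluates to. A function without a return statement returns None.

```python
def square2(x):
    print(x * x)   # prints the value, but does not return it

result = square2(4)   # the call prints 16
print(result)         # the call itself evaluated to None
```

An expression such as 1 + square2(2) would therefore fail with a TypeError, because None cannot be added to a number.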

Functions are values too¶

Just like integers, strings etc., functions are also just another type of value. They can be assigned to a variable, passed to a function as an argument and even returned from another function.

def square(x):
    return x*x

print(square(2))

4

print(square)

<function square at 0x104433e18>

f = square

f(4)

16

Let's see a use case.

def square(x):
    return x*x

def sum_of_squares(x, y):
    return square(x) + square(y)

print(sum_of_squares(3, 4))

25

def cube(x):
    return x*x*x

def sum_of_cubes(x, y):
    return cube(x) + cube(y)

print(sum_of_cubes(3, 4))

91

The sum_of_squares and sum_of_cubes functions are doing almost the same thing. Can we generalize that into a single function?

def sumof(f, x, y):
    return f(x) + f(y)

print(sumof(square, 3, 4))
print(sumof(cube, 3, 4))

25
91

sumof(abs, -3, 4)

7

sumof(len, "hello", "python")

11

There are even some built-in functions which accept functions as arguments, like in the above example.

max(['one', 'two', 'three', 'four', 'five'])

'two'

The above example computed the maximum of a list of words based on the dictionary order.

What if we want to find the longest word? In other words, find the maximum based on length?

max(['one', 'two', 'three', 'four', 'five'], key=len)

'three'

def mylen(x):
    print("len", x)
    return len(x)

max(['one', 'two', 'three', 'four', 'five'], key=mylen)

len one
len two
len three
len four
len five

'three'
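
The key argument is not specific to max. As a small sketch, the built-in min accepts it too and can be used to find the shortest word. When several elements tie, min returns the first one it encounters. The names words, shortest and longest are only for illustration.

```python
words = ['one', 'two', 'three', 'four', 'five']

shortest = min(words, key=len)   # 'one' and 'two' tie; the first one wins
longest = max(words, key=len)

print(shortest, longest)
```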

Methods¶

Methods are a special type of function that operate on an object.

x = "Hello"

x.upper()

'HELLO'

The upper method, available on strings, converts the string to upper case. Similarly the lower method converts a string to lower case.

x.lower()

'hello'

A couple more examples of methods on strings:

"mathematics".count("mat")

2

"mathematics".replace("mat", "rat")

'ratheratics'

Problem: Write a function icount to count the number of occurrences of a substring in a string, ignoring the case.

>>> icount("mathematics", "mat")
2
>>> icount("Mathematics", "mat")
2    
>>> icount("Mathematics", "MAT")
2

The split method splits the string into multiple parts.

sentence = "Anything that can go wrong, will go wrong."
sentence.split()

['Anything', 'that', 'can', 'go', 'wrong,', 'will', 'go', 'wrong.']

By default, the split method splits the string at whitespace. Optionally, a different delimiter can be specified.

sentence.split(",")

['Anything that can go wrong', ' will go wrong.']

Problem: Write a function count_words to count the number of words in a sentence.

>>> count_words("one two three")
3

Problem: Write a function longest_word to find the longest word in a sentence.

>>> longest_word("one two three four five")
'three'

We've seen how to split strings; let's see how to join them.

"-".join(["a", "b", "c"])

'a-b-c'
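
The two methods are often used together. The following sketch, with a made-up sentence, splits a string into words and joins them back with a different separator. Note that split without arguments treats a run of several spaces as a single separator.

```python
sentence = "one two  three"     # note the double space

words = sentence.split()        # splits on runs of whitespace
joined = "-".join(words)

print(words)
print(joined)
```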

Modules¶

Modules are reusable libraries containing useful functions and variables. In Python, modules are imported using the import statement.

The following example imports the time module and calls the function asctime from that module.

import time
print(time.asctime())

Mon Mar 14 12:38:47 2016

That is like the date command in unix, isn't it? Let's try to make it into a reusable script.

%%file date.py
import time
print(time.asctime())

Overwriting date.py

!python date.py

Mon Mar 14 12:38:47 2016

To find help about a module, try help("modulename") in the Python interpreter, or help(module) after importing it:

>>> import time
>>> help(time)
...

or pydoc modulename in the unix terminal.

$ pydoc time
...

print(time.time()) # time in seconds since epoch

1457939327.096859

Let's try some more modules.

import os
os.listdir(".")

['.git',
 '.ipynb_checkpoints',
 '0-installation.ipynb',
 '1-introduction-to-python.ipynb',
 '1-python.ipynb',
 '1-python.ipynb.bk',
 'Readme.md',
 'Untitled.ipynb',
 'a.txt',
 'args.py',
 'b.txt',
 'cs109-2015',
 'datasets',
 'date.py',
 'echo.py',
 'echo2.py',
 'files',
 'hello.py',
 'linear-regression.ipynb',
 'notes',
 'notes.ipynb',
 'numbers.txt',
 'three.txt',
 'wc.py',
 'wordfreq.py',
 'words.txt']

The above example lists all files in the current directory.

Problem: Write a function count_files to count the number of files in a directory.

>>> count_files("/tmp")
11

The `random` module¶

import random
random.choice(['a', 'b', 'c', 'd'])

'a'

random.choice(['a', 'b', 'c', 'd'])

'd'

The choice function takes a list of elements and returns one of them at random.

Problem: Write a function random_word that takes a sentence as argument and returns one random word from it.

>>> random_word("one two three")
'two'
>>> random_word("one two three")
'one'

Reading command-line arguments¶

Command-line arguments are usually the preferred way to pass inputs to a program.

In Python, the command-line arguments passed to a program are available in a special variable argv in the sys module.

%%file args.py
import sys
print(sys.argv)

Overwriting args.py

!python args.py hello world

['args.py', 'hello', 'world']

!python args.py

['args.py']

The sys.argv variable is a list containing the program name followed by the list of arguments passed to it.

Note that the elements of sys.argv will always be strings.

!python args.py 1 2 3 4 5

['args.py', '1', '2', '3', '4', '5']
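
Because the arguments are strings, they must be converted before doing arithmetic. The sketch below uses a plain list named argv as a stand-in for sys.argv (an assumption made so that the snippet runs without command-line arguments) and shows how the same operator behaves differently on a string and on a number.

```python
# Stand-in for sys.argv after running: python args.py 5
argv = ["args.py", "5"]

repeated = argv[1] * 3        # on a string, * repeats the string
tripled = int(argv[1]) * 3    # after conversion, * multiplies

print(repeated)
print(tripled)
```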

Example: echo.py¶

Let's write a program to print the first command-line argument.

%%file echo.py
import sys
print(sys.argv[1])

Overwriting echo.py

!python echo.py hello

hello

!python echo.py hello world

hello

Problem: Write a program square.py that takes a number as command-line argument and prints its square.

$ python square.py 4
16

Problem: Write a program add.py that takes two numbers as command-line arguments and prints their sum.

$ python add.py 3 4
7

Conditional Expressions¶

The following are some examples of conditional expressions in Python. Conditional expressions always evaluate to a boolean value.

age = 30

age == 30

True

age > 25

True

name = "Alice" # assignment

name == "Alice" # comparison

True

name != "Alice"

False

Conditional expressions can be combined using and and or.

name == "Alice" and age > 25

True

age < 20 or age > 40

False

# comparisons can be chained; this is the same as: 20 < age and age < 40
20 < age < 40

True
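
A chained comparison such as 20 < age < 40 is shorthand for joining the two comparisons with and. The following sketch checks that with the same age value; the names chained, explicit and outside are only for illustration.

```python
age = 30

chained = 20 < age < 40            # chained comparison
explicit = 20 < age and age < 40   # the same test written with and
outside = age < 20 or age > 40     # a different test: age outside the range

print(chained, explicit, outside)
```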

The not operator can be used to negate a boolean expression.

not name == "Alice"

False

Often it is useful to wrap a boolean expression in a function.

def is_senior_citizen(age):
    return age >= 60

is_senior_citizen(70)

True

The in operator can be used to check if a substring is part of a string or if an element is part of a list.

"hell" in "hello"

True

"yell" in "hello"

False

"yell" not in "hello"

True

"a" in ["a", "b", "c"]

True

vowels = "aeiou"
def is_vowel(c):
    return c in vowels

is_vowel('a')

True

is_vowel('x')

False

There are some useful methods on strings to check if a string starts with a prefix or ends with a suffix.

name = "python"

name.startswith("py")

True

name.endswith("on")

True

def is_python_file(filename):
    return filename.endswith(".py")

is_python_file("hello.py")

True

is_python_file("hello.c")

False

The if statement¶

The if statement is what we use for conditional execution.

n = 35
if n % 2 == 0:
    print("even")
else:
    print("odd")

odd

A typical if statement looks like this:

if condition:
    if-block
else:
    else-block

If the condition is True, the if-block is executed and if the condition is False, the else-block is executed.

It is important to note that both if-block and else-block are indented to indicate the grouping.

If we use the if condition in a function, there will be two levels of indentation. One for the function and another for if/else.

def checkeven(n):
    if n % 2 == 0:
        print(n, "is even")
    else:
        print(n, "is odd")
        
checkeven(35)
checkeven(48)

35 is odd
48 is even

The else part is optional. It is perfectly alright to have an if statement without the else.

filename = "a.c"
if not filename.endswith(".py"):
    print("please provide a python file")

please provide a python file

Checking multiple conditions can be done using if followed by multiple elif statements.

def checknumber(n):
    if n < 10:
        print(n, "is a one digit number")
    elif n < 100:
        print(n, "is a two digit number")
    else:
        print(n, "is a big number")
        
checknumber(5)
checknumber(55)
checknumber(555)

5 is a one digit number
55 is a two digit number
555 is a big number
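
The same if/elif/else chain can return a value instead of printing it, which makes the result usable in other computations. This is a sketch with a hypothetical function name describe; like checknumber, it assumes that n is a non-negative integer.

```python
def describe(n):
    # The conditions are tested top to bottom; the first true one wins.
    if n < 10:
        return "one digit"
    elif n < 100:
        return "two digits"
    else:
        return "big"

print(describe(5), describe(55), describe(555))
```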

Problem: Write a function minimum to compute minimum of two numbers (without using the built-in min function).

>>> minimum(3, 7)
3
>>> minimum(13, 7)
7

Problem: Write a function minimum3 to compute minimum of three numbers. Can you do it by using the minimum function implemented above?

>>> minimum3(2, 3, 4)
2
>>> minimum3(12, 3, 4)
3
>>> minimum3(12, 13, 4)
4

Lists¶

We have already looked at lists briefly.

x = ['a', 'b', 'c', 'd']

x[1]

'b'

len(x)

4

For Loop¶

The for loop is used to iterate over a list of elements. Just like if, the body of the for loop is indented.

x = ['a', 'b', 'c', 'd']
for c in x:
    print(c)

a
b
c
d

for c in x:
    print(c, c.upper())

a A
b B
c C
d D

for c in x:
    print(c, end=" ")

a b c d

The built-in range function can be used to iterate over a sequence of numbers.

for i in range(5):
    print(i)

range(5) # numbers up to 5 (from 0)

range(0, 5)

list(range(5))

[0, 1, 2, 3, 4]

list(range(0, 5))

[0, 1, 2, 3, 4]

list(range(0, 5, 2)) # from 0 to 5 in steps of 2

[0, 2, 4]

list(range(2, 10, 3)) # from 2 to 10 in steps of 3

[2, 5, 8]

Python has a built-in function sum to compute sum of a list of numbers.

sum([1, 2, 3, 4])

10

sum(range(5))

10

# sum of all integers below one million
sum(range(1000000))

499999500000

Let's try to implement our own sum function.

def my_sum(numbers):
    result = 0
    for n in numbers:
        result += n
    return result

print(my_sum([1, 2, 3, 4]))

10

print(my_sum(range(10)))

45

print(my_sum(range(1000000)))

499999500000

Problem: Write a function product to compute the product of given list of numbers.

>>> product([1, 2, 3, 4])
24

Problem: Write a function factorial to compute factorial of a number. Can you use the above implementation of product function in computing this?

>>> factorial(4)
24

Problem: Write a program listfiles.py that takes path to a directory as command-line argument and prints all the files (and directories) in it.

$ python listfiles.py .
Readme.md
1-python.ipynb
...

Hint: See os.listdir

Modifying and Growing Lists¶

Elements of a list can be replaced and new elements can be appended at the end.

x = ['a', 'b', 'c', 'd']

x[1] = 'bb'

x

['a', 'bb', 'c', 'd']

x.append('e')

x

['a', 'bb', 'c', 'd', 'e']

Example: squares¶

Let's write a function to compute the squares of a list of numbers.

def squares(numbers):
    result = []
    for n in numbers:
        result.append(n*n)
    return result

print(squares([1, 2, 3, 4]))

[1, 4, 9, 16]

# sum of squares of numbers below one million
sum(squares(range(1000000)))

333332833333500000

Problem: Write a function evens that takes a list of numbers as argument and returns a list containing only the even numbers from it.

>>> evens([1, 2, 3, 4, 5, 6])
[2, 4, 6]

List Comprehensions¶

List Comprehensions provide a concise way of transforming one list into another. Quite often a complex task can be modelled in a single line of code.

x = [1, 2, 3, 4]

[a*a for a in x]

[1, 4, 9, 16]

[a*a for a in x if a%2 == 0]

[4, 16]

[a for a in x if a%2 == 0]

[2, 4]
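
A list comprehension with a condition does the same work as a for loop with an if and an append. The sketch below writes both forms side by side; the names with_loop and with_comprehension are only for illustration.

```python
x = [1, 2, 3, 4]

# Explicit loop: filter, transform, append.
with_loop = []
for a in x:
    if a % 2 == 0:
        with_loop.append(a * a)

# The same computation as a list comprehension.
with_comprehension = [a * a for a in x if a % 2 == 0]

print(with_loop, with_comprehension)
```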

How to compute sum of squares of all even numbers below one million?

sum([x*x for x in range(1000000) if x%2 == 0])

166666166667000000

Problem: Write a function list_pyfiles that takes a directory as argument and returns a list of all python files in that directory.

>>> list_pyfiles(".")
["square.py", "hello.py", "args.py"]

Iteration Patterns¶

Let's look at various iteration patterns commonly used in Python. We've already seen the first two of them.

Iterating over a list¶

This is the most commonly used iteration pattern.

x = ['a', 'b', 'c', 'd']
for c in x:
    print(c, c.upper())

a A
b B
c C
d D

Iterating over a sequence of numbers¶

The range function can be used to iterate over a sequence of numbers.

for i in range(5):
    print(i)

for i in range(1, 5):
    print(i)

Iterating over two lists at the same time¶

The zip function can be used to iterate over two lists at the same time.

names = ["a", "b", "c", "d"]
scores = [10, 20, 30, 40]

for name, score in zip(names, scores):
    print(name, score)

a 10
b 20
c 30
d 40

zip(names, scores)

<zip at 0x10446ef88>

list(zip(names, scores))

[('a', 10), ('b', 20), ('c', 30), ('d', 40)]

It is not too hard to implement zip function yourself. Why not give it a try?

Iterating over the index and element¶

Sometimes we need both the index and the element when iterating over a list. The built-in function enumerate can be used in that case.

names = ["a", "b", "c", "d"]
for i, name in enumerate(names):
    print(i, name)

0 a
1 b
2 c
3 d

chapters = ["Getting Started", "Lists", "Working with Files"]
for i, title in enumerate(chapters):
    print("chapter", i+1, ":",  title)

chapter 1 : Getting Started
chapter 2 : Lists
chapter 3 : Working with Files
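
When the numbering should start from 1, enumerate also accepts an optional second argument giving the starting index, which avoids writing i+1. A small sketch, with numbered as an illustrative name:

```python
chapters = ["Getting Started", "Lists", "Working with Files"]

# The second argument is the number assigned to the first element.
numbered = list(enumerate(chapters, 1))

for number, title in numbered:
    print("chapter", number, ":", title)
```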

List Indexing¶

We've already seen how to index a list.

x = ['a', 'b', 'c', 'd']

x[0]

'a'

x[1]

'b'

x[2]

'c'

x[3]

'd'

How to find the last element of a list?

x[len(x)-1]

'd'

Python provides a short-hand for this.

x[-1]

'd'

The indices -1, -2, -3 etc. index the list from the right side, with index -1 being the last element.

def get_last_word(sentence):
    return sentence.split()[-1]

get_last_word("one two three")

'three'
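
As a quick check of how negative indices relate to ordinary ones, the sketch below compares x[-2] with the equivalent index computed from the length. The names second_last and same_element are only for illustration.

```python
x = ['a', 'b', 'c', 'd']

second_last = x[-2]            # counts from the right
same_element = x[len(x) - 2]   # the equivalent index from the left

print(second_last, same_element)
```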

Problem: Write a function getext to get extension of given filename.

>>> getext("a.py")
'py'
>>> getext("a.tar.gz")
'gz'

Assume that the filename will always have an extension.

List Slicing¶

Python has a very elegant way to create a new list by slicing a list.

x = ["a", "b", "c", "d", "e", "f", "g", "h"]

x[0:2] # first two elements of a list

['a', 'b']

x[:2] # up to (but not including) index 2

['a', 'b']

x[2:] # from index 2 (the third element) onwards

['c', 'd', 'e', 'f', 'g', 'h']

x[2:6]

['c', 'd', 'e', 'f']

x[1:6:2] # take every second element from index 1 up to (but not including) index 6

['b', 'd', 'f']

x[:] # a copy of the entire list

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

x[::-1] # reverse the list

['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
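
A slice is always a new list, so changing it leaves the original untouched. The following sketch copies a list with [:] and modifies only the copy.

```python
x = ["a", "b", "c"]

y = x[:]      # a copy of the entire list
y[0] = "z"    # modifies the copy only

print(x)
print(y)
```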

Example: echo2.py¶

Let's try to improve the echo program that we wrote earlier to print all the command-line arguments instead of just the first one.

%%file echo2.py
import sys
args = sys.argv[1:]
print(" ".join(args))

Overwriting echo2.py

!python echo2.py hello world

hello world

Problem: Write a program sum.py that takes multiple numbers as command-line arguments and prints their sum.

$ python sum.py 1 2 3 4 5
15

Sorting Lists¶

names = ["alice", "dave", "bob", "charlie"]

names.sort() # sorts the list in-place
names

['alice', 'bob', 'charlie', 'dave']

names = ["alice", "dave", "bob", "charlie"]
sorted(names) # returns a new sorted list

['alice', 'bob', 'charlie', 'dave']

names

['alice', 'dave', 'bob', 'charlie']
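
A common slip is to write names = names.sort(). Since the sort method works in-place, it returns None. The sketch below contrasts the two; new_list and result are illustrative names.

```python
names = ["alice", "dave", "bob", "charlie"]

new_list = sorted(names)   # returns a new list; names is not modified
result = names.sort()      # sorts names in-place and returns None

print(result)
print(names == new_list)
```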

How to sort the names by length?

sorted(names, key=len)

['bob', 'dave', 'alice', 'charlie']

sorted(names, key=len, reverse=True)

['charlie', 'alice', 'dave', 'bob']

Problem: Write a function isorted to sort given list of strings, ignoring the case.

>>> names = ["Alice", "bob", "Dave", "charlie"]
>>> isorted(names)
["Alice", "bob", "charlie", "Dave"]

Let's try another example.

Given a list of student records with name and marks, how to sort the records by the marks?

marks = [
    ("A", 10),
    ("B", 65),
    ("C", 48),
    ("D", 58)
]
def get_marks(record):
    return record[1]
    
sorted(marks, key=get_marks)

[('A', 10), ('C', 48), ('D', 58), ('B', 65)]

The get_marks function is written only for use with the sorted function. Wouldn't it be nice if we could write it directly there?

Python has a feature called lambda expressions for doing that.

sorted(marks, key=lambda record: record[1])

[('A', 10), ('C', 48), ('D', 58), ('B', 65)]
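
A lambda expression can be used anywhere a key function is accepted. As a sketch, the snippet below sorts the same records with the highest marks first and picks the top record with max; the names top_first and best are only for illustration.

```python
marks = [("A", 10), ("B", 65), ("C", 48), ("D", 58)]

# Highest marks first.
top_first = sorted(marks, key=lambda record: record[1], reverse=True)

# The single record with the highest marks.
best = max(marks, key=lambda record: record[1])

print(top_first)
print(best)
```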

Strings¶

In Python, strings are very much like lists.

x = "hello"

len(x)

5

x[0]

'h'

for c in x:
    print(c)

h
e
l
l
o

max(x)

'o'

x[:4]

'hell'

String Formatting¶

name = "Python"
message = "Hello, {}!".format(name)
print(message)

Hello, Python!

"Chapter {}: {}".format(1, "Getting Started")

'Chapter 1: Getting Started'

"Chapter {0}: {1}".format(1, "Getting Started")

'Chapter 1: Getting Started'

"Chapter {index}: {title}".format(index=1, title="Getting Started")

'Chapter 1: Getting Started'

Let's look at a real example.

def make_link(url):
    return '<a href="{url}">{url}</a>'.format(url=url)

make_link("http://www.google.com/")

'<a href="http://www.google.com/">http://www.google.com/</a>'

It is even possible to specify a width for a parameter.

"{:3} - {}".format(10, "A")

' 10 - A'

For floating point numbers, we can specify the precision as well.

0.1 + 0.2

0.30000000000000004

"{:.2f} +{:.2f} = {:.2f}".format(0.1, 0.2, 0.1+0.2)

'0.10 +0.20 = 0.30'
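
Width and precision can be combined to line up columns. In the sketch below, the records in rows are made-up data; strings are padded on the right and numbers on the left by default.

```python
rows = [("A", 10.0), ("B", 65.5)]   # hypothetical name and score records

lines = []
for name, score in rows:
    # name in a field of width 3, score in width 6 with 2 decimal places
    lines.append("{:3} {:6.2f}".format(name, score))

for line in lines:
    print(line)
```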

There is also an older way of formatting strings that is still supported.

"hello %s" % "Python"

'hello Python'

"chapter %d: %s" % (1, "getting started")

'chapter 1: getting started'

Working with Files¶

%%file three.txt
1
2
3

Overwriting three.txt

f = open("three.txt")

The easiest way to read the contents of a file is:

f.read()

'1\n2\n3'

open("three.txt").read()

'1\n2\n3'

The readlines method returns all the lines of the file as a list.

open("three.txt").readlines()

['1\n', '2\n', '3']
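
Each line keeps its trailing newline character. A common way to drop it is the strip method on strings, which removes surrounding whitespace. The sketch below works on a list equal to the readlines result above, so that it runs without the file; cleaned is an illustrative name.

```python
# The same list that readlines() returned for three.txt.
lines = ['1\n', '2\n', '3']

cleaned = []
for line in lines:
    cleaned.append(line.strip())   # removes the trailing "\n" if present

print(cleaned)
```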

f = open("three.txt")
f.readline()

'1\n'

f.readline()

'2\n'

f.readline()

'3'

f.readline()

''

Empty string indicates end of file.

Problem: Write a program cat.py that takes a filename as command-line argument and prints all contents of the file.

$ python cat.py three.txt
1
2
3

Example: Word Count¶

Let's try to implement the unix command wc in Python. The wc command computes the number of lines, words and characters in a file.

We'll use the following file as input to test our program.

%%file numbers.txt
1 one
2 two
3 three
4 four
5 five

Overwriting numbers.txt

The %%file doesn't add a newline at the end of the file. Fix it by editing the file and adding a newline at the end.

!wc numbers.txt

       4      10      33 numbers.txt

%%file wc.py
# Program to compute line count, word count and char count of a file.
import sys

def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)
    
main()

Overwriting wc.py

!python wc.py numbers.txt

5 10 33 numbers.txt

Problem: Write a program head.py that takes a filename as command-line argument and prints the first 5 lines of it.

$ python head.py one-to-ten.txt
1
2
3
4
5

Problem: Write a program sumfile.py that takes a filename as command-line argument and prints sum of all numbers in that file. It is assumed that the file has one number in every line.

$ python sumfile.py one-to-ten.txt
55

Problem: Write a program grep.py that takes a pattern and a filename as command-line arguments and prints all the lines in the file containing the given pattern.

$ python grep.py def wc.py
def linecount(f):
def wordcount(f):
def charcount(f):
def main():

Writing to Files¶

A file can be opened in write mode by specifying "w" as the second argument to open.

f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()

It is important to close the file after writing. The content written to a file may be held in a buffer, and it is guaranteed to be flushed to the disk only when the file is closed.

open("a.txt").read()

'one\ntwo\n'

To add more contents to an existing file, we need to open the file in append mode.

f = open("a.txt", "a")
f.write("three\n")
f.close()

open("a.txt").read()

'one\ntwo\nthree\n'

The `with` statement¶

The with statement is handy when writing to files, as it automatically closes the file at the end of the with block.

with open("b.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")
# f is automatically closed here

open("b.txt").read()

'one\ntwo\n'
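
The with statement works for reading too. The sketch below writes and reads a file inside with blocks and uses the closed attribute of the file object to confirm that the file was closed. It assumes a throwaway directory created with the tempfile module so that no existing file is touched; path, closed_after_write and contents are illustrative names.

```python
import os
import tempfile

# A throwaway directory, so the sketch does not overwrite existing files.
path = os.path.join(tempfile.mkdtemp(), "b.txt")

with open(path, "w") as f:
    f.write("one\n")
    f.write("two\n")
closed_after_write = f.closed   # True: the with block closed the file

with open(path) as f:
    contents = f.read()

print(closed_after_write)
print(contents)
```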

Problem: Write a program copyfile.py to copy the contents of one file into another. The program should accept two filenames as command-line arguments and copy the first one into the second.

$ python copyfile.py a.txt a2.txt

Warning: Don't name that file copy.py, as it would interfere with the standard library module copy.

Binary and Text¶

Python, especially version 3, treats text and binary data differently.

type("helloworld")

str

The encode method encodes a string as bytes using specified encoding.

x = "helloworld".encode('ascii')

x

b'helloworld'

type(x)

bytes

Literal bytes are written like strings, but with a b prefix.

b'hello'

b'hello'

Let's look at a couple of unicode characters to understand how that makes a difference.

name = "అఆఇఈ" # or "\u0c05\u0c06\u0c07\u0c08"

len(name)

4

name_bytes = name.encode("utf-8")

name_bytes

b'\xe0\xb0\x85\xe0\xb0\x86\xe0\xb0\x87\xe0\xb0\x88'

len(name_bytes)

12

When writing text to a file, we can specify the encoding as well. If you don't know anything about encodings, just use utf-8. It is the safest default.

text = "hello \u0c05\u0c06\u0c07\u0c08"

with open("a.txt", "w", encoding='utf-8') as f:
    f.write(text)

!wc -c a.txt

      18 a.txt

len(text)

10

len(text.encode('utf-8'))

18

Notice that the file size is the same as the number of bytes in the encoded text, not the number of characters.
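
Encoding is reversible. The decode method on bytes, the counterpart of encode, turns the bytes back into the original string. A minimal sketch, with data and round_trip as illustrative names:

```python
text = "hello \u0c05\u0c06\u0c07\u0c08"

data = text.encode("utf-8")          # str -> bytes
round_trip = data.decode("utf-8")    # bytes -> str

print(len(text), len(data))          # characters vs. bytes
print(round_trip == text)
```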

Dictionaries¶

Dictionaries are collections of key-value pairs with fast lookup by key.

d = {'x': 1, 'y': 2}

d['x']

1

d['y']

2

d['x'] = 11

d['z'] = 3

print(d)

{'y': 2, 'z': 3, 'x': 11}

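In the Python version used here (3.5), the keys in a dictionary are not ordered. From Python 3.7 onwards, dictionaries preserve the order in which the keys were inserted.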

The keys, values and items methods provide the keys, values and key-value pairs of the dictionary respectively.

d.keys()

dict_keys(['y', 'z', 'x'])

d.values()

dict_values([2, 3, 11])

d.items()

dict_items([('y', 2), ('z', 3), ('x', 11)])

for k in d.keys():
    print(k)

y
z
x

Directly iterating over the dictionary also iterates over the keys.

for k in d:
    print(k)

y
z
x

for v in d.values():
    print(v)

2
3
11

for k, v in d.items():
    print(k, v)

y 2
z 3
x 11

The in operator works for dictionaries as well.

'x' in d

True

'xx' in d

False

Other commonly used methods on dictionaries are get and setdefault.

d.get('x', 0)

11

d.get('xx', 0)

0

The get method takes a key and a default-value as argument and returns the value for that key if it exists, default-value otherwise. The dictionary will not be modified at all.

The setdefault method works like get but also updates the dictionary when the key is not available in the dictionary.

d.setdefault('x', 0)

11

d.setdefault('xx', 0)

0

d

{'x': 11, 'xx': 0, 'y': 2, 'z': 3}
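
A typical use of setdefault is to build a dictionary of lists without checking whether the key exists first. The sketch below groups words by their length; by_length is an illustrative name.

```python
words = ["one", "two", "three", "four", "five"]

by_length = {}
for w in words:
    # Insert an empty list the first time a length is seen,
    # then append to whichever list is stored for that length.
    by_length.setdefault(len(w), []).append(w)

print(by_length)
```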

Example: Word Frequency¶

%%file words.txt
five
five four
five four three
five four three two
five four three two one

Overwriting words.txt

%%file wordfreq.py
"""Program to compute frequency of all words in a file.

USAGE: python wordfreq.py file.txt
"""
import sys

def read_words(filename):
    return open(filename).read().split()

def wordfreq(words):
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq

def print_freq(freq):
    # TODO: improve this
    print(freq)

def main():
    filename = sys.argv[1]
    words = read_words(filename)
    freq = wordfreq(words)
    print_freq(freq)
    
if __name__ == "__main__":
    main()

Overwriting wordfreq.py

!python wordfreq.py words.txt

{'two': 2, 'one': 1, 'four': 4, 'five': 5, 'three': 3}

Problem: Improve the above program to print one word per line, like the following:

four 4
one 1
five 5
three 3
two 2

Problem: Improve the above program further to print the words sorted by frequency, with most frequent word on the top.

five 5
four 4
three 3
two 2
one 1

Problem: Write a program extcount.py to count the number of files per extension in the given directory. The program should take path to a directory as command-line argument and print count and extension for each available extension.

$ python extcount.py foo
14 py
2 txt
1 csv

Writing Custom Modules¶

%%file mymodule.py
print("begin mymodule")
x = 1

def add(a, b):
    return a+b

print(add(3, 4))
print("end mymodule")

Writing mymodule.py

!python3 mymodule.py

begin mymodule
7
end mymodule

%%file a.py
import mymodule

print(mymodule.x)
print(mymodule.add(10, 20))

Writing a.py

!python3 a.py

begin mymodule
7
end mymodule
1
30

The __name__ magic variable

%%file mymodule2.py

x = 1

def add(a, b):
    return a+b

print(add(3, 4))
print(__name__)

Writing mymodule2.py

!python mymodule2.py

7
__main__

When the file is run as a script, the __name__ is set to "__main__".

!python -c "import mymodule2"

7
mymodule2

When the file is imported as a module, the __name__ is set to the module name.

%%file mymodule3.py

x = 1

def add(a, b):
    return a+b

if __name__ == "__main__":
    # If this file is executed as script, then run the following code
    # Ignore this when the file is imported as a module
    print(add(3, 4))

Writing mymodule3.py

!python mymodule3.py

7

!python -c "import mymodule3"

Docstrings¶

help("mymodule3")

Help on module mymodule3:

NAME
    mymodule3

FUNCTIONS
    add(a, b)

DATA
    x = 1

FILE
    /Users/anand/github/anandology/python-datascience/mymodule3.py

%%file mymodule4.py
"""This is module 4.

Long description of the module.
"""

x = 1

def add(a, b):
    """Adds two numbers.
    
    Example:
    
        >>> add(2, 3)
        5
    """
    return a+b

if __name__ == "__main__":
    print(add(3, 4))

Writing mymodule4.py
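
The docstring is the text that help displays, and it is also available directly as the __doc__ attribute of the function or module. A minimal sketch with a one-line docstring:

```python
def add(a, b):
    """Adds two numbers."""
    return a + b

# The docstring is stored on the function object itself.
print(add.__doc__)
```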

Classes¶

class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

p = Point()
print(p.x, p.y)

0 0

p.z = 3 # new attributes can be added to an object, any time.

p.z

3

class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
        
    def getx(self):
        return self.x
    
    def display(self):
        print(self.x, self.y)
        
    def add(self, p):
        """Adds this point to another point and returns the new point.
        """
        x = self.x + p.x
        y = self.y + p.y
        return Point(x, y)

p1 = Point()
p1.x = 1
p1.y = 2
print(p1.getx())

p2 = Point()
p2.x = 3
p2.y = 4

p1.display()
p2.display()

p3 = p1.add(p2)
p3.display()

1
1 2
3 4
4 6
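
Since __init__ accepts x and y, the coordinates can also be passed when creating the point instead of being assigned afterwards. The sketch below repeats a shortened version of the class (only __init__ and add) so that it runs on its own.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def add(self, p):
        """Adds this point to another point and returns the new point."""
        return Point(self.x + p.x, self.y + p.y)

# Arguments given to Point(...) are passed on to __init__.
p1 = Point(1, 2)
p2 = Point(3, 4)
p3 = p1.add(p2)
print(p3.x, p3.y)
```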

Point

__main__.Point

Point.getx

<function __main__.Point.getx>

Point.getx(p1)

1

Point.getx(p2)

3

The method calling syntax is shorthand for the above.

p1.getx() # Point.getx(p1)

1

"mathematics".count("mat") # str.count("mathematics", "mat")

2

str.count("mathematics", "mat")

2

Problem: Write a method double in the above Point class. It should return a new point with both x and y doubled.

>>> p = Point(1, 2)
>>> q = p.double()
>>> q.display()
2 4