Machine Learning with Python - Day 3

Cognizant, Bangalore
March, 2015
Jigsaw Academy

Instructor: Anand Chitipothu

These live notes are available online at http://bit.ly/cognizant-py.

Working with Files

In [1]:
%%file three.txt
one
two
three
Writing three.txt
In [2]:
f = open("three.txt")
In [3]:
f.read()
Out[3]:
'one\ntwo\nthree'
In [4]:
open("three.txt").read()
Out[4]:
'one\ntwo\nthree'
In [5]:
print(open("three.txt").read())
one
two
three
In [6]:
open("three.txt").readlines()
Out[6]:
['one\n', 'two\n', 'three']
In [8]:
for line in open("three.txt").readlines():
    print(line, end="")
one
two
three
In [9]:
for i, line in enumerate(open("three.txt").readlines()):
    print("Line", i, ":", line, end="")
Line 0 : one
Line 1 : two
Line 2 : three
In [10]:
for line in open("three.txt"):
    print(line, end="")
one
two
three

Q: What happens if we read the same file object twice?

In [11]:
f = open("three.txt")
f.read()
Out[11]:
'one\ntwo\nthree'
In [12]:
f.read()
Out[12]:
''

That is because the first read() left the file pointer at the end of the file, so there is nothing left to read.

In [15]:
f.tell() # file offset at the end of the file
Out[15]:
13
In [16]:
open("three.txt").tell() # file offset at the beginning of the file
Out[16]:
0
In [17]:
f.seek(0) 
f.tell()
Out[17]:
0
In [18]:
f.read()
Out[18]:
'one\ntwo\nthree'

Problem: Write a program cat.py that takes a filename as argument and prints all the contents of the file.

$ python cat.py three.txt
one
two
three
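
One possible sketch of cat.py, assuming the file is small enough to read in one go. The function names cat and main and the temporary sample file are illustrative choices, not part of the original notes; in the real script the last line would be main(sys.argv).

```python
import os
import tempfile

def cat(filename):
    """Returns the entire contents of the file as a string."""
    # the with block closes the file as soon as reading is done
    with open(filename) as f:
        return f.read()

def main(argv):
    # argv[1] is the filename, as in: python cat.py three.txt
    # end="" avoids printing an extra newline after the contents
    print(cat(argv[1]), end="")

# Demo: create a sample file in a temporary directory (an assumption
# made only so that this sketch runs on its own) and print it.
sample = os.path.join(tempfile.mkdtemp(), "three.txt")
with open(sample, "w") as f:
    f.write("one\ntwo\nthree")
main(["cat.py", sample])
```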

Example: Word Count

Let's try to implement the Unix word count command wc in Python.

In [20]:
%%file numbers.txt
1 one
2 two
3 three
4 four
5 five
Writing numbers.txt
In [21]:
!wc numbers.txt
       5      10      34 numbers.txt
In [30]:
%%file wc.py
import sys

def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)
    
main()    
Overwriting wc.py
In [31]:
!python wc.py numbers.txt
5 10 34 numbers.txt

Problem: Write a program sumfile.py that takes a filename as a command-line argument and prints the sum of all numbers in that file. Assume that the file has one number per line.

$ python sumfile.py one-to-ten.txt
55

Problem: Write a program head.py that takes a filename as a command-line argument and prints the first five lines of the file.

$ python head.py one-to-ten.txt
1
2
3
4
5

Problem: Write a program grep.py that takes a pattern and a filename as arguments and prints all the lines containing that pattern.

$ python grep.py def wc.py
def linecount(f):
def wordcount(f):
def charcount(f):
def main():    
In [32]:
%%file sumfile.py
import sys
filename = sys.argv[1]
numbers = [int(line) for line in open(filename)]
print(sum(numbers))
Writing sumfile.py
In [33]:
%%file one-to-ten.txt
1
2
3
4
5
6
7
8
9
10
Writing one-to-ten.txt
In [34]:
!python sumfile.py one-to-ten.txt
55
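
The head.py and grep.py problems can be approached along the same lines as sumfile.py. The following is only a sketch: the function names head and grep are made up for illustration, and the pattern is treated as a plain substring, which is an assumption (the Unix grep matches regular expressions).

```python
import os
import tempfile

def head(filename, n=5):
    """Returns the first n lines of the file."""
    lines = []
    with open(filename) as f:
        for i, line in enumerate(f):
            # stop reading once n lines have been collected
            if i >= n:
                break
            lines.append(line)
    return lines

def grep(pattern, filename):
    """Returns the lines of the file that contain pattern as a substring."""
    with open(filename) as f:
        return [line for line in f if pattern in line]

# Demo on a sample file holding the numbers 1 to 10, one per line.
sample = os.path.join(tempfile.mkdtemp(), "one-to-ten.txt")
with open(sample, "w") as f:
    for i in range(1, 11):
        f.write(str(i) + "\n")

first_five = head(sample)
with_one = grep("1", sample)
print("".join(first_five), end="")
print("".join(with_one), end="")
```

In the actual scripts, the filename and the pattern would come from sys.argv, as in sumfile.py.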

Writing to Files

A file can be opened in write mode by specifying "w" as the second argument.

In [35]:
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()

Let's see what we have in that file now.

In [36]:
open("a.txt").read()
Out[36]:
'one\ntwo\n'

Q: How do we test if a file already exists?

In [37]:
import os
os.path.exists("a.txt")
Out[37]:
True
In [38]:
os.path.exists("b.txt")
Out[38]:
False

To add more contents to an existing file, we need to open the file in append mode.

In [39]:
f = open("a.txt", "a")
f.write("three\n")
f.close()
In [41]:
open("a.txt").read()
Out[41]:
'one\ntwo\nthree\n'

The with Statement

The with statement is handy when writing to files as it closes the file automatically at the end of the with block.

In [42]:
with open("b.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")    
# f gets closed automatically here    
In [43]:
open("b.txt").read()
Out[43]:
'one\ntwo\n'
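
The with statement works for reading as well. A small sketch, reusing the contents of b.txt; the temporary directory is an assumption made only so that the example does not touch files in the current directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "b.txt")

with open(path, "w") as f:
    f.write("one\ntwo\n")

# Reading inside a with block: the file is closed when the block ends,
# even if an exception is raised inside it.
with open(path) as f:
    lines = f.readlines()

print(lines)
print(f.closed)   # the file object still exists, but it is closed
```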

Problem: Write a program copyfile.py to copy the contents of one file to another. The program should accept two filenames as command-line arguments and copy the first one into the second.

$ python copyfile.py a.txt a2.txt

WARNING: Don't name the file copy.py, as it would shadow the standard library module copy.

Problem: Write a program mergefiles.py that takes one target file and multiple source files as arguments and copies the contents of all source files into the target file.

$ python mergefiles.py ten.txt five.txt five-to-ten.txt

+Problem: Write a program split.py that splits a large file into multiple smaller files. The program should take a filename and the number of lines as arguments and write multiple small files, each containing the specified number of lines (the last one may have fewer lines).

$ python split.py 100.txt 30
writing 100.txt-part1
writing 100.txt-part2    
writing 100.txt-part3    
writing 100.txt-part4        
In [44]:
%%file copyfile.py
import sys
src = sys.argv[1]
dest = sys.argv[2]

contents = open(src).read()

with open(dest, "w") as f:
    f.write(contents)
Writing copyfile.py
In [52]:
%%file mergefiles.py
import sys

destfile = sys.argv[1]
srcfiles = sys.argv[2:]

print(destfile)
print(srcfiles)

with open(destfile, "w") as dest:
    for f in srcfiles:
        dest.write(open(f).read())
Overwriting mergefiles.py
In [53]:
!python mergefiles.py c.txt a.txt b.txt
c.txt
['a.txt', 'b.txt']
In [51]:
print(open("c.txt").read())
one
two
three
one
two
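
For the split.py problem, one possible approach is to read all the lines and write them out in chunks. This is a sketch under the assumption that the file fits in memory; the function name split_file and the sample file are illustrative.

```python
import os
import tempfile

def split_file(filename, nlines):
    """Splits filename into parts of at most nlines lines each.

    Returns the names of the files written.
    """
    with open(filename) as f:
        lines = f.readlines()

    parts = []
    # step through the lines in chunks of nlines
    for start in range(0, len(lines), nlines):
        partname = "{}-part{}".format(filename, len(parts) + 1)
        print("writing", os.path.basename(partname))
        with open(partname, "w") as out:
            out.writelines(lines[start:start + nlines])
        parts.append(partname)
    return parts

# Demo: a sample file with 100 lines, split into parts of 30 lines.
sample = os.path.join(tempfile.mkdtemp(), "100.txt")
with open(sample, "w") as f:
    for i in range(1, 101):
        f.write(str(i) + "\n")

parts = split_file(sample, 30)

# count the lines in each part to check the split
sizes = []
for p in parts:
    with open(p) as f:
        sizes.append(len(f.readlines()))
print(sizes)
```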

Binary and Text

In [54]:
type("helloworld")
Out[54]:
str

The encode method encodes the given string as bytes using the specified encoding.

In [56]:
x = "helloworld".encode("ascii")
x
Out[56]:
b'helloworld'
In [46]:
"h"
Out[46]:
'h'
In [49]:
"h".encode("utf-8")
Out[49]:
b'h'
In [57]:
type(x)
Out[57]:
bytes

Literal bytes are written like strings, but with a b prefix.

In [58]:
b'these are bytes'
Out[58]:
b'these are bytes'

Let's look at some Unicode text.

In [60]:
name = "\u0c85\u0c86\u0c87\u0c88"
In [61]:
name
Out[61]:
'ಅಆಇಈ'
In [62]:
len(name)
Out[62]:
4
In [63]:
# this will fail
name.encode("ascii")
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-63-65ea99204210> in <module>()
      1 # this will fail
----> 2 name.encode("ascii")

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
In [64]:
name.encode("utf-8")
Out[64]:
b'\xe0\xb2\x85\xe0\xb2\x86\xe0\xb2\x87\xe0\xb2\x88'
In [66]:
name_bytes = name.encode("utf-8")
In [67]:
len(name_bytes)
Out[67]:
12

Let's try to write the name into a file.

In [68]:
f = open("kannada.txt", "w", encoding="utf-8")
In [69]:
f.write(name)
f.close()
In [70]:
# look at the file size using unix command ls
!ls -l kannada.txt
-rw-r--r--  1 anand  staff  12 Mar 16 11:27 kannada.txt
In [71]:
open("kannada.txt", "r", encoding="utf-8").read()
Out[71]:
'ಅಆಇಈ'
In [72]:
open("kannada.txt", "rb").read()
Out[72]:
b'\xe0\xb2\x85\xe0\xb2\x86\xe0\xb2\x87\xe0\xb2\x88'
In [51]:
open("kannada.txt", "r", encoding="utf-8")
Out[51]:
<_io.TextIOWrapper name='kannada.txt' mode='r' encoding='utf-8'>
In [52]:
open("kannada.txt", "rb")
Out[52]:
<_io.BufferedReader name='kannada.txt'>
In [73]:
!cat kannada.txt
ಅಆಇಈ
In [74]:
name_bytes
Out[74]:
b'\xe0\xb2\x85\xe0\xb2\x86\xe0\xb2\x87\xe0\xb2\x88'
In [75]:
name_bytes.decode("utf-8")
Out[75]:
'ಅಆಇಈ'

Q: How do strings and bytes work in Python 2?

In [80]:
%%file bytes.py
# -*- encoding: utf-8 -*-

name = 'ಅಆಇಈ'
print("hello" + name)
Overwriting bytes.py
In [81]:
!python bytes.py
Traceback (most recent call last):
  File "bytes.py", line 4, in <module>
    print("hello" + name)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-8: ordinal not in range(128)

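There seems to be some issue here. Judging by the character positions in the message, the error is raised when print tries to encode the text for an ASCII-only standard output. Let's ignore this for now.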

Example: Reading CSV files

In [82]:
%%file a.csv
A1,B1,C1
A2,B2,C2
A3,B3,C3
Writing a.csv
In [83]:
open("a.csv").readlines()
Out[83]:
['A1,B1,C1\n', 'A2,B2,C2\n', 'A3,B3,C3']
In [84]:
[line for line in open("a.csv").readlines()]
Out[84]:
['A1,B1,C1\n', 'A2,B2,C2\n', 'A3,B3,C3']
In [85]:
[line for line in open("a.csv")]
Out[85]:
['A1,B1,C1\n', 'A2,B2,C2\n', 'A3,B3,C3']
In [86]:
[line.strip("\n") for line in open("a.csv")]
Out[86]:
['A1,B1,C1', 'A2,B2,C2', 'A3,B3,C3']
In [87]:
[line.strip("\n").split(",") for line in open("a.csv")]
Out[87]:
[['A1', 'B1', 'C1'], ['A2', 'B2', 'C2'], ['A3', 'B3', 'C3']]
In [88]:
def read_csv(filename):
    return [line.strip("\n").split(",") for line in open(filename)]
In [89]:
read_csv("a.csv")
Out[89]:
[['A1', 'B1', 'C1'], ['A2', 'B2', 'C2'], ['A3', 'B3', 'C3']]

Problem: Improve the read_csv function written above to ignore empty lines and comments. Assume that comment lines start with a # character.

In [93]:
%%file b.csv
# begin
A1,B1,C1

A2,B2,C2
# last line
A3,B3,C3
#end
Overwriting b.csv

Problem: Improve the read_csv function further to take the delimiter as an optional argument. The delimiter should default to , when not specified.

>>> read_csv("c.txt", delimiter=":")
[['A1', 'B1', 'C1'], ['A2', 'B2', 'C2'], ['A3', 'B3', 'C3']]    
In [91]:
%%file c.txt
A1:B1:C1
A2:B2:C2
A3:B3:C3
Overwriting c.txt
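
A sketch of how both improvements to read_csv might look. It assumes that a comment line has # as its very first character, as in b.csv; the sample files are recreated in a temporary directory only so that the example runs on its own.

```python
import os
import tempfile

def read_csv(filename, delimiter=","):
    """Reads a delimited file, skipping empty lines and # comments."""
    rows = []
    with open(filename) as f:
        for line in f:
            line = line.strip("\n")
            # skip blank lines and comment lines
            if line.strip() == "" or line.startswith("#"):
                continue
            rows.append(line.split(delimiter))
    return rows

# Demo with the contents of b.csv and c.txt from above.
tmp = tempfile.mkdtemp()
b_csv = os.path.join(tmp, "b.csv")
c_txt = os.path.join(tmp, "c.txt")
with open(b_csv, "w") as f:
    f.write("# begin\nA1,B1,C1\n\nA2,B2,C2\n# last line\nA3,B3,C3\n#end")
with open(c_txt, "w") as f:
    f.write("A1:B1:C1\nA2:B2:C2\nA3:B3:C3")

rows_b = read_csv(b_csv)
rows_c = read_csv(c_txt, delimiter=":")
print(rows_b)
print(rows_c)
```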

Writing Custom Modules

In [94]:
%%file mymodule.py
print("BEGIN mymodule")
x = 1

def add(a, b):
    return a+b

print(add(3, 4))
print("END mymodule")
Writing mymodule.py
In [95]:
!python mymodule.py
BEGIN mymodule
7
END mymodule

Let's say we want to use the add function defined in mymodule.py somewhere else.

In [96]:
%%file a.py
import mymodule
print(mymodule.x)
print(mymodule.add(2, 3))
Writing a.py
In [97]:
!python a.py
BEGIN mymodule
7
END mymodule
1
5

The __name__ magic variable

Now, we don't want these print calls to run when the file is imported as a module, but we still need them when the file is run as a script.

In [101]:
%%file mymodule2.py
x = 1

def add(a, b):
    return a+b

print(add(3, 4))
print(__name__)
Overwriting mymodule2.py
In [102]:
!python mymodule2.py
7
__main__

When the file is executed as a script, the special variable __name__ is set to "__main__".

In [103]:
!python -c "import mymodule2"
7
mymodule2

But when the file is imported as a module, __name__ is set to the module name.

In [104]:
%%file mymodule3.py
x = 1

def add(a, b):
    return a+b

if __name__ == "__main__":
    # Run the following code only when this file is 
    # executed as a script.
    # Ignore this when imported as a module.
    print(add(3, 4))
Writing mymodule3.py
In [105]:
!python mymodule3.py
7
In [107]:
!python -c "import mymodule3"

Problem: Make the wc.py that we wrote earlier importable.

>>> import wc
>>> wc.linecount("a.txt")
3
In [116]:
%%file wc2.py
import sys

def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)

if __name__ == "__main__":    
    main()    
Overwriting wc2.py
In [117]:
import wc2
In [118]:
wc2.linecount("a.txt")
Out[118]:
3
In [120]:
help("wc2")
Help on module wc2:

NAME
    wc2

FUNCTIONS
    charcount(f)
    
    linecount(f)
    
    main()
    
    wordcount(f)

DATA
    __warningregistry__ = {'version': 341, ("unclosed file <_io.TextIOWrap...

FILE
    /Users/anand/trainings/2016/cognizant/wc2.py


Docstrings

In [121]:
def square(x):
    return x*x
In [122]:
help(square)
Help on function square in module __main__:

square(x)

In [123]:
def square(x):
    "Computes square of a number."
    return x*x
In [124]:
help(square)
Help on function square in module __main__:

square(x)
    Computes square of a number.

In [125]:
def square(x):
    """Computes square of a number.
    
        >>> square(3)
        9
    """
    return x*x
In [126]:
help(square)
Help on function square in module __main__:

square(x)
    Computes square of a number.
    
    >>> square(3)
    9

In [127]:
square?
In [128]:
%%file mymodule4.py
"""This is mymodule4. 

Written to demonstrate docstrings.
"""
x = 1

def add(a, b):
    """Adds two numbers.
    
        >>> add(3, 4)
        7
    """
    return a+b

if __name__ == "__main__":
    # Run the following code only when this file is 
    # executed as a script.
    # Ignore this when imported as a module.
    print(add(3, 4))
Writing mymodule4.py
In [129]:
help("mymodule4")
Help on module mymodule4:

NAME
    mymodule4 - This is mymodule4.

DESCRIPTION
    Written to demonstrate docstrings.

FUNCTIONS
    add(a, b)
        Adds two numbers.
        
        >>> add(3, 4)
        7

DATA
    x = 1

FILE
    /Users/anand/trainings/2016/cognizant/mymodule4.py
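
The >>> examples in a docstring are not just documentation; the standard library doctest module can run them and compare the results. The sketch below checks the add function through doctest's finder and runner classes, which is just one of several ways to invoke it.

```python
import doctest

def add(a, b):
    """Adds two numbers.

        >>> add(3, 4)
        7
    """
    return a + b

# Collect the examples from the docstring of add and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add, "add", globs={"add": add}):
    runner.run(test)

# summarize returns the number of failed and attempted examples
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

For a whole file, running python -m doctest -v mymodule4.py from the command line checks every docstring example in it.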


Q: What is from module import something?

In [132]:
import wc2
print(wc2.linecount("a.txt"))
3
In [133]:
wc2
Out[133]:
<module 'wc2' from '/Users/anand/trainings/2016/cognizant/wc2.py'>
In [134]:
from wc2 import linecount
In [136]:
linecount("a.txt")
Out[136]:
3
In [137]:
from wc2 import linecount as lc
In [138]:
lc("a.txt")
Out[138]:
3
In [139]:
import wc2 as wc
wc.linecount("a.txt")
Out[139]:
3
In [140]:
import time as t
t.asctime()
Out[140]:
'Wed Mar 16 12:44:05 2016'

Q: What is the difference between a package and a module?

A package is basically a module that contains other modules inside it.

Let's try to create a utils package.

In [141]:
!mkdir utils
In [142]:
%%file utils/square.py
"""The square module."""

def square(x):
    return x*x
Writing utils/square.py
In [144]:
%%file utils/cube.py
"""The cube module."""

def cube(x):
    return x*x*x
Overwriting utils/cube.py
In [145]:
%%file utils/__init__.py
"""The utils package.

This provides square and cube modules.
"""
Writing utils/__init__.py
In [146]:
!tree utils/
utils/
|-- __init__.py
|-- cube.py
`-- square.py

0 directories, 3 files
In [147]:
from utils.square import square
square(4)
Out[147]:
16
In [148]:
from utils.cube import cube
cube(4)
Out[148]:
64
In [149]:
help("utils")
Help on package utils:

NAME
    utils - The utils package.

DESCRIPTION
    This provides square and cube modules.

PACKAGE CONTENTS
    cube
    square

FILE
    /Users/anand/trainings/2016/cognizant/utils/__init__.py


In [150]:
help("utils.square")
Help on module utils.square in utils:

NAME
    utils.square

FUNCTIONS
    square(x)

FILE
    /Users/anand/trainings/2016/cognizant/utils/square.py


Dictionaries

In [151]:
d = {"x": 1, "y": 2}
In [152]:
d['x']
Out[152]:
1
In [153]:
d['y']
Out[153]:
2
In [154]:
d['x'] = 11
In [155]:
d
Out[155]:
{'x': 11, 'y': 2}
In [156]:
d['z'] = 3
In [158]:
print(d)
{'x': 11, 'y': 2, 'z': 3}
In [159]:
person = {
    "name": "Alice",
    "email": "alice@example.com",
    "phone": "9876500012"
}

A dictionary can also be created by passing key-value pairs to the dict function.

In [160]:
dict([("x", 1), ("y", 2), ("z", 3)])
Out[160]:
{'x': 1, 'y': 2, 'z': 3}

Let's try a simple example.

In [161]:
%%file prices.txt
apple 20
mango 40
banana 30
Writing prices.txt
In [162]:
def load_prices(filename):
    prices = {}
    for line in open(filename):
        name, price = line.strip().split()
        prices[name] = int(price)
    return prices
In [163]:
prices = load_prices("prices.txt")
In [164]:
prices["apple"]
Out[164]:
20
In [165]:
%%file inventory.txt
notebook 100
pen 58
pencil 83
Writing inventory.txt
In [167]:
inventory = load_prices("inventory.txt")
In [168]:
inventory['notebook']
Out[168]:
100
In [169]:
inventory.get("notebook", 0)
Out[169]:
100
In [170]:
inventory.get("ruler", 0)
Out[170]:
0

Q: How to check if a key is present in a dictionary?

In [172]:
"notebook" in inventory
Out[172]:
True
In [173]:
"ruler" in inventory
Out[173]:
False
In [175]:
inventory.keys()
Out[175]:
dict_keys(['pencil', 'notebook', 'pen'])
In [176]:
inventory.values()
Out[176]:
dict_values([83, 100, 58])
In [177]:
inventory.items()
Out[177]:
dict_items([('pencil', 83), ('notebook', 100), ('pen', 58)])
In [178]:
for k in inventory.keys():
    print(k)
pencil
notebook
pen
In [179]:
for k in inventory:
    print(k)
pencil
notebook
pen
In [181]:
for v in inventory.values():
    print(v)
83
100
58
In [182]:
for k,v in inventory.items():
    print(k, v)
pencil 83
notebook 100
pen 58

Problem: Given a file containing the prices of each product and another file containing the products purchased and their quantity, write a program to generate a bill for the purchases.

$ python bill.py prices.txt purchases.txt
mango 5 40 200
apple 2 20 40
banana 4 30 120
TOTAL 360
In [183]:
%%file prices.txt
apple 20
mango 40
banana 30
Overwriting prices.txt
In [184]:
%%file purchases.txt
mango 5
apple 2
banana 4
Writing purchases.txt
In [185]:
def load_dict(filename):
    prices = {}
    for line in open(filename):
        name, price = line.strip().split()
        prices[name] = int(price)
    return prices
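
With both files loaded into dictionaries, the bill is a loop over the purchases. A minimal sketch, where make_bill is a hypothetical helper and the two dictionaries stand in for what load_dict would return for prices.txt and purchases.txt:

```python
def make_bill(prices, purchases):
    """Returns the bill rows and the total amount.

    prices maps product name to unit price and purchases maps
    product name to quantity.
    """
    rows = []
    total = 0
    for name, quantity in purchases.items():
        price = prices[name]
        amount = quantity * price
        rows.append((name, quantity, price, amount))
        total += amount
    return rows, total

# The same data as in prices.txt and purchases.txt above.
prices = {"apple": 20, "mango": 40, "banana": 30}
purchases = {"mango": 5, "apple": 2, "banana": 4}

rows, total = make_bill(prices, purchases)
for name, quantity, price, amount in rows:
    print(name, quantity, price, amount)
print("TOTAL", total)
```

From Python 3.7 onwards dictionaries keep insertion order, so the rows come out in the order of the purchases; on older versions the order may differ.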

Example: Word Count

In [186]:
%%file words.txt
five
five four
five four three
five four three two
five four three two one
Writing words.txt
In [189]:
%%file wordfreq.py
"""Program to compute frequency of words in a file.

USAGE: python wordfreq.py filename.txt
"""
import sys

def read_words(filename):
    """Reads words from a file."""
    return open(filename).read().split()

def wordfreq(words):
    """Computes the frequency of each word in the given words.
    """
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq

def print_freq(freq):
    """Prints the frequency of words.
    """
    # TODO: FIXME
    print(freq)

def main():
    filename = sys.argv[1]
    words = read_words(filename)
    freq = wordfreq(words)
    print_freq(freq)
    
if __name__ == "__main__":
    main()
Overwriting wordfreq.py
In [190]:
!python wordfreq.py words.txt
{'two': 2, 'one': 1, 'four': 4, 'five': 5, 'three': 3}

Problem: Improve the above program to print one word per line, like the following:

two 2
one 1
four 4
five 5
three 3

Problem: Improve the above program further to print the words sorted by count, with the most common word on the top.

five 5
four 4
three 3
two 2
one 1    
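
One way to approach both problems is to sort the items of the dictionary by count before printing. In this sketch, sorted_freq is a hypothetical helper and the word list mirrors words.txt.

```python
def wordfreq(words):
    """Computes the frequency of each word in the given words."""
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq

def sorted_freq(freq):
    """Returns (word, count) pairs, most common word first."""
    # item[1] is the count; reverse=True puts the largest count on top
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)

def print_freq(freq):
    """Prints one word per line, sorted by count."""
    for word, count in sorted_freq(freq):
        print(word, count)

# the same words as in words.txt
words = ("five five four five four three "
         "five four three two five four three two one").split()
freq = wordfreq(words)
print_freq(freq)
```

The standard library's collections.Counter offers a most_common method for the same purpose.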

+Problem: Write a program extcount.py to count the number of files per extension in the given directory. The program should take the path to a directory as a command-line argument and print the count and extension for each extension found.

$ python extcount.py foo/
14 py
2 txt
1 csv

Can you reuse the wordfreq function implemented in the above example, by importing it as a module?
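
A sketch of the idea: collect the extension of every file, then count them just like words. The wordfreq function is repeated here only to keep the example self-contained; with wordfreq.py in the same directory, from wordfreq import wordfreq would do the same. The extensions helper and the sample directory are illustrative.

```python
import os
import tempfile

def wordfreq(words):
    """Computes the frequency of each word in the given words."""
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq

def extensions(dirname):
    """Returns the extension of each file in dirname, without the dot.

    Files without an extension are skipped (an assumption; the
    problem statement does not say how to treat them).
    """
    exts = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        ext = os.path.splitext(name)[1]
        if os.path.isfile(path) and ext:
            exts.append(ext[1:])
    return exts

# Demo: a temporary directory with a few empty files.
tmp = tempfile.mkdtemp()
for name in ["a.py", "b.py", "c.py", "notes.txt", "data.csv", "README"]:
    open(os.path.join(tmp, name), "w").close()

counts = wordfreq(extensions(tmp))
for ext, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(count, ext)
```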

Classes

In [1]:
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
In [2]:
p = Point()
print(p.x, p.y)
0 0
In [3]:
q = Point(2, 3)
print(q.x, q.y)
2 3
In [5]:
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
        
    def getx(self):
        return self.x
    
    def display(self):
        print(self.x, self.y)
        
    def add(self, p):
        x = self.x + p.x
        y = self.y + p.y
        return Point(x, y)
    
p1 = Point(1, 2)
p2 = Point(3, 4)

p1.display()
p2.display()
print(p1.getx())

p3 = p1.add(p2)
p3.display()
1 2
3 4
1
4 6

Problem: Write a method double that returns a new Point with both x and y coordinates doubled.

>>> p = Point(2, 3)
>>> q = p.double()
>>> q.display()
4 6
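
A possible double method, written in the same style as add: it builds a new Point and leaves the original untouched. The class is trimmed to the methods needed for the example.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def display(self):
        print(self.x, self.y)

    def double(self):
        """Returns a new Point with both coordinates doubled."""
        # build a new object instead of modifying self
        return Point(2 * self.x, 2 * self.y)

p = Point(2, 3)
q = p.double()
q.display()
p.display()   # p itself is unchanged
```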
In [7]:
p = Point(1, 2)
In [8]:
p.x
Out[8]:
1
In [9]:
p.y
Out[9]:
2
In [10]:
p.z = 3
In [11]:
p.z
Out[11]:
3
In [12]:
p.__dict__
Out[12]:
{'x': 1, 'y': 2, 'z': 3}
In [13]:
p.__class__
Out[13]:
__main__.Point

Q: Can a method of a class access the dynamically added attributes?

In [14]:
class Foo:
    def getx(self):
        return self.x
In [16]:
f = Foo()
f.getx()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-c25cfc827fb5> in <module>()
      1 f = Foo()
----> 2 f.getx()

<ipython-input-14-27f09e900618> in getx(self)
      1 class Foo:
      2     def getx(self):
----> 3         return self.x

AttributeError: 'Foo' object has no attribute 'x'
In [17]:
f.x = 1
f.getx()
Out[17]:
1
In [18]:
class DummyFile:
    def read(self):
        return "one two three four"
In [26]:
def read_words(fileobj):
    return fileobj.read().split()
In [27]:
read_words(open("words.txt"))
Out[27]:
['five',
 'five',
 'four',
 'five',
 'four',
 'three',
 'five',
 'four',
 'three',
 'two',
 'five',
 'four',
 'three',
 'two',
 'one']
In [28]:
read_words(DummyFile())
Out[28]:
['one', 'two', 'three', 'four']
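
read_words only needs an object with a read method; it does not care whether that object is a real file. The standard library already ships such a stand-in: io.StringIO wraps a string in a file-like object, as this short sketch shows.

```python
import io

def read_words(fileobj):
    return fileobj.read().split()

# io.StringIO behaves like a text file opened for reading
fake_file = io.StringIO("one two three four")
words = read_words(fake_file)
print(words)
```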

Example: CSVParser

In [29]:
class CSVParser:
    def __init__(self, delimiter=",", comment_indicator="#"):
        self.delimiter = delimiter
        self.comment_indicator = comment_indicator
        
    def parse(self, filename):
        return [line.strip("\n").split(self.delimiter) 
                for line in open(filename)
                if not line.startswith(self.comment_indicator)
                   and line.strip() != ""] 
In [30]:
csv_parser = CSVParser(delimiter=",")
tsv_parser = CSVParser(delimiter="\t")
special_parser = CSVParser(delimiter=":", comment_indicator=",")
In [31]:
csv_parser.parse("a.csv")
Out[31]:
[['A1', 'B1', 'C1'], ['A2', 'B2', 'C2'], ['A3', 'B3', 'C3']]

Exception Handling

In [32]:
no_such_variable
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-32-b7c1357f8e68> in <module>()
----> 1 no_such_variable

NameError: name 'no_such_variable' is not defined
In [33]:
open("nofile.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-33-dba134cf36ca> in <module>()
----> 1 open("nofile.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'nofile.txt'
In [34]:
int("not-a-number")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-ac8f40c8d19c> in <module>()
----> 1 int("not-a-number")

ValueError: invalid literal for int() with base 10: 'not-a-number'

Let's try an example.

In [35]:
def read_file(filename):
    """Returns contents of the file.
    
    If the file is not found, returns an empty string.
    """
    try:
        return open(filename).read()
    except FileNotFoundError:
        return ""
In [36]:
read_file("a.txt")
Out[36]:
'one\ntwo\nthree\n'
In [37]:
read_file("nofile.txt")
Out[37]:
''

Problem: Write a function safeint to convert given string into an integer. The function should accept two arguments, the string to convert and a default value. If the given string is not a valid integer, the default value should be returned.

>>> safeint("3", 0)
3
>>> safeint("NA", 0)
0

Problem: Improve the sumfile.py we wrote earlier to ignore the invalid numbers after printing a warning message.

$ python sumfile.py num.txt
WARNING: Bad number 'N/A'
WARNING: Bad number 'xxx'
15
In [40]:
%%file num.txt
1
2
3
N/A
4
xxx
5
Writing num.txt
In [44]:
%%file sumfile.py
import sys
filename = sys.argv[1]

def safeint(value, default):
    try:
        return int(value)
    except ValueError:
        print("WARNING: Bad number", repr(value))
        return default

# strip the trailing newline so that the warning shows only the value
numbers = [safeint(line.strip(), 0) for line in open(filename)]
print(sum(numbers))
Overwriting sumfile.py
In [45]:
!python sumfile.py num.txt
WARNING: Bad number 'N/A'
WARNING: Bad number 'xxx'
15

Why Python 3?

There are a lot of nice features coming up in Python 3.

In [53]:
def add(x: int, y: int) -> int:
    return x+y+"hello"
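
The x: int and -> int parts are function annotations. Python stores them on the function but does not check them when the function is called; the body above would fail with a TypeError for add(1, 2) only at call time, since an int and a str cannot be added. External tools such as mypy can use the annotations to report such mistakes before the code runs. A small sketch of the runtime behaviour, with the stray "hello" removed:

```python
def add(x: int, y: int) -> int:
    return x + y

# the annotations are stored on the function object ...
print(add.__annotations__)

# ... but they are not enforced when the function is called
result = add("hello, ", "world")
print(result)
```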