Cognizant, Bangalore
March, 2015
Jigsaw Academy
Instructor: Anand Chitipothu
These live notes are available online at http://bit.ly/cognizant-py.
%%file three.txt
one
two
three
f = open("three.txt")
f.read()
open("three.txt").read()
print(open("three.txt").read())
open("three.txt").readlines()
for line in open("three.txt").readlines():
    print(line, end="")
for i, line in enumerate(open("three.txt").readlines()):
    print("Line", i, ":", line, end="")
for line in open("three.txt"):
    print(line, end="")
Q: What happens if we read the same file object twice?
f = open("three.txt")
f.read()
f.read()
The second read() returns an empty string. That is because the file pointer is already at the end of the file.
f.tell() # file offset at the end of the file
open("three.txt").tell() # file offset at the beginning of the file
f.seek(0)
f.tell()
f.read()
Problem: Write a program cat.py that takes a filename as an argument and prints the contents of the file.
$ python cat.py three.txt
one
two
three
Let's try to implement the Unix word count command wc in Python.
%%file numbers.txt
1 one
2 two
3 three
4 four
5 five
!wc numbers.txt
%%file wc.py
import sys
def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)

main()
!python wc.py numbers.txt
Problem: Write a program sumfile.py that takes a filename as a command-line argument and prints the sum of all numbers in that file. It is assumed that the file has one number per line.
$ python sumfile.py one-to-ten.txt
55
Problem: Write a program head.py that takes a filename as a command-line argument and prints the first five lines of the file.
$ python head.py one-to-ten.txt
1
2
3
4
5
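One possible approach to the head problem, shown as a rough sketch rather than the reference solution: the helper name head and the use of itertools.islice are choices made here, not something from the notes. io.StringIO stands in for a file opened with open(filename), so the example runs without any file on disk.

```python
import io
from itertools import islice

def head(fileobj, n=5):
    """Return the first n lines of an open file as a list."""
    # islice stops after n lines, so the rest of the file is never read.
    return list(islice(fileobj, n))

# io.StringIO behaves like a text file containing the numbers 1 to 10.
sample = io.StringIO("".join(str(i) + "\n" for i in range(1, 11)))
for line in head(sample):
    print(line, end="")
```

In head.py itself, the same function could be called with open(sys.argv[1]) instead of the in-memory sample.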
Problem: Write a program grep.py that takes a pattern and a filename as arguments and prints all the lines containing that pattern.
$ python grep.py def wc.py
def linecount(f):
def wordcount(f):
def charcount(f):
def main():
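A minimal sketch of the matching step for grep.py, assuming that "pattern" means a plain substring (not a regular expression). The function name grep is made up for this illustration, and io.StringIO again stands in for an opened file.

```python
import io

def grep(pattern, fileobj):
    """Return the lines of fileobj that contain pattern as a substring."""
    return [line for line in fileobj if pattern in line]

# A small in-memory stand-in for a source file.
source = io.StringIO("import sys\ndef linecount(f):\n    return 0\ndef main():\n")
for line in grep("def", source):
    print(line, end="")
```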
%%file sumfile.py
import sys
filename = sys.argv[1]
numbers = [int(line) for line in open(filename)]
print(sum(numbers))
%%file one-to-ten.txt
1
2
3
4
5
6
7
8
9
10
!python sumfile.py one-to-ten.txt
A file can be opened in write mode by specifying "w" as the second argument.
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()
Let's see what we have in that file now.
open("a.txt").read()
Q: How do we test whether a file already exists?
import os
os.path.exists("a.txt")
os.path.exists("b.txt")
To add more contents to an existing file, we need to open the file in append mode.
f = open("a.txt", "a")
f.write("three\n")
f.close()
open("a.txt").read()
with Statement
The with statement is handy when writing to files, as it closes the file automatically at the end of the with block.
with open("b.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")
# f gets closed automatically here
open("b.txt").read()
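To see the automatic close in action, the closed attribute of the file object can be inspected inside and after the block. This is only an illustration; it writes to a scratch path created with tempfile (an assumption made here so that no real file is overwritten).

```python
import os
import tempfile

# A scratch path, so the example does not overwrite any real file.
path = os.path.join(tempfile.mkdtemp(), "b.txt")

with open(path, "w") as f:
    f.write("one\n")
    inside = f.closed   # the file is still open inside the block

outside = f.closed      # the file is closed once the block ends
print(inside, outside)  # prints: False True
```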
Problem: Write a program copyfile.py to copy the contents of one file to another. The program should accept two filenames as command-line arguments and copy the first one into the second.
$ python copyfile.py a.txt a2.txt
WARNING: Don't name the file copy.py, as it would shadow the standard library module copy.
Problem: Write a program mergefiles.py that takes one target file and multiple source files as arguments and copies the contents of all source files into the target file.
$ python mergefiles.py ten.txt five.txt five-to-ten.txt
+Problem: Write a program split.py that splits a large file into multiple smaller files. The program should take a filename and the number of lines as arguments and write multiple small files, each containing the specified number of lines (the last one may have fewer lines).
$ python split.py 100.txt 30
writing 100.txt-part1
writing 100.txt-part2
writing 100.txt-part3
writing 100.txt-part4
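A rough sketch of one way to approach split.py. The helper names chunks and split_file are hypothetical, and the whole file is read into memory, which is an assumption that only suits files of moderate size.

```python
def chunks(lines, n):
    """Split a list into consecutive groups of at most n items."""
    return [lines[i:i + n] for i in range(0, len(lines), n)]

def split_file(filename, n):
    """Write filename-part1, filename-part2, ... with at most n lines each.

    Returns the names of the files written.
    """
    with open(filename) as f:
        lines = f.readlines()
    partnames = []
    for i, part in enumerate(chunks(lines, n), start=1):
        partname = "{}-part{}".format(filename, i)
        print("writing", partname)
        with open(partname, "w") as out:
            out.writelines(part)
        partnames.append(partname)
    return partnames
```

With a 100-line file and n=30, chunks produces groups of 30, 30, 30 and 10 lines, which matches the four parts in the sample run.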
%%file copyfile.py
import sys
src = sys.argv[1]
dest = sys.argv[2]
contents = open(src).read()
with open(dest, "w") as f:
    f.write(contents)
%%file mergefiles.py
import sys
destfile = sys.argv[1]
srcfiles = sys.argv[2:]
print(destfile)
print(srcfiles)
with open(destfile, "w") as dest:
    for f in srcfiles:
        dest.write(open(f).read())
!python mergefiles.py c.txt a.txt b.txt
print(open("c.txt").read())
type("helloworld")
The encode method encodes the given string as bytes using the specified encoding.
x = "helloworld".encode("ascii")
x
"h"
"h".encode("utf-8")
type(x)
Literal bytes are written like strings, but with a b prefix.
b'these are bytes'
Let's look at some Unicode text.
name = "\u0c85\u0c86\u0c87\u0c88"
name
len(name)
# this will fail
name.encode("ascii")
name.encode("utf-8")
name_bytes = name.encode("utf-8")
len(name_bytes)
Let's try to write the name into a file.
f = open("kannada.txt", "w", encoding="utf-8")
f.write(name)
f.close()
# look at the file size using unix command ls
!ls -l kannada.txt
open("kannada.txt", "r", encoding="utf-8").read()
open("kannada.txt", "rb").read()
open("kannada.txt", "r", encoding="utf-8")
open("kannada.txt", "rb")
!cat kannada.txt
name_bytes
name_bytes.decode("utf-8")
Q: How do strings and bytes work in Python 2?
%%file bytes.py
# -*- encoding: utf-8 -*-
name = 'ಅಆಇಈ'
print("hello" + name)
!python bytes.py
There seems to be some issue here. Let's ignore this for now.
%%file a.csv
A1,B1,C1
A2,B2,C2
A3,B3,C3
open("a.csv").readlines()
[line for line in open("a.csv").readlines()]
[line for line in open("a.csv")]
[line.strip("\n") for line in open("a.csv")]
[line.strip("\n").split(",") for line in open("a.csv")]
def read_csv(filename):
    return [line.strip("\n").split(",") for line in open(filename)]
read_csv("a.csv")
Problem: Improve the read_csv function written above to ignore empty lines and comments. Assume that comment lines start with a # character.
%%file b.csv
# begin
A1,B1,C1
A2,B2,C2
# last line
A3,B3,C3
#end
Problem: Improve the read_csv function further to take the delimiter as an optional argument. The delimiter should default to , when not specified.
>>> read_csv("c.txt", delimiter=":")
[['A1', 'B1', 'C1'], ['A2', 'B2', 'C2'], ['A3', 'B3', 'C3']]
%%file c.txt
A1:B1:C1
A2:B2:C2
A3:B3:C3
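Both read_csv improvements can be sketched together as follows. The helper parse_csv_lines is a hypothetical name introduced here so that the parsing logic can be tried on a plain list of lines; it is one possible approach, not the official solution.

```python
def parse_csv_lines(lines, delimiter=","):
    """Parse an iterable of lines, skipping empty lines and # comments."""
    rows = []
    for line in lines:
        line = line.strip("\n")
        # Skip blank lines and lines that start with the comment character.
        if line.strip() == "" or line.startswith("#"):
            continue
        rows.append(line.split(delimiter))
    return rows

def read_csv(filename, delimiter=","):
    """Read a delimited file into a list of rows (lists of strings)."""
    with open(filename) as f:
        return parse_csv_lines(f, delimiter)

print(parse_csv_lines(["# begin\n", "A1,B1,C1\n", "\n", "#end\n"]))
print(parse_csv_lines(["A1:B1:C1\n"], delimiter=":"))
```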
%%file mymodule.py
print("BEGIN mymodule")
x = 1
def add(a, b):
    return a+b
print(add(3, 4))
print("END mymodule")
!python mymodule.py
Let's say we want to use the add function defined in mymodule.py somewhere else.
%%file a.py
import mymodule
print(mymodule.x)
print(mymodule.add(2, 3))
!python a.py
__name__ magic variable
Now we don't want the print statements in the file to run when it is imported as a module, but they are still required when the file is run as a script.
%%file mymodule2.py
x = 1
def add(a, b):
    return a+b
print(add(3, 4))
print(__name__)
!python mymodule2.py
When the file is executed as a script, the special variable __name__ is set to "__main__".
!python -c "import mymodule2"
But when the file is imported as a module, __name__ is set to the module name.
%%file mymodule3.py
x = 1
def add(a, b):
    return a+b

if __name__ == "__main__":
    # Run the following code only when this file is
    # executed as a script.
    # Ignore this when imported as a module.
    print(add(3, 4))
!python mymodule3.py
!python -c "import mymodule3"
Problem: Make the wc.py that we wrote earlier importable.
>>> import wc
>>> wc.linecount("a.txt")
3
%%file wc2.py
import sys
def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)

if __name__ == "__main__":
    main()
import wc2
wc2.linecount("a.txt")
help("wc2")
def square(x):
    return x*x
help(square)
def square(x):
    "Computes square of a number."
    return x*x
help(square)
def square(x):
    """Computes square of a number.

    >>> square(3)
    9
    """
    return x*x
help(square)
square?
%%file mymodule4.py
"""This is mymodule4.
Written to demonstrate docstrings.
"""
x = 1
def add(a, b):
    """Adds two numbers.

    >>> add(3, 4)
    7
    """
    return a+b

if __name__ == "__main__":
    # Run the following code only when this file is
    # executed as a script.
    # Ignore this when imported as a module.
    print(add(3, 4))
help("mymodule4")
Q: What is from module import something?
import wc2
print(wc2.linecount("a.txt"))
wc2
from wc2 import linecount
linecount("a.txt")
from wc2 import linecount as lc
lc("a.txt")
import wc2 as wc
wc.linecount("a.txt")
import time as t
t.asctime()
Q: What is the difference between a package and a module?
A package is basically a nested module, containing more modules inside it.
Let's try to create a utils package.
!mkdir utils
%%file utils/square.py
"""The square module."""
def square(x):
    return x*x
%%file utils/cube.py
"""The cube module."""
def cube(x):
    return x*x*x
%%file utils/__init__.py
"""The utils package.
This provides square and cube modules.
"""
!tree utils/
from utils.square import square
square(4)
from utils.cube import cube
cube(4)
help("utils")
help("utils.square")
d = {"x": 1, "y": 2}
d['x']
d['y']
d['x'] = 11
d
d['z'] = 3
print(d)
person = {
    "name": "Alice",
    "email": "alice@example.com",
    "phone": "9876500012"
}
A dictionary can also be created by passing key-value pairs to the dict function.
dict([("x", 1), ("y", 2), ("z", 3)])
Let's try a simple example.
%%file prices.txt
apple 20
mango 40
banana 30
def load_prices(filename):
    prices = {}
    for line in open(filename):
        name, price = line.strip().split()
        prices[name] = int(price)
    return prices
prices = load_prices("prices.txt")
prices["apple"]
%%file inventory.txt
notebook 100
pen 58
pencil 83
inventory = load_prices("inventory.txt")
inventory['notebook']
inventory.get("notebook", 0)
inventory.get("ruler", 0)
Q: How do we check if a key is present in a dictionary?
"notebook" in inventory
"ruler" in inventory
inventory.keys()
inventory.values()
inventory.items()
for k in inventory.keys():
    print(k)
for k in inventory:
    print(k)
for v in inventory.values():
    print(v)
for k,v in inventory.items():
    print(k, v)
Problem: Given a file containing the prices of each product and another file containing the products purchased and their quantity, write a program to generate a bill for the purchases.
$ python bill.py prices.txt purchases.txt
mango 5 40 200
apple 2 20 40
banana 4 30 120
TOTAL 360
%%file prices.txt
apple 20
mango 40
banana 30
%%file purchases.txt
mango 5
apple 2
banana 4
def load_dict(filename):
    prices = {}
    for line in open(filename):
        name, price = line.strip().split()
        prices[name] = int(price)
    return prices
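The billing step itself could look something like the sketch below. The function name make_bill is hypothetical, and the purchases are passed as a list of (name, quantity) pairs, an assumption made here so that the bill keeps the order of the purchases file. The data is the same as in prices.txt and purchases.txt above.

```python
def make_bill(prices, purchases):
    """Return (rows, total); each row is (name, quantity, price, amount)."""
    rows = []
    total = 0
    for name, quantity in purchases:
        price = prices[name]
        amount = quantity * price
        rows.append((name, quantity, price, amount))
        total += amount
    return rows, total

prices = {"apple": 20, "mango": 40, "banana": 30}
purchases = [("mango", 5), ("apple", 2), ("banana", 4)]

rows, total = make_bill(prices, purchases)
for row in rows:
    print(*row)
print("TOTAL", total)
```

In bill.py, the prices dictionary would come from a loader such as load_dict, and the purchases would be read line by line from the second file.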
%%file words.txt
five
five four
five four three
five four three two
five four three two one
%%file wordfreq.py
"""Program to compute frequency of words in a file.
USAGE: python wordfreq.py filename.txt
"""
import sys
def read_words(filename):
    """Reads words from a file."""
    return open(filename).read().split()

def wordfreq(words):
    """Computes the frequency of each word in the given list of words.
    """
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq

def print_freq(freq):
    """Prints the frequency of words.
    """
    # TODO: FIXME
    print(freq)

def main():
    filename = sys.argv[1]
    words = read_words(filename)
    freq = wordfreq(words)
    print_freq(freq)

if __name__ == "__main__":
    main()
!python wordfreq.py words.txt
Problem: Improve the above program to print one word per line, like the following:
two 2
one 1
four 4
five 5
three 3
Problem: Improve the above program further to print the words sorted by count, with the most common word on the top.
five 5
four 4
three 3
two 2
one 1
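For the sorted output, one possible version of print_freq is sketched below. The helper sorted_freq is a name introduced here for illustration; it sorts the (word, count) pairs by count using the key argument of sorted.

```python
def sorted_freq(freq):
    """Return (word, count) pairs, most common word first."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)

def print_freq(freq):
    """Print one word per line, sorted by count in descending order."""
    for word, count in sorted_freq(freq):
        print(word, count)

print_freq({"two": 2, "one": 1, "four": 4, "five": 5, "three": 3})
```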
+Problem: Write a program extcount.py to count the number of files per extension in the given directory. The program should take the path to a directory as a command-line argument and print the count and extension for each available extension.
$ python extcount.py foo/
14 py
2 txt
1 csv
Can you reuse the wordfreq function implemented in the above example, by importing it as a module?
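A sketch of how the counting could reuse wordfreq. To keep the example self-contained, wordfreq is repeated here; in a real extcount.py it could instead be brought in with from wordfreq import wordfreq. The helper names extension and extcount are hypothetical, and skipping files without an extension is an assumption, since the problem statement does not say how to treat them.

```python
import os

def wordfreq(words):
    """Same function as in wordfreq.py, repeated so this sketch runs alone."""
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq

def extension(filename):
    """Return the extension without the dot, or "" if there is none."""
    return os.path.splitext(filename)[1].lstrip(".")

def extcount(filenames):
    """Count filenames per extension, ignoring names without an extension."""
    return wordfreq([extension(f) for f in filenames if extension(f)])

counts = extcount(["a.py", "b.py", "notes.txt", "README"])
for ext, count in counts.items():
    print(count, ext)
```

For a real directory, the list of names could come from os.listdir(path); note that os.listdir also returns subdirectory names.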
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
p = Point()
print(p.x, p.y)
q = Point(2, 3)
print(q.x, q.y)
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def getx(self):
        return self.x

    def display(self):
        print(self.x, self.y)

    def add(self, p):
        x = self.x + p.x
        y = self.y + p.y
        return Point(x, y)
p1 = Point(1, 2)
p2 = Point(3, 4)
p1.display()
p2.display()
print(p1.getx())
p3 = p1.add(p2)
p3.display()
Problem: Write a method double that returns a new Point with both x and y coordinates doubled.
>>> p = Point(2, 3)
>>> q = p.double()
>>> q.display()
4 6
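One way the double method might be written, following the same pattern as add: build and return a new Point instead of modifying self. The class is trimmed to the methods needed for this sketch.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def display(self):
        print(self.x, self.y)

    def double(self):
        """Return a new Point; the original point is left unchanged."""
        return Point(2 * self.x, 2 * self.y)

p = Point(2, 3)
q = p.double()
q.display()  # prints: 4 6
```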
p = Point(1, 2)
p.x
p.y
p.z = 3
p.z
p.__dict__
p.__class__
Q: Can a method of a class access the dynamically added attributes?
class Foo:
    def getx(self):
        return self.x
f = Foo()
f.getx()
f.x = 1
f.getx()
class DummyFile:
    def read(self):
        return "one two three four"

def read_words(fileobj):
    return fileobj.read().split()
read_words(open("words.txt"))
read_words(DummyFile())
class CSVParser:
    def __init__(self, delimiter=",", comment_indicator="#"):
        self.delimiter = delimiter
        self.comment_indicator = comment_indicator

    def parse(self, filename):
        return [line.strip("\n").split(self.delimiter)
                for line in open(filename)
                if not line.startswith(self.comment_indicator)
                and line.strip() != ""]
csv_parser = CSVParser(delimiter=",")
tsv_parser = CSVParser(delimiter="\t")
special_parser = CSVParser(delimiter=":", comment_indicator=",")
csv_parser.parse("a.csv")
no_such_variable
open("nofile.txt")
int("not-a-number")
Let's try an example.
def read_file(filename):
    """Returns contents of the file.

    If the file is not found, returns an empty string.
    """
    try:
        return open(filename).read()
    except FileNotFoundError:
        return ""
read_file("a.txt")
read_file("nofile.txt")
Problem: Write a function safeint to convert a given string into an integer. The function should accept two arguments, the string to convert and a default value. If the given string is not a valid integer, the default value should be returned.
>>> safeint("3", 0)
3
>>> safeint("NA", 0)
0
Problem: Improve the sumfile.py we wrote earlier to ignore invalid numbers after printing a warning message.
$ python sumfile.py num.txt
WARNING: Bad number 'N/A'
WARNING: Bad number 'xxx'
15
%%file num.txt
1
2
3
N/A
4
xxx
5
%%file sumfile.py
import sys
filename = sys.argv[1]
def safeint(value, default):
    try:
        return int(value)
    except ValueError:
        # Strip the trailing newline so the warning shows only the bad value.
        print("WARNING: Bad number", repr(value.strip()))
        return default

numbers = [safeint(line, 0) for line in open(filename)]
print(sum(numbers))
!python sumfile.py num.txt
There are a lot of nice features coming up in Python 3.
def add(x: int, y: int) -> int:
    return x+y+"hello"
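The syntax above is Python 3's function annotations. As a small illustration (a sketch, with the body simplified to x + y): Python stores the annotations on the function but does not check them at run time, so calling the function with strings still works.

```python
def add(x: int, y: int) -> int:
    return x + y

# The annotations are stored on the function object.
print(add.__annotations__)

# They are not enforced when the function is called,
# so passing strings simply concatenates them.
print(add("a", "b"))  # prints: ab
```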