To start this guide, download this zip file.
Counting
Dictionaries make it easy to count items. For example let’s say we wanted to count the number of vowels in a string. Here is what this program should do:
% python vowel_counts.py 'Hello. How are you?'
{'a': 1, 'e': 2, 'i': 0, 'o': 3, 'u': 1}
Notice that for the string 'Hello. How are you?' we have created a dictionary that maps each vowel to the number of times it appears:
- a: 1
- e: 2
- i: 0
- o: 3
- u: 1
To see how we can do this, take a look at this function:
def count(letters, text):
    # create an empty dictionary
    counts = {}
    # loop through all of the letters we are counting
    # and initialize their counts to zero
    for letter in letters:
        counts[letter] = 0
    # loop through all of the letters in the text
    # be sure to convert to lowercase
    for c in text.lower():
        # if this letter is one we are counting, add 1 to its count
        if c in counts:
            counts[c] += 1
    # return the dictionary
    return counts
This function takes a string containing the letters to count and a string of text to search. For example, we could call this with:
vowel_counts = count('aeiou', text)
In this function we:
- create an empty dictionary
- loop through all of the letters we are counting and initialize their counts to zero
- loop through all of the letters in the text
- if the letter we are looking at is one of the ones we are counting, then add one to its count
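The initialization step matters because `+=` needs an existing value to add to. Here is a minimal sketch of what happens without it; the variable `error_raised` is only for illustration and is not part of vowel_counts.py:

```python
# Illustrative only: what happens if we skip the initialization loop
counts = {}

try:
    # 'a' is not a key yet, so looking up its old value fails
    counts['a'] += 1
    error_raised = False
except KeyError:
    error_raised = True

# initializing the key first makes the increment safe
counts['a'] = 0
counts['a'] += 1

print(error_raised)  # True
print(counts)        # {'a': 1}
```

This is also why the function checks `if c in counts` before incrementing: characters such as 'h' or '.' are never added as keys, so they are skipped.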
Here is a program that uses this function, which you can find in vowel_counts.py:
import sys


def count(letters, text):
    # create an empty dictionary
    counts = {}
    # loop through all of the letters we are counting
    # and initialize their counts to zero
    for letter in letters:
        counts[letter] = 0
    # loop through all of the letters in the text
    # be sure to convert to lowercase
    for c in text.lower():
        # if this letter is one we are counting, add 1 to its count
        if c in counts:
            counts[c] += 1
    # return the dictionary
    return counts


def main(text):
    # count how many times each vowel occurs in the text
    vowel_counts = count('aeiou', text)
    # print out the dictionary
    print(vowel_counts)


if __name__ == '__main__':
    main(sys.argv[1])
We can test this program by giving it another string:
% python vowel_counts.py "I am going to double major in Computer Science and Journalism"
{'a': 4, 'e': 4, 'i': 5, 'o': 6, 'u': 3}
Looks like it works!
States
To practice this, we are going to write a program that has a group of people enter their home state or country. After all of the places are entered, the program then prints out how many people are from each place. For example:
% python places_count.py
State or Country: Delaware
State or Country: Montana
State or Country: Pakistan
State or Country: Iran
State or Country: Montana
State or Country: Pakistan
State or Country: India
State or Country: California
State or Country:
{'Delaware': 1, 'Montana': 2, 'Pakistan': 2, 'Iran': 1, 'India': 1, 'California': 1}
Here is a function to compute the dictionary:
def get_places():
    # create an empty dictionary
    places = {}
    while True:
        # get a place
        place = input('State or Country: ')
        # break if we are done
        if not place:
            break
        # if this place is not in the dictionary yet
        # then initialize this place to zero
        if place not in places:
            places[place] = 0
        # increment this place by one
        # this doesn't cause an error because we were sure
        # to initialize it to zero above
        places[place] += 1
    # return the dictionary
    return places
Notice that this follows a similar pattern to the one we used when counting vowels. The difference here is that we don’t know the keys for the dictionary in advance. If we are counting vowels, the keys are always the five letters in “aeiou”. But for this problem, the keys are whatever states and countries people enter.
We can handle this problem by using this code:
if place not in places:
    places[place] = 0
Whenever we find a place that is not in the dictionary, we initialize its value to zero.
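As an aside, the same "initialize if missing, then increment" idea can be written in one line with the dictionary method `get()`, which returns a default value when the key is absent. This is only an alternative sketch, not the approach used in this guide's program; `count_items` is a hypothetical helper that takes a list so it can run without keyboard input:

```python
def count_items(items):
    """Count how often each item appears (hypothetical helper)."""
    counts = {}
    for item in items:
        # get() returns 0 when the key is missing, so we do not
        # need a separate initialization step
        counts[item] = counts.get(item, 0) + 1
    return counts


print(count_items(['Montana', 'Iran', 'Montana']))
# {'Montana': 2, 'Iran': 1}
```

Both versions build the same dictionary. The explicit `if place not in places` check is longer, but it makes the initialization step easy to see.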
Here is a complete program using this function, which you can find in places_count.py:
def get_places():
    # create an empty dictionary
    places = {}
    while True:
        # get a place
        place = input('State or Country: ')
        # break if we are done
        if not place:
            break
        # if this place is not in the dictionary yet
        # then initialize this place to zero
        if place not in places:
            places[place] = 0
        # increment this place by one
        # this doesn't cause an error because we were sure
        # to initialize it to zero above
        places[place] += 1
    # return the dictionary
    return places


def main():
    places = get_places()
    print(places)


if __name__ == '__main__':
    main()
Counting words
For this program, we are going to count how many times each word occurs in a file. To do this, we need to ignore both case and punctuation. This is important because if the file contains:
Twinkle, twinkle, little star,
how I wonder, what you are!
Up above the world so high,
like a diamond in the sky.
Twinkle, twinkle, little star,
how I wonder what you are!
Then we need “Twinkle” to be counted the same as “twinkle”, and we need to remove commas and exclamation points.
Reading the file as a long string
When we want to count words in a file, we could read the file as a list of lines, like we usually do, and then split each line into words. However, a simpler thing to do is to read the file as one long string. Then you can split this long string into words all at once using split().
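Splitting all at once works because `split()` with no arguments breaks a string on any run of white space, and newlines count as white space. A small sketch, using a made-up `text` string in place of real file contents:

```python
# two lines of text in one string, as file.read() would return them
text = 'Twinkle, twinkle, little star,\nhow I wonder, what you are!\n'

# split() with no arguments splits on spaces and newlines alike
words = text.split()

print(words)
# ['Twinkle,', 'twinkle,', 'little', 'star,', 'how', 'I', 'wonder,', 'what', 'you', 'are!']
```

Notice that the punctuation is still attached to the words and the capital letters are unchanged.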
Here is how to read a file as one long string:
def readfile(filename):
    with open(filename) as file:
        return file.read()
This function uses file.read() instead of file.readlines():
- file.read(): reads an entire file and returns it as one long string:
  'Line one\nLine two\nLine three\n'
- file.readlines(): reads an entire file and returns it as a list of strings, one per line in the file:
  ['Line one\n', 'Line two\n', 'Line three\n']
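To see the difference yourself, here is a small sketch that writes a three-line file and then reads it both ways. The file name sample.txt and its contents are made up for this demonstration, and the file is created in a temporary folder so nothing is left behind:

```python
import os
import tempfile

# create a small sample file in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')
    with open(path, 'w') as file:
        file.write('Line one\nLine two\nLine three\n')

    # read the whole file as one long string
    with open(path) as file:
        as_string = file.read()

    # read the whole file as a list of lines
    with open(path) as file:
        as_list = file.readlines()

print(repr(as_string))  # 'Line one\nLine two\nLine three\n'
print(as_list)          # ['Line one\n', 'Line two\n', 'Line three\n']
```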
Removing punctuation
To remove punctuation, we can use strip(). Normally, strip() removes all leading and trailing white space. But if we give it a string as an argument, then it removes all leading and trailing characters that are in that string.
For example, this will remove just exclamation points and question marks:
word = word.strip('!?')
To remove all punctuation, you could imagine trying to list all the punctuation characters in something like:
word = word.strip('.,?!#@$%^&*()')
However, with this strategy it is easy to overlook something. Instead, Python can provide us with a string that contains all of the ASCII punctuation characters:
from string import punctuation
word = word.strip(punctuation)
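Keep in mind that strip() only works on the ends of a string, so punctuation inside a word is left alone. A short sketch with a few made-up sample words:

```python
from string import punctuation

# show which characters Python considers punctuation
print(punctuation)

samples = ['star,', 'are!', '"hello"', "don't", 'well-known.']

# strip() removes punctuation from both ends of each word,
# but never from the middle
cleaned = [word.strip(punctuation) for word in samples]

print(cleaned)
# ['star', 'are', 'hello', "don't", 'well-known']
```

For counting words this is usually what we want, since the apostrophe in "don't" and the hyphen in "well-known" are part of the word.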
A function to count words
Here is a function that will count words in a long string (containing multiple lines):
from string import punctuation


def count_words(content):
    """Count the number of each word in content.
    Ignore casing and punctuation."""
    # create an empty dictionary
    counts = {}
    # loop through all of the words, first converting to lowercase
    # and then splitting them using white space
    for word in content.lower().split():
        # strip any leading or trailing punctuation from the word
        word = word.strip(punctuation)
        # if the word is not in the dictionary,
        # initialize an entry to zero
        if word not in counts:
            counts[word] = 0
        # increment the count by one for this word
        counts[word] += 1
    # return the dictionary
    return counts
The two important things to notice here are:
- we convert the content to lowercase using lower() before we split it into words using split()
- we strip leading and trailing punctuation from each word using strip()
Otherwise, this follows the same pattern as counting places.
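If you want to double-check the counts, Python's standard library has a `Counter` class in the `collections` module that does the tallying for you. The sketch below is only a cross-check and is not how this guide's program works; the `content` string is a made-up sample rather than a real file:

```python
from collections import Counter
from string import punctuation

content = 'Twinkle, twinkle, little star,\nhow I wonder, what you are!\n'

# same preparation steps: lowercase, split on white space,
# then strip punctuation from the ends of each word
words = [word.strip(punctuation) for word in content.lower().split()]

# Counter tallies how many times each word appears
counts = Counter(words)

print(counts['twinkle'])  # 2
print(counts['star'])     # 1
print(counts['sky'])      # 0, a Counter returns 0 for missing keys
```

Writing the loop by hand, as this guide does, is still worth practicing because the same pattern applies whenever you build up a dictionary one item at a time.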
The file count_words.py contains a complete program:
import sys
from string import punctuation


def readfile(filename):
    with open(filename) as file:
        return file.read()


def count_words(content):
    """Count the number of each word in content.
    Ignore casing and punctuation."""
    # create an empty dictionary
    counts = {}
    # loop through all of the words, first converting to lowercase
    # and then splitting them using white space
    for word in content.lower().split():
        # strip any leading or trailing punctuation from the word
        word = word.strip(punctuation)
        # if the word is not in the dictionary,
        # initialize an entry to zero
        if word not in counts:
            counts[word] = 0
        # increment the count by one for this word
        counts[word] += 1
    # return the dictionary
    return counts


def main(filename):
    # read the file
    content = readfile(filename)
    # count how many times each word appears
    counts = count_words(content)
    # print the counts dictionary
    print(counts)


if __name__ == '__main__':
    main(sys.argv[1])
You can run this using the file twinkle.txt:
% python count_words.py twinkle.txt
{'twinkle': 4, 'little': 2, 'star': 2, 'how': 2, 'i': 2, 'wonder': 2,
'what': 2, 'you': 2, 'are': 2, 'up': 1, 'above': 1, 'the': 2, 'world': 1,
'so': 1, 'high': 1, 'like': 1, 'a': 1, 'diamond': 1, 'in': 1, 'sky': 1}