To start this guide, download this zip file.
Counting
Dictionaries make it easy to count items. For example, let’s say we want to count the number of vowels in a string. Here is what this program should do:
% python vowel_counts.py 'Hello. How are you?'
{'a': 1, 'e': 2, 'i': 0, 'o': 3, 'u': 1}
Notice that for the string Hello. How are you? we have created a dictionary
that maps each vowel to the number of times it appears:
- a: 1
- e: 2
- i: 0
- o: 3
- u: 1
To see how we can do this, take a look at this function:
def count(letters, text):
    # create an empty dictionary
    counts = {}
    # loop through all of the letters we are counting
    # and initialize their counts to zero
    for letter in letters:
        counts[letter] = 0
    # loop through all of the letters in the text
    # be sure to convert to lowercase
    for c in text.lower():
        # if this letter is one we are counting, add 1 to its count
        if c in counts:
            counts[c] += 1
    # return the dictionary
    return counts
This function takes a set of letters to count and a string. For example, we could call this with:
vowel_counts = count('aeiou', text)
In this function we:
- create an empty dictionary
- loop through all of the letters we are counting and initialize their counts to zero
- loop through all of the letters in the text
- if the letter we are looking at is one of the ones we are counting, then add one to its count
 
Here is a program that uses this function, which you can find in
vowel_counts.py:
import sys
def count(letters, text):
    # create an empty dictionary
    counts = {}
    # loop through all of the letters we are counting
    # and initialize their counts to zero
    for letter in letters:
        counts[letter] = 0
    # loop through all of the letters in the text
    # be sure to convert to lowercase
    for c in text.lower():
        # if this letter is one we are counting, add 1 to its count
        if c in counts:
            counts[c] += 1
    # return the dictionary
    return counts
def main(text):
    # count how many times each vowel occurs in the text
    vowel_counts = count('aeiou', text)
    # print out the dictionary
    print(vowel_counts)
if __name__ == '__main__':
    main(sys.argv[1])
We can test this program by giving it another string:
% python vowel_counts.py "I am going to double major in Computer Science and Journalism"
{'a': 4, 'e': 4, 'i': 5, 'o': 6, 'u': 3}
Looks like it works!
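Because the letters to count are passed in as a parameter, the same function can count other characters too, not only vowels. Here is a minimal sketch of that idea. It repeats count() so it runs on its own, and the letters 'lh' and the variable names are chosen only for illustration:

```python
def count(letters, text):
    # same function as above, repeated so this example is self-contained
    counts = {}
    for letter in letters:
        counts[letter] = 0
    for c in text.lower():
        if c in counts:
            counts[c] += 1
    return counts


text = 'Hello. How are you?'

# count the vowels, as before
vowel_counts = count('aeiou', text)
print(vowel_counts)
# {'a': 1, 'e': 2, 'i': 0, 'o': 3, 'u': 1}

# count a different (illustrative) pair of letters
lh_counts = count('lh', text)
print(lh_counts)
# {'l': 2, 'h': 2}
```

Note that 'H' in the text is counted under 'h' because the text is converted to lowercase before counting.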

States
To practice this, we are going to write a program that has a group of people enter their home state or country. After all of the places are entered, the program then prints out how many people are from each place. For example:
% python places_count.py
State or Country: Delaware
State or Country: Montana
State or Country: Pakistan
State or Country: Iran
State or Country: Montana
State or Country: Pakistan
State or Country: India
State or Country: California
State or Country:
{'Delaware': 1, 'Montana': 2, 'Pakistan': 2, 'Iran': 1, 'India': 1, 'California': 1}
Here is a function to compute the dictionary:
def get_places():
    # create an empty dictionary
    places = {}
    while True:
        # get a place
        place = input('State or Country: ')
        # break if we are done
        if not place:
            break
        # if this place is not in the dictionary yet
        # then initialize this place to zero
        if place not in places:
            places[place] = 0
        # increment this place by one
        # this doesn't cause an error because we were sure
        # to initialize it to zero above
        places[place] += 1
    # return the dictionary
    return places
Notice that this follows a pattern similar to the one we used to count vowels. The difference is that we don’t know the keys for the dictionary in advance. When we count vowels, the keys are always the letters in “aeiou”. But for this problem, the keys are whatever states and countries people enter.
We can handle this problem by using this code:
if place not in places:
    places[place] = 0
Whenever we find a place that is not in the dictionary, we initialize its value to zero.
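To see why this check matters, here is a small sketch of what happens without it: adding one to a key that is not in the dictionary yet raises a KeyError. The list entries is a made-up stand-in for what people type, so the example runs without input():

```python
# hypothetical entries, standing in for values typed at the prompt
entries = ['Montana', 'Iran', 'Montana']

# without initializing, += fails the first time we see a place
places = {}
try:
    places['Montana'] += 1
except KeyError:
    print('KeyError: Montana is not in the dictionary yet')

# with the check, every new place starts at zero before we add one
places = {}
for place in entries:
    if place not in places:
        places[place] = 0
    places[place] += 1

print(places)
# {'Montana': 2, 'Iran': 1}
```

Some Python programmers write the same step as places[place] = places.get(place, 0) + 1, which behaves the same way. The explicit check used in this guide spells out the logic more clearly.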
Here is a complete program using this function, which you can find in
places_count.py:
def get_places():
    # create an empty dictionary
    places = {}
    while True:
        # get a place
        place = input('State or Country: ')
        # break if we are done
        if not place:
            break
        # if this place is not in the dictionary yet
        # then initialize this place to zero
        if place not in places:
            places[place] = 0
        # increment this place by one
        # this doesn't cause an error because we were sure
        # to initialize it to zero above
        places[place] += 1
    # return the dictionary
    return places
def main():
    places = get_places()
    print(places)
if __name__ == '__main__':
    main()
Counting words
For this program, we are going to count how many times each word occurs in a file, ignoring both case and punctuation. This is important because if the file contains:
Twinkle, twinkle, little star,
how I wonder, what you are!
Up above the world so high,
like a diamond in the sky.
Twinkle, twinkle, little star,
how I wonder what you are!
Then we need “Twinkle” to be counted the same as “twinkle”, and we need to remove commas and exclamation points.
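To make the problem concrete, here is a small sketch that counts the words in the first line of the poem without handling case or punctuation. The variable names are only illustrative:

```python
line = 'Twinkle, twinkle, little star,'

# count words exactly as they appear, with no cleanup
naive = {}
for word in line.split():
    if word not in naive:
        naive[word] = 0
    naive[word] += 1

print(naive)
# {'Twinkle,': 1, 'twinkle,': 1, 'little': 1, 'star,': 1}
```

The two spellings of “twinkle” end up as separate keys, and the commas are stored as part of the words.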
Reading the file as a long string
When we want to count words in a file, we could read the file as a list of
lines, like we usually do, and then split each line into words. However, a
simpler thing to do is to read the file as one long string. Then you can split
this long string into words all at once using split().
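This works because split() with no arguments splits on any whitespace, including the newlines between lines. As a quick sketch (the sample string is made up to look like file contents), splitting line by line and splitting all at once give the same words:

```python
# sample text standing in for the contents of a file
content = 'Up above the world so high,\nlike a diamond in the sky.\n'

# approach 1: break into lines, then split each line into words
words_by_line = []
for line in content.splitlines():
    for word in line.split():
        words_by_line.append(word)

# approach 2: split the whole string at once
words_at_once = content.split()

print(words_at_once)
print(words_by_line == words_at_once)
# True
```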
Here is how to read a file as one long string:
def readfile(filename):
    with open(filename) as file:
        return file.read()
This function uses file.read() instead of file.readlines():
- file.read(): read an entire file and return it as one long string:
'Line one\nLine two\nLine three\n'
- file.readlines(): read an entire file and return it as a list of strings, one per line in the file:
['Line one\n', 'Line two\n', 'Line three\n']
Removing punctuation
To remove punctuation, we can use strip(). Normally, strip() removes all
leading and trailing white space. But if we give it a string as an argument,
then we can remove all trailing and leading characters that are in the string.
For example, this will remove just exclamation points and question marks:
word = word.strip('!?')
To remove all punctuation, you could imagine trying to list all the punctuation characters in something like:
word = word.strip('.,?!#@$%^&*()')
However, with this strategy it is easy to overlook something. Instead, Python can provide us with a full list of all the punctuation characters:
from string import punctuation
word = word.strip(punctuation)
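Here is a short sketch of what strip(punctuation) does to a few words. The list tokens is made up for illustration, to look like the pieces split() might produce:

```python
from string import punctuation

# made-up sample tokens
tokens = ['star,', 'are!', '"hello"', "don't", '...']

stripped = []
for token in tokens:
    # remove punctuation from both ends of the token
    stripped.append(token.strip(punctuation))

print(stripped)
# ['star', 'are', 'hello', "don't", '']
```

Because strip() only touches the ends of a string, the apostrophe inside don't survives, and a token made up entirely of punctuation becomes an empty string.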
A function to count words
Here is a function that will count words in a long string (containing multiple lines):
from string import punctuation
def count_words(content):
    """Count the number of each word in content.
    Ignore casing and punctuation."""
    # create an empty dictionary
    counts = {}
    # loop through all of the words, first converting to lowercase
    # and then splitting them using white space
    for word in content.lower().split():
        # strip any leading or trailing punctuation from the word
        word = word.strip(punctuation)
        # if the word is not in the dictionary,
        # initialize an entry to zero
        if word not in counts:
            counts[word] = 0
        # increment the count by one for this word
        counts[word] += 1
    # return the dictionary
    return counts
The two important things to notice here are:
- we convert the content to lowercase using lower() before we split it into words using split()
- we remove leading and trailing punctuation from each word using strip()
Otherwise, this follows the same pattern as counting places.
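As a quick check, here is a sketch that calls this function on a short string instead of a file. It repeats count_words() so it runs on its own, and the sample string is just the first two lines of the poem:

```python
from string import punctuation


def count_words(content):
    # same function as above, repeated so this example is self-contained
    counts = {}
    for word in content.lower().split():
        word = word.strip(punctuation)
        if word not in counts:
            counts[word] = 0
        counts[word] += 1
    return counts


sample = 'Twinkle, twinkle, little star,\nhow I wonder, what you are!'
result = count_words(sample)
print(result)
# {'twinkle': 2, 'little': 1, 'star': 1, 'how': 1, 'i': 1,
#  'wonder': 1, 'what': 1, 'you': 1, 'are': 1}
```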
The file count_words.py contains a complete program:
import sys
from string import punctuation
def readfile(filename):
    with open(filename) as file:
        return file.read()
def count_words(content):
    """Count the number of each word in content.
    Ignore casing and punctuation."""
    # create an empty dictionary
    counts = {}
    # loop through all of the words, first converting to lowercase
    # and then splitting them using white space
    for word in content.lower().split():
        # strip any leading or trailing punctuation from the word
        word = word.strip(punctuation)
        # if the word is not in the dictionary,
        # initialize an entry to zero
        if word not in counts:
            counts[word] = 0
        # increment the count by one for this word
        counts[word] += 1
    # return the dictionary
    return counts
def main(filename):
    # read the file
    content = readfile(filename)
    # count how many times each word appears
    counts = count_words(content)
    # print the counts dictionary
    print(counts)
if __name__ == '__main__':
    main(sys.argv[1])
You can run this using the file twinkle.txt:
% python count_words.py twinkle.txt
{'twinkle': 4, 'little': 2, 'star': 2, 'how': 2, 'i': 2, 'wonder': 2,
 'what': 2, 'you': 2, 'are': 2, 'up': 1, 'above': 1, 'the': 2, 'world': 1,
 'so': 1, 'high': 1, 'like': 1, 'a': 1, 'diamond': 1, 'in': 1, 'sky': 1}