hmdprs / idiomatic-python

Transforming Code into Beautiful, Idiomatic Python

Notes from Raymond Hettinger's talk at pycon US 2013 video, slides.

Looping over a Range of Numbers

for i in [0, 1, 2, 3, 4, 5]:
    print(i ** 2)

# better
for i in range(6):
    print(i ** 2)

Looping over a Collection

colors = ['red', 'green', 'blue', 'yellow']

for i in range(len(colors)):
    print(colors[i])

# better
for color in colors:
    print(color)

Looping Backwards

colors = ['red', 'green', 'blue', 'yellow']

for i in range(len(colors)-1, -1, -1):
    print(colors[i])

# better
for color in reversed(colors):
    print(color)

Looping over a Collection and Indices

colors = ['red', 'green', 'blue', 'yellow']

for i in range(len(colors)):
    print(i, '--->', colors[i])

# better
for i, color in enumerate(colors):
    print(i, '--->', color)

It's fast and beautiful and saves you from tracking the individual indices and incrementing them.
Whenever you find yourself manipulating indices [in a collection], you're probably doing it wrong.

Looping over Two Collections

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

n = min(len(names), len(colors))
for i in range(n):
    print(names[i], '--->', colors[i])

# better
for name, color in zip(names, colors):
    print(name, '--->', color)

Looping in Sorted Order

colors = ['red', 'green', 'blue', 'yellow']

# forward sorted order
for color in sorted(colors):
    print(color)

# backwards sorted order
for color in sorted(colors, reverse=True):
    print(color)

Custom Sort Order

colors = ['red', 'green', 'blue', 'yellow']

def compare_length(c1, c2):
    if len(c1) < len(c2): return -1
    if len(c1) > len(c2): return 1
    return 0

print(sorted(colors, cmp=compare_length))

It is slow and unpleasant to write. Also, comparison functions are no longer available in Python 3.

# better
print(sorted(colors, key=len))

Call a Function until a Sentinel Value

blocks = []
while True:
    block = f.read(32)
    if block == '':
        break
    blocks.append(block)

# better
blocks = []
for block in iter(partial(f.read, 32), ''):
    blocks.append(block)

iter takes two arguments. The first you call over and over again and the second is a sentinel value.

Distinguishing Multiple Exit Points in Loops

def find(seq, target):
    found = False
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i

# better
def find(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        return -1
    return i

Inside of every for loop is an else.

Looping over Dictionary Keys

d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

for k in d:
    print(k)

for k in d.keys():
    if k.startswith('r'):
        del d[k]

When should you use the second and not the first?

When you're mutating the dictionary.
If you mutate something while you're iterating over it, you're living in a state of sin and deserve what ever happens to you.

d.keys() makes a copy of all the keys and stores them in a list. Then you can modify the dictionary. Note: in Python 3 to iterate through a dictionary you have to explicitly write: list(d.keys()) because d.keys() returns a "dictionary view" (an iterable that provide a dynamic view on the dictionary’s keys). See documentation.

Looping over Dictionary Keys and Values

# not very fast, has to re-hash every key and do a lookup
for k in d:
    print(k, '--->', d[k])

# better
for k, v in d.items():
    print(k, '--->', v)

items() is better as it returns an iterator.

Construct a Dictionary from Pairs

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

d = dict(zip(names, colors))
# {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

Counting with Dictionaries

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

# simple and basic way to count
d = {}
for color in colors:
    if color not in d:
        d[color] = 0
    d[color] += 1
# {'blue': 1, 'green': 2, 'red': 3}

# better
d = {}
for color in colors:
    d[color] = d.get(color, 0) + 1

# more modern but has several caveats, better for advanced users
d = collections.defaultdict(int)
for color in colors:
    d[color] += 1

Grouping with Dictionaries

names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

# grouping by name length
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)
# {5: ['roger', 'betty'], 6: ['rachel', 'judith'], 7: ['raymond', 'matthew', 'melissa', 'charlie']}

d = {}
for name in names:
    key = len(name)
    d.setdefault(key, []).append(name)

# better
import collections
d = collections.defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)

Is a Dictionary `popitem()` atomic?

d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

while d:
    key, value = d.popitem()
    print(key, '-->', value)

popitem is atomic so you don't have to put locks around it to use it in threads.

Linking Dictionaries

defaults = {'color': 'red', 'user': 'guest'}

import argparse, os
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args([])
command_line_args = {k:v for k, v in vars(namespace).items() if v}

# use defaults, override them with environment variables, override them with command line arguments
d = defaults.copy()
d.update(os.environ)
d.update(command_line_args)

# better
import collections
d = collections.ChainMap(command_line_args, os.environ, defaults)

Improving Clarity

Positional arguments and indicies are nice
Keywords and names are better
The first way is convenient for the computer
The second corresponds to how human’s think

Clarify Function Calls with Keyword Arguments

twitter_search('@obama', False, 20, True)

# better
twitter_search('@obama', retweets=False, numtweets=20, popular=True)

This is slightly (microseconds) slower but is worth it for the code clarity and developer time savings.

Clarify Multiple Return Values with `namedtuples`

import doctest
doctest.testmod()
# (0, 4)  # is this good or bad? it's not clear.

# better
from collections import namedtuple
TestResults = namedtuple('TestResults', ['failed', 'attempted'])

doctest.testmod()
# TestResults(failed=0, attempted=4)

A named tuple is a subclass of tuple so they still work like a regular tuple, but are more friendly.

Unpacking Sequences

p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'

fname = p[0]
lname = p[1]
age = p[2]
email = p[3]

# better, tuple unpacking
fname, lname, age, email = p

Updating Multiple State Variables

def fibonacci(n):
    x = 0
    y = 1
    for i in range(n):
        print(x)
        t = y
        y = x + y
        x = t

x and y are state, and state should be updated all at once or in between lines that state is mis-matched and a common source of issues.
Ordering matters.
It's too low-level.

# better
def fibonacci(n):
    x, y = 0, 1
    for i in range(n):
        print(x)
        x, y = y, x + y

It is more high-level, doesn't risk getting the order wrong and is fast.

Simultaneous State Updates

tmp_x = x + dx * t
tmp_y = y + dy * t
tmp_dx = influence(m, x, y, dx, dy, partial='x')
tmp_dy = influence(m, x, y, dx, dy, partial='y')
x = tmp_x
y = tmp_y
dx = tmp_dx
dy = tmp_dy

The influence function here is just an example function. The important part is how to manage updating multiple variables at once.

# better
x, y, dx, dy = (x + dx * t,
                y + dy * t,
                influence(m, x, y, dx, dy, partial='x'),
                influence(m, x, y, dx, dy, partial='y'))

Efficiency

An optimization fundamental rule
Don’t cause data to move around unnecessarily
It takes only a little care to avoid O(n**2) behavior instead of linear behavior
Don't move data around unecessarily.

Concatenating Strings

names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

s = names[0]
for name in names[1:]:
    s += ', ' + name
print(s)

# better
print(', '.join(names))

Updating Sequences

names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

del names[0]
names.pop(0)
names.insert(0, 'mark')

The last two lines are signs you're using the wrong data structure

# better
import collections
names = collections.deque(['raymond', 'rachel', 'matthew', 'roger',
               'betty', 'melissa', 'judith', 'charlie'])

del names[0]
names.popleft()
names.appendleft('mark')

Decorators and Context Managers

Helps separate business logic from administrative logic
Clean, beautiful tools for factoring code and improving code reuse
Good naming is essential.
Remember the Spiderman rule: With great power, comes great responsibility!

Using Decorators to Factor-out Administrative Logic

def web_lookup(url, saved={}):
    if url in saved:
        return saved[url]
    page = urllib.urlopen(url).read()
    saved[url] = page
    return page

It mixes business / administrative logic and is not reusable.

# better
@cache
def web_lookup(url):
    return urllib.urlopen(url).read()

Since Python 3.2 there is a decorator for this in the standard library: functools.lru_cache.

Factor-out Temporary Contexts

# saving the old, restore the new
old_context = getcontext().copy()
getcontext().prec = 50
print(Decimal(355) / Decimal(113))
setcontext(old_context)

# better
with localcontext(Context(prec=50)):
    print(Decimal(355) / Decimal(113))

How to Open and Close Files

f = open('data.txt')
try:
    data = f.read()
finally:
    f.close()

# better
with open('data.txt') as f:
    data = f.read()

How to use Locks

# make a lock
lock = threading.Lock()

lock.acquire()
try:
    print('Critical section 1')
    print('Critical section 2')
finally:
    lock.release()

# better
with lock:
    print('Critical section 1')
    print('Critical section 2')

Factor-out Temporary Contexts

try:
    os.remove('somefile.tmp')
except OSError:
    pass

# better
from contextlib import suppress
with suppress(OSError):
    os.remove('somefile.tmp')

To make your own suppress context manager in the meantime:

@contextmanager
def ignored(*exceptions):
    try:
        yield
    except exceptions:
        pass

Factor-out Temporary Contexts

# temporarily redirect standard out to a file and then return it to normal
with open('help.txt', 'w') as f:
    oldstdout = sys.stdout
    sys.stdout = f
    try:
        help(pow)
    finally:
        sys.stdout = oldstdout

# better
from contextlib import redirect_stdout
with open('help.txt', 'w') as f:
    with redirect_stdout(f):
        help(pow)

To roll your own redirect_stdout context manager:

@contextmanager
def redirect_stdout(fileobj):
    oldstdout = sys.stdout
    sys.stdout = fileobj
    try:
        yield fileobj
    finally:
        sys.stdout = oldstdout

Concise Expressive One-Liners

Two conflicting rules:

Don’t put too much on one line
Don’t break atoms of thought into subatomic particles

Raymond’s rule:

One logical line of code equals one sentence in English

List Comprehensions and Generator Expressions

# tell what you do
result = []
for i in range(10):
    s = i ** 2
    result.append(s)
print(sum(result))

# better, tell what you want
print(sum(i ** 2 for i in range(10)))

Aug	SEP	Oct
	04
2019	2020	2021

hmdprs / idiomatic-python

README.md

Transforming Code into Beautiful, Idiomatic Python

Looping over a Range of Numbers

Looping over a Collection

Looping Backwards

Looping over a Collection and Indices

Looping over Two Collections

Looping in Sorted Order

Custom Sort Order

Call a Function until a Sentinel Value

Distinguishing Multiple Exit Points in Loops

Looping over Dictionary Keys

Looping over Dictionary Keys and Values

Construct a Dictionary from Pairs

Counting with Dictionaries

Grouping with Dictionaries

Is a Dictionary `popitem()` atomic?

Linking Dictionaries

Improving Clarity

Clarify Function Calls with Keyword Arguments

Clarify Multiple Return Values with `namedtuples`

Unpacking Sequences

Updating Multiple State Variables

Simultaneous State Updates

Efficiency

Concatenating Strings

Updating Sequences

Decorators and Context Managers

Using Decorators to Factor-out Administrative Logic

Factor-out Temporary Contexts

How to Open and Close Files

How to use Locks

Factor-out Temporary Contexts

Factor-out Temporary Contexts

Concise Expressive One-Liners

List Comprehensions and Generator Expressions

About

Releases

Packages

hmdprs / idiomatic-python

Join GitHub today

Clone with HTTPS

Launching GitHub Desktop

Launching GitHub Desktop

Launching Xcode

Launching Visual Studio

Latest commit

Git stats

Files

README.md

Transforming Code into Beautiful, Idiomatic Python

Looping over a Range of Numbers

Looping over a Collection

Looping Backwards

Looping over a Collection and Indices

Looping over Two Collections

Looping in Sorted Order

Custom Sort Order

Call a Function until a Sentinel Value

Distinguishing Multiple Exit Points in Loops

Looping over Dictionary Keys

Looping over Dictionary Keys and Values

Construct a Dictionary from Pairs

Counting with Dictionaries

Grouping with Dictionaries

Is a Dictionary popitem() atomic?

Linking Dictionaries

Improving Clarity

Clarify Function Calls with Keyword Arguments

Clarify Multiple Return Values with namedtuples

Unpacking Sequences

Updating Multiple State Variables

Simultaneous State Updates

Efficiency

Concatenating Strings

Updating Sequences

Decorators and Context Managers

Using Decorators to Factor-out Administrative Logic

Factor-out Temporary Contexts

How to Open and Close Files

How to use Locks

Factor-out Temporary Contexts

Factor-out Temporary Contexts

Concise Expressive One-Liners

List Comprehensions and Generator Expressions

About

Topics

Resources

Releases

Packages 0

Is a Dictionary `popitem()` atomic?

Clarify Multiple Return Values with `namedtuples`

Packages