Transforming Code into Beautiful, Idiomatic Python
11 Dec 2019 | 19 minutes to read
Introduction
I came across this video of a talk Raymond Hettinger gave at PyCon US 2013. Even though the talk is six years old, I still learned new things from it. In this talk, Raymond focuses on three aspects:
- Replace traditional index manipulation with Python’s core looping idioms
- Learn advanced techniques with for-else clauses and the two-argument form of iter()
- Improve your craftsmanship and aim for clean, fast, idiomatic Python code
Note: the original presentation used Python 2, but the examples here are written in Python 3.
List
Looping over a range of numbers
for i in [0, 1, 2, 3, 4, 5]:
    print(i**2)
Better
for i in range(6):
    print(i**2)
Both snippets print the squares of the numbers from 0 to 5 (inclusive). However, the latter is better: range() produces its values on demand instead of building the whole list in memory first.
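As a rough illustration of the memory difference, the sketch below compares object sizes with sys.getsizeof(). Exact byte counts are CPython implementation details, so only the comparisons matter here.

```python
import sys

# A range object stores only start, stop and step, so its size
# does not grow with the number of values it represents.
small = range(6)
large = range(1_000_000)
print(sys.getsizeof(small) == sys.getsizeof(large))  # True

# A list, by contrast, holds a reference to every element up front.
as_list = list(large)
print(sys.getsizeof(as_list) > sys.getsizeof(large))  # True

# Both produce the same values when iterated.
print(list(small) == [0, 1, 2, 3, 4, 5])  # True
```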
Looping over a collection
colors = ['red', 'green', 'blue', 'yellow']
for i in range(len(colors)):
    print(colors[i])
Better
colors = ['red', 'green', 'blue', 'yellow']
for color in colors:
    print(color)
If Python is not your first programming language, you probably learned to loop through an array using indices; in Python, however, we can loop over the elements directly. This is closer to a foreach loop than to a C-style for loop.
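Under the hood, a for loop asks the object for an iterator and calls next() on it until StopIteration is raised. The sketch below spells that out by hand for the colors list; it approximates, rather than reproduces, what the interpreter does.

```python
colors = ['red', 'green', 'blue', 'yellow']

# Roughly what `for color in colors: ...` does behind the scenes.
it = iter(colors)            # ask the list for an iterator
seen = []
while True:
    try:
        color = next(it)     # fetch the next element
    except StopIteration:    # raised once the iterator is exhausted
        break
    seen.append(color)

print(seen)  # ['red', 'green', 'blue', 'yellow']
```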
Looping backwards
Let’s say we want to print the colors above in reverse order.
for i in range(len(colors)-1, -1, -1):
    print(colors[i])
Better
for color in reversed(colors):
    print(color)
With the built-in reversed() function, we can loop backwards without any index arithmetic.
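reversed() works on anything that supports len() and indexing (or defines __reversed__), and it returns a lazy iterator rather than a reversed copy. A quick sketch of what it accepts and what it rejects:

```python
colors = ['red', 'green', 'blue', 'yellow']

# reversed() walks the list from the end; no copy of the list is made.
backwards = list(reversed(colors))
print(backwards)  # ['yellow', 'blue', 'green', 'red']

# It also works on other sequences such as range objects and strings.
print(list(reversed(range(3))))  # [2, 1, 0]
print(''.join(reversed('abc')))  # cba

# A plain generator has no length or indexing, so it cannot be reversed.
error = None
try:
    reversed(c for c in colors)
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```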
Looping over a collection and indices
What if we are interested in both the item and its index?
for i in range(len(colors)):
    print(f"{i} --> {colors[i]}")
Better
for i, color in enumerate(colors):
    print(f"{i} --> {color}")
Using enumerate(), we can access both the item and its index at the same time.
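enumerate() also takes an optional start argument, which is handy when the numbering should begin at 1 instead of 0, for example in a human-facing list:

```python
colors = ['red', 'green', 'blue', 'yellow']

# start=1 shifts the counter; the items themselves are unchanged.
numbered = list(enumerate(colors, start=1))
print(numbered)  # [(1, 'red'), (2, 'green'), (3, 'blue'), (4, 'yellow')]

for position, color in enumerate(colors, start=1):
    print(f"{position}. {color}")
```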
Looping over two collections
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']
# use the shorter of the two lengths so that neither index
# goes out of range
n = min(len(names), len(colors))
for i in range(n):
    print(names[i], '-->', colors[i])
Better
for name, color in zip(names, colors):
    print(name, '-->', color)
With zip(), we can loop over two collections in parallel. It handles a length mismatch by stopping as soon as the shorter collection is exhausted.
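If silently dropping the leftover items is not what you want, itertools.zip_longest() keeps going until the longest input is exhausted and fills the gaps with fillvalue. A short comparison using the same names and colors lists:

```python
from itertools import zip_longest

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

# zip() stops at the shorter list, so 'yellow' is dropped.
pairs = list(zip(names, colors))
print(pairs)  # [('raymond', 'red'), ('rachel', 'green'), ('matthew', 'blue')]

# zip_longest() pads the shorter list instead.
padded = list(zip_longest(names, colors, fillvalue='?'))
print(padded[-1])  # ('?', 'yellow')
```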
Looping in sorted order
Using sorted(), we can loop over a list in sorted order; it returns a new list and leaves the original unchanged.
for color in sorted(colors):
    print(color)
Sorting in reverse order
Passing the optional argument reverse=True sorts in descending order instead.
for color in sorted(colors, reverse=True):
    print(color)
Custom sort order
We can also customize the ordering with the optional key argument: a function applied to each element, whose result is used for the comparison. By default, the elements themselves are compared.
print(sorted(colors, key=len))
# ['red', 'blue', 'green', 'yellow']
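Because key is just a function, it can express other orderings too. The sketch below uses str.lower for a case-insensitive sort and a tuple key to break ties between equal lengths alphabetically; the word lists are made up for illustration.

```python
words = ['banana', 'apple', 'Cherry']
print(sorted(words))                 # ['Cherry', 'apple', 'banana'] (uppercase sorts first)
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Cherry']

colors = ['red', 'green', 'blue', 'yellow', 'pink']
# Sort by length first, then alphabetically among equal lengths.
by_length_then_name = sorted(colors, key=lambda c: (len(c), c))
print(by_length_then_name)  # ['red', 'blue', 'pink', 'green', 'yellow']
```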
Iterator
Call a function until a sentinel value
A sentinel value (or a flag value) is a special value in the context of an algorithm which uses its presence as a condition of termination, typically in a loop or recursive algorithm.
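Let’s consider reading a file in fixed-size blocks of 32 characters, stopping when f.read() returns an empty string, which signals the end of the file.
blocks = []
while True:
    block = f.read(32)  # f is assumed to be a file already opened in text mode
    if block == '':
        break
    blocks.append(block)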
Better
from functools import partial

blocks = []
for block in iter(partial(f.read, 32), ''):
    blocks.append(block)
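In its two-argument form, iter(callable, sentinel) returns an iterator that calls callable with no arguments on each step and stops as soon as the returned value equals sentinel. Because the callable must take no arguments, functools.partial is used to fix the block size of 32 for f.read.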
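To try this without a real file, the sketch below substitutes an in-memory io.StringIO holding 70 characters; iter() splits it into blocks of 32, 32 and 6 characters and stops when read() returns ''.

```python
import io
from functools import partial

# In-memory stand-in for an open text file.
f = io.StringIO('a' * 70)

# Call f.read(32) repeatedly until it returns the sentinel ''.
blocks = list(iter(partial(f.read, 32), ''))
print([len(b) for b in blocks])  # [32, 32, 6]
```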
Distinguishing multiple exit points in loops
Consider finding the index of a target value in a sequence. A common approach uses a flag variable, found in the example below, to record whether the loop exited early.
def find(seq, target):
    found = False
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i
Better
def find(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        return -1
    return i
Here, else works as a “no break” clause: if the loop runs to completion without hitting break, the else block is executed. This ties the fallback directly to the loop instead of relying on a separate flag variable.
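The same pattern fits any search where “nothing found” needs its own branch. As a sketch (an example of our own, not from the talk), here is a primality check where the else clause runs only when no divisor was found; math.isqrt requires Python 3.8+.

```python
import math

def is_prime(n):
    """Return True if n is prime, using for-else to detect 'no divisor found'."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            break          # found a factor, so the else clause is skipped
    else:
        return True        # the loop finished without break: no factor exists
    return False

print([n for n in range(20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```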
Dictionary
Dictionary Skills
- Mastering dictionaries is a fundamental Python skill
- They are fundamental for expressing relationships, linking, counting, and grouping.
Looping over dictionary keys
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}
for k in d:
print(k)
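By default, for ... in over a dictionary loops over its keys. We can also call keys() explicitly to get a view of the keys. If we need to delete entries while looping, however, we must iterate over a copy of the keys such as list(d): in Python 3, d.keys() is a live view, and changing the dictionary’s size during iteration raises RuntimeError. Alternatively, a dictionary comprehension builds a new dictionary containing only the entries we want to keep.
for k in list(d):
    if k.startswith('r'):
        del d[k]
d = {k: d[k] for k in d if not k.startswith('r')}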
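The difference between iterating over the live view and over a snapshot is easy to demonstrate. A small sketch with a hypothetical make_dict() helper that rebuilds the example dictionary:

```python
def make_dict():
    """Hypothetical helper: a fresh copy of the example dictionary."""
    return {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# Deleting while iterating over the live keys view fails in Python 3.
d = make_dict()
error = None
try:
    for k in d.keys():
        if k.startswith('r'):
            del d[k]
except RuntimeError as exc:
    error = exc
print(type(error).__name__)  # RuntimeError

# Iterating over a snapshot of the keys is safe.
d = make_dict()
for k in list(d):
    if k.startswith('r'):
        del d[k]
print(d)  # {'matthew': 'blue'}
```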
Looping over a dictionary’s keys and values
for k in d:
    print(k, '-->', d[k])
Better
for k, v in d.items():
    print(k, '-->', v)
The latter is better: the first version has to re-hash every key to look up its value, whereas items() yields each key-value pair directly.
Construct a dictionary from pairs
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']
d = dict(zip(names, colors))
# d == {'raymond': 'red', 'rachel': 'green', 'matthew': 'blue'}
Counting with dictionaries
colors = ['red', 'green', 'red', 'blue', 'green', 'red']
d = {}
for color in colors:
    if color not in d:
        d[color] = 0
    d[color] += 1
Advanced
from collections import defaultdict

d = defaultdict(int)
for color in colors:
    d[color] += 1
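The standard library also offers collections.Counter, a dict subclass built specifically for counting. It is a different tool from the defaultdict approach above, but it is worth knowing:

```python
from collections import Counter

colors = ['red', 'green', 'red', 'blue', 'green', 'red']
counts = Counter(colors)

print(counts)                 # Counter({'red': 3, 'green': 2, 'blue': 1})
print(counts.most_common(1))  # [('red', 3)]
print(counts['purple'])       # 0 (missing keys count as zero)
```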
Similarly, we can use defaultdict with list values to group items, here by name length.
names = ['raymond', 'rachel', 'matthew', 'roger',
'betty', 'melissa', 'judith', 'charlie']
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)
Advanced
d = defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)
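Without importing anything, dict.setdefault() gives a similar one-liner for grouping. A sketch of the same grouping, showing the resulting dictionary:

```python
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

d = {}
for name in names:
    # setdefault inserts [] only when the key is missing, then returns the list
    d.setdefault(len(name), []).append(name)

print(d)
# {7: ['raymond', 'matthew', 'melissa', 'charlie'], 6: ['rachel', 'judith'], 5: ['roger', 'betty']}
```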
Is a dictionary popitem() atomic?
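What does atomic mean? An operation is atomic when it happens as one indivisible step, so no other thread can observe it half-done. In CPython, popitem() is atomic, so several threads can consume items from a shared dictionary without an explicit lock. Since Python 3.7, popitem() removes items in last-in, first-out order.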
d = {'matthew': 'blue',
'rachel': 'green',
'raymond': 'red'}
while d:
    key, value = d.popitem()
    print(key, '-->', value)
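A sketch of several threads draining one shared dictionary with popitem(), assuming CPython’s atomic popitem(). A passing run illustrates the idea but cannot prove atomicity. Note that each worker catches KeyError instead of testing while shared: first, because another thread may empty the dictionary between the check and the pop.

```python
import threading

# A shared dictionary that several worker threads drain concurrently.
shared = {i: i * i for i in range(10_000)}

def drain(out):
    """Pop items until the dictionary is empty, recording each key taken."""
    while True:
        try:
            key, _value = shared.popitem()
        except KeyError:
            # The dictionary is empty (possibly emptied by another thread).
            return
        out.append(key)

results = [[] for _ in range(4)]
workers = [threading.Thread(target=drain, args=(bucket,)) for bucket in results]
for w in workers:
    w.start()
for w in workers:
    w.join()

taken = sorted(key for bucket in results for key in bucket)
print(taken == list(range(10_000)))  # True: every key popped exactly once
```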
Linking dictionaries
import argparse
import os

defaults = {'color': 'red', 'user': 'guest'}
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args([])  # empty list for demonstration; parse_args() reads sys.argv
command_line_args = {k: v for k, v in vars(namespace).items() if v}

# The common approach: start with the defaults, override them with
# environment variables, and finally with command-line arguments.
d = defaults.copy()
d.update(os.environ)
d.update(command_line_args)
Better
from collections import ChainMap

d = ChainMap(command_line_args, os.environ, defaults)
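ChainMap searches its mappings in order and returns the first match, without copying anything. A sketch with plain dictionaries standing in (hypothetically) for os.environ and the parsed command-line arguments:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
env = {'user': 'alice'}   # hypothetical stand-in for os.environ
cli = {'color': 'blue'}   # hypothetical parsed command-line arguments

d = ChainMap(cli, env, defaults)
print(d['color'])  # blue  (found in cli first)
print(d['user'])   # alice (cli lacks it, env has it)

# No copying: a change to an underlying dict is visible immediately.
defaults['theme'] = 'dark'
print(d['theme'])  # dark
```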
Improving Clarity
- Positional arguments and indices are nice
- Keywords and names are better
- The first way is convenient for the computer
- The second corresponds to how humans think
Clarify function calls with keyword arguments
twitter_search('@obama', False, 20, True)
Better
twitter_search('@obama', retweets=False, numtweets=20, popular=True)
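When writing such a function yourself, a bare * in the signature makes the following parameters keyword-only, so callers cannot fall back to the unclear positional form. A sketch with a hypothetical stub of twitter_search (the real function is not shown in the talk):

```python
def twitter_search(query, *, retweets=False, numtweets=10, popular=False):
    """Hypothetical stub: just echoes the options it received."""
    return {'query': query, 'retweets': retweets,
            'numtweets': numtweets, 'popular': popular}

result = twitter_search('@obama', retweets=False, numtweets=20, popular=True)

# Passing the flags positionally is now rejected.
rejected = False
try:
    twitter_search('@obama', False, 20, True)
except TypeError:
    rejected = True
print(rejected)  # True
```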
Clarify multiple return values with named tuples
doctest.testmod() # (0, 4)
Better
from collections import namedtuple

# roughly how doctest defines its result type
TestResults = namedtuple('TestResults', ['failed', 'attempted'])
doctest.testmod()  # TestResults(failed=0, attempted=4)
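The same trick works for your own functions. A sketch with a hypothetical min_max() helper that returns a named tuple, which still unpacks like a plain tuple:

```python
from collections import namedtuple

# Hypothetical result type for a hypothetical helper.
Stats = namedtuple('Stats', ['minimum', 'maximum'])

def min_max(values):
    """Return the smallest and largest value as a named tuple."""
    return Stats(min(values), max(values))

result = min_max([3, 1, 4, 1, 5])
print(result)          # Stats(minimum=1, maximum=5)
print(result.maximum)  # 5
low, high = result     # still unpacks like a plain tuple
```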
Unpacking sequences
p = ['Raymond', 'Hettinger', 0x30, 'python@example.com']
fname = p[0]
lname = p[1]
age = p[2]
email = p[3]
Better
fname, lname, age, email = p
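Python 3 also supports extended unpacking with a starred name, which collects whatever is left over. A mismatched count without a star raises ValueError. A short sketch using the same list:

```python
p = ['Raymond', 'Hettinger', 0x30, 'python@example.com']

fname, lname, *rest = p
print(rest)   # [48, 'python@example.com']

*_, email = p  # _ conventionally marks values we don't need
print(email)  # python@example.com

mismatch = False
try:
    fname, lname = p  # 4 values cannot go into 2 names
except ValueError:
    mismatch = True
print(mismatch)  # True
```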
Updating multiple state variables
def fibonacci(n):
    x = 0
    y = 1
    for i in range(n):
        print(x)
        t = y
        y = x + y
        x = t
Better
def fibonacci(n):
    x, y = 0, 1
    for i in range(n):
        print(x)
        x, y = y, x + y
Tuple assignment evaluates the entire right-hand side before assigning anything, so the new x and y are both computed from the old values of x and y.
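To see why the order matters, the sketch below returns the values instead of printing them and compares the simultaneous update with a deliberately buggy version that assigns x before computing the new y:

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers using simultaneous assignment."""
    x, y = 0, 1
    result = []
    for _ in range(n):
        result.append(x)
        x, y = y, x + y  # right-hand side uses the old x and y
    return result

def fibonacci_out_of_order(n):
    """Deliberately buggy: updates x before computing the new y."""
    x, y = 0, 1
    result = []
    for _ in range(n):
        result.append(x)
        x = y
        y = x + y  # x has already been overwritten here
    return result

print(fibonacci(10))              # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fibonacci_out_of_order(6))  # [0, 1, 2, 4, 8, 16]
```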
Tuple packing and unpacking
- Don’t underestimate the advantages of updating state variables at the same time
- It eliminates an entire class of errors due to out-of-order updates
- It allows high level thinking: “chunking”
Simultaneous state updates
tmp_x = x + dx * t
tmp_y = y + dy * t
tmp_dx = influence(m, x, y, dx, dy, partial='x')
tmp_dy = influence(m, x, y, dx, dy, partial='y')
x = tmp_x
y = tmp_y
dx = tmp_dx
dy = tmp_dy
Better
x, y, dx, dy = (x + dx * t,
                y + dy * t,
                influence(m, x, y, dx, dy, partial='x'),
                influence(m, x, y, dx, dy, partial='y'))
Efficiency
- A fundamental rule of optimization
- Don’t cause data to move around unnecessarily
- It takes only a little care to get linear behaviour instead of $O(n^2)$ behaviour
Concatenating strings
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']
s = names[0]
for name in names[1:]:
    s += ', ' + name  # each += may copy the whole string built so far
print(s)
Better
print(', '.join(names))
Updating sequences
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']
# each of these shifts every remaining element, which is O(n) on a list
del names[0]
names.pop(0)
names.insert(0, 'mark')
Better
from collections import deque

names = deque(['raymond', 'rachel', 'matthew', 'roger',
               'betty', 'melissa', 'judith', 'charlie'])
# popleft() and appendleft() run in O(1) on a deque
del names[0]
names.popleft()
names.appendleft('mark')
Decorators and Context Managers
- Helps separate business logic from administrative logic
- Clean, beautiful tools for factoring code and improving code reuse
- Good naming is essential
- Remember the Spider-Man rule:
With great power comes great responsibility!
Using decorators to factor-out administrative logic
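import urllib.request

def web_lookup(url, saved={}):
    # the mutable default dict deliberately persists between calls as a cache
    if url in saved:
        return saved[url]
    page = urllib.request.urlopen(url).read()
    saved[url] = page
    return page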
Better
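from functools import lru_cache
import urllib.request

@lru_cache  # bare @lru_cache needs Python 3.8+; use @lru_cache() on older versions
def web_lookup(url):
    return urllib.request.urlopen(url).read()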
(https://orbifold.xyz/local-lru.html)
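To see what such a decorator does, here is a minimal hand-written sketch. The cache() decorator and slow_square() function are hypothetical; unlike functools.lru_cache, this version is unbounded and only handles hashable positional arguments.

```python
from functools import wraps

def cache(func):
    """Hypothetical memoizing decorator: unbounded, positional hashable args only."""
    saved = {}

    @wraps(func)
    def wrapper(*args):
        if args not in saved:
            saved[args] = func(*args)  # compute once, remember the result
        return saved[args]
    return wrapper

calls = []

@cache
def slow_square(n):
    """Stand-in for an expensive lookup such as fetching a web page."""
    calls.append(n)
    return n * n

print(slow_square(4), slow_square(4), slow_square(5))  # 16 16 25
print(calls)  # [4, 5]: the second call with 4 was served from the cache
```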
Factor-out temporary contexts
from decimal import Decimal, getcontext, setcontext

old_context = getcontext().copy()
getcontext().prec = 50
print(Decimal(355)/Decimal(113))
setcontext(old_context)
Better
from decimal import Context, Decimal, localcontext

with localcontext(Context(prec=50)):
    print(Decimal(355)/Decimal(113))
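A quick check that the temporary precision really stays inside the with block, assuming nothing else has changed the default context (precision 28):

```python
from decimal import Context, Decimal, getcontext, localcontext

with localcontext(Context(prec=50)):
    precise = Decimal(355) / Decimal(113)  # computed with 50 significant digits

regular = Decimal(355) / Decimal(113)      # back to the default context

print(len(precise.as_tuple().digits))  # 50
print(len(regular.as_tuple().digits))  # 28 (the default precision)
print(getcontext().prec)               # 28: the old context was restored
```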
How to open and close files
f = open('data.txt')
try:
    data = f.read()
finally:
    f.close()
Better
with open('data.txt') as f:
    data = f.read()
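The with statement closes the file even when the block raises. A sketch that writes a throwaway file to a temporary directory (so no existing data.txt is assumed) and then simulates a failure while processing it:

```python
import os
import tempfile

# Create a throwaway file to read back (the path is generated, not assumed).
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as f:
    f.write('hello')

try:
    with open(path) as f:
        data = f.read()
        raise ValueError('simulated failure while processing the data')
except ValueError:
    pass

print(data)      # hello
print(f.closed)  # True: the with statement closed the file despite the exception
```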
How to use locks
import threading

lock = threading.Lock()
lock.acquire()
try:
    print('Critical section 1')
    print('Critical section 2')
finally:
    lock.release()
Better
with lock:
    print('Critical section 1')
    print('Critical section 2')
Factor-out temporary contexts
import os

try:
    os.remove('somefile.tmp')
except FileNotFoundError:
    pass
Better
from contextlib import suppress
import os

with suppress(FileNotFoundError):
    os.remove('somefile.tmp')
import sys

with open('help.txt', 'w') as f:
    oldstdout = sys.stdout
    sys.stdout = f
    try:
        help(pow)
    finally:
        sys.stdout = oldstdout
Better
from contextlib import redirect_stdout

with open('help.txt', 'w') as f:
    with redirect_stdout(f):
        help(pow)
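Writing your own context manager is straightforward with contextlib.contextmanager. The sketch below defines a hypothetical ignored() helper that behaves much like suppress(), and also shows redirect_stdout() capturing output into an in-memory io.StringIO instead of a file:

```python
import io
from contextlib import contextmanager, redirect_stdout

@contextmanager
def ignored(*exceptions):
    """Hypothetical helper: suppress the given exception types inside a with block."""
    try:
        yield
    except exceptions:
        pass

with ignored(ZeroDivisionError):
    1 / 0       # raised, then silently swallowed by ignored()
reached = True  # execution continues after the with block

# redirect_stdout also accepts in-memory streams, which makes output easy to capture.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print('captured')
print(repr(buffer.getvalue()))  # 'captured\n'
```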
Concise expressive one-liners
Two conflicting rules:
- Don’t put too much on one line
- Don’t break atoms of thought into subatomic particles
Raymond’s rule:
- One logical line of code equals one sentence in English.
List comprehensions and Generator expressions
result = []
for i in range(10):
    s = i ** 2
    result.append(s)
print(sum(result))
Better
print(sum([i**2 for i in range(10)]))
Best
print(sum(i**2 for i in range(10)))
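The generator expression wins because sum() consumes the values one at a time, so no intermediate list is ever built. A sketch comparing the two (sizes are CPython details, so only the comparison matters). Note that a generator can be consumed only once:

```python
import sys

squares_list = [i**2 for i in range(10_000)]
squares_gen = (i**2 for i in range(10_000))

# The generator object stays small no matter how many values it will yield.
print(sys.getsizeof(squares_gen) < sys.getsizeof(squares_list))  # True

# Both give the same total, but a generator can only be consumed once.
print(sum(squares_gen) == sum(squares_list))  # True
print(sum(squares_gen))                       # 0: already exhausted

print(sum(i**2 for i in range(10)))  # 285
```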
Reference
- https://github.com/JeffPaine/beautiful_idiomatic_python
- https://www.youtube.com/watch?feature=player_embedded&v=OSGv2VnC0go