Transforming Code into Beautiful, Idiomatic Python
11 Dec 2019 | 19 minutes to read
Introduction
I came across this video of a talk that Raymond Hettinger presented at PyCon US 2013. Even though the talk is six years old, I still learned new things from it. In this talk, Raymond focuses on these three aspects:
- Replace traditional index manipulation with Python’s core looping idioms
- Learn advanced techniques with for-else clauses and the two-argument form of iter()
- Improve your craftsmanship and aim for clean, fast, idiomatic Python code
Note: the original talk used Python 2, but the examples here are the Python 3 versions.
List
Looping over a range of numbers
for i in [0, 1, 2, 3, 4, 5]:
    print(i**2)
Better
for i in range(6):
    print(i**2)
Both snippets print the squares of the numbers from 0 to 5 (inclusive). However, the latter scales better: range() produces its values on demand instead of building the whole list in memory.
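A rough way to see the memory difference, assuming CPython. Note that sys.getsizeof reports only the size of the container itself, not of the integers it refers to, so this is a sketch rather than a precise measurement:

```python
import sys

n = 1_000_000

# A list stores a reference to every element up front.
as_list = list(range(n))

# A range object stores only start, stop, and step, and computes values on demand.
as_range = range(n)

list_size = sys.getsizeof(as_list)
range_size = sys.getsizeof(as_range)

print(range_size < list_size)
print(as_range[10], len(as_range))  # a range still supports indexing and len()
```

The size of the range object stays the same no matter how large n gets, while the list grows with n.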
Looping over a collection
colors = ['red', 'green', 'blue', 'yellow']
for i in range(len(colors)):
    print(colors[i])
Better
colors = ['red', 'green', 'blue', 'yellow']
for color in colors:
    print(color)
If Python is not your first programming language, you probably learned to use indices to loop through an array; however, in Python, we can simply loop through by element. This is more of a foreach loop than for loop.
Looping backwards
Let’s say we want to print the colors above in reverse order.
for i in range(len(colors)-1, -1, -1):
    print(colors[i])
Better
for color in reversed(colors):
    print(color)
With the built-in reversed() function, we can loop backwards without any index arithmetic.
Looping over a collection and indices
What if we are interested in both the item and its index?
for i in range(len(colors)):
    print(f"{i} --> {colors[i]}")
Better
for i, color in enumerate(colors):
    print(f"{i} --> {color}")
Using enumerate(), we can access both item and index at the same time.
Looping over two collections
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']
# calculate minimum length between two arrays and loop through only this amount
# to prevent index out of range error
n = min(len(names), len(colors))
for i in range(n):
    print(names[i], '-->', colors[i])
Better
for name, color in zip(names, colors):
    print(name, '-->', color)
With zip(), we can loop over two collections at the same time, and it handles a length mismatch by stopping as soon as the shorter collection is exhausted.
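The truncating behaviour can be seen in a small sketch. When the leftover items matter, itertools.zip_longest from the standard library is one alternative; the 'unknown' fill value here is just an illustrative choice:

```python
from itertools import zip_longest

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

# zip() stops as soon as the shorter input is exhausted, so 'yellow' is dropped.
pairs = list(zip(names, colors))

# zip_longest() keeps going and pads the shorter input with fillvalue.
padded = list(zip_longest(names, colors, fillvalue='unknown'))

print(pairs)
print(padded)
```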
Looping in sorted order
Using sorted(), we can loop over a list in sorted order without modifying the original list.
for color in sorted(colors):
    print(color)
Sorting in reverse order
By passing the optional argument reverse=True, the sort is done in descending order.
for color in sorted(colors, reverse=True):
    print(color)
Custom sort order
We can also customize the comparison by providing the optional key argument: a function that is called on each element to produce the value to sort by. By default, the elements are compared directly.
print(sorted(colors, key=len))
# ['red', 'blue', 'green', 'yellow']
Iterator
Call a function until a sentinel value
A sentinel value (or a flag value) is a special value in the context of an algorithm which uses its presence as a condition of termination, typically in a loop or recursive algorithm.
Let’s consider a case where we want to read a file in 32-character blocks until read() returns an empty string, which signals the end of the file.
blocks = []
while True:
    block = f.read(32)
    if block == '':
        break
    blocks.append(block)
Better
from functools import partial
blocks = []
for block in iter(partial(f.read, 32), ''):
    blocks.append(block)
When iter() is given a second argument (the sentinel), the first argument must be a callable that takes no arguments. The resulting iterator calls it repeatedly and stops as soon as the returned value equals the sentinel. Here partial(f.read, 32) turns f.read into such a zero-argument callable.
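A self-contained sketch of the sentinel form, with io.StringIO standing in for an open text file. The 70-character content is made up for illustration:

```python
from functools import partial
from io import StringIO

# StringIO stands in for an open text file; the contents are made up for this sketch.
f = StringIO('a' * 70)

blocks = []
# iter(callable, sentinel) calls f.read(32) repeatedly and stops when it returns ''.
for block in iter(partial(f.read, 32), ''):
    blocks.append(block)

print([len(block) for block in blocks])  # [32, 32, 6]
```

The sentinel value itself is never added to blocks, so the final empty read does not appear in the result.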
Distinguishing multiple exit points in loops
Let’s consider the problem of finding a target value in a sequence. One way is to use an external flag variable, like found in the example below.
def find(seq, target):
    found = False
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i
Better
def find(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        return -1
    return i
Here else works like a “no break” clause: if the loop finishes without hitting break, the block under else is executed. This keeps the not-found handling tied to the loop itself instead of relying on a separate flag variable.
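A short usage sketch of the for-else version, including the empty-sequence case, where the loop body never runs and the else block still executes:

```python
def find(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        # Runs only when the loop finished without hitting break.
        return -1
    return i

colors = ['red', 'green', 'blue', 'yellow']
print(find(colors, 'blue'))    # 2
print(find(colors, 'purple'))  # -1
print(find([], 'red'))         # -1
```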
Dictionary
Dictionary Skills
- Mastering dictionaries is a fundamental Python skill
- They are fundamental for expressing relationships, linking, counting, and grouping.
Looping over dictionary keys
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}
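for k in d:
    print(k)
By default, for ... in on a dictionary loops over its keys. We can also call keys() to get the keys explicitly. In Python 3, keys() returns a live view, so to delete entries while looping we iterate over a copy, list(d.keys()); otherwise Python raises a RuntimeError. Finally, a dictionary comprehension can build the filtered dictionary directly.
for k in list(d.keys()):
    if k.startswith('r'):
        del d[k]
d = {k: d[k] for k in d if not k.startswith('r')}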
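In Python 3, deleting entries while looping over the dictionary itself (or over its live keys() view) fails. A minimal sketch of the failure and of the snapshot approach:

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# Deleting while iterating over the live dictionary fails in Python 3.
error = None
try:
    for k in d:
        if k.startswith('r'):
            del d[k]
except RuntimeError as exc:
    error = str(exc)

print(error)

# Iterating over a snapshot of the keys is safe.
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}
for k in list(d):
    if k.startswith('r'):
        del d[k]

print(d)  # {'matthew': 'blue'}
```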
Looping over dictionary keys and values
for k in d:
    print(k, '-->', d[k])
Better
for k, v in d.items():
    print(k, '-->', v)
The latter performs better because the first method has to re-hash every key and do a lookup on each iteration.
Construct a dictionary from pairs
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']
d = dict(zip(names, colors))
# d == {'raymond': 'red', 'rachel': 'green', 'matthew': 'blue'}
Counting with dictionaries
colors = ['red', 'green', 'red', 'blue', 'green', 'red']
d = {}
for color in colors:
    if color not in d:
        d[color] = 0
    d[color] += 1
Advanced
from collections import defaultdict
d = defaultdict(int)
for color in colors:
    d[color] += 1
Similarly, we can use defaultdict with list values to group items.
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)
Advanced
d = defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)
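For the counting case specifically, the standard library also offers collections.Counter. It is not part of the examples above and is shown here only as an alternative sketch:

```python
from collections import Counter

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

# Counter tallies the items of any iterable in one call.
counts = Counter(colors)

print(counts['red'])          # 3
print(counts['purple'])       # 0: missing keys count as zero instead of raising KeyError
print(counts.most_common(1))  # [('red', 3)]
```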
Is a dictionary popitem() atomic?
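An atomic operation completes as a single indivisible step, so another thread can never observe it half-finished. In the talk, Raymond notes that popitem() is atomic, so it can be used between threads without wrapping it in a lock.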
d = {'matthew': 'blue',
     'rachel': 'green',
     'raymond': 'red'}
while d:
    key, value = d.popitem()
    print(key, '-->', value)
Linking dictionaries
import os, argparse
defaults = {'color': 'red', 'user': 'guest'}
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args([])
command_line_args = {k: v for k, v in vars(namespace).items() if v}
"""
The common approach here allows you to use defaults at first, then override them
with environments and then finally with command line arguments.
"""
d = defaults.copy()
d.update(os.environ)
d.update(command_line_args)
Better
from collections import ChainMap
d = ChainMap(command_line_args, os.environ, defaults)
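A small sketch of how ChainMap resolves lookups. The dictionaries below are hypothetical stand-ins for the parsed arguments and os.environ:

```python
from collections import ChainMap

# Hypothetical values standing in for parsed arguments and os.environ.
command_line_args = {'user': 'alice'}
environment = {'user': 'bob', 'color': 'green'}
defaults = {'color': 'red', 'user': 'guest'}

# Lookups search the mappings left to right and return the first match.
settings = ChainMap(command_line_args, environment, defaults)

print(settings['user'])   # alice: found in command_line_args
print(settings['color'])  # green: not on the command line, so the environment wins

# No data is copied: changes to an underlying dict are visible through the chain.
defaults['shell'] = 'bash'
print(settings['shell'])  # bash
```

Unlike the copy-and-update approach, nothing is copied up front, and the priority order reads directly from the argument order.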
Improving Clarity
- Positional arguments and indices are nice
- Keywords and names are better
- The first way is convenient for the computer
- The second corresponds to how humans think
Clarify function calls with keyword arguments
twitter_search('@obama', False, 20, True)
Better
twitter_search('@obama', retweets=False, numtweets=20, popular=True)
Clarify multiple return values with named tuples
doctest.testmod() # (0, 4)
Better
from collections import namedtuple
TestResults = namedtuple('TestResults', ['failed', 'attempted'])
doctest.testmod() # TestResults(failed=0, attempted=4)
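A minimal sketch of what the named tuple buys you: readable output and access by name, while still behaving like a plain tuple. The values are made up to match the example above:

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])

result = TestResults(failed=0, attempted=4)

print(result)                # TestResults(failed=0, attempted=4)
print(result.attempted)      # access by name
failed, attempted = result   # still unpacks like a plain tuple
print(result == (0, 4))      # and still compares equal to one
```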
Unpacking sequences
p = ['Raymond', 'Hettinger', 0x30, 'python@example.com']
fname = p[0]
lname = p[1]
age = p[2]
email = p[3]
Better
fname, lname, age, email = p
Updating multiple state variables
def fibonacci(n):
    x = 0
    y = 1
    for i in range(n):
        print(x)
        t = y
        y = x + y
        x = t
Better
def fibonacci(n):
    x, y = 0, 1
    for i in range(n):
        print(x)
        x, y = y, x + y
The right-hand side is evaluated in full before any assignment happens, so the new x and y are both computed from the old x and y.
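To make the ordering issue concrete, here is a sketch that returns lists instead of printing, alongside a deliberately broken variant (broken_fibonacci is a hypothetical name used only for this illustration):

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers as a list."""
    x, y = 0, 1
    values = []
    for _ in range(n):
        values.append(x)
        # The right-hand side is evaluated in full, using the old x and y,
        # before either name is rebound.
        x, y = y, x + y
    return values


def broken_fibonacci(n):
    """Same loop with sequential updates and no temporary variable."""
    x, y = 0, 1
    values = []
    for _ in range(n):
        values.append(x)
        x = y
        y = x + y  # bug: x has already been overwritten
    return values


print(fibonacci(8))         # [0, 1, 1, 2, 3, 5, 8, 13]
print(broken_fibonacci(8))  # [0, 1, 2, 4, 8, 16, 32, 64]
```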
Tuple packing and unpacking
- Don’t under-estimate the advantages of updating state variables at the same time
- It eliminates an entire class of errors due to out-of-order updates
- It allows high level thinking: “chunking”
Simultaneous state updates
tmp_x = x + dx * t
tmp_y = y + dy * t
tmp_dx = influence(m, x, y, dx, dy, partial='x')
tmp_dy = influence(m, x, y, dx, dy, partial='y')
x = tmp_x
y = tmp_y
dx = tmp_dx
dy = tmp_dy
Better
x, y, dx, dy = (x + dx * t,
                y + dy * t,
                influence(m, x, y, dx, dy, partial='x'),
                influence(m, x, y, dx, dy, partial='y'))
Efficiency
- A fundamental rule of optimization
- Don’t cause data to move around unnecessarily
- It takes only a little care to avoid $O(n^2)$ behaviour where linear behaviour is possible
Concatenating strings
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']
s = names[0]
for name in names[1:]:
    s += ', ' + name
print(s)
Better
print(', '.join(names))
Updating sequence
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']
del names[0]
names.pop(0)
names.insert(0, 'mark')
Better
from collections import deque
names = deque(['raymond', 'rachel', 'matthew', 'roger',
               'betty', 'melissa', 'judith', 'charlie'])
del names[0]
names.popleft()
names.appendleft('mark')
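The reason deque fits better here: list.pop(0) and list.insert(0, x) shift every remaining element, which is O(n), while a deque adds and removes items at either end in O(1). A short sketch of the deque operations on a shortened list:

```python
from collections import deque

names = deque(['raymond', 'rachel', 'matthew'])

names.appendleft('mark')   # O(1): no elements are shifted
first = names.popleft()    # O(1): removes 'mark' again
names.append('betty')      # the right end works the same way
last = names.pop()

print(first, last)         # mark betty
print(list(names))         # ['raymond', 'rachel', 'matthew']
```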
Decorators and Context Managers
- Helps separate business logic from administrative logic
- Clean, beautiful tools for factoring code and improving code reuse
- Good naming is essential
- Remember the Spiderman rule:
With great power comes great responsibility!
Using decorators to factor-out administrative logic
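import urllib.request
def web_lookup(url, saved={}):
    if url in saved:
        return saved[url]
    page = urllib.request.urlopen(url).read()
    saved[url] = page
    return page
Better
import urllib.request
from functools import lru_cache
@lru_cache(maxsize=None)
def web_lookup(url):
    return urllib.request.urlopen(url).read()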
(https://orbifold.xyz/local-lru.html)
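To see the caching without touching the network, here is a sketch with a hypothetical slow_square function standing in for the web lookup. The calls list exists only to record how often the function body actually runs:

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=None)
def slow_square(n):
    # Hypothetical stand-in for an expensive call such as a web lookup.
    calls.append(n)
    return n * n

print(slow_square(4))
print(slow_square(4))   # served from the cache; the function body does not run again
print(len(calls))       # 1
print(slow_square.cache_info())
```

Note that lru_cache only suits functions whose result depends solely on their arguments, and the arguments must be hashable.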
Factor-out temporary contexts
from decimal import Context, Decimal, getcontext, localcontext, setcontext
old_context = getcontext().copy()
getcontext().prec = 50
print(Decimal(355)/Decimal(113))
setcontext(old_context)
Better
with localcontext(Context(prec=50)):
    print(Decimal(355)/Decimal(113))
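A quick check that the precision really is restored after the with block. This sketch sets prec on the temporary context returned by localcontext() instead of passing a Context, which is an equivalent way to write it:

```python
from decimal import Decimal, getcontext, localcontext

default_precision = getcontext().prec

with localcontext() as ctx:
    ctx.prec = 50
    inside = Decimal(355) / Decimal(113)

# Outside the block the previous precision is back in effect.
outside = Decimal(355) / Decimal(113)

print(inside)
print(outside)
print(getcontext().prec == default_precision)  # True
```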
How to open and close files
f = open('data.txt')
try:
    data = f.read()
finally:
    f.close()
Better
with open('data.txt') as f:
    data = f.read()
How to use locks
import threading
lock = threading.Lock()
lock.acquire()
try:
    print('Critical section 1')
    print('Critical section 2')
finally:
    lock.release()
Better
with lock:
    print('Critical section 1')
    print('Critical section 2')
Factor-out temporary contexts
try:
    os.remove('somefile.tmp')
except FileNotFoundError:
    pass
Better
from contextlib import suppress
with suppress(FileNotFoundError):
    os.remove('somefile.tmp')
import sys
with open('help.txt', 'w') as f:
    oldstdout = sys.stdout
    sys.stdout = f
    try:
        help(pow)
    finally:
        sys.stdout = oldstdout
Better
from contextlib import redirect_stdout
with open('help.txt', 'w') as f:
    with redirect_stdout(f):
        help(pow)
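Both context managers can be tried without creating any files. In this sketch an in-memory io.StringIO buffer replaces help.txt, and a KeyError replaces the missing-file error:

```python
import io
from contextlib import redirect_stdout, suppress

buffer = io.StringIO()

# Everything printed inside the block goes to buffer instead of the terminal.
with redirect_stdout(buffer):
    print('captured')

print('visible')  # stdout is back to normal here

# suppress() swallows only the listed exception types.
with suppress(KeyError):
    {}['missing']

captured = buffer.getvalue()
```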
Concise expressive one-liners
Two conflicting rules:
- Don’t put too much on one line
- Don’t break atoms of thought into subatomic particles
Raymond’s rule:
- One logical line of code equals one sentence in English.
List comprehensions and Generator expressions
result = []
for i in range(10):
    s = i ** 2
    result.append(s)
print(sum(result))
Better
print(sum([i**2 for i in range(10)]))
Best
print(sum(i**2 for i in range(10)))
Reference
- https://github.com/JeffPaine/beautiful_idiomatic_python
- https://www.youtube.com/watch?feature=player_embedded&v=OSGv2VnC0go