Categories: pathlib
A common operation is changing the extension of a file. If you have an
existing Path object, you don't need to convert it to a string, slice
it, and append a new extension. Instead, use the with_suffix() method:
Bad:
new_filepath = str(Path("file.txt"))[:4] + ".md"
Good:
new_filepath = Path("file.txt").with_suffix(".md")
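As a minimal sketch of why with_suffix() is more robust than slicing: it replaces only the final extension regardless of how long the file name is, and an empty string removes the extension. The path below is a made-up example, and PurePosixPath is used only so the printed output looks the same on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical path, chosen only for illustration.
report = PurePosixPath("docs/report.final.txt")

# Only the last suffix (".txt") is replaced.
renamed = report.with_suffix(".md")

# Passing an empty string drops the last suffix entirely.
without_ext = report.with_suffix("")

print(renamed)      # docs/report.final.md
print(without_ext)  # docs/report.final
```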
Categories: pathlib
When you just want to save the contents of a file to a variable, using a
with block is a bit overkill. A simpler alternative is to use pathlib's
read_text() method:
Bad:
with open(filename) as f:
    contents = f.read()
Good:
contents = Path(filename).read_text()
Categories: string
startswith() and endswith() both take a tuple, so instead of calling
startswith() multiple times on the same string, you can check them all
at once:
Bad:
name = "bob"
if name.startswith("b") or name.startswith("B"):
    pass
Good:
name = "bob"
if name.startswith(("b", "B")):
    pass
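The same tuple form works for endswith(). The sketch below uses a hypothetical helper, is_image(), and an arbitrary list of extensions purely for illustration.

```python
def is_image(filename: str) -> bool:
    # endswith() returns True if any suffix in the tuple matches.
    return filename.lower().endswith((".png", ".jpg", ".gif"))


print(is_image("photo.JPG"))  # True
print(is_image("notes.txt"))  # False
```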
Categories: pathlib
When you just want to save some contents to a file, using a with block is
a bit overkill. Instead you can use pathlib's write_text() method:
Bad:
with open(filename, "w") as f:
    f.write("hello world")
Good:
Path(filename).write_text("hello world")
Categories: pathlib
A modern alternative to os.getcwd() is the Path.cwd() method:
Bad:
cwd = os.getcwd()
Good:
cwd = Path.cwd()
Categories: builtin
readability
print("") can be simplified to just print().
Categories: string
If you want to expand the tabs at the start of a string, don't use
.replace("\t", " " * 8), use .expandtabs() instead. Note that the two are
only equivalent if the tabs are at the start of the string, since
expandtabs() pads each tab out to the next tab stop instead of inserting a
fixed number of spaces.
Bad:
spaces_8 = "\thello world".replace("\t", " " * 8)
spaces_4 = "\thello world".replace("\t", " " * 4)
Good:
spaces_8 = "\thello world".expandtabs()
spaces_4 = "\thello world".expandtabs(4)
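A small sketch of the caveat about tab position, using made-up strings: with a leading tab both approaches agree, but with a tab in the middle of the string the results differ.

```python
indented = "\thello"
mixed = "ab\tc"

# With a leading tab, both approaches produce four spaces.
same_when_leading = indented.expandtabs(4) == indented.replace("\t", " " * 4)

# With a tab in the middle, expandtabs() pads to the next tab stop
# (column 4), so only two spaces are inserted after "ab".
expanded = mixed.expandtabs(4)
replaced = mixed.replace("\t", " " * 4)

print(same_when_leading)  # True
print(repr(expanded))     # 'ab  c'
print(repr(replaced))     # 'ab    c'
```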
Categories: contextlib
readability
Often you want to handle an exception and just ignore it. You can do
this with a try/except block with a single pass in the except block, but
there is a simpler and more concise way using the suppress() function
from contextlib:
Bad:
try:
    f()
except FileNotFoundError:
    pass
Good:
with suppress(FileNotFoundError):
    f()
Note: suppress() is slower than using try/except, so for
performance-critical code you might consider ignoring this check.
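A self-contained sketch of suppress(), which can also take several exception types at once. The settings dict and the default value are assumptions made up for this example.

```python
from contextlib import suppress

# Hypothetical settings with a value that cannot be parsed as an int.
settings = {"retries": "three"}
retries = 1  # default kept when the lookup or the conversion fails

# The block is abandoned at the first suppressed exception.
with suppress(KeyError, ValueError):
    retries = int(settings["retries"])

print(retries)  # 1
```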
Categories: logical
readability
When comparing a value to multiple possible options, don't or multiple
comparison checks together, use a single in expression:
Bad:
if x == "abc" or x == "def":
    pass
Good:
if x in ("abc", "def"):
    pass
Note: This should not be used if the operands depend on boolean short circuiting, since the operands will be eagerly evaluated. This is primarily useful for comparing against a range of constant values.
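The note about eager evaluation can be sketched as follows. The helper expensive_lookup() is hypothetical and only records that it was called.

```python
calls = []


def expensive_lookup() -> str:
    # Hypothetical helper that records each call.
    calls.append("called")
    return "def"


x = "abc"

# `or` short-circuits: the second comparison is never evaluated.
if x == "abc" or x == expensive_lookup():
    pass
calls_after_or = len(calls)

# The tuple is built before `in` runs, so the helper is always called.
if x in ("abc", expensive_lookup()):
    pass
calls_after_in = len(calls)

print(calls_after_or, calls_after_in)  # 0 1
```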
Categories: iterable
readability
Since tuple, list, and set literals can all be used with the in operator,
it is best to pick one and stick with it.
Bad:
for x in [1, 2, 3]:
    pass
nums = [str(x) for x in [1, 2, 3]]
Good:
for x in (1, 2, 3):
    pass
nums = [str(x) for x in (1, 2, 3)]
Categories: logical
readability
Sometimes the ternary operator (aka, inline if statements) can be
simplified to a single or expression.
Bad:
z = x if x else y
Good:
z = x or y
Note: if x depends on side-effects, then this check should be ignored.
Categories: performance
readability
Don't use a lambda if it's only forwarding its arguments to a function.
Bad:
predicate = lambda x: bool(x)
some_func(lambda x, y: print(x, y))
Good:
predicate = bool
some_func(print)
In addition, don't use lambdas when you want a default value for a literal type:
Bad:
counter = defaultdict(lambda: 0)
multimap = defaultdict(lambda: [])
Good:
counter = defaultdict(int)
multimap = defaultdict(list)
Categories: pythonic
readability
Using list() and dict() without any arguments is slower, and not Pythonic.
Use [] and {} instead:
Bad:
nums = list()
books = dict()
Good:
nums = []
books = {}
Categories: list
When appending multiple values to a list, you can use the .extend()
method to add an iterable to the end of an existing list. This way, you
don't have to call .append() on every element:
Bad:
nums = [1, 2, 3]
nums.append(4)
nums.append(5)
nums.append(6)
Good:
nums = [1, 2, 3]
nums.extend((4, 5, 6))
Categories: builtin
readability
truthy
Double negatives are confusing, so use bool(x) instead of not not x.
Bad:
if not not value:
    pass
Good:
if value:
    pass
Categories: iterable
truthy
Don't check a container's length to determine if it is empty or not, use a truthiness check instead:
Bad:
name = "bob"
if len(name) == 0:
    pass
nums = [1, 2, 3]
if len(nums) >= 1:
    pass
Good:
name = "bob"
if not name:
    pass
nums = [1, 2, 3]
if nums:
    pass
Categories: builtin
fstring
The bin(), oct(), and hex() functions return the string representation of
a number but with a prefix attached. If you don't want the prefix, you
might be tempted to just slice it off, but using an f-string will give you
more flexibility and let you work with negative numbers:
Bad:
print(bin(1337)[2:])
Good:
print(f"{1337:b}")
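A short sketch of the negative-number case mentioned above, plus zero-padding as one example of the extra flexibility. The values are arbitrary.

```python
value = -5

# bin(-5) is "-0b101", so slicing off two characters leaves "b101".
sliced = bin(value)[2:]

# The "b" format type omits the prefix and keeps the sign.
formatted = f"{value:b}"

# A fill character and width can be added in the same format spec.
padded = f"{5:08b}"

print(sliced)     # b101
print(formatted)  # -101
print(padded)     # 00000101
```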
Categories: pathlib
When you want to open a Path object, don't pass it to open(), just call
.open() on the Path object itself:
Bad:
path = Path("filename")
with open(path) as f:
    pass
Good:
path = Path("filename")
with path.open() as f:
    pass
Categories: operator
Don't write lambdas/functions to wrap builtin operators, use the operator
module instead:
Bad:
from functools import reduce
nums = [1, 2, 3]
print(reduce(lambda x, y: x + y, nums)) # 6
Good:
from functools import reduce
from operator import add
nums = [1, 2, 3]
print(reduce(add, nums)) # 6
In addition, the operator.itemgetter() function can be used to get one or
more items from an object, removing the need to create a lambda just to
extract values from an object:
Bad:
row = (1, "Some text", True)
transform = lambda x: (x[2], x[0])
Good:
from operator import itemgetter
row = (1, "Some text", True)
transform = itemgetter(2, 0)
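As a usage sketch with made-up rows: itemgetter() returns a tuple when given several indexes and the bare item when given one, which makes it convenient as a sort key.

```python
from operator import itemgetter

rows = [(2, "beta", False), (1, "alpha", True)]

# With several indexes, itemgetter() returns a tuple of the items.
transform = itemgetter(2, 0)
print(transform(rows[0]))  # (False, 2)

# With a single index, it returns the item itself, which suits sort keys.
by_name = sorted(rows, key=itemgetter(1))
print(by_name[0])  # (1, 'alpha', True)
```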
Categories: builtin
fstring
Certain expressions passed to f-strings are redundant because the f-string itself is capable of doing the same formatting. For example:
Bad:
print(f"{bin(1337)}")
print(f"{ascii(input())}")
print(f"{str(123)}")
Good:
print(f"{1337:#b}")
print(f"{input()!a}")
print(f"{123}")
Categories:
Don't pass an argument if it is the same as the default value:
Bad:
def greet(name: str = "bob") -> None:
    print(f"Hello {name}")
greet("bob")
{}.get("some key", None)
Good:
def greet(name: str = "bob") -> None:
    print(f"Hello {name}")
greet()
{}.get("some key")
Categories: python310
readability
isinstance() and issubclass() both take tuple arguments, so instead of
calling them multiple times for the same object, you can check all of them
at once:
Bad:
if isinstance(num, float) or isinstance(num, int):
    pass
Good:
if isinstance(num, (float, int)):
    pass
Note: In Python 3.10+, you can also pass type unions as the second param to these functions:
if isinstance(num, float | int):
    pass
Categories: builtin
readability
When you want to write a list of lines to a file, don't call .write()
for every line, use .writelines() instead:
Bad:
lines = ["line 1\n", "line 2\n", "line 3\n"]
with open("file", "w") as f:
    for line in lines:
        f.write(line)
Good:
lines = ["line 1\n", "line 2\n", "line 3\n"]
with open("file", "w") as f:
    f.writelines(lines)
Categories: readability
Don't cast a variable or literal if it is already of that type. This is
usually the result of not realizing a value is already the type you want,
or an artifact of some debugging code. One example of where this might be
intentional is when using container types like dict or list, which will
create a shallow copy. If that is the case, it might be preferable to use
.copy() instead, since it makes it more explicit that a copy is taking
place.
Examples:
Bad:
name = str("bob")
num = int(123)
ages = {"bob": 123}
copy = dict(ages)
Good:
name = "bob"
num = 123
ages = {"bob": 123}
copy = ages.copy()
Categories: logical
readability
When checking that multiple objects are equal to each other, don't use
an and expression. Use a comparison chain instead, for example:
Bad:
if x == y and x == z:
    pass
# and
if x is None and y is None:
    pass
Good:
if x == y == z:
    pass
# and
if x is y is None:
    pass
Note: if x depends on side-effects, then this check should be ignored.
Categories: control-flow
readability
Don't explicitly return if you are already at the end of the control flow for the current function:
Bad:
def func():
    print("hello world!")
    return
def func2(x):
    if x == 1:
        print("x is 1")
    else:
        print("x is not 1")
        return
Good:
def func():
    print("hello world!")
def func2(x):
    if x == 1:
        print("x is 1")
    else:
        print("x is not 1")
Categories: control-flow
readability
Sometimes a return statement can be written more succinctly:
Bad:
def index_or_default(nums: list[Any], index: int, default: Any):
    if index >= len(nums):
        return default
    else:
        return nums[index]
def is_on_axis(position: tuple[int, int]) -> bool:
    match position:
        case (0, _) | (_, 0):
            return True
        case _:
            return False
Good:
def index_or_default(nums: list[Any], index: int, default: Any):
    if index >= len(nums):
        return default
    return nums[index]
def is_on_axis(position: tuple[int, int]) -> bool:
    match position:
        case (0, _) | (_, 0):
            return True
    return False
Categories: readability
scoping
Due to Python's scoping rules, you can use a variable that has gone "out
of scope" so long as all previous code paths can bind to it. Long story
short, you don't need to declare a variable before you assign it in a
with statement:
Bad:
x = ""
with open("file.txt") as f:
    x = f.read()
Good:
with open("file.txt") as f:
    x = f.read()
Categories: readability
You don't need to use a temporary variable to swap 2 variables, you can use tuple unpacking instead:
Bad:
temp = x
x = y
y = temp
Good:
x, y = y, x
Categories: builtin
readability
When iterating over a file object line-by-line you don't need to add
.readlines(), simply iterate over the object itself. This assumes you
aren't passing an argument to readlines().
Bad:
with open("file.txt") as f:
    for line in f.readlines():
        ...
Good:
with open("file.txt") as f:
    for line in f:
        ...
Categories: dict
readability
If you only want to check if a key exists in a dictionary, you don't need
to call .keys() first, just use in on the dictionary itself:
Bad:
d = {"key": "value"}
if "key" in d.keys():
    ...
Good:
d = {"key": "value"}
if "key" in d:
    ...
Categories: builtin
readability
Slice expressions can be used to replace part of a list without
reassigning it. If you want to clear all the elements out of a list while
maintaining the same reference, don't use del x[:] or x[:] = [], use the
faster x.clear() method instead.
Bad:
nums = [1, 2, 3]
del nums[:]
# or
nums[:] = []
Good:
nums = [1, 2, 3]
nums.clear()
Categories: readability
set
If you want to remove a value from a set regardless of whether it exists or
not, use the discard() method instead of remove():
Bad:
nums = {123, 456}
if 123 in nums:
    nums.remove(123)
Good:
nums = {123, 456}
nums.discard(123)
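The difference between the two methods only shows up when the value is missing, as this small sketch with an arbitrary missing value illustrates.

```python
nums = {123, 456}

nums.discard(789)  # missing value: silently does nothing

try:
    nums.remove(789)  # missing value: raises KeyError
except KeyError:
    remove_raised = True
else:
    remove_raised = False

print(nums == {123, 456}, remove_raised)  # True True
```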
Categories: control-flow
readability
Don't explicitly continue if you are already at the end of the control flow for the current for/while loop:
Bad:
def func():
    for _ in range(10):
        print("hello world!")
        continue
def func2(x):
    for x in range(10):
        if x == 1:
            print("x is 1")
        else:
            print("x is not 1")
            continue
Good:
def func():
    for _ in range(10):
        print("hello world!")
def func2(x):
    for x in range(10):
        if x == 1:
            print("x is 1")
        else:
            print("x is not 1")
Categories: functools
python39
readability
Python 3.9 introduces the @cache decorator which can be used as a
short-hand for @lru_cache(maxsize=None).
Bad:
from functools import lru_cache
@lru_cache(maxsize=None)
def f(x: int) -> int:
    return x + 1
Good:
from functools import cache
@cache
def f(x: int) -> int:
    return x + 1
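A sketch of @cache on a function where caching actually matters. The fib() function is an illustrative example and requires Python 3.9 or later; cache_info() reports the same statistics as it does for lru_cache.

```python
from functools import cache


@cache
def fib(n: int) -> int:
    # Naive recursion; the cache means each value is computed only once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))                   # 832040
print(fib.cache_info().maxsize)  # None, i.e. the cache is unbounded
```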
Categories: dict
Don't use .items() on a dict if you only care about the keys or the
values, but not both:
Bad:
books = {"Frank Herbert": "Dune"}
for author, _ in books.items():
    print(author)
for _, book in books.items():
    print(book)
Good:
books = {"Frank Herbert": "Dune"}
for author in books:
    print(author)
for book in books.values():
    print(book)
Categories: builtin
logical
readability
Certain ternary expressions can be written more succinctly using the
builtin min/max functions:
Bad:
score1 = 90
score2 = 99
highest_score = score1 if score1 > score2 else score2
Good:
score1 = 90
score2 = 99
highest_score = max(score1, score2)
Categories: builtin
iterable
readability
Often generator expressions and list/set/dict comprehensions can be
written more succinctly: for instance, by passing a generator expression
to a function instead of a list comprehension, or by using the shorthand
comprehension notation in place of list() and set(). For example:
Bad:
nums = [1, 1, 2, 3]
nums_times_10 = list(num * 10 for num in nums)
unique_squares = set(num ** 2 for num in nums)
number_tuple = tuple([num ** 2 for num in nums])
Good:
nums = [1, 1, 2, 3]
nums_times_10 = [num * 10 for num in nums]
unique_squares = {num ** 2 for num in nums}
number_tuple = tuple(num ** 2 for num in nums)
Categories: performance
readability
When constructing a new list it is usually more performant to use a list comprehension, and in some cases, it can be more readable.
Bad:
nums = [1, 2, 3, 4]
odds = []
for num in nums:
    if num % 2:
        odds.append(num)
Good:
nums = [1, 2, 3, 4]
odds = [num for num in nums if num % 2]
Categories: readability
If you want to define a multi-line string but don't want a leading/trailing
newline, use a continuation character (\) instead of calling lstrip(),
rstrip(), or strip().
Bad:
"""
This is some docstring
""".lstrip()
"""
This is another docstring
""".strip()
Good:
"""\
This is some docstring
"""
"""\
This is another docstring\
"""
Categories: itertools
performance
If you only want to iterate and unpack values so that you can pass them
to a function (in the same order and with no modifications), you should
use the more performant starmap function:
Bad:
scores = [85, 100, 60]
passing_scores = [60, 80, 70]
def passed_test(score: int, passing_score: int) -> bool:
    return score >= passing_score
passed_all_tests = all(
    passed_test(score, passing_score)
    for score, passing_score
    in zip(scores, passing_scores)
)
Good:
from itertools import starmap
scores = [85, 100, 60]
passing_scores = [60, 80, 70]
def passed_test(score: int, passing_score: int) -> bool:
    return score >= passing_score
passed_all_tests = all(starmap(passed_test, zip(scores, passing_scores)))
Categories: pathlib
When checking whether a file exists or not, try and use the more modern
pathlib module instead of os.path.
Bad:
import os
if os.path.exists("filename"):
    pass
Good:
from pathlib import Path
if Path("filename").exists():
    pass
Categories: builtin
When you want to add/remove a bunch of items to/from a set, don't use a for loop, call the appropriate method on the set itself.
Bad:
sentence = "hello world"
vowels = "aeiou"
letters = set(sentence)
for vowel in vowels:
    letters.discard(vowel)
Good:
sentence = "hello world"
vowels = "aeiou"
letters = set(sentence)
letters.difference_update(vowels)
Categories: logical
readability
Don't check an expression to see if it is falsey then assign the same
falsey value to it. For example, if an expression used to be of type
int | None, checking if the expression is falsey would make sense,
since it could be None or 0. But, if the expression is changed to
be of type int, the falsey value is just 0, so setting it to 0
if it is falsey (0) is redundant.
Bad:
def is_markdown_header(line: str) -> bool:
    return (line or "").startswith("#")
Good:
def is_markdown_header(line: str) -> bool:
    return line.startswith("#")
Categories: pathlib
When removing a file, use the more modern Path.unlink() method instead of
os.remove() or os.unlink(): the pathlib module allows for more
flexibility when it comes to traversing folders, building file paths, and
accessing/modifying files.
Bad:
import os
os.remove("filename")
Good:
from pathlib import Path
Path("filename").unlink()
Categories: readability
Don't use a slice expression (with no bounds) to make a copy of something,
use the more readable .copy() method instead:
Bad:
nums = [3.1415, 1234]
copy = nums[:]
Good:
nums = [3.1415, 1234]
copy = nums.copy()
Categories: pathlib
Don't use the os.path.isfile (or similar) functions, use the more modern
pathlib module instead:
Bad:
if os.path.isfile("file.txt"):
    pass
Good:
if Path("file.txt").is_file():
    pass
Categories: pathlib
When joining strings to make a filepath, use the more modern and flexible
Path() object instead of os.path.join:
Bad:
with open(os.path.join("folder", "file"), "w") as f:
    f.write("hello world!")
Good:
from pathlib import Path
with open(Path("folder", "file"), "w") as f:
    f.write("hello world!")
# even better ...
with Path("folder", "file").open("w") as f:
    f.write("hello world!")
# even better ...
Path("folder", "file").write_text("hello world!")
Note that this check is disabled by default because Path() returns a Path
object, not a string, meaning that the Path object will propagate through
your code. This might be what you want, and might encourage you to use the
pathlib module in more places, but since it is not a drop-in replacement it
is disabled by default.
Categories: builtin
Don't use enumerate if you are disregarding either the index or the
value:
Bad:
books = ["Ender's Game", "The Black Swan"]
for index, _ in enumerate(books):
    print(index)
for _, book in enumerate(books):
    print(book)
Good:
books = ["Ender's Game", "The Black Swan"]
for index in range(len(books)):
    print(index)
for book in books:
    print(book)
Categories: logical
readability
truthy
Don't use is or == to check if a boolean is True or False, simply
use the name itself:
Bad:
failed = True
if failed is True:
    print("You failed")
Good:
failed = True
if failed:
    print("You failed")
Categories: pathlib
Use the mkdir method from the pathlib library instead of using the
mkdir and makedirs functions from the os library: the pathlib library
is more modern and provides better flexibility over the construction and
manipulation of file paths.
Bad:
import os
os.mkdir("new_folder")
Good:
from pathlib import Path
Path("new_folder").mkdir()
Categories: pathlib
Don't use open(x, "w").close() if you just want to create an empty file,
use the less confusing Path.touch() method instead.
Bad:
open("file.txt", "w").close()
Good:
from pathlib import Path
Path("file.txt").touch()
This check is disabled by default because touch() is not a drop-in
replacement: open(x, "w") truncates a file that already exists, whereas
touch() leaves its contents alone (and raises a FileExistsError only when
called with exist_ok=False), and (at least on Linux) it sets different file
permissions. If you don't care about the file permissions or know that the
file doesn't exist beforehand this check may be for you.
Categories: math
readability
Don't hardcode math constants like pi, tau, or e, use the math.pi,
math.tau, or math.e constants respectively.
Bad:
def area(r: float) -> float:
    return 3.1415 * r * r
Good:
import math
def area(r: float) -> float:
    return math.pi * r * r
Categories: pathlib
readability
The Path() constructor defaults to the current directory, so don't pass the current directory explicitly.
Bad:
file = Path(".")
Good:
file = Path()
Note: Lots of different values can trigger this check, including ".", "",
os.curdir, and os.path.curdir.
Categories: builtin
readability
The global and nonlocal keywords can take multiple comma-separated
names, removing the need for multiple lines.
Bad:
def some_func():
    global x
    global y
    print(x, y)
Good:
def some_func():
    global x, y
    print(x, y)
Categories: pathlib
Don't use the os.path.getsize (or similar) functions, use the more modern
pathlib module instead:
Bad:
if os.path.getsize("file.txt"):
    pass
Good:
if Path("file.txt").stat().st_size:
    pass
Categories: readability
string
Python includes some pre-defined charsets such as digits (0-9), upper and lower case alpha characters, and so on. You don't have to define them yourself, and they are usually more readable.
Bad:
digits = "0123456789"
if c in digits:
    pass
if c in "0123456789abcdefABCDEF":
    pass
Good:
if c in string.digits:
    pass
if c in string.hexdigits:
    pass
Note that when using a literal string, the corresponding string.xyz value
must be exact, but when used in an in comparison, the characters can be
out of order since in will compare every character in the string.
Categories: decimal
Under certain circumstances the Decimal() constructor can be made more
succinct.
Bad:
if x == Decimal("0"):
    pass
if y == Decimal(float("Infinity")):
    pass
Good:
if x == Decimal(0):
    pass
if y == Decimal("Infinity"):
    pass
Categories: pattern-matching
readability
When pattern matching builtin classes such as int() and str(), don't
use an as pattern to bind to the value, since the most common builtin
classes can use positional patterns instead.
Bad:
match x:
    case str() as name:
        print(f"Hello {name}")
Good:
match x:
    case str(name):
        print(f"Hello {name}")
Categories: readability
string
In some situations the .lstrip(), .rstrip() and .strip() string
methods can be written more succinctly: strip() is the same thing as
calling both lstrip() and rstrip() together, and all the strip
functions take a string argument listing the characters to strip, meaning
you don't need to call strip methods multiple times with different
arguments, you can just concatenate them and call it once.
Bad:
name = input().lstrip().rstrip()
num = " -123".lstrip(" ").lstrip("-")
Good:
name = input().strip()
num = " -123".lstrip(" -")
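One thing worth keeping in mind when merging strip calls, sketched here with made-up inputs: the combined call strips any mix of the listed characters, whereas chained calls strip one kind at a time, so the two forms can differ when the characters are interleaved.

```python
# Same result: the spaces come first, then the dash.
same = " -123".lstrip(" ").lstrip("-") == " -123".lstrip(" -") == "123"

# Different result: the dash comes before the space here.
text = "- -123"
chained = text.lstrip(" ").lstrip("-")
combined = text.lstrip(" -")

print(same)            # True
print(repr(chained))   # ' -123'
print(repr(combined))  # '123'
```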
Categories: readability
Sometimes when you are debugging (or copy-pasting code) you will end up with a variable that is assigning itself to itself. These lines can be removed.
Bad:
name = input("What is your name? ")
name = name
Good:
name = input("What is your name? ")
Categories: builtin
performance
python310
readability
Python 3.10 adds a very helpful bit_count() method for integers which
counts the number of set bits. This new method is more descriptive and
faster compared to converting/counting characters in a string.
Bad:
x = bin(0b1010).count("1")
assert x == 2
Good:
x = 0b1010.bit_count()
assert x == 2
Categories: datetime
python311
readability
Python 3.11 adds support for parsing UTC timestamps that end with Z, thus
removing the need to strip and append the +00:00 timezone.
Bad:
date = "2023-02-21T02:23:15Z"
start_date = datetime.fromisoformat(date.replace("Z", "+00:00"))
Good:
date = "2023-02-21T02:23:15Z"
start_date = datetime.fromisoformat(date)
Categories: math
readability
Use the shorthand log2 and log10 functions instead of passing 2 or 10
as the second argument to the log function. If math.e is used as the
second argument, just use math.log(x) instead, since e is the default.
Bad:
power = math.log(x, 10)
Good:
power = math.log10(x)
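A brief sketch with arbitrary inputs. Besides being shorter, the dedicated functions avoid the extra division that the two-argument form performs, so the two-argument result may differ by a small rounding error.

```python
import math

x = 1000

# log10() and log2() are dedicated functions for these bases.
power_of_ten = math.log10(x)
power_of_two = math.log2(8)

print(power_of_ten)  # 3.0
print(power_of_two)  # 3.0

# The two-argument form agrees only up to floating-point rounding.
close_enough = math.isclose(math.log(x, 10), power_of_ten)
```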
Categories: decimal
fractions
readability
When constructing a Fraction or Decimal using a float, don't use the
from_float() or from_decimal() class methods: just use the more concise
Fraction() and Decimal() class constructors instead.
Bad:
ratio = Fraction.from_float(1.2)
score = Decimal.from_float(98.0)
Good:
ratio = Fraction(1.2)
score = Decimal(98.0)
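The constructors and the class methods produce the same value, as this sketch checks. It also shows that the value is the exact binary float, which is not the same as the decimal literal written as a string.

```python
from decimal import Decimal
from fractions import Fraction

# The constructor and from_float() give the same exact value.
same_fraction = Fraction(1.2) == Fraction.from_float(1.2)
same_decimal = Decimal(98.0) == Decimal.from_float(98.0)

# That value is the binary float 1.2, not the rational number 6/5.
float_is_six_fifths = Fraction(1.2) == Fraction(6, 5)
string_is_six_fifths = Fraction("1.2") == Fraction(6, 5)

print(same_fraction, same_decimal)  # True True
print(float_is_six_fifths)          # False
print(string_is_six_fifths)         # True
```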
Categories: readability
You don't need to construct a class object to call a static method or a class method, just invoke the method on the class directly:
Bad:
cwd = Path().cwd()
Good:
cwd = Path.cwd()
Categories: builtin
readability
When converting a string starting with 0b, 0o, or 0x to an int, you
don't need to slice the string and set the base yourself: just call int()
with a base of zero. Doing this will automatically deduce the correct base
to use based on the string prefix.
Bad:
num = "0xABC"
if num.startswith("0b"):
    i = int(num[2:], 2)
elif num.startswith("0o"):
    i = int(num[2:], 8)
elif num.startswith("0x"):
    i = int(num[2:], 16)
print(i)
Good:
num = "0xABC"
i = int(num, 0)
print(i)
This check is disabled by default because there is no way for Refurb to
detect whether the prefixes that are being stripped are valid Python int
prefixes (like 0x) or some other prefix which would fail if parsed using
this method.
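A sketch of base-zero parsing with a few sample strings, including a made-up prefix (0z) that is not a valid Python int prefix and is therefore rejected.

```python
# Each string is parsed according to its own prefix.
parsed = {text: int(text, 0) for text in ("0xABC", "0b101", "0o17", "42")}
print(parsed)  # {'0xABC': 2748, '0b101': 5, '0o17': 15, '42': 42}

# A prefix that is not a Python int prefix raises ValueError.
rejected = False
try:
    int("0z12", 0)
except ValueError:
    rejected = True

print(rejected)  # True
```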
Categories: readability
regex
Regex operations can be changed using flags such as re.I, which will make
the regex case-insensitive. These single-character flag names can be harder
to read/remember, and should be replaced with the longer aliases so that
they are more descriptive.
Bad:
if re.match("^hello", "hello world", re.I):
    pass
Good:
if re.match("^hello", "hello world", re.IGNORECASE):
    pass
Categories: pythonic
readability
Checking if an object is None using isinstance() is un-pythonic: use an
is comparison instead.
Bad:
x = 123
if isinstance(x, type(None)):
    pass
Good:
x = 123
if x is None:
    pass
Categories: pythonic
readability
Don't use type(None) to check if the type of an object is None, use an
is comparison instead.
Bad:
x = 123
if type(x) is type(None):
    pass
Good:
x = 123
if x is None:
    pass
Categories: performance
readability
regex
If you are passing a compiled regular expression to a regex function, consider calling the regex method on the pattern itself: It is faster, and can improve readability.
Bad:
import re
COMMENT = re.compile(".*(#.*)")
found_comment = re.match(COMMENT, "this is a # comment")
Good:
import re
COMMENT = re.compile(".*(#.*)")
found_comment = COMMENT.match("this is a # comment")
Categories: iterable
readability
Don't use in to check against a single value, use == instead:
Bad:
if name in ("bob",):
    pass
Good:
if name == "bob":
    pass
Categories: pathlib
When checking the file extension for a Path object don't call
endswith() on the name field, directly check against suffix instead.
Bad:
from pathlib import Path
def is_markdown_file(file: Path) -> bool:
    return file.name.endswith(".md")
Good:
from pathlib import Path
def is_markdown_file(file: Path) -> bool:
    return file.suffix == ".md"
Note: The suffix field will only contain the last file extension, so
don't use suffix if you are checking for an extension like .tar.gz.
Refurb won't warn in those cases, but it is good to remember in case you
plan to use this in other places.
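The note about compound extensions can be sketched like this. The file name is hypothetical, and comparing the tail of suffixes is only one possible way to test for .tar.gz.

```python
from pathlib import PurePosixPath

# Hypothetical file name with a compound extension.
archive = PurePosixPath("backup.tar.gz")

print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']

# One possible check for a compound extension.
is_tarball = archive.suffixes[-2:] == [".tar", ".gz"]
print(is_tarball)  # True
```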
Categories: dict
readability
Dicts can be created/combined in many ways, one of which is the **
operator (inside the dict), and another is the | operator (used outside
the dict). While they both have valid uses, the | operator allows for
more flexibility, including using |= to update an existing dict.
See PEP 584 for more info.
Bad:
def add_defaults(settings: dict[str, str]) -> dict[str, str]:
    return {"color": "1", **settings}
Good:
def add_defaults(settings: dict[str, str]) -> dict[str, str]:
    return {"color": "1"} | settings
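A short sketch of both operators with made-up settings (Python 3.9 or later). When keys overlap, the right-hand operand wins, which matches the behavior of the ** form.

```python
defaults = {"color": "1", "size": "m"}
settings = {"color": "2"}

# | builds a new dict; the right-hand operand wins on shared keys.
merged = defaults | settings
print(merged)  # {'color': '2', 'size': 'm'}

# |= updates the left-hand dict in place.
defaults |= {"size": "l"}
print(defaults)  # {'color': '1', 'size': 'l'}
```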
Categories: readability
secrets
Depending on how you are using the secrets module, there might be more
expressive ways of writing what it is you're trying to write.
Bad:
random_hex = token_bytes().hex()
random_url = token_urlsafe()[:16]
Good:
random_hex = token_hex()
random_url = token_urlsafe(16)
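Note that the argument to these functions counts random bytes, not output characters. The sketch below prints the resulting lengths for 16 bytes; in particular, token_urlsafe(16) is longer than a string sliced to 16 characters.

```python
from secrets import token_hex, token_urlsafe

# token_hex(n) returns two hex digits per random byte.
hex_length = len(token_hex(16))

# token_urlsafe(n) Base64-encodes n random bytes, so 16 bytes
# become 22 characters rather than 16.
url_length = len(token_urlsafe(16))

print(hex_length)  # 32
print(url_length)  # 22
```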
Categories: fastapi
readability
FastAPI will automatically pass along query parameters to your function, so
you only need to use Query() when you use params other than default.
Bad:
@app.get("/")
def index(name: str = Query()) -> str:
    return f"Your name is {name}"
Good:
@app.get("/")
def index(name: str) -> str:
    return f"Your name is {name}"
Categories: datetime
Because naive datetime objects are treated by many datetime methods
as local times, it is preferred to use aware datetimes to represent times
in UTC.
This check affects datetime.utcnow and datetime.utcfromtimestamp.
Bad:
from datetime import datetime
now = datetime.utcnow()
past_date = datetime.utcfromtimestamp(some_timestamp)
Good:
from datetime import datetime, timezone
now = datetime.now(timezone.utc)
past_date = datetime.fromtimestamp(some_timestamp, tz=timezone.utc)
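A runnable sketch showing that the aware variants carry their timezone with them. The timestamp 0 (the Unix epoch) is used only as a convenient example.

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# Aware datetimes keep the UTC offset in their tzinfo and in their output.
print(now.tzinfo)         # UTC
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00
```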
Categories: pathlib
If you want to get the current working directory don't call resolve() on
an empty Path() object, use Path.cwd() instead.
Bad:
cwd = Path().resolve()
Good:
cwd = Path.cwd()
Categories: readability
shlex
When using shlex to escape and join a bunch of strings consider using the
shlex.join method instead.
Bad:
args = ["hello", "world!"]
cmd = " ".join(shlex.quote(arg) for arg in args)
Good:
args = ["hello", "world!"]
cmd = shlex.join(args)
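A sketch with an argument that actually needs quoting, showing that shlex.join() is the inverse of shlex.split(). The argument list is made up for this example.

```python
import shlex

args = ["echo", "hello world"]

# Arguments containing whitespace are quoted automatically.
cmd = shlex.join(args)
print(cmd)  # echo 'hello world'

# Splitting the command gives back the original arguments.
round_trip = shlex.split(cmd)
print(round_trip == args)  # True
```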
Categories: itertools
performance
readability
When flattening a list of lists, use the chain.from_iterable() function
from the itertools stdlib package. This function is faster than native
list/generator comprehensions or using sum() with a list default.
Bad:
from itertools import chain
rows = [[1, 2], [3, 4]]
# using list comprehension
flat = [col for row in rows for col in row]
# using sum()
flat = sum(rows, [])
# using chain(*x)
flat = chain(*rows)
Good:
from itertools import chain
rows = [[1, 2], [3, 4]]
flat = chain.from_iterable(rows)
Note: chain.from_iterable() returns an iterator, which means you might
need to wrap it in list() depending on your use case. Refurb cannot
detect this (yet), so this is something you will need to keep in mind.
Note: chain(*x) may be marginally faster/slower depending on the length
of x. Since * might potentially expand to a lot of arguments, it is
better to use chain.from_iterable() when you are unsure.
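The iterator caveat can be seen in this small sketch: the flattened result can only be consumed once unless it is first materialized with list().

```python
from itertools import chain

rows = [[1, 2], [3, 4]]
flat = chain.from_iterable(rows)

# The first pass consumes the iterator; the second pass finds it empty.
first_pass = list(flat)
second_pass = list(flat)

print(first_pass)   # [1, 2, 3, 4]
print(second_pass)  # []
```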
Categories: abc
readability
Instead of setting metaclass directly, inherit from the ABC wrapper
class. This is semantically the same thing, but more succinct.
Bad:
class C(metaclass=ABCMeta):
    pass
Good:
class C(ABC):
    pass
Categories: hashlib
readability
Use .hexdigest() to get a hex digest from a hash.
Bad:
from hashlib import sha512
hashed = sha512(b"some data").digest().hex()
Good:
from hashlib import sha512
hashed = sha512(b"some data").hexdigest()
Categories: hashlib
readability
You can pass data into hashlib constructors, so instead of creating a
hash object and immediately updating it, pass the data directly.
Bad:
from hashlib import sha512
h = sha512()
h.update(b"data")
Good:
from hashlib import sha512
h = sha512(b"data")
Categories: readability
If you want to stringify a single value without concatenating anything, use
the str() function instead.
Bad:
nums = [123, 456]
num = f"{nums[0]}"
Good:
nums = [123, 456]
num = str(nums[0])
Categories: readability
When an API has a Fluent Interface (the ability to chain multiple calls together), you should chain those calls instead of repeatedly assigning and using the value:
Bad:
def get_tensors(device: str) -> torch.Tensor:
    t1 = torch.ones(2, 1)
    t2 = t1.long()
    t3 = t2.to(device)
    return t3
def process(file_name: str):
    common_columns = ["col1_renamed", "col2_renamed", "custom_col"]
    df = spark.read.parquet(file_name)
    df = df \
        .withColumnRenamed('col1', 'col1_renamed') \
        .withColumnRenamed('col2', 'col2_renamed')
    df = df \
        .select(common_columns) \
        .withColumn('service_type', F.lit('green'))
    return df
Good:
def get_tensors(device: str) -> torch.Tensor:
    t3 = (
        torch.ones(2, 1)
        .long()
        .to(device)
    )
    return t3
def process(file_name: str):
    common_columns = ["col1_renamed", "col2_renamed", "custom_col"]
    df = (
        spark.read.parquet(file_name)
        .withColumnRenamed('col1', 'col1_renamed')
        .withColumnRenamed('col2', 'col2_renamed')
        .select(common_columns)
        .withColumn('service_type', F.lit('green'))
    )
    return df
Categories: readability
You don't need to call .copy() on a dict/set when using it in a union
since the original dict/set is not modified.
Bad:
d = {"a": 1}
merged = d.copy() | {"b": 2}
Good:
d = {"a": 1}
merged = d | {"b": 2}
Categories: performance
readability
Don't use sorted() to sort a list and reassign it to itself, use the
faster in-place .sort() method instead.
Bad:
names = ["Bob", "Alice", "Charlie"]
names = sorted(names)
Good:
names = ["Bob", "Alice", "Charlie"]
names.sort()
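The two forms are not identical when another name refers to the same list, which may be worth checking before applying this change. A sketch with a hypothetical alias:

```python
names = ["Bob", "Alice", "Charlie"]
alias = names  # second reference to the same list

# sorted() builds a new list and rebinds `names`; `alias` is untouched.
names = sorted(names)
alias_after_sorted = list(alias)

# .sort() mutates the list in place, so every reference sees the change.
names = alias
names.sort()
alias_after_sort = list(alias)

print(alias_after_sorted)  # ['Bob', 'Alice', 'Charlie']
print(alias_after_sort)    # ['Alice', 'Bob', 'Charlie']
```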
Categories: performance
readability
Don't use x[::-1] or reversed(x) to reverse a list and reassign it to
itself, use the faster in-place .reverse() method instead.
Bad:
names = ["Bob", "Alice", "Charlie"]
names = reversed(names)
# or
names = list(reversed(names))
# or
names = names[::-1]
Good:
names = ["Bob", "Alice", "Charlie"]
names.reverse()
Categories: performance
readability
string
Don't explicitly check a string prefix/suffix if you're only going to
remove it, use .removeprefix() or .removesuffix() instead.
Bad:
def strip_txt_extension(filename: str) -> str:
    return filename[:-4] if filename.endswith(".txt") else filename
Good:
def strip_txt_extension(filename: str) -> str:
    return filename.removesuffix(".txt")
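A sketch of how these methods (Python 3.9 or later) behave, using made-up strings. It also shows why rstrip() is not a substitute: rstrip() treats its argument as a set of characters rather than a suffix.

```python
filename = "text.txt"

removed = filename.removesuffix(".txt")
unchanged = filename.removesuffix(".md")  # no match: returned as-is
stripped = filename.rstrip(".txt")        # strips the characters '.', 't', 'x'
version = "v1.2".removeprefix("v")

print(removed)    # text
print(unchanged)  # text.txt
print(stripped)   # te
print(version)    # 1.2
```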
Categories: collections
Subclassing dict, list, or str objects can be error prone, use the
UserDict, UserList, and UserString objects from the collections module
instead.
Bad:
class CaseInsensitiveDict(dict):
    ...
Good:
from collections import UserDict
class CaseInsensitiveDict(UserDict):
    ...
Note: isinstance() checks for dict, list, and str types will fail
when using the corresponding User class. If you need to pass custom dict
or list objects to code you don't control, ignore this check. If you do
control the code, consider using the following type checks instead:
dict -> collections.abc.MutableMapping
list -> collections.abc.MutableSequence
str -> No such conversion exists
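One way subclassing dict directly can go wrong is sketched below. The two classes are hypothetical, minimal examples that lower-case their keys: the dict constructor bypasses an overridden __setitem__, while UserDict routes construction through it.

```python
from collections import UserDict
from collections.abc import MutableMapping


class LowerDict(dict):
    # Hypothetical dict subclass that lower-cases keys on assignment.
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


class LowerUserDict(UserDict):
    # The same idea, built on UserDict instead.
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


plain = LowerDict(Name="bob")        # constructor skips __setitem__
wrapped = LowerUserDict(Name="bob")  # constructor calls __setitem__

print(list(plain))    # ['Name']
print(list(wrapped))  # ['name']

# As the note says, the UserDict version is not a dict instance.
print(isinstance(wrapped, dict))            # False
print(isinstance(wrapped, MutableMapping))  # True
```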
Categories: performance
readability
Don't use a lambda function to call a no-arg method on a string, use the name of the string method directly. It is faster, and often times improves readability.
Bad:
def normalize_phone_number(phone_number: str) -> int:
    digits = filter(lambda x: x.isdigit(), phone_number)
    return int("".join(digits))
Good:
def normalize_phone_number(phone_number: str) -> int:
    digits = filter(str.isdigit, phone_number)
    return int("".join(digits))
Categories: readability
Don't check if a value is True or False using in, use an
isinstance() call.
Bad:
if value in {True, False}:
    pass
Good:
if isinstance(value, bool):
    pass
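The two checks are not strictly equivalent, which is part of why the isinstance() form is clearer: membership uses equality, and 1 == True and 0 == False in Python. A sketch with arbitrary sample values:

```python
values = (True, False, 1, 0, 1.0, "yes")

# Membership compares by equality, so 1, 0 and 1.0 also match.
in_set = [value in {True, False} for value in values]

# isinstance() only accepts actual bool objects.
is_bool = [isinstance(value, bool) for value in values]

print(in_set)   # [True, True, True, True, True, False]
print(is_bool)  # [True, True, False, False, False, False]
```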
Categories: builtin
performance
readability
Don't use sorted() to get the min/max value out of an iterable element,
use min() or max().
Bad:
nums = [3, 1, 4, 1, 5]
lowest = sorted(nums)[0]
highest = sorted(nums)[-1]
Good:
nums = [3, 1, 4, 1, 5]
lowest = min(nums)
highest = max(nums)