Houston, TX shay_public@hotmail.com
Python Struct Options

Or: Do we still need namedtuple? (spoiler: yes)

Python has several composite types to hold, pass, and iterate over multiple values, and Python functions can take multiple arguments and return multiple values.

But there’s still room for something like a struct, a composite type to hold multiple values with a name for each.

Here, I’m comparing some of the better Python “struct” options in their type-hinted forms.

# ----------------------------------------------------------- dictionary
from typing import TypedDict # This only works as of Python 3.8

class ScoreGameDict(TypedDict):
    score: float
    game: str

my_dictionary: ScoreGameDict = {"score": 3, "game": "piquet"}

# ----------------------------------------------------------- namedtuple
from typing import NamedTuple

ScoreGameNT = NamedTuple("ScoreGameNT", [("score", float), ("game", str)])

my_namedtuple = ScoreGameNT(3, "piquet")

# ---------------------------------------------------- object with slots
class ScoreGameSlots:
    __slots__ = ["score", "game"]
    def __init__(self, score: float, game: str) -> None:
        self.score = score
        self.game = game

my_slots = ScoreGameSlots(3, "piquet")

# ------------------------------------------------------------ dataclass
from dataclasses import dataclass

@dataclass
class ScoreGameDC:
    score: float
    game: str

my_dataclass = ScoreGameDC(3, "piquet")
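The functional NamedTuple call in the namedtuple section above can also be written with class syntax (Python 3.6+), which reads like the TypedDict and dataclass definitions. This is a minimal sketch of that equivalent form; the result is still a plain tuple subclass with the same fields.

```python
from typing import NamedTuple


# Class syntax for typing.NamedTuple; equivalent to the functional
# ScoreGameNT definition in the namedtuple section.
class ScoreGameNT(NamedTuple):
    score: float
    game: str


my_namedtuple = ScoreGameNT(3, "piquet")

print(my_namedtuple._fields)             # ('score', 'game')
print(my_namedtuple == (3, "piquet"))    # True: compares like a plain tuple
print(isinstance(my_namedtuple, tuple))  # True
```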

On to the comparisons.

mutability

  • Dictionary is mutable (and extendable)
  • Namedtuple is immutable
  • Object with slots is mutable (but not extendable)
  • Dataclass can be mutable (and extendable) OR, with frozen=True, immutable (and not extendable)

If you plan to use your struct as an aggregator of data, a slotted object will both let you fill in values over time and require (a good thing) you to “register” each new key in __slots__ as you come up with new things to aggregate. You can still make a mess, but it won’t be a hidden mess.
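A quick runtime check of the mutability list above, as a sketch. FrozenScoreGameDC (a frozen=True variant of the dataclass) and the try_assign helper are illustrative names introduced here, and the namedtuple is written in class syntax.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import NamedTuple


class ScoreGameNT(NamedTuple):
    score: float
    game: str


class ScoreGameSlots:
    __slots__ = ("score", "game")

    def __init__(self, score: float, game: str) -> None:
        self.score = score
        self.game = game


@dataclass(frozen=True)  # hypothetical immutable dataclass variant
class FrozenScoreGameDC:
    score: float
    game: str


def try_assign(obj: object, name: str, value: object) -> str:
    """Hypothetical helper: try setattr(obj, name, value) and report the outcome."""
    try:
        setattr(obj, name, value)
    except FrozenInstanceError:  # subclass of AttributeError, so check it first
        return "FrozenInstanceError"
    except AttributeError:
        return "AttributeError"
    return "ok"


my_dictionary = {"score": 3, "game": "piquet"}
my_dictionary["winner"] = "Ann"  # a dictionary accepts any new key

nt = ScoreGameNT(3, "piquet")
slots = ScoreGameSlots(3, "piquet")
frozen = FrozenScoreGameDC(3, "piquet")

print(try_assign(nt, "score", 4))          # AttributeError: fields are read-only
print(try_assign(slots, "score", 4))       # ok: declared slots are mutable
print(try_assign(slots, "winner", "Ann"))  # AttributeError: not registered in __slots__
print(try_assign(frozen, "score", 4))      # FrozenInstanceError
```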

unpacking

Straightforward with namedtuple.

a, b = my_namedtuple

The linter knows how many values you have to unpack from your namedtuple.

a, b, c = my_namedtuple # ValueError: not enough values to unpack (expected 3, got 2)

With Python 3.7+, you can also unpack a dictionary, as order is guaranteed. However, the linter will not warn you if you try to unpack too many or too few values.

a, b = my_dictionary.values()
a, b = my_dataclass.__dict__.values()

This isn’t as nice with a slotted object. You’ll have to explicitly iterate through slots, and you still won’t have any help from the linter.

a, b = (getattr(my_slots, x) for x in my_slots.__slots__)

In 3.7+, you can unpack the instance attributes of any user-defined class with vars() (except slotted classes, which have no __dict__). There are pitfalls here. vars() returns the instance’s __dict__. The order of items will be consistent (insertion order), but may not match your intuition, especially if you’ve added attributes in __post_init__. And, again, no help from the linter.

a, b = vars(my_dataclass).values()
[Figure: struct options: unpacking speed comparison]
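A sketch of the __post_init__ pitfall with vars(). ScoreGameLabeled and its label attribute are hypothetical. The sketch also uses the standard library’s dataclasses.astuple, which returns only the declared fields in declaration order (deep-copying their values), so an extra attribute doesn’t break the unpacking.

```python
from dataclasses import astuple, dataclass


@dataclass
class ScoreGameDC:
    score: float
    game: str


@dataclass
class ScoreGameLabeled:
    score: float
    game: str

    def __post_init__(self) -> None:
        # Hypothetical extra attribute set after __init__; it is not a field.
        self.label = f"{self.game}: {self.score}"


a, b = vars(ScoreGameDC(3, "piquet")).values()  # fine: exactly two items

labeled = ScoreGameLabeled(3, "piquet")
print(list(vars(labeled)))  # ['score', 'game', 'label']: "a, b = ..." would now fail

# astuple ignores non-field attributes such as label.
a, b = astuple(labeled)
print(a, b)  # 3 piquet
```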

exploding

Exploding each is similar to unpacking.

function_with_kwargs(**my_namedtuple._asdict())
function_with_kwargs(**my_dictionary)
function_with_kwargs(**{x: getattr(my_slots, x) for x in my_slots.__slots__})
function_with_kwargs(**vars(my_dataclass))
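function_with_kwargs isn’t defined in the article. In this sketch it is a hypothetical stand-in that takes score and game as keyword arguments, which is enough to confirm the four exploding styles above produce the same call.

```python
from dataclasses import dataclass
from typing import NamedTuple


class ScoreGameNT(NamedTuple):
    score: float
    game: str


class ScoreGameSlots:
    __slots__ = ("score", "game")

    def __init__(self, score: float, game: str) -> None:
        self.score = score
        self.game = game


@dataclass
class ScoreGameDC:
    score: float
    game: str


def function_with_kwargs(score: float, game: str) -> str:
    """Hypothetical stand-in for the article's function_with_kwargs."""
    return f"{game}={score}"


my_namedtuple = ScoreGameNT(3, "piquet")
my_dictionary = {"score": 3, "game": "piquet"}
my_slots = ScoreGameSlots(3, "piquet")
my_dataclass = ScoreGameDC(3, "piquet")

results = [
    function_with_kwargs(**my_namedtuple._asdict()),
    function_with_kwargs(**my_dictionary),
    function_with_kwargs(**{x: getattr(my_slots, x) for x in my_slots.__slots__}),
    function_with_kwargs(**vars(my_dataclass)),
]
print(results)  # ['piquet=3', 'piquet=3', 'piquet=3', 'piquet=3']
```

For a dataclass, dataclasses.asdict(my_dataclass) is an alternative to vars() that includes only declared fields.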

Dictionary has an extra trick here. I won’t argue for it, but you *can* sneak names that aren’t legal Python identifiers in as kwargs with dictionary exploding. This allows you to pass around, e.g., XML or CSS attribute names.

from xml.etree import ElementTree

def new_line_element(**attributes: str) -> ElementTree.Element:
    element = ElementTree.Element("line")
    for attribute, value in attributes.items():
        element.set(attribute, value)
    return element

# pass in illegal Python identifier "stroke-linecap"
new_line_element(x1="0", x2="1", **{"stroke-linecap": "round"})

type hinting

As of Python 3.8, you can type hint dictionaries per key with typing.TypedDict. This makes all options equivalent in this respect. With older versions of Python, the standard library won’t give you type hints for dictionaries beyond a union of expected value types.

from typing import Dict, Union
my_dictionary: Dict[str, Union[float, str]] = {"score": 3, "game": "piquet"}

PyCharm will offer hints when creating namedtuples and the user-defined object types. (And dictionaries as of Python 3.8.)

my_namedtuple = ScoreGameNT( # ... you'd see hint "score: float, game: str"
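TypedDict hints are static only. This sketch shows what they do and don’t do at runtime; the bad value is deliberately mistyped to show that nothing is validated when the code runs (a static checker such as mypy would reject that line).

```python
from typing import TypedDict, get_type_hints  # TypedDict needs Python 3.8+


class ScoreGameDict(TypedDict):
    score: float
    game: str


# The per-key hints are available to tools (and to code, via get_type_hints)...
print(get_type_hints(ScoreGameDict))  # {'score': <class 'float'>, 'game': <class 'str'>}

# ...but at runtime a TypedDict "instance" is just a plain dict.
my_dictionary = ScoreGameDict(score=3, game="piquet")
print(type(my_dictionary))  # <class 'dict'>

# No runtime validation: wrong value types are accepted silently.
bad = ScoreGameDict(score="three", game=1)
print(bad)  # {'score': 'three', 'game': 1}
```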

autocomplete / linting

This is a short section, but it may be the most important section in the article.

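Unless the dictionary is annotated with a TypedDict and you run a type checker that enforces it (mypy does), the linter will let you get away with mis-typed dictionary keys.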

a = my_dictionary["scoer"]  # this looks fine until you run it

The linter would pick this up for the other “struct” types.

a = my_namedtuple.scoer  # here the linter would start barking
a = my_slots.scoer  # here the linter would start barking
a = my_dataclass.scoer  # here the linter would start barking

With the extendable types (dictionary and mutable dataclass), you are vulnerable to mis-typing assignments.

my_dictionary["scoer"] = 4
my_dataclass.scoer = 4

But at least with dataclass, you’ll have autocomplete to help you out.
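At runtime the extendable types accept a typo’d assignment without complaint, which is what makes it a hidden bug. A sketch using the article’s names, with the slotted class for contrast:

```python
from dataclasses import dataclass


@dataclass
class ScoreGameDC:
    score: float
    game: str


class ScoreGameSlots:
    __slots__ = ("score", "game")

    def __init__(self, score: float, game: str) -> None:
        self.score = score
        self.game = game


my_dictionary = {"score": 3, "game": "piquet"}
my_dataclass = ScoreGameDC(3, "piquet")
my_slots = ScoreGameSlots(3, "piquet")

# Typo'd assignments on the extendable types succeed silently...
my_dictionary["scoer"] = 4
my_dataclass.scoer = 4
print(my_dictionary["score"], my_dataclass.score)  # 3 3: the real field never changed

# ...while the slotted object refuses the unknown name at runtime.
try:
    my_slots.scoer = 4
except AttributeError as err:
    print("slots:", err)
```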

refactoring

You cannot refactor dictionary keys. Your strings might feel like identifiers, but they are not; they’re just hashable object instances that happen to be human readable.

attribute access

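Not much difference here. Namedtuple is a little slower because it’s a tuple underneath: each field name is a read-only descriptor that indexes into the tuple. Dictionary and dataclass values live in dictionaries, and a slotted object stores its values in fixed slots. If you had 500 attributes per struct, it might make more of a difference, but I don’t envision anyone explicitly naming 500 attributes.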

These speed differences are fractions of a second over 1,000,000 iterations.
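To measure attribute access on your own interpreter, a harness along these lines may help. It is a sketch: the function name and the choice to time only reads of score are assumptions, not the setup behind the article’s numbers, and results will vary by Python version and machine.

```python
import timeit
from dataclasses import dataclass
from typing import Dict, NamedTuple


class ScoreGameNT(NamedTuple):
    score: float
    game: str


class ScoreGameSlots:
    __slots__ = ("score", "game")

    def __init__(self, score: float, game: str) -> None:
        self.score = score
        self.game = game


@dataclass
class ScoreGameDC:
    score: float
    game: str


def time_attribute_access(number: int = 1_000_000) -> Dict[str, float]:
    """Hypothetical harness: seconds for `number` reads of score per struct option."""
    cases = {
        "dictionary": ({"score": 3, "game": "piquet"}, "s['score']"),
        "namedtuple": (ScoreGameNT(3, "piquet"), "s.score"),
        "slots": (ScoreGameSlots(3, "piquet"), "s.score"),
        "dataclass": (ScoreGameDC(3, "piquet"), "s.score"),
    }
    return {
        name: timeit.timeit(stmt, globals={"s": obj}, number=number)
        for name, (obj, stmt) in cases.items()
    }


if __name__ == "__main__":
    for name, seconds in time_attribute_access().items():
        print(f"{name:>10}: {seconds:.3f} s")
```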

Best of the Python Struct Options?

Named Tuple: Best when you’ll be unpacking and exploding more than creating and accessing, and when your data doesn’t have a more natural mapping. Definitely best when you’re using a named tuple mostly for documentation / standardization purposes and would allow callers to substitute a standard tuple most of the time.

Dictionary: Best when both the values and keys are part of your data (e.g., mapping rows to column headers, employees to positions, etc.).

Slots: A slotted object should use less memory than a non-slotted dataclass; that’s its real purpose. As a data container, a slotted object is arguably best when you want to add values over time (safe assignment without __setattr__ hacks and no default values or explicit exclusions required in __init__).

Dataclass: Best (and fastest) for creating and accessing named data fields with linting, autocomplete, and refactoring. I don’t cover it here, but dataclass has a lot of flexibility with the dataclasses.field object.
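For the “add values over time” case in the Slots entry above, a sketch. RoundTally, RoundTallyDC, and the winner attribute are hypothetical; the dataclass half uses dataclasses.field to supply the default a dataclass would need for a value that arrives later.

```python
from dataclasses import dataclass, field
from typing import Optional


class RoundTally:
    """Hypothetical slotted aggregator: winner is registered but filled in later."""

    __slots__ = ("score", "game", "winner")

    def __init__(self, score: float, game: str) -> None:
        self.score = score
        self.game = game  # no default needed for winner


@dataclass
class RoundTallyDC:
    score: float
    game: str
    # A dataclass needs a default here (or field(init=False)) so it can be
    # constructed before winner is known.
    winner: Optional[str] = field(default=None)


tally = RoundTally(3, "piquet")
print(hasattr(tally, "winner"))  # False: slot declared but not yet set
tally.winner = "Ann"             # safe: winner is registered in __slots__
print(tally.winner)              # Ann

tally_dc = RoundTallyDC(3, "piquet")
print(tally_dc.winner)           # None until assigned
```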