The Data in Data Class
Python has well-known data holders. If you need a sequence, you can use a list. If you need an immutable sequence, use a tuple. If you need a homogeneous array, use the array module. If you need a hash table, a dictionary is likely what you want.
However, sometimes we need more.
More control, more flexibility, more objects that perfectly fit into the domains of problems we are to solve.
Named Tuples for the Win (or Almost)
Named tuples offer a friendly and clean way to define data holders that behave like objects from a class we defined.
I'm going to use a
Payment instance to illustrate the next examples.
from collections import namedtuple Payment = namedtuple("Payment", ["id_", "amount", "method"]) payment = Payment(1, amount=123, method="CC") print(payment) >>> Payment(id_=1, amount=123, method='CC')
If you like typing, you can also use the typed version of the named tuple:
from typing import NamedTuple class Payment(NamedTuple): id_: int amount: int method: str payment = Payment(id_=2, amount=1234, method="ACH") >>> print(payment) Payment(id_=2, value=123, method='CC') >>> payment.id_, payment.amount, payment.method (2, 1234, 'ACH')
That seems to solve the data holder quest quickly, right? Simple API, available in the standard library, no class boilerplate.
However, it is essential to remember that Payment, our NamedTuple, is still a tuple behind the scenes.
And why is that important? Because every instance of Payment, for good or bad, will behave just like a tuple.
Named Tuples Are Still Tuples
You can call
len() on your named tuple instance:
>>> len(payment) 3
You can unpack it:
>>> id_, amount, method = payment >>> id_, amount, method (2, 1234, 'ACH')
Named tuples are immutable:
>>> payment.amount = 345 Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: can't set attribute
This one can be very counterintuitive: -If you compare the instance created by the
Payment named tuple with a tuple having the same values, the comparison will be true:
>>> payment = Payment(id_=2, amount=1234, method="ACH") >>> payment == (2, 1234, "ACH") True
Data Classes as Data Holders
If named tuples are almost what you want, but you are still missing some flexibility, data classes might be a good fit.
We can smoothly go from a typed named tuple to a data class. Remove the
namedtuple inheritance and decorate the class with
from dataclasses import dataclass @dataclass class Payment: id_: int amount: int method: str
The Payment instance will work almost like the namedtuple for what a data holder is concerned:
>>> payment Payment(id_=2, amount=1234, method='ACH') >>> payment.id_, payment.amount, payment.method (2, 1234, 'ACH')
However, the instance is now mutable, has no
len(), and can no longer compare with a tuple:
>>> assert payment == (2, 1234, "ACH") Traceback (most recent call last): File "<stdin>", line 1, in <module> AssertionError
Field by Field Implementation
If we aim for more control and flexibility, the default fields in a data class may not be enough. That is why each field can be implemented in a more granular way using
In the example below, I'm defining quite a few setup options in a few lines:
id_is an auto-generated UUID field that won't be used for comparison with another Payment instance and won't display its value in the object's
Decimalfield initializing as Zero Decimal if no value is passed.
CCon every object.
from dataclasses import dataclass, field from decimal import Decimal from uuid import uuid4 @dataclass(frozen=True) class Payment: id_: int = field(repr=False, compare=False, default_factory=uuid4) amount: Decimal = field(default_factory=Decimal) method: str = field(default='CC') payment = Payment() >>> payment Payment(amount=Decimal('0'), method='CC') >>> payment.id_ UUID('05d37e88-d069-484a-b454-42c2ca3594fc')
Why Not a Regular Class?
Still, this looks like something we could do with a regular class, right? Yes, we could. And data classes are indeed regular classes.
What makes them shine, besides its configurability, is its code-saving approach.
The Class in Data Classes
Data classes save considerable amounts of boilerplate code. They ship with pre-defined dunder methods that we would likely have to write ourselves.
Data classes have built-in string representation for friendly printing, so you don't need to write
payment = Payment(id_=2, amount=1234, method="ACH") # Non-dataclass default printing: <__main__.Payment object at 0x10ae98d00> # Dataclasses buil-int string representation Payment(id_=2, amount=1234, method='ACH')
It is possible to initiate a data class with default values:
@dataclass class Payment: id_: int amount: int method: str = 'CC'
__post_init__() method allows injecting initialization logic in data classes. Useful for things like validation.
Comparable by Default
Data classes implement
__eq__() by default, so the comparison considers each value in the object like a tuple comparison:
payment_1 = Payment(id_=1, amount=1234, method="ACH") payment_2 = Payment(id_=1, amount=1234, method="ACH") >>> payment_1 == payment_2 >>> True
Making it sortable
Undoubtedly one of my favorite capabilities in data classes. So much code saved!
@dataclass(order=True) class Payment: id_: int amount: int method: str >>> payment_1 = Payment(id_=1, amount=1234, method="ACH") >>> payment_2 = Payment(id_=1, amount=3456, method="ACH") >>> payment_2 > payment_1 True >>> payment_1 > payment_2 False
Making it Immutable (or almost)
Python doesn't allow for genuinely immutable objects. However, you can create a frozen data class that emulates immutability pretty well.
@dataclass(frozen=True) class Payment: id_: int amount: int method: str payment = Payment(id_=1, amount=1234, method="ACH") >>> payment.amount = 2345 """ Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 4, in __setattr__ dataclasses.FrozenInstanceError: cannot assign to field 'amount' """ >>> del payment.method """ Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 4, in __delattr__ dataclasses.FrozenInstanceError: cannot delete field 'method' """
Can It Be Hashable?
The short answer is most likely. However, this topic can get quite convoluted, so I'll defer it to a separate article.
Did you find this article helpful? What are some of your use cases for data classes?