Learner's Notes: A Descriptive Dive into Python Descriptors
Introduction
As Davide Mastromatteo explains in his article on Descriptors, “Descriptors are a specific Python feature that power a lot of the magic hidden under the language’s hood.” Before diving into the details, let’s start with a small, intriguing example:
class C:
    attr: int = 123

    def func(self):
        print("I am a method")

c = C()
print(C.func, c.func, C.attr, c.attr, sep='\n')
Running this code gives the following output:
<function C.func at 0x100e71580>
<bound method C.func of <__main__.C object at 0x100eea570>>
123
123
Notice something interesting? While C.func and c.func return different objects, C.attr and c.attr behave the same way. Why the difference? This is where descriptors come into play, shedding light on the underlying mechanism.
What is a descriptor?
A descriptor is any object that defines the methods __get__(), __set__(), or __delete__(). More specifically, here are their method signatures:
obj.__get__(self, instance, owner=None)
obj.__set__(self, instance, value)
obj.__delete__(self, instance)
Descriptors only show their special behavior when they are assigned as class attributes. Let’s take a look at what that means in practice:
class ExampleDescriptor:
    def __get__(self, instance, owner=None):
        return f"Calling ExampleDescriptor's __get__({self}, {instance}, {owner})"

    def __set__(self, instance, value):
        print(f"Calling ExampleDescriptor's __set__({self}, {instance}, {value})")

class ExampleClass:
    cnt = 1
    ed = ExampleDescriptor()

example_obj = ExampleClass()
print(example_obj.cnt, example_obj.ed, sep='\n')
Notice in our output that when we access ed, instead of getting the ExampleDescriptor instance, Python actually calls the __get__() method.
1
Calling ExampleDescriptor's __get__(<__main__.ExampleDescriptor object at 0x105116ba0>, <__main__.ExampleClass object at 0x105116cc0>, <class '__main__.ExampleClass'>)
Indeed, as the astute among you might have guessed, assigning values to ed also calls the __set__() method. Now that we know roughly what a descriptor is and how it behaves, let’s get into more details about the methods.
__get__() and __set__()
We won’t dive into __delete__() since it’s not as commonly used as the other two methods. Instead, let’s focus on __get__(), which plays a crucial role in how descriptors work. Here’s how Python’s official documentation describes it:
Called to get the attribute of the owner class (class attribute access) or of an instance of that class (instance attribute access). The optional owner argument is the owner class, while instance is the instance that the attribute was accessed through, or None when the attribute is accessed through the owner.
If that feels like a lot to digest, let’s break it down with an example from earlier:
class ExampleDescriptor:
    def __get__(self, instance, owner=None):
        return f"Calling ExampleDescriptor's __get__({self}, {instance}, {owner})"

    def __set__(self, instance, value):
        print(f"Calling ExampleDescriptor's __set__({self}, {instance}, {value})")

class ExampleClass:
    cnt = 1
    ed = ExampleDescriptor()

example_obj = ExampleClass()
print(example_obj.cnt, example_obj.ed, sep='\n')
Here’s how it works:
- self refers to the descriptor instance itself, which is assigned to the class attribute ed.
- instance is the instance of the owner class, i.e. example_obj.
- owner is the owner class, i.e. ExampleClass.
All of this is clearly visible in the output. When you access example_obj.ed, it triggers the __get__() method, and you can see the values passed for self, instance, and owner.
Let’s contrast this with __set__().
Called to set the attribute on an instance instance of the owner class to a new value, value.
If we run the following line:
example_obj.ed = 100
We get the following output:
Calling ExampleDescriptor's __set__(<__main__.ExampleDescriptor object at 0x102d6ac30>, <__main__.ExampleClass object at 0x102d6ad50>, 100)
Here, the new parameter is value, which holds the new value being assigned.
It’s also worth noting that you can call __get__() by accessing the descriptor through the class itself. In that case, running ExampleClass.ed would yield:
Calling ExampleDescriptor's __get__(<__main__.ExampleDescriptor object at 0x100b6eb40>, None, <class '__main__.ExampleClass'>)
The key difference here is that instance is None because we’re accessing the descriptor through the class, not through an instance. In that case, it is common practice to return the descriptor instance itself. Importantly, assigning a value to the attribute through the class (instead of through an instance) does not trigger the __set__() method; it simply rebinds the class attribute and replaces the descriptor. As a result, instance is never None during a __set__() call (unless __set__() is called explicitly by the user).
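To make the last point concrete, here is a minimal sketch. The Recorder and Holder classes are hypothetical names introduced only for this illustration; they are not part of the earlier examples. Assigning through an instance is routed to __set__(), while assigning through the class just rebinds the name.

```python
class Recorder:
    """Hypothetical data descriptor that records every __set__() call."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access: hand back the descriptor itself
        return "from __get__"

    def __set__(self, instance, value):
        self.calls.append((instance, value))


class Holder:
    attr = Recorder()


recorder = Holder.__dict__["attr"]  # keep a reference to the descriptor
h = Holder()

h.attr = 1       # instance assignment: routed to Recorder.__set__()
Holder.attr = 2  # class assignment: rebinds the name, __set__() is not called

print(len(recorder.calls))      # 1
print(Holder.__dict__["attr"])  # 2 (the descriptor has been replaced)
print(h.attr)                   # 2 (now an ordinary class variable)
```

After the class-level assignment the descriptor is gone from Holder, so later instance assignments would behave like ordinary attribute writes.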
While there’s still more to unpack regarding __get__() and __set__(), we’ll save that for later discussions.
Are Methods Descriptors?
Looping back to our initial example, let’s dig into whether there’s anything special about func.
class C:
    def func(self):
        print("I am a method")

c = C()
print(C.func.__get__(c) == c.func)
Surprisingly, they are equal! This shows that C.func is indeed a descriptor, and by calling __get__() manually, we can achieve the same result as simply accessing c.func.
The following examples are inspired by the Descriptor Guide from the Python documentation.
It turns out that methods can be created manually using types.MethodType, which is roughly equivalent to this implementation:
class MethodType:
    "Emulate PyMethod_Type in Objects/classobject.c"

    def __init__(self, func, obj):
        self.__func__ = func
        self.__self__ = obj

    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        return func(obj, *args, **kwargs)
To support the automatic creation of methods, functions include the __get__() method, which handles binding methods during attribute access. Here’s how it works under the hood:
class Function:
    ...

    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return MethodType(self, obj)
With this, we now know how methods work. Function objects are descriptors that return a MethodType when accessed via an instance (obj is not None). This MethodType is callable and automatically inserts the instance (obj) as the first argument, followed by the remaining arguments. Essentially, if f is the Function instance that creates the MethodType, then calling it results in f(obj, *args, **kwargs).
Armed with this knowledge, we can even do something a little unconventional:
def f(self):
    print(f"This is not a method, but I still printed {self}")

class A:
    def __repr__(self):
        return "object of type A"

a = A()
f.__get__(a)()
In this case, we’re manually using __get__() to bind a non-method function f to the instance a, and it behaves just like a method!
When a method is defined in a class, it is stored as a plain function. Functions are non-data descriptors that return bound methods when accessed through an instance.
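The same relationship can be checked against the real types module from the standard library. This is only a small sketch; the class C below mirrors the earlier example but returns a string instead of printing, which is a change made here for illustration.

```python
import types


class C:
    def func(self):
        return "I am a method"


c = C()

# In the class namespace, func is stored as a plain function object.
raw = C.__dict__["func"]
print(type(raw) is types.FunctionType)  # True

# Access through the instance binds it via the function's __get__().
bound = c.func
print(type(bound) is types.MethodType)             # True
print(bound.__func__ is raw, bound.__self__ is c)  # True True

# Building the bound method by hand gives an equal object.
manual = types.MethodType(raw, c)
print(manual == bound)  # True
print(manual())         # I am a method
```

Reading from C.__dict__ matters here: it bypasses attribute lookup, so we see the stored function rather than the result of its __get__() call.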
Data Descriptors and Non-Data Descriptors
Back to __get__() and __set__(). If an object defines __set__() or __delete__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors. But why make this distinction?
Let’s take a look at the following example:
class DataDescriptor:
    def __get__(self, inst, owner=None):
        return "Calling the __get__() of DataDescriptor"

    def __set__(self, inst, val):
        print("Calling the __set__() of DataDescriptor")

class NonDataDescriptor:
    def __get__(self, inst, owner=None):
        return "Calling the __get__() of NonDataDescriptor"

class Test:
    dd = DataDescriptor()
    ndd = NonDataDescriptor()

t = Test()
print(vars(t))
print(t.dd, t.ndd, sep=' | ')
t.__dict__['dd'] = t.__dict__['ndd'] = 100
print(t.dd, t.ndd, sep=' | ')
Here’s the output:
{}
Calling the __get__() of DataDescriptor | Calling the __get__() of NonDataDescriptor
Calling the __get__() of DataDescriptor | 100
What’s happening here?
This is due to how instance attribute lookup works. When Python looks for an attribute, it follows a specific order:
- Data descriptors (like dd) get the highest priority.
- Instance variables come next.
- Non-data descriptors (like ndd).
- Class variables.
- If none of these work, it falls back to __getattr__() if defined.
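The lookup order above can be sketched in plain Python. The version below is loosely adapted from the emulation in the Descriptor Guide and is deliberately simplified: the names find_in_mro, lookup, Data, NonData, and Demo are introduced here for illustration, the __getattr__() fallback is only marked with a comment, and objects without a __dict__ (for example, classes using __slots__) are assumed away.

```python
_MISSING = object()


def find_in_mro(cls, name):
    """Return the raw class attribute found along the MRO, or _MISSING."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def lookup(obj, name):
    """Simplified stand-in for the default instance attribute lookup."""
    cls = type(obj)
    cls_attr = find_in_mro(cls, name)
    attr_type = type(cls_attr)
    getter = getattr(attr_type, "__get__", None)
    is_data = hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")

    if getter is not None and is_data:
        return getter(cls_attr, obj, cls)  # 1. data descriptor
    if name in vars(obj):
        return vars(obj)[name]             # 2. instance variable
    if getter is not None:
        return getter(cls_attr, obj, cls)  # 3. non-data descriptor
    if cls_attr is not _MISSING:
        return cls_attr                    # 4. class variable
    raise AttributeError(name)             # 5. __getattr__() would be tried here


class Data:
    def __get__(self, inst, owner=None):
        return "data descriptor"

    def __set__(self, inst, value):
        pass


class NonData:
    def __get__(self, inst, owner=None):
        return "non-data descriptor"


class Demo:
    dd = Data()
    ndd = NonData()
    plain = "class variable"


d = Demo()
d.__dict__.update(dd=100, ndd=100)
for name in ("dd", "ndd", "plain"):
    print(name, "->", lookup(d, name))
# dd -> data descriptor
# ndd -> 100
# plain -> class variable
```

For these three names, lookup() agrees with the built-in getattr(), which is a quick sanity check that the priorities are in the right order.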
In the Test example above, even though we assigned 100 to both dd and ndd in the instance’s __dict__, the data descriptor dd still has higher priority. However, the non-data descriptor ndd allows the instance variable to take precedence, so we see 100 when accessing t.ndd after the assignment.
This also explains why we can shadow methods with instance variables:
class Test:
    def func(self):
        return

t = Test()
descriptor_inst = Test.func
print(f"Does __get__() exist? {callable(getattr(descriptor_inst.__class__, '__get__', None))}")
print(f"Does __set__() exist? {callable(getattr(descriptor_inst.__class__, '__set__', None))}")

print(vars(t))
print(f"Before instance variable func is defined: {t.func}")
t.__dict__['func'] = 100
print(f"After instance variable func is defined: {t.func}")
del t.func
print(f"After instance variable func is deleted: {t.func}")
Here’s the output:
Does __get__() exist? True
Does __set__() exist? False
{}
Before instance variable func is defined: <bound method Test.func of <__main__.Test object at 0x104a6ebd0>>
After instance variable func is defined: 100
After instance variable func is deleted: <bound method Test.func of <__main__.Test object at 0x104a6ebd0>>
Notice a few things here:
- Test.func has a __get__() method, but not __set__(), making it a non-data descriptor.
- Before we define an instance variable func, accessing t.func returns the bound method (via the descriptor).
- After assigning 100 to t.func, the instance variable takes precedence and shadows the non-data descriptor.
- Once we delete the instance variable func, the original method is accessible again, thanks to the __get__() method of the descriptor.
This priority system explains why functions, being non-data descriptors, allow instance variables to temporarily override them.
@property
You might be thinking, “Wait, doesn’t @property already modify behavior for attribute access?” And you’d be right! But here’s the kicker: property itself is a descriptor.
Let’s take a look:
class Test:
    @property
    def property_test(self):
        return "This is from property_test"

t = Test()
print(f"Does __get__() exist? {callable(getattr(Test.property_test.__class__, '__get__', None))}")
print(f"Does __set__() exist? {callable(getattr(Test.property_test.__class__, '__set__', None))}")
print(f"Does __delete__() exist? {callable(getattr(Test.property_test.__class__, '__delete__', None))}")

t.__dict__["property_test"] = 100
print(t.property_test)
The output is shown below:
Does __get__() exist? True
Does __set__() exist? True
Does __delete__() exist? True
This is from property_test
Notice a few key things:
- __get__(), __set__() and __delete__() are all defined for the property class, making it a data descriptor.
- Even after we set an attribute in t.__dict__, accessing t.property_test still triggers the __get__() method of the descriptor.
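Instead of probing for __set__() by hand, the standard library’s inspect.isdatadescriptor() can make the same classification. The sketch below uses a hypothetical Sample class and reads the raw objects from the class __dict__ so that no __get__() call gets in the way.

```python
import inspect


class Sample:
    def method(self):
        return None

    @property
    def prop(self):
        return None


# Look in the class __dict__ to inspect the stored objects themselves,
# not the results of their __get__() calls.
raw_method = Sample.__dict__["method"]
raw_prop = Sample.__dict__["prop"]

print(inspect.isdatadescriptor(raw_method))  # False: functions are non-data descriptors
print(inspect.isdatadescriptor(raw_prop))    # True: property defines __set__() and __delete__()
print(hasattr(type(raw_method), "__get__"))  # True: it is still a descriptor
```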
We can further understand this behavior by looking at a simplified Python equivalent of the built-in property:
class Property:
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc
        self._name = ''

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError(
                f'property {self._name!r} of {type(obj).__name__!r} object has no getter'
            )
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError(
                f'property {self._name!r} of {type(obj).__name__!r} object has no setter'
            )
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError(
                f'property {self._name!r} of {type(obj).__name__!r} object has no deleter'
            )
        self.fdel(obj)

    def getter(self, fget):
        prop = type(self)(fget, self.fset, self.fdel, self.__doc__)
        prop._name = self._name
        return prop

    def setter(self, fset):
        prop = type(self)(self.fget, fset, self.fdel, self.__doc__)
        prop._name = self._name
        return prop

    def deleter(self, fdel):
        prop = type(self)(self.fget, self.fset, fdel, self.__doc__)
        prop._name = self._name
        return prop
This Property class behaves like the built-in property descriptor. It stores a getter (fget), setter (fset), and deleter (fdel). On each access, it checks whether the relevant function exists: if so, it invokes it with the instance as the argument; if not, it raises an AttributeError.
This makes the @property decorator both flexible and powerful, as it essentially wraps functions into a data descriptor that controls attribute access.
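As a small illustration of that wrapping, property can also be called directly instead of being used as a decorator. The Circle class below is a hypothetical example; because it supplies only a getter, assignment still goes through the descriptor’s __set__() and fails there rather than writing to the instance.

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius

    def _get_radius(self):
        return self._radius

    # Same effect as decorating a method named `radius` with @property.
    radius = property(_get_radius, doc="Read-only radius.")


c = Circle(2)
print(c.radius)  # 2

try:
    c.radius = 5  # property.__set__() runs, finds no setter, and raises
except AttributeError as exc:
    print("AttributeError:", exc)

print("radius" in vars(c))  # False: nothing was written to the instance
```

The exact wording of the error message differs between Python versions, so the example prints it rather than relying on it.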
Introduction of __set_name__()
In the Property class above, there’s an interesting method, __set_name__(). This feature was added in Python 3.6 to help descriptors manage attribute names within their owner class. The method is automatically called when the class is created, and it has the following signature:
obj.__set_name__(self, owner, name)
Here:
- owner refers to the class that owns the descriptor (i.e., the class where the descriptor is defined as an attribute).
- name is the name of the attribute within that class that holds the descriptor.
This method is particularly useful for situations where the descriptor needs to know the name of the attribute it’s bound to, which can be helpful when generating error messages or logging. You’ll often see it used in descriptors that handle multiple attributes dynamically or when customizing behavior based on the attribute name.
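Here is a minimal sketch of the error-message use case. Positive and Rectangle are hypothetical names made up for this illustration; the descriptor learns its attribute name through __set_name__() and uses it both for storage and for the message it raises.

```python
class Positive:
    """Hypothetical validating descriptor that learns its name via __set_name__()."""

    def __set_name__(self, owner, name):
        self.public_name = name
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.public_name} must be positive, got {value!r}")
        setattr(instance, self.storage_name, value)


class Rectangle:
    width = Positive()
    height = Positive()

    def __init__(self, width, height):
        self.width = width    # goes through Positive.__set__()
        self.height = height


r = Rectangle(3, 4)
print(Rectangle.width.public_name, Rectangle.height.public_name)  # width height

try:
    r.height = -1
except ValueError as exc:
    print(exc)  # height must be positive, got -1
```

One descriptor class serves both attributes, and each instance of it reports the right attribute name without being told explicitly.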
Use Cases
To demonstrate how descriptors can simplify repetitive tasks, let’s consider a scenario where we want to log access to attributes. Typically, with Python’s @property decorator, we might start by writing something like this:
class Test:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        print("[INFO] Accessing x through getter")
        return self._x

    @x.setter
    def x(self, value):
        print("[INFO] Accessing x through setter")
        self._x = value
In this example, every time we access or modify x, a message is printed. This works great for one or two attributes, but what happens if we add more attributes like y and z? We’d have to repeat the same property logic for each of them:
class Test:
    def __init__(self):
        self._x = None
        self._y = None
        self._z = None

    @property
    def x(self):
        print("[INFO] Accessing x through getter")
        return self._x

    @x.setter
    def x(self, value):
        print("[INFO] Accessing x through setter")
        self._x = value

    @property
    def y(self):
        print("[INFO] Accessing y through getter")
        return self._y

    @y.setter
    def y(self, value):
        print("[INFO] Accessing y through setter")
        self._y = value

    @property
    def z(self):
        print("[INFO] Accessing z through getter")
        return self._z

    @z.setter
    def z(self, value):
        print("[INFO] Accessing z through setter")
        self._z = value
It works, but there’s a lot of repetition here, which violates the DRY (Don’t Repeat Yourself) principle. Maintaining this approach becomes cumbersome, especially as the number of attributes grows.
This is where descriptors come to the rescue. With descriptors, we can abstract the repetitive behavior into a reusable component. Instead of writing properties for each attribute, we can define a descriptor that handles the logging for us. Let’s refactor the example to use a custom descriptor:
class PrivateLogger:
    def __init__(self, name):
        self._name = name
        self._actual_name = "_" + name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        print(f"[INFO] Accessing {self._name} through getter")
        return getattr(instance, self._actual_name)

    def __set__(self, instance, value):
        print(f"[INFO] Accessing {self._name} through setter")
        setattr(instance, self._actual_name, value)
Now, we can easily apply this descriptor to multiple attributes without repeating ourselves:
class Test:
    x = PrivateLogger('x')
    y = PrivateLogger('y')
    z = PrivateLogger('z')

    def __init__(self):
        self._x = None
        self._y = None
        self._z = None
While the above solution is better, manually passing the attribute name (x, y, z) to the PrivateLogger still leaves room for error. A typo in the name would cause unexpected behavior. Fortunately, we can improve our PrivateLogger by utilizing __set_name__():
class PrivateLogger:
    def __init__(self):
        self._name = ""
        self._actual_name = ""

    def __set_name__(self, owner, name):
        self._name = name
        self._actual_name = "_" + name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        print(f"[INFO] Accessing {self._name} through getter")
        return getattr(instance, self._actual_name)

    def __set__(self, instance, value):
        print(f"[INFO] Accessing {self._name} through setter")
        setattr(instance, self._actual_name, value)
With this new version, we no longer need to pass the attribute names manually. The descriptor takes care of everything:
class Test:
    x = PrivateLogger()
    y = PrivateLogger()
    z = PrivateLogger()

    def __init__(self):
        self._x = None
        self._y = None
        self._z = None
Now, the code is cleaner, and we’ve successfully reduced the redundancy!
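To see the result in action, here is a short usage sketch. It repeats the __set_name__()-based PrivateLogger so that the snippet runs on its own, and trims Test down to two attributes for brevity.

```python
class PrivateLogger:
    def __init__(self):
        self._name = ""
        self._actual_name = ""

    def __set_name__(self, owner, name):
        self._name = name
        self._actual_name = "_" + name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        print(f"[INFO] Accessing {self._name} through getter")
        return getattr(instance, self._actual_name)

    def __set__(self, instance, value):
        print(f"[INFO] Accessing {self._name} through setter")
        setattr(instance, self._actual_name, value)


class Test:
    x = PrivateLogger()
    y = PrivateLogger()

    def __init__(self):
        self._x = None
        self._y = None


t = Test()
t.x = 10          # prints: [INFO] Accessing x through setter
t.y = 20          # prints: [INFO] Accessing y through setter
print(t.x + t.y)  # prints both getter messages, then 30
print(vars(t))    # {'_x': 10, '_y': 20}
```

Note that the values live under the underscore-prefixed names in the instance __dict__, while the public names x and y exist only on the class as descriptors.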
Conclusion
I hope this article has shed light on how powerful and versatile descriptors can be in Python. From simplifying repetitive tasks to revealing some of the language’s hidden magic, descriptors offer a lot of flexibility for managing attribute access. By understanding how they work, you’ll be better equipped to harness their potential in your future Python projects.
Till next time!