The Wayback Machine - https://web.archive.org/web/20250426155034/https://github.com/python/cpython/issues/108611

Processing of fields in dataclasses is inconsistent with MRO #108611

Open
@eltoder

Description

Bug report

Checklist

  • I am confident this is a bug in CPython, not a bug in a third-party project
  • I have searched the CPython issue tracker,
    and am confident this bug has not been reported before

CPython versions tested on:

3.9, 3.10, 3.11

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.11.4 (main, Jul 24 2023, 08:22:29) [GCC 8.3.0]

A clear and concise description of the bug:

Apologies if this was already reported; I could not find any GitHub issues describing the same problem:

from dataclasses import dataclass

@dataclass
class A:
    field: int = 10
    attr = 10

@dataclass
class B(A):
    pass

@dataclass
class C(A):
    field: int = 50
    attr = 50

@dataclass
class D(B, C):
    pass

d = D()
print(d.field, d.attr)
assert d.field == d.attr

In this example I expect both field and attr to be inherited from the same base class, so the default value of field should equal attr. However, this is not the case:

10 50
Traceback (most recent call last):
  File "/home/eltoder/dev/scratch/dataclass_mro.py", line 23, in <module>
    assert d.field == d.attr
           ^^^^^^^^^^^^^^^^^
AssertionError

The issue stems from the logic in _process_class:

for f in base_fields.values():

While it processes base classes in the correct order, for each base class it uses __dataclass_fields__, which includes both the fields introduced by this base as well as all inherited fields. These inherited fields cause the issue, because fields inherited from an earlier base can overwrite fields from a later base. In the example, B is processed after C but still carries the field it inherited from A (default 10), so it overwrites the field that C defined (default 50). A simple proof-of-concept fix for this is:

@@ -914,8 +914,10 @@ def _process_class(cls, init, repr, eq, order, unsafe_hash, frozen,
         base_fields = getattr(b, _FIELDS, None)
         if base_fields is not None:
             has_dataclass_bases = True
+            base_annotations = b.__dict__.get('__annotations__', {})
             for f in base_fields.values():
-                fields[f.name] = f
+                if f.name in base_annotations:
+                    fields[f.name] = f
             if getattr(b, _PARAMS).frozen:
                 any_frozen_base = True

This uses the base class's own annotations to pick only the fields added by the base itself. AFAICT, this works except for the handling of the KW_ONLY annotation.

A more proper solution can be taken from attrs: each field has the inherited property, indicating whether it was defined in this class or inherited from a base. Then in the loop above we skip all inherited fields. A variation on this is to instead have an owner property that is set to the class that defined the field. Then in the loop we only pick fields for which f.owner is b. This has the advantage that we don't need to copy fields (to set inherited=True), and it provides more information for introspection.

Metadata

Labels

  • stdlib: Python modules in the Lib dir
  • topic-dataclasses
  • type-bug: An unexpected behavior, bug, or error
