Commit 9dda8d1

- Update polymorphic queryset iterators to honor chunk_size.
- Add iteration tests.
- Update documentation with iteration performance considerations.
1 parent 9cb89d9 commit 9dda8d1

7 files changed

+263
-15
lines changed

docs/api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ API Documentation
     polymorphic.formsets
     polymorphic.managers
     polymorphic.models
+    polymorphic.query
     polymorphic.showfields
     polymorphic.templatetags
     polymorphic.utils

docs/api/polymorphic.query.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+polymorphic.query
+=================
+
+.. automodule:: polymorphic.query
+    :members:
+    :show-inheritance:
+

docs/changelog.rst

Lines changed: 2 additions & 1 deletion
@@ -1,9 +1,10 @@
 Changelog
 =========
 
-v4.2.0 (2025-12-01)
+v4.2.0 (2025-12-04)
 -------------------
 
+* Implemented `Defer to chunk_size parameter on .iterators for fetching get_real_instances() <https://github.com/jazzband/django-polymorphic/pull/672>`_
 * Fixed `Show full admin context (breadcrumb and logout nav) in model type selection admin form <https://github.com/jazzband/django-polymorphic/pull/580>`_
 * Fixed `Issue with Autocomplete Fields in StackedPolymorphicInline.Child Inline <https://github.com/jazzband/django-polymorphic/issues/546>`_
 * Support Python 3.14 and Django 6.0, drop support for EOL python 3.9, Django 3.2, 4.0, 4.1 and 5.0.

docs/performance.rst

Lines changed: 48 additions & 0 deletions
@@ -27,6 +27,54 @@ if all are class ``ModelA``. If 50 objects are ``ModelA`` and 50 are ``ModelB``,
 are executed. The pathological worst case is 101 db queries if result_objects contains 100 different
 object types (with all of them subclasses of ``ModelA``).
 
+Iteration: Memory vs DB Round Trips
+-----------------------------------
+
+When iterating over large QuerySets, there is a trade-off between memory consumption and number
+of round trips to the database. One additional query is needed per model subclass present in the
+QuerySet and these queries take the form of ``SELECT ... WHERE pk IN (....)`` with a potentially
+large number of IDs in the IN clause. All models in the IN clause will be loaded into memory during
+iteration.
+
+To balance this trade-off, by default a maximum of 2000 objects are requested at once. This means
+that if your QuerySet contains 10,000 objects of 3 different subclasses, up to 16 queries will be
+executed: 1 for the base objects, plus 15 more (5 chunks of 2000, times 3 subclasses).
+
+The ``chunk_size`` parameter on :meth:`~django.db.models.query.QuerySet.iterator` can be used to
+change the number of objects loaded into memory at once during iteration. For example, to load 5000 objects at once:
+
+.. code-block:: python
+
+    for obj in ModelA.objects.all().iterator(chunk_size=5000):
+        process(obj)
+
+.. note::
+
+    ``chunk_size`` on non-polymorphic QuerySets controls the number of rows fetched from the
+    database at once, but for polymorphic QuerySets the behavior is more analogous to its behavior
+    when :meth:`~django.db.models.query.QuerySet.prefetch_related` is used.
+
+Some database backends limit the number of parameters in a query. For those backends the
+``chunk_size`` will be restricted to be no greater than that limit. This limit can be checked in:
+
+.. code-block:: python
+
+    from django.db import connection
+
+    print(connection.features.max_query_params)
+
+
+You may change the global default fallback ``chunk_size`` by modifying the
+:attr:`polymorphic.query.Polymorphic_QuerySet_objects_per_request` attribute. Place code like
+this somewhere that will be executed during startup:
+
+.. code-block:: python
+
+    from polymorphic import query
+
+    query.Polymorphic_QuerySet_objects_per_request = 5000
+
+
 :class:`~django.contrib.contenttypes.models.ContentType` retrieval
 ------------------------------------------------------------------
 
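As a rough check of the arithmetic in the documentation added above, the worst-case query count can be sketched as follows. `max_polymorphic_queries` is a hypothetical helper, not part of django-polymorphic, and it assumes every chunk contains objects of every subclass, so the result is an upper bound rather than an exact count.

```python
import math


def max_polymorphic_queries(total_objects, subclass_count, chunk_size=2000):
    """Upper bound on queries for one pass over a polymorphic QuerySet.

    Hypothetical helper for illustration only. Assumes one base query plus,
    for every chunk, one query per subclass present in that chunk.
    """
    chunks = math.ceil(total_objects / chunk_size)
    return 1 + chunks * subclass_count


# The example from the documentation: 10,000 objects, 3 subclasses.
print(max_polymorphic_queries(10_000, 3))        # default chunk of 2000
print(max_polymorphic_queries(10_000, 3, 5000))  # larger chunks, fewer queries
```

A larger chunk trades fewer round trips for more objects held in memory at once, which is the trade-off the section describes.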

justfile

Lines changed: 2 additions & 2 deletions
@@ -182,11 +182,11 @@ test-lock +PACKAGES: _lock-python
 test *TESTS:
     @just run pytest --cov-append {{ TESTS }}
 
-test-db DB_CLIENT="dev":
+test-db DB_CLIENT="dev" *TESTS:
     # No Optional Dependency Unit Tests
     # todo clean this up, rerunning a lot of tests
     uv sync --group {{ DB_CLIENT }}
-    @just run pytest --cov-append
+    @just run pytest --cov-append {{ TESTS }}
 
 # run the pre-commit checks
 precommit:

src/polymorphic/query.py

Lines changed: 25 additions & 11 deletions
@@ -7,6 +7,7 @@
 
 from django.contrib.contenttypes.models import ContentType
 from django.core.exceptions import FieldDoesNotExist
+from django.db import connections
 from django.db.models import FilteredRelation
 from django.db.models.query import ModelIterable, Q, QuerySet
 
@@ -17,9 +18,11 @@
     translate_polymorphic_Q_object,
 )
 
-# chunk-size: maximum number of objects requested per db-request
-# by the polymorphic queryset.iterator() implementation
-Polymorphic_QuerySet_objects_per_request = 100
+Polymorphic_QuerySet_objects_per_request = 2000
+"""
+The maximum number of objects requested per db-request by the polymorphic
+queryset.iterator() implementation
+"""
 
 
 class PolymorphicModelIterable(ModelIterable):
@@ -44,26 +47,37 @@ def _polymorphic_iterator(self, base_iter):
             for o in real_results: yield o
 
         but it requests the objects in chunks from the database,
-        with Polymorphic_QuerySet_objects_per_request per chunk
+        with QuerySet.iterator(chunk_size) per chunk
         """
+
+        # some databases have a limit on the number of query parameters, we must
+        # respect this for generating get_real_instances queries because those
+        # queries do a large WHERE IN clause with primary keys
+        max_chunk = connections[self.queryset.db].features.max_query_params
+        sql_chunk = self.chunk_size if self.chunked_fetch else None
+        if max_chunk:
+            sql_chunk = (
+                max_chunk
+                if not self.chunked_fetch  # chunk_size was not provided
+                else min(max_chunk, self.chunk_size or max_chunk)
+            )
+
+        sql_chunk = sql_chunk or Polymorphic_QuerySet_objects_per_request
+
         while True:
             base_result_objects = []
             reached_end = False
 
-            # Make sure the base iterator is read in chunks instead of
-            # reading it completely, in case our caller read only a few objects.
-            for i in range(Polymorphic_QuerySet_objects_per_request):
+            # Fetch in chunks
+            for _ in range(sql_chunk):
                 try:
                     o = next(base_iter)
                     base_result_objects.append(o)
                 except StopIteration:
                     reached_end = True
                     break
 
-            real_results = self.queryset._get_real_instances(base_result_objects)
-
-            for o in real_results:
-                yield o
+            yield from self.queryset._get_real_instances(base_result_objects)
 
             if reached_end:
                 return
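The chunk-size selection added in this hunk can be restated as a pure function, which makes the individual cases easier to reason about. This is only a sketch: `resolve_sql_chunk` and its parameter names are hypothetical, with `max_query_params` standing in for `connection.features.max_query_params` and `default` for `Polymorphic_QuerySet_objects_per_request`.

```python
def resolve_sql_chunk(chunk_size, chunked_fetch, max_query_params, default=2000):
    """Mirror of the selection logic in the diff (illustrative, not library API).

    chunk_size       -- value passed to QuerySet.iterator(), may be None
    chunked_fetch    -- True when the iterable was created for chunked fetching
    max_query_params -- backend parameter limit, or None when unlimited
    default          -- module-level fallback chunk size
    """
    sql_chunk = chunk_size if chunked_fetch else None
    if max_query_params:
        sql_chunk = (
            max_query_params
            if not chunked_fetch
            else min(max_query_params, chunk_size or max_query_params)
        )
    return sql_chunk or default


# No backend limit: an explicit chunk_size wins, otherwise the default applies.
print(resolve_sql_chunk(5000, True, None))
print(resolve_sql_chunk(None, False, None))

# Backend limit of 999 parameters (an assumed value): the chunk is capped.
print(resolve_sql_chunk(5000, True, 999))
print(resolve_sql_chunk(500, True, 999))
```

Read this way, the code as shown uses the backend limit itself as the chunk size whenever a limit exists and `chunked_fetch` is false; the 2000 default only applies when no limit is reported.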

src/polymorphic/tests/test_orm.py

Lines changed: 178 additions & 1 deletion
@@ -3,10 +3,11 @@
 import uuid
 
 from django.contrib.contenttypes.models import ContentType
-from django.db import models
+from django.db import models, connection
 from django.db.models import Case, Count, FilteredRelation, Q, Sum, When, F
 from django.db.utils import IntegrityError, NotSupportedError
 from django.test import TransactionTestCase
+from django.test.utils import CaptureQueriesContext
 
 from polymorphic import query_translate
 from polymorphic.managers import PolymorphicManager
@@ -1234,3 +1235,179 @@ def test_refresh_from_db_fields(self):
     def test_non_polymorphic_parent(self):
         obj = NonPolymorphicParent.objects.create()
         assert obj.delete()
+
+    def test_iteration(self):
+        Model2A.objects.all().delete()
+
+        for i in range(250):
+            Model2B.objects.create(field1=f"B1-{i}", field2=f"B2-{i}")
+        for i in range(1000):
+            Model2C.objects.create(
+                field1=f"C1-{i + 250}", field2=f"C2-{i + 250}", field3=f"C3-{i + 250}"
+            )
+        for i in range(2000):
+            Model2D.objects.create(
+                field1=f"D1-{i + 1250}",
+                field2=f"D2-{i + 1250}",
+                field3=f"D3-{i + 1250}",
+                field4=f"D4-{i + 1250}",
+            )
+
+        with CaptureQueriesContext(connection) as base_all:
+            for _ in Model2A.objects.non_polymorphic().all():
+                pass  # Evaluating the queryset
+
+        len_base_all = len(base_all)
+        assert len_base_all == 1, (
+            f"Expected 1 query for iteration over 3250 base objects, encountered {len_base_all}"
+        )
+
+        with CaptureQueriesContext(connection) as base_iterator:
+            for _ in Model2A.objects.non_polymorphic().iterator():
+                pass  # Evaluating the queryset
+
+        len_base_iterator = len(base_iterator)
+        assert len_base_iterator == 1, (
+            f"Expected 1 query for chunked iteration over 3250 base objects, encountered {len_base_iterator}"
+        )
+
+        with CaptureQueriesContext(connection) as base_chunked:
+            for _ in Model2A.objects.non_polymorphic().iterator(chunk_size=1000):
+                pass  # Evaluating the queryset
+
+        len_base_chunked = len(base_chunked)
+        assert len_base_chunked == 1, (
+            f"Expected 1 query for chunked iteration over 3250 base objects, encountered {len_base_chunked}"
+        )
+
+        with CaptureQueriesContext(connection) as poly_all:
+            b, c, d = 0, 0, 0
+            for idx, obj in enumerate(reversed(list(Model2A.objects.order_by("-pk").all()))):
+                if isinstance(obj, Model2D):
+                    d += 1
+                    assert obj.field1 == f"D1-{idx}"
+                    assert obj.field2 == f"D2-{idx}"
+                    assert obj.field3 == f"D3-{idx}"
+                    assert obj.field4 == f"D4-{idx}"
+                elif isinstance(obj, Model2C):
+                    c += 1
+                    assert obj.field1 == f"C1-{idx}"
+                    assert obj.field2 == f"C2-{idx}"
+                    assert obj.field3 == f"C3-{idx}"
+                elif isinstance(obj, Model2B):
+                    b += 1
+                    assert obj.field1 == f"B1-{idx}"
+                    assert obj.field2 == f"B2-{idx}"
+                else:
+                    assert False, "Unexpected model type"
+            assert (b, c, d) == (250, 1000, 2000)
+
+        assert len(poly_all) <= 7, (
+            f"Expected <= 7 queries for chunked iteration over 3250 "
+            f"objects with 3 child models and the default chunk size of 2000, encountered "
+            f"{len(poly_all)}"
+        )
+
+        with CaptureQueriesContext(connection) as poly_all:
+            b, c, d = 0, 0, 0
+            for idx, obj in enumerate(Model2A.objects.order_by("pk").iterator(chunk_size=None)):
+                if isinstance(obj, Model2D):
+                    d += 1
+                    assert obj.field1 == f"D1-{idx}"
+                    assert obj.field2 == f"D2-{idx}"
+                    assert obj.field3 == f"D3-{idx}"
+                    assert obj.field4 == f"D4-{idx}"
+                elif isinstance(obj, Model2C):
+                    c += 1
+                    assert obj.field1 == f"C1-{idx}"
+                    assert obj.field2 == f"C2-{idx}"
+                    assert obj.field3 == f"C3-{idx}"
+                elif isinstance(obj, Model2B):
+                    b += 1
+                    assert obj.field1 == f"B1-{idx}"
+                    assert obj.field2 == f"B2-{idx}"
+                else:
+                    assert False, "Unexpected model type"
+            assert (b, c, d) == (250, 1000, 2000)
+
+        assert len(poly_all) <= 7, (
+            f"Expected <= 7 queries for chunked iteration over 3250 "
+            f"objects with 3 child models and a chunk size of 2000, encountered "
+            f"{len(poly_all)}"
+        )
+
+        with CaptureQueriesContext(connection) as poly_iterator:
+            b, c, d = 0, 0, 0
+            for idx, obj in enumerate(Model2A.objects.order_by("pk").iterator()):
+                if isinstance(obj, Model2D):
+                    d += 1
+                    assert obj.field1 == f"D1-{idx}"
+                    assert obj.field2 == f"D2-{idx}"
+                    assert obj.field3 == f"D3-{idx}"
+                    assert obj.field4 == f"D4-{idx}"
+                elif isinstance(obj, Model2C):
+                    c += 1
+                    assert obj.field1 == f"C1-{idx}"
+                    assert obj.field2 == f"C2-{idx}"
+                    assert obj.field3 == f"C3-{idx}"
+                elif isinstance(obj, Model2B):
+                    b += 1
+                    assert obj.field1 == f"B1-{idx}"
+                    assert obj.field2 == f"B2-{idx}"
+                else:
+                    assert False, "Unexpected model type"
+            assert (b, c, d) == (250, 1000, 2000)
+
+        assert len(poly_iterator) <= 7, (
+            f"Expected <= 7 queries for chunked iteration over 3250 "
+            f"objects with 3 child models and a default chunk size of 2000, encountered "
+            f"{len(poly_iterator)}"
+        )
+
+        with CaptureQueriesContext(connection) as poly_chunked:
+            b, c, d = 0, 0, 0
+            for idx, obj in enumerate(Model2A.objects.order_by("pk").iterator(chunk_size=4000)):
+                if isinstance(obj, Model2D):
+                    d += 1
+                    assert obj.field1 == f"D1-{idx}"
+                    assert obj.field2 == f"D2-{idx}"
+                    assert obj.field3 == f"D3-{idx}"
+                    assert obj.field4 == f"D4-{idx}"
+                elif isinstance(obj, Model2C):
+                    c += 1
+                    assert obj.field1 == f"C1-{idx}"
+                    assert obj.field2 == f"C2-{idx}"
+                    assert obj.field3 == f"C3-{idx}"
+                elif isinstance(obj, Model2B):
+                    b += 1
+                    assert obj.field1 == f"B1-{idx}"
+                    assert obj.field2 == f"B2-{idx}"
+                else:
+                    assert False, "Unexpected model type"
+            assert (b, c, d) == (250, 1000, 2000)
+
+        assert len(poly_chunked) <= 7, (
+            f"Expected <= 7 queries for chunked iteration over 3250 objects with 3 child "
+            f"models and a chunk size of 4000, encountered {len(poly_chunked)}"
+        )
+
+        if connection.vendor == "postgresql":
+            assert len(poly_chunked) == 4, "On postgres with a 4000 chunk size, expected 4 queries"
+
+        try:
+            result = Model2A.objects.all().delete()
+            assert result == (
+                11500,
+                {
+                    "tests.Model2D": 2000,
+                    "tests.Model2C": 3000,
+                    "tests.Model2A": 3250,
+                    "tests.Model2B": 3250,
+                },
+            )
+        except AttributeError:
+            if connection.vendor == "oracle":
+                # FIXME
+                # known deletion issue with oracle
+                # https://github.com/jazzband/django-polymorphic/issues/673
+                pass
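The query-count bounds asserted in this test can be sanity-checked without a database. The sketch below is a simulation, not library code: `simulate_query_count` is a hypothetical helper that assumes one base query plus one query per distinct subclass in each chunk (as the documentation in this commit describes), and that rows come back in creation order. The chunk size of 999 is an assumed backend parameter limit used purely for illustration.

```python
from itertools import islice


def simulate_query_count(type_sequence, chunk_size):
    """Count queries for one pass: 1 base query + 1 per distinct type per chunk.

    Hypothetical model of the chunked iterator; every type in the sequence is
    treated as a subclass that needs its own follow-up query.
    """
    rows = iter(type_sequence)
    queries = 1  # the query that fetches the base objects
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        queries += len(set(chunk))
    return queries


# Same layout as the test: 250 Model2B, then 1000 Model2C, then 2000 Model2D.
layout = ["B"] * 250 + ["C"] * 1000 + ["D"] * 2000

for size in (4000, 2000, 999):
    print(size, simulate_query_count(layout, size))
```

Under these assumptions a single 4000-object chunk gives 4 queries, matching the PostgreSQL assertion, while smaller chunks stay within the `<= 7` bound used by the other assertions.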
