docs/source/advanced/training_tricks.rst (1 addition, 2 deletions)
@@ -331,8 +331,7 @@ However, for in-memory datasets, that means that each process will hold a (redundant
 For example, when training Graph Neural Networks, a common strategy is to load the entire graph into CPU memory for fast access to the entire graph structure and its features, and to then perform neighbor sampling to obtain mini-batches that fit onto the GPU.

 A simple way to prevent redundant dataset replicas is to rely on :obj:`torch.multiprocessing` to share the `data automatically between spawned processes via shared memory <https://pytorch.org/docs/stable/notes/multiprocessing.html>`_.
-For this, all data pre-loading should be done on the main process inside :meth:`DataModule.__init__`. As a result, all tensor-data will get automatically shared when using the :class:`~pytorch_lightning.plugins.strategies.ddp_spawn.DDPSpawnStrategy`
-training type strategy:
+For this, all data pre-loading should be done on the main process inside :meth:`DataModule.__init__`. As a result, all tensor-data will get automatically shared when using the :class:`~pytorch_lightning.plugins.strategies.ddp_spawn.DDPSpawnStrategy` strategy.
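To make the hunk above concrete, here is a minimal sketch (not part of the diff) of a DataModule that pre-loads an in-memory dataset on the main process inside ``__init__``; the dataset shape and the ``ddp_spawn``/``gpu`` Trainer arguments are illustrative assumptions:

.. code-block:: python

    import torch
    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, TensorDataset


    class InMemoryDataModule(pl.LightningDataModule):
        def __init__(self, num_samples: int = 10_000, num_features: int = 32):
            super().__init__()
            # Pre-load everything on the main process. torch.multiprocessing can then
            # place these tensors in shared memory for the processes spawned by the
            # "ddp_spawn" strategy, instead of each rank building its own copy.
            features = torch.randn(num_samples, num_features)
            targets = torch.randint(0, 2, (num_samples,))
            self.dataset = TensorDataset(features, targets)

        def train_dataloader(self) -> DataLoader:
            return DataLoader(self.dataset, batch_size=64, shuffle=True)


    # trainer = pl.Trainer(strategy="ddp_spawn", accelerator="gpu", devices=2)
    # trainer.fit(model, datamodule=InMemoryDataModule())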
docs/source/common/checkpointing.rst (2 additions, 1 deletion)
@@ -315,6 +315,7 @@ and the Lightning Team will be happy to integrate/help integrate it.

 -----------

+.. _customize_checkpointing:

 ***********************
 Customize Checkpointing
@@ -392,7 +393,7 @@ Custom Checkpoint IO Plugin

 .. note::

-    Some ``TrainingTypePlugins`` like ``DeepSpeedStrategy`` do not support custom ``CheckpointIO`` as checkpointing logic is not modifiable.
+    Some strategies like :class:`~pytorch_lightning.strategies.deepspeed.DeepSpeedStrategy` do not support custom :class:`~pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO` as checkpointing logic is not modifiable.
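For context on the note above, a custom checkpoint plugin is built by subclassing ``CheckpointIO`` and passing an instance to the Trainer. The sketch below is only illustrative, assuming the ``save_checkpoint``/``load_checkpoint``/``remove_checkpoint`` hooks described in this file and delegating to plain ``torch.save``/``torch.load``:

.. code-block:: python

    import os
    from typing import Any, Dict, Optional

    import torch
    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import CheckpointIO


    class CustomCheckpointIO(CheckpointIO):
        def save_checkpoint(self, checkpoint: Dict[str, Any], path, storage_options: Optional[Any] = None) -> None:
            # Write the checkpoint dict wherever your infrastructure requires.
            torch.save(checkpoint, path)

        def load_checkpoint(self, path, storage_options: Optional[Any] = None) -> Dict[str, Any]:
            return torch.load(path)

        def remove_checkpoint(self, path) -> None:
            if os.path.exists(path):
                os.remove(path)


    # Note: strategies such as DeepSpeedStrategy do not support custom CheckpointIO plugins.
    trainer = Trainer(plugins=[CustomCheckpointIO()])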
A third file in the diff (file name not captured in this view) adds a ``CheckpointIO Plugins`` section and restyles the ``Cluster Environments`` heading:

 More information regarding precision with Lightning can be found :doc:`here <../advanced/precision>`
+
+-----------
+
+********************
+CheckpointIO Plugins
+********************
+
+As part of our commitment to extensibility, we have abstracted Lightning's checkpointing logic into the :class:`~pytorch_lightning.plugins.io.CheckpointIO` plugin.
+With this, you have the ability to customize the checkpointing logic to match the needs of your infrastructure.
+
+Below is a list of built-in plugins for checkpointing.
+
+.. currentmodule:: pytorch_lightning.plugins.io
+
+.. autosummary::
+    :nosignatures:
+    :template: classtemplate.rst
+
+    CheckpointIO
+    HPUCheckpointIO
+    TorchCheckpointIO
+    XLACheckpointIO
+
+You could learn more about custom checkpointing with Lightning :ref:`here <customize_checkpointing>`.
+
+-----------
+
+********************
 Cluster Environments
---------------------
+********************
+
+You can define the interface of your own cluster environment based on the requirements of your infrastructure.
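As a rough sketch of that last point (not part of the diff), a custom cluster environment subclasses ``ClusterEnvironment`` and reads whatever your scheduler exposes; the environment-variable names below are placeholders, and the interface shown (``main_address``/``main_port`` properties plus the rank and world-size methods) assumes the 1.6-era API:

.. code-block:: python

    import os

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins.environments import ClusterEnvironment


    class MyClusterEnvironment(ClusterEnvironment):
        @property
        def creates_processes_externally(self) -> bool:
            # True means the cluster, not Lightning, launches the worker processes.
            return True

        @staticmethod
        def detect() -> bool:
            # Placeholder: detect your scheduler however it advertises itself.
            return "MY_SCHEDULER_NODE_RANK" in os.environ

        @property
        def main_address(self) -> str:
            return os.environ["MY_SCHEDULER_MAIN_ADDRESS"]

        @property
        def main_port(self) -> int:
            return int(os.environ["MY_SCHEDULER_MAIN_PORT"])

        def world_size(self) -> int:
            return int(os.environ["MY_SCHEDULER_WORLD_SIZE"])

        def set_world_size(self, size: int) -> None:
            pass  # fixed by the scheduler, nothing to do

        def global_rank(self) -> int:
            return int(os.environ["MY_SCHEDULER_RANK"])

        def set_global_rank(self, rank: int) -> None:
            pass  # fixed by the scheduler, nothing to do

        def local_rank(self) -> int:
            return int(os.environ["MY_SCHEDULER_LOCAL_RANK"])

        def node_rank(self) -> int:
            return int(os.environ["MY_SCHEDULER_NODE_RANK"])


    trainer = Trainer(plugins=[MyClusterEnvironment()])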