Merged
1 change: 1 addition & 0 deletions content/WhatIsAnARG_module.py
@@ -124,6 +124,7 @@ def setup():
"<style type='text/css'>" +
".exercise {background-color: yellow; color: black; font-family: 'serif'; font-size: 1.2em}" +
".exercise code {font-size: 0.7em}" +
+ "p a > code {color: #0000EE !important; } p a:visited > code {color: #551A8B !important; }" +
"</style>" +
"<h4>✅ Your notebook is ready to go!</h4>" +
("This notebook is not running in JupyterLite: you may need to install tskit, tszip, etc."
16 changes: 8 additions & 8 deletions content/WhatIsAnARG_workbook1.ipynb
@@ -109,7 +109,7 @@
"\n",
"The [_tskit_arg_visualizer_](https://github.com/kitchensjn/tskit_arg_visualizer) software uses the [D3js library](https://d3js.org) to visualise ARGs and other tree sequences interactively, in a browser or Jupyter notebook. As is conventional, the oldest nodes are drawn at the top, with the youngest, usually at time 0, at the bottom.\n",
"\n",
- "It works by creating a new [`D3ARG`](https://github.com/kitchensjn/tskit_arg_visualizer/blob/main/docs/tutorial.md#what-is-a-d3arg) object from the _tskit_ ARG. This `D3ARG` object can then be plotted using `.draw()`.\n",
+ "The visualiser creates a [`D3ARG`](https://github.com/kitchensjn/tskit_arg_visualizer/blob/main/docs/tutorial.md#what-is-a-d3arg) object from the _tskit_ ARG. This object can then be plotted using `.draw()`.\n",
"\n",
"<div class=\"alert alert-block alert-info\"><b>Note:</b> You'll see that some nodes in this plot have two IDs. Don't worry about this: as we'll see later it's because the simulator has represented recombination using 2 nodes, which have been overlaid in the visualizer</div>\n",
"\n",
@@ -212,7 +212,7 @@
"source": [
"### Visualising local trees\n",
"\n",
- "By default, `tskit` displays each local tree as a summary table, as above. To draw the tree out, you can use the [`.draw_svg()`](https://tskit.dev/tutorials/viz.html#svg-format) method, suitable for small trees of tens or hundreds of nodes each."
+ "By default, `tskit` displays each local tree as a summary table, as above. To draw the tree, you can use the [`.draw_svg()`](https://tskit.dev/tutorials/viz.html#svg-format) method, suitable for small trees of tens or hundreds of nodes each."
]
},
{
@@ -294,7 +294,7 @@
"source": [
"## Coalescent and non-coalescent regions\n",
"\n",
- "Looking at the tree-by-tree plot, it should be clear that some of the nodes in a local tree have one child in some trees, and two children in others. There are even some nodes that have only one child in every tree in which they appear (e.g. node 26). We can classify nodes into\n",
+ "Looking at the tree-by-tree plot, it should be clear that some of the nodes in a local tree have one child in some trees, and two children in others. There are even some nodes that have only one child in every tree in which they appear (e.g. node 26). We can classify nodes into:\n",
"\n",
"0. **non-coalescent**, sometimes called _always unary_ (i.e. one child in all local trees, e.g. node 26)\n",
"1. **part-coalescent**, sometimes called _locally unary_ (i.e. one child in some local trees, coalescent in others, e.g. node 18)\n",
@@ -403,7 +403,7 @@
"id": "f0876be3-5bfb-42ee-884d-c844b4c19743",
"metadata": {},
"source": [
- "The ARG was actually simulated using a model of human evolution that reflects the Out of Africa event. As well as having a value denoting the <code>individual</code>, each node also has a value indicating a <code>population</code> it belongs to.\n",
+ "The ARG was actually simulated using a model of human evolution that reflects the Out of Africa event. As well as having a value denoting the <code>individual</code>, each node also has a value indicating a <code>population</code> to which it belongs.\n",
"\n",
"<dl class=\"exercise\"><dt>Exercise E</dt>\n",
" <dd>Change the code above to colour by <code>node.population</code> ID rather than <code>node.individual</code> ID. You could also stop colouring the recombination nodes as black if you like.</dd>\n",
@@ -853,7 +853,7 @@
"source": [
"It should be reasonably obvious how this works. E.g. edge 0 connects parent node 10 to child node 6 in the part of the genome that spans 0 to 930 bp. For further information see [https://tskit.dev/tskit/docs/stable/data-model.html](https://tskit.dev/tskit/docs/stable/data-model.html), and for a tutorial approach, see [https://tskit.dev/tutorials/tables_and_editing.html](https://tskit.dev/tutorials/tables_and_editing.html).\n",
"\n",
- "As a brief introduction, you can access particular edges, nodes, sites, etc. as Python objects using `arg.edge(i)`, `arg.node(i)`, `arg.site(i)`, and so on."
+ "We previously used [`arg.nodes()`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.nodes), [`arg.individuals()`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.individuals), and [`arg.populations()`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.populations) to return Python objects, created by iterating over all the rows in a table. Similarly, methods exist for [`arg.edges()`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.edges), [`arg.sites()`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.sites), and [`arg.mutations()`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.mutations). To access a specific edge, node, site, etc. as a Python object you can also use [`arg.edge(i)`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.edge), [`arg.node(i)`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.node), [`arg.site(i)`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.site), and so on. "
]
},
{
@@ -915,7 +915,7 @@
"source": [
"#### High performance data access\n",
"\n",
- "However, the most performant way to access the underlying data is to use the [efficient column accessors](https://tskit.dev/tskit/docs/stable/python-api.html#efficient-table-column-access), which provide _numpy_ arrays that are a direct view into memory. For example, to find all the site positions along the genome, you can use `arg.tables.sites.position` (or the shortcut `arg.sites_position`). This is particularly relevant when dealing with ARGs containing large tables (e.g. millions of rows)."
+ "Using Python objects is convenient, but can be inefficient for large ARGs. The most performant way to access the underlying data is to use the [efficient column accessors](https://tskit.dev/tskit/docs/stable/python-api.html#efficient-table-column-access), which provide _numpy_ arrays that are a direct view into memory. For example, to find all the site positions along the genome, you can use `arg.tables.sites.position` (or the shortcut [`arg.sites_position`](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.sites_position)). This is particularly relevant when dealing with ARGs containing large tables (e.g. millions of rows)."
]
},
{
@@ -977,7 +977,7 @@
"source": [
"#### High performance trees\n",
"\n",
- "There are also [fast array access methods](https://tskit.dev/tskit/docs/stable/python-api.html#array-access) for local trees in a tree sequence. \n"
+ "Local trees in a tree sequence are not stored in a table, but iteratively constructed on the fly using the `arg.trees()` method. However, a tree object has a set of [fast array access methods](https://tskit.dev/tskit/docs/stable/python-api.html#array-access) to provide efficient access to tree-based information, such as the [parents of nodes](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.Tree.parent_array) in a tree, the [number of children](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.Tree.num_children_array) of tree nodes, or the [edge above each node](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.Tree.edge_array).\n"
]
},
{
@@ -987,7 +987,7 @@
"metadata": {},
"outputs": [],
"source": [
- "tree = arg.first()\n",
+ "tree = arg.first()\n",
"\n",
"# Simple access to the parent of node 0 in the tree\n",
"print(\"Parent of node 0 in the first tree is\", tree.parent(0))\n",
23 changes: 12 additions & 11 deletions content/WhatIsAnARG_workbook2.ipynb
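Reviewer sketch (not part of the diff): the first hunk of this file replaces a `try`/`except ModuleNotFoundError` guard with an explicit platform check. The pattern can be isolated as follows; `running_in_jupyterlite` is a hypothetical helper name, and the sketch relies on Pyodide-based environments such as JupyterLite reporting `sys.platform` as `'emscripten'`.

```python
import sys


def running_in_jupyterlite(platform=sys.platform):
    """Return True when Python is running on Emscripten (as in JupyterLite).

    Hypothetical helper for illustration; the notebook inlines the check.
    """
    return platform == "emscripten"


if running_in_jupyterlite():
    # In the notebook, this branch awaits micropip.install(...) for each package
    print("JupyterLite detected: packages would be installed with micropip")
else:
    print("This notebook is not running in JupyterLite: "
          "you may need to install tskit, tszip, etc.")
```

Checking the platform up front makes the local-run case explicit instead of relying on a failed `micropip` import.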
@@ -7,16 +7,19 @@
"metadata": {},
"outputs": [],
"source": [
- "try:\n",
+ "import sys\n",
+ "\n",
+ "if sys.platform != 'emscripten':\n",
+ " # allow the notebook to be downloaded and run locally too\n",
+ " print(\"This notebook is not running in JupyterLite: you may need to install tskit, tszip, etc.\")\n",
+ "else:\n",
" import micropip\n",
" await micropip.install('tszip')\n",
" await micropip.install('pyslim==1.0.4') # Need an older version for SLiM < 5\n",
" await micropip.install('stdpopsim')\n",
" await micropip.install('demesdraw')\n",
" await micropip.install('jupyterquiz')\n",
" await micropip.install('tskit_arg_visualizer')\n",
- "except(ModuleNotFoundError):\n",
- " pass # allow to be run outside of JupyterLite too\n",
"\n",
"from jupyterquiz import display_quiz\n",
"from matplotlib import pyplot as plt\n",
@@ -419,9 +422,7 @@
"source": [
"### Nodes represent genomes\n",
"\n",
- "Philosophically, the nodes in a simplified ARG no longer represent historical events, but the genomes that are produced as an *outcome* of those (unknown) events. For example, sample node 6 now has 2 parents, even though it does not itself represent a recombination event. We know a recombination event occured some time between the time of node 6 and the time of its youngest parent, node 15, but we can't say exactly when.\n",
- "\n",
- "We can also use the `simplify()` method to identify the genome sequence at internal nodes:"
+ "Philosophically, the nodes in a simplified ARG no longer represent historical events, but the genomes that are produced as an *outcome* of those (unknown) events. For example, sample node 6 now has 2 parents, even though it does not itself represent a recombination event. We know a recombination event occurred some time between the time of node 6 and the time of its youngest parent, node 15, but we can't say exactly when."
]
},
{
@@ -431,7 +432,9 @@
"source": [
"### Simplifying with different focal nodes\n",
"\n",
- "One of the main uses of `simplify()` is to change which nodes are treated as samples. By default the existing sample nodes are taken, but we can This allows us to look at the genome of internal nodes in the ancestry, and emphasises that each node is a (possibly partially known) genome. For instance "
+ "One of the main uses of `simplify()` is to change which nodes are treated as samples. By default the existing sample nodes are taken, but we can specify a subset of the existing samples to make a smaller ARG, which is the topic of the next subsection.\n",
+ "\n",
+ "However, for demonstration purposes, the code below (unusually) passes a previously non-sample node to `simplify()`. This is a shortcut that allows us to output the assumed genome of internal nodes in the ancestry: "
]
},
{
@@ -986,12 +989,10 @@
"source": [
"## Stdpopsim: easily run verified simulations\n",
"\n",
- "Even though the demographic model above is relatively simple, it still contains many parameters, and specifying it, or something similar, can be tricky and prone to error.\n",
+ "Even though the demographic model above is relatively simple, it still contains many parameters, and specifying it, or something similar, can be tricky and prone to error. For this reason, rather than using <em>msprime</em> directly, it is often much easier to use the <a href=\"https://popsim-consortium.github.io/stdpopsim-docs/\">Standard Library for Population Genetic Simulation Models</a> (<em>stdpopsim</em>).\n",
"\n",
"<img style=\"float:right; margin: 0.5em;\" src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/9/9a/NASA_Joins_Jane_Goodall_to_Conserve_Chimp_Habitats_%28SVS14410%29.jpg/330px-NASA_Joins_Jane_Goodall_to_Conserve_Chimp_Habitats_%28SVS14410%29.jpg\" />\n",
- "For this reason, rather than using <em>msprime</em> directly, it is often much easier to use the <a href=\"https://popsim-consortium.github.io/stdpopsim-docs/\">Standard Library for Population Genetic Simulation Models</a> (<em>stdpopsim</em>). This is a set of tried and tested genomic and demographic models for various species. We will demonstrate using the Python API (as documented in the <a href=\"https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html#running-stdpopsim-with-the-python-interface-api\">tutorial documentation</a>.\n",
- "\n",
- "For instance, here is an example of a demographic model of populations in the genus <em>Pan</em> (common chimpanzees and bonobos)."
+ "<em>Stdpopsim</em> is a set of tried and tested genomic and demographic models for various species. We will demonstrate using the Python API (as documented in the <a href=\"https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html#running-stdpopsim-with-the-python-interface-api\">tutorial documentation</a>). For instance, here is an example of a demographic model of populations in the genus <em>Pan</em> i.e. common chimpanzees and bonobos."
]
},
{