diff --git a/_chapters/single-cell-analysis/04-preprocessing/index.md b/_chapters/single-cell-analysis/04-preprocessing/index.md
index 42a6b90..a8b43e8 100644
--- a/_chapters/single-cell-analysis/04-preprocessing/index.md
+++ b/_chapters/single-cell-analysis/04-preprocessing/index.md
@@ -2,7 +2,7 @@
title: ' Data Filtering and Preprocessing'
---
-Single-cell datasets can have a lot of technical variability issues. Each cell will generally capture a varying number of reads. This will cause some cells to have too low of a signal to be useful. Additionally, genes range from ever-active housekeeping genes to specialized genes that are only expressed in particular cell types or under certain conditions. Employing filtering techniques and preprocessing steps becomes crucial to prepare the data for subsequent analyses.
+Single-cell datasets are prone to **technical variability**. In bulk RNA sequencing, each measurement pools many cells, so random noise averages out; in single-cell data, each measurement comes from just one cell. Because the amount of RNA per cell is very small, **small technical differences and random sampling effects have a much larger impact**, leading to high variability and many low or zero measurements (dropouts). As a result, some cells may have a signal that is too low to be useful. In addition, **gene expression levels vary widely**, from consistently expressed housekeeping genes to genes that are only active in specific cell types or conditions. Filtering and preprocessing steps are therefore essential to obtain reliable results in downstream analysis.
## Filtering or quality control
diff --git a/_chapters/single-cell-analysis/05-batch-effects/index.md b/_chapters/single-cell-analysis/05-batch-effects/index.md
index 21eaff9..1d7038c 100644
--- a/_chapters/single-cell-analysis/05-batch-effects/index.md
+++ b/_chapters/single-cell-analysis/05-batch-effects/index.md
@@ -4,12 +4,12 @@ title: 'Batch Effects Correction'
## Batch Effects Correction
+
+A batch is a group of samples (cells) that are processed together under the same experimental conditions.
+In single-cell analysis, we often work with data from multiple sources, such as different experiments, laboratories, or patient samples. Samples in the same batch share the same technical conditions, for example the same lab, protocol, reagents, or sequencing run. Differences between batches can introduce batch effects, which are sources of unwanted technical variation. These must be distinguished from biological variation, which is typically the focus of analysis.
-
-In single cell analysis, we often deal with data from several different sources, be it from a different provider, different experiments or simply different batches. A “batch” refers to an individual group of samples that are processed differently relative to other samples. This different processing when gathering a batch of data can affect variation in the obtained data. The technical, non-biological factors that affect variation in batches are reffered to as batch effects.
-
-Batch effects are problematic because they hinder our ability to measure true biological variation between samples, which is what we are interested in. Luckily, they can be dealt with computationally by aligning data from different batches.
+Batch effects are problematic because they obscure these biological differences and make comparisons between samples unreliable. Luckily, they can be dealt with computationally by aligning data from different batches.
There are different approaches to align data sets and remove batch effects. Orange currently implements three: one through the [Batch Effect Removal](https://orangedatamining.com/widget-catalog/single-cell/batch_effect_removal/) widget, the second, more standard, using canonical correlation implemented in the widget [Align Datasets](https://orangedatamining.com/widget-catalog/single-cell/align_datasets/) and the third, most recently added, in the Harmony widget.
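
To make the idea of aligning batches concrete, here is a deliberately naive sketch that subtracts each batch's per-gene mean from the cells in that batch. This is an illustration only and is not claimed to be the method used by any of the three widgets; the function name `center_by_batch` and the toy data are hypothetical.

```python
from collections import defaultdict


def center_by_batch(rows, batches):
    """Subtract each batch's per-gene mean from the cells in that batch.

    rows    -- list of cells, each a list of expression values (one per gene)
    batches -- list of batch labels, one per cell
    """
    n_genes = len(rows[0])
    sums = defaultdict(lambda: [0.0] * n_genes)
    counts = defaultdict(int)

    # First pass: accumulate per-batch sums for every gene.
    for row, batch in zip(rows, batches):
        counts[batch] += 1
        for j, value in enumerate(row):
            sums[batch][j] += value

    # Second pass: subtract the batch mean from every cell.
    corrected = []
    for row, batch in zip(rows, batches):
        means = [s / counts[batch] for s in sums[batch]]
        corrected.append([v - m for v, m in zip(row, means)])
    return corrected


# Hypothetical toy data: two genes, two cells per batch.
# Batch "B" is shifted by +10 relative to batch "A".
rows = [[1.0, 2.0], [3.0, 4.0], [11.0, 12.0], [13.0, 14.0]]
batches = ["A", "A", "B", "B"]
corrected = center_by_batch(rows, batches)
```

After centering, the constant shift between the two toy batches disappears. Note that such a simple correction would also erase genuine biological differences whenever batches contain different mixes of cell types, which is one reason more careful alignment methods exist.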
@@ -39,13 +39,13 @@ In our exploration of data integration methods, we'll first look at a technique
Next, let's revisit the Align Datasets widget to fine-tune the parameters for potentially improved clustering. A common strategy is to start with a reduced number of components. Orange seamlessly propagates the transformed data to t-SNE, where we see an updated plot. The alignment between the two datasets appears significantly improved.
-Let's explore the second data alignment method implemented by the Batch Effect Removal widget. We add this widget to our canvas and feed it the combined data. When opening the widget we have to set the distinguishing feature for different batches; in our case, this is the Source ID. We'll also leave the "Skip zero expressions" option unchecked. After applying this correction and visualizing the results in t-SNE, with colors representing cell classes and shapes representing data sources, we observe an interesting result. While some clusters of identical cell types from different sources merge into cohesive units, others remain distinct. It appears that the Align Datasets widget outperformed the Batch Effect Removal method in this case.
+Let's explore the second data alignment method implemented by the Batch Effect Removal widget. We add this widget to our canvas and feed it the combined data. When opening the widget we have to set the distinguishing feature for different batches: in our case, this is the Source ID. We'll also leave the "Skip zero expressions" option unchecked. After applying this correction and visualizing the results in t-SNE, with colors representing cell classes and shapes representing data sources, we observe an interesting result. While some clusters of identical cell types from different sources merge into cohesive units, others remain distinct. It appears that the Align Datasets widget outperformed the Batch Effect Removal method in this case.

-Let us now try Harmony, the third method available in Orange. Harmony has achieved good results in the [Batch integration benchmark study](https://openproblems.bio/benchmarks/batch_integration?version=v2.0.0), making it a robust general-purpose method for integrating single-cell datasets across batches. As before, we pass the concatenated data to the Harmony widget and specify Source ID as the batch-defining variable, leaving the remaining parameters at their default values for now. We then visualize the transformed data using t-SNE. The resulting plot shows that cells of the same type remain clustered together. What about the batch effect correction? To assess batch mixing, we set the Shape parameter to indicate the data source; this reveals that cells from different batches are now well mixed rather than forming separate clusters. Thus by using Harmony we have effectively reduced batch effects while preserving biologically meaningful structure for downstream analysis.
+Let us now try Harmony, the third method available in Orange. Harmony has achieved good results in the [Batch integration benchmark study](https://openproblems.bio/benchmarks/batch_integration?version=v2.0.0), making it a robust general-purpose method for integrating single-cell datasets across batches. As before, we pass the concatenated data to the Harmony widget and specify Source ID as the batch-defining variable, leaving the remaining parameters at their default values for now. We then visualize the transformed data using t-SNE. The resulting plot shows that cells of the same type remain clustered together. What about the batch effect correction? To assess batch mixing, we set the Shape parameter to indicate the data source: this reveals that cells from different batches are now well mixed rather than forming separate clusters. By using Harmony we have effectively reduced batch effects while preserving biologically meaningful structure for downstream analysis.
Among the three main parameters in Harmony (sigma, theta, and lambda), it is often useful to adjust theta, which controls the strength of batch correction: lower values result in weaker batch correction, whereas higher values enforce stronger mixing between batches.
diff --git a/_chapters/single-cell-analysis/06-marker-genes/index.md b/_chapters/single-cell-analysis/06-marker-genes/index.md
index 08f5e9c..4718d9f 100644
--- a/_chapters/single-cell-analysis/06-marker-genes/index.md
+++ b/_chapters/single-cell-analysis/06-marker-genes/index.md
@@ -24,7 +24,8 @@ The output of the widget is a table that includes a gene name and cell type, bot
The idea is now that we would select the gene(s) from the data table, and then score the cells according to the mean expression of selected genes. Widget Score Cells assigns a numerical score to each cell that is proportional to an average expression of the marker genes at the input of the widget. The score is added as a meta attribute to the cell data on the output of Score Cells. Check this using the Data Table! We can now feed this data into t-SNE and set the color and size of the points to the cell score.
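
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the score is simply the mean expression of the selected marker genes; the function `score_cells` and the toy expression values are hypothetical and do not reproduce the widget's actual implementation.

```python
from statistics import mean

# Hypothetical toy data: expression values per cell, keyed by gene name.
cells = {
    "cell_1": {"CD3E": 5.0, "CD79A": 0.0, "HBB": 0.0},
    "cell_2": {"CD3E": 0.0, "CD79A": 4.0, "HBB": 1.0},
    "cell_3": {"CD3E": 0.0, "CD79A": 0.0, "HBB": 9.0},
}


def score_cells(cells, marker_genes):
    """Return the mean expression of the marker genes for every cell.

    marker_genes must be a non-empty list of gene names.
    """
    scores = {}
    for cell_id, expression in cells.items():
        # Genes missing from a cell's profile are treated as not expressed.
        values = [expression.get(gene, 0.0) for gene in marker_genes]
        scores[cell_id] = mean(values)
    return scores


# Score every cell for a single marker gene.
t_cell_scores = score_cells(cells, ["CD3E"])
```

Changing the list of marker genes changes which cells receive high scores, which is what drives the color and size changes in the t-SNE plot.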
-Notice that with any change in the selection of marker genes, we find a group of cells in t-SNE plot where these genes are expressed. Looks like T cells are in the bottom right cluster, B cells somewhere in the middle, and erythrocytes in the left cluster. Did we say cluster? Oh, we are not there yet…
+Notice that with any change in the selection of marker genes, we find a group of cells in the t-SNE plot where these genes are expressed. It looks like T cells are in the bottom right cluster, B cells somewhere in the middle, and erythrocytes in the left cluster. Did we say cluster?
+

diff --git a/_chapters/single-cell-analysis/quiz-02/index.md b/_chapters/single-cell-analysis/quiz-02/index.md
index bb36cd2..e891e15 100644
--- a/_chapters/single-cell-analysis/quiz-02/index.md
+++ b/_chapters/single-cell-analysis/quiz-02/index.md
@@ -150,51 +150,70 @@ Plot the preprocessed and annotated data in a new t-SNE plot and compare it to t
### Task 4 - Batch Effect Correction
-Download the sample of a pancreas single cell gene expression dataset ([pancreas_sampled_1k5k.tab](http://file.biolab.si/datasets/pancreas_sampled_1k5k.tab)) and load it into Orange. Generate a t-SNE plot.
-
answer === "3"}
- options={["11", "2", "3"]}
+ question="What is a batch in single cell analysis?"
+ scorer={(answer) => answer === "a group of cells processed under the same technical conditions"}
+ options={["A group of cells from the same tissue", "A group of cells from the same patient", "A group of cells processed under the same technical conditions", "A procedure that removes unwanted technical variability from the data"]}
neutralOptions={["I don't understand the question."]}
trials={2}
timeout={10}>
-
-
-
- 
-
+
answer === "to align datasets from different sources"}
- options={["To normalize the data", "To align datasets from different sources", "To reduce the size of the dataset", "To separate datasets from different sources"]}
+ scorer={(answer) => answer === "to correct for technical differences so datasets can be compared"}
+ options={["To normalize the data", "To correct for technical differences so datasets can be compared", "To reduce the size of the dataset", "To separate datasets from different sources"]}
+ neutralOptions={["I don't understand the question."]}
+ trials={2}
+ timeout={10}>
+
+
+
+
+**Perform batch effect correction on the following data:**
+
+a) Download a sample of a pancreas single-cell gene expression dataset ([pancreas_sampled_1k5k.tab](http://file.biolab.si/datasets/pancreas_sampled_1k5k.tab)) and load it into Orange. The dataset already includes a meta attribute, _Batch_, which indicates the sequencing procedure used to obtain each measurement. Generate a t-SNE plot.
+
+
+ answer === "3"}
+ options={["11", "2", "3"]}
neutralOptions={["I don't understand the question."]}
trials={2}
timeout={10}>
+
+
+
+ 
+
-**Apply two different batch-effect correction methods to the dataset:**
-a) Using Align Datasets widget (set the Data source indicator to Batch and leave all other parameters at default values)
+b) Apply two different batch-effect correction methods to the dataset:
+i) Using Align Datasets widget (set the Data source indicator to Batch and leave all other parameters at default values)
-b) Using Harmony widget (leave all parameters at their default values)
-**For each method, generate a t-SNE embedding of the corrected data. Compare t-SNE plots (uncorrected, Align Datasets corrected, Harmony corrected) side by side.**
+ii) Using Harmony widget (leave all parameters at their default values)
+
+For each method, generate a t-SNE embedding of the corrected data. Compare t-SNE plots (uncorrected, Align Datasets corrected, Harmony corrected) side by side.
answer === "pink"}
- options={["Orange", "Pink", "Yellow", "Red"]}
+ question="What are marker genes?"
+ scorer={(answer) => answer === "genes whose expression is characteristic of specific cell types or states"}
+ options={[
+ "Genes whose expression is characteristic of specific cell types or states",
+ "Genes that are expressed in all cells at the same level",
+ "Genes used to normalize gene expression data",
+ "Genes consistently expressed in most or all cells because they are required for basic cellular functions necessary for survival"
+ ]}
neutralOptions={["I don't understand the question."]}
trials={2}
timeout={10}>
+Perform cluster exploration on the retinal dataset. Use the data table of known marker genes ([sc-quiz-marker-genes.xlsx](http://file.biolab.si/datasets/sc-quiz-marker-genes.xlsx)) for each cell type (don't forget to pass the marker genes data through the Genes widget to annotate!) and set the aggregation parameter in the Score Cells widget to **Fraction of expressed markers**.
+
+
+
answer === "red"}
- options={["Red", "Light Blue", "Green", "Pink"]}
+ question="Which cell type is likely picked out in the t-SNE plot above?"
+ scorer={(answer) => answer === "rods"}
+ options={["Cones", "Horizontal cells", "Retinal ganglion cells", "Rods", "Amacrine cells"]}
neutralOptions={["I don't understand the question."]}
- trials={2}
+ trials={4}
timeout={10}>
+
+
+ 
+
-Liang et al. report that in the peripheral tissue the proportion of rods is higher than the proportion of rods in the macular tissue. Does this hold for our dataset sample? Try using the Distributions widget to figure this out.
+
+
+
answer === "horizontal cells"}
+ options={["Cones", "Horizontal cells", "Retinal ganglion cells", "Rods", "Amacrine cells"]}
+ neutralOptions={["I don't understand the question."]}
+ trials={4}
+ timeout={10}>
+
+
+ 
+
+
+
+
+Liang et al. report that rods make up a higher proportion of all cells in the peripheral tissue than in the macular tissue. Does this hold for our dataset sample? Try using the Distributions widget to figure this out.
+
+
+(This is a hard question, so here is a hint: In the Distributions widget you need to differentiate between Rods (Selected) and non-Rods (Not Selected) as well as between tissue Source (Macular or Peripheral). This means sending all data to Distributions, not just the selected data, so rewire! In addition, check _Stack columns_, _Show probabilities_ and _Show cumulative distribution_.)
+
+ answer === "true"}
options={["True", "False"]}
neutralOptions={["I don't understand the question."]}
trials={1}
timeout={10}>
+
+
+ 
+ 
+
Select the top 100 genes that are differentially expressed in cones in comparison to non-cones (T-test). Forward them to the GO widget. Sort the lower list by increasing p-value.