diff --git a/docs/Association_Testing_Using_VCM.ipynb b/docs/Association_Testing_Using_VCM.ipynb new file mode 100644 index 0000000..36a347f --- /dev/null +++ b/docs/Association_Testing_Using_VCM.ipynb @@ -0,0 +1,582 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Association Testing Using Variance Component Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this notebook, we read in the Mendel Option 29 (Ped-GWAS) data and demonstrate association study using our most associated SNP and trait2. Is a consequence of the analysis we can also estimate the heritability. If you want to test your Julia skills, try changing the code to (1) run another snp (or a set of snps); (2) run the same analysis with trait 1 or (3) run the bivariate analysis (both traits). Note that for the bivariate analysis concept of heritability is not well defined so omit that part. \n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Acknowledgement: Hua Zhou wrote the vast majority of this notebook with a little tweaking by Janet Sinsheimer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data files\n", + "\n", + "We start from the following 3 files from [Mendel Option 29 (Ped-GWAS) example](https://www.genetics.ucla.edu/software/Mendel_current_doc.pdf#page=294). Following shell commands assumes MacOS or Linux environment. Julia commands should run regardless of OS." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + ";ls -l Ped29c.in SNP_data29a.bin SNP_def29a.in" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because SnpArray function requires input file name ending in .bed rather than .bin, we create a symbolic link SNP_data29a.bed to SNP_data29a.bin. (If you have trouble with getting this command to work on your computer you can copy the file outside of julia).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + ";ln -s ./SNP_data29a.bin ./SNP_data29a.bed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + ";ls" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read in Mendel Option 29 data\n", + "\n", + "Take a look at the first 10 lines of the pedigree file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + ";head Ped29c.in" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the pedigree file. This file is in the classic Mendel format, Family Id, Person ID, Father ID, Mother Id, sex as F (female) or M (male), monozygotic twin indicator, simtrait1 and simtrait2. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# columns are: :famid, :id, :moid, :faid, :sex, :twin, :simtrait1, :simtrait2, :group\n", + "ped29c = readcsv(\"Ped29c.in\", Any; header = false)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We don't need to retain the ids so we retrieve the two phenotype data and put them in an array Y." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "simtrait1 = convert(Vector{Float64}, ped29c[:, 7])\n", + "simtrait2 = convert(Vector{Float64}, ped29c[:, 8])\n", + "Y = [simtrait1 simtrait2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Retrieve sex data coded as 0 (male) or 1 (female) so male is the reference group." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sex = map(x -> strip(x) == \"F\"? 1.0 : 0.0, ped29c[:, 5])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Take a look at the first 10 lines of the SNP definition file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + ";head SNP_def29a.in" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the SNP definition file, skipping the first 2 lines." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# columns are: :snpid, :chrom, :pos, :allele1, :allele2, :groupname\n", + "snpdef29c = readcsv(\"SNP_def29a.in\", Any; skipstart = 2, header = false)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will be analyzing SNPs one at a time so we don't need the position of the snps just the SNP IDs so we retrieve SNP IDs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "snpid = map(x -> strip(string(x)), snpdef29c[:, 1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the SNP binary file using the SnpArray.jl package." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "using SnpArrays\n", + "\n", + "snpbin29a = SnpArray(\"SNP_data29a\"; people = size(ped29c, 1), snps = size(snpdef29c, 1))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Kinship via Genetic Relationship Matrix (GRM)\n", + "\n", + "Recall that in using variance components (linear mixed models) we need a measure of the relatedness among individuals. Under the GRM formulation, the estimate of the global kinship coefficient of individuals $i$ and $j$ is\n", + "$$ \\widehat\\Phi_{GRMij}^ = \\frac{1}{2S} \\sum_{k=1}^S \\frac{(x_{ik} -2p_k)(x_{jk} - 2p_k)}{2 p_k (1-p_k)}$$,\n", + "where $k$ ranges over the selected $S$ SNPs, $p_k$ is the minor allele frequency of SNP $k$, and $x_{ik}$ is the number of minor alleles in individual $i$s genotype at SNP $k$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Calculate the GRM matrix\n", + "\n", + "By default, `grm` excludes SNPs with maf < 0.01." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Φgrm = grm(snpbin29a; method = :GRM)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Fit the null variance component model\n", + "\n", + "Recall that we are using a variance component model with simtrait2 as the outcome. Under the null hypothesis simtrait2 is associated with sex (as a fixed effect). We also need to account for the relatedness among individuals. To do that we include a random effect and use the GRM matrix to describe the covariation structure. \n", + " $$ Y_{2i} = \\mu +\\beta_{sex} sex_i + A_i + e_i$$ \n", + " $$ A_i \\sim N(0,\\sigma^2_a)$$ $$e_i \\sim N(0,\\sigma^2_e)$$\n", + " $$ Cov(Y_{2i},Y_{2j})=2\\Phi_{ij} \\sigma^2_a + 1_{i = j}\\sigma^2_e$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "using VarianceComponentModels\n", + "\n", + "# form data as VarianceComponentVariate\n", + "X = [ones(length(simtrait1)) sex]\n", + "#change this next command if you want to run trait 1 or both traits (Y)\n", + "nulldata = VarianceComponentVariate(Y[:,2], X, (2Φgrm, eye(length(simtrait2))))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When we run the alternative model, it can be helpful to start from our best estimates from the null model. Initialize the variance component model parameters." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nullmodel = VarianceComponentModel(nulldata)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@time nulllogl, nullmodel, = fit_mle!(nullmodel, nulldata; algo = :FS)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# null model log-likelihood\n", + "nulllogl" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# null model mean effects\n", + "nullmodel.B" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# null model additive genetic variance\n", + "nullmodel.Σ[1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# null model environmental variance\n", + "nullmodel.Σ[2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Heritability \n", + "Calculate the proportion of the variance that can be attributed to additive genetic effects, the narrow sense heritability. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "her_null = nullmodel.Σ[1]/(nullmodel.Σ[1]+nullmodel.Σ[2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Fit variance component model with the causal SNP\n", + "\n", + "These data were simulated under a scenario so that a male has a value of 20 and a female has a value of 16. The trait is simulated with a major locus, rs10412915, with an additive effect of 1.5 per minor allele such that a heterozygote male has a value of 20. There is also a strong residual genetic variation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ind_rs10412915 = find(x -> x == \"rs10412915\", snpid)[1]\n", + "# Use can change this SNP if you would like to assess another's snps effect on the trait, e.g.:\n", + "#ind_rs56343121 = find(x -> x == \"rs56343121\", snpid)[1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "snp_rs10412915 = convert(Vector{Float64}, snpbin29a[:, ind_rs10412915])\n", + "#snp_rs56343121 = convert(Vector{Float64}, snpbin29a[:, ind_rs56343121])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# form data as VarianceComponentVariate\n", + "Xalt = [ones(length(simtrait2)) sex snp_rs10412915]\n", + "#Xalt = [ones(length(simtrait1)) sex snp_rs56343121]\n", + "altdata = VarianceComponentVariate(Y[:,2], Xalt, (2Φgrm, eye(length(simtrait2))))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "altmodel = VarianceComponentModel(altdata)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set the starting values for the maximum likelihood estimation\n", + "Use the null model estimates as start values for the alternative model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "altmodel.B[1:2, :] = nullmodel.B\n", + "altmodel.B" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "copy!(altmodel.Σ[1], nullmodel.Σ[1])\n", + "copy!(altmodel.Σ[2], nullmodel.Σ[2])\n", + "altmodel.Σ" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@time altlogl, altmodel, = fit_mle!(altmodel, altdata; algo = :FS)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# alt model log-likelihood\n", + "altlogl" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# alt model mean effects\n", + "altmodel.B" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# alt model additive genetic variance\n", + "altmodel.Σ[1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# alt model environmental variance\n", + "altmodel.Σ[2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To test the significance of the SNP, we use LRT" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "using Distributions\n", + "LRT=2(altlogl - nulllogl)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#change the degrees of freedom if running a bivariate outcome\n", + "pval_rs10412915 = ccdf(Chisq(1), LRT)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Residual Heritability. The proportion of additive genetic variation remaining after including the SNP in the model. Note that heritability is difficult to describe for a bivariate outcome so it is usually not provided. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ignore if running a bivariate outcome\n", + "her_alt=altmodel.Σ[1]/(altmodel.Σ[1]+altmodel.Σ[2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Portion of the genetic variation explained by the snp is a measure of the effect of the snp on a signal trait. Omit if running a bivariate trait. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "add_proport=(nullmodel.Σ[1]-altmodel.Σ[1])/nullmodel.Σ[1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Portion of total variation explained by the snp is an alterative to the above. Omit if running a bivariate trait. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pheno_proport=(nullmodel.Σ[1]+nullmodel.Σ[2]-altmodel.Σ[1]-altmodel.Σ[2])/(nullmodel.Σ[1]+nullmodel.Σ[2])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Julia 0.6.2", + "language": "julia", + "name": "julia-0.6" + }, + "language_info": { + "file_extension": ".jl", + "mimetype": "application/julia", + "name": "julia", + "version": "0.6.2" + }, + "toc": { + "colors": { + "hover_highlight": "#DAA520", + "running_highlight": "#FF0000", + "selected_highlight": "#FFD700" + }, + "moveMenuLeft": true, + "nav_menu": { + "height": "30px", + "width": "252px" + }, + "navigate_menu": true, + "number_sections": true, + "sideBar": true, + "threshold": 4, + "toc_cell": false, + "toc_section_display": "block", + "toc_window_display": false, + "widenNotebook": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/Heritability.ipynb b/docs/Heritability.ipynb new file mode 100644 index 0000000..71d6316 --- /dev/null +++ b/docs/Heritability.ipynb @@ -0,0 +1,1628 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Heritability Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an application of the variance component model, this note demonstrates the workflow for heritability analysis in genetics, using a sample data set `SNP_29C` with **212** individuals and **253,141** SNPs. \n", + "\n", + "`SNP_29C.bed`, `SNP_29C.bim`, and `SNP_29C.fam` is a set of Plink files in binary format. `SNP_29Ctraitdata.txt` contains 2 phenotypes of the 212 individuals, column 1 identifies the individuals family ID, column 2 individual ID, and columns 3 and 4 are traits 1 and 2 respectively." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `SNP_29C` dataset:\n", + "\n", + "Simulated with realistic linkage disequilibrium (LD) structure and constructed from phased sequence data from chromosome 19 on 85 individuals of northern and western European ancestry. After removing mono-allelic markers this set of in- dividuals, 253,141 SNPs remained. Almost half of the SNPs have minor allele frequencies (MAF) below 5%. The haplotype pairs attributed to the 85 CEPH members were reas- signed to the 85 founders of 27 pedigree structures selected from the Framingham Heart Study (FHS, http://www.framinghamheartstudy.org). The selected Framingham pedi- grees were chosen to reflect the kind of pedigrees commonly collected in family-based genetic studies. The 27 pedigrees encompass 212 people, range in size from 1 to 36 peo- ple and from 1 to 5 generations, and contain sibships of 1 to 5 children. The genotypes of non-founders were simulated, using Option 17, conditional on the haplotypes imposed on the founders. All genotypes were recorded as unordered for subsequent analyses." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SNP_29C.bed\n", + "SNP_29C.bim\n", + "SNP_29C.fam\n", + "SNP_29Ctraitdata.csv\n" + ] + } + ], + "source": [ + ";ls \"SNP_29C*.*\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Machine information:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Julia Version 0.6.0\n", + "Commit 903644385b (2017-06-19 13:05 UTC)\n", + "Platform Info:\n", + " OS: macOS (x86_64-apple-darwin13.4.0)\n", + " CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz\n", + " WORD_SIZE: 64\n", + " BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)\n", + " LAPACK: libopenblas64_\n", + " LIBM: libopenlibm\n", + " LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)\n" + ] + } + ], + "source": [ + "versioninfo()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read in binary SNP data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will use the [`SnpArrays.jl`](https://github.com/OpenMendel/SnpArrays.jl) package to read in binary SNP data and compute the empirical kinship matrix. Issue \n", + "```julia\n", + "Pkg.clone(\"https://github.com/OpenMendel/SnpArrays.jl.git\")\n", + "```\n", + "within `Julia` to install the `SnpArrays` package." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "using SnpArrays" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 0.609499 seconds (220.93 k allocations: 24.445 MiB, 1.32% gc time)\n" + ] + }, + { + "data": { + "text/plain": [ + "212×253141 SnpArrays.SnpArray{2}:\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (false, true) … (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " ⋮ ⋱ ⋮ \n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (false, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# read in genotype data from Plink binary file (~50 secs on my laptop)\n", + "@time SNP_29C = SnpArray(\"SNP_29C\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary statistics of SNP data" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(212, 253141)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "people, snps = size(SNP_29C)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 0.410520 seconds (19.62 k allocations: 4.798 MiB, 1.75% gc time)\n" + ] + } + ], + "source": [ + "# summary statistics (~50 secs on my laptop)\n", + "@time maf, _, missings_by_snp, = summarize(SNP_29C);" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "([0.00235849 0.0117925 … 0.228774 0.5], 0.13381461170905357)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 5 number summary and average MAF (minor allele frequencies)\n", + "quantile(maf, [0.0 .25 .5 .75 1.0]), mean(maf)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Pkg.add(\"Plots\")\n", + "# Pkg.add(\"PyPlot\")\n", + "using Plots\n", + "pyplot()\n", + "\n", + "histogram(maf, xlab = \"Minor Allele Frequency (MAF)\", label = \"MAF\")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# proportion of missing genotypes\n", + "sum(missings_by_snp) / length(SNP_29C)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.4558724189285813" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# proportion of rare SNPs with maf < 0.05\n", + "countnz(maf .< 0.05) / length(maf)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Empirical Kinship Matrix" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example we estimate the empirical kinships based on all 253,141 SNPs by Genetic Relationship Matrix.\n", + "\n", + "Both the GRM and the MoM methods are fairly quick to calculate and provide good estimates for the kinship coefficients, given reasonably dense genome-wide data. When using the GRM method, very rare SNPs should not be used since they become overweighted. In general, one can think of the GRM method centering and scaling each genotype, while the MoM method uses the raw genotypes and then centers and scales the final result." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Kinship via Genetic Relationship Matrix (GRM)\n", + "\n", + "Under the GRM formulation, the estimate of the global kinship coefficient of individuals $i$ and $j$ is\n", + "$$ \\widehat\\Phi_{GRMij}^ = \\frac{1}{2S} \\sum_{k=1}^S \\frac{(x_{ik} -2p_k)(x_{jk} - 2p_k)}{2 p_k (1-p_k)}$$,\n", + "where $k$ ranges over the selected $S$ SNPs, $p_k$ is the minor allele frequency of SNP $k$, and $x_{ik}$ is the number of minor alleles in individual $i$s genotype at SNP $k$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is often necessary to filter SNPs according to minor allele frequency and LD before calculating empirical kinship matrix. By default, the \"grm\" function exlcudes SNPs with minor allele frequency below 0.01. This can be changed by the keyword argument maf_threshold.\n", + "\n", + "In this example we sample every fifth SNP, and exclude very rare SNPs (those with minor allele counts less than 3) since they become over weighted. Thus we set the maf_threshold = 3/ 212/ 2 = 0.0071." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 3.330394 seconds (20.67 M allocations: 484.741 MiB, 12.06% gc time)\n" + ] + }, + { + "data": { + "text/plain": [ + "212×212 Array{Float64,2}:\n", + " 0.503736 0.00450238 0.00560925 … 0.0202415 -0.00146227\n", + " 0.00450238 0.514562 -0.0192975 -0.0211945 -0.0190527 \n", + " 0.00560925 -0.0192975 0.492619 -0.0150127 -0.00337255\n", + " 0.248505 -0.00189301 0.262585 0.000939722 -0.00397642\n", + " 0.123444 0.26985 0.120157 -0.011537 -0.00704346\n", + " -0.00873656 -0.000536772 -0.00104013 … 0.00274189 0.0117686 \n", + " -0.0122605 0.00754381 -0.0084298 -0.0109589 0.00124671\n", + " -0.0116844 -0.00831492 -0.0103027 -0.010893 -0.00465248\n", + " -0.00847343 0.0016468 -0.0120892 -0.0212743 -0.00557705\n", + " -0.00893916 -0.006521 -0.0138499 -0.00648165 -0.0127073 \n", + " -0.0107549 0.00852362 0.00136741 … -0.00404222 0.011256 \n", + " -0.0112584 0.00481192 -0.00523649 -0.00627449 0.0065575 \n", + " -0.0194832 0.000914529 -0.011264 0.00177211 0.0166304 \n", + " ⋮ ⋱ ⋮ \n", + " 0.00100953 -0.0112222 -0.0133715 … 0.00528263 -0.019073 \n", + " 0.00484842 0.000194152 -0.0141461 0.000918254 0.00240293\n", + " -0.0142276 -0.0248316 -0.00288667 -0.0132311 -0.0133404 \n", + " -0.0163315 -0.0137613 -0.0170462 -0.0206065 0.0046778 \n", + " -0.0029135 0.0130943 -0.00471949 -0.00479935 -0.00691155\n", + " 0.000657715 -0.000730383 -0.00182715 … 0.0242018 0.0256279 \n", + " -0.012718 -0.00525706 -0.000972297 -0.0115407 0.0083822 \n", + " 0.0144501 0.0174049 0.0181185 0.00445693 0.00695606\n", + " -0.00967986 0.00129945 0.0202481 0.00491677 -0.0170569 \n", + " 0.00195329 -0.0141308 0.0122068 0.00477851 0.00815976\n", + " 0.0202415 -0.0211945 -0.0150127 … 0.5561 0.0402404 \n", + " -0.00146227 -0.0190527 -0.00337255 0.0402404 0.508551 " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@time ΦGRM = grm(SNP_29C[:, 1:5:end]; method = :GRM, maf_threshold = 0.0071)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Kinship via Method of Moments (MoM)\n", + "Using the Method of Moments (MoM) formula, the estimate of the global kinship coefficient of individuals $i$ and $j$ is\n", + "$$ \\widehat \\Phi_{MoMij} = \\frac{e_{ij} - \\sum_{k=1}^S \\left[p_k^2 + (1-p_k)^2 \\right]}{S - \\sum_{k=1}^S\\left[p_k^2 + (1-p_k)^2\\right]}, $$\n", + "where\n", + "$$ e_{ij} = \\frac{1}{4} \\sum_{k=1}^S \\left[ x_{ik}x_{jk} + (2-x_{ik})(2-x_{jk}) \\right] $$\n", + "is the observed fraction of alleles identical by state (IBS) between $i$ and $j$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!!! note\n", + "\n", + " It is important to note that sometimes MoM formulation does not guarantee Positive Semi-Definiteness. Thus, for the rest of this example we will use the GRM formulation of the empirical Kinship coefficients. " + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 3.064000 seconds (85.80 M allocations: 1.604 GiB, 15.40% gc time)\n" + ] + }, + { + "data": { + "text/plain": [ + "212×212 Array{Float64,2}:\n", + " 0.506634 0.00429712 0.0260383 … -0.00798769 0.00311508 \n", + " 0.00429712 0.507848 -0.0168319 -0.0591744 -0.0186789 \n", + " 0.0260383 -0.0168319 0.504523 -0.0498764 0.000360494\n", + " 0.243028 -0.0193543 0.268178 -0.0446627 -0.0185839 \n", + " 0.111198 0.241117 0.115525 -0.0691584 -0.0173807 \n", + " -0.005898 0.004371 0.00643958 … -0.0297183 0.0214895 \n", + " -0.0165786 0.00881422 -0.00794547 -0.0459819 -0.00239409 \n", + " -0.0109111 -0.00265794 -0.0128531 -0.0387419 0.00140534 \n", + " -0.00260517 0.0155688 0.00462429 -0.0568525 0.00846594 \n", + " -0.0177184 -0.0109639 -0.0168741 -0.0413593 -0.0171169 \n", + " -0.0187949 0.00050825 0.00475094 … -0.0519238 0.00726278 \n", + " -0.00375555 0.0086348 0.00653456 -0.0407472 0.0139012 \n", + " -0.0263305 0.00670342 -0.00597188 -0.0301827 0.0178484 \n", + " ⋮ ⋱ ⋮ \n", + " 0.00241852 0.0075794 -0.00946524 … -0.0122937 -0.0145734 \n", + " 0.00189082 0.00246073 -0.0205258 -0.0350903 0.000634897\n", + " -0.0070906 -0.0178029 0.00851871 -0.0438923 -0.00790326 \n", + " -0.0211063 -0.0141195 -0.0225733 -0.059491 -0.00670011 \n", + " 0.00405438 0.0218061 0.00382219 -0.0314175 -0.000832103\n", + " -0.010014 -0.0219189 -0.0149322 … -0.0239136 0.00100429 \n", + " -0.0204625 -0.0118399 -0.0052331 -0.0563037 0.000856531\n", + " 0.0253101 0.0251095 0.0249196 -0.018014 0.0256794 \n", + " -0.0185311 -0.01832 0.0242441 -0.0370639 -0.0313753 \n", + " 0.00256627 -0.00921195 0.0223233 -0.024093 0.0219856 \n", + " -0.00798769 -0.0591744 -0.0498764 … 0.500565 0.00406493 \n", + " 0.00311508 -0.0186789 0.000360494 0.00406493 0.494434 " + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@time ΦMoM = grm(SNP_29C; method = :MoM)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## User Supplied Kinship Matrix" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the case where the kinship matrix has already been calculated, we can read in an existing GRM file.\n", + "\n", + "It is important to be mindful of what type of format your kinship matrix is in, as the command to read in different file types may change. In this example we use a Comma-Separated-Values (CSV) file, ΦGRM_ped.csv." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Theoretical Kinship Matrix" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Classic Mendel was used to calculate the theoretical kinships based on the pedigree structure.\n", + "The documentation on the theoretical kinship formulation can be found in Chapter 5 of \"Mathematical and Statistical Methods for Genetic Analysis\" (1997), Dr. Kenneth Lange.\n", + "\n", + "https://books.google.com/books?id=QYqeYTftPNwC&lpg=PP1&pg=PA83#v=onepage&q&f=false" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!!! note\n", + "\n", + " All missing genotypes in this file must be imputed before any further analysis" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212×212 Array{Float64,2}:\n", + " 0.5 0.0 0.0 0.25 0.125 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.5 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.5 0.25 0.125 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.25 0.0 0.25 0.5 0.25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.125 0.25 0.125 0.25 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.5 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.25 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.125 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " ⋮ ⋮ ⋱ ⋮ ⋮ \n", + " 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 … 0.5 0.0 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.5 0.0\n", + " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ΦGRM_ped = readcsv(\"kinship_ped29c.csv\", Float64; header = false)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Phenotypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the phenotype data and compute descriptive statistics." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mRecompiling stale cache file /Users/sarahji/.julia/lib/v0.6/DataArrays.ji for module DataArrays.\n", + "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mRecompiling stale cache file /Users/sarahji/.julia/lib/v0.6/DataFrames.ji for module DataFrames.\n", + "\u001b[39m" + ] + }, + { + "data": { + "text/plain": [ + "(212, 5)" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Pkg.add(\"DataFrames\")\n", + "using DataFrames\n", + "\n", + "SNP_29C_trait = readtable(\n", + " \"SNP_29Ctraitdata.csv\"; header = false,\n", + " separator = ',',\n", + " names = [:FID; :IID; :SEX; :Trait1; :Trait2], \n", + " eltypes = [Int; Int; Int; Float64; Float64]\n", + " )\n", + "\n", + "size(SNP_29C_trait)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
FIDIIDSEXTrait1Trait2
1116230.205649.2421
218228235.8214315.27458
3117008136.0529819.50496
419218138.9635118.98575
513226233.7391121.10412
6229234.8883519.01142
722294137.7010519.16556
823416145.1317119.84088
9217893235.1559914.14228
1026952142.4513619.92713
11214695235.642617.41914
1226790140.6344123.6845
1323916234.8618316.86836
14339237.2552516.39703
1534521233.0975115.32872
1638366137.9160320.66932
17316693234.8389617.66506
18321688136.2021921.77733
19325532230.9314317.80424
20326294141.5898722.07719
21316795231.555217.79886
22317445135.594720.01312
2332039136.0480317.7814
2432831139.1821720.74701
25454234.733949.88673
2645072137.5677514.21644
27417240232.9781312.98087
28570240.2271621.07721
29524010141.7586824.1248
30521999145.4634727.55542
" + ], + "text/plain": [ + "212×5 DataFrames.DataFrame\n", + "│ Row │ FID │ IID │ SEX │ Trait1 │ Trait2 │\n", + "├─────┼───────┼───────┼─────┼─────────┼─────────┤\n", + "│ 1 │ 1 │ 16 │ 2 │ 30.2056 │ 9.2421 │\n", + "│ 2 │ 1 │ 8228 │ 2 │ 35.8214 │ 15.2746 │\n", + "│ 3 │ 1 │ 17008 │ 1 │ 36.053 │ 19.505 │\n", + "│ 4 │ 1 │ 9218 │ 1 │ 38.9635 │ 18.9857 │\n", + "│ 5 │ 1 │ 3226 │ 2 │ 33.7391 │ 21.1041 │\n", + "│ 6 │ 2 │ 29 │ 2 │ 34.8884 │ 19.0114 │\n", + "│ 7 │ 2 │ 2294 │ 1 │ 37.7011 │ 19.1656 │\n", + "│ 8 │ 2 │ 3416 │ 1 │ 45.1317 │ 19.8409 │\n", + "│ 9 │ 2 │ 17893 │ 2 │ 35.156 │ 14.1423 │\n", + "│ 10 │ 2 │ 6952 │ 1 │ 42.4514 │ 19.9271 │\n", + "│ 11 │ 2 │ 14695 │ 2 │ 35.6426 │ 17.4191 │\n", + "⋮\n", + "│ 201 │ 31 │ 9277 │ 1 │ 40.0522 │ 21.5122 │\n", + "│ 202 │ 31 │ 16139 │ 1 │ 39.3161 │ 24.8508 │\n", + "│ 203 │ 31 │ 10439 │ 1 │ 41.7913 │ 22.5294 │\n", + "│ 204 │ 31 │ 63 │ 2 │ 36.3301 │ 17.0813 │\n", + "│ 205 │ 10006 │ 66 │ 1 │ 42.9442 │ 17.1984 │\n", + "│ 206 │ 10008 │ 92 │ 1 │ 39.8927 │ 20.9043 │\n", + "│ 207 │ 10014 │ 186 │ 1 │ 42.5795 │ 15.9365 │\n", + "│ 208 │ 10027 │ 374 │ 1 │ 47.8619 │ 19.8943 │\n", + "│ 209 │ 10029 │ 434 │ 1 │ 41.0531 │ 25.1045 │\n", + "│ 210 │ 10033 │ 333 │ 1 │ 39.9502 │ 19.7227 │\n", + "│ 211 │ 10040 │ 234 │ 1 │ 35.4778 │ 21.935 │\n", + "│ 212 │ 10045 │ 789 │ 1 │ 44.3932 │ 26.1222 │" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "SNP_29C_trait" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Univariate Phenotype Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For both of the phenotypic traits, first check summary statistics and distribution-patterns to ensure the proper statistical methods for our analysis. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!!! note\n", + "\n", + " With a small sample size, it is important to look for any potential outliers or influential observations as they may influence your results greatly. Additionally, it is important to consider the small sample size when looking at histograms. " + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Trait1\n", + "Summary Stats:\n", + "Mean: 37.860176\n", + "Minimum: 29.240310\n", + "1st Quartile: 34.694473\n", + "Median: 37.653830\n", + "3rd Quartile: 41.589585\n", + "Maximum: 47.861930\n", + "Length: 212\n", + "Type: Float64\n", + "Number Missing: 0\n", + "% Missing: 0.000000\n", + "\n", + "Trait2\n", + "Summary Stats:\n", + "Mean: 18.471994\n", + "Minimum: 9.242100\n", + "1st Quartile: 15.769725\n", + "Median: 18.534045\n", + "3rd Quartile: 20.840625\n", + "Maximum: 27.555420\n", + "Length: 212\n", + "Type: Float64\n", + "Number Missing: 0\n", + "% Missing: 0.000000\n", + "\n" + ] + } + ], + "source": [ + "describe(SNP_29C_trait[:, 4:end])" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Y = convert(Matrix{Float64}, SNP_29C_trait[:, 4:end])\n", + "histogram(Y, layout = 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pre-processing data for heritability analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "### Inverse-Normal Transformation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, we consider a Rank-Based Inverse Normal Blom Transformations to remedy non-Normal Trait Values." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "elapsed time: 0.06404246 seconds\n" + ] + }, + { + "data": { + "text/plain": [ + "0.06404246" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "using Rmath\n", + "\n", + "c = 3/8 # continuity correction c = 3/8\n", + "N = size(Y)[1]\n", + "\n", + "Yt = zeros(Matrix{Float64}(size(Y)[1],size(Y)[2]))\n", + "\n", + "tic()\n", + "for i in 1:size(Y)[2]\n", + " for j in 1:size(Y)[1]\n", + " value = (tiedrank(Y[:,i]) - c)/(N - 2c + 1)\n", + " qz = qnorm.(value)\n", + " Yt[j, i] = qz[j]\n", + " end\n", + "end\n", + "toc() " + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "histogram(Yt, layout = 2 )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To prepare variance component model fitting, we form an instance of `VarianceComponentVariate`. The variance components are $(2\\Phi, I)$." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "WARNING: deprecated syntax \"inner constructor VarianceComponentModel(...) around /Users/sarahji/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:41\".\n", + "Use \"VarianceComponentModel{T,M,BT,ΣT}(...) where {T,M,BT,ΣT}\" instead.\n", + "\n", + "WARNING: deprecated syntax \"inner constructor TwoVarCompModelRotate(...) around /Users/sarahji/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:91\".\n", + "Use \"TwoVarCompModelRotate{T,BT}(...) where {T,BT}\" instead.\n", + "\n", + "WARNING: deprecated syntax \"inner constructor VarianceComponentVariate(...) around /Users/sarahji/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:149\".\n", + "Use \"VarianceComponentVariate{T,M,YT,XT,VT}(...) where {T,M,YT,XT,VT}\" instead.\n", + "\n", + "WARNING: deprecated syntax \"inner constructor TwoVarCompVariateRotate(...) around /Users/sarahji/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:204\".\n", + "Use \"TwoVarCompVariateRotate{T,YT,XT}(...) where {T,YT,XT}\" instead.\n" + ] + }, + { + "data": { + "text/plain": [ + "3-element Array{Symbol,1}:\n", + " :Y\n", + " :X\n", + " :V" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "using VarianceComponentModels\n", + "using VarianceComponentModels\n", + "sex = convert(Vector{Float64}, SNP_29C_trait[:, :SEX])\n", + "sex = sex - 1\n", + "X = [ones(length(sex)) sex]\n", + "# form data as VarianceComponentVariate\n", + "SNP_29Cdata_emp = VarianceComponentVariate(Y, X, (2ΦGRM, eye(length(Y))))\n", + "fieldnames(SNP_29Cdata_emp)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before fitting the variance component model, we pre-compute the eigen-decomposition of $2\\Phi_{\\text{GRM}}$, the rotated responses, and the constant part in log-likelihood, and store them as a `TwoVarCompVariateRotate` instance, which is re-used in various variane component estimation procedures." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 0.843530 seconds (407.42 k allocations: 21.374 MiB, 1.71% gc time)\n" + ] + }, + { + "data": { + "text/plain": [ + "5-element Array{Symbol,1}:\n", + " :Yrot \n", + " :Xrot \n", + " :eigval \n", + " :eigvec \n", + " :logdetV2" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@time SNP_29Cdata_rotated_emp = TwoVarCompVariateRotate(SNP_29Cdata_emp);\n", + "fieldnames(SNP_29Cdata_rotated_emp)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Heritability of Single Traits" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Heritability via Empirical Kinship Matrix with Sex as a Covariate" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We use Fisher scoring algorithm to fit variance component model for each single trait." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Trait1\n", + "(σ2a[j], σ2e[j]) = (4.1558406725548895, 2.0745023385298738)\n", + "Trait2\n", + "(σ2a[j], σ2e[j]) = (4.462117090728428, 1.9981701701240648)\n", + "elapsed time: 2.855807793 seconds\n" + ] + }, + { + "data": { + "text/plain": [ + "2.855807793" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "using VarianceComponentModels\n", + "hST_emp = zeros(2)\n", + "# standard errors of estimated heritability\n", + "hST_se_emp = zeros(2)\n", + "# additive genetic effects\n", + "σ2a = zeros(2)\n", + "# enviromental effects\n", + "σ2e = zeros(2)\n", + "\n", + "tic()\n", + "for j in 1:2\n", + "println(names(SNP_29C_trait)[j + 3])\n", + "traitj_data = TwoVarCompVariateRotate(SNP_29Cdata_rotated_emp.Yrot[:, j], SNP_29Cdata_rotated_emp.Xrot, SNP_29Cdata_rotated_emp.eigval, SNP_29Cdata_rotated_emp.eigvec, SNP_29Cdata_rotated_emp.logdetV2)\n", + " # initialize model parameters\n", + "traitj_model = VarianceComponentModel(traitj_data)\n", + " # estimate variance components\n", + "_, _, _, Σcov, _, _ = mle_fs!(traitj_model, traitj_data; solver=:Ipopt, verbose=false)\n", + "σ2a[j] = traitj_model.Σ[1][1]\n", + "σ2e[j] = traitj_model.Σ[2][1]\n", + "@show σ2a[j], σ2e[j]\n", + "h, hse = heritability(traitj_model.Σ, Σcov)\n", + "hST_emp[j] = h[1]\n", + "hST_se_emp[j] = hse[1]\n", + "end\n", + "toc()" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2×2 Array{Float64,2}:\n", + " 0.667032 0.690699 \n", + " 0.0761146 0.0699609" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Heritability_GRM_sex = [hST_emp'; hST_se_emp']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Heritability via Theoretical Kinship Matrix with Sex as a Covariate" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "SNP_29Cdata_ped = VarianceComponentVariate(Y, X, (2ΦGRM_ped, eye(length(Y))))\n", + "SNP_29Cdata_rotated_ped = TwoVarCompVariateRotate(SNP_29Cdata_ped);" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Trait1\n", + "(σ2a[j], σ2e[j]) = (3.2750662710799565, 3.1569249886262725)\n", + "Trait2\n", + "(σ2a[j], σ2e[j]) = (3.008905077008311, 3.5727082358711293)\n", + "elapsed time: 0.211059102 seconds\n" + ] + }, + { + "data": { + "text/plain": [ + "0.211059102" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hST_peds = zeros(2)\n", + "# standard errors of estimated heritability\n", + "hST_se_peds = zeros(2)\n", + "# additive genetic effects\n", + "σ2a = zeros(2)\n", + "# enviromental effects\n", + "σ2e = zeros(2)\n", + "\n", + "tic()\n", + "for j in 1:2\n", + "println(names(SNP_29C_trait)[j + 3])\n", + "traitj_data = TwoVarCompVariateRotate(SNP_29Cdata_rotated_ped.Yrot[:, j], SNP_29Cdata_rotated_ped.Xrot, SNP_29Cdata_rotated_ped.eigval, SNP_29Cdata_rotated_ped.eigvec, SNP_29Cdata_rotated_ped.logdetV2)\n", + " # initialize model parameters\n", + "traitj_model = VarianceComponentModel(traitj_data)\n", + " # estimate variance components\n", + "_, _, _, Σcov, _, _ = mle_fs!(traitj_model, traitj_data; solver=:Ipopt, verbose=false)\n", + "σ2a[j] = traitj_model.Σ[1][1]\n", + "σ2e[j] = traitj_model.Σ[2][1]\n", + "@show σ2a[j], σ2e[j]\n", + "h, hse = heritability(traitj_model.Σ, Σcov)\n", + "hST_peds[j] = h[1]\n", + "hST_se_peds[j] = hse[1]\n", + "end\n", + "toc()" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2×2 Array{Float64,2}:\n", + " 0.509184 0.457168\n", + " 0.124119 0.137528" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Heritability_ped_sex = [hST_peds'; hST_se_peds']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pairwise traits\n", + "\n", + "Joint analysis of multiple traits is subject to intensive research recently. Following code snippet does joint analysis of all pairs of traits." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Joint Analysis via Empirical Kinship Matrix" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Trait1Trait2\n", + "This is Ipopt version 3.12.4, running with linear solver mumps.\n", + "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", + "\n", + "Number of nonzeros in equality constraint Jacobian...: 0\n", + "Number of nonzeros in inequality constraint Jacobian.: 0\n", + "Number of nonzeros in Lagrangian Hessian.............: 21\n", + "\n", + "Total number of variables............................: 6\n", + " variables with only lower bounds: 0\n", + " variables with lower and upper bounds: 0\n", + " variables with only upper bounds: 0\n", + "Total number of equality constraints.................: 0\n", + "Total number of inequality constraints...............: 0\n", + " inequality constraints with only lower bounds: 0\n", + " inequality constraints with lower and upper bounds: 0\n", + " inequality constraints with only upper bounds: 0\n", + "\n", + "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", + " 0 1.1400404e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0 \n", + " 5 9.5032051e+02 0.00e+00 7.24e-01 -11.0 8.90e-02 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 10 9.5030788e+02 0.00e+00 7.62e-03 -11.0 6.29e-04 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 15 9.5030788e+02 0.00e+00 2.47e-04 -11.0 2.13e-05 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 20 9.5030788e+02 0.00e+00 8.03e-06 -11.0 7.23e-07 - 1.00e+00 1.00e+00f 1 MaxSA\n", + " 25 9.5030788e+02 0.00e+00 2.63e-07 -11.0 2.47e-08 - 1.00e+00 1.00e+00f 1 MaxSA\n", + " 30 9.5030788e+02 0.00e+00 8.64e-09 -11.0 8.43e-10 - 1.00e+00 1.00e+00f 1 MaxS\n", + "\n", + "Number of Iterations....: 30\n", + "\n", + " (scaled) (unscaled)\n", + "Objective...............: 4.1443698228928793e+02 9.5030787995026230e+02\n", + "Dual infeasibility......: 8.6386134406442122e-09 1.9808421485798065e-08\n", + "Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00\n", + "Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00\n", + "Overall NLP error.......: 8.6386134406442122e-09 1.9808421485798065e-08\n", + "\n", + "\n", + "Number of objective function evaluations = 31\n", + "Number of objective gradient evaluations = 31\n", + "Number of equality constraint evaluations = 0\n", + "Number of inequality constraint evaluations = 0\n", + "Number of equality constraint Jacobian evaluations = 0\n", + "Number of inequality constraint Jacobian evaluations = 0\n", + "Number of Lagrangian Hessian evaluations = 30\n", + "Total CPU secs in IPOPT (w/o function evaluations) = 0.198\n", + "Total CPU secs in NLP function evaluations = 0.250\n", + "\n", + "EXIT: Optimal Solution Found.\n", + "(Σa[i, j], Σe[i, j]) = ([4.126 1.236; 1.236 4.43915], [2.09245 -0.0355116; -0.0355116 2.01215])\n", + "traitij_model.B = [40.893 20.4858; -6.62851 -4.40122]\n", + "elapsed time: 1.21899643 seconds\n" + ] + }, + { + "data": { + "text/plain": [ + "1.21899643" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# additive genetic effects (2x2 psd matrices) from bavariate trait analysis;\n", + "Σa = Array{Matrix{Float64}}(2, 2)\n", + "# environmental effects (2x2 psd matrices) from bavariate trait analysis;\n", + "Σe = Array{Matrix{Float64}}(2, 2)\n", + "\n", + "tic()\n", + "for i in 1:2\n", + " for j in (i+1):2\n", + " println(names(SNP_29C_trait)[i + 3], names(SNP_29C_trait)[j + 3])\n", + " # form data set for (trait1, trait2)\n", + " traitij_data = TwoVarCompVariateRotate(SNP_29Cdata_rotated_emp.Yrot[:, [i;j]], SNP_29Cdata_rotated_emp.Xrot, \n", + " SNP_29Cdata_rotated_emp.eigval, SNP_29Cdata_rotated_emp.eigvec, SNP_29Cdata_rotated_emp.logdetV2)\n", + " # initialize model parameters\n", + " traitij_model = VarianceComponentModel(traitij_data)\n", + " # estimate variance components\n", + " mle_fs!(traitij_model, traitij_data; solver=:Ipopt, verbose=true)\n", + " Σa[i, j] = traitij_model.Σ[1]\n", + " Σe[i, j] = traitij_model.Σ[2]\n", + " @show Σa[i, j], Σe[i, j]\n", + " @show traitij_model.B\n", + " end\n", + "end\n", + "toc()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Joint Analysis via Theoretical Kinship Matrix" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Trait1Trait2\n", + "(Σa_ped[i, j], Σe_ped[i, j]) = ([3.29155 0.868436; 0.868436 3.04316], [3.14546 0.427296; 0.427296 3.54909])\n", + "elapsed time: 0.192791778 seconds\n" + ] + }, + { + "data": { + "text/plain": [ + "0.192791778" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# additive genetic effects (2x2 psd matrices) from bavariate trait analysis;\n", + "Σa_ped = Array{Matrix{Float64}}(2, 2)\n", + "# environmental effects (2x2 psd matrices) from bavariate trait analysis;\n", + "Σe_ped = Array{Matrix{Float64}}(2, 2)\n", + "\n", + "tic()\n", + "for i in 1:2\n", + " for j in (i+1):2\n", + " println(names(SNP_29C_trait)[i + 3], names(SNP_29C_trait)[j + 3])\n", + " # form data set for (trait1, trait2)\n", + " traitij_data = TwoVarCompVariateRotate(SNP_29Cdata_rotated_ped.Yrot[:, [i;j]], SNP_29Cdata_rotated_ped.Xrot, \n", + " SNP_29Cdata_rotated_ped.eigval, SNP_29Cdata_rotated_ped.eigvec, SNP_29Cdata_rotated_ped.logdetV2)\n", + " # initialize model parameters\n", + " traitij_model = VarianceComponentModel(traitij_data)\n", + " # estimate variance components\n", + " mle_fs!(traitij_model, traitij_data; solver=:Ipopt, verbose=false)\n", + " Σa_ped[i, j] = traitij_model.Σ[1]\n", + " Σe_ped[i, j] = traitij_model.Σ[2]\n", + " @show Σa_ped[i, j], Σe_ped[i, j]\n", + " end\n", + "end\n", + "toc()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## N-joint traits analysis\n", + "\n", + "In some situations, such as studying the genetic covariance, we need to jointly analyze all N > 2 traits. \n", + "We first try the **Fisher scoring algorithm**. (Demo: If SNP_29C dataset had N > 2 traits.)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Joint Analysis via Empirical Kinship Matrix" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This is Ipopt version 3.12.4, running with linear solver mumps.\n", + "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", + "\n", + "Number of nonzeros in equality constraint Jacobian...: 0\n", + "Number of nonzeros in inequality constraint Jacobian.: 0\n", + "Number of nonzeros in Lagrangian Hessian.............: 21\n", + "\n", + "Total number of variables............................: 6\n", + " variables with only lower bounds: 0\n", + " variables with lower and upper bounds: 0\n", + " variables with only upper bounds: 0\n", + "Total number of equality constraints.................: 0\n", + "Total number of inequality constraints...............: 0\n", + " inequality constraints with only lower bounds: 0\n", + " inequality constraints with lower and upper bounds: 0\n", + " inequality constraints with only upper bounds: 0\n", + "\n", + "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", + " 0 1.1400404e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0 \n", + " 5 9.5032051e+02 0.00e+00 7.24e-01 -11.0 8.90e-02 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 10 9.5030788e+02 0.00e+00 7.62e-03 -11.0 6.29e-04 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 15 9.5030788e+02 0.00e+00 2.47e-04 -11.0 2.13e-05 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 20 9.5030788e+02 0.00e+00 8.03e-06 -11.0 7.23e-07 - 1.00e+00 1.00e+00f 1 MaxSA\n", + " 25 9.5030788e+02 0.00e+00 2.63e-07 -11.0 2.47e-08 - 1.00e+00 1.00e+00f 1 MaxSA\n", + " 30 9.5030788e+02 0.00e+00 8.64e-09 -11.0 8.43e-10 - 1.00e+00 1.00e+00f 1 MaxS\n", + "\n", + "Number of Iterations....: 30\n", + "\n", + " (scaled) (unscaled)\n", + "Objective...............: 4.1443698228928793e+02 9.5030787995026230e+02\n", + "Dual infeasibility......: 8.6386134406442122e-09 1.9808421485798065e-08\n", + "Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00\n", + "Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00\n", + "Overall NLP error.......: 8.6386134406442122e-09 1.9808421485798065e-08\n", + "\n", + "\n", + "Number of objective function evaluations = 31\n", + "Number of objective gradient evaluations = 31\n", + "Number of equality constraint evaluations = 0\n", + "Number of inequality constraint evaluations = 0\n", + "Number of equality constraint Jacobian evaluations = 0\n", + "Number of inequality constraint Jacobian evaluations = 0\n", + "Number of Lagrangian Hessian evaluations = 30\n", + "Total CPU secs in IPOPT (w/o function evaluations) = 0.011\n", + "Total CPU secs in NLP function evaluations = 0.127\n", + "\n", + "EXIT: Optimal Solution Found.\n", + " 0.156838 seconds (40.21 k allocations: 3.386 MiB)\n" + ] + }, + { + "data": { + "text/plain": [ + "(-950.3078799502623, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([40.893 20.4858; -6.62851 -4.40122], ([4.126 1.236; 1.236 4.43915], [2.09245 -0.0355116; -0.0355116 2.01215]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf), ([0.965563 0.984083; 0.984083 1.00285], [0.562486 0.563763; 0.563763 0.564966]), [0.932311 0.1833 … -0.056442 -0.00782437; 0.1833 0.968419 … -0.00782437 -0.0574465; … ; -0.056442 -0.00782437 … 0.317829 0.0321524; -0.00782437 -0.0574465 … 0.0321524 0.319187], [0.186128 0.194696; 0.319451 0.336197], [0.0346436 -0.0466923 0.0121335 -0.0180862; -0.0466923 0.102049 -0.0180862 0.0395286; 0.0121335 -0.0180862 0.0379067 -0.0517159; -0.0180862 0.0395286 -0.0517159 0.113029])" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# initialize model parameters\n", + "traitall_model_emp = VarianceComponentModel(SNP_29Cdata_rotated_emp)\n", + "# estimate variance components using Fisher scoring algorithm\n", + "@time mle_fs!(traitall_model_emp, SNP_29Cdata_rotated_emp; solver=:Ipopt, verbose=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the output we can see the Fisher scoring algorithm converged after ~30 iterations. Let's try the **MM algorithm**." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " MM Algorithm\n", + " Iter Objective \n", + "-------- -------------\n", + " 0 -1.901925e+05\n", + " 1 -1.114684e+03\n", + " 2 -1.014803e+03\n", + " 3 -9.803274e+02\n", + " 4 -9.688978e+02\n", + " 5 -9.645507e+02\n", + " 6 -9.622363e+02\n", + " 7 -9.605340e+02\n", + " 8 -9.590777e+02\n", + " 9 -9.577832e+02\n", + " 10 -9.566347e+02\n", + " 20 -9.512965e+02\n", + " 30 -9.504622e+02\n", + " 40 -9.503336e+02\n", + " 50 -9.503123e+02\n", + " 60 -9.503087e+02\n", + " 70 -9.503080e+02\n", + "\n", + " 0.523347 seconds (247.33 k allocations: 12.507 MiB)\n" + ] + }, + { + "data": { + "text/plain": [ + "(-950.3079289938598, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([40.8929 20.4857; -6.6283 -4.40116], ([4.11795 1.23231; 1.23231 4.43644], [2.0972 -0.0334839; -0.0334839 2.01363]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf), ([0.96503 0.98374; 0.98374 1.00269], [0.562887 0.564037; 0.564037 0.565109]), [0.931283 0.18302 … -0.056492 -0.00784266; 0.18302 0.967744 … -0.00784266 -0.057492; … ; -0.056492 -0.00784266 … 0.318138 0.0322739; -0.00784266 -0.057492 … 0.0322739 0.319348], [0.186079 0.194854; 0.319264 0.336419], [0.0346255 -0.0466375 0.0121489 -0.018098; -0.0466375 0.101929 -0.018098 0.0395544; 0.0121489 -0.018098 0.0379679 -0.0517843; -0.018098 0.0395544 -0.0517843 0.113178])" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# reset model parameters\n", + "traitall_model_emp = VarianceComponentModel(SNP_29Cdata_rotated_emp)\n", + "# estimate variance components using Fisher scoring algorithm\n", + "@time mle_mm!(traitall_model_emp, SNP_29Cdata_rotated_emp; verbose=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It converges after ~70 iterations." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Joint Analysis via Theoretical Kinship Matrix" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This is Ipopt version 3.12.4, running with linear solver mumps.\n", + "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", + "\n", + "Number of nonzeros in equality constraint Jacobian...: 0\n", + "Number of nonzeros in inequality constraint Jacobian.: 0\n", + "Number of nonzeros in Lagrangian Hessian.............: 21\n", + "\n", + "Total number of variables............................: 6\n", + " variables with only lower bounds: 0\n", + " variables with lower and upper bounds: 0\n", + " variables with only upper bounds: 0\n", + "Total number of equality constraints.................: 0\n", + "Total number of inequality constraints...............: 0\n", + " inequality constraints with only lower bounds: 0\n", + " inequality constraints with lower and upper bounds: 0\n", + " inequality constraints with only upper bounds: 0\n", + "\n", + "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", + " 0 1.2104252e+03 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0 \n", + " 5 9.7597348e+02 0.00e+00 1.48e-01 -11.0 1.35e-02 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 10 9.7597181e+02 0.00e+00 4.20e-03 -11.0 5.82e-04 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 15 9.7597180e+02 0.00e+00 1.25e-04 -11.0 2.86e-05 - 1.00e+00 1.00e+00f 1 MaxS\n", + " 20 9.7597180e+02 0.00e+00 7.79e-06 -11.0 1.57e-06 - 1.00e+00 1.00e+00f 1 MaxSA\n", + " 25 9.7597180e+02 0.00e+00 7.03e-07 -11.0 9.33e-08 - 1.00e+00 1.00e+00f 1 MaxSA\n", + " 30 9.7597180e+02 0.00e+00 5.17e-08 -11.0 5.79e-09 - 1.00e+00 1.00e+00f 1 MaxSA\n", + "\n", + "Number of Iterations....: 34\n", + "\n", + " (scaled) (unscaled)\n", + "Objective...............: 3.2912163183208270e+02 9.7597180429975811e+02\n", + "Dual infeasibility......: 6.0764530569090382e-09 1.8019012669212817e-08\n", + "Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00\n", + "Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00\n", + "Overall NLP error.......: 6.0764530569090382e-09 1.8019012669212817e-08\n", + "\n", + "\n", + "Number of objective function evaluations = 35\n", + "Number of objective gradient evaluations = 35\n", + "Number of equality constraint evaluations = 0\n", + "Number of inequality constraint evaluations = 0\n", + "Number of equality constraint Jacobian evaluations = 0\n", + "Number of inequality constraint Jacobian evaluations = 0\n", + "Number of Lagrangian Hessian evaluations = 34\n", + "Total CPU secs in IPOPT (w/o function evaluations) = 0.012\n", + "Total CPU secs in NLP function evaluations = 0.143\n", + "\n", + "EXIT: Optimal Solution Found.\n", + " 0.169836 seconds (45.27 k allocations: 3.810 MiB)\n" + ] + }, + { + "data": { + "text/plain": [ + "(-975.9718042997582, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([40.8046 20.2384; -6.31975 -4.36967], ([3.29155 0.868436; 0.868436 3.04316], [3.14546 0.427296; 0.427296 3.54909]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf), ([1.01209 1.01438; 1.01438 1.01625], [0.732787 0.754108; 0.754108 0.775871]), [1.02433 0.218799 … -0.107296 -0.0211344; 0.218799 1.02896 … -0.0211344 -0.110622; … ; -0.107296 -0.0211344 … 0.568679 0.102172; -0.0211344 -0.110622 … 0.102172 0.601976], [0.294843 0.248448; 0.354237 0.327564], [0.0869326 -0.056033 0.0346471 -0.0167951; -0.056033 0.125484 -0.0167951 0.0385817; 0.0346471 -0.0167951 0.0617266 -0.048602; -0.0167951 0.0385817 -0.048602 0.107298])" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# initialize model parameters\n", + "traitall_model_ped = VarianceComponentModel(SNP_29Cdata_rotated_ped)\n", + "# estimate variance components using Fisher scoring algorithm\n", + "@time mle_fs!(traitall_model_ped, SNP_29Cdata_rotated_ped; solver=:Ipopt, verbose=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the output we can see the Fisher scoring algorithm converged after ~35 iterations. Let's try the **MM algorithm**." + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " MM Algorithm\n", + " Iter Objective \n", + "-------- -------------\n", + " 0 -4.886585e+04\n", + " 1 -1.086822e+03\n", + " 2 -1.009878e+03\n", + " 3 -9.856915e+02\n", + " 4 -9.787536e+02\n", + " 5 -9.768659e+02\n", + " 6 -9.763558e+02\n", + " 7 -9.762082e+02\n", + " 8 -9.761556e+02\n", + " 9 -9.761287e+02\n", + " 10 -9.761099e+02\n", + " 20 -9.760162e+02\n", + " 30 -9.759866e+02\n", + " 40 -9.759768e+02\n", + " 50 -9.759735e+02\n", + " 60 -9.759724e+02\n", + " 70 -9.759720e+02\n", + "\n", + " 0.210188 seconds (29.79 k allocations: 2.321 MiB, 6.45% gc time)\n" + ] + }, + { + "data": { + "text/plain": [ + "(-975.971885281387, VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}([40.8044 20.2382; -6.31953 -4.36959], ([3.30612 0.873603; 0.873603 3.04595], [3.1354 0.4238; 0.4238 3.54718]), Array{Float64}(0,4), Char[], Float64[], -Inf, Inf), ([1.01345 1.0152; 1.0152 1.01651], [0.732282 0.753807; 0.753807 0.775775]), [1.02709 0.219542 … -0.107393 -0.021154; 0.219542 1.03064 … -0.021154 -0.110695; … ; -0.107393 -0.021154 … 0.568225 0.102017; -0.021154 -0.110695 … 0.102017 0.601827], [0.295078 0.248066; 0.354281 0.327321], [0.0870707 -0.0560476 0.0344887 -0.0167139; -0.0560476 0.125515 -0.0167139 0.0383889; 0.0344887 -0.0167139 0.0615366 -0.0485389; -0.0167139 0.0383889 -0.0485389 0.107139])" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# reset model parameters\n", + "traitall_model_ped = VarianceComponentModel(SNP_29Cdata_rotated_ped)\n", + "# estimate variance components using Fisher scoring algorithm\n", + "@time mle_mm!(traitall_model_ped, SNP_29Cdata_rotated_ped; verbose=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "It converges after ~70 iterations." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Save analysis results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "using JLD\n", + "@save \"SNP_29C_results.jld\"\n", + "whos()" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Julia 0.6.0", + "language": "julia", + "name": "julia-0.6" + }, + "language_info": { + "file_extension": ".jl", + "mimetype": "application/julia", + "name": "julia", + "version": "0.6.0" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/docs/MendelianRandomization_VCM.ipynb b/docs/MendelianRandomization_VCM.ipynb new file mode 100644 index 0000000..2e7e8d1 --- /dev/null +++ b/docs/MendelianRandomization_VCM.ipynb @@ -0,0 +1,2189 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Mendelian Randomization (MR) Using Variance Component Models (VCM)" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "Authors: Sarah Ji, Jin Zhou, Janet Sinsheimer, Hua Zhou" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this notebook, we demonstrate in 4 examples how to conduct Mendelian randomization (MR) using variance components as a method of causal inference. We don't go into all the assumptions, caveats, extension, disadvantages or advantages of Mendelian randomization. For these aspects we refer the user to the book \"Mendelian Randomization: Methods for using Genetic Variants in Causal Estimation\" by Steven Burgess and Simon G. Thompson. The difference in our approach versus most examples of MR is that we are using family data and so we must also take into account the correlations among relatives in our analyses - thus we use variance component analysis (aka linear mixed modeling). We simulate the trait, exposure and have genotypes all on the family members. In principle the exposure IV relationship and the trait IV relationship can be examined in different samples but this requires additional assumptions. The notebook is organized as follows: \n", + "\n", + "We simulated data outside of this notebook using starting the Ped-GWAS data from example 28d of Mendel v16.0 which available as part of the Fortran Mendel package (available at http://www.genetics.ucla.edu/software/). \n", + "\n", + "Example 1: In this example we use MR-VCM with SNP rs11672206 simulated as a strong instrumental variable (IV) and simulated phenotype data. This first example provides the user with an example in which the genetic variant is a strong IV and there is a direct effect of the exposure on the trait. This example shows one scenario in which MR can be quite useful. \n", + "\n", + "Example 2: In this example we use MR-VCM with SNP rs11672206 again simulated as a strong IV and simulated phenotype data. In this second example there is still a strong IV but any association between the trait and the environmental predictor is due to confounding. This example shows another scenario in which MR can be quite useful. \n", + "\n", + "Example 3: In this final example we use MR-VCM with SNP rs7255584 as the IV. The actual IV is still rs11672206 but this time we simulate it to be a relatively weak IV. Furthermore we don't use rs11672206, we use a proxy IV rs192667878 that is in LD with the true IV. In this third example, we demonstrate that there can be considerable bias if the marker is a weak IV. This one of the problems that can occur in practice with MR. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2 Stage Regression Framework for Mendelian Randomization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First we introduce the module MRVC which calls the variance components model along with the other modules needed to run variance component models. " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPrecompiling module MathProgBase.\n", + "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPrecompiling module Ipopt.\n", + "\u001b[39m\n", + "WARNING: deprecated syntax \"inner constructor VarianceComponentModel(...) around /Users/janets/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:41\".\n", + "Use \"VarianceComponentModel{T,M,BT,ΣT}(...) where {T,M,BT,ΣT}\" instead.\n", + "\n", + "WARNING: deprecated syntax \"inner constructor TwoVarCompModelRotate(...) around /Users/janets/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:91\".\n", + "Use \"TwoVarCompModelRotate{T,BT}(...) where {T,BT}\" instead.\n", + "\n", + "WARNING: deprecated syntax \"inner constructor VarianceComponentVariate(...) around /Users/janets/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:149\".\n", + "Use \"VarianceComponentVariate{T,M,YT,XT,VT}(...) where {T,M,YT,XT,VT}\" instead.\n", + "\n", + "WARNING: deprecated syntax \"inner constructor TwoVarCompVariateRotate(...) around /Users/janets/.julia/v0.6/VarianceComponentModels/src/VarianceComponentModels.jl:204\".\n", + "Use \"TwoVarCompVariateRotate{T,YT,XT}(...) where {T,YT,XT}\" instead.\n", + "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPrecompiling module GLM.\n", + "\u001b[39m" + ] + }, + { + "data": { + "text/plain": [ + "MendelianRandomizationVCM" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "module MendelianRandomizationVCM\n", + "\n", + "export MendelianRandomization\n", + "using DataFrames\n", + "using StatsBase\n", + "using VarianceComponentModels\n", + "using GLM\n", + "using SnpArrays\n", + "using Distributions\n", + "\n", + "#function MR(IV,Expose,Trait)\n", + "function MendelianRandomization(Y::AbstractVecOrMat,\n", + " X::AbstractVecOrMat,\n", + " #IV::SnpArrays.SnpArray{2},\n", + " IV::AbstractVecOrMat,\n", + " exposure::AbstractVecOrMat,\n", + " GRM::AbstractMatrix,\n", + " algorithm::Symbol = :FS)\n", + "\n", + " T = eltype(X)\n", + " n = length(IV)\n", + "\n", + " ## fit null model without IV effects ##\n", + " nulldata = VarianceComponentVariate([Y exposure], X, (2GRM, eye(n)))\n", + " nulldatarot = TwoVarCompVariateRotate(nulldata)\n", + " nullmodel = VarianceComponentModel(nulldatarot)\n", + "\n", + " traitdata_null = TwoVarCompVariateRotate(nulldatarot.Yrot[:, 1], nulldatarot.Xrot,\n", + " nulldatarot.eigval, nulldatarot.eigvec, nulldatarot.logdetV2)\n", + " traitmodel_null = VarianceComponentModel(traitdata_null)\n", + "\n", + " if algorithm == :MM\n", + " logl, = mle_mm!(traitmodel_null, traitdata_null; verbose = false)\n", + " elseif algorithm == :FS\n", + " logl, = mle_fs!(traitmodel_null, traitdata_null; verbose = false)\n", + " end\n", + "\n", + " ### regress IV with trait\n", + " traitdata = TwoVarCompVariateRotate(nulldatarot.Yrot[:, 1],\n", + " [nulldatarot.Xrot At_mul_B(nulldatarot.eigvec, IV)],\n", + " nulldatarot.eigval, nulldatarot.eigvec, nulldatarot.logdetV2)\n", + " traitmodel = VarianceComponentModel(traitdata)\n", + " #Set the starting values for the maximum likelihood estimation\n", + " #Use the null model estimates as start values for the alternative model.\n", + " fill!(traitmodel.B, zero(T))\n", + " copy!(traitmodel.B, traitmodel_null.B)\n", + " if algorithm == :MM\n", + " _, _, _, _, traitBseMat, _= mle_mm!(traitmodel, traitdata; verbose = false)\n", + " elseif algorithm == :FS\n", + " _, _, _, _, traitBseMat, _ = mle_fs!(traitmodel, traitdata; verbose = false)\n", + " end\n", + " traitB = traitmodel.B[end]\n", + " traitBse = traitBseMat[end]\n", + "\n", + " ### regress IV with exposure\n", + " expdata = TwoVarCompVariateRotate(nulldatarot.Yrot[:, 2],\n", + " [nulldatarot.Xrot At_mul_B(nulldatarot.eigvec, IV)],\n", + " nulldatarot.eigval, nulldatarot.eigvec, nulldatarot.logdetV2)\n", + " expmodel = VarianceComponentModel(expdata)\n", + "\n", + " fill!(expmodel.B, zero(T))\n", + " copy!(expmodel.B, traitmodel_null.B)\n", + " if algorithm == :MM\n", + " _, _, _, _, expBseMat, _= mle_mm!(expmodel, expdata; verbose = false)\n", + " elseif algorithm == :FS\n", + " _, _, _, _, expBseMat, _= mle_fs!(expmodel, expdata; verbose = false)\n", + " end\n", + " expB = expmodel.B[end]\n", + " expBse = expBseMat[end]\n", + "\n", + " directbeta = traitB / expB\n", + " \tSEbeta = sqrt(traitBse^2/expB^2+traitB^2*expBse^2/expB^4)\n", + " W = (directbeta/SEbeta)^2\n", + " #change the degrees of freedom if running a bivariate outcome\n", + " pvalue = ccdf(Chisq(1), W)\n", + " println(\"MR direct effects (SE): \", directbeta, \"(\", SEbeta, \"), Pvalue is \", pvalue,\"\\n\")\n", + " println(\"Exposure effect (SE) is: \",expB,\"(\", expBse, \")\",\"\\n\")\n", + " println(\"Trait effect (SE) is: \", traitB,\"(\",traitBse,\")\",\"\\n\")\n", + " return directbeta, SEbeta\n", + " end\n", + "end # module" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "using MendelianRandomizationVCM" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exposure Effect on Trait using Variance Component Models " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we use the ExposureEffectVCM function which calls the variance component models module along with the other modules needed to run variance component models. \n", + " \n", + " Because we have similated the trait, exposure and genotypes on the same family members, we can also estimate the beta coefficient for the regression of trait on exposure adjusting for the correlation in family members trait values. " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "ExposureEffectVCM (generic function with 1 method)" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "using DataFrames\n", + "using StatsBase\n", + "using VarianceComponentModels\n", + "using GLM\n", + "using SnpArrays\n", + "using Distributions\n", + "\n", + "function ExposureEffectVCM(Y::AbstractVecOrMat,\n", + " X::AbstractVecOrMat,\n", + " exposure::AbstractVecOrMat,\n", + " GRM::AbstractMatrix)\n", + " \n", + " T = eltype(X)\n", + " n = length(exposure)\n", + "\n", + "nulldata = VarianceComponentVariate(outcome, X, (ΦGRM, eye(length(outcome))))\n", + " nulldatarot = TwoVarCompVariateRotate(nulldata)\n", + " nullmodel = VarianceComponentModel(nulldatarot)\n", + "\n", + "#regress trait~null\n", + " traitdata_null = TwoVarCompVariateRotate(nulldatarot.Yrot[:, 1], nulldatarot.Xrot,\n", + " nulldatarot.eigval, nulldatarot.eigvec, nulldatarot.logdetV2)\n", + " traitmodel_null = VarianceComponentModel(traitdata_null)\n", + "\n", + " # if algorithm == :MM\n", + " # logl, = mle_mm!(traitmodel_null, traitdata_null; verbose = false)\n", + " # elseif algorithm == :FS\n", + " logl, = mle_fs!(traitmodel_null, traitdata_null; verbose = false)\n", + " # end\n", + "\n", + "### regress trait~exposure lmm \n", + " traitdata = TwoVarCompVariateRotate(nulldatarot.Yrot[:, 1],\n", + " [nulldatarot.Xrot At_mul_B(nulldatarot.eigvec, exposure)],\n", + " nulldatarot.eigval, nulldatarot.eigvec, nulldatarot.logdetV2)\n", + " traitmodel = VarianceComponentModel(traitdata)\n", + "\n", + "#Set the starting values for the maximum likelihood estimation\n", + " #Use the null model estimates as start values for the alternative model.\n", + " fill!(traitmodel.B, zero(T))\n", + " copy!(traitmodel.B, traitmodel_null.B)\n", + " # if algorithm == :MM\n", + " # _, _, _, _, traitBseMat, _= mle_mm!(traitmodel, traitdata; verbose = false)\n", + " # elseif algorithm == :FS\n", + " _, _, _, _, traitBseMat, _ = mle_fs!(traitmodel, traitdata; verbose = false)\n", + " # end\n", + " traitBexp = traitmodel.B[end]\n", + " traitBexpSE = traitBseMat[end]\n", + "return traitBexp, traitBexpSE\n", + " end\n", + "#pval = ccdf(Chisq(1), (model_mle.B[2] / Σse[2][1])^2) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example 1: Fit MR-VCM with major locus, SNP rs11672206" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example we use the Ped28trait.out data file, simulated under the scenario where the genetic variant is a strong instrumental variable (IV) and there is a direct effect of the exposure on the trait." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data files\n", + "\n", + "We start from the following 3 files from [Mendel Option 28 (Trait Simulation) example](https://www.genetics.ucla.edu/software/Mendel_current_doc.pdf#page=279). Following shell commands assumes MacOS or Linux environment. Julia commands should run regardless of OS." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Take a look at the Pedigree file below. The columns are: :famid, :id, :moid, :faid, :sex, :twin, :exp1 :simtrait1 but in this file the labels are not present. If the user has data with headers then they can change the command to \n", + "Data= readcsv(\"filename.in\", Any; header = true)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212×9 Array{Any,2}:\n", + " 1 16 \" \" … \" \" 38.8938 90.1804 \"\"\n", + " 1 8228 \" \" \" \" 47.0274 109.327 \"\"\n", + " 1 17008 \" \" \" \" 44.1152 114.15 \"\"\n", + " 1 9218 17008 \" \" 43.0108 109.922 \"\"\n", + " 1 3226 9218 \" \" 42.4626 100.127 \"\"\n", + " 2 29 \" \" … \" \" 40.5792 95.7373 \"\"\n", + " 2 2294 \" \" \" \" 40.5459 108.733 \"\"\n", + " 2 3416 \" \" \" \" 46.5575 121.181 \"\"\n", + " 2 3916 2294 \" \" 43.2133 101.566 \"\"\n", + " 2 6790 2294 \" \" 45.1688 114.688 \"\"\n", + " 2 14695 2294 … \" \" 39.986 96.3424 \"\"\n", + " 2 17893 2294 \" \" 42.3638 99.7993 \"\"\n", + " 2 6952 3416 \" \" 50.7288 129.112 \"\"\n", + " ⋮ ⋱ ⋮ \n", + " 31 24385 23058 … \" \" 44.803 116.241 \"\"\n", + " 31 21403 27016 \" \" 38.4925 92.6177 \"\"\n", + " 31 1365 23058 \" \" 38.9816 91.6206 \"\"\n", + " 31 19392 26817 \" \" 36.289 89.2516 \"\"\n", + " 10006 66 \" \" \" \" 39.9046 110.54 \"\"\n", + " 10008 92 \" \" … \" \" 42.3719 113.432 \"\"\n", + " 10014 186 \" \" \" \" 46.6342 115.746 \"\"\n", + " 10027 374 \" \" \" \" 41.7128 104.298 \"\"\n", + " 10029 434 \" \" \" \" 44.5177 116.291 \"\"\n", + " 10033 333 \" \" \" \" 47.2557 119.839 \"\"\n", + " 10040 234 \" \" … \" \" 47.1437 114.207 \"\"\n", + " 10045 789 \" \" \" \" 45.9784 114.946 \"\"" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Ped28_1 = readcsv(\"Ped28trait1.out\", Any; header = false)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We put the exposure data are placed in the array exp_1" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 38.8938\n", + " 47.0274\n", + " 44.1152\n", + " 43.0108\n", + " 42.4626\n", + " 40.5792\n", + " 40.5459\n", + " 46.5575\n", + " 43.2133\n", + " 45.1688\n", + " 39.986 \n", + " 42.3638\n", + " 50.7288\n", + " ⋮ \n", + " 44.803 \n", + " 38.4925\n", + " 38.9816\n", + " 36.289 \n", + " 39.9046\n", + " 42.3719\n", + " 46.6342\n", + " 41.7128\n", + " 44.5177\n", + " 47.2557\n", + " 47.1437\n", + " 45.9784" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "exp_1 = convert(Array{Float64,1}, Ped28_1[:,7])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We don't need to retain the IDs so we retrieve the phenotype data and put them in an array Y_1." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 90.1804\n", + " 109.327 \n", + " 114.15 \n", + " 109.922 \n", + " 100.127 \n", + " 95.7373\n", + " 108.733 \n", + " 121.181 \n", + " 101.566 \n", + " 114.688 \n", + " 96.3424\n", + " 99.7993\n", + " 129.112 \n", + " ⋮ \n", + " 116.241 \n", + " 92.6177\n", + " 91.6206\n", + " 89.2516\n", + " 110.54 \n", + " 113.432 \n", + " 115.746 \n", + " 104.298 \n", + " 116.291 \n", + " 119.839 \n", + " 114.207 \n", + " 114.946 " + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Y_1 = convert(Array{Float64,1}, Ped28_1[:,8])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Retrieve sex data coded as 0 (male) or 1 (female) so male is the reference group." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " ⋮ \n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sex_1 = map(x -> strip(x) == \"F\"? 1.0 : 0.0, Ped28_1[:, 5])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Take a look at the first 10 lines of the SNP definition file. Notice the first two lines are included by default for formatting, and should be excluded from the analysis. " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 3.00 = FILE FORMAT VERSION NUMBER.\n", + " \n", + "rs3020701 ,19, 90974, 1, 2\n", + "rs56343121 ,19, 91106, 1, 2\n", + "rs143501051 ,19, 93542, 1, 2\n", + "rs56182540 ,19, 95981, 1, 2\n", + "rs7260412 ,19, 105021, 1, 2\n", + "rs11669393 ,19, 107866, 1, 2\n", + "rs181646587 ,19, 107894, 1, 2\n", + "rs8106297 ,19, 107958, 1, 2\n" + ] + } + ], + "source": [ + ";head SNP_def28trait1.out" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "253141×6 Array{Any,2}:\n", + " \"rs3020701 \" 19 90974 1 2 \"\"\n", + " \"rs56343121 \" 19 91106 1 2 \"\"\n", + " \"rs143501051 \" 19 93542 1 2 \"\"\n", + " \"rs56182540 \" 19 95981 1 2 \"\"\n", + " \"rs7260412 \" 19 105021 1 2 \"\"\n", + " \"rs11669393 \" 19 107866 1 2 \"\"\n", + " \"rs181646587 \" 19 107894 1 2 \"\"\n", + " \"rs8106297 \" 19 107958 1 2 \"\"\n", + " \"rs8106302 \" 19 107962 1 2 \"\"\n", + " \"rs183568620 \" 19 107987 1 2 \"\"\n", + " \"rs186451972 \" 19 108003 1 2 \"\"\n", + " \"rs189699222 \" 19 108032 1 2 \"\"\n", + " \"rs182902214 \" 19 108090 1 2 \"\"\n", + " ⋮ ⋮ \n", + " \"rs188169422 \" 19 59116080 1 2 \"\"\n", + " \"rs144587467 \" 19 59117729 1 2 \"\"\n", + " \"rs139879509 \" 19 59117949 1 2 \"\"\n", + " \"rs143250448 \" 19 59117982 1 2 \"\"\n", + " \"rs145384750 \" 19 59118028 1 2 \"\"\n", + " \"rs149215836 \" 19 59118040 1 2 \"\"\n", + " \"rs139221927 \" 19 59118044 1 2 \"\"\n", + " \"rs181848453 \" 19 59118114 1 2 \"\"\n", + " \"rs138318162 \" 19 59118148 1 2 \"\"\n", + " \"rs186913222 \" 19 59118616 1 2 \"\"\n", + " \"rs141816674 \" 19 59118779 1 2 \"\"\n", + " \"rs150801216 \" 19 59118783 1 2 \"\"" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snpdef28_1 = readcsv(\"SNP_def28trait1.out\", Any; skipstart = 2, header = false)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example we will analyze a single SNP, rs10412915, so we don't need the position of the snps just the SNP IDs so we just retrieve SNP ID not the bps." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "253141-element Array{AbstractString,1}:\n", + " \"rs3020701\" \n", + " \"rs56343121\" \n", + " \"rs143501051\"\n", + " \"rs56182540\" \n", + " \"rs7260412\" \n", + " \"rs11669393\" \n", + " \"rs181646587\"\n", + " \"rs8106297\" \n", + " \"rs8106302\" \n", + " \"rs183568620\"\n", + " \"rs186451972\"\n", + " \"rs189699222\"\n", + " \"rs182902214\"\n", + " ⋮ \n", + " \"rs188169422\"\n", + " \"rs144587467\"\n", + " \"rs139879509\"\n", + " \"rs143250448\"\n", + " \"rs145384750\"\n", + " \"rs149215836\"\n", + " \"rs139221927\"\n", + " \"rs181848453\"\n", + " \"rs138318162\"\n", + " \"rs186913222\"\n", + " \"rs141816674\"\n", + " \"rs150801216\"" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snpid = map(x -> strip(string(x)), snpdef28_1[:, 1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the SNP binary file using the SnpArray.jl package." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because SnpArray function requires input file name ending in .bed rather than .bin, we create a symbolic link SNP_data29a.bed to SNP_data29a.bin. (If you have trouble with getting this command to work on your computer you can copy the file outside of julia).\n", + "## NOTE:\n", + "It is VERY important that the SNP data have columns that are in the same order as in the snp definition file and that the rows are in the same order as in the pedigree file. If they are not you will get wrong answers but will not get an error message!" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ln: ./SNP_data28d_trait1.bed: File exists\n" + ] + } + ], + "source": [ + ";ln -s ./SNP_data28d_trait1.bin ./SNP_data28d_trait1.bed" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Control28bivariate.in\n", + "Control_exposure1.in\n", + "Control_exposure3.in\n", + "Control_trait1.in\n", + "Control_trait3.in\n", + "Def28e2.in\n", + "Def28exp1.out\n", + "Def28exp3.out\n", + "Def28exposure.in\n", + "Def28exptrait.out\n", + "Def28exptrait2.out\n", + "Def28trait.out\n", + "Def28trait1.out\n", + "Def28trait3.out\n", + "Mendel28exposure1.out\n", + "Mendel28exposure3.out\n", + "Mendel28exptrait2.out\n", + "Mendel28trait1.out\n", + "Mendel28trait3.out\n", + "MendelianRandomization_LMM.ipynb\n", + "MendelianRandomization_VCM_JanetSarah3152018.ipynb\n", + "MendelianRandomization_VCM_JanetSarahMarch6.ipynb\n", + "Ped28d.in\n", + "Ped28d2.in\n", + "Ped28exp1.out\n", + "Ped28exp3.out\n", + "Ped28exptrait.out\n", + "Ped28exptrait2.out\n", + "Ped28trait1.out\n", + "Ped28trait3.out\n", + "SNP_data28d.bin\n", + "SNP_data28d2.bin\n", + "SNP_data28d_exp1.bin\n", + "SNP_data28d_exp3.bin\n", + "SNP_data28d_trait1.bed\n", + "SNP_data28d_trait1.bin\n", + "SNP_data28d_trait3.bed\n", + "SNP_data28d_trait3.bin\n", + "SNP_data28exptrait.bin\n", + "SNP_data28exptrait2.bed\n", + "SNP_data28exptrait2.bin\n", + "SNP_def28d.in\n", + "SNP_def28d2.in\n", + "SNP_def28exp1.out\n", + "SNP_def28exp3.out\n", + "SNP_def28exptrait.out\n", + "SNP_def28exptrait2.out\n", + "SNP_def28trait1.out\n", + "SNP_def28trait3.out\n", + "Simulation28exp.in\n", + "Simulation28exp3.in\n", + "Simulation28test2.in\n", + "Simulation28trait.in\n", + "Simulation28traitexamp1.in\n", + "Simulation28traitexamp3.in\n", + "Summary28exposure1.out\n", + "Summary28exposure3.out\n", + "Summary28exptrait2.out\n", + "Summary28trait1.out\n", + "Summary28trait3.out\n", + "~$ntrol_trait3.in.txt\n" + ] + } + ], + "source": [ + ";ls" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mv1.0 BED file detected\n", + "\u001b[39m" + ] + }, + { + "data": { + "text/plain": [ + "212×253141 SnpArrays.SnpArray{2}:\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) … (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " ⋮ ⋱ ⋮ \n", + " (true, true) (false, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (false, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "using SnpArrays\n", + "snpbin28_1 = SnpArray(\"SNP_data28d_trait1\"; people = size(Ped28_1, 1), snps = size(snpdef28_1, 1))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Kinship via Genetic Relationship Matrix (GRM)\n", + "\n", + "Recall that in using variance components (linear mixed models) we need a measure of the relatedness among individuals. Under the GRM formulation, the estimate of the global kinship coefficient of individuals $i$ and $j$ is\n", + "$$ \\widehat\\Phi_{GRMij}^ = \\frac{1}{2S} \\sum_{k=1}^S \\frac{(x_{ik} -2p_k)(x_{jk} - 2p_k)}{2 p_k (1-p_k)}$$,\n", + "where $k$ ranges over the selected $S$ SNPs, $p_k$ is the minor allele frequency of SNP $k$, and $x_{ik}$ is the number of minor alleles in individual $i$s genotype at SNP $k$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Calculate the GRM matrix\n", + "\n", + "By default, `grm` excludes SNPs with maf < 0.01." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "212×212 Array{Float64,2}:\n", + " 0.488859 0.00422798 0.00978752 … 0.0187809 0.000152741\n", + " 0.00422798 0.515141 -0.0190676 -0.0220944 -0.0162381 \n", + " 0.00978752 -0.0190676 0.489156 -0.0143552 -0.00392193 \n", + " 0.241927 -0.00231651 0.266756 0.00264766 -0.00334162 \n", + " 0.124909 0.265109 0.12103 -0.0118053 -0.00641433 \n", + " -0.0121766 -0.000271784 -0.00316506 … -0.000629837 0.0103776 \n", + " -0.011988 0.00845464 -0.00950518 -0.0105435 0.000340285\n", + " -0.0134368 -0.00713175 -0.0101011 -0.0115942 -0.00288935 \n", + " -0.0234305 -0.00241126 -0.010413 0.000129339 0.0166282 \n", + " -0.0121471 0.00305141 -0.00754556 -0.0104162 0.00554306 \n", + " -0.0128464 0.010258 0.00110475 … -0.00424998 0.0124296 \n", + " -0.0111854 0.00423488 -0.0117065 -0.0244977 -0.00698659 \n", + " -0.0124607 -0.00575043 -0.0124951 -0.00573017 -0.0103429 \n", + " ⋮ ⋱ ⋮ \n", + " -0.0170331 -0.0136246 -0.00834008 … -0.00761561 -0.012038 \n", + " -0.0100435 -0.00757905 -0.00879137 0.0151941 0.00464083 \n", + " -0.0119478 -0.011707 -0.0119966 -0.00531199 -0.0233153 \n", + " -0.00406636 0.00823632 0.00260479 0.00329892 0.00873774 \n", + " -0.00215359 0.0120041 -0.002889 -0.00294264 -0.00290955 \n", + " 0.00353341 -0.00200184 -0.00346206 … 0.0240445 0.0224584 \n", + " -0.0132247 -0.00736188 0.000399022 -0.0115962 0.00801023 \n", + " 0.0136135 0.014919 0.0146852 0.00611176 0.00984425 \n", + " -0.00763248 -0.00216137 0.0227918 0.00711085 -0.0159608 \n", + " 0.000154845 -0.0142528 0.0109832 0.00490078 0.00924588 \n", + " 0.0187809 -0.0220944 -0.0143552 … 0.556933 0.0380902 \n", + " 0.000152741 -0.0162381 -0.00392193 0.0380902 0.502935 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Φgrm_snp28_1 = grm(snpbin28_1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We need to locate SNP rs11672206 and find that it is the 236079th SNP in the file. " + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "236079" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ind_rs11672206 = find(x -> x == \"rs11672206\", snpid)[1]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 0.0\n", + " 2.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " ⋮ \n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snp_rs11672206 = convert(Array{Float64,1}, snpbin28_1[:, ind_rs11672206])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example 1 MR when there is a direct effect of exposure on trait. " + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "******************************************************************************\n", + "This program contains Ipopt, a library for large-scale nonlinear optimization.\n", + " Ipopt is released as open source code under the Eclipse Public License (EPL).\n", + " For more information visit http://projects.coin-or.org/Ipopt\n", + "******************************************************************************\n", + "\n", + "MR direct effects (SE): 1.9758033020896233(0.4044635920194066), Pvalue is 1.0343059935417672e-6\n", + "\n", + "Exposure effect (SE) is: 2.5214901400747523(0.3269495431167835)\n", + "\n", + "Trait effect (SE) is: 4.981968544946122(0.7891739351684997)\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "(1.9758033020896233, 0.4044635920194066)" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "###### First Example MR####\n", + "MendelianRandomization(Y_1, [ones(length(Y_1), 1) sex_1], snp_rs11672206, exp_1, Φgrm_snp28_1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, The mean exposure level is 40, males have mean value of 43 and females have a mean value of 37, the effect of the IV on the exposure was simulated increase the exposure level 2.5 units per 2 allele and the heritability of the exposure is 67%. The exposure level is estimated to be 2.52 sd = 0.79, which is very close to this value. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We also simulated a direct effect of the exposure on the trait of 2 units per unit of exposure. The MR direct effect is very close to this value (1.96 sd = 0.40) and leads to inference of a direct effect of exposure on the trait. In this case we conduct the regression of trait on exposure we get a very similar result mean = 1.95. We can see however that the MR estimate has a bigger sd - this is due to estimating the value as a two step procedure and so one might think the regression of trait on exposure is preferable. Although it is in this case, it is not when there is confounding (example 2)." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(1.9471633422983434, 0.059848806495791294)" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "###### First Example Exposure Effect on Trait using VCM ####\n", + "X = [ones(length(Y_1), 1) sex_1]\n", + "exposure = exp_1\n", + "outcome = Y_1\n", + "ΦGRM = Φgrm_snp28_1\n", + "ExposureEffectVCM(outcome, X, exposure, ΦGRM)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Example 2: Confounding - No Direct Effect of Exposure on Trait." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example we use the Ped28exptrait2.out data file, simulated under the scenario where there is still a strong IV but any association between the trait and the environmental predictor is due to confounding. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Mendel Option 28 data\n", + "\n", + "Take a look at the pedigree file.\n", + "columns are: :famid, :id, :moid, :faid, :sex, :twin, :exp2, :simtrait2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the pedigree file. This file is in the classic Mendel format, Family Id, Person ID, Father ID, Mother Id, sex as F (female) or M (male), monozygotic twin indicator, exposure 2 and simtrait 2. " + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212×9 Array{Any,2}:\n", + " 1 16 \" \" … \" \" 32.1022 14.7783 \"\"\n", + " 1 8228 \" \" \" \" 32.0522 16.6838 \"\"\n", + " 1 17008 \" \" \" \" 45.3005 21.6456 \"\"\n", + " 1 9218 17008 \" \" 43.4727 24.2119 \"\"\n", + " 1 3226 9218 \" \" 37.7578 20.511 \"\"\n", + " 2 29 \" \" … \" \" 36.9209 17.0922 \"\"\n", + " 2 2294 \" \" \" \" 43.3864 20.414 \"\"\n", + " 2 3416 \" \" \" \" 45.7102 23.3037 \"\"\n", + " 2 17893 2294 \" \" 33.4674 15.3641 \"\"\n", + " 2 6952 3416 \" \" 39.5669 21.3321 \"\"\n", + " 2 14695 2294 … \" \" 30.7948 16.3268 \"\"\n", + " 2 6790 2294 \" \" 38.5657 22.479 \"\"\n", + " 2 3916 2294 \" \" 34.3997 17.2739 \"\"\n", + " ⋮ ⋱ ⋮ \n", + " 31 9277 4257 … \" \" 43.8188 20.6551 \"\"\n", + " 31 16139 4257 \" \" 38.916 17.9165 \"\"\n", + " 31 10439 4257 \" \" 40.7922 23.3292 \"\"\n", + " 31 63 17673 \" \" 34.847 13.2469 \"\"\n", + " 10006 66 \" \" \" \" 41.9567 23.4621 \"\"\n", + " 10008 92 \" \" … \" \" 39.7982 20.5745 \"\"\n", + " 10014 186 \" \" \" \" 39.3295 20.8389 \"\"\n", + " 10027 374 \" \" \" \" 38.8928 16.6698 \"\"\n", + " 10029 434 \" \" \" \" 46.2767 26.1484 \"\"\n", + " 10033 333 \" \" \" \" 42.8913 23.4916 \"\"\n", + " 10040 234 \" \" … \" \" 44.5806 23.3856 \"\"\n", + " 10045 789 \" \" \" \" 38.9998 22.224 \"\"" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# columns are: :famid, :id, :moid, :faid, :sex, :twin, :simtrait1, :simtrait2\n", + "ped28_2 = readcsv(\"Ped28exptrait2.out\", Any; header = false)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We don't need to retain the ids so we retrieve the phenotype data and put them in Y_2 and exp_2, to be used as the respective outcome and exposure in example 2." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212×2 Array{Float64,2}:\n", + " 32.1022 14.7783\n", + " 32.0522 16.6838\n", + " 45.3005 21.6456\n", + " 43.4727 24.2119\n", + " 37.7578 20.511 \n", + " 36.9209 17.0922\n", + " 43.3864 20.414 \n", + " 45.7102 23.3037\n", + " 33.4674 15.3641\n", + " 39.5669 21.3321\n", + " 30.7948 16.3268\n", + " 38.5657 22.479 \n", + " 34.3997 17.2739\n", + " ⋮ \n", + " 43.8188 20.6551\n", + " 38.916 17.9165\n", + " 40.7922 23.3292\n", + " 34.847 13.2469\n", + " 41.9567 23.4621\n", + " 39.7982 20.5745\n", + " 39.3295 20.8389\n", + " 38.8928 16.6698\n", + " 46.2767 26.1484\n", + " 42.8913 23.4916\n", + " 44.5806 23.3856\n", + " 38.9998 22.224 " + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "exp_2 = convert(Vector{Float64}, ped28_2[:, 7])\n", + "Y_2 = convert(Vector{Float64}, ped28_2[:, 8])\n", + "Exposure_Trait2 = [exp_2 Y_2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Retrieve sex data coded as 0 (male) or 1 (female) so male is the reference group." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " ⋮ \n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sex_2 = map(x -> strip(x) == \"F\"? 1.0 : 0.0, ped28_2[:, 5])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the SNP definition file, skipping the first 2 lines." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "253141×6 Array{Any,2}:\n", + " \"rs3020701 \" 19 90974 1 2 \"\"\n", + " \"rs56343121 \" 19 91106 1 2 \"\"\n", + " \"rs143501051 \" 19 93542 1 2 \"\"\n", + " \"rs56182540 \" 19 95981 1 2 \"\"\n", + " \"rs7260412 \" 19 105021 1 2 \"\"\n", + " \"rs11669393 \" 19 107866 1 2 \"\"\n", + " \"rs181646587 \" 19 107894 1 2 \"\"\n", + " \"rs8106297 \" 19 107958 1 2 \"\"\n", + " \"rs8106302 \" 19 107962 1 2 \"\"\n", + " \"rs183568620 \" 19 107987 1 2 \"\"\n", + " \"rs186451972 \" 19 108003 1 2 \"\"\n", + " \"rs189699222 \" 19 108032 1 2 \"\"\n", + " \"rs182902214 \" 19 108090 1 2 \"\"\n", + " ⋮ ⋮ \n", + " \"rs188169422 \" 19 59116080 1 2 \"\"\n", + " \"rs144587467 \" 19 59117729 1 2 \"\"\n", + " \"rs139879509 \" 19 59117949 1 2 \"\"\n", + " \"rs143250448 \" 19 59117982 1 2 \"\"\n", + " \"rs145384750 \" 19 59118028 1 2 \"\"\n", + " \"rs149215836 \" 19 59118040 1 2 \"\"\n", + " \"rs139221927 \" 19 59118044 1 2 \"\"\n", + " \"rs181848453 \" 19 59118114 1 2 \"\"\n", + " \"rs138318162 \" 19 59118148 1 2 \"\"\n", + " \"rs186913222 \" 19 59118616 1 2 \"\"\n", + " \"rs141816674 \" 19 59118779 1 2 \"\"\n", + " \"rs150801216 \" 19 59118783 1 2 \"\"" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# columns are: :snpid, :chrom, :pos, :allele1, :allele2, :groupname\n", + "snpdef28_2 = readcsv(\"SNP_def28exptrait2.out\", Any; skipstart = 2, header = false)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will be analyzing a single SNP (rather than taking into account LD) so we don't need the position of the snps just the SNP IDs so we retrieve SNP IDs." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "253141-element Array{AbstractString,1}:\n", + " \"rs3020701\" \n", + " \"rs56343121\" \n", + " \"rs143501051\"\n", + " \"rs56182540\" \n", + " \"rs7260412\" \n", + " \"rs11669393\" \n", + " \"rs181646587\"\n", + " \"rs8106297\" \n", + " \"rs8106302\" \n", + " \"rs183568620\"\n", + " \"rs186451972\"\n", + " \"rs189699222\"\n", + " \"rs182902214\"\n", + " ⋮ \n", + " \"rs188169422\"\n", + " \"rs144587467\"\n", + " \"rs139879509\"\n", + " \"rs143250448\"\n", + " \"rs145384750\"\n", + " \"rs149215836\"\n", + " \"rs139221927\"\n", + " \"rs181848453\"\n", + " \"rs138318162\"\n", + " \"rs186913222\"\n", + " \"rs141816674\"\n", + " \"rs150801216\"" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snpid_2 = map(x -> strip(string(x)), snpdef28_2[:, 1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the SNP binary file using the SnpArray.jl package. \n", + "#### Again it is very important to use the bed file that corresponds to the fam and snp definition file. " + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ln: ./SNP_data28exptrait2.bed: File exists\n" + ] + } + ], + "source": [ + ";ln -s ./SNP_data28exptrait2.bin ./SNP_data28exptrait2.bed" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mv1.0 BED file detected\n", + "\u001b[39m" + ] + }, + { + "data": { + "text/plain": [ + "212×253141 SnpArrays.SnpArray{2}:\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (false, true) … (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " ⋮ ⋱ ⋮ \n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (false, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snpbin28_2 = SnpArray(\"SNP_data28exptrait2.bed\"; people = size(ped28_2, 1), snps = size(snpdef28_2, 1))" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212×212 Array{Float64,2}:\n", + " 0.488859 0.00422798 0.00978752 … 0.0187809 0.000152741\n", + " 0.00422798 0.515141 -0.0190676 -0.0220944 -0.0162381 \n", + " 0.00978752 -0.0190676 0.489156 -0.0143552 -0.00392193 \n", + " 0.241927 -0.00231651 0.266756 0.00264766 -0.00334162 \n", + " 0.124909 0.265109 0.12103 -0.0118053 -0.00641433 \n", + " -0.0121766 -0.000271784 -0.00316506 … -0.000629837 0.0103776 \n", + " -0.011988 0.00845464 -0.00950518 -0.0105435 0.000340285\n", + " -0.0134368 -0.00713175 -0.0101011 -0.0115942 -0.00288935 \n", + " -0.0111854 0.00423488 -0.0117065 -0.0244977 -0.00698659 \n", + " -0.0124607 -0.00575043 -0.0124951 -0.00573017 -0.0103429 \n", + " -0.0128464 0.010258 0.00110475 … -0.00424998 0.0124296 \n", + " -0.0121471 0.00305141 -0.00754556 -0.0104162 0.00554306 \n", + " -0.0234305 -0.00241126 -0.010413 0.000129339 0.0166282 \n", + " ⋮ ⋱ ⋮ \n", + " 0.000390703 -0.00829937 -0.0134343 … 0.00203895 -0.0193534 \n", + " 0.00338078 0.00116913 -0.0137822 -0.000709205 0.00437081 \n", + " -0.0131106 -0.0246602 -0.00377814 -0.0167851 -0.0115794 \n", + " -0.016104 -0.014253 -0.0132132 -0.023015 0.000196427\n", + " -0.00215359 0.0120041 -0.002889 -0.00294264 -0.00290955 \n", + " 0.00353341 -0.00200184 -0.00346206 … 0.0240445 0.0224584 \n", + " -0.0132247 -0.00736188 0.000399022 -0.0115962 0.00801023 \n", + " 0.0136135 0.014919 0.0146852 0.00611176 0.00984425 \n", + " -0.00763248 -0.00216137 0.0227918 0.00711085 -0.0159608 \n", + " 0.000154845 -0.0142528 0.0109832 0.00490078 0.00924588 \n", + " 0.0187809 -0.0220944 -0.0143552 … 0.556933 0.0380902 \n", + " 0.000152741 -0.0162381 -0.00392193 0.0380902 0.502935 " + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Φgrm_snp28_2 = grm(snpbin28_2)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "236079" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ind_rs11672206_2 = find(x -> x == \"rs11672206\", snpid_2)[1]" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 0.0\n", + " 2.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " ⋮ \n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snp_rs11672206_2 = convert(Array{Float64,1}, snpbin28_2[:, ind_rs11672206_2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example 2 Mendelian Randomization VCM:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MR direct effects (SE): 0.24374813445595547(0.1523385536136043), Pvalue is 0.10958919289246363\n", + "\n", + "Exposure effect (SE) is: -1.8980279594308567(0.25670960108586244)\n", + "\n", + "Trait effect (SE) is: -0.46264077425651523(0.2822910952147574)\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "(0.24374813445595547, 0.1523385536136043)" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Example 2\n", + "MendelianRandomization(Y_2, [ones(length(Y_2), 1) sex_2], snp_rs11672206_2, exp_2, Φgrm_snp28_2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Again we simulated the effect of the IV on exposure to be strong. In this case, \n", + "each 2 allele of the SNP reduces the exposure by -1.5 units. We simulated the trait and exposure so that they are correlated but due to unmeasured confounders not due to a direct effect. \n" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0.4994751780732781, 0.05399663242249682)" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "###### Second Example Exposure Effect on Trait using VCM ####\n", + "X = [ones(length(Y_2), 1) sex_2]\n", + "exposure = exp_2\n", + "outcome = Y_2\n", + "ΦGRM = Φgrm_snp28_2\n", + "ExposureEffectVCM(outcome, X, exposure, ΦGRM)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Regressing trait on exposure in this case, would lead a research to think that the exposure effects the trait but in fact, from the MR analysis we can see there is little evidence of a direct effect." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 3rd Example: Direct effect of exposure on trait exists but the IV is weak. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example we use the Ped28trait3.out data file, simulated under the scenario where the genetic marker is a weak IV to demonstrate the problem of violating one of the essential assumptions of MR." + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212×9 Array{Any,2}:\n", + " 1 16 \" \" … \" \" 40.1878 214.434 \"\"\n", + " 1 8228 \" \" \" \" 47.0623 250.469 \"\"\n", + " 1 17008 \" \" \" \" 44.8772 249.917 \"\"\n", + " 1 9218 17008 \" \" 41.4107 231.419 \"\"\n", + " 1 3226 9218 \" \" 40.3835 217.034 \"\"\n", + " 2 29 \" \" … \" \" 39.6165 212.84 \"\"\n", + " 2 2294 \" \" \" \" 38.8691 220.87 \"\"\n", + " 2 3416 \" \" \" \" 48.9882 271.711 \"\"\n", + " 2 3916 2294 \" \" 45.129 240.725 \"\"\n", + " 2 6790 2294 \" \" 43.2185 240.717 \"\"\n", + " 2 14695 2294 … \" \" 38.846 210.021 \"\"\n", + " 2 17893 2294 \" \" 43.9826 234.954 \"\"\n", + " 2 6952 3416 \" \" 50.7849 280.457 \"\"\n", + " ⋮ ⋱ ⋮ \n", + " 31 24385 23058 … \" \" 41.1275 231.582 \"\"\n", + " 31 21403 27016 \" \" 38.5637 208.184 \"\"\n", + " 31 1365 23058 \" \" 35.6761 192.605 \"\"\n", + " 31 19392 26817 \" \" 32.7235 179.584 \"\"\n", + " 10006 66 \" \" \" \" 37.7896 217.257 \"\"\n", + " 10008 92 \" \" … \" \" 38.5346 219.802 \"\"\n", + " 10014 186 \" \" \" \" 45.7092 252.089 \"\"\n", + " 10027 374 \" \" \" \" 37.4253 209.743 \"\"\n", + " 10029 434 \" \" \" \" 45.5548 254.076 \"\"\n", + " 10033 333 \" \" \" \" 46.7553 258.965 \"\"\n", + " 10040 234 \" \" … \" \" 49.9749 271.941 \"\"\n", + " 10045 789 \" \" \" \" 44.6052 246.865 \"\"" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ped28_3 = readcsv(\"Ped28trait3.out\", Any; header = false)" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 40.1878\n", + " 47.0623\n", + " 44.8772\n", + " 41.4107\n", + " 40.3835\n", + " 39.6165\n", + " 38.8691\n", + " 48.9882\n", + " 45.129 \n", + " 43.2185\n", + " 38.846 \n", + " 43.9826\n", + " 50.7849\n", + " ⋮ \n", + " 41.1275\n", + " 38.5637\n", + " 35.6761\n", + " 32.7235\n", + " 37.7896\n", + " 38.5346\n", + " 45.7092\n", + " 37.4253\n", + " 45.5548\n", + " 46.7553\n", + " 49.9749\n", + " 44.6052" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "exp_3 = convert(Array{Float64,1}, ped28_3[:, 7])" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 214.434\n", + " 250.469\n", + " 249.917\n", + " 231.419\n", + " 217.034\n", + " 212.84 \n", + " 220.87 \n", + " 271.711\n", + " 240.725\n", + " 240.717\n", + " 210.021\n", + " 234.954\n", + " 280.457\n", + " ⋮ \n", + " 231.582\n", + " 208.184\n", + " 192.605\n", + " 179.584\n", + " 217.257\n", + " 219.802\n", + " 252.089\n", + " 209.743\n", + " 254.076\n", + " 258.965\n", + " 271.941\n", + " 246.865" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Y_3 = convert(Array{Float64,1}, ped28_3[:, 8])" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " ⋮ \n", + " 0.0\n", + " 1.0\n", + " 1.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sex_3 = map(x -> strip(x) == \"F\"? 1.0 : 0.0, ped28_3[:, 5])" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "253141×6 Array{Any,2}:\n", + " \"rs3020701 \" 19 90974 1 2 \"\"\n", + " \"rs56343121 \" 19 91106 1 2 \"\"\n", + " \"rs143501051 \" 19 93542 1 2 \"\"\n", + " \"rs56182540 \" 19 95981 1 2 \"\"\n", + " \"rs7260412 \" 19 105021 1 2 \"\"\n", + " \"rs11669393 \" 19 107866 1 2 \"\"\n", + " \"rs181646587 \" 19 107894 1 2 \"\"\n", + " \"rs8106297 \" 19 107958 1 2 \"\"\n", + " \"rs8106302 \" 19 107962 1 2 \"\"\n", + " \"rs183568620 \" 19 107987 1 2 \"\"\n", + " \"rs186451972 \" 19 108003 1 2 \"\"\n", + " \"rs189699222 \" 19 108032 1 2 \"\"\n", + " \"rs182902214 \" 19 108090 1 2 \"\"\n", + " ⋮ ⋮ \n", + " \"rs188169422 \" 19 59116080 1 2 \"\"\n", + " \"rs144587467 \" 19 59117729 1 2 \"\"\n", + " \"rs139879509 \" 19 59117949 1 2 \"\"\n", + " \"rs143250448 \" 19 59117982 1 2 \"\"\n", + " \"rs145384750 \" 19 59118028 1 2 \"\"\n", + " \"rs149215836 \" 19 59118040 1 2 \"\"\n", + " \"rs139221927 \" 19 59118044 1 2 \"\"\n", + " \"rs181848453 \" 19 59118114 1 2 \"\"\n", + " \"rs138318162 \" 19 59118148 1 2 \"\"\n", + " \"rs186913222 \" 19 59118616 1 2 \"\"\n", + " \"rs141816674 \" 19 59118779 1 2 \"\"\n", + " \"rs150801216 \" 19 59118783 1 2 \"\"" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snpdef28_3 = readcsv(\"SNP_def28trait3.out\", Any; skipstart = 2, header = false)" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "253141-element Array{AbstractString,1}:\n", + " \"rs3020701\" \n", + " \"rs56343121\" \n", + " \"rs143501051\"\n", + " \"rs56182540\" \n", + " \"rs7260412\" \n", + " \"rs11669393\" \n", + " \"rs181646587\"\n", + " \"rs8106297\" \n", + " \"rs8106302\" \n", + " \"rs183568620\"\n", + " \"rs186451972\"\n", + " \"rs189699222\"\n", + " \"rs182902214\"\n", + " ⋮ \n", + " \"rs188169422\"\n", + " \"rs144587467\"\n", + " \"rs139879509\"\n", + " \"rs143250448\"\n", + " \"rs145384750\"\n", + " \"rs149215836\"\n", + " \"rs139221927\"\n", + " \"rs181848453\"\n", + " \"rs138318162\"\n", + " \"rs186913222\"\n", + " \"rs141816674\"\n", + " \"rs150801216\"" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snpid = map(x -> strip(string(x)), snpdef28_3[:, 1])" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ln: ./SNP_data28d_trait3.bed: File exists\n" + ] + } + ], + "source": [ + ";ln -s ./SNP_data28d_trait3.bin ./SNP_data28d_trait3.bed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Again remember to use the SNP bed file that corresponds to the fam file and snp ids ordering or you can get incorrect results." + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mv1.0 BED file detected\n", + "\u001b[39m" + ] + }, + { + "data": { + "text/plain": [ + "212×253141 SnpArrays.SnpArray{2}:\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (true, true) (false, false) (true, true) \n", + " (true, true) (false, true) … (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " (true, true) (false, true) (false, false) (true, true) \n", + " ⋮ ⋱ ⋮ \n", + " (true, true) (false, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (false, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (false, true) (false, true) (false, true) \n", + " (true, true) (true, true) (true, true) (false, false)\n", + " (true, true) (true, true) … (true, true) (false, false)\n", + " (true, true) (true, true) (false, true) (false, true) " + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snpbin28_3 = SnpArray(\"SNP_data28d_trait3.bed\"; people = size(ped28_3, 1), snps = size(snpdef28_3, 1))" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212×212 Array{Float64,2}:\n", + " 0.488859 0.00422798 0.00978752 … 0.0187809 0.000152741\n", + " 0.00422798 0.515141 -0.0190676 -0.0220944 -0.0162381 \n", + " 0.00978752 -0.0190676 0.489156 -0.0143552 -0.00392193 \n", + " 0.241927 -0.00231651 0.266756 0.00264766 -0.00334162 \n", + " 0.124909 0.265109 0.12103 -0.0118053 -0.00641433 \n", + " -0.0121766 -0.000271784 -0.00316506 … -0.000629837 0.0103776 \n", + " -0.011988 0.00845464 -0.00950518 -0.0105435 0.000340285\n", + " -0.0134368 -0.00713175 -0.0101011 -0.0115942 -0.00288935 \n", + " -0.0234305 -0.00241126 -0.010413 0.000129339 0.0166282 \n", + " -0.0121471 0.00305141 -0.00754556 -0.0104162 0.00554306 \n", + " -0.0128464 0.010258 0.00110475 … -0.00424998 0.0124296 \n", + " -0.0111854 0.00423488 -0.0117065 -0.0244977 -0.00698659 \n", + " -0.0124607 -0.00575043 -0.0124951 -0.00573017 -0.0103429 \n", + " ⋮ ⋱ ⋮ \n", + " -0.0170331 -0.0136246 -0.00834008 … -0.00761561 -0.012038 \n", + " -0.0100435 -0.00757905 -0.00879137 0.0151941 0.00464083 \n", + " -0.0119478 -0.011707 -0.0119966 -0.00531199 -0.0233153 \n", + " -0.00406636 0.00823632 0.00260479 0.00329892 0.00873774 \n", + " -0.00215359 0.0120041 -0.002889 -0.00294264 -0.00290955 \n", + " 0.00353341 -0.00200184 -0.00346206 … 0.0240445 0.0224584 \n", + " -0.0132247 -0.00736188 0.000399022 -0.0115962 0.00801023 \n", + " 0.0136135 0.014919 0.0146852 0.00611176 0.00984425 \n", + " -0.00763248 -0.00216137 0.0227918 0.00711085 -0.0159608 \n", + " 0.000154845 -0.0142528 0.0109832 0.00490078 0.00924588 \n", + " 0.0187809 -0.0220944 -0.0143552 … 0.556933 0.0380902 \n", + " 0.000152741 -0.0162381 -0.00392193 0.0380902 0.502935 " + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Φgrm_snp28_3 = grm(snpbin28_3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now instead of using the actual IV we use a SNP that is in LD with the IV." + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "236082" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ind_rs7255584 = find(x -> x == \"rs7255584\", snpid)[1]" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "212-element Array{Float64,1}:\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " ⋮ \n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " 0.0" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "snp_rs7255584_3 = convert(Array{Float64,1}, snpbin28_3[:, ind_rs7255584])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example 3 Mendelian Randomization VCM:" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MR direct effects (SE): 2.4467025863210314(40.916255981332085), Pvalue is 0.9523166681651045\n", + "\n", + "Exposure effect (SE) is: 0.13008389692292568(0.9573567708345674)\n", + "\n", + "Trait effect (SE) is: 0.31827660704004074(4.779415412555627)\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "(2.4467025863210314, 40.916255981332085)" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "MendelianRandomization(Y_3,[ones(length(Y_3),1) sex_3], snp_rs7255584_3, exp_3, Φgrm_snp28_3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the SNP effect on exposure (exposure effect) is weak. The SNP effect with the trait is also poorly estimated and so the estimate of the MR is underestimated. The true value is 5.0 units change in trait value for every 1.0 unit of exposure but the MR estimate is only half of that. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example 3 Exposure Effect VCM:" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(4.976393615287649, 0.021010620094895217)" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "###### Third Example Exposure Effect on Trait using VCM ####\n", + "X = [ones(length(Y_3), 1) sex_3]\n", + "exposure = exp_3\n", + "outcome = Y_3\n", + "ΦGRM = Φgrm_snp28_3\n", + "ExposureEffectVCM(outcome, X, exposure, ΦGRM)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Julia 0.6.2", + "language": "julia", + "name": "julia-0.6" + }, + "language_info": { + "file_extension": ".jl", + "mimetype": "application/julia", + "name": "julia", + "version": "0.6.2" + }, + "toc": { + "colors": { + "hover_highlight": "#DAA520", + "running_highlight": "#FF0000", + "selected_highlight": "#FFD700" + }, + "moveMenuLeft": true, + "nav_menu": { + "height": "30px", + "width": "252px" + }, + "navigate_menu": true, + "number_sections": true, + "sideBar": true, + "threshold": 4, + "toc_cell": false, + "toc_section_display": "block", + "toc_window_display": false, + "widenNotebook": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/heritability.ipynb b/docs/heritability.ipynb deleted file mode 100644 index 75452be..0000000 --- a/docs/heritability.ipynb +++ /dev/null @@ -1,2286 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "# Heritability Analysis" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "As an application of the variance component model, this note demonstrates the workflow for heritability analysis in genetics, using a sample data set `cg10k` with **6,670** individuals and **630,860** SNPs. Person IDs and phenotype names are masked for privacy. `cg10k.bed`, `cg10k.bim`, and `cg10k.fam` is a set of Plink files in binary format. `cg10k_traits.txt` contains 13 phenotypes of the 6,670 individuals." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cg10k.bed\n", - "cg10k.bim\n", - "cg10k.fam\n", - "cg10k_traits.txt\n" - ] - } - ], - "source": [ - ";ls cg10k*.*" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "Machine information:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Julia Version 0.5.1\n", - "Commit 6445c82 (2017-03-05 13:25 UTC)\n", - "Platform Info:\n", - " OS: macOS (x86_64-apple-darwin13.4.0)\n", - " CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz\n", - " WORD_SIZE: 64\n", - " BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)\n", - " LAPACK: libopenblas64_\n", - " LIBM: libopenlibm\n", - " LLVM: libLLVM-3.7.1 (ORCJIT, haswell)\n" - ] - } - ], - "source": [ - "versioninfo()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Read in binary SNP data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "We will use the [`SnpArrays.jl`](https://github.com/OpenMendel/SnpArrays.jl) package to read in binary SNP data and compute the empirical kinship matrix. Issue \n", - "```julia\n", - "Pkg.clone(\"https://github.com/OpenMendel/SnpArrays.jl.git\")\n", - "```\n", - "within `Julia` to install the `SnpArrays` package." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [], - "source": [ - "using SnpArrays" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 33.883711 seconds (282.07 k allocations: 1015.002 MB, 0.23% gc time)\n" - ] - }, - { - "data": { - "text/plain": [ - "6670×630860 SnpArrays.SnpArray{2}:\n", - " (false,true) (false,true) (true,true) … (true,true) (true,true) \n", - " (true,true) (true,true) (false,true) (false,true) (true,false)\n", - " (true,true) (true,true) (false,true) (true,true) (true,true) \n", - " (true,true) (true,true) (true,true) (false,true) (true,true) \n", - " (true,true) (true,true) (true,true) (true,true) (false,true)\n", - " (false,true) (false,true) (true,true) … (true,true) (true,true) \n", - " (false,false) (false,false) (true,true) (true,true) (true,true) \n", - " (true,true) (true,true) (true,true) (true,true) (false,true)\n", - " (true,true) (true,true) (false,true) (true,true) (true,true) \n", - " (true,true) (true,true) (true,true) (false,true) (true,true) \n", - " (true,true) (true,true) (false,true) … (true,true) (true,true) \n", - " (false,true) (false,true) (true,true) (true,true) (false,true)\n", - " (true,true) (true,true) (true,true) (true,true) (false,true)\n", - " ⋮ ⋱ \n", - " (false,true) (false,true) (true,true) (false,true) (false,true)\n", - " (false,true) (false,true) (false,true) (false,true) (true,true) \n", - " (true,true) (true,true) (false,true) … (false,true) (true,true) \n", - " (false,true) (false,true) (true,true) (true,true) (false,true)\n", - " (true,true) (true,true) (true,true) (false,true) (true,true) \n", - " (true,true) (true,true) (true,false) (false,false) (false,true)\n", - " (true,true) (true,true) (true,true) (true,true) (false,true)\n", - " (true,true) (true,true) (false,true) … (true,true) (true,true) \n", - " (true,true) (true,true) (true,true) (false,true) (true,true) \n", - " (true,true) (true,true) (true,true) (true,true) (false,true)\n", - " (false,true) (false,true) (true,true) (true,true) (true,true) \n", - " (true,true) (true,true) (true,true) (true,true) (true,true) " - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# read in genotype data from Plink binary file (~50 secs on my laptop)\n", - "@time cg10k = SnpArray(\"cg10k\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Summary statistics of SNP data" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "(6670,630860)" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "people, snps = size(cg10k)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 26.436140 seconds (33.44 k allocations: 10.920 MB)\n" - ] - } - ], - "source": [ - "# summary statistics (~50 secs on my laptop)\n", - "@time maf, _, missings_by_snp, = summarize(cg10k);" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "(\n", - "[0.00841726 0.124063 … 0.364253 0.5],\n", - "\n", - "0.24536516625042462)" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# 5 number summary and average MAF (minor allele frequencies)\n", - "quantile(maf, [0.0 .25 .5 .75 1.0]), mean(maf)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/html": [ - "" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Pkg.add(\"Plots\")\n", - "# Pkg.add(\"PyPlot\")\n", - "using Plots\n", - "pyplot()\n", - "\n", - "histogram(maf, xlab = \"Minor Allele Frequency (MAF)\", label = \"MAF\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "0.0013128198764010824" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# proportion of missing genotypes\n", - "sum(missings_by_snp) / length(cg10k)" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "0.07228069619249913" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# proportion of rare SNPs with maf < 0.05\n", - "countnz(maf .< 0.05) / length(maf)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Empirical kinship matrix\n", - "\n", - "We estimate empirical kinship based on all SNPs by the genetic relation matrix (GRM). Missing genotypes are imputed on the fly by drawing according to the minor allele frequencies." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "2099.911248 seconds (29.50 G allocations: 441.429 GB, 1.50% gc time)\n" - ] - }, - { - "data": { - "text/plain": [ - "6670×6670 Array{Float64,2}:\n", - " 0.502916 0.00329978 -0.000116213 … -6.46286e-5 -0.00281229 \n", - " 0.00329978 0.49892 -0.00201992 0.000909871 0.00345573 \n", - " -0.000116213 -0.00201992 0.493632 0.000294565 -0.000349854\n", - " 0.000933977 -0.00320391 -0.0018611 -0.00241682 -0.00127078 \n", - " -7.75429e-5 -0.0036075 0.00181442 0.00213976 -0.00158382 \n", - " 0.00200371 0.000577386 0.0025455 … 0.000943753 -1.82994e-6 \n", - " 0.000558503 0.00241421 -0.0018782 0.001217 -0.00123924 \n", - " -0.000659495 0.00319987 -0.00101496 0.00353646 -0.00024093 \n", - " -0.00102619 -0.00120448 -0.00055462 0.00175586 0.00181899 \n", - " -0.00136838 0.00211996 0.000119128 -0.00147305 -0.00105239 \n", - " -0.00206144 0.000148818 -0.000475177 … -0.000265522 -0.00106123 \n", - " 0.000951016 0.00167042 0.00183545 -0.000703658 -0.00313334 \n", - " 0.000330442 -0.000904147 0.00301478 0.000754772 -0.00127413 \n", - " ⋮ ⋱ \n", - " 0.00301137 0.00116042 0.00100426 6.67254e-6 0.00307069 \n", - " -0.00214008 0.00270925 -0.00185054 -0.00109935 0.00366816 \n", - " 0.000546739 -0.00242646 -0.00305264 … -0.000629014 0.00210779 \n", - " -0.00422553 -0.0020713 -0.00109052 -0.000705804 -0.000508055\n", - " -0.00318405 -0.00075385 0.00312377 0.00052883 -3.60969e-5 \n", - " 0.000430196 -0.00197163 0.00268545 -0.00633175 -0.00520337 \n", - " 0.00221429 0.000849792 -0.00101111 -0.000943129 -0.000624419\n", - " -0.00229025 -0.000130598 0.000101853 … 0.000840136 -0.00230224 \n", - " -0.00202917 0.00233007 -0.00131006 0.00197798 -0.000513771\n", - " -0.000964907 -0.000872326 -7.06722e-5 0.00124702 -0.00295844 \n", - " -6.46286e-5 0.000909871 0.000294565 0.500983 0.000525615\n", - " -0.00281229 0.00345573 -0.000349854 0.000525615 0.500792 " - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# GRM using all SNPs (~10 mins on my laptop)\n", - "srand(123)\n", - "@time Φgrm = grm(cg10k; method = :GRM)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "source": [ - "## Phenotypes" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "Read in the phenotype data and compute descriptive statistics." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/html": [ - "
Trait1Trait2Trait3Trait4Trait5Trait6Trait7Trait8Trait9Trait10Trait11Trait12Trait13
1-1.81573145026234-0.946150461472831.11363077580442-2.098671211191590.7444166141117480.001391718840801310.934732480409667-1.226773154181031.1160784277875-0.44362803350290.824465656443384-1.02852542216546-0.394049201727681
2-1.244400943787290.1096599925471790.467119394241789-1.621313040975891.05667583556830.9789469794191811.000146339460470.324874271402281.162321752196962.69227069487053.082636724610471.090649547860130.0256616415357438
31.455669145023051.538669329232431.094029593765550.586655272226893-0.32796454430367-0.30337709778827-0.0334354881314741-0.464463064285437-0.3319396273436-0.486839089635991-1.10648681564373-1.42015780427231-0.687463456644413
4-0.7688092766985480.5134908855142490.244263028382142-1.317402544756911.193937743268451.173441277342881.087374266752320.5360225837322610.8027592407620680.2341594117498150.394174866891074-0.7673658924760290.0635385761884935
5-0.264415132547719-0.348240421825694-0.02390650834136060.004739158022449481.256191917121931.20388836676311.298007390426270.3101136602473110.6261598610593520.8992891298312240.549967833508120.5406878095420480.179675416046033
6-1.37617270917293-1.471919677445640.291179894254146-0.803110740704731-0.264239977442213-0.260573027836772-0.165372266287781-0.2192572941183621.04702422290318-0.09858155346164820.9473934380684480.5940148120314380.245407436348479
70.1009416296374-0.191615722103455-0.5674213215966770.378571487240382-0.246656179817904-0.6088107500538580.189081058215596-1.27077787326519-0.4524761991439650.7025628772977240.3326362189571790.00269165036261810.317117176705358
8-0.3198182763674641.357744806572830.818689545938528-1.155655316443520.634483681022590.2914619086346790.933323714954726-0.7410832896824920.647477683507572-0.9708776270779660.2208611654113040.852512250237764-0.225904624283945
9-0.2883341733420320.5660825380907520.254958336116175-0.6525783028697140.6689215592773470.9783091991705580.1228629660419381.47909263782140.06721324241734490.07959039175278270.1675324552432320.2469155794421390.539932616458363
10-1.15759732583991-0.781198583545165-0.595807759833517-1.005549802604020.7898288859333210.5710584133790440.951304176233755-0.2959629829848160.990420024797070.5613093669889830.733100030623233-1.73467772245684-1.35278484330654
110.7405691504590311.408738467554150.7346899994400880.0208322841295094-0.337440968561619-0.458304040611395-0.142582512772326-0.580392297464107-0.684684998101516-0.00785381461893456-0.712244337518008-0.313345561230878-0.345419463162219
12-0.6758924864549950.2798926138296820.267915996308248-1.041036653929850.9107417156458880.8660276185131711.074144317020050.03817510035383020.766355377018601-0.340118016143495-0.8090139585050590.548521663785885-0.0201828675962336
13-0.795410435603455-0.6999899397627380.3991295030063-0.5104762619007361.515522454168441.287430329394671.537723932509030.1339891601177021.020257368860370.499018733899186-0.36948273277931-1.10153460436318-0.598132438886619
14-0.193483122930324-0.286021160323518-0.6914942252629950.01315816787006991.523374706867821.40106380722621.531146204518960.3330664834780751.043724803810990.163206783570466-0.422883765001728-0.383527976713573-0.489221907788158
150.1512462033797182.091851089936142.03800472474384-1.124747171435311.665570243907131.625356751095761.587510704836550.6358521860437760.8425777846059790.450761870778952-1.39479033623028-0.5609841075677680.289349776549287
16-0.4646087408127120.361276947723031.2327673928287-0.8260337310863831.434752247099831.744518238188460.2110968874846382.648164251405481.025114331460960.119757316031840.0596832073448267-0.631231612661616-0.207878671782927
17-0.732977488012215-0.5262234258897790.61657871336593-0.554479743325930.9474848590251040.9368332141381730.9725168063355240.2902510138652271.012853597257230.516207422283291-0.03006891719881940.87873225245830.450254629309513
18-0.1673264596221190.1753271654872370.287467725892572-0.4026525320842460.5511815094180560.5222047432909750.4368376600946530.2995649338455790.583109520896067-0.704415820005353-0.730810367994577-1.95140580379896-0.933504665700164
191.411594857874181.787224079010170.843976395853640.481278083772991-0.0887673728508268-0.499577574268580.304195897924847-1.23884208383369-0.153475724036624-0.8704861027883290.0955473331150403-0.983708050882817-0.3563445644514
20-1.42997091652825-0.4901470450342130.272730237607695-1.610299929541530.9907878171977480.7116875326081841.1885836012715-0.3712291880756381.24703459239952-0.03891623322715160.8834957490728722.589880263210173.33539552370368
21-0.1472472881767650.123284304156520.617549051912237-0.187130771782620.2564381075866940.177949837350830.412611806463263-0.2448091245597370.09476248061364920.723017223849532-0.6839483546334360.0873751276309269-0.262209652750371
22-0.187112676773894-0.270777264595619-1.015568185516060.06028505686002330.2724197577579780.869133161879197-0.6575194614142342.32388522018189-0.9999360115250341.446718441783060.971157886040772-0.358747904241515-0.439657942096136
23-1.82434047163768-0.9334804460680671.29474003766977-1.945452211510360.335846511896540.3592016543028440.513652924365886-0.0731976966969581.571390428120051.533293713267281.820768218595282.227403018678291.50063347195857
24-2.29344084351335-2.491618423444180.40383988742336-2.364880747529481.41052548319561.422441171477921.170241662721720.844766501768551.790268754324950.648181858970515-0.0857231057403538-1.027895352926170.491288088952859
25-0.4341359328883050.7408819890346520.699576357578518-1.024055431877750.7595292239837130.9566561108952880.6332995686565890.7707339322685160.8249885117145261.842874376347691.91045942063443-0.5023172078693660.132670133448219
26-2.1920969546557-2.494656642722710.354854763893431-1.931558486357140.9419794002899380.9789171014141060.8948600972897360.4632394028318731.125371333171631.705284461919550.7177927144791230.6458880491082610.783968250169388
27-1.46602269088422-1.249216771018970.307977693653039-1.550973646609890.6189084944747980.6625081716620420.4759571739060780.4847186745977070.4015648920282490.55987973254026-0.376938143754217-0.9339826292282180.390013151672955
28-1.83317744236881-1.532687878287012.55674262685865-1.518277457838350.7894096017464550.9087477997285880.6499719229414790.6683736499316671.200583035199030.2779632560756371.25049531982753.313704450716382.22035828885342
29-0.7845466282431780.2765825795439313.01104958800057-1.119788432067580.9208238584227070.7502176898861511.26153730009639-0.4033638829224170.400667296857811-0.217597941303479-0.724669537565068-0.391945338467193-0.650023936358253
300.4644559163451351.3326356122229-1.23059563374303-0.3579759589374141.182497469771041.54315938069757-0.603390411540623.383088459584220.823740765148641-0.129951318508883-0.657979878422938-0.499534924074273-0.414476569095651
" - ], - "text/plain": [ - "6670×13 DataFrames.DataFrame\n", - "│ Row │ Trait1 │ Trait2 │ Trait3 │ Trait4 │ Trait5 │\n", - "├──────┼───────────┼───────────┼────────────┼────────────┼────────────┤\n", - "│ 1 │ -1.81573 │ -0.94615 │ 1.11363 │ -2.09867 │ 0.744417 │\n", - "│ 2 │ -1.2444 │ 0.10966 │ 0.467119 │ -1.62131 │ 1.05668 │\n", - "│ 3 │ 1.45567 │ 1.53867 │ 1.09403 │ 0.586655 │ -0.327965 │\n", - "│ 4 │ -0.768809 │ 0.513491 │ 0.244263 │ -1.3174 │ 1.19394 │\n", - "│ 5 │ -0.264415 │ -0.34824 │ -0.0239065 │ 0.00473916 │ 1.25619 │\n", - "│ 6 │ -1.37617 │ -1.47192 │ 0.29118 │ -0.803111 │ -0.26424 │\n", - "│ 7 │ 0.100942 │ -0.191616 │ -0.567421 │ 0.378571 │ -0.246656 │\n", - "│ 8 │ -0.319818 │ 1.35774 │ 0.81869 │ -1.15566 │ 0.634484 │\n", - "│ 9 │ -0.288334 │ 0.566083 │ 0.254958 │ -0.652578 │ 0.668922 │\n", - "│ 10 │ -1.1576 │ -0.781199 │ -0.595808 │ -1.00555 │ 0.789829 │\n", - "│ 11 │ 0.740569 │ 1.40874 │ 0.73469 │ 0.0208323 │ -0.337441 │\n", - "⋮\n", - "│ 6659 │ -0.131005 │ 0.425378 │ -1.09015 │ -0.42093 │ 0.310943 │\n", - "│ 6660 │ -0.52427 │ 1.04173 │ 1.13749 │ -1.27262 │ -0.607382 │\n", - "│ 6661 │ 1.32516 │ 0.905899 │ 0.84261 │ 0.869143 │ -0.691461 │\n", - "│ 6662 │ -1.44368 │ -2.55708 │ -0.868193 │ 0.390457 │ -0.0331364 │\n", - "│ 6663 │ -1.8518 │ -1.25726 │ 1.81724 │ -2.17285 │ 0.948696 │\n", - "│ 6664 │ -0.810034 │ 0.0896703 │ 0.530939 │ -1.14212 │ 1.26049 │\n", - "│ 6665 │ -1.22395 │ -1.48953 │ -2.95847 │ -0.645307 │ -2.67022 │\n", - "│ 6666 │ -0.282847 │ -1.54129 │ -1.38819 │ 1.48327 │ -1.23406 │\n", - "│ 6667 │ 0.475008 │ 1.46697 │ 0.497403 │ -0.46861 │ 0.496553 │\n", - "│ 6668 │ -0.408154 │ -0.325323 │ 0.0850869 │ -0.182984 │ -0.46577 │\n", - "│ 6669 │ 0.886626 │ 0.487408 │ -0.0977307 │ 0.830857 │ -0.168597 │\n", - "│ 6670 │ -1.24394 │ 0.213697 │ 2.74965 │ -1.80285 │ 1.11401 │\n", - "\n", - "│ Row │ Trait6 │ Trait7 │ Trait8 │ Trait9 │ Trait10 │\n", - "├──────┼────────────┼────────────┼───────────┼───────────┼─────────────┤\n", - "│ 1 │ 0.00139172 │ 0.934732 │ -1.22677 │ 1.11608 │ -0.443628 │\n", - "│ 2 │ 0.978947 │ 1.00015 │ 0.324874 │ 1.16232 │ 2.69227 │\n", - "│ 3 │ -0.303377 │ -0.0334355 │ -0.464463 │ -0.33194 │ -0.486839 │\n", - "│ 4 │ 1.17344 │ 1.08737 │ 0.536023 │ 0.802759 │ 0.234159 │\n", - "│ 5 │ 1.20389 │ 1.29801 │ 0.310114 │ 0.62616 │ 0.899289 │\n", - "│ 6 │ -0.260573 │ -0.165372 │ -0.219257 │ 1.04702 │ -0.0985816 │\n", - "│ 7 │ -0.608811 │ 0.189081 │ -1.27078 │ -0.452476 │ 0.702563 │\n", - "│ 8 │ 0.291462 │ 0.933324 │ -0.741083 │ 0.647478 │ -0.970878 │\n", - "│ 9 │ 0.978309 │ 0.122863 │ 1.47909 │ 0.0672132 │ 0.0795904 │\n", - "│ 10 │ 0.571058 │ 0.951304 │ -0.295963 │ 0.99042 │ 0.561309 │\n", - "│ 11 │ -0.458304 │ -0.142583 │ -0.580392 │ -0.684685 │ -0.00785381 │\n", - "⋮\n", - "│ 6659 │ 0.735302 │ -0.0923875 │ 1.35506 │ 1.0652 │ 0.643122 │\n", - "│ 6660 │ -0.613654 │ -0.233233 │ -0.72178 │ 0.948663 │ -1.19573 │\n", - "│ 6661 │ -0.433578 │ -0.890664 │ 0.4462 │ -1.71787 │ -0.457925 │\n", - "│ 6662 │ -2.6614 │ 0.750482 │ -5.62549 │ 0.86061 │ 0.506262 │\n", - "│ 6663 │ -0.351239 │ -0.16338 │ -0.376714 │ 0.781865 │ 1.62379 │\n", - "│ 6664 │ 1.30857 │ 1.00127 │ 0.876074 │ 1.37296 │ -0.727299 │\n", - "│ 6665 │ -1.74132 │ -4.1342 │ 2.52424 │ -1.4819 │ 1.32056 │\n", - "│ 6666 │ -1.10252 │ -0.717352 │ -0.905295 │ -1.07487 │ 0.505224 │\n", - "│ 6667 │ 0.195008 │ 0.661188 │ -0.544159 │ 1.54089 │ -0.290803 │\n", - "│ 6668 │ 0.251995 │ -0.915712 │ 1.62863 │ -2.2893 │ -0.813678 │\n", - "│ 6669 │ -0.586483 │ 0.256554 │ -1.32053 │ -1.10432 │ -1.7194 │\n", - "│ 6670 │ 1.16462 │ 1.1404 │ 0.451617 │ 1.40992 │ -0.482502 │\n", - "\n", - "│ Row │ Trait11 │ Trait12 │ Trait13 │\n", - "├──────┼───────────┼────────────┼───────────┤\n", - "│ 1 │ 0.824466 │ -1.02853 │ -0.394049 │\n", - "│ 2 │ 3.08264 │ 1.09065 │ 0.0256616 │\n", - "│ 3 │ -1.10649 │ -1.42016 │ -0.687463 │\n", - "│ 4 │ 0.394175 │ -0.767366 │ 0.0635386 │\n", - "│ 5 │ 0.549968 │ 0.540688 │ 0.179675 │\n", - "│ 6 │ 0.947393 │ 0.594015 │ 0.245407 │\n", - "│ 7 │ 0.332636 │ 0.00269165 │ 0.317117 │\n", - "│ 8 │ 0.220861 │ 0.852512 │ -0.225905 │\n", - "│ 9 │ 0.167532 │ 0.246916 │ 0.539933 │\n", - "│ 10 │ 0.7331 │ -1.73468 │ -1.35278 │\n", - "│ 11 │ -0.712244 │ -0.313346 │ -0.345419 │\n", - "⋮\n", - "│ 6659 │ 0.35674 │ 0.456428 │ 0.882577 │\n", - "│ 6660 │ 0.366737 │ 1.78286 │ 1.90764 │\n", - "│ 6661 │ -0.418756 │ -0.275519 │ -0.912778 │\n", - "│ 6662 │ 1.31914 │ -1.44981 │ -1.77373 │\n", - "│ 6663 │ 0.770329 │ -0.0470789 │ 1.50496 │\n", - "│ 6664 │ 0.757479 │ 1.10001 │ 1.29115 │\n", - "│ 6665 │ 1.29209 │ 0.697478 │ 0.228819 │\n", - "│ 6666 │ 1.00973 │ -0.362158 │ -1.55022 │\n", - "│ 6667 │ 0.141684 │ 0.183218 │ 0.122664 │\n", - "│ 6668 │ -0.2214 │ -0.575183 │ 0.399583 │\n", - "│ 6669 │ -0.985545 │ -0.636874 │ -0.439825 │\n", - "│ 6670 │ 1.39201 │ 0.299931 │ 0.392809 │" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Pkg.add(\"DataFrames\")\n", - "using DataFrames\n", - "\n", - "cg10k_trait = readtable(\n", - " \"cg10k_traits.txt\"; \n", - " separator = ' ',\n", - " names = [:FID; :IID; :Trait1; :Trait2; :Trait3; :Trait4; :Trait5; :Trait6; \n", - " :Trait7; :Trait8; :Trait9; :Trait10; :Trait11; :Trait12; :Trait13], \n", - " eltypes = [String; String; Float64; Float64; Float64; Float64; Float64; \n", - " Float64; Float64; Float64; Float64; Float64; Float64; Float64; Float64]\n", - " )\n", - "# do not display FID and IID for privacy\n", - "cg10k_trait[:, 3:end]" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Trait1\n", - "Min -3.2041280147118\n", - "1st Qu. -0.645770976594801\n", - "Median 0.12500996951180798\n", - "Mean 0.0022113846331389903\n", - "3rd Qu. 0.7233154897636109\n", - "Max 3.47939787136478\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait2\n", - "Min -3.51165862877157\n", - "1st Qu. -0.6426205239938769\n", - "Median 0.0335172506981786\n", - "Mean 0.0013525291443179934\n", - "3rd Qu. 0.6574666174104795\n", - "Max 4.91342267449592\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait3\n", - "Min -3.93843646263987\n", - "1st Qu. -0.6409067201835312\n", - "Median -0.000782161570259152\n", - "Mean -0.0012959062525954158\n", - "3rd Qu. 0.6371084235689337\n", - "Max 7.91629946619107\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait4\n", - "Min -3.60840330795393\n", - "1st Qu. -0.5460856267376792\n", - "Median 0.228165419346029\n", - "Mean 0.0023089259432067487\n", - "3rd Qu. 0.7152907338009037\n", - "Max 3.12768818152017\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait5\n", - "Min -4.14874907974159\n", - "1st Qu. -0.6907651815712424\n", - "Median 0.03103429560265845\n", - "Mean -0.0017903947913742396\n", - "3rd Qu. 0.7349158784775832\n", - "Max 2.71718436484651\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait6\n", - "Min -3.82479174931095\n", - "1st Qu. -0.66279630694076\n", - "Median 0.03624198629403995\n", - "Mean -0.001195980597331062\n", - "3rd Qu. 0.7411755243742835\n", - "Max 2.58972802240228\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait7\n", - "Min -4.27245540828955\n", - "1st Qu. -0.6389232588654877\n", - "Median 0.0698010018021233\n", - "Mean -0.0019890555724116853\n", - "3rd Qu. 0.7104228734967848\n", - "Max 2.65377857124275\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait8\n", - "Min -5.62548796912517\n", - "1st Qu. -0.6015747053036895\n", - "Median -0.0386301401797661\n", - "Mean 0.0006140754882985941\n", - "3rd Qu. 0.5273417705306229\n", - "Max 5.8057022359485\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait9\n", - "Min -5.38196778211456\n", - "1st Qu. -0.6014287731518225\n", - "Median 0.10657100636146799\n", - "Mean -0.0018096522573535152\n", - "3rd Qu. 0.6985667613073132\n", - "Max 2.57193558386964\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait10\n", - "Min -3.54850550601412\n", - "1st Qu. -0.6336406665339003\n", - "Median -0.0966507436079331\n", - "Mean -0.0004370294352533275\n", - "3rd Qu. 0.498610364573902\n", - "Max 6.53782005410551\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait11\n", - "Min -3.26491021902041\n", - "1st Qu. -0.6736846396608624\n", - "Median -0.0680437088585371\n", - "Mean -0.0006159181111862523\n", - "3rd Qu. 0.6554864114250585\n", - "Max 4.26240968462615\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait12\n", - "Min -8.85190902714652\n", - "1st Qu. -0.5396855871831098\n", - "Median -0.1410985990171995\n", - "Mean -0.0005887830910961934\n", - "3rd Qu. 0.35077884186653374\n", - "Max 13.2114017261714\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n", - "Trait13\n", - "Min -5.59210353493304\n", - "1st Qu. -0.49228904714392474\n", - "Median -0.14102175804213551\n", - "Mean -0.0001512383146454028\n", - "3rd Qu. 0.32480412746681697\n", - "Max 24.174436145414\n", - "NAs 0\n", - "NA% 0.0%\n", - "\n" - ] - } - ], - "source": [ - "describe(cg10k_trait[:, 3:end])" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/html": [ - "" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "Y = convert(Matrix{Float64}, cg10k_trait[:, 3:15])\n", - "histogram(Y, layout = 13)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Pre-processing data for heritability analysis" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "To prepare variance component model fitting, we form an instance of `VarianceComponentVariate`. The two variance components are $(2\\Phi, I)$." - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "3-element Array{Symbol,1}:\n", - " :Y\n", - " :X\n", - " :V" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "using VarianceComponentModels\n", - "\n", - "# form data as VarianceComponentVariate\n", - "cg10kdata = VarianceComponentVariate(Y, (2Φgrm, eye(size(Y, 1))))\n", - "fieldnames(cg10kdata)" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "VarianceComponentModels.VarianceComponentVariate{Float64,2,Array{Float64,2},Array{Float64,2},Array{Float64,2}}([-1.81573 -0.94615 … -1.02853 -0.394049; -1.2444 0.10966 … 1.09065 0.0256616; … ; 0.886626 0.487408 … -0.636874 -0.439825; -1.24394 0.213697 … 0.299931 0.392809],,(\n", - "[1.00583 0.00659955 … -0.000129257 -0.00562459; 0.00659955 0.99784 … 0.00181974 0.00691145; … ; -0.000129257 0.00181974 … 1.00197 0.00105123; -0.00562459 0.00691145 … 0.00105123 1.00158],\n", - "\n", - "[1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]))" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cg10kdata" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "Before fitting the variance component model, we pre-compute the eigen-decomposition of $2\\Phi_{\\text{GRM}}$, the rotated responses, and the constant part in log-likelihood, and store them as a `TwoVarCompVariateRotate` instance, which is re-used in various variane component estimation procedures." - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 46.051433 seconds (726.58 k allocations: 1.024 GB, 0.45% gc time)\n" - ] - }, - { - "data": { - "text/plain": [ - "5-element Array{Symbol,1}:\n", - " :Yrot \n", - " :Xrot \n", - " :eigval \n", - " :eigvec \n", - " :logdetV2" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# pre-compute eigen-decomposition (~50 secs on my laptop)\n", - "@time cg10kdata_rotated = TwoVarCompVariateRotate(cg10kdata)\n", - "fieldnames(cg10kdata_rotated)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Save intermediate results\n", - "\n", - "We don't want to re-compute SnpArray and empirical kinship matrices again and again for heritibility analysis." - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true, - "scrolled": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " Base 47039 KB Module\n", - " BinDeps 218 KB Module\n", - " Blosc 59 KB Module\n", - " ColorTypes 8492 KB Module\n", - " Colors 8414 KB Module\n", - " Compat 8100 KB Module\n", - " Conda 8491 KB Module\n", - " Core 19714 KB Module\n", - " DataArrays 8444 KB Module\n", - " DataFrames 8940 KB Module\n", - " DataStructures 234 KB Module\n", - " FileIO 9044 KB Module\n", - " FixedPointNumbers 8573 KB Module\n", - " FixedSizeArrays 8222 KB Module\n", - " GZip 8003 KB Module\n", - " HDF5 8809 KB Module\n", - " IJulia 2082578 KB Module\n", - " Ipopt 32 KB Module\n", - " IterativeSolvers 333 KB Module\n", - " Iterators 44 KB Module\n", - " JLD 9092 KB Module\n", - " JSON 8124 KB Module\n", - " KNITRO 218 KB Module\n", - " LaTeXStrings 4622 bytes Module\n", - " LegacyStrings 67 KB Module\n", - " MacroTools 8190 KB Module\n", - " Main 2498588 KB Module\n", - " MathProgBase 272 KB Module\n", - " Measures 24 KB Module\n", - " Nettle 8048 KB Module\n", - " PlotThemes 7999 KB Module\n", - " PlotUtils 8264 KB Module\n", - " Plots 12383 KB Module\n", - " PyCall 10435 KB Module\n", - " PyPlot 9938 KB Module\n", - " RecipesBase 8158 KB Module\n", - " Reexport 6178 bytes Module\n", - " SHA 71 KB Module\n", - " Showoff 8024 KB Module\n", - " SnpArrays 8191 KB Module\n", - " SortingAlgorithms 28 KB Module\n", - " SpecialFunctions 8257 KB Module\n", - " StatsBase 8662 KB Module\n", - " URIParser 8064 KB Module\n", - " VarianceComponentModels 189 KB Module\n", - " Y 677 KB 6670×13 Array{Float64,2}\n", - " ZMQ 8058 KB Module\n", - " _ 77 KB 630860-element BitArray{1}\n", - " cg10k 1027303 KB 6670×630860 SnpArrays.SnpArray{2}\n", - " cg10k_trait 978 KB 6670×15 DataFrames.DataFrame\n", - " cg10kdata 695816 KB VarianceComponentModels.VarianceCo…\n", - " cg10kdata_rotated 348299 KB VarianceComponentModels.TwoVarComp…\n", - " maf 4928 KB 630860-element Array{Float64,1}\n", - " missings_by_snp 4928 KB 630860-element Array{Int64,1}\n", - " people 8 bytes Int64\n", - " snps 8 bytes Int64\n", - " Φgrm 347569 KB 6670×6670 Array{Float64,2}\n" - ] - } - ], - "source": [ - "# Pkg.add(\"JLD\")\n", - "#using JLD\n", - "#@save \"cg10k.jld\"\n", - "#whos()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "To load workspace" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true, - "scrolled": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " Base 39406 KB Module\n", - " BinDeps 218 KB Module\n", - " Blosc 59 KB Module\n", - " ColorTypes 6371 KB Module\n", - " Colors 6378 KB Module\n", - " Compat 6099 KB Module\n", - " Conda 6460 KB Module\n", - " Core 15532 KB Module\n", - " DataArrays 6388 KB Module\n", - " DataFrames 649 KB Module\n", - " DataStructures 234 KB Module\n", - " FileIO 6870 KB Module\n", - " FixedPointNumbers 6579 KB Module\n", - " FixedSizeArrays 6226 KB Module\n", - " GZip 6004 KB Module\n", - " HDF5 6657 KB Module\n", - " IJulia 7125 KB Module\n", - " Ipopt 32 KB Module\n", - " IterativeSolvers 333 KB Module\n", - " Iterators 44 KB Module\n", - " JLD 6839 KB Module\n", - " JSON 6132 KB Module\n", - " KNITRO 218 KB Module\n", - " LaTeXStrings 4622 bytes Module\n", - " LegacyStrings 68 KB Module\n", - " MacroTools 6196 KB Module\n", - " Main 2489170 KB Module\n", - " MathProgBase 272 KB Module\n", - " Measures 21 KB Module\n", - " Nettle 6068 KB Module\n", - " PlotThemes 6016 KB Module\n", - " PlotUtils 6174 KB Module\n", - " Plots 9209 KB Module\n", - " PyCall 7377 KB Module\n", - " PyPlot 7686 KB Module\n", - " RecipesBase 179 KB Module\n", - " Reexport 6178 bytes Module\n", - " SHA 71 KB Module\n", - " Showoff 27 KB Module\n", - " SnpArrays 98 KB Module\n", - " SortingAlgorithms 28 KB Module\n", - " SpecialFunctions 6277 KB Module\n", - " StatsBase 573 KB Module\n", - " URIParser 6085 KB Module\n", - " VarianceComponentModels 189 KB Module\n", - " Y 677 KB 6670×13 Array{Float64,2}\n", - " ZMQ 6076 KB Module\n", - " cg10k 1027303 KB 6670×630860 SnpArrays.SnpArray{2}\n", - " cg10k_trait 978 KB 6670×15 DataFrames.DataFrame\n", - " cg10kdata 695816 KB VarianceComponentModels.VarianceCo…\n", - " cg10kdata_rotated 348299 KB VarianceComponentModels.TwoVarComp…\n", - " maf 4928 KB 630860-element Array{Float64,1}\n", - " missings_by_snp 4928 KB 630860-element Array{Int64,1}\n", - " people 8 bytes Int64\n", - " snps 8 bytes Int64\n", - " Φgrm 347569 KB 6670×6670 Array{Float64,2}\n" - ] - } - ], - "source": [ - "using SnpArrays, JLD, DataFrames, VarianceComponentModels, Plots\n", - "pyplot()\n", - "@load \"cg10k.jld\"\n", - "whos()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Heritability of single traits" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "We use Fisher scoring algorithm to fit variance component model for each single trait." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true, - "scrolled": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Trait1\n", - "(σ2a[trait],σ2e[trait]) = (0.26104123217397623,0.7356884432614108)\n", - "Trait2\n", - "(σ2a[trait],σ2e[trait]) = (0.18874147380283665,0.8106899991616688)\n", - "Trait3\n", - "(σ2a[trait],σ2e[trait]) = (0.3185719276547346,0.6801458862875847)\n", - "Trait4\n", - "(σ2a[trait],σ2e[trait]) = (0.26556901333953487,0.7303588364945325)\n", - "Trait5\n", - "(σ2a[trait],σ2e[trait]) = (0.28123321193922013,0.7167989047155017)\n", - "Trait6\n", - "(σ2a[trait],σ2e[trait]) = (0.2829461149704479,0.7165629534396428)\n", - "Trait7\n", - "(σ2a[trait],σ2e[trait]) = (0.21543856403949083,0.7816211121585646)\n", - "Trait8\n", - "(σ2a[trait],σ2e[trait]) = (0.19412648732666096,0.8055277649986139)\n", - "Trait9\n", - "(σ2a[trait],σ2e[trait]) = (0.24789561127296741,0.7504615853619878)\n", - "Trait10\n", - "(σ2a[trait],σ2e[trait]) = (0.10007455815561886,0.899815277360586)\n", - "Trait11\n", - "(σ2a[trait],σ2e[trait]) = (0.16486778169300415,0.8338002257315682)\n", - "Trait12\n", - "(σ2a[trait],σ2e[trait]) = (0.08298660416198149,0.9158035668415443)\n", - "Trait13\n", - "(σ2a[trait],σ2e[trait]) = (0.05684248094794614,0.9423653381325947)\n", - "elapsed time: 0.19565668 seconds\n" - ] - }, - { - "data": { - "text/plain": [ - "0.19565668" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# heritability from single trait analysis\n", - "hST = zeros(13)\n", - "# standard errors of estimated heritability\n", - "hST_se = zeros(13)\n", - "# additive genetic effects\n", - "σ2a = zeros(13)\n", - "# enviromental effects\n", - "σ2e = zeros(13)\n", - "\n", - "tic()\n", - "for trait in 1:13\n", - " println(names(cg10k_trait)[trait + 2])\n", - " # form data set for trait j\n", - " traitj_data = TwoVarCompVariateRotate(cg10kdata_rotated.Yrot[:, trait], cg10kdata_rotated.Xrot, \n", - " cg10kdata_rotated.eigval, cg10kdata_rotated.eigvec, cg10kdata_rotated.logdetV2)\n", - " # initialize model parameters\n", - " traitj_model = VarianceComponentModel(traitj_data)\n", - " # estimate variance components\n", - " _, _, _, Σcov, _, _ = mle_fs!(traitj_model, traitj_data; solver=:Ipopt, verbose=false)\n", - " σ2a[trait] = traitj_model.Σ[1][1]\n", - " σ2e[trait] = traitj_model.Σ[2][1]\n", - " @show σ2a[trait], σ2e[trait]\n", - " h, hse = heritability(traitj_model.Σ, Σcov)\n", - " hST[trait] = h[1]\n", - " hST_se[trait] = hse[1]\n", - "end\n", - "toc()" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "2×13 Array{Float64,2}:\n", - " 0.261898 0.188849 0.318981 … 0.165088 0.0830871 0.0568875\n", - " 0.079869 0.0867203 0.0741462 0.0887725 0.0944835 0.0953863" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# heritability and standard errors\n", - "[hST'; hST_se']" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Pairwise traits\n", - "\n", - "Joint analysis of multiple traits is subject to intensive research recently. Following code snippet does joint analysis of all pairs of traits, a total of 78 bivariate variane component models." - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Trait1Trait2\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.260119 0.176216; 0.176216 0.187376],\n", - "\n", - "[0.736589 0.583892; 0.583892 0.812033])\n", - "Trait1Trait3\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.261564 -0.0131268; -0.0131268 0.319057],\n", - "\n", - "[0.73518 -0.121127; -0.121127 0.679679])\n", - "Trait1Trait4\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.26088 0.222614; 0.222614 0.265581],\n", - "\n", - "[0.735846 0.599435; 0.599435 0.730347])\n", - "Trait1Trait5\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.260783 -0.147012; -0.147012 0.281877],\n", - "\n", - "[0.735937 -0.254584; -0.254584 0.716176])\n", - "Trait1Trait6\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.260707 -0.129356; -0.129356 0.283188],\n", - "\n", - "[0.736013 -0.231361; -0.231361 0.716329])\n", - "Trait1Trait7\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.260308 -0.140258; -0.140258 0.215081],\n", - "\n", - "[0.736406 -0.197805; -0.197805 0.781985])\n", - "Trait1Trait8\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.261035 -0.0335296; -0.0335296 0.194143],\n", - "\n", - "[0.735695 -0.126272; -0.126272 0.805512])\n", - "Trait1Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.263016 -0.204865; -0.204865 0.246796],\n", - "\n", - "[0.733794 -0.30745; -0.30745 0.751544])\n", - "Trait1Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.260898 -0.0998176; -0.0998176 0.0970233],\n", - "\n", - "[0.735828 -0.303609; -0.303609 0.902853])\n", - "Trait1Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.26074 -0.138983; -0.138983 0.163063],\n", - "\n", - "[0.735982 -0.359175; -0.359175 0.835595])\n", - "Trait1Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.263069 -0.145536; -0.145536 0.0805136],\n", - "\n", - "[0.733781 -0.0416975; -0.0416975 0.918359])\n", - "Trait1Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.262344 -0.108896; -0.108896 0.051294],\n", - "\n", - "[0.73445 -0.113996; -0.113996 0.947942])\n", - "Trait2Trait3\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.189015 0.146157; 0.146157 0.320529],\n", - "\n", - "[0.810418 0.0974992; 0.0974992 0.678271])\n", - "Trait2Trait4\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.188395 0.0752146; 0.0752146 0.265558],\n", - "\n", - "[0.81103 0.220495; 0.220495 0.730369])\n", - "Trait2Trait5\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.188716 -0.011314; -0.011314 0.281247],\n", - "\n", - "[0.810715 -0.0370105; -0.0370105 0.716786])\n", - "Trait2Trait6\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.188774 -0.0031066; -0.0031066 0.283013],\n", - "\n", - "[0.810658 -0.0211827; -0.0211827 0.716499])\n", - "Trait2Trait7\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.188352 -0.0299579; -0.0299579 0.215189],\n", - "\n", - "[0.811072 -0.00136939; -0.00136939 0.781868])\n", - "Trait2Trait8\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.189262 0.0331423; 0.0331423 0.194666],\n", - "\n", - "[0.810182 -0.0326003; -0.0326003 0.805005])\n", - "Trait2Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.187285 -0.0854146; -0.0854146 0.246719],\n", - "\n", - "[0.812133 -0.0808791; -0.0808791 0.751617])\n", - "Trait2Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.188965 -0.125319; -0.125319 0.100121],\n", - "\n", - "[0.810498 -0.271071; -0.271071 0.899849])\n", - "Trait2Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.187762 -0.118479; -0.118479 0.166273],\n", - "\n", - "[0.811653 -0.295549; -0.295549 0.832437])\n", - "Trait2Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.188191 -0.0905383; -0.0905383 0.0822634],\n", - "\n", - "[0.811272 0.045422; 0.045422 0.916586])\n", - "Trait2Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.18826 -0.0707041; -0.0707041 0.0547239],\n", - "\n", - "[0.811217 0.0737977; 0.0737977 0.944521])\n", - "Trait3Trait4\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.31852 -0.154339; -0.154339 0.264754],\n", - "\n", - "[0.680196 -0.30344; -0.30344 0.731152])\n", - "Trait3Trait5\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.31897 0.184354; 0.184354 0.2825],\n", - "\n", - "[0.67976 0.336411; 0.336411 0.715567])\n", - "Trait3Trait6\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.319566 0.16664; 0.16664 0.285031],\n", - "\n", - "[0.679183 0.297698; 0.297698 0.714536])\n", - "Trait3Trait7\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.318576 0.166852; 0.166852 0.215232],\n", - "\n", - "[0.680142 0.347139; 0.347139 0.781823])\n", - "Trait3Trait8\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.320499 0.0575319; 0.0575319 0.197245],\n", - "\n", - "[0.678283 0.0442597; 0.0442597 0.802474])\n", - "Trait3Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.318719 0.137292; 0.137292 0.246976],\n", - "\n", - "[0.680004 0.267105; 0.267105 0.751357])\n", - "Trait3Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.318915 -0.0786338; -0.0786338 0.101103],\n", - "\n", - "[0.679815 -0.140789; -0.140789 0.898798])\n", - "Trait3Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.317822 -0.017984; -0.017984 0.164743],\n", - "\n", - "[0.680871 -0.114166; -0.114166 0.833923])\n", - "Trait3Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.320888 0.0845248; 0.0845248 0.0869868],\n", - "\n", - "[0.677914 0.0340133; 0.0340133 0.911841])\n", - "Trait3Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.323009 0.110681; 0.110681 0.0611739],\n", - "\n", - "[0.675901 -0.00729662; -0.00729662 0.938072])\n", - "Trait4Trait5\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.265667 -0.215848; -0.215848 0.282919],\n", - "\n", - "[0.730254 -0.376675; -0.376675 0.715164])\n", - "Trait4Trait6\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.266143 -0.200634; -0.200634 0.284442],\n", - "\n", - "[0.729794 -0.346804; -0.346804 0.715112])\n", - "Trait4Trait7\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.26449 -0.182752; -0.182752 0.214117],\n", - "\n", - "[0.731415 -0.326172; -0.326172 0.782935])\n", - "Trait4Trait8\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.266694 -0.0976354; -0.0976354 0.196126],\n", - "\n", - "[0.729266 -0.15036; -0.15036 0.803571])\n", - "Trait4Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.270037 -0.227407; -0.227407 0.248046],\n", - "\n", - "[0.726025 -0.415601; -0.415601 0.750298])\n", - "Trait4Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.265543 -0.0338107; -0.0338107 0.0996098],\n", - "\n", - "[0.730395 -0.227725; -0.227725 0.900275])\n", - "Trait4Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.265628 -0.09674; -0.09674 0.163274],\n", - "\n", - "[0.730302 -0.272611; -0.272611 0.835371])\n", - "Trait4Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.268164 -0.141613; -0.141613 0.0803968],\n", - "\n", - "[0.727883 -0.0828465; -0.0828465 0.918446])\n", - "Trait4Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.266171 -0.0980731; -0.0980731 0.0540102],\n", - "\n", - "[0.729775 -0.22506; -0.22506 0.945204])\n", - "Trait5Trait6\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.281592 0.280898; 0.280898 0.282298],\n", - "\n", - "[0.716455 0.660368; 0.660368 0.717195])\n", - "Trait5Trait7\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.280814 0.2321; 0.2321 0.211662],\n", - "\n", - "[0.717218 0.674304; 0.674304 0.785343])\n", - "Trait5Trait8\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.28134 0.163948; 0.163948 0.192703],\n", - "\n", - "[0.716701 0.221032; 0.221032 0.806922])\n", - "Trait5Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.283878 0.244532; 0.244532 0.241293],\n", - "\n", - "[0.714244 0.508417; 0.508417 0.756894])\n", - "Trait5Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.281813 -0.0462114; -0.0462114 0.101481],\n", - "\n", - "[0.716238 -0.0572111; -0.0572111 0.898424])\n", - "Trait5Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.280424 0.0202495; 0.0202495 0.164003],\n", - "\n", - "[0.717586 -0.0352436; -0.0352436 0.834649])\n", - "Trait5Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.281427 0.0616131; 0.0616131 0.0827166],\n", - "\n", - "[0.716615 0.0529286; 0.0529286 0.916074])\n", - "Trait5Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.282292 0.0704221; 0.0704221 0.0569463],\n", - "\n", - "[0.715782 0.0528374; 0.0528374 0.942268])\n", - "Trait6Trait7\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.282961 0.220656; 0.220656 0.213856],\n", - "\n", - "[0.716549 0.581083; 0.581083 0.783179])\n", - "Trait6Trait8\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.282961 0.18408; 0.18408 0.192379],\n", - "\n", - "[0.716549 0.436597; 0.436597 0.807246])\n", - "Trait6Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.284978 0.234436; 0.234436 0.243207],\n", - "\n", - "[0.714601 0.476826; 0.476826 0.755028])\n", - "Trait6Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.283655 -0.0435485; -0.0435485 0.102025],\n", - "\n", - "[0.715877 -0.0591681; -0.0591681 0.897886])\n", - "Trait6Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.281522 0.0279992; 0.0279992 0.16343],\n", - "\n", - "[0.717946 -0.0524106; -0.0524106 0.835213])\n", - "Trait6Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.283114 0.0571399; 0.0571399 0.0826789],\n", - "\n", - "[0.716403 0.0479199; 0.0479199 0.916112])\n", - "Trait6Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.28382 0.0611212; 0.0611212 0.0570817],\n", - "\n", - "[0.715722 0.0532698; 0.0532698 0.942133])\n", - "Trait7Trait8\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.213857 0.0884555; 0.0884555 0.192373],\n", - "\n", - "[0.783178 -0.0568331; -0.0568331 0.807252])\n", - "Trait7Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.218756 0.217042; 0.217042 0.244144],\n", - "\n", - "[0.778433 0.462901; 0.462901 0.754123])\n", - "Trait7Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.216273 -0.0421146; -0.0421146 0.10209],\n", - "\n", - "[0.780807 -0.0859075; -0.0859075 0.897822])\n", - "Trait7Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.214069 0.0206967; 0.0206967 0.163474],\n", - "\n", - "[0.782961 -0.048148; -0.048148 0.83517])\n", - "Trait7Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.214932 0.0757874; 0.0757874 0.0808731],\n", - "\n", - "[0.782132 0.0346956; 0.0346956 0.917915])\n", - "Trait7Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.215959 0.0749373; 0.0749373 0.0545934],\n", - "\n", - "[0.781139 0.0389058; 0.0389058 0.944622])\n", - "Trait8Trait9\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.194555 0.112816; 0.112816 0.247244],\n", - "\n", - "[0.805124 0.184778; 0.184778 0.751098])\n", - "Trait8Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.194441 -0.0156342; -0.0156342 0.100426],\n", - "\n", - "[0.805222 0.0119826; 0.0119826 0.899468])\n", - "Trait8Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.193853 0.0225325; 0.0225325 0.164669],\n", - "\n", - "[0.805796 -0.0272747; -0.0272747 0.833997])\n", - "Trait8Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.193951 -0.00287601; -0.00287601 0.0828573],\n", - "\n", - "[0.8057 0.0336128; 0.0336128 0.915932])\n", - "Trait8Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.19398 0.00407867; 0.00407867 0.0569071],\n", - "\n", - "[0.805672 0.0378817; 0.0378817 0.942301])\n", - "Trait9Trait10\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.247294 -0.00230836; -0.00230836 0.0998264],\n", - "\n", - "[0.751051 0.0740729; 0.0740729 0.900061])\n", - "Trait9Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.247823 0.0318235; 0.0318235 0.16489],\n", - "\n", - "[0.750532 0.152285; 0.152285 0.833778])\n", - "Trait9Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.250335 0.0845714; 0.0845714 0.0887587],\n", - "\n", - "[0.748091 0.107756; 0.107756 0.910108])\n", - "Trait9Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.249442 0.0934845; 0.0934845 0.057932],\n", - "\n", - "[0.748975 0.0982191; 0.0982191 0.941335])\n", - "Trait10Trait11\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.0931397 0.100034; 0.100034 0.164955],\n", - "\n", - "[0.906703 0.474427; 0.474427 0.833715])\n", - "Trait10Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.0967247 0.0564043; 0.0564043 0.0794528],\n", - "\n", - "[0.90315 0.0853232; 0.0853232 0.919334])\n", - "Trait10Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.100985 -0.0279916; -0.0279916 0.0578369],\n", - "\n", - "[0.898937 0.166051; 0.166051 0.94139])\n", - "Trait11Trait12\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.163841 0.0570318; 0.0570318 0.0792144],\n", - "\n", - "[0.834814 0.145597; 0.145597 0.919552])\n", - "Trait11Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.164883 -0.00158414; -0.00158414 0.0574968],\n", - "\n", - "[0.833798 0.200612; 0.200612 0.941715])\n", - "Trait12Trait13\n", - "(Σa[i,j],Σe[i,j]) = (\n", - "[0.0845946 0.0685052; 0.0685052 0.0554759],\n", - "\n", - "[0.914214 0.573152; 0.573152 0.943735])\n", - "elapsed time: 7.752196517 seconds\n" - ] - }, - { - "data": { - "text/plain": [ - "7.752196517" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# additive genetic effects (2x2 psd matrices) from bavariate trait analysis;\n", - "Σa = Array{Matrix{Float64}}(13, 13)\n", - "# environmental effects (2x2 psd matrices) from bavariate trait analysis;\n", - "Σe = Array{Matrix{Float64}}(13, 13)\n", - "\n", - "tic()\n", - "for i in 1:13\n", - " for j in (i+1):13\n", - " println(names(cg10k_trait)[i + 2], names(cg10k_trait)[j + 2])\n", - " # form data set for (trait1, trait2)\n", - " traitij_data = TwoVarCompVariateRotate(cg10kdata_rotated.Yrot[:, [i;j]], cg10kdata_rotated.Xrot, \n", - " cg10kdata_rotated.eigval, cg10kdata_rotated.eigvec, cg10kdata_rotated.logdetV2)\n", - " # initialize model parameters\n", - " traitij_model = VarianceComponentModel(traitij_data)\n", - " # estimate variance components\n", - " mle_fs!(traitij_model, traitij_data; solver=:Ipopt, verbose=false)\n", - " Σa[i, j] = traitij_model.Σ[1]\n", - " Σe[i, j] = traitij_model.Σ[2]\n", - " @show Σa[i, j], Σe[i, j]\n", - " end\n", - "end\n", - "toc()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## 3-trait analysis\n", - "\n", - "Researchers want to jointly analyze traits 5-7. Our strategy is to try both Fisher scoring and MM algorithm with different starting point, and choose the best local optimum. We first form the data set and run Fisher scoring, which yields a final objective value -1.4700991+04." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "This is Ipopt version 3.12.4, running with linear solver mumps.\n", - "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", - "\n", - "Number of nonzeros in equality constraint Jacobian...: 0\n", - "Number of nonzeros in inequality constraint Jacobian.: 0\n", - "Number of nonzeros in Lagrangian Hessian.............: 78\n", - "\n", - "Total number of variables............................: 12\n", - " variables with only lower bounds: 0\n", - " variables with lower and upper bounds: 0\n", - " variables with only upper bounds: 0\n", - "Total number of equality constraints.................: 0\n", - "Total number of inequality constraints...............: 0\n", - " inequality constraints with only lower bounds: 0\n", - " inequality constraints with lower and upper bounds: 0\n", - " inequality constraints with only upper bounds: 0\n", - "\n", - "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", - " 0 3.0247512e+04 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0 \n", - " 5 1.6834796e+04 0.00e+00 4.07e+02 -11.0 3.66e-01 - 1.00e+00 1.00e+00f 1 MaxS\n", - " 10 1.4744497e+04 0.00e+00 1.12e+02 -11.0 2.45e-01 - 1.00e+00 1.00e+00f 1 MaxS\n", - " 15 1.4701497e+04 0.00e+00 1.30e+01 -11.0 1.15e-01 -4.5 1.00e+00 1.00e+00f 1 MaxS\n", - " 20 1.4700992e+04 0.00e+00 6.65e-01 -11.0 1.74e-04 -6.9 1.00e+00 1.00e+00f 1 MaxS\n", - " 25 1.4700991e+04 0.00e+00 2.77e-02 -11.0 7.36e-06 -9.2 1.00e+00 1.00e+00f 1 MaxS\n", - " 30 1.4700991e+04 0.00e+00 1.15e-03 -11.0 3.06e-07 -11.6 1.00e+00 1.00e+00f 1 MaxS\n", - " 35 1.4700991e+04 0.00e+00 4.76e-05 -11.0 1.27e-08 -14.0 1.00e+00 1.00e+00h 1 MaxS\n", - " 40 1.4700991e+04 0.00e+00 1.97e-06 -11.0 5.26e-10 -16.4 1.00e+00 1.00e+00f 1 MaxSA\n", - " 45 1.4700991e+04 0.00e+00 8.17e-08 -11.0 2.18e-11 -18.8 1.00e+00 1.00e+00h 1 MaxSA\n", - "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", - "\n", - "Number of Iterations....: 49\n", - "\n", - " (scaled) (unscaled)\n", - "Objective...............: 4.4724330090668286e+02 1.4700991028593347e+04\n", - "Dual infeasibility......: 6.4345872286001950e-09 2.1150637455850896e-07\n", - "Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00\n", - "Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00\n", - "Overall NLP error.......: 6.4345872286001950e-09 2.1150637455850896e-07\n", - "\n", - "\n", - "Number of objective function evaluations = 50\n", - "Number of objective gradient evaluations = 50\n", - "Number of equality constraint evaluations = 0\n", - "Number of inequality constraint evaluations = 0\n", - "Number of equality constraint Jacobian evaluations = 0\n", - "Number of inequality constraint Jacobian evaluations = 0\n", - "Number of Lagrangian Hessian evaluations = 49\n", - "Total CPU secs in IPOPT (w/o function evaluations) = 0.013\n", - "Total CPU secs in NLP function evaluations = 0.085\n", - "\n", - "EXIT: Optimal Solution Found.\n", - " 0.118428 seconds (66.82 k allocations: 6.250 MB)\n" - ] - }, - { - "data": { - "text/plain": [ - "VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}(,(\n", - "[0.281163 0.280014 0.232384; 0.280014 0.284899 0.220285; 0.232384 0.220285 0.212687],\n", - "\n", - "[0.716875 0.66125 0.674025; 0.66125 0.714602 0.581433; 0.674025 0.581433 0.784324]),,Char[],Float64[],-Inf,Inf)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "traitidx = 5:7\n", - "# form data set\n", - "trait57_data = TwoVarCompVariateRotate(cg10kdata_rotated.Yrot[:, traitidx], cg10kdata_rotated.Xrot, \n", - " cg10kdata_rotated.eigval, cg10kdata_rotated.eigvec, cg10kdata_rotated.logdetV2)\n", - "# initialize model parameters\n", - "trait57_model = VarianceComponentModel(trait57_data)\n", - "# estimate variance components\n", - "@time mle_fs!(trait57_model, trait57_data; solver=:Ipopt, verbose=true)\n", - "trait57_model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "We then run the MM algorithm, starting from the Fisher scoring answer. MM finds an improved solution with objective value 8.955397e+03." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - " MM Algorithm\n", - " Iter Objective \n", - "-------- -------------\n", - " 0 -1.470099e+04\n", - " 1 -1.470099e+04\n", - "\n", - " 0.363096 seconds (539.27 k allocations: 20.482 MB, 2.19% gc time)\n" - ] - }, - { - "data": { - "text/plain": [ - "VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}(,(\n", - "[0.281163 0.280014 0.232384; 0.280014 0.284899 0.220285; 0.232384 0.220285 0.212687],\n", - "\n", - "[0.716875 0.66125 0.674025; 0.66125 0.714602 0.581433; 0.674025 0.581433 0.784324]),,Char[],Float64[],-Inf,Inf)" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# trait59_model contains the fitted model by Fisher scoring now\n", - "@time mle_mm!(trait57_model, trait57_data; verbose=true)\n", - "trait57_model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "Do another run of MM algorithm from default starting point. It leads to a slightly better local optimum -1.470104e+04, slighly worse than the Fisher scoring result. Follow up anlaysis should use the Fisher scoring result." - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - " MM Algorithm\n", - " Iter Objective \n", - "-------- -------------\n", - " 0 -3.024751e+04\n", - " 1 -2.040338e+04\n", - " 2 -1.656127e+04\n", - " 3 -1.528591e+04\n", - " 4 -1.491049e+04\n", - " 5 -1.480699e+04\n", - " 6 -1.477870e+04\n", - " 7 -1.477026e+04\n", - " 8 -1.476696e+04\n", - " 9 -1.476499e+04\n", - " 10 -1.476339e+04\n", - " 20 -1.475040e+04\n", - " 30 -1.474042e+04\n", - " 40 -1.473272e+04\n", - " 50 -1.472677e+04\n", - " 60 -1.472215e+04\n", - " 70 -1.471852e+04\n", - " 80 -1.471565e+04\n", - " 90 -1.471336e+04\n", - " 100 -1.471152e+04\n", - " 110 -1.471002e+04\n", - " 120 -1.470879e+04\n", - " 130 -1.470778e+04\n", - " 140 -1.470694e+04\n", - " 150 -1.470623e+04\n", - " 160 -1.470563e+04\n", - " 170 -1.470513e+04\n", - " 180 -1.470469e+04\n", - " 190 -1.470432e+04\n", - " 200 -1.470400e+04\n", - " 210 -1.470372e+04\n", - " 220 -1.470347e+04\n", - " 230 -1.470326e+04\n", - " 240 -1.470307e+04\n", - " 250 -1.470290e+04\n", - " 260 -1.470275e+04\n", - " 270 -1.470262e+04\n", - " 280 -1.470250e+04\n", - " 290 -1.470239e+04\n", - " 300 -1.470229e+04\n", - " 310 -1.470220e+04\n", - " 320 -1.470213e+04\n", - " 330 -1.470205e+04\n", - " 340 -1.470199e+04\n", - " 350 -1.470193e+04\n", - " 360 -1.470187e+04\n", - " 370 -1.470182e+04\n", - " 380 -1.470177e+04\n", - " 390 -1.470173e+04\n", - " 400 -1.470169e+04\n", - " 410 -1.470165e+04\n", - " 420 -1.470162e+04\n", - " 430 -1.470159e+04\n", - " 440 -1.470156e+04\n", - " 450 -1.470153e+04\n", - " 460 -1.470150e+04\n", - " 470 -1.470148e+04\n", - " 480 -1.470146e+04\n", - " 490 -1.470143e+04\n", - " 500 -1.470141e+04\n", - " 510 -1.470140e+04\n", - " 520 -1.470138e+04\n", - " 530 -1.470136e+04\n", - " 540 -1.470134e+04\n", - " 550 -1.470133e+04\n", - " 560 -1.470132e+04\n", - " 570 -1.470130e+04\n", - " 580 -1.470129e+04\n", - " 590 -1.470128e+04\n", - " 600 -1.470127e+04\n", - " 610 -1.470125e+04\n", - " 620 -1.470124e+04\n", - " 630 -1.470123e+04\n", - " 640 -1.470122e+04\n", - " 650 -1.470122e+04\n", - " 660 -1.470121e+04\n", - " 670 -1.470120e+04\n", - " 680 -1.470119e+04\n", - " 690 -1.470118e+04\n", - " 700 -1.470118e+04\n", - " 710 -1.470117e+04\n", - " 720 -1.470116e+04\n", - " 730 -1.470116e+04\n", - " 740 -1.470115e+04\n", - " 750 -1.470114e+04\n", - " 760 -1.470114e+04\n", - " 770 -1.470113e+04\n", - " 780 -1.470113e+04\n", - " 790 -1.470112e+04\n", - " 800 -1.470112e+04\n", - " 810 -1.470111e+04\n", - " 820 -1.470111e+04\n", - " 830 -1.470111e+04\n", - " 840 -1.470110e+04\n", - " 850 -1.470110e+04\n", - " 860 -1.470109e+04\n", - " 870 -1.470109e+04\n", - " 880 -1.470109e+04\n", - " 890 -1.470108e+04\n", - " 900 -1.470108e+04\n", - " 910 -1.470108e+04\n", - " 920 -1.470108e+04\n", - " 930 -1.470107e+04\n", - " 940 -1.470107e+04\n", - " 950 -1.470107e+04\n", - " 960 -1.470106e+04\n", - " 970 -1.470106e+04\n", - " 980 -1.470106e+04\n", - " 990 -1.470106e+04\n", - " 1000 -1.470106e+04\n", - " 1010 -1.470105e+04\n", - " 1020 -1.470105e+04\n", - " 1030 -1.470105e+04\n", - " 1040 -1.470105e+04\n", - " 1050 -1.470105e+04\n", - " 1060 -1.470104e+04\n", - " 1070 -1.470104e+04\n", - " 1080 -1.470104e+04\n", - " 1090 -1.470104e+04\n", - " 1100 -1.470104e+04\n", - " 1110 -1.470104e+04\n", - "\n", - " 0.846316 seconds (153.65 k allocations: 15.453 MB, 0.89% gc time)\n" - ] - }, - { - "data": { - "text/plain": [ - "VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}(,(\n", - "[0.281188 0.280032 0.232439; 0.280032 0.284979 0.220432; 0.232439 0.220432 0.212922],\n", - "\n", - "[0.71685 0.661232 0.67397; 0.661232 0.71452 0.581287; 0.67397 0.581287 0.784092]),,Char[],Float64[],-Inf,Inf)" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# default starting point\n", - "trait57_model = VarianceComponentModel(trait57_data)\n", - "@time _, _, _, Σcov, = mle_mm!(trait57_model, trait57_data; verbose=true)\n", - "trait57_model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "Heritability from 3-variate estimate and their standard errors." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "2×3 Array{Float64,2}:\n", - " 0.281741 0.285122 0.21356 \n", - " 0.0778033 0.0773313 0.0841103" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "h, hse = heritability(trait57_model.Σ, Σcov)\n", - "[h'; hse']" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 13-trait joint analysis\n", - "\n", - "In some situations, such as studying the genetic covariance, we need to jointly analyze 13 traits. We first try the **Fisher scoring algorithm**." - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "This is Ipopt version 3.12.4, running with linear solver mumps.\n", - "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", - "\n", - "Number of nonzeros in equality constraint Jacobian...: 0\n", - "Number of nonzeros in inequality constraint Jacobian.: 0\n", - "Number of nonzeros in Lagrangian Hessian.............: 16653\n", - "\n", - "Total number of variables............................: 182\n", - " variables with only lower bounds: 0\n", - " variables with lower and upper bounds: 0\n", - " variables with only upper bounds: 0\n", - "Total number of equality constraints.................: 0\n", - "Total number of inequality constraints...............: 0\n", - " inequality constraints with only lower bounds: 0\n", - " inequality constraints with lower and upper bounds: 0\n", - " inequality constraints with only upper bounds: 0\n", - "\n", - "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", - " 0 1.3113368e+05 0.00e+00 1.00e+02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0 \n", - " 5 8.2237394e+04 0.00e+00 6.04e+02 -11.0 2.53e+00 - 1.00e+00 1.00e+00f 1 MaxS\n", - " 10 1.2380115e+05 0.00e+00 1.03e+03 -11.0 6.31e+01 -5.4 1.00e+00 1.00e+00h 1 MaxS\n", - " 15 1.4133320e+05 0.00e+00 1.99e+02 -11.0 4.54e+02 -7.8 1.00e+00 1.00e+00h 1 MaxS\n" - ] - }, - { - "ename": "LoadError", - "evalue": "Base.LinAlg.PosDefException(25)", - "output_type": "error", - "traceback": [ - "Base.LinAlg.PosDefException(25)", - "", - " in chkposdef at ./linalg/lapack.jl:44 [inlined]", - " in sygvd!(::Int64, ::Char, ::Char, ::Array{Float64,2}, ::Array{Float64,2}) at ./linalg/lapack.jl:4908", - " in eigfact!(::Symmetric{Float64,Array{Float64,2}}, ::Symmetric{Float64,Array{Float64,2}}) at ./linalg/symmetric.jl:224", - " in eigfact(::Symmetric{Float64,Array{Float64,2}}, ::Symmetric{Float64,Array{Float64,2}}) at ./linalg/eigen.jl:274", - " in VarianceComponentModels.TwoVarCompModelRotate{T<:AbstractFloat,BT<:Union{AbstractArray{T,1},AbstractArray{T,2}}}(::VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}) at /Users/huazhou/.julia/v0.5/VarianceComponentModels/src/VarianceComponentModels.jl:121", - " in eval_f(::VarianceComponentModels.TwoVarCompOptProb{VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}},VarianceComponentModels.TwoVarCompVariateRotate{Float64,Array{Float64,2},Array{Float64,2}},Array{Float64,2},Array{Float64,1},VarianceComponentModels.VarianceComponentAuxData{Array{Float64,2},Array{Float64,1}}}, ::Array{Float64,1}) at /Users/huazhou/.julia/v0.5/VarianceComponentModels/src/two_variance_component.jl:683", - " in (::Ipopt.#eval_f_cb#4{VarianceComponentModels.TwoVarCompOptProb{VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}},VarianceComponentModels.TwoVarCompVariateRotate{Float64,Array{Float64,2},Array{Float64,2}},Array{Float64,2},Array{Float64,1},VarianceComponentModels.VarianceComponentAuxData{Array{Float64,2},Array{Float64,1}}}})(::Array{Float64,1}) at /Users/huazhou/.julia/v0.5/Ipopt/src/IpoptSolverInterface.jl:53", - " in eval_f_wrapper(::Int32, ::Ptr{Float64}, ::Int32, ::Ptr{Float64}, ::Ptr{Void}) at /Users/huazhou/.julia/v0.5/Ipopt/src/Ipopt.jl:89", - " in solveProblem(::Ipopt.IpoptProblem) at /Users/huazhou/.julia/v0.5/Ipopt/src/Ipopt.jl:304", - " in optimize!(::Ipopt.IpoptMathProgModel) at /Users/huazhou/.julia/v0.5/Ipopt/src/IpoptSolverInterface.jl:120", - " in #mle_fs!#29(::Int64, ::Symbol, ::Symbol, ::Bool, ::Function, ::VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}, ::VarianceComponentModels.TwoVarCompVariateRotate{Float64,Array{Float64,2},Array{Float64,2}}) at /Users/huazhou/.julia/v0.5/VarianceComponentModels/src/two_variance_component.jl:893", - " in (::VarianceComponentModels.#kw##mle_fs!)(::Array{Any,1}, ::VarianceComponentModels.#mle_fs!, ::VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}, ::VarianceComponentModels.TwoVarCompVariateRotate{Float64,Array{Float64,2},Array{Float64,2}}) at ./:0" - ] - } - ], - "source": [ - "# initialize model parameters\n", - "traitall_model = VarianceComponentModel(cg10kdata_rotated)\n", - "# estimate variance components using Fisher scoring algorithm\n", - "@time mle_fs!(traitall_model, cg10kdata_rotated; solver=:Ipopt, verbose=true)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the output we can see the Fisher scoring algorithm ran into some numerical issues. Let's try the **MM algorithm**." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - " MM Algorithm\n", - " Iter Objective \n", - "-------- -------------\n", - " 0 -1.311337e+05\n", - " 1 -8.002195e+04\n", - " 2 -5.807051e+04\n", - " 3 -4.926234e+04\n", - " 4 -4.611182e+04\n", - " 5 -4.511727e+04\n", - " 6 -4.482798e+04\n", - " 7 -4.474410e+04\n", - " 8 -4.471610e+04\n", - " 9 -4.470285e+04\n", - " 10 -4.469355e+04\n", - " 20 -4.462331e+04\n", - " 30 -4.456960e+04\n", - " 40 -4.452834e+04\n", - " 50 -4.449652e+04\n", - " 60 -4.447178e+04\n", - " 70 -4.445237e+04\n", - " 80 -4.443699e+04\n", - " 90 -4.442467e+04\n", - " 100 -4.441470e+04\n", - " 110 -4.440656e+04\n", - " 120 -4.439985e+04\n", - " 130 -4.439427e+04\n", - " 140 -4.438959e+04\n", - " 150 -4.438564e+04\n", - " 160 -4.438229e+04\n", - " 170 -4.437941e+04\n", - " 180 -4.437694e+04\n", - " 190 -4.437480e+04\n", - " 200 -4.437294e+04\n", - " 210 -4.437131e+04\n", - " 220 -4.436989e+04\n", - " 230 -4.436863e+04\n", - " 240 -4.436751e+04\n", - " 250 -4.436652e+04\n", - " 260 -4.436564e+04\n", - " 270 -4.436485e+04\n", - " 280 -4.436414e+04\n", - " 290 -4.436351e+04\n", - " 300 -4.436293e+04\n", - " 310 -4.436242e+04\n", - " 320 -4.436195e+04\n", - " 330 -4.436152e+04\n", - " 340 -4.436113e+04\n", - " 350 -4.436078e+04\n", - " 360 -4.436046e+04\n", - " 370 -4.436016e+04\n", - " 380 -4.435989e+04\n", - " 390 -4.435965e+04\n", - " 400 -4.435942e+04\n", - " 410 -4.435921e+04\n", - " 420 -4.435902e+04\n", - " 430 -4.435884e+04\n", - " 440 -4.435867e+04\n", - " 450 -4.435852e+04\n", - " 460 -4.435838e+04\n", - " 470 -4.435825e+04\n", - " 480 -4.435813e+04\n", - " 490 -4.435802e+04\n", - " 500 -4.435791e+04\n", - " 510 -4.435781e+04\n", - " 520 -4.435772e+04\n", - " 530 -4.435764e+04\n", - " 540 -4.435756e+04\n", - " 550 -4.435748e+04\n", - " 560 -4.435741e+04\n", - " 570 -4.435735e+04\n", - " 580 -4.435729e+04\n", - " 590 -4.435723e+04\n", - " 600 -4.435718e+04\n", - " 610 -4.435713e+04\n", - " 620 -4.435708e+04\n", - " 630 -4.435704e+04\n", - " 640 -4.435700e+04\n", - " 650 -4.435696e+04\n", - " 660 -4.435692e+04\n", - " 670 -4.435688e+04\n", - " 680 -4.435685e+04\n", - " 690 -4.435682e+04\n", - " 700 -4.435679e+04\n", - " 710 -4.435676e+04\n", - " 720 -4.435674e+04\n", - " 730 -4.435671e+04\n", - " 740 -4.435669e+04\n", - " 750 -4.435667e+04\n", - " 760 -4.435665e+04\n", - " 770 -4.435663e+04\n", - " 780 -4.435661e+04\n", - " 790 -4.435659e+04\n", - " 800 -4.435657e+04\n", - " 810 -4.435656e+04\n", - " 820 -4.435654e+04\n", - " 830 -4.435653e+04\n", - " 840 -4.435651e+04\n", - " 850 -4.435650e+04\n", - " 860 -4.435649e+04\n", - " 870 -4.435648e+04\n", - " 880 -4.435647e+04\n", - " 890 -4.435646e+04\n", - " 900 -4.435645e+04\n", - " 910 -4.435644e+04\n", - " 920 -4.435643e+04\n", - " 930 -4.435642e+04\n", - " 940 -4.435641e+04\n", - " 950 -4.435640e+04\n", - " 960 -4.435639e+04\n", - " 970 -4.435639e+04\n", - " 980 -4.435638e+04\n", - " 990 -4.435637e+04\n", - " 1000 -4.435637e+04\n", - " 1010 -4.435636e+04\n", - " 1020 -4.435635e+04\n", - " 1030 -4.435635e+04\n", - " 1040 -4.435634e+04\n", - " 1050 -4.435634e+04\n", - " 1060 -4.435633e+04\n", - " 1070 -4.435633e+04\n", - " 1080 -4.435632e+04\n", - "\n", - " 3.696088 seconds (156.73 k allocations: 69.792 MB, 0.32% gc time)\n" - ] - }, - { - "data": { - "text/plain": [ - "(-44356.32043185692,VarianceComponentModels.VarianceComponentModel{Float64,2,Array{Float64,2},Array{Float64,2}}(,(\n", - "[0.273498 0.192141 … -0.128643 -0.098307; 0.192141 0.219573 … -0.0687355 -0.0433724; … ; -0.128643 -0.0687355 … 0.117185 0.0899824; -0.098307 -0.0433724 … 0.0899824 0.106354],\n", - "\n", - "[0.723441 0.568135 … -0.0586259 -0.12469; 0.568135 0.77999 … 0.0236098 0.0464835; … ; -0.0586259 0.0236098 … 0.881707 0.551818; -0.12469 0.0464835 … 0.551818 0.893023]),,Char[],Float64[],-Inf,Inf),(\n", - "[0.0111652 0.0131052 … 0.012894 0.0127596; 0.0131077 0.0151572 … 0.0171615 0.0171431; … ; 0.0128941 0.0171614 … 0.0174141 0.0182039; 0.0127597 0.0171428 … 0.0182038 0.0187958],\n", - "\n", - "[0.011228 0.0133093 … 0.0130109 0.012784; 0.0133113 0.0158091 … 0.0178685 0.0177999; … ; 0.0130107 0.0178687 … 0.017957 0.0187655; 0.0127836 0.0177997 … 0.0187654 0.0193582]),\n", - "[0.000124662 7.24629e-5 … -3.61414e-7 -1.40373e-5; 7.24483e-5 0.000171812 … -2.03586e-5 -3.26457e-6; … ; -3.87301e-7 -2.03778e-5 … 0.000352145 -1.48419e-5; -1.40438e-5 -3.27339e-6 … -1.48381e-5 0.000374741],\n", - "\n", - ",)" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# reset model parameters\n", - "traitall_model = VarianceComponentModel(cg10kdata_rotated)\n", - "# estimate variance components using Fisher scoring algorithm\n", - "@time mle_mm!(traitall_model, cg10kdata_rotated; verbose=true)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It converges after ~1000 iterations." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "## Save analysis results" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [], - "source": [ - "#using JLD\n", - "#@save \"copd.jld\"\n", - "#whos()" - ] - } - ], - "metadata": { - "anaconda-cloud": {}, - "kernelspec": { - "display_name": "Julia 0.5.1", - "language": "julia", - "name": "julia-0.5" - }, - "language_info": { - "file_extension": ".jl", - "mimetype": "application/julia", - "name": "julia", - "version": "0.5.1" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -}