Releases: itmat/Normalization
PORT v0.8.1-beta 6/15/2016
- News
- added geneSymbol column to the ensembl annotation files so it can be used for Exon-Intron-Junction level normalization
- "convert_gtf_to_PORT_geneinfo.transcripts.pl" script also outputs a geneSymbol column
- changed default maxjobs number to 1000 for both lsf and sge in config
- Bug fixes
- -alt_out option bug fixed
- cigar2span fixed to work with GSNAP output (^#S#D and #D#S$ case)
- cleanup script bug fixed
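
The CIGAR edge cases mentioned for cigar2span (a deletion next to a soft clip at either end of the string, and adjacent N/D operations) can be illustrated with a minimal Python sketch. This is not PORT's Perl implementation: the function name `cigar_to_spans` is hypothetical, and treating both `D` and `N` as gaps that split a span is an assumption made for the illustration.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_to_spans(start, cigar):
    """Return 1-based inclusive (start, end) reference spans for an alignment.

    Assumption for this sketch: D and N both close the current span, so
    runs such as "5N3D" or "3D5N" collapse into a single gap.
    """
    spans = []
    pos = start        # next reference position to be consumed
    span_start = None  # start of the span being built, if any
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Aligned bases consume the reference and extend the span.
            if span_start is None:
                span_start = pos
            pos += length
        elif op in "DN":
            # A gap closes the open span (if any) and skips reference bases.
            if span_start is not None:
                spans.append((span_start, pos - 1))
                span_start = None
            pos += length
        # I, S, H and P do not consume the reference, so they are skipped.
    if span_start is not None:
        spans.append((span_start, pos - 1))
    return spans
```

Under these assumptions, `cigar_to_spans(100, "10M5N3D10M")` yields two spans separated by one combined gap, and a trailing `2D5S` or leading `5S2D` never produces an empty span.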
PORT v0.8-beta 6/6/2016
- News
- PORT infers read length from unaligned files (uses average read length)
- size cutoff for inferred introns set at 75000
- reads mapping to highly expressed features (gene, exon, intron) are handled separately, and the resampled reads are put back into the final sam/bam files (not just at the spreadsheet level).
- Implemented the '-alt_out' option. Users can redirect the normalized data to an alternate location.
- sam2mappingstats reports number of Non-Unique alignments and reads instead of percentages.
- bam to sam step omitted for bam input
- script available for ensembl gtf -> gene info file conversion.
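
As an illustration of inferring read length as an average over the unaligned reads, here is a minimal Python sketch. It assumes plain four-line FASTQ records; the function name `average_read_length` is hypothetical, and how PORT rounds or otherwise uses the average is not covered here.

```python
import io

def average_read_length(handle):
    """Average sequence length over a four-line-per-record FASTQ stream."""
    total = 0
    count = 0
    for i, line in enumerate(handle):
        # In four-line FASTQ, the sequence is the second line of each record.
        if i % 4 == 1:
            total += len(line.strip())
            count += 1
    if count == 0:
        raise ValueError("no reads found")
    return total / count

# Two toy reads of length 8 and 6.
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n")
avg = average_read_length(example)
```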
- bug fixes
- cigar2spans now accounts for ND, DN cases
- genepercents calculation fixed (the sum of all min counts was previously used as the total; the total number of gene mappers is now used instead)
- fixed generation of the high expresser list for the novel exon case
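
The effect of the genepercents fix can be sketched as a change of denominator. The Python below is a hedged illustration, not PORT's code: `gene_percents`, the gene names, and the counts are all made up for the example.

```python
def gene_percents(min_counts, total_gene_mappers):
    """Percent of gene-mapping reads per gene.

    min_counts: dict of gene id -> min count (hypothetical input layout).
    total_gene_mappers: total number of reads mapping to genes.
    """
    if total_gene_mappers <= 0:
        raise ValueError("total_gene_mappers must be positive")
    return {
        gene: 100.0 * count / total_gene_mappers
        for gene, count in min_counts.items()
    }

counts = {"geneA": 50, "geneB": 30}

# Denominator described as the fix: total gene mappers.
fixed = gene_percents(counts, 200)

# Denominator described as the old behavior: sum of all min counts.
old = gene_percents(counts, sum(counts.values()))
```

With these toy numbers the old denominator inflates every percentage, because the sum of min counts (80) is smaller than the total number of gene mappers (200).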
PORT v0.7.5-beta 10/20/2015
news:
- "-v" flag outputs version of PORT.
- ribo percents uses total number of reads, not all mapped reads for computing the stats.
bug fixes:
- fixed a typo in name_of_job for the quantifygenes_gnorm2 step in runall_normalization.pl.
- check_samformat.pl outputs the problematic reads with the error.
PORT v0.7.4-beta 10/7/2015
news
- expected_num_reads.txt for Exon-Intron-Junction level normalization now includes the exon-inconsistent read information.
bug fixes
- getstats.pl: PORT no longer throws an error when the sam2mappingstats output file is missing stats.
- runall_normalization.pl: checksam step was missing -se flag for single end data, fixed.
- PORT can handle unique read normalization.
PORT v0.7.3-beta 8/31/2015
news
- takes bam input
- provides breakdown file for exon-intron-junction normalization
- predict number of reads step provides a comma-separated list of sample ids
- checks sam/bam format to make sure:
  - sam/bam has proper tags
  - (paired-end) mated alignments are in adjacent lines
- infers paired/single-end, sam/bam, gzipped, and fasta/fastq (options removed)
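
The paired-end format check described above (mated alignments on adjacent lines) can be sketched in Python as follows. This is an illustrative assumption about what such a check might look like, not the logic of check_samformat.pl; the function name `find_mate_problems` and the messages are hypothetical. It relies only on the SAM columns QNAME (1st) and FLAG (2nd).

```python
def find_mate_problems(sam_lines):
    """Return (read_name, message) tuples for alignments whose mate
    is not on the adjacent line."""
    # Skip blank lines and header lines (header lines start with "@").
    records = [
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    ]
    problems = []
    # Walk the alignments two at a time; each pair should be one read's mates.
    for i in range(0, len(records), 2):
        pair = records[i:i + 2]
        if len(pair) < 2:
            problems.append((pair[0][0], "no mate on the following line"))
            continue
        first, second = pair
        if first[0] != second[0]:
            problems.append((first[0], "adjacent line has a different read name"))
            continue
        # FLAG bit 0x40 marks the first mate, 0x80 the second mate.
        mates = {int(first[1]) & 0xC0, int(second[1]) & 0xC0}
        if mates != {0x40, 0x80}:
            problems.append((first[0], "adjacent lines are not a first/second mate pair"))
    return problems
```

Returning the offending read names alongside the message mirrors the idea of reporting the problematic reads together with the error.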
bugfixes
- sam2cov: outputs forward and reverse (not sense and antisense) coverage; output file names fixed
- catshuffiles step bug fixed in runall_normalization.pl
PORT v0.7.2-beta 7/16/2015
News:
- restarts only the failed jobs when the -resume or -resume_at option is used
- delete intermediate files from blast step when cleanup is set to true
- all merging steps and filter highly expressed genes step run at sample level to cut unnecessary wait time
- default lsf queue names in config file set to new PMACS cluster queue names
- undetermined reads renamed to exon inconsistent reads
- modified runall scripts to avoid too many jobs getting submitted to one node
Bug fixes:
- compress step bug fix; pipeline now waits until all jobs are completed
v0.7.1-beta
Bug fixes
- properly checks input file format
- genefilter.pl script name replacement step fixed (doesn't use regex anymore)
- quants2spreadsheet now uses 6G queue when # samples > 200.
v0.7-beta 4/9/2015
News:
- Users can pre-filter the ribosomal reads prior to running PORT and skip BLAST step.
- rRNA FASTA files for non-mammalian organisms (Drosophila melanogaster (dm), Zebrafish (danRer) and C. elegans) are available.
- No longer need to provide two gene info files (gene info and annotation file) for EXON-INTRON-JUNCTION level normalization.
- Users need to provide chromosome names (for non standard names) and mitochondrial chromosome name.
- sam2cov now supports data aligned with GSNAP
Bug fixes:
- sam2junctions was not working properly when the genome fasta file was not in one-line format. PORT now checks the file and converts it into the correct format before generating junctions files.
- high expressers were not getting put back in when -cutoff_highexp was used with the same cutoff value in both PART1 and PART2. It now works properly.
- renamed stranded coverage file names (sense and antisense instead of fwd and rev)
- STATS files provide more descriptive header/footer. All STATS files in % notation.
- filtering of high expressers was not working properly for single-end data; fixed
- fixed issue with list for intronquants (when -novel_off flag was used).
- BLAST loops through query file to avoid memory problems.
- PORT checks input file formats before starting
- two PORT runs with the same STUDY name cannot be run at the same time.
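
The one-line genome FASTA format mentioned in the sam2junctions fix means that each sequence occupies a single line after its header. A minimal Python sketch of such a conversion is shown below; it is an illustration only, and the function name `fasta_to_one_line` is hypothetical rather than part of PORT.

```python
def fasta_to_one_line(lines):
    """Join wrapped sequence lines so each record is a header line
    followed by exactly one sequence line."""
    out = []
    seq = []  # pieces of the sequence for the current record
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header: flush the previous record's sequence first.
            if seq:
                out.append("".join(seq))
                seq = []
            out.append(line)
        else:
            seq.append(line)
    if seq:
        out.append("".join(seq))
    return out

wrapped = [">chr1", "ACGT", "TTAA", ">chrM", "GG"]
one_line = fasta_to_one_line(wrapped)
```

A file that is already in one-line format passes through unchanged, so the conversion is safe to apply unconditionally.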
v0.6.3-beta 2/2/2015
- New blast - faster.
- Name of default/normal queue added to config file.
Exon-Intron-Junction Norm
- Novel introns (inferred from junctions file)
- Flanking regions quantified
- Novel exons runs in parallel
Gene Norm
- sam2gene runs in chunks for shorter runtime
- highly expressed genes (if filtered out) are put back into the final spreadsheet
11/20/2014
bug fixes:
- runall_shuf.pl (stranded non-unique data)
- quantification modified (does not double count anymore)