Releases · itmat/Normalization · GitHub

15 Jun 14:15

eunjijunekim

PORT v0.8.1-beta 6/15/2016

News
1. added geneSymbol column to the ensembl annotation files so it can be used for Exon-Intron-Junction level normalization
2. "convert_gtf_to_PORT_geneinfo.transcripts.pl" script also outputs a geneSymbol column
3. changed default maxjobs number to 1000 for both lsf and sge in config
Bug fixes
1. -alt_out option bug fixed
2. cigar2span fixed to work with GSNAP output (^#S#D and #D#S$ case)
3. cleanup script bug fixed
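
The `^#S#D` and `#D#S$` cases in item 2 are CIGAR strings where a deletion sits next to a soft clip at either end of the alignment. The sketch below is not PORT's cigar2span code. It is a minimal illustration, with the hypothetical helper name `cigar_to_spans`, of one way to turn a CIGAR string into reference spans so that such edge deletions do not end up inside a span. It assumes POS is the 1-based position of the first reference-consuming operation, that deletions between two matched blocks are bridged, and that `N` splits spans.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_spans(pos, cigar):
    """Return 1-based inclusive reference spans covered by matched bases.

    Assumptions (illustrative, not taken from PORT):
    - a deletion between two matched blocks is bridged into one span
    - a deletion at either end of the alignment, or next to an N,
      only advances the position and is never part of a span
    """
    spans = []
    cur = pos            # next reference position to be consumed
    start = end = None   # currently open span, if any
    for length, op in ((int(n), o) for n, o in CIGAR_OP.findall(cigar)):
        if op in "M=X":
            if start is None:
                start = cur
            cur += length
            end = cur - 1
        elif op == "D":
            cur += length    # consumes reference, does not extend the span yet
        elif op == "N":
            if start is not None:
                spans.append((start, end))
                start = end = None
            cur += length
        # I, S, H and P do not consume reference bases
    if start is not None:
        spans.append((start, end))
    return spans
```

With these assumptions, `cigar_to_spans(100, "5S3D40M")` starts its span after the leading deletion, and `cigar_to_spans(100, "40M3D5S")` ends its span before the trailing one.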


06 Jun 18:37

eunjijunekim

PORT v0.8-beta 6/6/2016

News
- PORT infers read length from unaligned files (uses average read length)
- size cutoff for inferred introns set at 75000
- reads mapping to highly expressed features (gene, exon, intron) are handled separately, and the resampled reads get put back into the final sam/bam (not just at the spreadsheet level).
- Implemented the '-alt_out' option. Users can redirect the normalized data to an alternate location.
- sam2mappingstats reports number of Non-Unique alignments and reads instead of percentages.
- bam to sam step omitted for bam input
- script available for ensembl gtf -> gene info file conversion.
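
As an illustration of the first item above (read length taken as the average over the unaligned reads), here is a minimal sketch. It is not PORT's implementation. The function name `average_read_length` is hypothetical, and the sketch assumes four-line FASTQ records with no wrapped sequence lines.

```python
def average_read_length(fastq_lines):
    """Return the mean sequence length over four-line FASTQ records.

    Assumes each record is exactly four lines (header, sequence, '+',
    qualities); wrapped FASTQ is not handled in this sketch.
    """
    total = 0
    count = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:          # the sequence line of each record
            total += len(line.strip())
            count += 1
    if count == 0:
        raise ValueError("no reads found")
    return total / count
```

A file can be passed directly, for example `with open("sample.fq") as fh: average_read_length(fh)`, where `sample.fq` is a placeholder path.
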
bug fixes
- cigar2spans now accounts for ND, DN cases
- genepercents calculation fixed (the sum of all min counts was used as the total before; now the total number of gene mappers is used instead)
- fixed the creation of the high expressor list for the novel exon case


20 Oct 21:25

eunjijunekim

PORT v0.7.5-beta 10/20/2015

news:

"-v" flag outputs version of PORT.
ribo percents uses the total number of reads, not the number of mapped reads, for computing the stats.

bug fixes:

fixed a typo in name_of_job for the quantifygenes_gnorm2 step in runall_normalization.pl.
check_samformat.pl outputs the problematic reads with the error.


07 Oct 19:16

eunjijunekim

PORT v0.7.4-beta 10/7/2015

news

expected_num_reads.txt for Exon-Intron-Junction level normalization now includes the exon-inconsistent read information.

bug fixes

getstats.pl: PORT no longer throws an error when the sam2mappingstats output file is missing stats.
runall_normalization.pl: checksam step was missing -se flag for single end data, fixed.
PORT can handle unique read normalization.


31 Aug 21:17

eunjijunekim

PORT v0.7.3-beta 8/31/2015

news

takes bam input
provides breakdown file for exon-intron-junction normalization
predict number of reads step provides a comma-separated list of sample ids
checks sam/bam format to make sure:
- sam/bam has proper tags
- (paired-end) mated alignments are in adjacent lines
infers paired/single-end, sam/bam, gzipped, and fasta/fastq (options removed)
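
The "mated alignments are in adjacent lines" check above can be sketched as follows. This is an illustrative version, not the code in check_samformat.pl. The function name `find_unpaired_lines` is hypothetical, and the sketch assumes that each paired-end alignment is written as exactly two consecutive lines sharing a read name, with SAM FLAG bits 0x40 (first in pair) and 0x80 (second in pair) set on one line each.

```python
def find_unpaired_lines(sam_lines):
    """Return 1-based line numbers of alignments lacking an adjacent mate."""
    records = []
    for lineno, line in enumerate(sam_lines, 1):
        if line.startswith("@") or not line.strip():
            continue                      # skip SAM header and blank lines
        fields = line.rstrip("\n").split("\t")
        records.append((lineno, fields[0], int(fields[1])))

    problems = []
    i = 0
    while i < len(records):
        lineno, name, flag = records[i]
        if i + 1 < len(records):
            _, next_name, next_flag = records[i + 1]
            # A valid pair: same read name, one first-in-pair, one second-in-pair.
            mates = {flag & 0xC0, next_flag & 0xC0} == {0x40, 0x80}
            if name == next_name and mates:
                i += 2
                continue
        problems.append(lineno)
        i += 1
    return problems
```

An empty result means every alignment line was followed (or preceded) by its mate under these assumptions.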

bugfixes

sam2cov : outputs forward reverse, not sense and antisense; fixed the output file names
catshuffiles step bug fixed in runall_normalization.pl


16 Jul 18:36

eunjijunekim

PORT v0.7.2-beta 7/16/2015

News:

restarts only the failed jobs when the -resume or -resume_at option is used
delete intermediate files from blast step when cleanup is set to true
all merging steps and filter highly expressed genes step run at sample level to cut unnecessary wait time
default lsf queue names in config file set to new PMACS cluster queue names
undetermined reads renamed to exon inconsistent reads
modified runall scripts to avoid too many jobs getting submitted to one node

Bug fixes:

compress step bug fix; pipeline now waits until all jobs are completed


12 Apr 17:19

brainfood

v0.7.1-beta

Bug fixes

properly checks input file format
genefilter.pl script name replacement step fixed (doesn't use regex anymore)
quants2spreadsheet now uses 6G queue when # samples > 200.


09 Apr 21:11

eunjijunekim

v0.7-beta 4/9/2015

News:

Users can pre-filter the ribosomal reads prior to running PORT and skip BLAST step.
rRNA FASTA files for non-mammalian organisms (Drosophila melanogaster (dm), Zebrafish (danRer) and C. elegans) are available.
No longer need to provide two gene info files (gene info and annotation file) for EXON-INTRON-JUNCTION level normalization.
Users need to provide chromosome names (for non standard names) and mitochondrial chromosome name.
sam2cov now supports data aligned with GSNAP

Bug fixes:

sam2junctions was not working properly when the genome fasta file was not in one-line format. PORT now checks and converts it into the correct format before generating junctions files.
high expressers were not getting put back in when -cutoff_highexp was used with the same cutoff value in both PART1 and PART2. It now works properly.
renamed stranded coverage file names (sense and antisense instead of fwd and rev)
STATS files provide more descriptive header/footer. All STATS files in % notation.
filtering of high expressers was not working properly for single end data; fixed.
fixed issue with list for intronquants (when -novel_off flag was used).
BLAST loops through query file to avoid memory problems.
PORT checks input file formats before starting
two PORT runs with the same STUDY name cannot be run at the same time.
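
The "one-line format" mentioned in the sam2junctions fix above means that each sequence occupies a single line after its header. The sketch below shows one way such a conversion could look. It is not PORT's converter, and the generator name `fasta_to_one_line` is hypothetical.

```python
def fasta_to_one_line(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one.
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)
```

For example, the wrapped record `>chr1 / ACGT / AC` comes out as `>chr1` followed by the single line `ACGTAC`. Sequence text before the first header is ignored in this sketch.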


02 Feb 19:37

eunjijunekim

v0.6.3-beta 2/2/2015

New blast - faster.
Name of default/normal queue added to config file.

Exon-Intron-Junction Norm

Novel introns (inferred from junctions file)
Flanking regions quantified
Novel exons runs in parallel

Gene Norm

sam2gene runs in chunks for shorter runtime
highly expressed genes (if filtered out) get put back into the final spreadsheet


20 Nov 21:22

eunjijunekim

11/20/2014

bug fixes:

runall_shuf.pl (stranded non-unique data)
quantification modified (does not double count anymore)
