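[Note] This post was generated with the command: abysw blog 差异表达
[Also] If you have any questions, feel free to reach out via the WeChat official account!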
RNA assembly and annotation were already done during the earlier annotation work, which left these files:
RN02L1_1.fq.gz
RN02L1_2.fq.gz
RN02L1_2.P.qtrim.fq.gz
RN02L1_1.P.qtrim.fq.gz
inchworm.K25.L25.DS.fa
Trinity.fasta
RN02L1.U.qtrim.fq.gz
hisat2.log
RN02L1.sorted.bam
Ricciocarpus_natans.gff
Ricciocarpus_natans.gtf
Ricciocarpus_natans.ss
Ricciocarpus_natans.exon
RN02L1.gtf
Next, prepare the count files:
$ prepDE.py -h
Usage: prepDE.py [options]
Generates two CSV files containing the count matrices for genes and
transcripts, using the coverage values found in the output of stringtie -e
Options:
  -h, --help            show this help message and exit
  -i INPUT, --input=INPUT, --in=INPUT
                        a folder containing all sample sub-directories, or a
                        text file with sample ID and path to its GTF file on
                        each line [default: ./]
  -g G                  where to output the gene count matrix [default:
                        gene_count_matrix.csv
  -t T                  where to output the transcript count matrix [default:
                        transcript_count_matrix.csv]
  -l LENGTH, --length=LENGTH
                        the average read length [default: 75]
  -p PATTERN, --pattern=PATTERN
                        a regular expression that selects the sample
                        subdirectories
  -c, --cluster         whether to cluster genes that overlap with different
                        gene IDs, ignoring ones with geneID pattern (see
                        below)
  -s STRING, --string=STRING
                        if a different prefix is used for geneIDs assigned by
                        StringTie [default: MSTRG]
  -k KEY, --key=KEY     if clustering, what prefix to use for geneIDs assigned
                        by this script [default: prepG]
  -v                    enable verbose processing
  --legend=LEGEND       if clustering, where to output the legend file mapping
                        transcripts to assigned geneIDs [default: legend.csv]
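The -l option matters because, as far as I understand from the StringTie documentation, prepDE.py turns each transcript's coverage back into a read count as roughly coverage * transcript length / read length. A minimal sketch of that arithmetic follows; the function name estimate_reads and the numbers are made up for illustration, and any rounding the script applies is ignored here.

```shell
# Approximate read count implied by a StringTie coverage value:
#   reads = coverage * transcript_length / read_length
# $1 = per-base coverage, $2 = transcript length (bp), $3 = read length (bp)
# estimate_reads is a hypothetical helper, not part of prepDE.py.
estimate_reads() {
    awk -v c="$1" -v l="$2" -v r="$3" 'BEGIN { printf "%.1f\n", c * l / r }'
}

# Illustrative numbers only (not taken from the data in this post):
estimate_reads 12.5 1500 75    # read length left at the default of 75
estimate_reads 12.5 1500 150   # same transcript, 150 bp reads
```

With these made-up values the two calls print 250.0 and 125.0, so leaving -l at 75 for 150 bp reads would double every count. If the reads are not 75 bp, passing the real length (for example prepDE.py -i gtf.list -l 150) is probably worth it.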
ls */RN*gtf | perl -ne 'print "$1\t$_" if /\/(\S+)\./' > gtf.list
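According to the help text above, the list file passed to prepDE.py -i needs a sample ID and the path to that sample's GTF on each line. The same list can be sketched in plain shell without perl; make_gtf_list is a hypothetical helper name, and the one-directory-per-sample layout in the usage comment is an assumption.

```shell
# Print "sampleID<TAB>path" for every GTF path given as an argument.
# The sample ID is the file name without its directory and .gtf suffix.
# make_gtf_list is a hypothetical helper, not part of StringTie.
make_gtf_list() {
    for gtf in "$@"; do
        printf '%s\t%s\n' "$(basename "$gtf" .gtf)" "$gtf"
    done
}

# Assumed layout: one sub-directory per sample, each holding <ID>.gtf
# make_gtf_list */RN*.gtf > gtf.list
```

Running cut -f1 gtf.list | sort | uniq -d afterwards is a quick way to spot duplicated sample IDs before calling prepDE.py.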
stringtie -p 12 -G $SP.gtf -o $ID.gtf -A $ID.tab -B -e -l $ID $ID.bam
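The command above leaves $SP and $ID to be filled in per sample. A rough sketch of wrapping it for several samples is below; run_stringtie, the DRY_RUN switch, and the *.sorted.bam naming in the usage comment are my assumptions, while the StringTie flags are exactly the ones from the command above.

```shell
# Run (or just print) the StringTie command from this post for one sample.
# $1 = reference annotation GTF, $2 = sample ID, $3 = sorted BAM file.
# Set DRY_RUN=1 to print the command instead of executing it.
# run_stringtie and DRY_RUN are hypothetical names used for illustration.
run_stringtie() {
    # Rebuild the positional parameters as the full command line.
    set -- stringtie -p 12 -G "$1" -o "$2.gtf" -A "$2.tab" -B -e -l "$2" "$3"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Assumed usage: every *.sorted.bam in the current directory is one sample.
# for bam in *.sorted.bam; do
#     run_stringtie Ricciocarpus_natans.gtf "${bam%.sorted.bam}" "$bam"
# done
```

One thing to watch: -B writes its Ballgown table files next to the output GTF, so with several samples it is probably safer to give each sample its own output directory so those tables do not overwrite each other.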
$ prepDE.py -i gtf.list
This produces the file gene_count_matrix.csv.
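Before moving on to R, it can help to check that the matrix has the expected shape. The sketch below assumes the usual layout of one header row (an ID column followed by one column per sample) and one row per gene; summarize_counts is a hypothetical helper name.

```shell
# Report how many sample columns and gene rows a count matrix CSV contains.
# Assumes the first row is a header and the first column holds gene IDs.
# summarize_counts is a hypothetical helper, not part of prepDE.py.
summarize_counts() {
    awk -F, '
        NR == 1 { samples = NF - 1; next }   # header: ID column + samples
        NF > 0  { genes++ }                  # every non-empty row is a gene
        END     { printf "samples=%d genes=%d\n", samples, genes }
    ' "$1"
}

# Usage:
# summarize_counts gene_count_matrix.csv
```

The sample count should match the number of lines in gtf.list; a mismatch usually means some GTF path in the list could not be read.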
Install the package in R:
BiocManager::install("DESeq2")
The R script is as follows:
Reference: https://zhuanlan.zhihu.com/p/477404097
library(DESeq2)