HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies
$ conda create -n hapcut2 -c bioconda hapcut2 -y
Supported data types:
NGS short reads (Illumina HiSeq)
single-molecule long reads (PacBio and Oxford Nanopore)
Linked-Reads (e.g. 10X Genomics, stLFR or TELL-seq)
proximity-ligation (Hi-C) reads
high-coverage sequencing (>40x coverage-per-SNP) using above technologies
combinations of the above technologies (e.g. scaffold long reads with Hi-C reads)
Running it takes two steps:
- ./build/extractHAIRS [options] --bam reads.sorted.bam --VCF variants.vcf --out fragment_file
  Extracts the haplotype information from the reads.
- ./build/HAPCUT2 --fragments fragment_file --VCF variants.vcf --output haplotype_output_file
  Assembles the haplotypes.
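The two steps can also be chained from a script. Here is a minimal Python sketch, assuming both binaries live at the `./build/` paths used in the commands above. The helper names `extract_hairs_cmd`, `hapcut2_cmd`, and `run_pipeline` are made up for illustration, and only flags that appear in the two usage lines are used.

```python
import subprocess


def extract_hairs_cmd(bam, vcf, fragments, exe="./build/extractHAIRS"):
    """Step 1: build the command that writes the fragment file."""
    return [exe, "--bam", bam, "--VCF", vcf, "--out", fragments]


def hapcut2_cmd(fragments, vcf, output, exe="./build/HAPCUT2"):
    """Step 2: build the command that assembles haplotypes from the fragments."""
    return [exe, "--fragments", fragments, "--VCF", vcf, "--output", output]


def run_pipeline(bam, vcf, fragments, output, dry_run=False):
    """Run both steps in order. With dry_run=True the commands are only printed."""
    cmds = [
        extract_hairs_cmd(bam, vcf, fragments),
        hapcut2_cmd(fragments, vcf, output),
    ]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError, so step 2 never runs
            # on a fragment file from a failed step 1.
            subprocess.run(cmd, check=True)
    return cmds
```

Calling `run_pipeline("reads.sorted.bam", "variants.vcf", "fragment_file", "haplotype_output_file", dry_run=True)` prints both command lines without executing anything, which is handy for checking paths first.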
Input files: an aligned (sorted) BAM file and an uncompressed VCF file.
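Because the VCF has to be uncompressed, a `.vcf.gz` file needs to be expanded first. Here is a minimal sketch using only the Python standard library. The function name `decompress_vcf` and the file names are placeholders, and it assumes the input is gzip-compatible (bgzip output is).

```python
import gzip
import shutil


def decompress_vcf(src, dst):
    """Write an uncompressed copy of a gzip-compressed VCF and return its path."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in chunks so large VCFs are not loaded into memory at once.
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `decompress_vcf("variants.vcf.gz", "variants.vcf")` produces the plain-text file that can then be passed to `--VCF`.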
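I have no idea how this behaves on a hexaploid, so let's just try it.
Software is always something you come to understand while debugging it (mostly because I have no patience).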
./extractHAIRS [options] --bam reads.sorted.bam --VCF variants.VCF --out fragment_file
Options:
--qvoffset <33/64> : quality value offset, 33/64 depending on how quality values were encoded, default is 33
--mbq
--mmq
--realign_variants <0/1> : Perform sensitive realignment and scoring of variants.
--hic <0/1> : sets default maxIS to 40MB, prints matrix in new HiC format
--10X <0/1> : 10X reads. NOTE: Output fragments MUST be processed with LinkReads.py script after extractHAIRS to work with HapCUT2.
--pacbio <0/1> : Pacific Biosciences reads. Similar to --realign_variants, but with alignment parameters tuned for PacBio reads.
--ONT, --ont <0/1> : Oxford nanopore technology reads. Similar to --realign_variants, but with alignment parameters tuned for Oxford Nanopore Reads.
--new_format, --nf <0/1> : prints matrix in new format. Requires --new_format option when running HapCUT2.
--VCF
--maxIS
--minIS
--PEonly <0/1> : do not use single end reads, default is 0 (use all reads)
--indels <0/1> : extract reads spanning INDELS, default is 0, variants need to be specified in VCF format to use this option
--noquality
--triallelic <0/1> : include variants with genotype 1/2 for parsing, default 0
--ref
--out
--region chr:start-end : chromosome and region in BAM file, useful to process individual chromosomes or genomic regions
--ep <0/1> : set to 1 to estimate HMM parameters from aligned reads (only with long reads), default = 0
--hom <0/1> : set to 1 to include homozygous variants for processing, default = 0 (only heterozygous)
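To keep the technology switches straight, the options above can be wrapped in a small helper. This is only a sketch under stated assumptions: the names `TECH_FLAGS` and `extract_hairs_args` are hypothetical, the binary is assumed to be on the PATH as `extractHAIRS`, short reads are assumed to need no technology flag (all switches default to 0), and `--ref` is assumed to take a path to the reference FASTA since the help text above does not describe it.

```python
# Technology name -> extra extractHAIRS flags, taken from the option list above.
TECH_FLAGS = {
    "illumina": [],            # assumption: no dedicated flag for short reads
    "hic": ["--hic", "1"],
    "10x": ["--10X", "1"],
    "pacbio": ["--pacbio", "1"],
    "ont": ["--ont", "1"],
}


def extract_hairs_args(bam, vcf, out, tech="illumina", region=None, ref=None):
    """Build an extractHAIRS argument list for one technology and optional region."""
    if tech not in TECH_FLAGS:
        raise ValueError(f"unknown technology: {tech!r}")
    args = ["extractHAIRS"] + TECH_FLAGS[tech]
    if ref is not None:
        # Assumption: --ref expects the reference FASTA used for alignment.
        args += ["--ref", ref]
    if region is not None:
        # Format from the help text: chr:start-end
        args += ["--region", region]
    args += ["--bam", bam, "--VCF", vcf, "--out", out]
    return args
```

For instance, `extract_hairs_args("reads.sorted.bam", "variants.vcf", "chr1.frag", tech="hic", region="chr1:1-1000000")` restricts a Hi-C run to one region. For 10X data, remember that the option list says the fragments must go through the LinkReads.py script before HAPCUT2 is run.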
HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies
USAGE : ./HAPCUT2 --fragments fragment_file --VCF variantcalls.vcf --output haplotype_output_file
Basic Options:
--fragments, --f
--VCF
--output, --o
--outvcf <0/1> : output phased variants to VCF file (
--converge, --c
--verbose, --v <0/1>: verbose mode: print extra information to stdout and stderr. default: 0
Read Technology Options:
--hic <0/1> : increases accuracy on Hi-C data; models h-trans errors directly from the data. default: 0
--hic_htrans_file, --hf
--qv_offset, --qo <33/48/64> : quality value offset for base quality scores, default: 33 (use same value as for extracthairs)
--long_reads, --lr <0/1> : reduces memory when phasing long read data with many SNPs per read. default: automatic.
Haplotype Post-Processing Options:
--threshold, --t
--skip_prune, --sp <0/1>: skip default likelihood pruning step (prune SNPs after the fact using column 11 of the output). default: 0
--call_homozygous, --ch <0/1>: call positions as homozygous if they appear to be false heterozygotes. default: 0
--discrete_pruning, --dp <0/1>: use discrete heuristic to prune SNPs. default: 0
--error_analysis_mode, --ea <0/1>: compute switch confidence scores and print to haplotype file but don’t split blocks or prune. default: 0
Advanced Options:
--new_format, --nf <0/1>: use new Hi-C fragment matrix file format (but don’t do h-trans error modeling). default: 0
--max_iter, --mi
--maxcut_iter, --mc
--htrans_read_lowbound, --hrl
--htrans_max_window, --hmw
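Several settings have to agree between the two programs: the quality offset ("use same value as for extracthairs"), and the fragment format (`--hic` or `--new_format` on both sides). One way to avoid mismatches is to build both command lines from a single set of parameters. The sketch below does that; the function name `paired_commands` is hypothetical, both binaries are assumed to be on the PATH, and letting `hic` take precedence over `new_format` is my own simplification rather than documented behavior.

```python
def paired_commands(bam, vcf, fragments, output,
                    qv_offset=33, hic=False, new_format=False):
    """Return (extractHAIRS args, HAPCUT2 args) with matching shared settings."""
    if qv_offset not in (33, 64):
        # extractHAIRS lists only 33/64, so a shared value must be one of them.
        raise ValueError("qv_offset must be 33 or 64")
    step1 = ["extractHAIRS", "--qvoffset", str(qv_offset)]
    step2 = ["HAPCUT2", "--qv_offset", str(qv_offset)]
    if hic:
        # Hi-C mode on both sides: new Hi-C matrix format plus h-trans modeling.
        step1 += ["--hic", "1"]
        step2 += ["--hic", "1"]
    elif new_format:
        # New matrix format without h-trans error modeling.
        step1 += ["--new_format", "1"]
        step2 += ["--new_format", "1"]
    step1 += ["--bam", bam, "--VCF", vcf, "--out", fragments]
    step2 += ["--fragments", fragments, "--VCF", vcf, "--output", output]
    return step1, step2
```

The returned lists can be handed to `subprocess.run(..., check=True)` one after the other, or joined with spaces and pasted into a shell.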