基因家族扩张收缩

使用方法等详细信息请关注公众号【生物信息分析学习】(swxxfxxx)

【提示】本文写作命令为:abysw blog 基因家族扩张收缩
【 另 】有任何问题,欢迎来公众号交流!

git clone https://github.com/hahnlab/CAFE5.git

The Github page for CAFE5 is https://github.com/hahnlab/CAFE5

Navigate to a directory that you typically keep source code in and do one of the following:

Download the latest release from the CAFE release directory https://github.com/hahnlab/CAFE5/releases

If you wish to get the latest version from source, you can run

$ git clone https://github.com/hahnlab/CAFE5.git

Please note that the released versions contain tested and approved code, while the latest source code may contain experimental and untested features. It is highly recommended to use a released version instead.

======================================================
#wget https://github.com/hahnlab/CAFE5/archive/refs/tags/v5.0.tar.gz
#tar zxvf v5.0.tar.gz

#wget https://github.com/hahnlab/CAFE5/releases/download/v5.0/CAFE5-5.0.0.tar.gz
#./configure
#make prefix=/abysw/ausr

还是编译不了,最近喜欢出幺蛾子。

conda create -n cafe5 cafe=5.0.0 -c bioconda

  1. 做物种分歧时间树
    mcmctree,用conda安装即可,不做赘述。
    准备文件:a,有校准点的有根树;
    b,phy比对文件;
    c,配置文件mcmctree.ctl

参考:
http://t.zoukankan.com/bio-mary-p-12818888.html
https://www.jianshu.com/p/46b28829b078

http://abacus.gene.ucl.ac.uk/software/paml.html

seed = -1
seqfile = mtCDNApri123.txt
treefile = mtCDNApri.trees
mcmcfile = mcmc.txt
outfile = out.txt
ndata = 3
seqtype = 0 * 0 : nucleotides; 1: codons; 2: AAs
usedata = 1 * 0: no data; 1:seq; 2:approximation; 3:out.BV (in.BV)
clock = 2 * 1: global clock; 2: independent; and 3: correlated rates
RootAge = ‘<1.0’ * safe constraint on root age, used if no fossil for root.
model = 0 * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85
alpha = 0 * alpha for gamma rates at sites
ncatG = 5 * No. categories in discrete gamma
cleandata = 0 * remove sites with ambiguity data (1:yes, 0:no)?
BDparas = 1 1 0.1 * birth, death, sampling
kappa_gamma = 6 2 * gamma prior for kappa
alpha_gamma = 1 1 * gamma prior for alpha
rgene_gamma = 2 20 1 * gammaDir prior for rate for genes
sigma2_gamma = 1 10 1 * gammaDir prior for sigma^2 (for clock=2 or 3)
finetune = 1: .1 .1 .1 .1 .1 .1 * auto (0 or 1) : times, rates, mixing…
print = 1 * 0: no mcmc sample; 1: everything except branch 2: ev…
burnin = 2000
sampfreq = 10
nsample = 20000

mcmctree mcmctree.ctl

用newick_utils进行物种树的处理:
conda install newick_utils
$ nw
nw_clade nw_distance nw_gen nw_match nw_order nw_reroot nw_topology
nw_condense nw_duration nw_indent nwns nw_prune nw_stats nw_trim
nw_display nw_ed nw_labels nwnsi nw_rename nw_support

有一个坑,当序列只有一块的时候设置ndata为1 。(遇到了报错就知道是什么意思了)

===================分歧时间树制作完成===============

cafe 准备文件:上述已完成
分歧树文件:
((((cat:68.710507,horse:68.710507):4.566782,cow:73.277289):20.722711,(((((chimp:4.444172,human:4.444172):6.682678,orang:11.126850):2.285855,gibbon:13.412706):7.211527,(macaque:4.567240,baboon:4.567240):16.056992):16.060702,marmoset:36.684935):57.315065):38.738021,(rat:36.302445,mouse:36.302445):96.435575);

准备基因家族文件:之前已完成
Desc Family ID human chimp orang baboon gibbon macaque marmoset rat mouse cat horse cow
ATPase ORTHOMCL1 52 55 54 57 54 56 56 53 52 57 55 54
(null) ORTHOMCL2 76 51 41 39 45 36 37 67 79 37 41 49
HMG box ORTHOMCL3 50 49 48 48 46 49 48 55 52 51 47 55
(null) ORTHOMCL4 43 43 47 53 44 47 46 59 58 51 50 55
Dynamin ORTHOMCL5 43 40 43 44 31 46 33 79 70 43 49 50
……
….
..
DnaJ ORTHOMCL10016 45 46 50 46 46 47 46 48 49 45 44

perl -ne ‘$i = “Descript” if /Orthogroup/; $i = “(null)” if /OG(\S+)/; @l = split; print join “\t”, $i, @l[0..$#l-1]; print “\n”‘ Orthogroups.GeneCount.tsv > mcmc.og

运行
啥也不管
$ cafe5 -i mammal_gene_families.txt -t mammal_tree.txt
啥都管
$ cafe5 -i mammal_gene_families.txt -t mammal_tree.txt -k 3
指定管啥
((((cat:1,horse:1):1,cow:1):1,(((((chimp:2,human:2):2,orang:1):1,gibbon:1):1,(macaque:1,baboon:1):1):1,marmoset:1):1):1,(rat:1,mouse:1):1);
$ cafe5 -i mammal_gene_families.txt -t mammal_tree.txt -y chimphuman_separate_lambda.txt

当然,你可以不做超度量树的。
anyway,
自己看(github)[https://github.com/hahnlab/CAFE5]吧,有什么新的见解可以去公众号告诉我。