Phylogenetic network analysis of SARS-CoV-2 genomes
Abstract:
In a phylogenetic network analysis of 160 complete human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes, we find three central variants distinguished by amino acid changes, which we have named A, B, and C, with A being the ancestral type according to the bat outgroup coronavirus. The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans. In contrast, the B type is the most common type in East Asia, and its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types, pointing to founder effects or immunological or environmental resistance against this type outside Asia. The network faithfully traces routes of infections for documented coronavirus disease 2019 (COVID-19) cases, indicating that phylogenetic networks can likewise be successfully used to help trace undocumented COVID-19 infection sources, which can then be quarantined to prevent recurrent spread of the disease worldwide.
Fig. 1.
Phylogenetic network of 160 SARS-CoV-2 genomes. Node A is the root cluster obtained with the bat (R. affinis) coronavirus isolate BatCoVRaTG13 from Yunnan Province. Circle areas are proportional to the number of taxa, and each notch on the links represents a mutated nucleotide position. The sequence range under consideration is 56 to 29,797, with nucleotide position (np) numbering according to the Wuhan 1 reference sequence (8). The median-joining network algorithm (2) and the Steiner algorithm (9) were used, both implemented in the software package Network5011CS (https://www.fluxus-engineering.com/), with the parameter epsilon set to zero, generating this network containing 288 most-parsimonious trees of length 229 mutations. The reticulations are mainly caused by recurrent mutations at np11083. The 161 taxa (160 human viruses and one bat virus) yield 101 distinct genomic sequences. The phylogenetic diagram is available for detailed scrutiny in A0 poster format (SI Appendix, Fig. S5) and in the free Network download files.
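The caption's bookkeeping (trimming to the range 56 to 29,797, collapsing 161 taxa into 101 distinct sequences, and counting mutated positions along a link) can be illustrated with a minimal Python sketch. It assumes the genomes are already aligned to the reference coordinates and held in a plain dictionary. The function names and the toy data are hypothetical. This is not the median-joining algorithm and not part of the Network software; it only shows the counting that determines circle sizes and notches.

```python
from collections import Counter


def trim(seq, start=56, end=29797):
    """Return the 1-based inclusive window [start, end] of an aligned sequence."""
    return seq[start - 1:end]


def collapse_haplotypes(aligned, start=56, end=29797):
    """Count how many taxa carry each distinct trimmed sequence.

    `aligned` maps a taxon name to its aligned sequence (hypothetical input
    format). In the figure, circle areas are proportional to these counts.
    """
    return Counter(trim(seq, start, end) for seq in aligned.values())


def mutated_positions(a, b, offset=56):
    """List reference positions (1-based) where two trimmed sequences differ.

    `offset` is the reference position of the first character of the window.
    Each returned position corresponds to one notch on a link between nodes.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + offset for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Toy alignment (made-up data), using a window of positions 2 to 5.
toy = {"t1": "AACGT", "t2": "AACGT", "t3": "AACTT", "t4": "CACGT"}
haplotypes = collapse_haplotypes(toy, start=2, end=5)
notches = mutated_positions("ACGT", "ACTT", offset=2)
```

In the toy data, taxon t4 differs from t1 only at position 1, which lies outside the window, so it collapses into the same haplotype as t1 and t2. A difference outside the range under consideration is ignored in the same way.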