http://virological.org/t/the-proximal-origin-of-sars-cov-2/398
The Proximal Origin of SARS-CoV-2
Kristian G. Andersen1,2*, Andrew Rambaut3, W. Ian Lipkin4, Edward C. Holmes5 & Robert F. Garry6,7
1Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA.
2Scripps Research Translational Institute, La Jolla, CA 92037, USA.
3Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK.
4Center for Infection and Immunity, Mailman School of Public Health of Columbia University, New York, New York, USA.
5Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, Australia.
6Tulane University, School of Medicine, Department of Microbiology and Immunology, New Orleans, LA, USA.
7Zalgen Labs, LLC, Germantown, MD, USA.
*Corresponding author:
Kristian G. Andersen
Department of Immunology and Microbiology,
The Scripps Research Institute,
La Jolla, CA 92037,
USA.
Since the first reports of a novel pneumonia (COVID-19) in Wuhan city, Hubei province, China, there has been considerable discussion and uncertainty over the origin of the causative virus, SARS-CoV-2. Infections with SARS-CoV-2 are now widespread in China, with cases in every province. As of 14 February 2020, 64,473 such cases have been confirmed, with 1,384 deaths attributed to the virus. These official case numbers are likely an underestimate because of limited reporting of mild and asymptomatic cases, and the virus is clearly capable of efficient human-to-human transmission. Based on the possibility of spread to countries with weaker healthcare systems, the World Health Organization has declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC). There are currently neither vaccines nor specific treatments for this disease.
SARS-CoV-2 is the seventh member of the Coronaviridae known to infect humans. Three of these viruses, SARS-CoV-1, MERS-CoV, and SARS-CoV-2, can cause severe disease; four, HKU1, NL63, OC43 and 229E, are associated with mild respiratory symptoms. Herein, we review what can be deduced about the origin and early evolution of SARS-CoV-2 from the comparative analysis of available genome sequence data. In particular, we offer a perspective on the notable features in the SARS-CoV-2 genome and discuss scenarios by which these features could have arisen. Importantly, this analysis provides evidence that SARS-CoV-2 is neither a laboratory construct nor a purposefully manipulated virus.
The genomic comparison of both alpha- and betacoronaviruses (family Coronaviridae) described below identifies two notable features of the SARS-CoV-2 genome: (i) based on structural modelling and early biochemical experiments, SARS-CoV-2 appears to be optimized for binding to the human ACE2 receptor; (ii) the highly variable spike (S) protein of SARS-CoV-2 has a polybasic (furin) cleavage site at the S1 and S2 boundary via the insertion of twelve nucleotides. Additionally, this event led to the acquisition of three predicted O-linked glycans around the polybasic cleavage site.
Mutations in the receptor binding domain of SARS-CoV-2
The receptor binding domain (RBD) in the spike protein of SARS-CoV and SARS-related coronaviruses is the most variable part of the virus genome. Six residues in the RBD appear to be critical for binding to the human ACE2 receptor and determining host range [1]. Using coordinates based on the Urbani strain of SARS-CoV, they are Y442, L472, N479, D480, T487, and Y491 [1]. The corresponding residues in SARS-CoV-2 are L455, F486, Q493, S494, N501, and Y505. Five of these six residues are mutated in SARS-CoV-2 compared to its most closely related virus, RaTG13, sampled from a Rhinolophus affinis bat, to which it is ~96% identical [2] (Figure 1a). Based on modeling [1] and biochemical experiments [3,4], SARS-CoV-2 seems to have an RBD that may bind with high affinity to ACE2 from human, non-human primate, ferret, pig, and cat, as well as other species with high receptor homology [1]. In contrast, SARS-CoV-2 may bind less efficiently to ACE2 in other species associated with SARS-like viruses, including rodents and civets [1].
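As a small illustration of the residue correspondence listed above, the following Python sketch pairs the six SARS-CoV Urbani contact residues with their SARS-CoV-2 counterparts and reports which amino acids differ. Note that it compares SARS-CoV with SARS-CoV-2 only; the RaTG13 residues are not listed in the text, so the RaTG13 comparison shown in Figure 1a is not reproduced here. The helper names (`split_residue`, `compare_pairs`) are hypothetical and chosen for this example.

```python
# Contact-residue pairs quoted in the text: (SARS-CoV Urbani, SARS-CoV-2).
# Each label is a one-letter amino acid code followed by the spike position.
RBD_PAIRS = [
    ("Y442", "L455"),
    ("L472", "F486"),
    ("N479", "Q493"),
    ("D480", "S494"),
    ("T487", "N501"),
    ("Y491", "Y505"),
]


def split_residue(label):
    """Split a label such as 'Y442' into ('Y', 442)."""
    return label[0], int(label[1:])


def compare_pairs(pairs):
    """Return (changed, conserved) lists of pairs, judged by amino acid identity."""
    changed, conserved = [], []
    for urbani, sars2 in pairs:
        same = split_residue(urbani)[0] == split_residue(sars2)[0]
        (conserved if same else changed).append((urbani, sars2))
    return changed, conserved


changed, conserved = compare_pairs(RBD_PAIRS)
for urbani, sars2 in changed:
    print(f"{urbani} -> {sars2}")
print(f"{len(changed)} of {len(RBD_PAIRS)} positions differ between SARS-CoV and SARS-CoV-2")
```

Only the terminal tyrosine (Y491 in SARS-CoV, Y505 in SARS-CoV-2) is shared between the two viruses at these positions.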
The phenylalanine (F) at residue 486 in the SARS-CoV-2 S protein corresponds to L472 in the SARS-CoV Urbani strain. Notably, in SARS-CoV cell culture experiments the L472 mutates to phenylalanine (L472F) [5], which is predicted to be optimal for binding of the SARS-CoV RBD to the human ACE2 receptor [6]. However, a phenylalanine in this position is also present in several SARS-like CoVs from bats (Figure 1a). While these analyses suggest that SARS-CoV-2 may be capable of binding the human ACE2 receptor with high affinity, the interaction is not predicted to be optimal [1]. Additionally, several of the key residues in the RBD of SARS-CoV-2 are different to those previously described as optimal for human ACE2 receptor binding [6]. In contrast to these computational predictions, recent binding studies indicate that SARS-CoV-2 binds with high affinity to human ACE2 [7]. Thus the SARS-CoV-2 spike appears to be the result of selection on human or human-like ACE2 permitting another optimal binding solution to arise. This is strong evidence that SARS-CoV-2 is not the product of genetic engineering.
Polybasic cleavage site and O-linked glycans
The second notable feature of SARS-CoV-2 is a predicted polybasic cleavage site (RRAR) in the spike protein at the junction of S1 and S2, the two subunits of the spike protein (Figure 1b) [8,9]. In addition to two basic arginines and an alanine at the cleavage site, a leading proline is also inserted; thus, the fully inserted sequence is PRRA (Figure 1b). The strong turn created by the proline insertion is predicted to result in the addition of O-linked glycans to S673, T678, and S686 that flank the polybasic cleavage site. A polybasic cleavage site has not previously been observed in related lineage B betacoronaviruses and is a unique feature of SARS-CoV-2. Some human betacoronaviruses, including HCoV-HKU1 (lineage A), have polybasic cleavage sites, as well as predicted O-linked glycans near the S1/S2 cleavage site.
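To make the relationship between the twelve-nucleotide insertion, the four inserted residues (PRRA), and the resulting RRAR site concrete, here is a minimal Python sketch. It reads an insertion off a pair of aligned peptide fragments. The two fragments and the starting position are assumptions made for illustration (chosen so that the flanking S673, T678, and S686 fall where the text places them); they are not taken from the alignment in Figure 1b, and the function name `find_insertion` is hypothetical. The sketch also assumes a single contiguous gap.

```python
# Illustrative aligned fragments around the S1/S2 junction (assumed for this
# example). '-' marks an alignment gap in the virus lacking the insertion.
SARS2_FRAGMENT = "SYQTQTNSPRRARSVASQ"
OTHER_FRAGMENT = "SYQTQTNS----RSVASQ"
FRAGMENT_START = 673  # assumed spike position of the first residue (S673)


def find_insertion(with_insert, without_insert, start):
    """Return (inserted residues, first position, last position).

    Positions are numbered along `with_insert`, which is assumed gap-free.
    """
    if len(with_insert) != len(without_insert):
        raise ValueError("aligned fragments must have equal length")
    columns = [i for i, c in enumerate(without_insert) if c == "-"]
    if not columns:
        return "", None, None
    inserted = "".join(with_insert[i] for i in columns)
    return inserted, start + columns[0], start + columns[-1]


inserted, first, last = find_insertion(SARS2_FRAGMENT, OTHER_FRAGMENT, FRAGMENT_START)
nucleotides = len(inserted) * 3  # three nucleotides per codon

# The cleavage motif spans the last three inserted residues plus the next one.
motif_start = SARS2_FRAGMENT.index("RRAR") + FRAGMENT_START

print(f"inserted residues: {inserted} (positions {first}-{last})")
print(f"nucleotides required: {nucleotides}")
print(f"RRAR motif begins at position {motif_start}")
```

Under these assumptions the four inserted residues account for exactly twelve nucleotides, and the RRAR motif is formed by three inserted residues together with the arginine that was already present downstream.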
While the functional consequence of the polybasic cleavage site in SARS-CoV-2 is unknown, experiments with SARS-CoV have shown that engineering such a site at the S1/S2 junction enhances cell–cell fusion but does not affect virus entry [10]. Polybasic cleavage sites allow effective cleavage by furin and other proteases, and can be acquired at the junction of the two subunits of the haemagglutinin (HA) protein of avian influenza viruses in conditions that select for rapid virus replication and transmission (e.g. highly dense chicken populations). HA serves a similar function in cell–cell fusion and viral entry as the coronavirus S protein. Acquisition of a polybasic cleavage site in HA, by either insertion or recombination, converts low pathogenicity avian influenza viruses into highly pathogenic forms [11-13]. The acquisition of polybasic cleavage sites by the influenza virus HA has also been observed after repeated forced passage in cell culture or through animals [14,15]. Similarly, an avirulent isolate of Newcastle Disease virus became highly pathogenic during serial passage in chickens by incremental acquisition of a polybasic cleavage site at the junction of its fusion protein subunits [16]. The potential function of the three predicted O-linked glycans is less clear, but they could create a “mucin-like domain” that would shield potential epitopes or key residues on the SARS-CoV-2 spike protein. Biochemical analyses or structural studies are required to determine whether or not the predicted O-linked glycan sites are utilized.
Figure 1. (a) Mutations in contact residues of the SARS-CoV-2 spike protein. The spike protein of SARS-CoV-2 (top) was aligned against the most closely related SARS-like CoVs and SARS-CoV-1. Key residues in the spike protein that make contact with the ACE2 receptor are marked with blue boxes in both SARS-CoV-2 and the SARS-CoV Urbani strain. (b) Acquisition of polybasic cleavage site and O-linked glycans. The polybasic cleavage site is marked in grey with the three adjacent predicted O-linked glycans in blue. Both the polybasic cleavage site and O-linked glycans are unique to SARS-CoV-2 and not previously seen in lineage B betacoronaviruses. Sequences shown are from NCBI GenBank, accession numbers MN908947, MN996532, AY278741, KY417146, MK211376. The pangolin coronavirus sequences are a consensus generated from SRR10168377 and SRR10168378 (NCBI BioProject PRJNA573298) [18,19].
Theories of SARS-CoV-2 origins
It is improbable that SARS-CoV-2 emerged through laboratory manipulation of an existing SARS-related coronavirus. As noted above, the RBD of SARS-CoV-2 is optimized for human ACE2 receptor binding with an efficient binding solution different to that which would have been predicted. Further, if genetic manipulation had been performed, one would expect that one of the several reverse genetic systems available for betacoronaviruses would have been used. However, this is not the case, as the genetic data show that SARS-CoV-2 is not derived from any previously used virus backbone [17]. Instead, we propose two scenarios that can plausibly explain the origin of SARS-CoV-2: (i) natural selection in a non-human animal host prior to zoonotic transfer, and (ii) natural selection in humans following zoonotic transfer. We also discuss whether selection during passage in culture could have given rise to the same observed features.
Selection in an animal host. As many of the early cases of COVID-19 were linked to the Huanan seafood and wildlife market in Wuhan, it is possible that an animal source was present at this location. Given the similarity of SARS-CoV-2 to bat SARS-like CoVs, particularly RaTG13, it is plausible that bats serve as reservoir hosts for SARS-CoV-2. It is important, however, to note that previous outbreaks of betacoronaviruses in humans involved direct exposure to animals other than bats, including civets (SARS) and camels (MERS), that carry viruses that are genetically very similar to SARS-CoV-1 or MERS-CoV, respectively. By analogy, viruses closely related to SARS-CoV-2 may be circulating in one or more animal species. Initial analyses indicate that Malayan pangolins (Manis javanica) illegally imported into Guangdong province contain a CoV that is similar to SARS-CoV-2 [18,19]. Although the bat virus RaTG13 remains the closest relative to SARS-CoV-2 across the whole genome, the Malayan pangolin CoV is identical to SARS-CoV-2 at all six key RBD residues (Figure 1). However, no pangolin CoV has yet been identified that is sufficiently similar to SARS-CoV-2 across its entire genome to support direct human infection. In addition, the pangolin CoV does not carry a polybasic cleavage site insertion. For a precursor virus to acquire the polybasic cleavage site and mutations in the spike protein suitable for human ACE2 receptor binding, an animal host would likely have to have a high population density (to allow natural selection to proceed efficiently) and an ACE2 gene that is similar to the human orthologue. Further characterization of CoVs in pangolins and other animals that may harbour SARS-CoV-like viruses should be a public health priority.
Cryptic adaptation to humans. It is also possible that a progenitor to SARS-CoV-2 jumped from a non-human animal to humans, with the genomic features described above acquired through adaptation during subsequent human-to-human transmission. We surmise that once these adaptations were acquired (either together or in series) they would enable the outbreak to take off, producing a sufficiently large and unusual cluster of pneumonia cases to trigger the surveillance system that ultimately detected it.
All SARS-CoV-2 genomes sequenced so far have the well adapted RBD and the polybasic cleavage site, and are thus derived from a common ancestor that had these features. The presence of an RBD in pangolins that is very similar to the one in SARS-CoV-2 means that this was likely already present in the virus that jumped to humans, even if we don’t yet have the exact non-human progenitor virus. This leaves the polybasic cleavage site insertion to occur during human-to-human transmission. Following the example of the influenza A virus HA gene, a specific insertion or recombination event is required to enable the emergence of SARS-CoV-2 as an epidemic pathogen.
Estimates of the timing of the most recent common ancestor (tMRCA) of SARS-CoV-2 using currently available genome sequence data point to virus emergence in late November to early December 2019 [20,21], compatible with the earliest retrospectively confirmed cases [22]. Hence, this scenario presumes a period of unrecognised transmission in humans between the initial zoonotic transfer event and the acquisition of the polybasic cleavage site. Sufficient opportunity could occur if there had been many prior zoonotic events producing short chains of human-to-human transmission (so-called ‘stuttering chains’) over an extended period. This is essentially the situation for MERS-CoV in the Arabian Peninsula where all the human cases are the result of repeated jumps of the virus from dromedary camels, producing single infections or short chains of transmission that eventually resolve. To date, after 2,499 cases over 8 years, no human adaptation has emerged that has allowed MERS-CoV to take hold in the human population.
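The notion of ‘stuttering chains’ can be illustrated with a simple branching-process simulation. The sketch below is not an analysis from this post: it assumes that each case infects a Poisson-distributed number of others with mean `r`, and the value `r = 0.5` is a hypothetical choice used only to show the behaviour when each case causes fewer than one new case on average. All function names are illustrative.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def chain_size(rng, r, max_cases=10_000):
    """Total cases in one chain seeded by a single zoonotic infection."""
    active, total = 1, 1
    # Each generation, every active case infects a Poisson(r) number of others.
    while active and total < max_cases:
        new_cases = sum(poisson(rng, r) for _ in range(active))
        total += new_cases
        active = new_cases
    return total


def simulate(r, n_chains, seed=1):
    """Simulate independent chains, each started by one spillover event."""
    rng = random.Random(seed)
    return [chain_size(rng, r) for _ in range(n_chains)]


sizes = simulate(r=0.5, n_chains=5000)  # r = 0.5 is an assumed, illustrative value
mean_size = sum(sizes) / len(sizes)
single = sum(1 for s in sizes if s == 1) / len(sizes)
print(f"mean chain size: {mean_size:.2f}")
print(f"fraction of chains with a single case: {single:.0%}")
print(f"longest chain: {max(sizes)} cases")
```

With `r` below 1 every chain eventually dies out, and the expected chain size is 1 / (1 - r). Most introductions therefore produce a single infection or a handful of cases, which is the pattern described for MERS-CoV above.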
How could we test whether cryptic spread of SARS-CoV-2 enabled human adaptation? Metagenomic studies of banked serum samples could provide important information, but given the relatively short period of viremia it may be impossible to detect low level SARS-CoV-2 circulation in historical samples. Retrospective serological studies potentially could be informative and a few such studies have already been conducted. One found that animal importation traders had a 13% seropositivity to coronaviruses [23], while another noted that 3% of residents of a village in Southern China were seropositive to these viruses [24]. Interestingly, 200 residents of Wuhan did not show coronavirus seroreactivity. Critically, however, these studies could not have distinguished whether positive serological responses were due to a prior infection with SARS-CoV-1 or -2. Further retrospective serological studies should be conducted to determine the extent of prior human exposure to betacoronaviruses in different geographic areas, particularly using assays that can distinguish among multiple betacoronaviruses.
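As a rough guide to how much a negative serosurvey can exclude, the following sketch computes the exact one-sided upper confidence bound on prevalence when none of n samples test positive. This calculation is not part of the cited studies; it assumes a random sample and a perfectly sensitive assay, both of which are simplifications, and the function name is hypothetical.

```python
def upper_bound_zero_positives(n, confidence=0.95):
    """Upper bound on prevalence when 0 of n samples are positive.

    Solves (1 - p) ** n = 1 - confidence for p, i.e. the largest prevalence
    at which observing zero positives still has probability 1 - confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


bound = upper_bound_zero_positives(200)
print(f"0 of 200 positive is compatible with a prevalence of up to about {bound:.2%}")
```

Under these assumptions, finding no seroreactivity among 200 people does not rule out a low level of prior exposure, which is consistent with the call above for larger studies with more discriminating assays.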
Selection during passage. Basic research involving passage of bat SARS-like coronaviruses in cell culture and/or animal models has been ongoing in BSL-2 for many years in multiple laboratories across the world [25-28]. There are also documented instances of the laboratory acquisition of SARS-CoV-1 by laboratory personnel working under BSL-2 containment [29,30]. We must therefore consider the possibility of a deliberate or inadvertent release of SARS-CoV-2. In theory, it is possible that SARS-CoV-2 acquired the observed RBD mutations during adaptation to passage in cell culture, as has been observed in studies with SARS-CoV [5] as well as MERS-CoV [31]. However, the acquisition of the polybasic cleavage site or O-linked glycans, if functional, argues against this scenario. New polybasic cleavage sites have only been observed after prolonged passaging of low pathogenicity avian influenza virus in cell culture or animals. Furthermore, the generation of SARS-CoV-2 by cell culture or animal passage would have required prior isolation of a progenitor virus with a very high genetic similarity. Subsequent generation of a polybasic cleavage site would have then required an intense program of passage in cell culture or animals with an ACE2 receptor similar to that of humans (e.g. ferrets). It is also questionable whether generation of the O-linked glycans would have occurred on cell culture passage, as such mutations typically suggest the involvement of an immune system, which is not present in vitro.
Conclusions
In the midst of the global COVID-19 public health emergency it is reasonable to wonder why the origins of the epidemic matter. A detailed understanding of how an animal virus jumped species boundaries to infect humans so productively will help in the prevention of future zoonotic events. For example, if SARS-CoV-2 pre-adapted in another animal species then we are at risk of future re-emergence events even if the current epidemic is controlled. In contrast, if the adaptive process we describe occurred in humans, then even if we have repeated zoonotic transfers they are unlikely to take off unless the same series of mutations occurs. In addition, identifying the closest animal relatives of SARS-CoV-2 will greatly assist studies of virus function. Indeed, the availability of the RaTG13 bat sequence facilitated the comparative genomic analysis performed here, helping to reveal the key mutations in the RBD as well as the polybasic cleavage site insertion.
The genomic features described here may in part explain the infectiousness and transmissibility of SARS-CoV-2 in humans. Although genomic evidence does not support the idea that SARS-CoV-2 is a laboratory construct, it is currently impossible to prove or disprove the other theories of its origin described here, and it is unclear whether future data will help resolve this issue. Identifying the immediate non-human animal source and obtaining virus sequences from it would be the most definitive way of revealing virus origins. In addition, it would be helpful to obtain more genetic and functional data about the virus, including experimental studies of receptor binding and the role of the polybasic cleavage site and predicted O-linked glycans. The identification of a potential intermediate host of SARS-CoV-2, as well as the sequencing of very early cases including those not connected to the Wuhan market, would similarly be highly informative. Irrespective of how SARS-CoV-2 originated, the ongoing surveillance of pneumonia in humans and other animals is clearly of utmost importance.
Acknowledgements
We thank all those who have contributed SARS-CoV-2 genome sequences to the GISAID database (https://www.gisaid.org/) and contributed analyses and ideas to Virological.org (http://virological.org/). We thank the Wellcome Trust for supporting this work. ECH is supported by an ARC Australian Laureate Fellowship (FL170100022). KGA is supported by NIH grant 1U19AI135995-01. AR is supported by the Wellcome Trust (Collaborators Award 206298/Z/17/Z – ARTIC network) and the European Research Council (grant agreement no. 725422 – ReservoirDOCS).