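LOL, a job opening landed right in my inbox
Right in the middle of the virus epidemic in China, and the position happens to be for research on viruses and the immune system. It was even sent by a colleague here from another department whom I don't know... otherwise I wouldn't have opened it. Apart from virology and immunology, where I would have to catch up, I can handle all the required lab techniques, and quite a few of them are my strengths... As for the virology techniques, the ad seems to hint they are not essential: you can get very good training in the virus field...
A big city, too, and within the same university system, so the pension would carry over. I have to go home and ask the Leader. After all, I don't even know whether I'll have funding here after next May. It all depends on whether she can bear to give up the two peach trees and two jujube trees in the backyard... Housing in a big city is expensive; we could live in an apartment, but fishing and growing vegetables would be out... Life there doesn't sound as much fun as it is now...
(February 7, 2020)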
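Counting money: the W-2 arrived, and we get a bit over a thousand back this year
Heh, every year the Little Secretary collects all the tax forms and then washes her hands of it, only nagging me to fill out the return or asking how much we'll get back and whether it's enough for a dinner out. Sure enough, the last W-2 came yesterday and today she asked how much is coming back. So, a rough estimate first: we underpaid by a bit over a thousand, the son brings a 2,500 education credit, so in the end we get back a little over 1,000...
No need to top up the IRA this year. For one thing, the amount going into the retirement accounts this year was adjusted based on last year's numbers to keep the marginal tax rate at 12%; for another, there's no money in the pocket anyway, since I'm busy scraping together the son's third year of college... I told the Little Secretary: don't worry, if we come up short we'll borrow from our girl, then say we can't pay her back and see what she does... haha
Tonight we're only having the kids over for dinner. The idea was that they're far from their parents and can't spend Chinese New Year with mom and dad, so uncle and auntie would invite them over... but the Wuhan pneumonia scare made us cancel. Thinking it over, they've been back at school for three weeks now, so it should be safe... The campus so far has only one suspected case, and not a university student, so we invited them over for dinner... Too bad one of them is busy tonight; he'll have to wait until I've dug wild greens for dumplings, and we'll invite his parents along... My old classmates and friends are good people: although they earn far more than I do, they never turn up their noses at the wild greens and wild-caught fish of a poor postdoc's household...
Made too much, made too much; the leftovers could feed another party
Heh, we sent the two big eaters home with some, and what's left means the Leader and I won't need to cook for three days... roasted sweet potatoes, roasted ribs, roasted shrimp, plus a pot of leftover rice and fruit... At least the steaks got finished.
The Leader said there's no cooking tomorrow, so she'll come inspect the Family forum and watch me brawl online... I said I have plenty of alternate IDs; if you can pick me out, I'll let you drink one of my beers... In the evening I was interrogated at length: who is the Little Secretary?!... I said I'm a poor man, so my wife doubles as my secretary. It's the rich who have their secretaries double as wives.
(February 8, 2020)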
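A quick primer on DNA and RNA
RNA and DNA are very similar in molecular structure, and even the bases they carry are almost the same. There are four bases: A, G, C, U in RNA; A, G, C, T in DNA. The U in RNA becomes T in DNA. Both DNA and RNA can form double strands, held together by base pairing: A:T (A:U within RNA) and G:C. In an RNA/DNA hybrid duplex, A on the DNA strand pairs with U on the RNA strand (A:U), while G:C is unchanged.
The other difference between RNA and DNA is in the backbone. The sugar unit in the backbone carries one more oxygen atom in RNA than in DNA. Because the backbones are so similar, we leave out the backbone units when writing a DNA or RNA sequence. A DNA sequence is written as, say, 5’GTACATTCGGAA, and an RNA sequence as 5’AGUCUUCUUGUAA... When you see a U you know it is RNA. The 5’ indicates the direction of the sequence: since a sequence could be written left to right or right to left, marking the 5’ or 3’ end avoids mistakes. This matters a lot: when two paired strands are written out, one runs 5’ to 3’ and the other 3’ to 5’. Two paired DNA strands are written like this:
5’ GTACATTCGGAA 3’
3’ CATGTAAGCCTT 5’
Many people often leave out the 5’ or 3’, so the convention is: anything unmarked is read 5’ to 3’. This was an exam question back when I took the graduate entrance exam: which sequence pairs with AGU, UCA or ACU...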
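The exam question can be checked with a few lines of Python. This is only a minimal sketch: the function name `partner_strand` and the two lookup tables are made up for illustration, and the only rules it assumes are the pairing rules and the "unmarked means 5’ to 3’" convention described above. Under that convention the answer comes out as ACU, because the partner strand runs in the opposite direction and has to be flipped before it is written down.

```python
# Minimal sketch: write the pairing partner of a sequence in 5'->3' order.
# The names below are hypothetical, chosen only for this illustration.

DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def partner_strand(seq: str, pairs: dict) -> str:
    """Return the strand that pairs with `seq`, written 5'->3' like the input."""
    # Walk the input backwards (the partner is antiparallel) and pair each base.
    return "".join(pairs[base] for base in reversed(seq))


print(partner_strand("AGU", RNA_PAIRS))           # the exam question
print(partner_strand("GTACATTCGGAA", DNA_PAIRS))  # the DNA example from the text
```

Reading the second result from right to left gives the 3’ to 5’ lower strand of a paired DNA duplex.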
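Being able to form a double strand is the basic requirement for serving as genetic material: one strand can act as the template for copying the other. When a cell divides in two, the DNA has to be copied into a second set before the split. That is the job of the DNA or RNA polymerases.
A coronavirus particle contains only a single strand of RNA. Once the RNA is inside the host cell, an RNA polymerase first copies the complementary RNA strand. This copied strand is the pairing partner of the viral RNA; it cannot be packaged into virus particles but serves as the template for producing thousands upon thousands of viral RNA copies that can be packaged...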
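To see why the intermediate strand works as a template, here is a small sketch, assuming nothing beyond the A:U and G:C pairing rules. The 13-letter sequence is a toy string, not a real viral genome, and `copy_from_template` is a hypothetical name. Copying the genome once gives the partner strand; copying that partner strand gives back the original sequence, which is the one that can be packaged.

```python
# Sketch: two rounds of template copying reproduce the original RNA strand.
# Toy sequence only; a real coronavirus genome is far longer.

RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def copy_from_template(template: str) -> str:
    """Synthesize the pairing strand of `template`, written 5'->3'."""
    return "".join(RNA_PAIRS[base] for base in reversed(template))


genome = "AGUCUUCUUGUAA"                        # toy "viral RNA"
partner = copy_from_template(genome)            # first round: the template strand
new_genome = copy_from_template(partner)        # second round: packageable copies

print(partner)
print(new_genome == genome)
```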
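The AIDS virus is also an RNA virus. But it first copies its RNA into an RNA-DNA hybrid duplex and then synthesizes a DNA-DNA duplex. This DNA-DNA duplex can integrate into human DNA, from which RNA is then made and packaged into virus.
Compared with DNA polymerases, one characteristic of RNA polymerases is that, overall, their fidelity is not as good. The reason is that an RNA polymerase only has the synthesis function, adding nucleotides one at a time according to the pairing rules; when a pairing mistake occurs, it has no proofreading function to correct it. When the second strand is synthesized from a template, the more faithfully the A:T/U and G:C pairing rules are followed, the lower the chance of mutation. For a virus, somewhat more mutation is a good thing: the more mutations, the greater the chance of finding a new host and of evading the host immune system, which helps its kind survive in a changing environment... Mutation of this kind is how a virus jumps from one host to a new one.
(February 11, 2020)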
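Stopping the drug
Today is the coldest day of this winter, -5°F in the morning; walking from the parking lot to the office, even my nose hairs froze stiff... No choice, I still had to be in early: today is the last day of the mouse ovarian cancer model. After the final dose yesterday, today is the final measurement. These were always done at noon, but the animal facility is being rearranged today and our detection instrument has to be moved. To keep the data consistent, we decided to finish the experiment before the move... My boss knows I like to walk to the animal facility, ten minutes across campus. He said: how about I come in early tomorrow and give you a ride first...
The day-3 and day-7 results for the mice dosed by injection are already known: no detectable tumor. The main question is the oral dosing, where we still need to see whether by day 14 the tumors shrink a lot or disappear completely: one model uses drug-resistant cells, which we knew would be hard; the other had already shrunk by 90% at day 7.
Yesterday my boss passed on good news from the company that has taken on moving this toward the clinic: in their mouse experiment with ordinary (non-resistant) breast cancer cells, dosing once every 7 days (by injection), the tumors were gone after three weeks by manual measurement (the old-fashioned way, with a ruler, ha)... We laughed out loud: still measuring by hand... We have been using molecular labeling for years... Our method is far more sensitive and also suits tumors buried deep in the body, such as ovarian and uterine cancers. Otherwise you would have to kill the mouse to measure them.
The company doesn't use our labeled cells, and it paid another company to do the work, presumably hoping for third-party verification of our results. Their result has unintentionally helped us: we have long suspected that the sudden massive death of cancer cells after dosing might activate the body's immune system, which then attacks the surviving cancer cells and clears them without further drug... Mouse tumor models have to use animals with a deficient immune system, otherwise they reject human cells, and this is a problem we have never overcome: attempts to build a model with mouse tumor cells in place of human cancer cells have not succeeded, and humanized mice are too expensive... Their once-every-seven-days dosing at least supports our hypothesis.
Tumor immunology is the hottest area in cancer research right now. We have never found a way in. I joked to my boss: get a bit more funding and I can work here until retirement...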
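Finally have a handle on retirement...
Haha, the W-2s are all in, time to get ready to fill out the tax return by hand. Tomorrow, on the way shopping with the Leader, we'll stop at the library to pick up the instruction booklet; once I've read it cover to cover I can fill out the return myself...
A couple of days ago I roughly estimated the possible refund. Today, while the Leader was on the phone with her parents, I tallied last year's spending for the two of us old folks, in preparation for retirement. Last year was the year the son spent entirely at college and the two old folks lived "on their own"; with everything added and subtracted, our basic living expenses last year came to 35,000...
LOL, not bad, not bad; the year after next, when the son graduates, we can retire. Today I said to the Leader: it's too cold, let's skip buying flowers, okay? Besides, on the walk to the car the flowers would freeze to death within minutes; better to buy the discounted ones tomorrow.
That's how money gets saved...
(February 14, 2020)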
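Flowers for the wife, the day after Valentine's Day
The lazybones wouldn't get out of bed, so I had to do the cooking myself... egg-drop "flowers" are flowers too...
In the afternoon we went to buy next week's food and drink for the two of us. Once Valentine's Day is over, the stores sell flowers at a discount. A bunch at Sam's is huge and half price, which made the Leader bloom with joy. When I bought other things she didn't fuss, anything goes... I took the chance to buy two packs of steak and a pack of smoked salmon... There's still wine left at home, so I'll quietly buy more next time.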
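Animal experiments and animal facilities
I have never worked in a P3 or P4 lab. But the logic of biosafety makes it understandable. In P3-P4, things go in only; anything coming out must be disinfected first. Even our P2 biological waste has to be autoclaved (high temperature and pressure) before it can go into the regular trash. Twenty years ago, admittedly, it was not this strict.
Animal facilities are graded. This is because different animals can cope with different environments. The mice we use for tumor models have an incomplete immune system; mice with an intact immune system would reject human cancer cells. These mice therefore must not be exposed to viruses, bacteria and the like, so they are kept in the cleanest rooms. All equipment and food are sterilized before entering our mouse rooms.
Animals needed for P3-P4 labs should not be in an ordinary animal facility; this should be biosafety common sense. If a P3-P4 lab needs animals, it should have its own dedicated animal facility inside the P4 facility. When the animal experiment is finished, the animals should be decontaminated, that is, autoclaved at high temperature and pressure.
But don't assume that everyone studying highly virulent viruses or bacteria works in P4. Much research clones out genes of the virus or bacterium, that is, gene fragments with no virulence of their own, which are then passed to non-P3-P4 labs. Also remember that many viruses and bacteria are not in the P3-P4 class. The E. coli we commonly use already lives in the human body; the engineered viruses we use in experiments cannot replicate, so this work can be done in a P2 lab. The US has a strict biosafety classification.
(February 15, 2020)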
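This summer's family vacation plan is done
Following the pattern the kids have been happy with in previous years, our daughter flies in and we two old folks drive, which saves her time and gives us our own car locally. Once the time and place were agreed this year, they handed the planning over to Dad and Mom as usual. Mom is the hands-off boss, so it falls to me to go through Google Maps looking for fun places.
Niagara Falls, one day
Letchworth State Park, Corning Glass Museum, one day
Watkins Glen State Park, one or two wineries, one day
Robert H. Treman State Park, Buttermilk Falls State Park, Taughannock Falls State Park (waterfalls only, little walking), one day
Cascadilla Gorge Trail, Cornell University campus, one day
A loop around Cayuga Lake, a few wineries, one day
One flexible day: if anything above isn't finished, or we find some new fun activity, this day is held in reserve.
I've told the son: Dad will be drinking at the wineries, so the driving is his job... He isn't of drinking age yet anyway, so he can only watch us drink.
(February 15, 2020)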
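What goes with the wine tonight?
Walked four miles; nice weather. Thursday was still the coldest day of this winter, -5°F; now it's already 41°F, and tomorrow is supposed to be 50°F. The Leader is steaming buns, and I counted 13... My drinking buddy said: let's have a bottle tonight. I said fine. I went through the fridge; there's no time to prepare anything, so let's just do pickled long beans stuffed in steamed buns. I told the Leader: one celery stir-fried with dried tofu, one pickled long beans stir-fried with minced pork, make it spicy...
Hope she still has steamed buns for breakfast tomorrow...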
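Heh, getting old: one bottle down and all the aches are gone
And I said I wouldn't copy the Americans and have cheese with red wine; well, after I helped my drinking buddy fix his toilet, he showed up at the door with two bottles of wine and a block of cheese. Of my Leader's 13 steamed buns, five were left at the end. I told my buddy's Leader: every time we guys are happy to have a couple of glasses, you Leaders grumble and nag... Next time we'll break a few more toilets and faucets around the house, and then we can drink as we please... Two bottles of wine still beat the cost of calling in a repairman...
LOL, a professor and a postdoc took a toilet apart, and it still leaked after they put it back... We took it apart twice and reassembled it twice, and in the end asked my buddy's Leader to keep an eye on it for the next few days to see whether the leak has stopped. If not, we'll take it apart and put it back again, and find another excuse for two bottles...
Men's hands, callus upon callus: take apart a toilet, then pick up a wine glass...
(February 16, 2020)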
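The difference between Chinese and Western medicine
Traditional Chinese medicine is based on experience. Its remedies come from nature, and it has not needed to pin down the specific chemical components (such as molecular structures) or the mechanism of action at the molecular level... For exactly this reason, modern science, especially biology and chemistry, has not been widely used to improve it, which has greatly limited its development and also makes it easy to use for deceiving patients... Take a simple case, artemisinin: as a Chinese medicine it had practically been lost, because it cannot be prepared by the traditional method of boiling herbs; at high temperature its chemical structure changes and it loses activity... Tu Youyou tried many extraction methods and found that ether extraction worked: ether evaporates at low temperature, which preserved the chemical structure of artemisinin. After the extraction succeeded, everything that followed used the research methods of Western medicine: working out the mechanism of action and the chemical reason for the heat inactivation, which provided the theoretical basis for finding similar drugs.
Western medicine uses modern biology and chemistry to work out how molecules interact inside human cells, and intervenes at the molecular level against different diseases to achieve treatment. Although suitable drugs have not yet been found for many diseases, the prospects are enticing. Many people think Western medicine never "treats the foot for a headache"; that is a misunderstanding. Western medicine has not yet worked out the molecular mechanisms of all diseases in the body, or it knows the mechanism but has not yet found a suitable drug. As a science, Western medicine admits which diseases it cannot treat; against many viruses, for example, apart from vaccines and antibodies there is currently no targeted specific drug... But the trend is visible from the past 100 years. Western medicine does not rely only on natural drugs; starting from the structures of natural compounds, it can chemically synthesize and identify better, more effective drugs. And not only small-molecule compounds but also large-molecule biologics, such as the injections for allergies and the antibodies in tumor immunotherapy.
Because Chinese medicine rests on experience, a remedy may work for one patient, but whether that carries over to the next is hard to foresee. Western medicine advances step by step, and its predictive power has gone from very poor to better and better. The root reason is that it reasons at the molecular level, so a failure can be traced to its cause on solid grounds. Here is an example: many people have heard of Western drugs failing in clinical trials, and that is true. But for the same drug, later data analysis often finds that a trial that looked like a failure may have succeeded in a small subtype of patients... Why? Because this finer subtyping rests on new knowledge at the molecular level... Take an example I know well: breast cancer is divided into three subtypes, and the one we work on is the largest, about 70% of patients. If a drug aimed at our subtype has a 50% response rate there, then in a clinical trial that enrolls all breast cancer patients the response rate is only 35%... Apply the same logic to a subtype that makes up only 10% of patients: a drug that is 50% effective in that subtype shows only 5% in the overall clinical trial data, which may well fall within the error of the data analysis, and the trial would be judged a failure...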
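The arithmetic behind the 35% and 5% figures is just a multiplication. The sketch below is only an illustration of that reasoning; the function name is made up, and it assumes, as the example in the text implicitly does, that the drug has no effect at all outside its target subtype.

```python
# Sketch of the dilution arithmetic in an all-comers trial.
# Assumption (implicit in the text's example): zero response outside the subtype.

def overall_response_rate(subtype_share: float, rate_in_subtype: float) -> float:
    """Response rate observed when every patient, not just the subtype, is enrolled."""
    return subtype_share * rate_in_subtype


for share in (0.70, 0.10):
    overall = overall_response_rate(share, 0.50)
    print(f"subtype share {share:.0%}: overall response {overall:.0%}")
```

The same 50% drug looks like a 35% drug when its subtype is most of the trial and like a 5% drug when its subtype is a tenth of it, which is the case for selecting patients by subtype before the trial starts.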
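The progress of Western medicine shows in the way molecular-level research subdivides patients ever more finely. If a matching candidate drug is found for each subtype, clinical trials will select patients of that subtype rather than all patients in an experience-based category such as breast cancer or hepatitis... That is the current trend. Even we are thinking about how, if we get to a clinical trial, to subtype patients at the molecular level and pick suitable participants. We have found that our candidate drug needs not only the signature molecule of the subtype we study but several other molecules as well; in other words, our subtype can probably be split further into sub-subtypes, which we didn't know before and do now... So molecular diagnosis and subtyping, followed by finding the right drug, is where biomedical research is heading...
(February 17, 2020)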
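Humans are social animals; some social circle is still necessary
These past two years I've closed myself off somewhat, mainly because there's a lot going on at home, and money is tight since we put over thirty thousand a year toward our son, so I've chosen to skip many Chinese community events. Small-scale gatherings, though, I've kept up. A few neighbors getting together, old friends coming and going, the young folks livening things up... it keeps the empty-nest days from feeling like a nursing home...
Feeling needed is the most motivating thing. With an empty nest there isn't much to do, so being called on by neighbors and friends now and then and lending a hand is the most satisfying. The people we see all know each other inside out, and once the job is done it's one more reason to sit together and chat over a drink or tea. At an age when we don't dare eat much, nobody cares what's on the table; it's just to socialize...
The Leader keeps asking what my plans are for retirement. I don't actually have concrete plans; I'll have to wait until retirement and count the money before planning. Move or stay? When to retire? Will there be money to leave to the kids? Every time, our discussion ends in daydreams: such-and-such place is nice, and I have a childhood friend or college classmate there; or shall we move to so-and-so's neighborhood? Or wait until the kids settle down for good and move over to join the fun...
Living completely alone, cut off from everyone, wouldn't work for the two of us.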
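My reading of the evidence from US, UK and Australian scientists that the Wuhan virus is not man-made
1. Two mutations were suspected of being man-made (one concerns the part of the virus involved in entering human cells; the other, as I recall, was pointed out by someone at Fudan (I may be misremembering), and I don't know whether anyone raised it earlier)... But both suspected mutations have been found in other viruses, that is, they exist in nature.
2. The mutation related to entry into human cells. This mutation makes the virus particle bind more readily to the ACE2 protein on the surface of human cells, so entry is easier. But: A. it does not fully match the conclusions of earlier studies, that is, it does not fully adopt the mutations those studies had identified; B. the DNA traces of the vectors commonly used in labs for engineering recombinant viruses are absent; C. the mutation is relevant not only to entry into human cells but also to cells of several other animals, because the virus-binding site on those animals' ACE2 is the same as on human ACE2. This suggests the virus may also have passed through other animals.
3. The mutation pointed out by the person at Fudan (I may be misremembering) also occurs in avian influenza viruses, and its function is not fully clear. Generally, it seems to be associated with increased virulence in avian flu. But the mutation in the Wuhan virus does more than add the protease cleavage site seen in avian flu; it adds one extra amino acid. This extra amino acid should affect the three-dimensional structure of the protein and may be related to glycosylation of the viral surface protein, but that needs confirmation. In other words, if someone were engineering the virus, why add an amino acid whose function is still uncertain?
4. Speculation on the origin: the bat virus RaTG13 is as much as 96% identical to the Wuhan virus and is the most likely candidate; a coronavirus carried by Malayan pangolins illegally imported into Guangdong is less similar overall than RaTG13, but it has all six amino acids the Wuhan virus needs for binding ACE2, so it is also a suspect...
5. Whether the mutations arose in an animal before the jump to humans or after it cannot be distinguished at present. But the genome sequences of viruses isolated from different patients are all very close, indicating that this Wuhan virus has a single source. (That is, the sequence published by the Wuhan Institute of Virology in China has been confirmed by viruses isolated in other labs.)
Conclusion: the current sequence comparisons do not support the Wuhan virus being an artificial construct; the suspected mutations look more like products of natural mutation and natural selection.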
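For readers wondering what a figure like "96% identical" means in practice, here is a rough sketch. It assumes the two sequences have already been aligned to the same length and simply counts matching positions; real genome comparisons first run an alignment step that this sketch skips. The sequences are made-up toy strings, not viral data, and `percent_identity` is a hypothetical name.

```python
# Sketch: percent identity between two already-aligned sequences.
# Toy data only; real comparisons align whole genomes first.

def percent_identity(a: str, b: str) -> float:
    """Share of positions (in percent) where two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


seq_a = "ACGUA" * 5          # 25 made-up positions
seq_b = seq_a[:-1] + "G"     # same sequence with the last position changed

print(percent_identity(seq_a, seq_b))
```

One mismatch in 25 positions gives 96%; on a genome of roughly thirty thousand letters the same percentage leaves room for more than a thousand differences, which is why a 96% match alone does not identify the immediate source.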
******************************************************************************
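Theories of the virus's origin, conspiracy theories included, I roughly sort into three kinds:
1. From wild animals to humans. This is not a conspiracy theory. Its problem: the closest bat virus found is 96% identical to the Wuhan virus, so common sense says there is another host between bats and humans. South China Agricultural University reported finding a 99% identical virus in pangolins; if confirmed, this theory would be accepted by most people. But there has been no follow-up report from South China Agricultural University... Finding the intermediate host would strongly support this theory.
2. A man-made virus, then accidentally leaked or deliberately released.
3. A natural virus, isolated and stored by a lab for research, then accidentally leaked or deliberately released.
Personally, I think the evidence in the paper basically rules out the second possibility. But the third, a natural virus isolated by a lab and then released (deliberately or not), cannot be ruled out. So the first and third possibilities will be argued over for a long time, until the intermediate host is found...
(February 18, 2020)
Original text attached: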
The Proximal Origin of SARS-CoV-2
Kristian G. Andersen (1,2,*), Andrew Rambaut (3), W. Ian Lipkin (4), Edward C. Holmes (5) & Robert F. Garry (6,7)
(1) Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA.
(2) Scripps Research Translational Institute, La Jolla, CA 92037, USA.
(3) Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK.
(4) Center for Infection and Immunity, Mailman School of Public Health of Columbia University, New York, New York, USA.
(5) Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, Australia.
(6) Tulane University, School of Medicine, Department of Microbiology and Immunology, New Orleans, LA, USA.
(7) Zalgen Labs, LCC, Germantown, MD, USA.
*Corresponding author:
Kristian G. Andersen
Department of Immunology and Microbiology,
The Scripps Research Institute,
La Jolla, CA 92037,
USA.
Since the first reports of a novel pneumonia (COVID-19) in Wuhan city, Hubei province, China there has been considerable discussion and uncertainty over the origin of the causative virus, SARS-CoV-2. Infections with SARS-CoV-2 are now widespread in China, with cases in every province. As of 14 February 2020, 64,473 such cases have been confirmed, with 1,384 deaths attributed to the virus. These official case numbers are likely an underestimate because of limited reporting of mild and asymptomatic cases, and the virus is clearly capable of efficient human-to-human transmission. Based on the possibility of spread to countries with weaker healthcare systems, the World Health Organization has declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC). There are currently neither vaccines nor specific treatments for this disease.
SARS-CoV-2 is the seventh member of the Coronaviridae known to infect humans. Three of these viruses, SARS CoV-1, MERS, and SARS-CoV-2, can cause severe disease; four, HKU1, NL63, OC43 and 229E, are associated with mild respiratory symptoms. Herein, we review what can be deduced about the origin and early evolution of SARS-CoV-2 from the comparative analysis of available genome sequence data. In particular, we offer a perspective on the notable features in the SARS-CoV-2 genome and discuss scenarios by which these features could have arisen. Importantly, this analysis provides evidence that SARS-CoV-2 is not a laboratory construct nor a purposefully manipulated virus.
The genomic comparison of both alpha- and betacoronaviruses (family Coronaviridae) described below identifies two notable features of the SARS-CoV-2 genome: (i) based on structural modelling and early biochemical experiments, SARS-CoV-2 appears to be optimized for binding to the human ACE2 receptor; (ii) the highly variable spike (S) protein of SARS-CoV-2 has a polybasic (furin) cleavage site at the S1 and S2 boundary via the insertion of twelve nucleotides. Additionally, this event led to the acquisition of three predicted O-linked glycans around the polybasic cleavage site.
Mutations in the receptor binding domain of SARS-CoV-2
The receptor binding domain (RBD) in the spike protein of SARS-CoV and SARS-related coronaviruses is the most variable part of the virus genome. Six residues in the RBD appear to be critical for binding to the human ACE2 receptor and determining host range [1]. Using coordinates based on the Urbani strain of SARS-CoV, they are Y442, L472, N479, D480, T487, and Y491 [1]. The corresponding residues in SARS-CoV-2 are L455, F486, Q493, S494, N501, and Y505. Five of these six residues are mutated in SARS-CoV-2 compared to its most closely related virus, RaTG13 sampled from a Rhinolophus affinis bat, to which it is ~96% identical [2] (Figure 1a). Based on modeling [1] and biochemical experiments [3,4], SARS-CoV-2 seems to have an RBD that may bind with high affinity to ACE2 from human, non-human primate, ferret, pig, and cat, as well as other species with high receptor homology [1]. In contrast, SARS-CoV-2 may bind less efficiently to ACE2 in other species associated with SARS-like viruses, including rodents and civets [1].
The phenylalanine (F) at residue 486 in the SARS-CoV-2 S protein corresponds to L472 in the SARS-CoV Urbani strain. Notably, in SARS-CoV cell culture experiments the L472 mutates to phenylalanine (L472F) [5], which is predicted to be optimal for binding of the SARS-CoV RBD to the human ACE2 receptor [6]. However, a phenylalanine in this position is also present in several SARS-like CoVs from bats (Figure 1a). While these analyses suggest that SARS-CoV-2 may be capable of binding the human ACE2 receptor with high affinity, the interaction is not predicted to be optimal [1]. Additionally, several of the key residues in the RBD of SARS-CoV-2 are different to those previously described as optimal for human ACE2 receptor binding [6]. In contrast to these computational predictions, recent binding studies indicate that SARS-CoV-2 binds with high affinity to human ACE2 [7]. Thus the SARS-CoV-2 spike appears to be the result of selection on human or human-like ACE2 permitting another optimal binding solution to arise. This is strong evidence that SARS-CoV-2 is not the product of genetic engineering.
Polybasic cleavage site and O-linked glycans
The second notable feature of SARS-CoV-2 is a predicted polybasic cleavage site (RRAR) in the spike protein at the junction of S1 and S2, the two subunits of the spike protein (Figure 1b) [8,9]. In addition to two basic arginines and an alanine at the cleavage site, a leading proline is also inserted; thus, the fully inserted sequence is PRRA (Figure 1b). The strong turn created by the proline insertion is predicted to result in the addition of O-linked glycans to S673, T678, and S686 that flank the polybasic cleavage site. A polybasic cleavage site has not previously been observed in related lineage B betacoronaviruses and is a unique feature of SARS-CoV-2. Some human betacoronaviruses, including HCoV-HKU1 (lineage A), have polybasic cleavage sites, as well as predicted O-linked glycans near the S1/S2 cleavage site.
While the functional consequence of the polybasic cleavage site in SARS-CoV-2 is unknown, experiments with SARS-CoV have shown that engineering such a site at the S1/S2 junction enhances cell–cell fusion but does not affect virus entry [10]. Polybasic cleavage sites allow effective cleavage by furin and other proteases, and can be acquired at the junction of the two subunits of the haemagglutinin (HA) protein of avian influenza viruses in conditions that select for rapid virus replication and transmission (e.g. highly dense chicken populations). HA serves a similar function in cell-cell fusion and viral entry as the coronavirus S protein. Acquisition of a polybasic cleavage site in HA, by either insertion or recombination, converts low pathogenicity avian influenza viruses into highly pathogenic forms [11-13]. The acquisition of polybasic cleavage sites by the influenza virus HA has also been observed after repeated forced passage in cell culture or through animals [14,15]. Similarly, an avirulent isolate of Newcastle Disease virus became highly pathogenic during serial passage in chickens by incremental acquisition of a polybasic cleavage site at the junction of its fusion protein subunits [16]. The potential function of the three predicted O-linked glycans is less clear, but they could create a “mucin-like domain” that would shield potential epitopes or key residues on the SARS-CoV-2 spike protein. Biochemical analyses or structural studies are required to determine whether or not the predicted O-linked glycan sites are utilized.
Figure 1. (a) Mutations in contact residues of the SARS-CoV-2 spike protein. The spike protein of SARS-CoV-2 (top) was aligned against the most closely related SARS-like CoVs and SARS-CoV-1. Key residues in the spike protein that make contact to the ACE2 receptor are marked with blue boxes in both SARS-CoV-2 and the SARS-CoV Urbani strain. (b) Acquisition of polybasic cleavage site and O-linked glycans. The polybasic cleavage site is marked in grey with the three adjacent predicted O-linked glycans in blue. Both the polybasic cleavage site and O-linked glycans are unique to SARS-CoV-2 and not previously seen in lineage B betacoronaviruses. Sequences shown are from NCBI GenBank, accession numbers MN908947, MN996532, AY278741, KY417146, MK211376. The pangolin coronavirus sequences are a consensus generated from SRR10168377 and SRR10168378 (NCBI BioProject PRJNA573298) [18,19].
Theories of SARS-CoV-2 origins
It is improbable that SARS-CoV-2 emerged through laboratory manipulation of an existing SARS-related coronavirus. As noted above, the RBD of SARS-CoV-2 is optimized for human ACE2 receptor binding with an efficient binding solution different to that which would have been predicted. Further, if genetic manipulation had been performed, one would expect that one of the several reverse genetic systems available for betacoronaviruses would have been used. However, this is not the case as the genetic data shows that SARS-CoV-2 is not derived from any previously used virus backbone [17]. Instead, we propose two scenarios that can plausibly explain the origin of SARS-CoV-2: (i) natural selection in a non-human animal host prior to zoonotic transfer, and (ii) natural selection in humans following zoonotic transfer. We also discuss whether selection during passage in culture could have given rise to the same observed features.
Selection in an animal host. As many of the early cases of COVID-19 were linked to the Huanan seafood and wildlife market in Wuhan, it is possible that an animal source was present at this location. Given the similarity of SARS-CoV-2 to bat SARS-like CoVs, particularly RaTG13, it is plausible that bats serve as reservoir hosts for SARS-CoV-2. It is important, however, to note that previous outbreaks of betacoronaviruses in humans involved direct exposure to animals other than bats, including civets (SARS) and camels (MERS), that carry viruses that are genetically very similar to SARS-CoV-1 or MERS-CoV, respectively. By analogy, viruses closely related to SARS-CoV-2 may be circulating in one or more animal species. Initial analyses indicate that Malayan pangolins (Manis javanica) illegally imported into Guangdong province contain a CoV that is similar to SARS-CoV-2 [18,19]. Although the bat virus RaTG13 remains the closest relative to SARS-CoV-2 across the whole genome, the Malayan pangolin CoV is identical to SARS-CoV-2 at all six key RBD residues (Figure 1). However, no pangolin CoV has yet been identified that is sufficiently similar to SARS-CoV-2 across its entire genome to support direct human infection. In addition, the pangolin CoV does not carry a polybasic cleavage site insertion. For a precursor virus to acquire the polybasic cleavage site and mutations in the spike protein suitable for human ACE2 receptor binding, an animal host would likely have to have a high population density – to allow natural selection to proceed efficiently – and an ACE2 gene that is similar to the human orthologue. Further characterization of CoVs in pangolins and other animals that may harbour SARS-CoV-like viruses should be a public health priority.
Cryptic adaptation to humans. It is also possible that a progenitor to SARS-CoV-2 jumped from a non-human animal to humans, with the genomic features described above acquired through adaptation during subsequent human-to-human transmission. We surmise that once these adaptations were acquired (either together or in series) it would enable the outbreak to take-off, producing a sufficiently large and unusual cluster of pneumonia cases to trigger the surveillance system that ultimately detected it.
All SARS-CoV-2 genomes sequenced so far have the well adapted RBD and the polybasic cleavage site, and are thus derived from a common ancestor that had these features. The presence of an RBD in pangolins that is very similar to the one in SARS-CoV-2 means that this was likely already present in the virus that jumped to humans, even if we don’t yet have the exact non-human progenitor virus. This leaves the polybasic cleavage site insertion to occur during human-to-human transmission. Following the example of the influenza A virus HA gene, a specific insertion or recombination event is required to enable the emergence of SARS-CoV-2 as an epidemic pathogen.
Estimates of the timing of the most recent common ancestor (tMRCA) of SARS-CoV-2 using currently available genome sequence data point to virus emergence in late November to early December 2019 [20,21], compatible with the earliest retrospectively confirmed cases [22]. Hence, this scenario presumes a period of unrecognised transmission in humans between the initial zoonotic transfer event and the acquisition of the polybasic cleavage site. Sufficient opportunity could occur if there had been many prior zoonotic events producing short chains of human-to-human transmission (so-called ‘stuttering chains’) over an extended period. This is essentially the situation for MERS-CoV in the Arabian Peninsula where all the human cases are the result of repeated jumps of the virus from dromedary camels, producing single infections or short chains of transmission that eventually resolve. To date, after 2,499 cases over 8 years, no human adaptation has emerged that has allowed MERS-CoV to take hold in the human population.
How could we test whether cryptic spread of SARS-CoV-2 enabled human adaptation? Metagenomic studies of banked serum samples could provide important information, but given the relatively short period of viremia it may be impossible to detect low level SARS-CoV-2 circulation in historical samples. Retrospective serological studies potentially could be informative and a few such studies have already been conducted. One found that animal importation traders had a 13% seropositivity to coronaviruses [23], while another noted that 3% residents of a village in Southern China were seropositive to these viruses [24]. Interestingly, 200 residents of Wuhan did not show coronavirus seroreactivity. Critically, however, these studies could not have distinguished whether positive serological responses were due to a prior infection with SARS-CoV-1 or -2. Further retrospective serological studies should be conducted to determine the extent of prior human exposure to betacoronaviruses in different geographic areas, particularly using assays that can distinguish among multiple betacoronaviruses.
Selection during passage. Basic research involving passage of bat SARS-like coronaviruses in cell culture and/or animal models have been ongoing in BSL-2 for many years in multiple laboratories across the world [25-28]. There are also documented instances of the laboratory acquisition of SARS-CoV-1 by laboratory personnel working under BSL-2 containment [29,30]. We must therefore consider the possibility of a deliberate or inadvertent release of SARS-CoV-2. In theory, it is possible that SARS-CoV-2 acquired the observed RBD mutations site during adaptation to passage in cell culture, as has been observed in studies with SARS-CoV [5] as well as MERS-CoV [31]. However, the acquisition of the polybasic cleavage site or O-linked glycans - if functional - argues against this scenario. New polybasic cleavage sites have only been observed after prolonged passaging of low pathogenicity avian influenza virus in cell culture or animals. Furthermore, the generation of SARS-CoV-2 by cell culture or animal passage would have required prior isolation of a progenitor virus with a very high genetic similarity. Subsequent generation of a polybasic cleavage site would have then required an intense program of passage in cell culture or animals with ACE-2 receptor similar to humans (e.g. ferrets). It is also questionable whether generation of the O-linked glycans would have occurred on cell culture passage, as such mutations typically suggest the involvement of an immune system, that is not present in vitro.
Conclusions
In the midst of the global COVID-19 public health emergency it is reasonable to wonder why the origins of the epidemic matter. A detailed understanding of how an animal virus jumped species boundaries to infect humans so productively will help in the prevention of future zoonotic events. For example, if SARS-CoV-2 pre-adapted in another animal species then we are at risk of future re-emergence events even if the current epidemic is controlled. In contrast, if the adaptive process we describe occurred in humans, then even if we have repeated zoonotic transfers they are unlikely to take-off unless the same series of mutations occurs. In addition, identifying the closest animal relatives of SARS-CoV-2 will greatly assist studies of virus function. Indeed, the availability of the RaTG13 bat sequence facilitated the comparative genomic analysis performed here, helping to reveal the key mutations in the RBD as well as the polybasic cleavage site insertion.
The genomic features described here may in part explain the infectiousness and transmissibility of SARS-CoV-2 in humans. Although genomic evidence does not support the idea that SARS-CoV-2 is a laboratory construct, it is currently impossible to prove or disprove the other theories of its origin described here, and it is unclear whether future data will help resolve this issue. Identifying the immediate non-human animal source and obtaining virus sequences from it would be the most definitive way of revealing virus origins. In addition, it would be helpful to obtain more genetic and functional data about the virus, including experimental studies of receptor binding and the role of the polybasic cleavage site and predicted O-linked glycans. The identification of a potential intermediate host of SARS-CoV-2, as well as the sequencing of very early cases including those not connected to the Wuhan market, would similarly be highly informative. Irrespective of how SARS-CoV-2 originated, the ongoing surveillance of pneumonia in humans and other animals is clearly of utmost importance.
Acknowledgements
We thank all those who have contributed SARS-CoV-2 genome sequences to the GISAID database (https://www.gisaid.org/) and contributed analyses and ideas to Virological.org (http://virological.org/). We thank the Wellcome Trust for supporting this work. ECH is supported by an ARC Australian Laureate Fellowship (FL170100022). KGA is supported by NIH grant 1U19AI135995-01. AR is supported by the Wellcome Trust (Collaborators Award 206298/Z/17/Z – ARTIC network) and the European Research Council (grant agreement no. 725422 – ReservoirDOCS).