Dear Professor,
I have spent all day today reading another paper of yours, “Abandon Statistical Significance”. Because of my language background, it is a little hard for me to follow all of your ideas, so I have had to rely on Google Translate.
I had thought that, within the logical system of statistics itself, statistical significance should not be a problem. That it has become one is probably due to some kind of misunderstanding.
Let us take the t-test as an example. Suppose we have two sample means, x_bar1 and x_bar2; we can easily find the difference between them, x_bar1 minus x_bar2. From the point of view of classical mathematics, this difference is absolutely true: as long as the result is not 0, the two means really do differ. From the statistical point of view, however, the difference is composed of two parts, or has two sources: one is systematic error and the other is random error. The t-test constructs the t statistic to measure, in terms of probability, the magnitude of the random error within the total difference. Therefore, we have to use a dichotomy to make a judgement about the difference between the two sample means. I think this is the so-called “significance” that the t-test can find.
So, if the probability of random error in the total difference is less than some threshold, for example the 0.05 currently in use (that is, under the t statistic, random error accounts for less than 5% of the total difference), we can say that the difference between the two sample means is significant; otherwise it is not significant.
Of course, the 0.05 threshold is set subjectively, but it seems we have no other way to determine an “objective” threshold.
However, the true magnitude of the systematic error or the random error in the total difference is unknown, and we have no way of knowing it. The t statistic merely provides a statistical way to estimate them on a probability scale. In fact, the t statistic is itself a measurement scale. Once it is converted to probabilities, we have a probability scale based on the t distribution. That is why we can obtain a probability, the p-value, from a t-value.
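To make the step from a t-value to a p-value concrete, here is a minimal sketch in Python. It assumes the pooled-variance (Student) two-sample t-test with equal variances, and it takes the p-value to be the two-sided tail probability of the t distribution beyond |t|. The sample values are invented purely for illustration, the function names are my own, and the tail probability is approximated with Simpson's rule rather than taken from a statistics library.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Approximate P(|T| >= |t|) by integrating the density over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-|t| <= T <= |t|), by symmetry
    return max(0.0, 1.0 - central)


def pooled_t(sample1, sample2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (an assumption of this sketch).
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, df


# Invented data, for illustration only.
group1 = [5.1, 4.9, 5.6, 5.8, 6.0]
group2 = [4.2, 4.8, 4.5, 5.0, 4.4]

t_value, df = pooled_t(group1, group2)
p_value = two_sided_p(t_value, df)
significant = p_value < 0.05  # the conventional threshold discussed in this letter

print(f"t = {t_value:.3f}, df = {df}, p = {p_value:.4f}, significant: {significant}")
```

The last comparison is where the dichotomy enters: the t-value and the p-value are continuous, and only the comparison against 0.05 turns them into a yes-or-no statement.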
So, in my opinion, it is not that we suffer from “dichotomania”. We have to use a dichotomy because the total difference we are trying to test has only two sources. On the contrary, if we do not use a dichotomy, we will fall into a state of ignorance.
In your paper, I see that you speak in many places of the null hypothesis of zero effect or zero systematic error. I would suggest that “zero systematic error” be replaced by “the random error in the total difference is large enough (that is, too large to ignore)”. Such a statement should serve significance testing better, so that results can be explained reasonably and certain misunderstandings eliminated.
Best regards!
Yours sincerely,
Ligong Chen, MD/MPH