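The number of AI-generated articles has already surpassed the number of human-written articles.
Going forward, we will have to accept the reality that 80% or more of future content, perhaps far more, will be AI-generated.
Original study: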
https://graphite.io/five-percent/more-articles-are-now-created-by-ai-than-humans
Methodology
CommonCrawl
Common Crawl maintains one of the largest publicly available web archives. It provides billions of URLs, is used by researchers and developers, and is a key data source for training large language models.
Selection of Articles
We need a representative sample of English-language articles on the web. To do so, we randomly select 65k URLs from CommonCrawl, and confirm that each is in English, has an article schema markup, is at least 100 words, has a publish date between January 2020 and May 2025, and is an article or listicle as classified by the Graphite page type classifier.
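The selection criteria above can be sketched as a single filter function. This is only an illustration: the `Page` record, its field names, and the `"en"`, `"article"`, and `"listicle"` labels are assumptions made for the example, not Graphite's actual pipeline or classifier output, and the date range is read as January 1, 2020 through May 31, 2025.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    """Hypothetical record for one sampled URL (field names are assumptions)."""
    url: str
    language: str             # e.g. "en", as reported by some language detector
    has_article_schema: bool  # True if article schema markup was found
    text: str                 # extracted body text
    publish_date: date
    page_type: str            # label from a page type classifier


# Assumed reading of "between January 2020 and May 2025".
START = date(2020, 1, 1)
END = date(2025, 5, 31)
ALLOWED_TYPES = {"article", "listicle"}
MIN_WORDS = 100


def is_eligible(page: Page) -> bool:
    """Return True only if the page passes every selection criterion."""
    return (
        page.language == "en"
        and page.has_article_schema
        and len(page.text.split()) >= MIN_WORDS
        and START <= page.publish_date <= END
        and page.page_type in ALLOWED_TYPES
    )
```

A sampled page is kept only when `is_eligible(page)` is true; failing any one check is enough to drop it.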
AI Detection Algorithm
Accurate detection of AI-generated content is required to make claims about the prevalence of AI-generated articles on the web. There is considerable disagreement about the accuracy of AI detection algorithms, and many argue that detecting AI is impossible or, at best, highly inaccurate. Many companies offer AI detection algorithms, including Originality.ai, GPTZero, Grammarly, and Surfer.
To compute the percentage of AI-generated content in an article, we use the same algorithm described in our 2024 whitepaper, but classify each chunk using Surfer’s AI detector with a chunk size of 500 words. We classify an article as AI-generated if the algorithm predicts that more than 50% of the content is AI-generated, and human-written otherwise.
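The chunk-and-threshold rule described above might look roughly like the following. The whitepaper's exact algorithm is not reproduced in this excerpt, so the sketch assumes the article-level percentage is the word-weighted share of chunks flagged as AI. The `is_ai_chunk` callable is a hypothetical stand-in for a detector such as Surfer's; it is not a real API.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int = 500) -> List[List[str]]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def ai_fraction(
    text: str,
    is_ai_chunk: Callable[[str], bool],  # hypothetical detector: True if chunk looks AI-generated
    chunk_size: int = 500,
) -> float:
    """Share of words that fall in chunks flagged as AI (assumed aggregation)."""
    chunks = split_into_chunks(text, chunk_size)
    total_words = sum(len(chunk) for chunk in chunks)
    if total_words == 0:
        return 0.0
    ai_words = sum(len(chunk) for chunk in chunks if is_ai_chunk(" ".join(chunk)))
    return ai_words / total_words


def classify_article(
    text: str,
    is_ai_chunk: Callable[[str], bool],
    chunk_size: int = 500,
    threshold: float = 0.5,
) -> str:
    """Label an article "ai" if more than `threshold` of it is flagged, else "human"."""
    return "ai" if ai_fraction(text, is_ai_chunk, chunk_size) > threshold else "human"
```

Note that the comparison is strict: an article whose flagged share is exactly 50% is labeled human-written, matching the "more than 50%" wording.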