the platform has just declared war on them

No more using automated bots to extract public data from Reddit
Companies that do not have a licensing agreement will not be able to do so.

Artificial intelligence (AI) companies are hungry for data to train their models. One of the methods they use most to satisfy this appetite is web scraping, a technique for extracting and storing public information from web pages on a massive scale. Most of the time this is done without the consent of the content's creators or licensees, so no payment is involved.

Reddit has announced a measure to stop unwanted web scraping. The platform, home to millions of conversations on a wide variety of topics organized into subreddits, will prevent unauthorized companies from using its public content. The change is on the backend, specifically in the robots.txt file (the Robots Exclusion Protocol), and will roll out “in the coming weeks.”
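To illustrate how a robots.txt rule set of this kind can distinguish between crawlers, here is a minimal sketch using Python's standard library. The rules, the crawler names and the URL are hypothetical: the announcement does not include the contents of Reddit's actual file, so this only shows the general mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Reddit's real robots.txt.
# One named crawler is allowed everywhere, every other crawler is disallowed.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: licensed-partner-bot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str,
               robots_txt: str = HYPOTHETICAL_ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, assumed for the example.
    url = "https://www.example.com/r/some_subreddit/"
    for agent in ("licensed-partner-bot", "unlisted-scraper"):
        print(agent, "allowed:", is_allowed(agent, url))
```

Note that robots.txt is a convention that crawlers follow voluntarily: the file states what the site permits, but a crawler has to choose to honor it.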

Reddit, on the warpath with web scrapers

The move seeks to restrict access to the content of the firm led by Steve Huffman for any party that does not have an agreement with the platform. Over the past few months, we’ve seen tech giants like OpenAI, owner of ChatGPT, and Google, creator of Gemini, enter into partnerships with Reddit. In other words, those without an agreement are locked out of the data.

The changes announced this Wednesday are reflected in the platform’s Public Content Policy. Although the company is declaring war on web scrapers, it promises to continue offering its content to researchers and academics. The platform also says it will guarantee access to moderators and to organizations like the Internet Archive, which seeks to preserve online content.

In today’s AI landscape, not only text matters, but also images, music and video. For a long time, as we have seen, companies have “scraped” the web to feed their models with content of all kinds. Firms like OpenAI, however, are reluctant to say in detail where their data comes from, noting only that they use content licensed by agreement and “publicly available” content.

None of this, however, has prevented a giant like The New York Times from suing Microsoft and OpenAI for copyright infringement, or record labels such as Sony Music, Warner Music and Universal Music from launching a legal battle against music generators Suno AI and Udio for allegedly using their songs. We are witnessing firsthand the battle for the data that feeds AI. In time, we will know how it all ends.

Images | Reddit

