dataprovider

GenAI transparency and the living internet

September 2024 | dataprovider portfolio

Loura Kruger-Zwart

06 Dec 2024 — 7 min read

AI and its generative tools are here to stay, but that doesn’t mean they won’t take some getting used to. Their impact is already being felt in workflow-enhancing applications and content creation, but the lack of transparency around their use remains troubling. What are companies and social media platforms doing to ensure integrity and credibility, and how can we as users stay sharp online? Read on as we discuss the problem of genAI transparency, explore the dead and living internet, and share ways to recognize genAI in the digital wild.

What’s going on with (generative) AI?

It’s both clear and inevitable that the internet is changing. The previous installment in our Modern Internet series explored Internet Nostalgia and how the way we use the internet has changed over the last decade. It covered some of the reasons behind these changes and how to recapture a bit of the magic of the ‘old internet,’ but the evolution of the online world is far from over. In just the last few years, mainstream access to and use of Artificial Intelligence (AI) has grown dramatically. AI (often a catch-all term for all things black-box and automated, used here to mean Large Language Models, text-to-image generators, and machine learning algorithms) has improved many workflows as a tool and expanded what many thought was possible.

Figure 1: An AI-generated picture of a globe connected by many lines to technological symbols, made with Microsoft Designer from the prompt: “visualization of Machine to Machine interactions at a global scale”.

The release of ChatGPT and DALL-E preceded the takeoff of many more generative AI (genAI) tools, including Midjourney and Stable Diffusion for images, and Claude, Gemini, Copilot, and Aria for text. Many software companies have built their own AI features for different purposes, such as Notion’s tool for making texts longer, shorter, or simpler, or the paraphrasers and summarizers in leading spellcheckers like Grammarly and LanguageTool.

But beyond the marvels of AI, including generative technologies, there’s something more sinister happening on the internet. Bots, in the form of fake followers and accounts spreading propaganda or fake news, have been an issue on social media for many years, but they have taken what feels like a new form: AI-generated content (both images and texts) shared by what seem to be AI-run accounts and interacted with by other AI-operated ‘users’. This cycle ends up looking like faceless bots creating content for faceless bots, on platforms like Facebook that are meant specifically for, well, people with faces. Whether the point is engagement farming or using sheer volume to drown out other kinds of online content, we’re left asking: what’s actually going on here?

Figure 2: Three examples of Facebook posts containing AI-generated pictures, aiming to pass as photographs.

Machine-to-Machine, Bot-to-Bot?

Machine-to-machine (M2M) interactions and technologies are not new: they’ve been used in business contexts for years. Cybersecurity firms like Darktrace use AI to detect AI-generated or otherwise automated emails and block them from client servers, while other AI learns this and works around it… before Darktrace figures out the workaround and blocks it… and the cycle continues. Payment terminals are essentially one machine contacting another machine (the bank’s systems) to approve and execute a transaction, and the Internet of Things (IoT) runs almost entirely on machine-to-machine communication. Web traffic from crawlers, the technology Google and Dataprovider.com use to index the internet, is classified as bot traffic, and this is M2M tech as well. In other words, M2M technology has many useful applications; what’s missing is any clear purpose for a surface web where bots both create the content and interact with it.

Figure 3: Sources of traffic (bot, human, and unidentified) to www.google.nl on September 22nd, 2024. Bot traffic is estimated at 28%, based on Dataprovider.com's proprietary Connection Index.

Internet analysts have been grappling with the “Dead Internet Theory,” the idea that the internet is already, or soon will be, devoid of the kind of human interaction we’ve grown used to in recent years. As explained by two Australian tech academics, this theory contends that “many of the accounts that engage with [AI-generated] content also appear to be managed by artificial intelligence agents. This creates a vicious cycle of artificial engagement, one that has no clear agenda and no longer involves humans at all.” The motive here is unclear, and may or may not be malicious.

Sometimes, though, the intention is transparent, with financial incentives a major factor. Previously lively sites like BuzzFeed have come under fire for publishing low-effort AI content, from quizzes and listicles to travel guides. The company says “generative AI will replace the majority of static content… [and] lead to new formats that are more gamified, more personalized, and more interactive.” Generative AI can produce many times more output (from brainstorm notes to full-fledged articles) than a human worker, for a fraction of the cost, but does that matter if people don’t actually want it? If the metric is ‘time spent,’ as advertisers and stakeholders prefer, then AI-generated content, with its highly personalized and ‘endless’ experiences, is the right choice.

You have to ask: does this kind of internet experience really create happy, engaged users? Are you actually enjoying your time online, these days?

How AI transparency can help the internet

There are plenty of great uses for AI and genAI in the world today, from the artistic and entertainment value of creating something with these tools or enjoying the creations of others, to using them to simplify your workflows. AI has become part of our online ecosystem; the problem arises when we encounter AI, knowingly or unknowingly, where we don’t expect or appreciate it. At its core, the issue is not the existence of AI tools but the transparency around their use. And just because AI can be used doesn’t mean it should be… So what can we do?

The key is to know your internet.

AI detection and Made By AI labels

Detection and tagging tools are still in their infancy, but they are becoming increasingly useful for spotting AI-generated content online. Platforms like Vimeo and Instagram have added labels to mark posts made with AI (or at least to provide AI info), which creates an avenue for voluntary transparency online. LinkedIn and TikTok are expected to follow suit in the near future. These labels target visual content, like images and videos, which leaves open the question of text: how can generated writing be detected? This is much trickier, as text is far less likely to be labeled or to show obvious signs of genAI. AI text detectors are not yet accurate enough to be reliable, so generated text is still hard to spot and even harder to prove.

Figure 4: Instagram's "AI info" label, the information shown when it is clicked, and Meta's AI label policy.

Provenance

This means knowing your sources. Things are changing faster than we can keep up, but it’s important to recognize and think critically about what we see online. Companies like Google and Adobe are already adding provenance information to online images, showing whether something was taken with a camera, edited in Photoshop, or made with artificial intelligence. In the meantime, there’s a lot to learn about how to spot AI content, and understanding its markers can help us notice, flag, or avoid it. Trace the source of anything questionable to judge its reliability, for example by finding the original poster, the artist or author, or other online mentions. Ultimately, we can also avoid the online spaces that care less about AI transparency than we do.

Where to start:

Written content markers: check for signs like repetition, predictability, and oddly consistent sentence length and paragraph structure. For comparison, look up other writing by that author or on that topic, and see whether the style and content align. There’s no harm in checking claims against other sources, or reaching out to the author for clarification.
Visual content markers: there are a few well-known signs to look out for. Especially in images of people, you might spot hands with too many fingers or feet with too many toes, but the key is to zoom in. Are shapes in the background vague or distorted? Are cars facing the wrong direction on the road? Do lines or outlines start and stop in illogical places? Is the writing made of actual words, or just a jumble of lines and letters? Besides the airbrushed glow that genAI images tend to have, look out for things that would never appear in a photograph or video, or that (digital) artists would not purposefully include.
Quizzes: it’s actually quite entertaining to test your genAI recognition skills. Start with this image quiz and get more insights about spotting AI, then follow with this video test and check your assumptions. There are plenty of quizzes around to help keep you sharp.

Figure 5: Two side-by-side images of ancient Egyptian sculptures, one of which is AI-generated. Can you tell which? Via PCMag.com.
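Back to the written-content markers listed under "Where to start": two of them, repetition and oddly consistent sentence length, can be turned into rough numbers. The toy Python sketch below is purely illustrative and rests on our own assumptions. The function names are hypothetical, and the metrics (how much sentence lengths vary relative to their average, and the share of three-word sequences that repeat) are simple heuristics we picked, not an established detection method. As noted earlier, even purpose-built detectors are not reliable yet: plenty of human writing is uniform, and plenty of AI text is not.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive split after '.', '!' or '?'.

    Abbreviations such as "e.g." will confuse this splitter; it is only
    meant to illustrate the idea (hypothetical helper, not a real tool).
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_variation(text: str) -> float:
    """Coefficient of variation of sentence length (std dev / mean).

    Values near 0 mean every sentence is about the same length.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def repeated_trigram_ratio(text: str) -> float:
    """Share of three-word sequences that occur more than once in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    trigrams = list(zip(words, words[1:], words[2:]))
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    # Count every occurrence of any trigram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)


if __name__ == "__main__":
    uniform = "The cat sat on the mat. The dog lay on the rug. The bird sat on the branch."
    varied = "Stop. The weather today turned out to be far colder than anyone had predicted. Why?"
    for label, sample in (("uniform", uniform), ("varied", varied)):
        print(label, round(length_variation(sample), 2), round(repeated_trigram_ratio(sample), 3))
```

On these two sample strings, the uniform one scores 0.0 for sentence-length variation and the varied one scores above 1. Treat numbers like these as a prompt to read more closely and check the source, never as proof that a text was generated.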

Reclaim the Living Internet

If the ‘Dead Internet’ is where humans are no longer creators and interactors, the Living Internet might be where we actually want to be. At the risk of repeating the conclusions of our previous article about Internet Nostalgia, there’s much to be said for returning to an ‘old’ way of using the internet: a return to online communities. User-generated and official spaces are the online environments where writers, artists, and other creators have both the incentive and the accountability to share living content, and this can even include AI-assisted works if the use is transparent or otherwise expected. There is plenty of room for responsible innovation and for avoiding ‘bionic duckweed’. Let’s create, support, and enjoy the Living Internet together.

This article is part of the Modern Internet series, where we dive into today's digital culture and the online landscape. Check out Internet Nostalgia for a head start on finding great spaces to hang out online, and keep an eye out for our upcoming pieces on low-tech internet solutions and the rise of mini-games where they're least expected.