How does ChatGPT choose its sources?

We’re close to a future where people won’t browse the internet anymore—AI tools will do it for us. As these tools gain more traction, content creators will have to figure out how to get AI tools to cite them. Instead of just worrying about SEO, we’ll have to convince the AI tools that our content is worthy of being included—and, crucially, cited—in the output, so people click through.

But what criteria do AI tools use to select content when browsing? I decided to ask the most authoritative source I could find: ChatGPT itself. 

Of course, I trust but verify, so I combined its explanations with my own testing, and compiled this guide based on the most recent available version of GPT-4.

A word of caution: ChatGPT (along with all AI chatbots) is often updated, so the way it prioritizes content is evolving. I’d expect that how AI chooses its sources will change frequently and, like Google’s ranking criteria, remain relatively opaque. The insights here are mostly based on my testing and tinkering with the models, as well as my experience of how the models work under the hood.

With that in mind, here are some of the criteria that ChatGPT is using to decide what to surface when it browses the internet.

1. Multiple, precise keywords

Many popular AI tools display their search queries as they browse, offering a glimpse into how they interpret requests and the types of terms they commonly use. (Note: if the tool doesn’t show its search terms, you can prompt it to reveal the terms it used.)

The first thing you’ll likely notice is that these tools typically use multiple search queries, review multiple sites, and then aggregate the results. That means that if you want them to find your site, you need to consider your ranking across several terms, not just one. Tools like Perplexity might use multiple search terms simultaneously, while ChatGPT tends to search sequentially, basing each new query on the results it initially finds.

Importantly though, they don’t search quite the same way as many humans do. As far as I’ve found, the tools tend to translate the search question into a statement—they don’t use the question text. So while humans might enter “How do I fix a leaky faucet?” the AI will alter it to something like “how to fix a leaky faucet detailed guide.” (It’s kind of like how DALL·E 3 will alter your prompts for you to get you what it thinks are better results.)

ChatGPT telling the user what search terms it used

When the AI tool creates search terms, it attempts to make the terms concise and as specific as possible. ChatGPT tells me it will choose “fix leaky faucet” rather than just “faucet problem” to get more specific answers. This checked out when I tested it.

ChatGPT—so it tells me—may also attempt to use terms that are specialized to an industry, which is handy when you might not know what those terms are. For instance, instead of “sustainable building materials,” it claims it might use “eco-friendly materials for green construction.” However, my experience with this was mixed—it frequently parroted my terms in its search instead of zhuzhing them up. 
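To make the question-to-statement rewrite concrete, here is a minimal Python sketch. It is only an illustration of the behavior I observed, not how ChatGPT actually builds its queries: the function name rewrite_query, the QUESTION_OPENERS rules, and the suffix parameter are all my own assumptions.

```python
import re

# Hypothetical rewrite rules for illustration; the tools' real logic isn't public.
QUESTION_OPENERS = [
    (r"^how (do|can|should) (i|we|you) ", "how to "),
    (r"^what (is|are) (the )?", ""),
]


def rewrite_query(question: str, suffix: str = "") -> str:
    """Turn a natural-language question into a statement-style search query."""
    # Lowercase and strip trailing punctuation such as the question mark.
    query = question.strip().lower().rstrip("?!. ")
    # Apply the first opener rule that matches, then stop.
    for pattern, replacement in QUESTION_OPENERS:
        query, count = re.subn(pattern, replacement, query)
        if count:
            break
    # Append an optional qualifier and collapse any repeated spaces.
    return " ".join(f"{query} {suffix}".split())


print(rewrite_query("How do I fix a leaky faucet?", suffix="detailed guide"))
# how to fix a leaky faucet detailed guide
```

If you want to check your own content against this kind of rewrite, run your audience's likely questions through something similar and see whether your page titles match the statement form rather than the question form.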

2. Search intent

ChatGPT also claims it tries to figure out your intent from your prompt and translates this into its search terms. When I tested this, it did indeed check out: the AI will pretty consistently attempt to figure out your intent and often append “intent terms” like “tutorial,” “guide,” or “examples.” As a result, the sites it surfaced and cited frequently had these exact terms in the page title.
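Here is a rough Python sketch of what appending an intent term could look like. The cue phrases in INTENT_CUES and the function add_intent_term are hypothetical; I made them up to mirror the pattern I saw in testing, and the real tools almost certainly infer intent in a less rigid way.

```python
# Hypothetical cue phrases mapped to the "intent terms" seen in testing.
INTENT_CUES = [
    (("how do i", "how to", "how can i"), "guide"),
    (("learn", "teach me", "walk me through"), "tutorial"),
    (("show me", "sample", "for instance"), "examples"),
]


def add_intent_term(prompt: str, query: str) -> str:
    """Append an intent term to the query if the prompt contains a known cue."""
    text = prompt.lower()
    for cues, term in INTENT_CUES:
        if any(cue in text for cue in cues):
            # Avoid repeating a term the query already contains.
            if term in query.lower().split():
                return query
            return f"{query} {term}"
    # No recognizable intent: leave the query unchanged.
    return query


print(add_intent_term("How do I fix a leaky faucet?", "fix leaky faucet"))
# fix leaky faucet guide
```

The practical takeaway is the same as in the tests above: if the query ends in "guide" or "tutorial," a page with that word in its title has a better chance of matching.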

3. Recency 

Tools like ChatGPT have the ability to use a recency filter to find the most up-to-date information. You do too—most search engines let you search within the last week or month—but AI tools seem to lean on this functionality a lot.

That means any content that benefits from recent information, including trends, needs to be really new to make the cutoff. Recency, of course, matters in search engine rankings, too, but with AI tools, this cutoff can be especially short: in extreme cases, that same day or the day before. When I asked about trends, I found the AI tools will often only consider results within the last week or month.

ChatGPT telling the user that it set the recency to the last 30 days.

Sometimes, to surface new information, the tools append a year to the search query. But because the models are only updated every once in a while, they sometimes search for the previous year (oops). They also add terms like “current,” “latest,” or “recent,” instead of or in conjunction with using the date cutoff.
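The two recency tactics described above can be sketched in a few lines of Python. This is an assumption-heavy illustration: the function names, the shape of the result dictionaries, and the example.com URLs are all hypothetical. Note that the year comes from the current date rather than from anything the model "remembers," which is exactly the step that goes wrong when a tool searches for last year.

```python
from datetime import date, timedelta


def add_recency_terms(query: str, today: date, marker: str = "latest") -> str:
    """Prefix a recency word and append the current year to the query."""
    return f"{marker} {query} {today.year}"


def filter_recent(results: list[dict], today: date, max_age_days: int = 30) -> list[dict]:
    """Keep only results published within the last max_age_days days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in results if r["published"] >= cutoff]


# Hypothetical search results for illustration.
today = date(2024, 6, 15)
results = [
    {"url": "https://example.com/new-post", "published": date(2024, 6, 10)},
    {"url": "https://example.com/old-post", "published": date(2024, 3, 1)},
]

print(add_recency_terms("ai search trends", today))
# latest ai search trends 2024
print([r["url"] for r in filter_recent(results, today)])
# ['https://example.com/new-post']
```

With a 30-day window, the March post never reaches the model at all, no matter how well it ranks.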

4. Credibility

Currently, AI tools appear to assess and prioritize search results using criteria familiar to SEO and content specialists. Echoing Google’s E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) framework, AI tools favor authoritative sources with a well-established online presence.

Tools like ChatGPT already appear to do a reasonable job of assessing the credibility of the sources they pull up through their search function. ChatGPT gives high marks to information from well-known outlets and objective sources, prioritizes information it considers comprehensive, and favors the blogs of reputable brands.

It also seems to prefer official sources for certain types of information. When I asked it for information about public health guidelines, legal regulations, or statistical data, ChatGPT tended to consider official government or international organizations’ websites the most trustworthy. For example, when searching for information about new regulations coming into effect in Canada, it used only the official website rather than content from the approximately one million law firms that wrote about it.

ChatGPT using a government website to answer a question

5. Trustworthiness

There are a few things that go into trustworthiness, according to ChatGPT and my testing:

  • The author’s bio is one of the things that ChatGPT claims it uses when deciding whether a source is credible: it prioritizes people who are experts in their field as well as experienced journalists. Affiliations with well-known institutions are also prioritized. (And in my tests, review sites that specialize in a certain kind of product, like software or appliances, tended to be chosen over general sites.)

  • Objectivity (or, at least, the appearance of it) is prioritized, and the tools attempt to deprioritize sensationalist writing. ChatGPT also claims to downgrade sources based on areas of possible bias, including sites for whom affiliate marketing might influence the results, noting inherent biases of company blogs toward their own products. In my tests, it did somewhat steer clear of these sites, but not totally.

  • Transparency and information provenance, like citing sources, are important to getting the AI tool to consider a source credible.

  • Methodology matters: for instance, how something was tested or how products were ranked. (This matches Google’s algorithm, too.)

Having said all that, the tools tend to choose from the top 20ish sites that come up when you run your own search, so top-ranking sites are more likely to be used, regardless of their objectivity. The bots are still relying on the search engines for help, after all.

ChatGPT is, in theory, using Bing to search the web, but when I tested it, the top sites actually seemed to be from the first page of Google and not Bing. I’m wary of speculating too much here, but it’ll be interesting to see how different AI tools use different search engines and if that changes anything.

6. A variety of perspectives

ChatGPT told me that it attempts to balance the information by using sources from various viewpoints (a benefit of using multiple searches). But in my tests, I found that it still frequently uses roundup sites rather than sites with a specific perspective—whether that’s good or bad is debatable.

ChatGPT offering a variety of perspectives on a topic, with citations from all different sites

Following in search’s footsteps

In order to navigate the new landscape of AI-driven content, we’re all going to have to figure out how these tools search and how they prioritize recency and credibility. At the moment anyway, it does seem to be similar to the already-widely-scrutinized search engine ranking algorithms. I think that’s good—it means we already know a little bit about what to do.

Still, AI tools like ChatGPT are reshaping the criteria for digital visibility. For content creators, this means adapting strategies to not only produce timely, reputable, and human content but also to understand the evolving mechanics of AI search functionalities. 

These AI tools will continue to evolve, so staying informed and agile will be key to ensuring that your content not only reaches its intended audience but also stands out in the increasingly competitive market.

by Zapier