How does ChatGPT choose its sources?

We’re close to a future where people won’t browse the internet anymore—AI tools will do it for us. As these tools gain more traction, content creators will have to figure out how to get AI tools to cite them. Instead of just worrying about SEO, we’ll have to convince the AI tools that our content is worthy of being included—and, crucially, cited—in the output, so people click through.

But what criteria do AI tools use to select content when browsing? I decided to ask the most authoritative source I could find: ChatGPT itself.

Of course, I trust but verify, so I combined its explanations with my own testing, and compiled this guide based on the most recent available version of GPT-4.

A word of caution: ChatGPT (along with all AI chatbots) is often updated, so the way it prioritizes content is evolving. I’d expect that how AI chooses its sources will change frequently and, like Google’s ranking criteria, remain relatively opaque. The insights here are mostly based on my testing and tinkering with the models, as well as my experience of how the models work under the hood.

With that in mind, here are some of the criteria that ChatGPT is using to decide what to surface when it browses the internet.

1. Multiple, precise keywords

Many popular AI tools display their search queries as they browse, offering a glimpse into how they interpret requests and the types of terms they commonly use. (Note: if the tool doesn’t show its search terms, you can prompt it to reveal the terms it used.)

The first thing you’ll likely notice is that these tools typically use multiple search queries, review multiple sites, and then aggregate the results. That means that if you want them to find your site, you need to consider your ranking across several terms, not just one. Tools like Perplexity might use multiple search terms simultaneously, while ChatGPT tends to search sequentially, basing each new query on the results it initially finds.

Importantly though, they don’t search quite the same way as many humans do. As far as I’ve found, the tools tend to translate the search question into a statement—they don’t use the question text. So while humans might enter “How do I fix a leaky faucet?” the AI will alter it to something like “how to fix a leaky faucet detailed guide.” (It’s kind of like how DALL·E 3 will alter your prompts for you to get you what it thinks are better results.)

ChatGPT telling the user what search terms it used

When the AI tool creates search terms, it attempts to make the terms concise and as specific as possible. ChatGPT tells me it will choose “fix leaky faucet” rather than just “faucet problem” to get more specific answers. This checked out when I tested it.

ChatGPT—so it tells me—may also attempt to use terms that are specialized to an industry, which is handy when you might not know what those terms are. For instance, instead of “sustainable building materials,” it claims it might use “eco-friendly materials for green construction.” However, my experience with this was mixed—it frequently parroted my terms in its search instead of zhuzhing them up.
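To make the question-to-statement pattern concrete, here is a minimal sketch of that kind of rewrite. It is not how ChatGPT actually builds its queries (that isn’t public); the prefix list, the function name to_search_query, and the suffix parameter are all assumptions made up for illustration.

```python
import re

# Hypothetical mapping of leading question phrases to statement-style openers.
# This list is an assumption for illustration, not ChatGPT's real logic.
QUESTION_PREFIXES = [
    (r"^how do i\s+", "how to "),
    (r"^how can i\s+", "how to "),
    (r"^what is\s+", ""),
    (r"^what are\s+", ""),
]


def to_search_query(question: str, suffix: str = "") -> str:
    """Turn a natural-language question into a statement-style search query."""
    # Normalize case and drop trailing punctuation such as "?".
    query = question.strip().lower().rstrip("?!. ")
    # Replace the first matching question opener, if any.
    for pattern, replacement in QUESTION_PREFIXES:
        new_query, count = re.subn(pattern, replacement, query)
        if count:
            query = new_query
            break
    # Optionally append extra terms such as "detailed guide".
    if suffix:
        query = f"{query} {suffix}"
    return query


print(to_search_query("How do I fix a leaky faucet?", suffix="detailed guide"))
# prints: how to fix a leaky faucet detailed guide
```

Running your own headline questions through something like this is a quick way to guess which statement-style phrasings are worth checking your rankings for.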

2. Search intent

ChatGPT also claims it tries to figure out your intent from your prompt and translates this into its search terms. When I tested this, it did indeed check out: the AI will pretty consistently attempt to figure out your intent and often append “intent terms” like “tutorial,” “guide,” or “examples.” As a result, the sites it surfaced and cited frequently had these exact terms in the page title.
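If cited pages tend to carry these intent terms in their titles, it’s easy to audit your own. The sketch below checks a page title against the three terms mentioned above; the function name intent_terms_in_title is hypothetical, and the term list is only the handful I observed, not an exhaustive or official set.

```python
import re

# Intent terms observed in testing; an illustrative list, not an official one.
INTENT_TERMS = ("tutorial", "guide", "examples")


def intent_terms_in_title(title: str, terms=INTENT_TERMS) -> list[str]:
    """Return the intent terms that appear as whole words in a page title."""
    # Split the title into lowercase words so "Guide:" still matches "guide".
    words = set(re.findall(r"[a-z]+", title.lower()))
    return [term for term in terms if term in words]


print(intent_terms_in_title("How to Fix a Leaky Faucet: A Step-by-Step Guide"))
# prints: ['guide']
```

An empty list doesn’t mean a page won’t be cited; it just flags titles that don’t match the terms the AI is likely to append.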

3. Recency

Tools like ChatGPT have the ability to use a recency filter to find the most up-to-date information. You do too—most search engines let you search within the last week or month—but AI tools seem to lean on this functionality a lot.

That means any content that benefits from recent information, including trends, needs to be really new to make the cutoff. Recency, of course, matters in search engine rankings, too, but with AI tools, this cutoff can be especially short: in extreme cases, that same day or the day before. When I asked about trends, I found the AI tools will often only consider results within the last week or month.

ChatGPT telling the user that it set the recency to the last 30 days.

Sometimes, to surface new information, the tools append a year to the search query. But because the models are only updated every once in a while, they sometimes search for the previous year (oops). They also add terms like “current,” “latest,” or “recent,” instead of or in conjunction with using the date cutoff.
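Here is a rough sketch of both recency behaviors: a date-window check like the 30-day filter, and a query that appends a freshness term plus the year. The function names and the 30-day default are assumptions for illustration. Note that the year is taken from a date passed in by the caller rather than from anything the model remembers, which is how a tool could avoid the previous-year slip.

```python
from datetime import date, timedelta


def is_within_recency_window(published: date, today: date, days: int = 30) -> bool:
    """Check whether a page's publish date falls inside the last `days` days."""
    cutoff = today - timedelta(days=days)
    return cutoff <= published <= today


def add_freshness_terms(query: str, today: date, term: str = "latest") -> str:
    """Prefix a freshness term and append the current year to a query."""
    # The year comes from the supplied date, not from a hard-coded value.
    return f"{term} {query} {today.year}"


today = date(2024, 6, 1)  # fixed date so the example is reproducible
print(is_within_recency_window(date(2024, 5, 20), today))  # prints: True
print(is_within_recency_window(date(2024, 4, 1), today))   # prints: False
print(add_freshness_terms("seo trends", today))            # prints: latest seo trends 2024
```

For trend content, a check like this against your own publish and update dates shows which pages would fall outside a week-long or month-long window.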

4. Credibility

Currently, AI tools appear to assess and prioritize search results using criteria familiar to SEO and content specialists. Echoing Google’s E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) framework, AI tools favor authoritative sources with a well-established online presence.

Tools like ChatGPT appear to already do a reasonable job of assessing the credibility of the sources they pull up through search. ChatGPT gives high marks to information from well-known outlets and objective sources, prioritizes information it considers comprehensive, and rates the blogs of reputable brands highly.

It also seems to prefer official sources for certain types of information. When I asked it for information about public health guidelines, legal regulations, or statistical data, ChatGPT tended to consider the websites of official government bodies or international organizations the most trustworthy. For example, when searching for information about new regulations coming into effect in Canada, it used only the official website rather than the approximately one million law firms that had written content about it.