New York City’s “MyCity” AI chatbot is off to a rough start. The city government rolled out the tech five months ago to help residents interested in running a business in the Big Apple find useful information.
While the bot will happily answer your questions with what appear, on the surface, to be legitimate answers, an investigation by The Markup discovered that the bot lies, a lot. When asked whether an employer can take a cut of their employees’ tips, for example, the bot says yes, even though the law says bosses can’t take employee tips. When asked whether buildings are required to accept Section 8 vouchers, the bot answers no, even though landlords can’t discriminate based on a prospective tenant’s source of income. When asked whether you can make your store cashless, the bot says go ahead, when in reality cashless establishments have been banned in NYC since the beginning of 2020. When it says, “there are no regulations in New York City that require businesses to accept cash as a form of payment,” it’s full of shit.
To the city’s credit, the site does warn users not to rely solely on the chatbot’s responses in place of professional advice, and to verify any statements via the provided links. The problem is, some answers don’t include links at all, making it even harder to check whether what the bot says is factually accurate. Which raises the question: Who is this technology for?
AI has a tendency to hallucinate
This story won’t be shocking to anyone who has been following recent developments in AI. It turns out that chatbots just make stuff up sometimes. It’s called hallucinating: AI models, trained to respond to user queries, will confidently conjure up an answer based on their training data. Since these networks are so complicated, it’s tough to know exactly when or why a bot will choose to spin a certain piece of fiction in response to your question, but it happens a lot.
It’s not really New York City’s fault that its chatbot is hallucinating that you can stiff your employees out of their tips: The bot runs on Microsoft’s Azure AI, a common AI platform that businesses like AT&T, Reddit, and Volkswagen use for various services. The city likely paid for access to Microsoft’s AI technology to power its chatbot in an honest effort to help New Yorkers interested in starting a business, only to find that the bot hallucinates wildly incorrect answers to important questions.
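For context, a chatbot built on Azure’s hosted models is typically a thin wrapper around an API call like the one below. This is a minimal sketch, not MyCity’s actual code; the endpoint, deployment name, and system prompt are hypothetical placeholders for illustration.

```python
# Minimal sketch of a city-services chatbot backend on Azure OpenAI.
# Not MyCity's actual code: the endpoint, deployment name, and prompt
# below are hypothetical placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint="https://example-city-bot.openai.azure.com",  # hypothetical resource
)

def answer_question(question: str) -> str:
    """Send a resident's question to the hosted model and return its reply."""
    response = client.chat.completions.create(
        model="city-services-gpt",  # hypothetical deployment name
        messages=[
            {"role": "system", "content": "You help NYC residents with business rules."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer_question("Can my store refuse to accept cash?"))
```

Note what’s missing: nothing in that call checks the model’s reply against the actual rules. The model answers from whatever patterns it picked up in training, which is where the hallucinations creep in.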
When will hallucinations stop?
It’s possible these unfortunate situations will soon be behind us: Microsoft has a new safety system in place to catch and protect customers from the darker sides of AI. In addition to tools that help block hackers from using your AI maliciously and that evaluate potential security vulnerabilities inside AI platforms, Microsoft is rolling out Groundedness Detection, which can monitor for potential hallucinations and intervene when necessary. (“Ungrounded” is another term for a hallucinated, unsupported statement.)
When Microsoft’s system detects a possible hallucination, it can let customers test the current version of the AI against the version that existed before deployment; point out the hallucinated statement and either fact-check it or engage in “knowledge base editing,” which presumably lets you edit the underlying training set to eliminate the issue; rewrite the hallucinated statement before it’s sent to the user; or evaluate the quality of synthetic training data before using it to generate new synthetic data.
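To make the mechanics a little more concrete, here’s a rough sketch of what a groundedness check against Azure AI Content Safety could look like. The endpoint path, API version, and request fields are drawn from Microsoft’s preview documentation and should be treated as assumptions that may have changed; the resource name and the source text are made up for illustration.

```python
# Hedged sketch of calling Azure AI Content Safety's groundedness detection.
# The endpoint path, API version, and field names follow Microsoft's preview
# docs and may have changed; verify against current documentation before use.
import os
import requests

ENDPOINT = "https://example-safety.cognitiveservices.azure.com"  # hypothetical resource
KEY = os.environ["CONTENT_SAFETY_KEY"]

def check_groundedness(question: str, answer: str, sources: list[str]) -> dict:
    """Ask the service whether `answer` is supported by the provided sources."""
    url = f"{ENDPOINT}/contentsafety/text:detectGroundedness?api-version=2024-02-15-preview"
    body = {
        "domain": "Generic",
        "task": "QnA",
        "qna": {"query": question},
        "text": answer,               # the chatbot's claim to verify
        "groundingSources": sources,  # e.g. the relevant rule text
        "reasoning": False,
    }
    resp = requests.post(url, json=body, headers={"Ocp-Apim-Subscription-Key": KEY})
    resp.raise_for_status()
    return resp.json()  # expected to include an "ungroundedDetected" flag

result = check_groundedness(
    "Can my store refuse to accept cash?",
    "Yes, there are no regulations requiring businesses to accept cash.",
    ["NYC's cashless-business ban took effect in 2020; most stores must accept cash."],
)
print(result)
```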
Microsoft’s new system runs on a separate LLM called the Natural Language Inference (NLI) model, which constantly evaluates claims from the AI against the source data. Of course, since the system fact-checking the LLM is itself an LLM, couldn’t the NLI hallucinate its own analysis? (Probably! I kid, I kid. Kinda.)
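Microsoft hasn’t published much detail about its NLI model, but the underlying idea, checking whether a trusted source supports or contradicts the chatbot’s claim, is easy to demonstrate with an off-the-shelf open-source NLI model. The snippet below uses facebook/bart-large-mnli purely as a stand-in, not Microsoft’s system, and the source and claim text are illustrative.

```python
# Illustration of the NLI idea using an open-source stand-in model
# (facebook/bart-large-mnli), not Microsoft's actual groundedness detector.
from transformers import pipeline

nli = pipeline("text-classification", model="facebook/bart-large-mnli")

source = "Under NYC law, most businesses must accept cash and may not go cashless."
claim = "There are no regulations in New York City that require businesses to accept cash."

# Score whether the source (premise) entails or contradicts the claim (hypothesis).
result = nli({"text": source, "text_pair": claim}, top_k=None)
print(result)  # expect "contradiction" to score highest, flagging the claim as ungrounded
```

Swap in Microsoft’s production-grade checker and you have, at least in principle, the safety net described above.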
This could mean that organizations like New York City that power their products with Azure AI could have a real-time hallucination-busting LLM on the case. Maybe when the MyCity chatbot tries to say that you can run a cashless business in New York, the NLI will quickly correct the claim, so what you see as the end user will be the real, accurate answer.
Microsoft only just rolled out this new software, so it’s not clear yet how well it will work. But for now, if you’re a New Yorker, or anyone using a government-run chatbot to find answers to legitimate questions, you should take those answers with a grain of salt. I don’t think “the MyCity chatbot said I could!” is going to hold up in court.