
Weighing Found Wanting

Rusty Guinn

Of all the problems facing software that analyzes text to extract useful information, the tyranny of topic is by far the most dangerous. But it is most often an accidental problem. No one sets out to build a noun-detection Rube Goldberg device, even if that is what most natural language processing solutions effectively amount to.

But measuring sentiment? That is the whole point of many text analysis products. Unfortunately, sentiment analysis usually fails just as spectacularly to answer any meaningful questions about the text, much less to provide correct intuition for what anyone ought to do about it.

And yes, I am being slightly uncharitable.

It is true, for example, that in certain narrow cases, sentiment analysis can be very useful. Consider a regular document or release in a rigid format where only a few dozen words might change from month to month to reflect updated information or the thinking of the people involved. In financial markets, the official minutes from meetings of the Federal Reserve Board of Governors are precisely this kind of document. Knowing whether the words that changed went from “good” words to “bad” words or vice versa is really useful in this case, because it is an exercise with very few degrees of freedom in the tone and nature of the language used and with context that is largely static. Now, useful is a relative term in a world where some investors are processing these publications and trading on their analysis within milliseconds of release (or, depending on your predilection for conspiracy, within seconds before their official release). But the analysis itself has utility, even if any opportunity to do anything useful with it has largely been competed into oblivion.
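To make that exercise concrete, here is a minimal Python sketch of the diff-and-weigh approach described above: compare two versions of a rigid-format release and score only the words that changed. The lexicon, the weights, and the sample sentences are all invented for illustration; real products use far larger word libraries.

import difflib

# Hypothetical word-weight lexicon, invented for illustration.
LEXICON = {"strong": 1.0, "robust": 1.0, "moderate": 0.5,
           "weak": -1.0, "elevated": -0.5, "declined": -1.0}

def changed_words(old: str, new: str) -> list[str]:
    """Return the words that appear in the new release but not the old one."""
    diff = difflib.ndiff(old.split(), new.split())
    return [token[2:] for token in diff if token.startswith("+ ")]

def score(words: list[str]) -> float:
    """Sum lexicon weights over the changed words; unknown words score zero."""
    return sum(LEXICON.get(w.lower().strip(".,"), 0.0) for w in words)

january = "Economic activity has been expanding at a moderate pace."
march = "Economic activity has been expanding at a strong pace."
delta = changed_words(january, march)
print(delta, score(delta))  # ['strong'] 1.0 -- the language "improved"

Because nearly everything in such a document is static, the handful of changed words really do carry most of the new information, which is why this narrow case works at all.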

There are other sufficiently constrained texts with similarly static context where simple sentiment analysis can be useful. Some product reviews with tight character limits or where the text entry blocks provide context of their own (“best features”, “things you didn’t like about it”) fit the bill. Wall Street research publications or corporate earnings reports. Regularly published updates like “stock reports” or “power rankings” in sports, that sort of thing. But outside of these narrow cases, long-time customers of sentiment analysis products nearly always find that the sentiment grades too often don’t make any sense at all.

The reason is simple: the network of meanings in heavily symbolic human language is far too complex, context-dependent, and interdependent to be reduced to a system of weighing individual words against one another.

Consider for a moment a text that is probably familiar to you – something like America’s Declaration of Independence. It contains some of humanity’s most uplifting words.

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

Of course, the document then spends twenty-something bullet points accusing the British Crown of all manner of abuses in no uncertain terms.

So tell me, is the Declaration of Independence a positive document or a negative document? Now, would adding another bullet of charges against the Crown have made it a more negative document in any sense that meant something to you? Would adding three or four more rhetorical flourishes in celebration of liberty and the rights of man in the preamble have changed your answer at all?

Consider another text that you might remember, or at least that you have always supposed existed. Let’s think about a 1992 news article announcing the sentencing of serial killer Jeffrey Dahmer.

Is the news of the conviction and sentencing of a serial killer a positive or negative document? Is Dahmer’s claim that he wanted any money from selling rights to his life story to go to victims’ families a slightly positive sentence to you because it says money (yay!) and life story (yay!) but also victims (boo!), or do you process how you feel about that on an entirely different axis that takes into account the context of the piece? Surely we can agree that “Jeffrey Dahmer, you have become a hero for a few, but you have become a nightmare for so many more” is a neutral sentiment sentence, since hero and nightmare offset each other in the scales so beautifully.
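For what it’s worth, a toy version of that scale agrees. Running the judge’s closing line through the same kind of word-weighting sketched earlier (the weights are invented for illustration, as before) nets out to exactly zero:

# "hero" (+1) and "nightmare" (-1) cancel, so the scale reads dead neutral.
LEXICON = {"hero": 1.0, "nightmare": -1.0}

sentence = ("Jeffrey Dahmer, you have become a hero for a few, "
            "but you have become a nightmare for so many more")
weights = [LEXICON.get(w.strip(",."), 0.0) for w in sentence.lower().split()]
print(sum(weights))  # 0.0 -- "neutral", by the scale's lights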

These may seem like extreme examples, but they are not. News coverage, blogs, and research reports are, as a rule, presented to us in forms where the weight of “good” words and “bad” words has practically nothing to do with how you or I would actually interpret how a thing was being framed or how a story was being told. That’s because the goodness or badness of a word or phrase cannot be weighed independently. You can’t place a word in isolation on a scale and say, “this word has bad sentiment.” That word is a small part of a network of meaning that requires the context of everything around it.

The natural response of some practitioners has been to build or rely on more sophisticated language models than simple libraries which associate words with degrees of goodness and badness. These tools, whether custom-trained models or off-the-shelf LLMs, can and do incorporate more of that missing context. Still, for us this raises a much more interesting question: if you’re going to the trouble of incorporating additional semantic layers and training to make sure that you’re correctly capturing the affect and context of a document, why stop there? Why stop at “the words about the soup are mostly good” when you have the tools to quantify the emergence of stories of specific meaning like “the soup was well-seasoned” and “the soup was an excellent value” and “the soup complemented the rest of the tasting menu well” and “the soup was perfect for the season?”
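One hedged sketch of what going further might look like, using an off-the-shelf zero-shot classifier: instead of collapsing a review to one sentiment number, score it against each specific claim you care about. The model choice, the review text, and the candidate claims below are all illustrative assumptions, not a statement about any particular product.

from transformers import pipeline

# Zero-shot classification via an off-the-shelf NLI model (illustrative choice).
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

# Hypothetical review text, invented for this example.
review = ("The soup arrived steaming, layered with warm spices, "
          "and cost less than anything comparable in the neighborhood.")

claims = ["the soup was well-seasoned",
          "the soup was an excellent value",
          "the soup complemented the rest of the tasting menu",
          "the soup was perfect for the season"]

# Score each claim independently rather than forcing a single winner.
result = classifier(review, candidate_labels=claims, multi_label=True)
for label, prob in zip(result["labels"], result["scores"]):
    print(f"{prob:.2f}  {label}")

The point is not this particular model. The point is that the same tooling that can grade “good” versus “bad” can just as easily report which specific stories a text actually tells.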

You live in a world where capturing previously abstracted dimensions in text at scale is now well within our grasp, and an entire industry is using those new tools to… reduce that dimensionality right back into a cardboard cutout.

By and large, the options seem to boil down to that (flat, one-dimensional topic extraction or sentiment analysis products) or, as we’ll explore in our next post, AI-generated executive summary slop. Unfortunately, the latter is just as problematic, even if the problem it conveys is quite literally the opposite: so many dimensions that fully half of them don’t even exist.