Latent Semantic Indexing
Under the hood and without the math
v1.0
This document covers some important points about LSI that are commonly neglected.
It addresses what LSI can do, how it does it, what it
can't do, and the limitations and assumptions about the data that are often hidden
by the smoke'n'mirrors of marketing. Best of all, we're not going to use any
math.

Everything you see here is backed up by this document from Cambridge.
Latent Semantic Indexing
Latent Semantic Indexing (LSI) works. That's not disputed.
But, like any tool, it can be misused and misunderstood.
Understanding LSI's capabilities, intended purpose, and
limitations up front will help you determine
whether LSI is a proper fit.
Clustering Isn't Ranking
Imagine you have four objects sitting in front of you: a plastic orange, a sandwich, an apple core,
and a rock.
LSI can place these items into buckets based on context that it's been exposed to.
You get to choose how many buckets you want, so to keep our example simple, let's say
we have two. Let's see what LSI does with this: