cluster 2.0
traditional NLP methods rely on "keywords" or strict semantic conditions to create topic clusters. This approach is outdated and flawed when analyzing online conversations for 2 key reasons:
for every post, there are ~5 comments that don't include the keyword mentioned in the original post. traditional keyword methods only capture 20% of the story, leaving us blind to 80% of the true story. siftree invented a density-based clustering algorithm that mimics gravity, capturing all comments that surround the subjects of discussion, even if they don't include the word itself.
relying on strict semantics also ignores surrounding context and creates noisy results. if people are agreeing with somebody's opinion, traditional clustering technqiues would create a cluster filled with "same", "me too", "i agree" etc. but what's the context? since the original opinion was only posted once, and represents a small % of the data, strict semantic models exclude it, inhibiting the ability to form an actionable insight.
online conversations are filled with these instances. we're left with rapidly evolving, high-dimensional data, that traditional methods and llms aren't optimized to handle. it's a very niche, but difficult problem.
siftree's innovative clustering models solve both problems, moving beyond keywords and semantics, creating the most accurate and scientific representation of online conversations. this approach allows our models to sift through slang, jargon, and cluster conversations based on ground truth, outperforming llms from openai, google, anthropic, and more.