On Sunday night, as many grappled with the long-lasting TikTok ban, an interesting tweet appeared on my timeline. Arman Khon, a software engineer at TikTok, referenced the academic paper that explains the underlying infrastructure behind the TikTok algorithm (Monolith: Real Time Recommendation System With Collisionless Embedding Table). If you talk to anyone who has had TikTok, they will tell you the experience is warping. The continuous stream of relevant, enticing, and novel clips is impressive. Addicting, actually. TikTok was undoubtedly a step change in creating an addictive social network.
While Instagram Reels is like a random dining experience of slop with an occasional gem mixed in, TikTok is like a menu curated for your palate.
As such, an exploration of how the algorithm is crafted is a modern-day gold mine.
Every recommendation algorithm is built to keep serving users the content it predicts will keep them engaged and on the platform.
The main question is, why is TikTok far superior to other apps at predicting what you want to see?
Theoretically, each app user is a line item within a social media company’s database. Your behavior, from watch time to likes to swipe speed, is all recorded in your user row. We all know and accept that every moment we spend in the app trains what we will receive in the future. What the TikTok algorithm proves to be preeminent at is ensuring that your user row, your in-app digital fingerprint, is kept pure and is never mixed up with the rows of similar or repeat users.
In deep learning, two users blurring into each other like this is referred to as a collision. In the paper's terms, two different IDs get hashed to the same row of the embedding table. Traditionally, collisions are difficult to avoid given the sheer number of users and their ever-changing behavior. Your smallest preferences, such as staying on one video 2 seconds longer than another, should all inform the recommendations you receive.
What other apps appear to struggle with is that when these collisions occur, the memory of both users cannot be stored, only one. Two unique users compete for one spot within the data set. Thus, one person’s experience is now permanently less personalized. While you might suggest just keeping meticulous logs for each person, many social platforms don’t have a meaningful way to do that at the scale of current data sets.
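To make the collision idea concrete, here is a minimal sketch of a fixed-size table where each user's slot is picked by a simple hash. Everything in it (the table size, the `slot_for` and `naive_store` helpers, the profile strings) is made up for illustration and is not taken from any platform's code.

```python
NUM_SLOTS = 8  # hypothetical table size, tiny on purpose


def slot_for(user_id: int) -> int:
    """Pick a slot from the user ID; distinct IDs can land on the same slot."""
    return user_id % NUM_SLOTS


def naive_store(table: list, user_id: int, profile: str) -> None:
    """Write the profile into its slot, overwriting whoever was there."""
    table[slot_for(user_id)] = profile


table = [None] * NUM_SLOTS
naive_store(table, 3, "cooking videos")
naive_store(table, 11, "skateboarding videos")  # 11 % 8 == 3, same slot as user 3

# User 3 now reads back user 11's profile: the two users have collided.
print(table[slot_for(3)])  # prints "skateboarding videos"
```

With only 8 slots the overlap is obvious, but the same thing happens in a table with millions of slots once there are more IDs than places to put them.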
How TikTok Avoids Collisions and Ensures Personalization
To address the problem of collision, TikTok chose to use a Cuckoo Hash Table design.
Great name for a table I must admit. So what is it?
Whilst other apps allow two users' data to “collide” and degrade recommendations, the Cuckoo design does something novel within large data sets:
If a new user entry intends to occupy a memory slot in the database that is currently occupied, the table design forces the older user entry to leave and find a new memory slot. Thus, the database can avoid collisions.
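For the curious, the eviction idea can be sketched in a few lines. This is a toy version of textbook cuckoo hashing with two slot arrays and two hash functions, not TikTok's production code. The class name, the two toy hash functions, the `max_kicks` limit, and the grow-when-stuck policy are all my own assumptions for the sketch.

```python
class CuckooTable:
    """Toy cuckoo hash table: every key has one candidate slot in each of two arrays."""

    def __init__(self, size: int = 8, max_kicks: int = 16):
        self.size = size
        self.max_kicks = max_kicks  # assumed limit before giving up and growing
        self.tables = [[None] * size, [None] * size]

    def _slot(self, which: int, key: int) -> int:
        # Two different toy hash functions (illustrative choices only).
        if which == 0:
            return key % self.size
        return ((key * 2654435761) >> 16) % self.size

    def get(self, key: int):
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key: int, value) -> None:
        # If the key already lives in either array, update it in place.
        for which in (0, 1):
            idx = self._slot(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            idx = self._slot(which, entry[0])
            # The newcomer takes the slot; whoever was there is evicted.
            entry, self.tables[which][idx] = self.tables[which][idx], entry
            if entry is None:
                return  # the slot was empty, nobody left to rehome
            which = 1 - which  # the evicted entry tries its slot in the other array
        self._grow(entry)  # too many evictions: make room and reinsert everyone

    def _grow(self, pending) -> None:
        entries = [e for t in self.tables for e in t if e is not None]
        entries.append(pending)
        self.size *= 2
        self.tables = [[None] * self.size, [None] * self.size]
        for k, v in entries:
            self.put(k, v)


table = CuckooTable()
for uid, profile in [(3, "cooking"), (11, "skateboarding"), (19, "chess")]:
    table.put(uid, profile)  # all three IDs share slot 3 in the first array

print(table.get(3), table.get(11), table.get(19))
```

Notice that nobody is overwritten: the older entry gets moved, not erased, so every user keeps a slot of their own.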
In TikTok's case, it is vital to avoid collisions, since the tastes and trends of social media can change within the hour.
As each new user is captured, collision-free insertions into the database keep the data clean, which is valuable because emerging and trending items need to be distinctly represented for immediate personalization.
As a side note, I believe cleanliness is a high-signal metaphor for what goes on with these platforms. Visiting TikTok is a bit like entering a house where everything is neatly put together, and you feel as if you are on a tour. Visiting Instagram Reels is more like entering a house where a toddler is throwing a tantrum in the next room over while you try to have a conversation. It’s nice for a little while, but you’d prefer to leave.
With TikTok, you truly do feel as if the videos you are presented with are crafted just for you. With most other platforms, it feels more like you are thrown into a bucket with similar people. They miss your true nature and interests, and as such lose your attention.