System Design Interview — Twitter

Yash Raithatha
May 22, 2021

Questions:

  1. Is the retweet feature needed?
  2. Can a tweet contain media, or only text? (If media is allowed, we will need to store it on distributed storage such as AWS S3.)
  3. If a tweet contains URLs, do we need to shorten them (to save storage space and improve readability in the UI), or can a tweet contain long URLs?
  4. Do we want any kind of analytics (e.g. trending topics)?

Functional Requirements:

  1. Post a Tweet / Retweet
  2. Follow/Unfollow Other Users
  3. Home Page (Timeline)
  4. Analytics (Trending Topics)

Non-Functional Requirements:

  1. High Availability
  2. Eventual Consistency (it's OK if a tweet becomes visible to followers after some delay)
  3. High Performance, i.e. low latency (Note: the system is going to be read-heavy, with reads far outnumbering writes, which suggests that we may need caching and pre-computation)

Calculations:

Scalability:

Monthly Active Users = 1 billion

Avg. Tweets/Retweets Per Day By 1 User = 5

Total Tweets in a day = 5 * 1 billion = 5 billion (assuming every monthly active user tweets every day)

Total Tweets in a sec = 5B / (24 * 60 * 60) = X ≈ 57,870

Storage:

Size of 1 Tweet: 140 characters (1 UTF-8 character takes up to 4 bytes) = at most 560 bytes, rounded up to ~1 KB

Size of storage needed in a sec = X * 1 KB = Y KB

Storage needed 10 years down the line = Y * (60 * 60 * 24 * 365 * 10) KB
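The estimate above can be worked through numerically. The sketch below takes the stated inputs at face value (1 billion users, 5 tweets per user per day, ~1 KB per tweet) and assumes every monthly active user tweets daily, so the result is closer to an upper bound than a forecast. The function and variable names are illustrative, and 1 KB is treated as 1,000 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_capacity(users, tweets_per_user_per_day, bytes_per_tweet, years):
    """Return (tweets per second, bytes per second, total bytes over `years`)."""
    tweets_per_day = users * tweets_per_user_per_day
    tweets_per_second = tweets_per_day / SECONDS_PER_DAY
    bytes_per_second = tweets_per_second * bytes_per_tweet
    total_bytes = tweets_per_day * bytes_per_tweet * 365 * years
    return tweets_per_second, bytes_per_second, total_bytes


tps, bps, total = estimate_capacity(
    users=1_000_000_000,
    tweets_per_user_per_day=5,
    bytes_per_tweet=1_000,  # ~1 KB per tweet, text only (no media)
    years=10,
)
print(f"{tps:,.0f} tweets/s, {bps / 1e6:,.1f} MB/s, {total / 1e15:,.2f} PB in 10 years")
```

With these inputs the script reports about 57,870 tweets per second, 57.9 MB per second, and 18.25 PB over 10 years, before replication and excluding media.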

Components Diagram:

API Gateway: Handles authentication, rate limiting, and routing of each request to the corresponding service. The API gateway is a common design pattern in microservice architectures.
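The text does not say which rate-limiting algorithm the gateway uses. As one possible illustration, here is a minimal token bucket sketch; the class name, parameters, and the idea of keeping one bucket per user are assumptions, not part of the original design.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=None):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True and consume one token if the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical usage: one bucket per user id, 2 requests of burst, 1 request/second.
buckets = {}
bucket = buckets.setdefault("alice", TokenBucket(capacity=2, rate=1))
print(bucket.allow(), bucket.allow())
```

The optional `now` argument exists only so the behaviour can be checked deterministically; a gateway would rely on the monotonic clock.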

User Service: Handles everything user-related, such as onboarding and login, and provides the user-related APIs needed by the rest of the system.

Graph Service: Maintains the follow relationships between users, i.e. each user's followers and followees.
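A minimal in-memory sketch of the data the Graph Service could keep is shown below. It stores the relationship in both directions so that "who follows X" and "whom does X follow" are both cheap lookups. The class and method names are hypothetical, and a real service would back this with a database rather than Python dictionaries.

```python
from collections import defaultdict


class FollowGraph:
    def __init__(self):
        self._followees = defaultdict(set)  # user -> users they follow
        self._followers = defaultdict(set)  # user -> users following them

    def follow(self, follower, followee):
        self._followees[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower, followee):
        # discard() is a no-op if the relationship does not exist.
        self._followees[follower].discard(followee)
        self._followers[followee].discard(follower)

    def followers_of(self, user):
        return set(self._followers.get(user, ()))

    def followees_of(self, user):
        return set(self._followees.get(user, ()))


graph = FollowGraph()
graph.follow("bob", "alice")
graph.follow("carol", "alice")
graph.unfollow("carol", "alice")
print(graph.followers_of("alice"))
```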

Tweet Ingestion Service: Accepts only tweet and retweet post requests, validates each one, generates a UUID for the tweet, and publishes it to the Kafka cluster. Publishing straight to Kafka keeps this service's overhead low: it is write-heavy and, at Twitter's scale, will handle thousands of tweets per second, so it should persist each tweet as quickly as possible without losing any data.
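The validate, assign-UUID, publish flow can be sketched as follows. To stay self-contained, the sketch uses Python's `queue.Queue` as a stand-in for a Kafka topic; the event fields, the 140-character check (borrowed from the storage estimate), and the function name are all assumptions.

```python
import json
import queue
import time
import uuid

MAX_TWEET_CHARS = 140  # assumed limit, matching the size used in the storage estimate


def ingest_tweet(user_id, text, topic):
    """Validate a tweet, assign it a UUID, and publish it to `topic`."""
    text = text.strip()
    if not text:
        raise ValueError("tweet text must not be empty")
    if len(text) > MAX_TWEET_CHARS:
        raise ValueError(f"tweet exceeds {MAX_TWEET_CHARS} characters")
    event = {
        "tweet_id": str(uuid.uuid4()),
        "user_id": user_id,
        "text": text,
        "created_at": time.time(),
    }
    # Stand-in for a Kafka producer send; storage happens in downstream consumers.
    topic.put(json.dumps(event))
    return event["tweet_id"]


new_tweets = queue.Queue()  # hypothetical in-memory substitute for a Kafka topic
tweet_id = ingest_tweet("alice", "hello world", new_tweets)
print(tweet_id, new_tweets.qsize())
```

The service does no database work of its own, which is what keeps the write path short.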

Tweet Service: Listens to Kafka events for new tweets and stores them in Cassandra. Exposes APIs such as returning all tweets of a user in a given time frame or returning a tweet's details given its UUID.
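The two lookups mentioned above (by UUID, and by user within a time frame) suggest storing tweets both by id and per user in time order. The sketch below imitates that with in-memory structures; it is not Cassandra code, the schema is an assumption since the data model is not given, and all names are illustrative.

```python
import bisect
from collections import defaultdict


class TweetStore:
    def __init__(self):
        self._by_id = {}
        self._by_user = defaultdict(list)  # user_id -> sorted [(created_at, tweet_id)]

    def save(self, tweet):
        if tweet["tweet_id"] in self._by_id:
            return  # ignore an event that was delivered more than once
        self._by_id[tweet["tweet_id"]] = tweet
        bisect.insort(self._by_user[tweet["user_id"]],
                      (tweet["created_at"], tweet["tweet_id"]))

    def get(self, tweet_id):
        return self._by_id.get(tweet_id)

    def tweets_of_user(self, user_id, start, end):
        """Return the user's tweets with start <= created_at < end, oldest first."""
        rows = self._by_user.get(user_id, [])
        # A 1-tuple sorts before any 2-tuple that shares its first element.
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return [self._by_id[tweet_id] for _, tweet_id in rows[lo:hi]]


store = TweetStore()
for ts, tid in [(30, "t3"), (10, "t1"), (20, "t2")]:
    store.save({"tweet_id": tid, "user_id": "alice", "text": "...", "created_at": ts})
print([t["tweet_id"] for t in store.tweets_of_user("alice", 10, 30)])
```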

Home Page Service: Listens to the Kafka cluster for new-tweet (or update) events, updates the precomputed home page of each affected user based on the follower/followee relationships, and stores the result in Redis so that it can be served immediately when the user visits their home page. Important point: when a celebrity tweets, we would have to find all of their followers (possibly millions) and update the cached home page of each one, which is too expensive. So we skip the home page update when a celebrity posts a tweet. Instead, when a user visits the home page, tweets from the celebrities they follow are fetched on the fly and merged with the rest of the home page read from the cache.
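The hybrid approach described above (push tweets into follower caches for ordinary users, pull celebrity tweets at read time) can be sketched as follows. Plain dictionaries stand in for Redis and the Graph Service, and the celebrity threshold, cache size, and all names are hypothetical.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3  # hypothetical; a real cutoff would be far higher
TIMELINE_SIZE = 100      # hypothetical cap on cached entries per user


class HomeTimeline:
    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.followees = defaultdict(set)   # user -> authors they follow
        # Stand-in for the per-user home page cache kept in Redis.
        self.cache = defaultdict(lambda: deque(maxlen=TIMELINE_SIZE))
        self.celebrity_tweets = defaultdict(list)  # author -> [(ts, tweet_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.followees[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def on_new_tweet(self, author, tweet_id, ts):
        if self.is_celebrity(author):
            # No fan-out: readers pull these tweets when they load the home page.
            self.celebrity_tweets[author].append((ts, tweet_id))
            return
        for user in self.followers[author]:
            self.cache[user].append((ts, tweet_id))

    def home_page(self, user, limit=20):
        entries = list(self.cache[user])
        for author in self.followees[user]:
            if author in self.celebrity_tweets:
                entries.extend(self.celebrity_tweets[author])
        entries.sort(reverse=True)  # newest first
        return [tweet_id for _, tweet_id in entries[:limit]]


timeline = HomeTimeline()
timeline.follow("bob", "alice")
for fan in ("bob", "carol", "dave"):
    timeline.follow(fan, "star")
timeline.on_new_tweet("alice", "t1", ts=1)
timeline.on_new_tweet("star", "t2", ts=2)
print(timeline.home_page("bob"))
```

The trade-off is that a celebrity tweet costs one write instead of millions, at the price of a small merge on every read by their followers.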

Trends Service: A Spark Streaming cluster listens to Kafka events for new tweets, computes the trending topics over time, and calls the Trends Service APIs to store them in Redis.
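The text does not define how "trending" is computed. One simple interpretation is counting hashtags over a sliding time window, sketched below in plain Python rather than Spark Streaming. The window length, the hashtag-based definition of a topic, and all names are assumptions, and tweets are assumed to arrive in timestamp order.

```python
import re
from collections import Counter, deque

HASHTAG = re.compile(r"#(\w+)")


class TrendingTopics:
    """Count hashtags seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, [hashtags]) in arrival order
        self.counts = Counter()

    def add_tweet(self, text, ts):
        tags = [tag.lower() for tag in HASHTAG.findall(text)]
        self.events.append((ts, tags))
        self.counts.update(tags)
        self._expire(ts)

    def _expire(self, now):
        # Drop events that have fallen out of the window and undo their counts.
        while self.events and self.events[0][0] <= now - self.window:
            _, tags = self.events.popleft()
            self.counts.subtract(tags)

    def top(self, k):
        return [(tag, n) for tag, n in self.counts.most_common(k) if n > 0]


trends = TrendingTopics(window_seconds=60)
trends.add_tweet("#Python rocks", ts=0)
trends.add_tweet("#python and #kafka", ts=10)
print(trends.top(2))
```

In the design above, the output of `top` is what would be written to Redis through the Trends Service APIs.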

Data Model:

Coming soon!
