Project Description



This is a small-scale Twitter clone built with Ruby, Sinatra, Node.js, Redis, PostgreSQL, GraphQL, Heroku, and other technologies. The main goal was to practice scaling a web application to handle heavy load using a Service-Oriented Architecture.

Our final product consists of three services: a load balancer, a read service, and a write service. Our load test results (using Loader.io) were the best among the six groups in the class.

Screenshots

[Screenshot: timeline page]

Technologies & Architecture

High-Level Architecture

Our application has four parts:
  • Custom Load Balancer, which routes requests based on HTTP method (GET / POST)
  • Read Services, which serve GET requests by fetching the cached HTML from Redis
  • Write Service, which serves both GET and POST requests and interacts with Postgres and Redis
  • Data Storage with Redis and Postgres: Redis caches the full HTML of each page, and Postgres stores the persistent data

When a GET request reaches the load balancer, a hash function decides whether it goes to the read services or to the write service. The hash sends 80% of GET requests to the read services, so the write service keeps its resources free for POST requests. The read service then checks whether either of the two Redis instances holds a full-page cache for the URL. On a hit, it returns the cached page. On a miss, it redirects the request to the write service, which queries Postgres, generates the page, stores it in both Redis instances, and responds. The next time the same URL is requested, the read services can pull the page straight from Redis without any database query.
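The routing and read-through logic above can be sketched in Python. This is a hypothetical, single-process illustration, not the project's code (the real load balancer is Node.js and the services are Ruby/Sinatra). Plain dicts stand in for Postgres and the two Redis instances. The README does not say what the load balancer hashes, so the key is assumed to be a per-request string. Bucketing with SHA-256 into 100 slots is our own choice for producing the 80/20 split. The names `route`, `ReadService`, and `WriteService` are illustrative only.

```python
import hashlib

READ_SHARE = 80  # % of GET requests routed to the read services (figure from the README)


def bucket(key: str) -> int:
    """Map a key to one of 100 buckets with a stable hash (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(method: str, key: str) -> str:
    """Load-balancer decision: POST always goes to 'write'; GET is split by hash."""
    if method == "POST":
        return "write"
    return "read" if bucket(key) < READ_SHARE else "write"


class WriteService:
    def __init__(self, db: dict, caches: list):
        self.db = db            # stand-in for Postgres: url -> page content
        self.caches = caches    # stand-ins for the two Redis instances
        self.renders = 0        # counts database-backed renders

    def render(self, url: str) -> str:
        """Query the database, build the page, and store it in every cache."""
        self.renders += 1
        page = f"<html>{self.db.get(url, '')}</html>"
        for cache in self.caches:
            cache[url] = page
        return page


class ReadService:
    def __init__(self, caches: list, write_service: WriteService):
        self.caches = caches
        self.write_service = write_service

    def get(self, url: str) -> str:
        """Serve from any cache holding the page; otherwise hand off to the write service."""
        for cache in self.caches:
            if url in cache:
                return cache[url]               # hit: no database query
        return self.write_service.render(url)   # miss: write service renders and caches


caches = [{}, {}]
writer = WriteService({"/": "global timeline"}, caches)
reader = ReadService(caches, writer)
print(route("POST", "req-1"))   # write
print(reader.get("/"))          # first request: rendered by the write service
print(reader.get("/"))          # second request: served from the cache
print(writer.renders)           # 1
```

Because the hash is deterministic, the same key always lands on the same service. Over many distinct keys, roughly 80% of GETs go to the read services.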

When a POST request reaches the load balancer, it is sent directly to the write service. The write service updates Postgres and then refreshes the affected pages in the Redis cache, so the read services can keep serving those pages without querying Postgres. For example, when a user posts a new tweet, the write service saves the tweet to the database, regenerates the timeline pages of that user and their followers, and saves them to Redis.
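The write path for a new tweet can be sketched the same way. Again, this is a hypothetical single-process model, not the project's Ruby code. In-memory lists and dicts stand in for the Postgres tables and the Redis instances. The cache key format `/timeline/<user>` and the class name `TweetWriter` are made up for illustration. The sketch shows the order of operations: persist first, then re-render and overwrite the cached page of every user whose timeline includes the new tweet (the author and their followers).

```python
import html
from collections import defaultdict


class TweetWriter:
    """Hypothetical write service: persist a tweet, then refresh affected cached pages."""

    def __init__(self, caches: list):
        self.tweets = []                    # stand-in for a tweets table: (seq, author, text)
        self.followers = defaultdict(set)   # user -> users who follow them
        self.following = defaultdict(set)   # user -> users they follow
        self.caches = caches                # stand-ins for the Redis instances
        self.seq = 0                        # monotonically increasing tweet id

    def follow(self, follower: str, followee: str) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def timeline_html(self, user: str) -> str:
        """Render a timeline: the user's tweets plus those of everyone they follow, newest first."""
        authors = {user} | self.following[user]
        items = [
            f"<li>{html.escape(author)}: {html.escape(text)}</li>"
            for _, author, text in sorted(self.tweets, reverse=True)
            if author in authors
        ]
        return "<ul>" + "".join(items) + "</ul>"

    def post_tweet(self, user: str, text: str) -> None:
        self.seq += 1
        self.tweets.append((self.seq, user, text))         # 1. persist the tweet
        for affected in {user} | self.followers[user]:      # 2. every timeline that now shows it
            page = self.timeline_html(affected)
            for cache in self.caches:                       # 3. overwrite the cached page everywhere
                cache[f"/timeline/{affected}"] = page


caches = [{}, {}]
writer = TweetWriter(caches)
writer.follow("bob", "alice")
writer.post_tweet("alice", "hello")
print(caches[0]["/timeline/bob"])   # <ul><li>alice: hello</li></ul>
```

This fan-out trades extra work on each POST for cache hits on later GETs. The cost of a single post grows with the author's follower count.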

One trick we used when caching: instead of storing the raw HTML, we compress (zip) it before saving it to Redis and decompress it back to plain HTML when we read it. Because each cached item is much smaller, this drastically decreased our cache access time.
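The compression trick can be sketched with Python's standard `zlib` module. The README only says the HTML is "zipped", so using DEFLATE via `zlib` is an assumption (Ruby's standard library has an equivalent `Zlib` module). A dict again stands in for Redis, and `cache_put` / `cache_get` are hypothetical helper names.

```python
import zlib
from typing import Optional


def cache_put(cache: dict, key: str, page: str) -> None:
    """Compress rendered HTML before storing it (the dict stands in for a Redis SET)."""
    cache[key] = zlib.compress(page.encode("utf-8"))


def cache_get(cache: dict, key: str) -> Optional[str]:
    """Fetch and decompress a cached page; return None on a cache miss."""
    blob = cache.get(key)
    if blob is None:
        return None
    return zlib.decompress(blob).decode("utf-8")


cache = {}
page = "<ul>" + "<li>testuser: hello world</li>" * 500 + "</ul>"
cache_put(cache, "/", page)
print(len(page.encode("utf-8")), len(cache["/"]))  # raw size vs. compressed size
print(cache_get(cache, "/") == page)               # True: the round trip restores the HTML
```

Real pages are less repetitive than this toy one, so the compression ratio will be lower. Timeline pages built from many similar list items still usually compress well.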

Diagram

Technologies

We used many different technologies in this project. The write service and the read services are built with Ruby and Sinatra. The load balancer is written in Node.js with the Express.js framework. Redis serves as the in-memory cache, while PostgreSQL is our persistent data store. Our API server supports GraphQL queries and is also written in Ruby with Sinatra. The whole application stack is deployed to Heroku.

Load Test Results (with Loader.io)

  • / (global timeline): 40k+ successful responses while ramping from 0 to 2,500 clients over 1 min (Loader.io test type: "Maintain client load")
  • /user/testuser (user profile page): 60k+ successful responses while ramping from 0 to 2,500 clients over 1 min (Loader.io test type: "Maintain client load")
  • /user/testuser/tweet (random tweet posting): 500+ successful responses with 500 clients over 1 min (Loader.io test type: "Clients per test")
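As a rough reading aid, dividing the totals above by the 60-second test duration gives lower bounds on the average throughput. This is arithmetic on the reported figures, not a separate measurement. Since the peak rate is always at least the average, the peak during each test was at least this high.

```latex
\begin{aligned}
\text{/ (global timeline)}: &\quad \frac{40{,}000}{60\,\text{s}} \gtrsim 667 \ \text{successful responses/s} \\
\text{/user/testuser}: &\quad \frac{60{,}000}{60\,\text{s}} \geq 1{,}000 \ \text{successful responses/s} \\
\text{/user/testuser/tweet}: &\quad \frac{500}{60\,\text{s}} \gtrsim 8.3 \ \text{successful responses/s}
\end{aligned}
```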