Do you have a Django app built on Postgres? Want to know how to scale it to 10 million users using AWS Aurora? Luke Demi is an expert software engineer who helped scale Clubhouse’s backend infrastructure from 10 thousand users to 10 million users in 6 months. In this video, he shares his experience migrating Clubhouse from Heroku to AWS and Aurora Postgres. Before joining Clubhouse, Luke was a staff software engineer at DoorDash and Coinbase, where he also worked on difficult scaling problems.
Clubhouse’s backend is a Django application built on Postgres. The team started their infrastructure on Heroku but ran into scaling issues and had to rapidly transition to AWS and Aurora Postgres. In this interview, Luke goes into detail about the challenges the Clubhouse team faced while dealing with exponential growth.
Check out the video for the complete interview, but here is a summary of some of the highlights:
Background and Initial Challenges
- Overview of Luke Demi’s experience with scaling at Clubhouse, DoorDash, and Coinbase.
- Clubhouse started on Heroku, but Heroku’s limitations (frequent micro-outages and insufficient database scaling) prompted the move to AWS.
Transition from Heroku to AWS
- Heroku’s limitations: frequent downtime during deployments, inadequate database scaling, and a lack of investment from Salesforce, Heroku’s parent company.
- Decision to transition to AWS due to the need for better scalability and reliability.
Challenges with Postgres on Heroku
- Postgres resource limitations: CPU saturation, memory requirements, connection limits, and inefficiencies with large SQL queries using the IN operator.
- Issues with real-time feed and presence table queries in Clubhouse’s architecture.
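The interview doesn't say exactly how Clubhouse rewrote its large IN queries, so the following is only a minimal sketch of one common mitigation: splitting a huge `IN (...)` id list into bounded batches so that no single statement carries thousands of bind parameters. The table name `presence`, the column `user_id`, the batch size of 500, and both helper functions are hypothetical; `%s` is the placeholder style used by psycopg2 and Django cursors.

```python
from typing import Iterator, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_in_queries(table: str, column: str, ids: Sequence[int],
                     size: int = 500) -> list[tuple[str, list[int]]]:
    """Split one huge IN (...) lookup into several bounded queries.

    `table` and `column` are interpolated directly, so they must be trusted
    constants; the ids themselves are passed as bind parameters (%s).
    """
    queries = []
    for batch in chunked(ids, size):
        placeholders = ", ".join(["%s"] * len(batch))
        sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
        queries.append((sql, list(batch)))
    return queries


# Hypothetical usage: 1,200 user ids become three queries of 500, 500 and 200 ids.
for sql, params in build_in_queries("presence", "user_id", list(range(1, 1201))):
    print(len(params))
```

Postgres also accepts `WHERE user_id = ANY(%s)` with a single array parameter, which psycopg2 adapts from a Python list and which keeps the statement text short; whether either approach would have fixed Clubhouse's specific feed and presence queries isn't covered in the interview.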
Migration Process
- Initial steps: migrating compute to AWS first, using Cloudflare tools to shift traffic over in phases.
- Database migration approach: due to time constraints, the team chose to take downtime and use a dump-and-restore migration instead of a zero-downtime migration with dual writes.
- The migration to AWS succeeded with minimal issues at the time, though some edge cases emerged later.
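The interview doesn't detail how the Cloudflare tooling was configured, so this is only a hypothetical illustration of the general idea behind a phased traffic migration: send a growing percentage of users to the new stack while keeping each user pinned to one side. The function name, the SHA-256 bucketing, and the rollout percentages are all assumptions, not Clubhouse's or Cloudflare's implementation.

```python
import hashlib


def routes_to_new_stack(user_id: str, percent: int) -> bool:
    """Return True if this user should be served by the new (AWS) stack.

    Hashing the id gives each user a stable bucket in [0, 100), so raising
    `percent` only moves additional users over and never flips anyone back.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Hypothetical rollout schedule: 1% -> 10% -> 50% -> 100% of 10,000 users.
for step in (1, 10, 50, 100):
    moved = sum(routes_to_new_stack(f"user-{i}", step) for i in range(10_000))
    print(step, moved)
```

Deterministic bucketing keeps a given user's requests on one backend at each step, which makes problems easier to attribute and lets the split be dialed back if the new environment misbehaves.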
Post-Migration Insights
- Performance Insights in AWS Aurora revealed gaps in visibility into certain queries.
- Challenges with large SQL queries using the IN operator in Postgres and subsequent issues with performance monitoring.
- Use of PgBouncer and AWS RDS Proxy to pool Postgres connections and absorb bursts of load.
Scaling Challenges Post-Migration
- Exponential user growth and the inability of their 15 read replicas to handle increasing load.
- Constant query optimization to manage resource consumption and prevent hitting database limits.
- Implementation of caching strategies (both centralized and per-instance) to reduce query load.
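The interview mentions both centralized and per-instance caching without showing code, so here is a minimal sketch, under stated assumptions, of how the two tiers can be layered: each app instance keeps a short-lived local copy, falls back to a shared cache, and only then hits the database. The class name `TwoTierCache`, the TTL value, and the use of a plain dict as a stand-in for a centralized cache such as Redis or Memcached are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TwoTierCache:
    """Per-instance TTL cache layered in front of a shared (centralized) cache."""

    def __init__(self, shared: Dict[str, Any], local_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._local: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._shared = shared  # stand-in for Redis/Memcached; no expiry modeled here
        self._ttl = local_ttl
        self._clock = clock

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh local hit: no network round trip at all
        if key in self._shared:
            value = self._shared[key]  # shared hit: no database query
        else:
            value = loader()  # miss in both tiers: fall through to the database
            self._shared[key] = value
        self._local[key] = (now + self._ttl, value)
        return value


# Hypothetical usage: two app instances sharing one centralized cache.
shared: Dict[str, Any] = {}
web_1 = TwoTierCache(shared, local_ttl=5.0)
web_2 = TwoTierCache(shared, local_ttl=5.0)
print(web_1.get("room:123", lambda: "loaded from Postgres"))
print(web_2.get("room:123", lambda: "loaded from Postgres"))  # served from shared tier
```

The short local TTL bounds how stale a per-instance copy can get, at the cost of serving slightly old data within that window; a production version would also need expiries and invalidation in the shared tier.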
Lessons Learned
- Importance of planning for exponential growth and the limitations of traditional relational databases like Postgres in handling massive scale.
- Benefits of DynamoDB for predictable and scalable performance, though it requires a different mindset for developers.
- Ongoing need for query optimization, caching, and leveraging advanced database features to manage scale.
Recommendations for Future Migrations
- Consider DynamoDB or similar scalable solutions from the start if massive scale is anticipated.
- Ensure robust monitoring and logging to detect and resolve performance issues quickly.
- Collaborate closely with experts and leverage managed services to handle complex database challenges.
Conclusion
- Reflection on the rapid growth and scaling journey of Clubhouse, the technical challenges faced, and the strategies employed to overcome them.
- Final thoughts on the importance of adaptability, continuous learning, and leveraging modern database technologies for scalable application development.
If you need to scale your Django + Postgres application like Luke did for Clubhouse, StepChange is ready to assist. We specialize in designing, migrating and optimizing scalable data architectures that balance performance, efficiency and flexibility. Contact us to book a free consultation to learn how we can help you build a robust, cost-effective Postgres solution.