Each Facebook Chat user now needs to be notified whenever one of his/her friends (a) takes an action such as sending a chat message or loads a Facebook page (if tracking idleness via a last-active timestamp) or (b) transitions between idleness states (if representing idleness as a state machine with states like "idle-for-1-minute", "idle-for-2-minutes", "idle-for-5-minutes", "idle-for-10-minutes", etc.).Note that approach (a) changes the sending a chat message / loading a Facebook page from a one-to-one communication into a multicast to all online friends, while approach (b) ensures that users who are neither chatting nor browsing Facebook are nonetheless generating server load.The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop.We chose to simulate the impact of many real users hitting many machines by means of a "dark launch" period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page.From Facebook to Google, Twitter and Instagram - free does not mean cheap.
is jam packed with all the features you need to provide winning customer support - it is lightning fast, reliable and scalable.
Despite those advantages, using Erlang for a component of Facebook Chat had a downside: that component needed to communicate with the other parts of the system.
Both subsystems are clustered and partitioned for reliability and efficient failover. In short, because the problem domain fits Erlang like a glove.
Erlang is a functional concurrency-oriented language with extremely low-weight user-space "processes", share-nothing message-passing semantics, built-in distribution, and a "crash and recover" philosophy proven by two decades of deployment on large soft-realtime production systems.
Unlike a three-guys-in-a-garage startup, we don't have the luxury of scaling out infrastructure to keep pace with user growth; when your feature's userbase will go from 0 to 70 million practically overnight, scalability has to be baked in from the start.