Maxim Fedorov

Software Engineer at WhatsApp

Maxim Fedorov is a software engineer at WhatsApp, the world’s largest messaging app. His work focuses on the performance and scalability of the server side.

Before joining WhatsApp, Maxim developed low-latency TCP/IP applications at NetAlliance (Sydney, Australia), designed Kaspersky Enterprise Security Endpoint (Moscow, Russia), and improved Parallels Virtual Automation (now called Odin) at Parallels (formerly SWsoft). Earlier still, he developed network security software.

Maxim holds a Master’s degree in Computer Science.

Past conferences

Maxim Fedorov
Code BEAM SF 2019
28 Feb 2019
13.50 - 14.35

Mid-air airplane repair: troubleshooting at WhatsApp

Simple, reliable messaging. It takes a lot to live up to that statement. For 10 years, WhatsApp has demonstrated unprecedented reliability and availability while serving over 1.5B users. There is no way to reproduce the interactions between all of them within a cluster spanning over 10,000 nodes and multiple datacenters. Investigations must be done on a live system without disturbing connected users, and any repairs that are needed must be done on the fly.

This talk walks through the debugging and troubleshooting techniques used at WhatsApp. Maxim will share a few case studies and explain monitoring, introspection, performance analysis, and tooling.

Some knowledge of Erlang and C is necessary.

Objectives: share the processes, best practices, tools, and war stories from 10 years of running a reliable messaging service.

Audience: software developers, DevOps engineers, Site Reliability Engineers, system administrators, and anyone else interested in troubleshooting live production systems.

Maxim Fedorov
Code Mesh LDN 2018
09 Nov 2018
15.25 - 16.10

Scaling Erlang cluster to 10,000 nodes

With the user population growing beyond 1.5B, there is no chance to keep the server footprint as small as it used to be. Adding new capabilities requires ever more processing power. When it becomes impossible to keep everything on just ten servers, we scale the cluster to a hundred. When a hundred gets too tight, we expand it to 1,000. What’s next? 10,000? And how is that even possible, given the current scalability limits of a single Erlang cluster?

This talk follows the path we took to improve Erlang scalability, remove bottlenecks, and increase the efficiency of our Erlang-based applications.

Objectives: demonstrate a live Erlang cluster being scaled from just a few nodes to 10,000 machines with no service interruption.

Audience: scalability engineers and anyone interested in optimising Erlang for large-scale server applications.


Articles: 1

How to serve 1.5 billion active users at the same time - scaling Erlang cluster to 10,000 nodes

Article by Maxim Fedorov

With the user population growing beyond 1.7B while new capabilities keep being added, there is little chance to keep the server footprint as small as it used to be.
