
Maxim Fedorov

Performance & scalability engineer at WhatsApp

Maxim Fedorov is a software engineer at WhatsApp, the world's largest messaging app. Maxim's work focuses on the performance and scalability of the server side.

Before joining WhatsApp, Maxim developed low-latency TCP/IP applications at NetAlliance (Sydney, Australia), designed Kaspersky Enterprise Security Endpoint (Moscow, Russia), and improved Parallels Virtual Automation (now called Odin) at Parallels (formerly SWsoft). Before that, he developed network security software.

Upcoming Activities

Code Mesh LDN

The art of challenging assumptions

We spent countless hours and sleepless nights building and keeping up the server side of the most successful messaging service in the world. Looking back, how many of our choices would we change? And how can we make sure we make the right call next time? "The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil" (Donald Knuth). But why does this happen? Why did we do things we didn't really want to do? Because we acted on assumptions. This talk walks through a number of war stories in which assumptions were made and acted on. There were regrets and disappointments, and we learnt to challenge assumptions the hard way. Now it's time to share what we have learnt so far.

OBJECTIVES

  • identify sources of human errors in software development
  • discuss tools and routines that help challenge assumptions
  • provide advice for improving the decision-making process

AUDIENCE

Tech leads, software architects, systems designers and everyone else involved in making technical decisions and facing the consequences.

Past Activities

Code BEAM SF 2019
28 Feb 2019
13.50 - 14.35

Mid-air airplane repair: troubleshooting at WhatsApp

Simple, reliable messaging. It takes a lot to live up to this statement. For 10 years, WhatsApp has demonstrated unprecedented reliability and availability, serving over 1.5B users. There is no way to reproduce the interactions between all of them within a cluster spanning more than 10,000 nodes and multiple datacenters. Investigations must be done on a live system without disturbing connected users, and any repairs needed have to be made on the fly.

This talk walks through the debugging and troubleshooting techniques used at WhatsApp. Maxim will share a few case studies and explain monitoring, introspection, performance analysis, and tooling.

Some knowledge of Erlang and C is necessary.

OBJECTIVES

Share processes, best practices, tools, and war stories from 10 years of running a reliable messaging service.

TARGET AUDIENCE

Software developers, DevOps, Site Reliability Engineers, System Administrators, and everyone else interested in troubleshooting live production systems.

Code Mesh LDN 2018
09 Nov 2018
15.25 - 16.10

Scaling Erlang cluster to 10,000 nodes

Growing the user population beyond 1.5B leaves no chance of keeping the server footprint as small as it used to be. Adding new capabilities requires more and more processing power. When it becomes impossible to keep everything on just ten servers, we have to scale the cluster to a hundred. When a hundred gets too tight, we expand it to 1,000. What's next? 10,000? And how is that even possible, given the current scalability limits of a single Erlang cluster?

This talk will guide you along the path we took to improve Erlang scalability, remove bottlenecks, and increase the efficiency of our Erlang-based applications.

OBJECTIVES

Demonstrate how a live Erlang cluster was scaled from just a few nodes to 10,000 machines with no service interruption.

TARGET AUDIENCE

Scalability engineers, people interested in optimising Erlang for large-scale server applications.
 

Media

Articles: 1

How to serve 1.5 billion active users at the same time - scaling Erlang cluster to 10,000 nodes

Article by Maxim Fedorov

Growing the user population beyond 1.7B whilst simultaneously adding new capabilities leaves little chance of keeping the server footprint as small as it used to be.


Videos: 2