Processing Firefox Crash Reports With Python¶
by Laura Thomson
Web tools engineering manager
author of two books:
- PHP and MySQL Web Development
- The Surrealists
Done about 100 talks!
Mozilla is hiring like crazy
Overview¶
- The basics
- The numbers
- Work process and tools
The Basics¶
Socorro is Mozilla's crash report collection and processing system
Lots of companies use it to track this data:
- Steam (game stuff)
- Other things
How crashy is the browser?¶
- Mozilla Crash report - please use it!
- Will email you if they detect malware on your machine
- Generates https://crash-stats.mozilla.com/products/Firefox
- Mozilla needs your data to make Firefox better.
Basic Architecture¶
- Database is PostgreSQL
- HBase for map-reduce, she wants to replace it with something else
- Lots of components powered by Python
- Front-end is PHP but will be converted to Django in 2012
Lifetime of a crash¶
Browser crashes
Sends data to Mozilla in a big binary dump with a JSON header
Mozilla processes the header and tries to generate a signature of the crash
- They need more than just the function that created the crash
- Doesn’t cover all cases
- Uses a regex to glean out other things from the binary crash dump
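The signature step can be sketched roughly as follows. This is a minimal illustration and not Socorro's actual code: the skip list, the frame names, and the `make_signature` helper are all hypothetical. The idea is that the top frame alone is often an uninformative helper such as an allocator, so a regex filters those out and the signature is built from the next few frames.

```python
import re

# Hypothetical list of frames that say little about the real cause.
# The real rules are not described in the talk; this is illustrative only.
SKIP_FRAME = re.compile(r"^(malloc|free|abort|raise|memcpy)$")


def make_signature(frames, max_frames=3):
    """Build a signature from the top of the stack, skipping noisy frames."""
    useful = [frame for frame in frames if not SKIP_FRAME.match(frame)]
    if not useful:
        # Everything was filtered out: fall back to the raw top frame.
        useful = frames[:1]
    return " | ".join(useful[:max_frames]) or "EMPTY"


# Made-up stack, top frame first.
stack = ["malloc", "Widget::Paint", "Window::Refresh", "EventLoop::Run", "main"]
print(make_signature(stack))
```

Joining several frames rather than using only the crashing function is one way to address the "need more than just the function" point above.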
Back end processing¶
Large number of cron jobs, e.g.:
- Calculate aggregates: Top Crashers (Farmville if you want to know)
- Process incoming builds from ftp server
- Match known crashes to bugzilla bugs
- Duplicate detection
- Match up pairs of dumps (OOPP, i.e. out-of-process plugins; content crashes; etc.)
- Generates extracts (CSV) for engineers to analyze
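As a rough idea of what an aggregate job like Top Crashers computes, here is a small sketch that counts crashes per signature and reports each one's share of the total. The `top_crashers` function and the sample signatures are assumptions for illustration; the talk did not show how the real cron jobs are written.

```python
from collections import Counter


def top_crashers(signatures, n=3):
    """Return (signature, count, share_of_total) for the n most common signatures."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return [(sig, count, count / total) for sig, count in counts.most_common(n)]


# Made-up signatures standing in for one reporting window.
window = ["sig_a", "sig_b", "sig_a", "sig_c", "sig_a", "sig_b"]
for sig, count, share in top_crashers(window, n=2):
    print(f"{sig}: {count} crashes ({share:.0%})")
```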
Middleware¶
- Moving all data access to be through REST API (by end of year)
- (Still some queries in webapp)
- Enables other front ends to the data, and lets us rewrite the webapp using Django in 2012
- Upcoming (2011 or 2012): each component will have its own API
Webapp¶
- Hard parts: How to visualize some of this data
- Ex: Nightly builds, moving to reporting in build time, not clock time
- Code is crufty (old Kohana PHP framework)
Implementation Details¶
Python 2.6 mostly (PHP is the exception)
PostgreSQL 9.1
memcache for the webapp
Thrift for HBase access
- HBase is meant to work with Java
- Could do it in Clojure/Scala but finding resources would be hard
- Thought about Jython then backed off
- Considering alternatives
100 users
100 Terabytes of data
Some Numbers¶
- At peak 2300 crashes per minute
- 2.5 million per day
- Median crash size 150K, max size 20MB (reject bigger)
- ~110TB stored in HDFS (3x replication, ~40TB of HBase data)
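A quick back-of-the-envelope check of how these figures relate. It assumes "150K" means 150 kilobytes, uses decimal units, and treats the median size as a stand-in for the average, which it is not, so the daily volume is only a rough guide.

```python
CRASHES_PER_DAY = 2_500_000
PEAK_PER_MINUTE = 2300
MEDIAN_SIZE_KB = 150   # assumption: "150K" read as 150 kilobytes
HDFS_TB = 110
REPLICATION = 3

# Average rate implied by the daily total, about 1736 per minute.
avg_per_minute = CRASHES_PER_DAY / (24 * 60)

# Peak is roughly 1.3x the average rate.
peak_ratio = PEAK_PER_MINUTE / avg_per_minute

# Rough daily intake if every crash were median-sized: 375 GB.
daily_gb = CRASHES_PER_DAY * MEDIAN_SIZE_KB / 1_000_000

# Data before 3x replication, about 36.7 TB, in line with the "~40TB" figure.
unique_tb = HDFS_TB / REPLICATION

print(f"average: {avg_per_minute:.0f}/min, peak/average: {peak_ratio:.2f}")
print(f"daily intake: ~{daily_gb:.0f} GB, unreplicated: ~{unique_tb:.1f} TB")
```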
What can they do?¶
- Does a version of FF crash more than others?
- Analyze differences between versions of Flash
- Detect duplicate crashes
- Detect explosive crashes
- Find “frankeninstalls” that can happen on Windows
- Email victims of malware
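The talk did not say how explosive crashes are detected, so the following is only one plausible sketch: compare per-signature counts in the current window against an equal-length baseline window and flag signatures that jumped by a large factor. The `explosive_signatures` function, the `factor` and `min_count` thresholds, and the sample data are all hypothetical.

```python
def explosive_signatures(baseline, current, factor=5.0, min_count=50):
    """Flag signatures whose count grew by at least `factor` over the baseline.

    `baseline` and `current` map signature -> crash count for two
    windows of equal length. Thresholds are illustrative guesses.
    """
    flagged = []
    for sig, count in current.items():
        if count < min_count:
            continue  # too few crashes to be worth an alert
        previous = baseline.get(sig, 0)
        # max(..., 1) lets brand-new signatures be compared too.
        if count >= factor * max(previous, 1):
            flagged.append(sig)
    return sorted(flagged)


baseline = {"sig_a": 100, "sig_b": 10}
current = {"sig_a": 120, "sig_b": 80, "sig_c": 60, "sig_d": 5}
print(explosive_signatures(baseline, current))
```

Here `sig_b` is flagged for growing eightfold and `sig_c` for appearing from nowhere, while `sig_a` (steady) and `sig_d` (too rare) are ignored.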
Implementation Scale¶
More than 115 boxes (not cloud, because that won’t cut it)
Now 8 devs + sysadmins + QA + Hadoop ops/analysts
- Hiring: https://whitespacejobs.org
Deploy approximately weekly, but could do continuous deployment if they need to
Development Process¶
- Fork
- Hard to install (must use VM)
- Pull request with bugfix/feature
- Code review
- Jenkins polls GitHub master, picks up changes
- Jenkins runs tests, builds a package
- Package picked up and moved to dev
- Wanted changes merged to release branch
- Jenkins builds release branch, manual push to stage
- QA runs acceptance on stage
- TODO missing
- TODO missing
Absolutely Critical!¶
Build all the machinery for continuous deployment even if you don’t want to deploy continuously
- You don’t want to install HBase
Upcoming¶
- ElasticSearch implemented for better search
- More analytics; automatic detection of explosive crashes, malware, etc
- Better queueing
- Grand Unified Configuration System