19 March 2013

Mergeable accumulation of the running min, mean, max, and variance

Boost Accumulators are pretty slick, but they do have the occasional shortcoming. To my knowledge, one cannot merge data from independent accumulator sets. Merging is a nice operation to have when you want to compute statistics in parallel and then roll up the results into a single summary.

In his article Accurately computing running variance, John D. Cook presents a small class for computing a running mean and variance using an algorithm with favorable numerical properties reported by Knuth in TAOCP. Departing from Cook's sample, my rewrite below includes

  • templating on the floating point type,
  • permitting accumulating multiple statistics simultaneously without requiring multiple counters,
  • additionally tracking the minimum and maximum while holding storage overhead constant relative to Cook's version,
  • providing sane (i.e. NaN) behavior when no data has been processed,
  • permitting merging information from multiple instances,
  • permitting clearing an instance for re-use,
  • assertions to catch when one screws up,
  • and adding Doxygen-based documentation.

Please let me know if you find this useful. If anyone is interested, I will also post the associated regression tests.

No comments:

Subscribe Subscribe to The Return of Agent Zlerich