I think the most important thing we need is robust handling of benchmark results and backfills.
The whole idea is to store a hash with each result to denote the benchmark that produced it.
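To make the hash idea concrete, here is a minimal sketch in Python. It assumes the hash is taken over the benchmark's source text, so editing a benchmark changes its hash and older results can be spotted as needing a backfill. The record layout and the names `benchmark_digest`, `make_result` and `needs_backfill` are hypothetical, not an existing API.

```python
import hashlib


def benchmark_digest(source: str) -> str:
    """Return a SHA-256 hex digest of the benchmark's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def make_result(name: str, source: str, version: str, seconds: float) -> dict:
    """Build a result record that carries the digest of the benchmark it measured.

    The field names are assumptions made for illustration only.
    """
    return {
        "benchmark": name,
        "benchmark_digest": benchmark_digest(source),
        "version": version,
        "seconds": seconds,
    }


def needs_backfill(result: dict, current_source: str) -> bool:
    """True when the stored result was produced by a different benchmark source."""
    return result["benchmark_digest"] != benchmark_digest(current_source)
```

With something like this, correcting a single benchmark means finding the results whose digest no longer matches the current source and re-running only those.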
Deliverables here can be:
Quick backfilling/correction of a single benchmark
Allow certain members to run a single benchmark across all versions, with "git bisect"-like filling of data (so Koichi/Nobu can test things)
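One possible reading of "git bisect"-like filling, sketched in Python: run the oldest and newest versions first, then keep running the midpoint of each remaining gap, so a coarse picture across all versions shows up early and gets refined as more runs finish. The function name `bisect_fill_order` is hypothetical, and the sketch assumes the versions are already listed in chronological order.

```python
from collections import deque


def bisect_fill_order(versions: list) -> list:
    """Return versions ordered endpoints first, then midpoints of ever smaller gaps."""
    n = len(versions)
    if n == 0:
        return []
    if n == 1:
        return [versions[0]]

    order = [versions[0], versions[-1]]
    # Each gap is a pair of indices whose endpoints are already scheduled.
    gaps = deque([(0, n - 1)])
    while gaps:
        lo, hi = gaps.popleft()
        if hi - lo < 2:
            continue  # no unscheduled version between lo and hi
        mid = (lo + hi) // 2
        order.append(versions[mid])
        # Breadth-first: wide gaps are split before narrow ones.
        gaps.append((lo, mid))
        gaps.append((mid, hi))
    return order
```

For nine versions `v0` through `v8` this schedules `v0, v8, v4, v2, v6, v1, v3, v5, v7`, so after only three runs there is already data for the start, middle and end of the range.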