Why can I not see the source for the comparison test? We have got to add that…
I have … questions… about the actual benchmark
```ruby
User.where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
    .where(Sequel.ilike(:email, 'foobar00%@email.com')).to_a
```
This is kind of weak: it is not doing anything with the actual data, so if you defer materialization you automatically get a huge speed boost. We should probably always generate an actual string in these benchmarks to ensure that we materialize the results; winning on deferred materialization is kind of winning by cheating. Yes, it happens sometimes in the real world, but it is a bit of a crutch: why are you even selecting data you are not going to use?
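As a sketch of what I mean (reusing the same `User` model and columns as the query above), appending every row into a string defeats any deferred materialization:

```ruby
# Build a string out of every fetched row so lazy/deferred row decoding
# cannot be skipped -- both ORMs must fully materialize the result set.
users = User.where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
            .where(Sequel.ilike(:email, 'foobar00%@email.com')).to_a
output = users.map { |u| "#{u.name} #{u.email}" }.join("\n")
output.length # touch the string so nothing can be optimized away
```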
Why do I have to work so hard to find the comparison benchmark?
This feature is enormous, @bmarkons, you have made an excellent improvement to rubybench; let’s just nut out all the rough edges!
I am super confused at how Active Record is faster
I have run these two locally against postgres_scope_where to make sure, and AR is indeed faster than Sequel here. Maybe doing some actual work with the fetched records will show us different results. I will change both benches to construct a string from the data.
Hmm, adding sequel_pg doesn’t affect performance noticeably. For example, Sequel.like:
Iterations per second result is the same: ~2500 ips
Objects: 148
I set up the Gemfile with gem 'sequel_pg', :require => 'sequel' as noted in the sequel_pg README.
I am not sure if I need to change something in the benchmark, but I guess not.
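One cheap sanity check worth adding (my own suggestion, not something the harness currently does) is to confirm the gem was actually activated before trusting the numbers:

```ruby
# Verify sequel_pg really made it into the process; if it did not, the
# results above are measuring plain ruby-pg typecasting instead.
if Gem.loaded_specs['sequel_pg']
  puts "sequel_pg #{Gem.loaded_specs['sequel_pg'].version} loaded"
else
  abort "sequel_pg missing from bundle"
end
```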
Try materializing a bunch of rows to see what kind of impact it makes, especially dates. E.g.: try selecting 10,000 dates from a table and then making a giant string out of them, with and without the gem.
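A rough sketch of that experiment (the `dates` table and `d` column are hypothetical; run it once with sequel_pg in the Gemfile and once without):

```ruby
require 'sequel'
require 'benchmark/ips'

DB = Sequel.connect(ENV.fetch('DATABASE_URL')) # assumes a seeded Postgres DB

Benchmark.ips do |x|
  # Typecasting 10,000 date values dominates this loop; sequel_pg moves
  # that work into C, so any gap should be obvious here.
  x.report('materialize 10k dates') do
    DB[:dates].select(:d).limit(10_000).map { |row| row[:d].to_s }.join(',')
  end
end
```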
Excellent, I guess this highlights a few problems we have that I would like you to address:
We should always be generating strings in our “selection” perf tests, and we should fix all our existing specs to do so. I don’t care that we already have data for those specs; we can backfill the corrections.
We should add a spec that selects a large number of columns from a table and generates a giant string.
We should either always use sequel_pg OR optionally use it. I think for simplicity’s sake we should always use it, because that is what people are using in production.
We have got to ensure parity between our AR and Sequel specs; otherwise we cannot compare them. Fixing up that ILIKE is urgent (see the sketch after this list).
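For the ILIKE fix, parity would look roughly like this (a sketch; it assumes a `User` model defined on the same table in each ORM):

```ruby
# Active Record: ILIKE has no first-class API, so use a SQL fragment.
User.where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
    .where("email ILIKE ?", "foobar00%@email.com").to_a

# Sequel: the same predicate via Sequel.ilike -- both versions should
# emit WHERE name = ... AND email ILIKE ...
User.where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
    .where(Sequel.ilike(:email, "foobar00%@email.com")).to_a
```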
I think it makes sense to generate strings in all benchmarks, and make sure the output is the same for both (or at least have the differences not be material). It may also be worthwhile to check that the SQL used is the same or comparable to make sure you are comparing ORM differences instead of query differences, unless the ORMs have different approaches to handling similar cases (such as eager loading of associations).
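Both ORMs can render their SQL without executing it, which makes that check cheap (here `User` is the AR model and `DB[:users]` the Sequel dataset):

```ruby
# Compare the generated SQL side by side before trusting the numbers.
puts User.where("email ILIKE ?", "foobar00%@email.com").to_sql
puts DB[:users].where(Sequel.ilike(:email, "foobar00%@email.com")).sql
```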
Adding a spec that selects a large number of columns makes sense. You may also want to use different column types in the spec, or separate specs for large numbers of columns, one for each type.
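A sketch of such a table (the names and column counts are invented); a per-type spec is then just a matter of which columns the query selects:

```ruby
# Wide table mixing common Postgres types, so per-type typecasting cost
# shows up in the benchmark.
DB.create_table?(:wide_rows) do
  primary_key :id
  10.times do |i|
    String   :"str_#{i}"
    Integer  :"int_#{i}"
    Float    :"float_#{i}"
    Date     :"date_#{i}"
    DateTime :"ts_#{i}"
  end
end

# Select everything and flatten it into one giant string.
DB[:wide_rows].limit(1_000).map { |row| row.values.join(',') }.join("\n")
```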
Always using sequel_pg is probably best, since the Sequel postgres adapter picks it up automatically. But if you want to benchmark both with and without sequel_pg, that’s fine too.
Regardless of whether that’s the right thing to do or not, it is the most common usage pattern for Rails users, so it’s going to be the usage pattern we optimize for. I like @jeremyevans’ idea of materializing a subset of fields.
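Something like this, perhaps (a sketch; `User` is the AR model and `DB[:users]` the Sequel dataset):

```ruby
# Materialize only a subset of fields: fewer columns fetched and decoded,
# but every row still gets fully turned into a string.
User.select(:id, :email).limit(1_000)
    .map { |u| "#{u.id}:#{u.email}" }.join("\n")

DB[:users].limit(1_000).select_map([:id, :email])
          .map { |id, email| "#{id}:#{email}" }.join("\n")
```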
We do not have a “super optimized” AR going yet, but first I would like to figure out what we have messed up on our test runner that is skewing the results.