How can active record be faster?

sam · June 20, 2017, 12:36am

Looking at: RubyBench - Long Running Ruby Benchmark

I am super confused at how Active Record is faster @bmarkons

Some questions I have:

Are we using sequel pg? GitHub - jeremyevans/sequel_pg: Faster SELECTs when using Sequel with pg
Why can I not see the source for the comparison test, we got to add that…
I have … questions… about the actual benchmark

  User.where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
      .where(Sequel.ilike(:email, 'foobar00%@email.com')).to_a

This is kind of weak, it is not doing anything with the actual data, so if you defer materialize you automatically get a huge speed boost. We should probably always generate an actual string from these benchmarks to ensure that we materialize stuff, winning on deferred materialization is kind of winning by cheating. Yes, it happens sometimes in the real world but it is a bit of a crutch, why are you even selecting data you are not going to use?

Why do I have to work so hard to find the comparison benchmark?

This actual feature is enormous @bmarkons you have made an excellent improvement to rubybench let’s just nut out all the rough edges!

bmarkons · June 20, 2017, 8:33am

I am super confused at how Active Record is faster

I have run these two locally against postgres_scope_where to make myself sure and AR is indeed faster than Sequel here. Maybe doing some actual work with fetched records will show us different results. I will change both benches to construct string from data.

Are we using sequel pg?

No, all the Sequel benchmarks are being run with GitHub - jeremyevans/sequel: Sequel: The Database Toolkit for Ruby.
Do you want the same setup for GitHub - jeremyevans/sequel_pg: Faster SELECTs when using Sequel with pg ?

Why can I not see the source for the comparison test, we got to add that…

Will add it

Why do I have to work so hard to find the comparison benchmark?

Sure, that needs to be improved. I will work on it

Thanks for feedback @sam

sam · June 20, 2017, 12:19pm

I asked Jeremy on Twitter and got:

So @bmarkons please have a look at this, also, I want you to test on local:

Perf improvement of sequel pg
Fixing the sequel test so it uses like and not ilike, our tests need to be consistent
Generating a big string so nobody gains from defer materialization.
Various bits of Jeremy’s feedback

Can you have a shot on local and report back here?

bmarkons · June 20, 2017, 12:34pm

Now that makes sense. Sure @sam, I’ll post the results here

bmarkons · June 20, 2017, 12:57pm

Here are results i got locally:

Sequel.ilike : ~900 ips, 163 objects
Sequel.like : ~2500 ips, 164 objects
Sequel.lit : ~2500 ips, 153 objects

sam · June 20, 2017, 1:00pm

Nice, so it already beats AR big time… we better fix our specs

What kind of diff does adding sequel_pg gem make?

bmarkons · June 20, 2017, 3:04pm

Hmm, adding sequel_pg doesn’t affect performance noticeably. For example Sequel.like:
Iterations per second result is the same : ~2500 ips
Objects: 148

I setup Gemfile with gem 'sequel_pg', :require => 'sequel' as noted in sequel_pg README.
I am not sure if I need to change something in benchmark? But I guess not.

sam · June 20, 2017, 3:27pm

Try materializing a bunch of rows to see what kind of impact it makes, especially dates. Eg: try selecting 10,000 dates from a table and then making a giant string out of it, with or without the gem.

bmarkons · June 21, 2017, 9:35am

I compared it against scope_all benchmark and there is an obvious speed up:

with sequel_pg : 620 ips, 4026 objects
without : 420 ips, 5032 objects

bmarkons · June 21, 2017, 10:04am

After adding two more DateTime columns in model and creating string from 10,000 records:

with sequel_pg : 680 ips, 219 objects
without : 530 ips, 777 objects

Following spec I used:

require 'bundler/setup'
require 'sequel'
require_relative 'support/benchmark_sequel'

DB = Sequel.connect(ENV.fetch('DATABASE_URL'))

DB.create_table!(:users) do
  primary_key :id
  String :name, size: 255
  String :email, size: 255
  DateTime :birthday
  DateTime :graduation_day
  DateTime :created_at, null: true
  DateTime :updated_at, null: true
end

DB.add_index :users, :email, unique: true

class User < Sequel::Model
  self.raise_on_save_failure = true
  self.set_allowed_columns :name, :email, :birthday, :graduation_day, :created_at, :updated_at
end

10000.times do |i|
  User.create({
    name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    email: "foobar#{"%03d" % i}@email.com",
    birthday: Time.at(i),
    graduation_day: Time.at(i)
  })
end

Benchmark.sequel("sequel/#{db_adapter}_scope_where", time: 5) do
  str = ""
  User
    .where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
    .where(Sequel.lit('email LIKE ?', 'foobar00%@email.com')).each do |user|
      str << "name: #{user.name} email: #{user.email} birthday: #{user.birthday} graduation_day: #{user.graduation_day}\n"
    end
end

sam · June 21, 2017, 2:02pm

Excellent, I guess this highlights a few problems we have that I would like you to address:

We should always be generating strings in our “selection” perf tests. We should fix all our existing specs to do so. I don’t care if we already have data for specs, we can backfill the corrections.
We should add a spec that selects a large amount of columns from a table and generates a giant string.
We should either always use sequel_pg OR optionally use it. I think for simplicity sake we should always use it, cause that is what people are using in production.
We got to ensure parity between our AR and Sequel specs, otherwise we can not compare them, fixing up that ILIKE is urgent.

Thoughts?

cc @jeremyevans

bmarkons · June 21, 2017, 2:31pm

All right, I will fix the specs first and configure sequel suite to run only with sequel_pg (since that should be miniature change).

Later on I will configure sequel suite to run with and without sequel_pg. That should be similar to running with and without prepared statements.

jeremyevans · June 21, 2017, 3:19pm

I think it makes sense to generate strings in all benchmarks, and make sure the output is the same for both (or at least have the differences not be material). It may also be worthwhile to check that the SQL used is the same or comparable to make sure you are comparing ORM differences instead of query differences, unless the ORMs have different approaches to handling similar cases (such as eager loading of associations).

Adding a spec that selects a large amount of columns makes sense. You also may want to use different column types in the spec, or separate specs for large amounts of columns, one for each type.

Always using sequel_pg is probably best, since the Sequel postgres adapter picks it up automatically. But if you want to benchmark both with and without sequel_pg, that’s fine too.

sgrif · July 24, 2017, 5:52pm

Regardless of whether that’s the right thing to do or not, it is the most common usage pattern for Rails users, so it’s going to be the usage pattern we optimize for. I like @jeremyevans idea of materializing a subset of fields

sam · July 24, 2017, 6:15pm

Keep in mind our actual spec here has no select clause, so this is already captured

https://rubybench.org/rails/rails/commits?result_type=active_record/postgres_discourse&display_count=2000

We do not have a “super optimized” AR going yet, but I would like to focus on figuring out what we have messed up on our test runner that is messing up results first.

Topic		Replies	Views
Esay way to compare Meta	1	1660	September 13, 2017
Rewriting the scope_all benchmark in Sequel and Raw	27	5195	June 19, 2017
More over-select benchmarks in suite	3	1585	September 7, 2017
Can we fill up our benchmarks?	12	2344	September 7, 2017
GSOC 2017 - Long Running Ruby and Rails Benchmarks	27	4511	September 5, 2017

How can active record be faster?

Related topics