How can active record be faster?


#1

Looking at: RubyBench - Long Running Ruby Benchmark

I am super confused at how Active Record is faster @bmarkons

Some questions I have:

  1. Are we using sequel pg? GitHub - jeremyevans/sequel_pg: Faster SELECTs when using Sequel with pg

  2. Why can I not see the source for the comparison test, we got to add that…

  3. I have … questions… about the actual benchmark

  User.where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
      .where(Sequel.ilike(:email, 'foobar00%@email.com')).to_a

This is kind of weak, it is not doing anything with the actual data, so if you defer materialize you automatically get a huge speed boost. We should probably always generate an actual string from these benchmarks to ensure that we materialize stuff, winning on deferred materialization is kind of winning by cheating. Yes, it happens sometimes in the real world but it is a bit of a crutch, why are you even selecting data you are not going to use?

  1. Why do I have to work so hard to find the comparison benchmark?

This actual feature is enormous @bmarkons you have made an excellent improvement to rubybench :hugs: let’s just nut out all the rough edges!


#2

I am super confused at how Active Record is faster

I have run these two locally against postgres_scope_where to make myself sure and AR is indeed faster than Sequel here. Maybe doing some actual work with fetched records will show us different results. I will change both benches to construct string from data.

  1. Are we using sequel pg?

No, all the Sequel benchmarks are being run with GitHub - jeremyevans/sequel: Sequel: The Database Toolkit for Ruby.
Do you want the same setup for GitHub - jeremyevans/sequel_pg: Faster SELECTs when using Sequel with pg ?

  1. Why can I not see the source for the comparison test, we got to add that…

Will add it :thumbsup:

Why do I have to work so hard to find the comparison benchmark?

Sure, that needs to be improved. I will work on it :thumbsup:

Thanks for feedback @sam

:slight_smile: :tada:


#3

I asked Jeremy on Twitter and got:

So @bmarkons please have a look at this, also, I want you to test on local:

  • Perf improvement of sequel pg

  • Fixing the sequel test so it uses like and not ilike, our tests need to be consistent

  • Generating a big string so nobody gains from defer materialization.

  • Various bits of Jeremy’s feedback

Can you have a shot on local and report back here?


#4

Now that makes sense. Sure @sam, I’ll post the results here :thumbsup:


#5

Here are results i got locally:

Sequel.ilike : ~900 ips, 163 objects
Sequel.like : ~2500 ips, 164 objects
Sequel.lit : ~2500 ips, 153 objects


#6

Nice, so it already beats AR big time… we better fix our specs

What kind of diff does adding sequel_pg gem make?


#7

Hmm, adding sequel_pg doesn’t affect performance noticeably. For example Sequel.like:
Iterations per second result is the same : ~2500 ips
Objects: 148

I setup Gemfile with gem 'sequel_pg', :require => 'sequel' as noted in sequel_pg README.
I am not sure if I need to change something in benchmark? But I guess not.


#8

Try materializing a bunch of rows to see what kind of impact it makes, especially dates. Eg: try selecting 10,000 dates from a table and then making a giant string out of it, with or without the gem.


#9

I compared it against scope_all benchmark and there is an obvious speed up:

with sequel_pg : 620 ips, 4026 objects
without : 420 ips, 5032 objects


#10

After adding two more DateTime columns in model and creating string from 10,000 records:

with sequel_pg : 680 ips, 219 objects
without : 530 ips, 777 objects

Following spec I used:

require 'bundler/setup'
require 'sequel'
require_relative 'support/benchmark_sequel'

DB = Sequel.connect(ENV.fetch('DATABASE_URL'))

DB.create_table!(:users) do
  primary_key :id
  String :name, size: 255
  String :email, size: 255
  DateTime :birthday
  DateTime :graduation_day
  DateTime :created_at, null: true
  DateTime :updated_at, null: true
end

DB.add_index :users, :email, unique: true

class User < Sequel::Model
  self.raise_on_save_failure = true
  self.set_allowed_columns :name, :email, :birthday, :graduation_day, :created_at, :updated_at
end

10000.times do |i|
  User.create({
    name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    email: "foobar#{"%03d" % i}@email.com",
    birthday: Time.at(i),
    graduation_day: Time.at(i)
  })
end

Benchmark.sequel("sequel/#{db_adapter}_scope_where", time: 5) do
  str = ""
  User
    .where(name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
    .where(Sequel.lit('email LIKE ?', 'foobar00%@email.com')).each do |user|
      str << "name: #{user.name} email: #{user.email} birthday: #{user.birthday} graduation_day: #{user.graduation_day}\n"
    end
end

#11

Excellent, I guess this highlights a few problems we have that I would like you to address:

  • We should always be generating strings in our “selection” perf tests. We should fix all our existing specs to do so. I don’t care if we already have data for specs, we can backfill the corrections.

  • We should add a spec that selects a large amount of columns from a table and generates a giant string.

  • We should either always use sequel_pg OR optionally use it. I think for simplicity sake we should always use it, cause that is what people are using in production.

  • We got to ensure parity between our AR and Sequel specs, otherwise we can not compare them, fixing up that ILIKE is urgent.

Thoughts?

cc @jeremyevans


#12

All right, I will fix the specs first and configure sequel suite to run only with sequel_pg (since that should be miniature change).

Later on I will configure sequel suite to run with and without sequel_pg. That should be similar to running with and without prepared statements.


#13

I think it makes sense to generate strings in all benchmarks, and make sure the output is the same for both (or at least have the differences not be material). It may also be worthwhile to check that the SQL used is the same or comparable to make sure you are comparing ORM differences instead of query differences, unless the ORMs have different approaches to handling similar cases (such as eager loading of associations).

Adding a spec that selects a large amount of columns makes sense. You also may want to use different column types in the spec, or separate specs for large amounts of columns, one for each type.

Always using sequel_pg is probably best, since the Sequel postgres adapter picks it up automatically. But if you want to benchmark both with and without sequel_pg, that’s fine too.


#14

Regardless of whether that’s the right thing to do or not, it is the most common usage pattern for Rails users, so it’s going to be the usage pattern we optimize for. I like @jeremyevans idea of materializing a subset of fields


#15

Keep in mind our actual spec here has no select clause, so this is already captured

https://rubybench.org/rails/rails/commits?result_type=active_record/postgres_discourse&display_count=2000

We do not have a “super optimized” AR going yet, but I would like to focus on figuring out what we have messed up on our test runner that is messing up results first.