@sgrif was not using the sequel-pg gem, and I think you have correctly opted to use your C implementation rather than the pg gem's slower coder implementation. Everyone who uses Sequel really should use sequel-pg. I do not think this is cheating.
Sequel does not defer materialization, so you pay the cast cost up front.
Is there any way to configure Sequel to defer materialization? As it stands, this can introduce an enormous cost for people who migrate to Sequel.
For context this is the corrected bench:
Raw blows both Sequel and ActiveRecord out of the park, being 5x faster due to deferred materialization. But once you select all the columns, Sequel and raw are mighty close.
I know it is a battle, choosing deferred materialization just to work around users writing inefficient DB access code, but over-selection is an epidemic in the Rails world; it is not going away.
No. Sequel does not defer typecasting. Typecasting happens at the dataset-retrieval level, not the model level.
What Sequel offers instead is the lazy_attributes plugin, which does not select the column during the query, but runs a new query on demand if the column value is actually needed (including handling of N+1 issues). This is superior in most cases, though it does require the plugin be specifically used.
It is theoretically possible to delay typecasting in adapters where the adapter does the typecasting instead of the driver (postgres when sequel_pg is not used). You could remove all of the type conversion procs, and then have the model column getter methods use the same procs for the conversion. Sequel doesn't have an implementation for that, but it's probably not hard. However, it's not something I would want to support.
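The idea can be illustrated in plain Ruby (all names here are invented for the sketch): keep the raw driver strings at retrieval time, and run the conversion proc only when a column is first accessed.

```ruby
require 'date'

# Toy model of deferred typecasting: the row stores raw strings as a
# driver would return them, and casts lazily in the accessor.
class LazyRow
  # Per-column conversion procs, analogous to an adapter's type map.
  CASTS = { created_on: ->(v) { Date.parse(v) } }

  def initialize(raw)
    @raw  = raw  # e.g. { id: '1', created_on: '2017-08-25' }
    @cast = {}   # memoized converted values
  end

  def [](col)
    @cast[col] ||= (CASTS[col] ? CASTS[col].call(@raw[col]) : @raw[col])
  end
end

row = LazyRow.new(id: '1', created_on: '2017-08-25')
row[:created_on] # Date.parse runs here, on first access, not at fetch time
```

Rows that are materialized but whose columns are never read pay no cast cost at all, which is exactly why "raw" wins the over-selection benchmark.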
Basically, ActiveRecord is optimized for the case where the user is using inefficient queries, and Sequel is optimized for the case where the user is using efficient queries. Based on the average user of each, you could say that both are making the correct choice.
Given that… we MUST ensure we are super clear with our benchmarks when we are over-selecting. I definitely want to keep benchmarks around for over-selection, since it is a Rails epidemic.
I think every bench that "over-selects" should end with the text "over select" or something. @bmarkons
One issue with this line of thinking is that it makes it quite brutal to migrate from AR to Sequel, because you will not be properly winning until you name every column you are selecting. It also gets particularly brutal with complex joins involving 5 tables.
I get the complexity here though. Nothing that can easily be resolved.
There are certainly cases where ActiveRecord can be faster than Sequel, and selecting columns you aren't using is one of them. I think it's valuable for people to know that, so I would certainly recommend publishing the benchmark that shows ActiveRecord is faster than Sequel in that area, especially along with a comparison in the same graph of only selecting columns being used and/or using all columns selected.
As a general rule, I think you should only optimize application code when you need to. If something is 3x slower and not a bottleneck, no big deal. If it becomes a bottleneck and you try to optimize, then the fact that the optimized version is much more optimized leads to much better overall performance. It's certainly the case that converting from ActiveRecord to Sequel can make things that weren't bottlenecks before into bottlenecks, but addressing those specific issues will lead to much better performance. In other words, I prefer a low floor and a high ceiling to a high floor and a low ceiling.
As to why sequel_pg is faster than pg itself, I'm not sure. I wrote sequel_pg's date parsing implementation before pg did typecasting at all, and I've never looked at pg's implementation.
@sam Are you sure ruby-pg decodes date/timestamp values in C? From a brief look at their repository, they appear to be using ruby-pg/text_decoder.rb at master · ged/ruby-pg · GitHub, which does everything in Ruby using regexps to extract values (no wonder it is slower).
Sure enough:
$ irb -r pg
irb(main):001:0> conn = PG.connect(user: 'sequel_test')
=> #<PG::Connection:0x0011e94f96e1d8>
irb(main):002:0> conn.type_map_for_results = PG::BasicTypeMapForResults.new conn; nil
=> nil
irb(main):003:0> conn.exec('select current_date').values
=> [[#<Date: 2017-08-25 ((2457991j,0s,0n),+0s,2299161j)>]]
irb(main):004:0> PG::TextDecoder::Date.instance_method(:decode).source_location
=> ["/usr/local/lib/ruby/gems/2.4/gems/pg-0.20.0/lib/pg/text_decoder.rb", 11]
irb(main):005:0> PG::TextDecoder::Date.send(:remove_method, :decode)
=> PG::TextDecoder::Date
irb(main):006:0> conn.exec('select current_date').values
RuntimeError: no decoder function defined
from (irb):6:in `decode'
from (irb):6:in `values'
from (irb):6
from /usr/local/bin/irb:11:in `<main>'
Thanks for pointing that out! I'll fix this benchmark.
@sam when you refer to over-selecting, you mean selecting columns you don't actually use?
I will address these benchmarks and suffix them with "over_select".
Just a tiny update here: recently Lars added native date/time typecasting to the pg gem, so it looks like we need to look again at these raw results using the pg gem. AR likes using timestamp without time zone, which is the easiest date format to deal with.