What about a spec that pushes AR to the limit?

sam · June 23, 2017, 1:27pm

@bmarkons I do not think we have any proper indicative spec for “real world” edges that you hit with ActiveRecord.

For example to render the home page on Discourse we generate a query that looks like this:

SELECT "topics"."id" AS t0_r0,
       "topics"."title" AS t0_r1,
       "topics"."last_posted_at" AS t0_r2,
       "topics"."created_at" AS t0_r3,
       "topics"."updated_at" AS t0_r4,
       "topics"."views" AS t0_r5,
       "topics"."posts_count" AS t0_r6,
       "topics"."user_id" AS t0_r7,
       "topics"."last_post_user_id" AS t0_r8,
       "topics"."reply_count" AS t0_r9,
       "topics"."featured_user1_id" AS t0_r10,
       "topics"."featured_user2_id" AS t0_r11,
       "topics"."featured_user3_id" AS t0_r12,
       "topics"."avg_time" AS t0_r13,
       "topics"."deleted_at" AS t0_r14,
       "topics"."highest_post_number" AS t0_r15,
       "topics"."image_url" AS t0_r16,
       "topics"."like_count" AS t0_r17,
       "topics"."incoming_link_count" AS t0_r18,
       "topics"."category_id" AS t0_r19,
       "topics"."visible" AS t0_r20,
       "topics"."moderator_posts_count" AS t0_r21,
       "topics"."closed" AS t0_r22,
       "topics"."archived" AS t0_r23,
       "topics"."bumped_at" AS t0_r24,
       "topics"."has_summary" AS t0_r25,
       "topics"."vote_count" AS t0_r26,
       "topics"."archetype" AS t0_r27,
       "topics"."featured_user4_id" AS t0_r28,
       "topics"."notify_moderators_count" AS t0_r29,
       "topics"."spam_count" AS t0_r30,
       "topics"."pinned_at" AS t0_r31,
       "topics"."score" AS t0_r32,
       "topics"."percent_rank" AS t0_r33,
       "topics"."subtype" AS t0_r34,
       "topics"."slug" AS t0_r35,
       "topics"."deleted_by_id" AS t0_r36,
       "topics"."participant_count" AS t0_r37,
       "topics"."word_count" AS t0_r38,
       "topics"."excerpt" AS t0_r39,
       "topics"."pinned_globally" AS t0_r40,
       "topics"."pinned_until" AS t0_r41,
       "topics"."fancy_title" AS t0_r42,
       "topics"."highest_staff_post_number" AS t0_r43,
       "topics"."featured_link" AS t0_r44,
       "categories"."id" AS t1_r0,
       "categories"."name" AS t1_r1,
       "categories"."color" AS t1_r2,
       "categories"."topic_id" AS t1_r3,
       "categories"."topic_count" AS t1_r4,
       "categories"."created_at" AS t1_r5,
       "categories"."updated_at" AS t1_r6,
       "categories"."user_id" AS t1_r7,
       "categories"."topics_year" AS t1_r8,
       "categories"."topics_month" AS t1_r9,
       "categories"."topics_week" AS t1_r10,
       "categories"."slug" AS t1_r11,
       "categories"."description" AS t1_r12,
       "categories"."text_color" AS t1_r13,
       "categories"."read_restricted" AS t1_r14,
       "categories"."auto_close_hours" AS t1_r15,
       "categories"."post_count" AS t1_r16,
       "categories"."latest_post_id" AS t1_r17,
       "categories"."latest_topic_id" AS t1_r18,
       "categories"."position" AS t1_r19,
       "categories"."parent_category_id" AS t1_r20,
       "categories"."posts_year" AS t1_r21,
       "categories"."posts_month" AS t1_r22,
       "categories"."posts_week" AS t1_r23,
       "categories"."email_in" AS t1_r24,
       "categories"."email_in_allow_strangers" AS t1_r25,
       "categories"."topics_day" AS t1_r26,
       "categories"."posts_day" AS t1_r27,
       "categories"."allow_badges" AS t1_r28,
       "categories"."name_lower" AS t1_r29,
       "categories"."auto_close_based_on_last_post" AS t1_r30,
       "categories"."topic_template" AS t1_r31,
       "categories"."suppress_from_homepage" AS t1_r32,
       "categories"."contains_messages" AS t1_r33,
       "categories"."sort_order" AS t1_r34,
       "categories"."sort_ascending" AS t1_r35,
       "categories"."uploaded_logo_id" AS t1_r36,
       "categories"."uploaded_background_id" AS t1_r37,
       "categories"."topic_featured_link_allowed" AS t1_r38,
       "categories"."all_topics_wiki" AS t1_r39,
       "categories"."show_subcategory_list" AS t1_r40,
       "categories"."num_featured_topics" AS t1_r41,
       "categories"."default_view" AS t1_r42,
       "categories"."subcategory_list_style" AS t1_r43,
       "categories"."default_top_period" AS t1_r44
FROM "topics"
LEFT OUTER JOIN "categories" ON "categories"."id" = "topics"."category_id"
LEFT OUTER JOIN topic_users AS tu ON (topics.id = tu.topic_id
                                      AND tu.user_id = 1)
WHERE (topics.archetype <> 'private_message')
  AND (COALESCE(categories.topic_id, 0) <> topics.id)
  AND (topics.deleted_at IS NULL)
  AND (COALESCE(tu.notification_level,1) > 0)
  AND ( NOT EXISTS
         ( SELECT 1
          FROM category_users cu
          WHERE cu.user_id = 1
            AND cu.category_id = topics.category_id
            AND cu.notification_level = 0
            AND cu.category_id <> -1
            AND (tu.notification_level IS NULL
                 OR tu.notification_level < 2) ))
  AND (pinned_globally
       AND pinned_at IS NOT NULL
       AND (topics.pinned_at > tu.cleared_pinned_at
            OR tu.cleared_pinned_at IS NULL))
ORDER BY topics.bumped_at DESC
LIMIT 30;

We have multiple includes, references, joins and so on, we select a boat of columns with tons of conditions.

I feel like we need some specs that deal with this level of arel complexity so we can properly measure how much overhead is there. My flamegraphs show that in cases like this Active Record really shows its dark side.

Arrows point at pg gem. Rest of the time is in “overhead”.

Not sure exactly how the test should look, but we need something.

@system thoughts?

My initial thoughts is just to “redo” this particular query in AR and stage the tables etc.

bmarkons · June 23, 2017, 2:21pm

I have thought about adding spec from real world project to have one with bunch of columns. Thanks for this query, I will just reproduce it to get spec as real world example.

Maybe we should have more specs categorized as real world examples. Having only this one is perfect for start.

Analogous to testing code correctness, real world specs could be seen as integration specs while having isolated and specific ones is good as having unit tests.

bmarkons · June 23, 2017, 3:11pm

@sam can you pls link to ActiveRecord query you posted above?

sam · June 23, 2017, 3:41pm

It is a bit tricky, it is generated using this class…

discourse/topic_query.rb at master · discourse/discourse · GitHub

But you need to dig through the methods to see how it is constructed cause it is composed using multiple methods.

system · July 4, 2017, 12:17am

We could keep one or two complex query in our benchmark but what I think might be more useful is to have an interface that allows people to run custom SQL queries for say the last 5 Rails version. People would be able to use that report to submit to the Rails core team for bugs.

bmarkons · July 4, 2017, 1:13pm

@system

Maybe simply encourage people to submit benchmark for query from their real world projects. We could have these categorized as real world benchmarks.

sgrif · July 24, 2017, 5:49pm

I would love to have this as an available benchmark FWIW. I’ve been doing a lot of work on Arel recently and this was exactly the benchmark I was looking for. (Preferably one that does this over a shit load of chained Relation calls in the least efficient way possible)

Topic		Replies	Views
Project PostgreSQL Hyper-record	10	2039	July 2, 2018
Can we fill up our benchmarks?	12	2344	September 7, 2017
Comparing all Rails benchmarks to raw counterparts	1	2148	April 22, 2015
How can active record be faster?	14	2283	July 24, 2017
Rails Micro Benchmarks!	11	3536	June 2, 2015

What about a spec that pushes AR to the limit?

Related Topics