What about a spec that pushes AR to the limit?


#1

@bmarkons I do not think we have any proper indicative spec for “real world” edges that you hit with ActiveRecord.

For example to render the home page on Discourse we generate a query that looks like this:

SELECT "topics"."id" AS t0_r0,
       "topics"."title" AS t0_r1,
       "topics"."last_posted_at" AS t0_r2,
       "topics"."created_at" AS t0_r3,
       "topics"."updated_at" AS t0_r4,
       "topics"."views" AS t0_r5,
       "topics"."posts_count" AS t0_r6,
       "topics"."user_id" AS t0_r7,
       "topics"."last_post_user_id" AS t0_r8,
       "topics"."reply_count" AS t0_r9,
       "topics"."featured_user1_id" AS t0_r10,
       "topics"."featured_user2_id" AS t0_r11,
       "topics"."featured_user3_id" AS t0_r12,
       "topics"."avg_time" AS t0_r13,
       "topics"."deleted_at" AS t0_r14,
       "topics"."highest_post_number" AS t0_r15,
       "topics"."image_url" AS t0_r16,
       "topics"."like_count" AS t0_r17,
       "topics"."incoming_link_count" AS t0_r18,
       "topics"."category_id" AS t0_r19,
       "topics"."visible" AS t0_r20,
       "topics"."moderator_posts_count" AS t0_r21,
       "topics"."closed" AS t0_r22,
       "topics"."archived" AS t0_r23,
       "topics"."bumped_at" AS t0_r24,
       "topics"."has_summary" AS t0_r25,
       "topics"."vote_count" AS t0_r26,
       "topics"."archetype" AS t0_r27,
       "topics"."featured_user4_id" AS t0_r28,
       "topics"."notify_moderators_count" AS t0_r29,
       "topics"."spam_count" AS t0_r30,
       "topics"."pinned_at" AS t0_r31,
       "topics"."score" AS t0_r32,
       "topics"."percent_rank" AS t0_r33,
       "topics"."subtype" AS t0_r34,
       "topics"."slug" AS t0_r35,
       "topics"."deleted_by_id" AS t0_r36,
       "topics"."participant_count" AS t0_r37,
       "topics"."word_count" AS t0_r38,
       "topics"."excerpt" AS t0_r39,
       "topics"."pinned_globally" AS t0_r40,
       "topics"."pinned_until" AS t0_r41,
       "topics"."fancy_title" AS t0_r42,
       "topics"."highest_staff_post_number" AS t0_r43,
       "topics"."featured_link" AS t0_r44,
       "categories"."id" AS t1_r0,
       "categories"."name" AS t1_r1,
       "categories"."color" AS t1_r2,
       "categories"."topic_id" AS t1_r3,
       "categories"."topic_count" AS t1_r4,
       "categories"."created_at" AS t1_r5,
       "categories"."updated_at" AS t1_r6,
       "categories"."user_id" AS t1_r7,
       "categories"."topics_year" AS t1_r8,
       "categories"."topics_month" AS t1_r9,
       "categories"."topics_week" AS t1_r10,
       "categories"."slug" AS t1_r11,
       "categories"."description" AS t1_r12,
       "categories"."text_color" AS t1_r13,
       "categories"."read_restricted" AS t1_r14,
       "categories"."auto_close_hours" AS t1_r15,
       "categories"."post_count" AS t1_r16,
       "categories"."latest_post_id" AS t1_r17,
       "categories"."latest_topic_id" AS t1_r18,
       "categories"."position" AS t1_r19,
       "categories"."parent_category_id" AS t1_r20,
       "categories"."posts_year" AS t1_r21,
       "categories"."posts_month" AS t1_r22,
       "categories"."posts_week" AS t1_r23,
       "categories"."email_in" AS t1_r24,
       "categories"."email_in_allow_strangers" AS t1_r25,
       "categories"."topics_day" AS t1_r26,
       "categories"."posts_day" AS t1_r27,
       "categories"."allow_badges" AS t1_r28,
       "categories"."name_lower" AS t1_r29,
       "categories"."auto_close_based_on_last_post" AS t1_r30,
       "categories"."topic_template" AS t1_r31,
       "categories"."suppress_from_homepage" AS t1_r32,
       "categories"."contains_messages" AS t1_r33,
       "categories"."sort_order" AS t1_r34,
       "categories"."sort_ascending" AS t1_r35,
       "categories"."uploaded_logo_id" AS t1_r36,
       "categories"."uploaded_background_id" AS t1_r37,
       "categories"."topic_featured_link_allowed" AS t1_r38,
       "categories"."all_topics_wiki" AS t1_r39,
       "categories"."show_subcategory_list" AS t1_r40,
       "categories"."num_featured_topics" AS t1_r41,
       "categories"."default_view" AS t1_r42,
       "categories"."subcategory_list_style" AS t1_r43,
       "categories"."default_top_period" AS t1_r44
FROM "topics"
LEFT OUTER JOIN "categories" ON "categories"."id" = "topics"."category_id"
LEFT OUTER JOIN topic_users AS tu ON (topics.id = tu.topic_id
                                      AND tu.user_id = 1)
WHERE (topics.archetype <> 'private_message')
  AND (COALESCE(categories.topic_id, 0) <> topics.id)
  AND (topics.deleted_at IS NULL)
  AND (COALESCE(tu.notification_level,1) > 0)
  AND ( NOT EXISTS
         ( SELECT 1
          FROM category_users cu
          WHERE cu.user_id = 1
            AND cu.category_id = topics.category_id
            AND cu.notification_level = 0
            AND cu.category_id <> -1
            AND (tu.notification_level IS NULL
                 OR tu.notification_level < 2) ))
  AND (pinned_globally
       AND pinned_at IS NOT NULL
       AND (topics.pinned_at > tu.cleared_pinned_at
            OR tu.cleared_pinned_at IS NULL))
ORDER BY topics.bumped_at DESC
LIMIT 30;

We have multiple includes, references, joins and so on, we select a boat of columns with tons of conditions.

I feel like we need some specs that deal with this level of arel complexity so we can properly measure how much overhead is there. My flamegraphs show that in cases like this Active Record really shows its dark side.

Arrows point at pg gem. Rest of the time is in “overhead”.

Not sure exactly how the test should look, but we need something.

@tgxworld thoughts?

My initial thoughts is just to “redo” this particular query in AR and stage the tables etc.


#2

I have thought about adding spec from real world project to have one with bunch of columns. Thanks for this query, I will just reproduce it to get spec as real world example.

Maybe we should have more specs categorized as real world examples. Having only this one is perfect for start.

Analogous to testing code correctness, real world specs could be seen as integration specs while having isolated and specific ones is good as having unit tests.


#3

@sam can you pls link to ActiveRecord query you posted above?


#4

It is a bit tricky, it is generated using this class…

discourse/topic_query.rb at master · discourse/discourse · GitHub

But you need to dig through the methods to see how it is constructed cause it is composed using multiple methods.


#5

We could keep one or two complex query in our benchmark but what I think might be more useful is to have an interface that allows people to run custom SQL queries for say the last 5 Rails version. People would be able to use that report to submit to the Rails core team for bugs.


#6

@tgxworld

Maybe simply encourage people to submit benchmark for query from their real world projects. We could have these categorized as real world benchmarks.


#7

I would love to have this as an available benchmark FWIW. I’ve been doing a lot of work on Arel recently and this was exactly the benchmark I was looking for. (Preferably one that does this over a shit load of chained Relation calls in the least efficient way possible)