NEW! Trying to keep your (suddenly) remote team productive? Try Discourse for Teams!

Blog

Updates & community insights

  • Standing on the Shoulders of a Giant Elephant: Upgrading Discourse to PostgreSQL 13

    elephant

    When the Discourse project started, way back in 2013, the team had to pick and choose between a handful tools into what would become our “stack”, the foundational software on which Discourse would be built upon. Some choices proved sub-optimal early, but we were able to quickly migrate over, like for example our migration from CoffeeScript to Javascript.

    On the other hand, most of our choices proved to be great, with picking PostgreSQL for the database being the finest. To illustrate how happy we are with it, let’s talk about our favorite feature of PostgreSQL latest version: B-Tree deduplication.

    A little back history into our hosting service

    While Discourse is, of course, 100% open source software first and foremost we are a hosting company. And since we started our commercial hosting services back in 2014, we grew our hosting into serving over 400 millions page views and storing over 4 million new posts each month.

    All this data is stored into PostgreSQL instances, so as you can imagine we were very interested when the PostgreSQL 13 release notes contained news about “significant improvements to its indexing and lookup system that benefit large databases, including space savings and performance gains for indexes”. It even made us consider breaking from our tradition of skipping the PostgreSQL odd versions and only upgrade every two years. And in order make an informed decision we had to benchmark.

    Activate the Shrink Ray

    hearthstone-shrink-ray-card

    In order to evaluate if the new B-Tree deduplication feature would benefit Discourse in any way, we decided to check if it would have effect in what is the largest table in most Discourse instances in our hosting, the posts_timings table. This tables stores read time of each user in each post and is defined as:

    discourse=# \d post_timings
                  Table "public.post_timings"
       Column    |  Type   | Collation | Nullable | Default 
    -------------+---------+-----------+----------+---------
     topic_id    | integer |           | not null | 
     post_number | integer |           | not null | 
     user_id     | integer |           | not null | 
     msecs       | integer |           | not null | 
    Indexes:
        "index_post_timings_on_user_id" btree (user_id)
        "post_timings_summary" btree (topic_id, post_number)
        "post_timings_unique" UNIQUE, btree (topic_id, post_number, user_id)
    

    We are also investigating if we can drop the post_timings_summary index, as it’s a subset of the left-most columns in the post_timings_unique one, which means it can potentially be re-used.

    In a particular instance we host, this table recently just went over a billion rows, so we used this number of rows for our test. Also, since in a live system this table receives a constant influx of updates, due to the MVCC we can end up with quite a bit of “bloat” that can skew our analysis. So in order to compare in a clean environment we used brand new installs of the last release of both 12 and 13 pg versions. After loading each version, the numbers are as follows:

    Total table Size

    PostgreSQL 12: 114 GB
    PostgreSQL 13:  85 GB
    

    A 25% reduction in the relation size? That’s awesome! 🥳

    Digging into specifics we have:

    PostgreSQL 12
      Table: 42 GB
      Index: 72 GB
    
    PostgreSQL 13
      Table: 42 GB
      Index: 43 GB
    

    As foretold in the release notes, the optimization only applies to the index, and we can reproduce it here. The table size is still the same, but the index size is almost half.

    If we enhance it further:

    PostgreSQL 12
    
                   relation               |    size    
    --------------------------------------+------------
     public.post_timings                  |   42 GB
     public.post_timings_unique           |   30 GB
     public.index_post_timings_on_user_id |   21 GB
     public.post_timings_summary          |   21 GB
    
    
    PostgreSQL 13
    
                   relation               |    size    
    --------------------------------------+------------
     public.post_timings                  |   42 GB
     public.post_timings_unique           |   30 GB
     public.post_timings_summary          | 6939 MB
     public.index_post_timings_on_user_id | 6766 MB
    

    Again, as expected, the UNIQUE index that by definition has 0 duplication saw no change in it’s size, but the indexes with repeating values got optimized into just a third of their original size.

    Not only index size changes, but also performance. According to the PostgreSQL documentation on the topic:

    This significantly reduces the storage size of indexes where each value (or each distinct combination of column values) appears several times on average. The latency of queries can be reduced significantly. Overall query throughput may increase significantly. The overhead of routine index vacuuming may also be reduced significantly.

    They also add a caveat that for write-heavy workloads with no duplication will incur a small fixed performance penalty. It’s not our case here, but if it was this would be alleviated by the fact that this is written in a completely async code path in our application: it’s a background request in our client and a non-blocking route in our Rails app that leverages Rack Hijack.

    presentation-slide-about-hack-hijack

    So the prophecy was true: PostgreSQL 13 brings significant improvement to Discourse!

    That’s a big deal, because here we saw the effect in one table in a single database, where our database schema has dozens of tables. And we host thousands of Discourse instances, with multiple PostgreSQL instances each for High Availability, so the gains are multiplied many times over.

    Discourse ❤️ PostgreSQL

    As we said in Discourse Gives Back 2017, Discourse has always been a 100% open source project that builds upon the decades of hard work of many other open source projects to survive. As we grow we’re happy to be able to also contribute directly to funding the projects we rely on the most. That is why last year we made another monetary donation to the PostgreSQL foundation and we aim to do the same every year.

    Comments
  • Discourse Essentials Starter Kit

    Want to use Discourse but unsure about where to start? This curated list of articles will help enhance your Discourse knowledge right away!

    Photo by Dave Catchpole / CC BY

    Discourse New User Guide

    Dive into your first Discourse site after learning how to browse through topics, read posts and participate in civilized discussion!

    Discourse Moderation Guide

    If you’re a Discourse moderator, this guide will run through most common scenarios in detail and show you how each can be handled with Discourse.

    Interface Nomenclature Guide

    Categories or channels? Topics or threads? Posts or messages? Read our nomenclature guide and know the correct term for every situation, every time.

    Beginner’s guide to using Discourse Themes

    Fascinated by a Discourse that looks nothing like it’s supposed to? Find out more by jumping into the world of themes, theme-components, color palettes and more. Discourse can be customized to nearly any extent, see our diverse list of customer sites if you don’t believe that yet.


    All of the advice above is valid for every Discourse instance; regardless of whether you self-host or use our fully managed hosting service. If you have more questions about Discourse, do a quick search or post them on the Discourse Meta, where our helpful community would be happy to assist.

    Comments
  • Discourse 2.6 released!

    Today we release Discourse 2.6, building on Discourse 2.5 from June. For post 2.0 releases we’ve chosen a new set of codenames based on the history of human communication; this release is Baudot.

    Improved Category Moderation

    In this release, category moderators have been granted many more abilities.

    In addition to handling the review queue for a category (including flagged posts and topics), category moderators now have the ability to mark topics as solved, close topics, archive topics, move posts between topics, edit the category description, and add staff notices. Category moderators also receive a shield icon on posts within their categories, and have a CSS class on their posts for sites that wish to further customize the style.

    Dedicated Category Settings Pages

    Categories are one of the primary methods of organization and security in Discourse. There are so many (important!) category level settings that we’ve finally broken out category settings into their own dedicated pages.

    This interface improvement makes it much easier to find, link to, and change category level settings.

    Skip New User Tutorial

    Are you a Discourse power user? Do you have more “first like” badges than you know what to do with? Tired of thinking someone replied to you when all you got was yet another badge notification? You can now opt out of the new user tutorial, and the “new to Discourse” badges, when you sign up.

    Existing users can enable or disable this via their Skip new user onboarding tips and badges user preference.

    Automatic Continuation of Megatopics

    We’re still not sure megatopics are ultimately healthy for communities, but we also want to provide reasonable support for communities that feel they “need” longer ongoing topics – though we generally tend to recommend a parallel live chat system as a healthier option for most communities.

    Once the maximum reply limit (which defaults to 10,000) is reached, Discourse will now automatically split off a new discussion and link the two discussions together, rather than just closing the discussion.

    Slow Mode for Contentious Topics

    One feature we’ve considered for a while now is the ability to enter “slow mode” for contentious topics. In this release, staff has the ability to manually flip on slow mode for a topic via the admin wrench:

    Once enabled, slow mode enforces a short (configurable) wait between posts by the same user in busy topics.

    Right now this is a bit of an experiment; we’re considering ways to make slow mode automatic depending on topic velocity and number of new users participating in a topic. Thoughtful discussion is always the goal with Discourse, and some topics are inherently more contentious than others.

    Font selection, auto dark mode, and WCAG theme

    We continue to improve and refine the Discourse Setup Wizard. In this release we’ve added the ability to select the body and heading fonts from the available options at fonts.google.com.

    Additionally, we’ve added support for automatic detection of dark mode via the (new) built in browser and operating system support, and a new WCAG compliant high contrast theme now ships with every copy of Discourse.

    Improved Oneboxing

    We continue to refine oneboxing – the process of taking a URL, when posted on a line by itself, and expanding it based on OpenGraph and oEmbed tags in the target page.

    In this release, we try to be much more flexible and provide specific, detailed feedback as to why a link may not be oneboxing.

    And So Much More!

    There’s way too much going on in even one Discourse release to cover in a single blog post! View the release-notes tag to get a detailed account of changes in every beta leading up to this release, or see the full release notes.

    Easy One Click Upgrade

    If you are on our hosting, you’re already upgraded. Otherwise, upgrading is as easy as clicking the Update button linked from your Discourse dashboard.

    Discourse Admin Dashboard, version upgrade notice

    We have a public exploit bounty program at Hacker One as a part of our security policy. We believe in being secure by default at Discourse, and we diligently follow up on any security issues brought to us. As usual, there are several important security fixes in this release, so we urge everyone to upgrade as soon as possible.

    If you don’t have a Discourse to upgrade, why not? Install it yourself in under 30 minutes, or start an absolutely free, no strings attached 14 day hosting trial today!

    Thank You

    First and foremost, a huge thanks to our customers. We quite literally could not do this without your direct financial support, and we’re proud to give our open source code back to the world with your help.

    You can measure the health of any open source project by one simple metric – its contributors. Thanks for code contributions in this release from:

    victorFFFF, Ahmedgagan, Chunhao Zheng, chunhz, fzngagan, angusmcleod , jahan-ggn, Gavin-X, dmcaulay, md-misko, ermolaev, leshik, weallwegot, Enduvar, thorpelawrence, RickyC0626, sbernhard, naineet, siriwatknp, pfaffman, merefield, pavelloz, tricknotes, terrapop, jtsagata, VaperinaDEV, herron9, renato, rogercreagh, buildthomas, mrksu, xronos-i-am, AndreyPnm, etnt93, fauno, wolfgangrittner, dyoung522, danielmai, jelle619, dlecan, jericson, jaydrogers, hyandell, marcinkoziej, mbrrg, AndreaHabib, discoursehosting, bf4, notstephen, haomingw, crydotsnake, MrChrisW, danyang685, TheTripleV, fafa-junhe, nikolai-b, timbocode, pjv, galaktor, marielaSAG, rachel-carvalho, pacharanero, anthonydillon, delarosashane, hawm, alebertacco, ValdikSS, prayagverma, ti0.

    Many thanks to the translators who generously contributed their time and effort translating Discourse into dozens of languages for this release.

    Discourse is a tool for communities, and we greatly value our own community of people who run Discourse instances while also helping us out by posting support / bug requests / feedback topics on meta.discourse.org to improve Discourse. If you operate or support a Discourse community, we would love to hear from you!

    If you’re wondering what’s coming up for Discourse in version 2.7 and beyond, visit the releases category to get a sneak preview of what we’ll be working on next.

    Comments