On the other hand, most of our choices proved to be great, with picking PostgreSQL for the database being the finest. To illustrate how happy we are with it, let’s talk about our favorite feature of PostgreSQL latest version: B-Tree deduplication.
A little back history into our hosting service
While Discourse is, of course, 100% open source software first and foremost we are a hosting company. And since we started our commercial hosting services back in 2014, we grew our hosting into serving over 400 millions page views and storing over 4 million new posts each month.
All this data is stored into PostgreSQL instances, so as you can imagine we were very interested when the PostgreSQL 13 release notes contained news about “significant improvements to its indexing and lookup system that benefit large databases, including space savings and performance gains for indexes”. It even made us consider breaking from our tradition of skipping the PostgreSQL odd versions and only upgrade every two years. And in order make an informed decision we had to benchmark.
Activate the Shrink Ray
In order to evaluate if the new B-Tree deduplication feature would benefit Discourse in any way, we decided to check if it would have effect in what is the largest table in most Discourse instances in our hosting, the
posts_timingstable. This tables stores read time of each user in each post and is defined as:
discourse=# \d post_timings Table "public.post_timings" Column | Type | Collation | Nullable | Default -------------+---------+-----------+----------+--------- topic_id | integer | | not null | post_number | integer | | not null | user_id | integer | | not null | msecs | integer | | not null | Indexes: "index_post_timings_on_user_id" btree (user_id) "post_timings_summary" btree (topic_id, post_number) "post_timings_unique" UNIQUE, btree (topic_id, post_number, user_id)
We are also investigating if we can drop the
post_timings_summaryindex, as it’s a subset of the left-most columns in the
post_timings_uniqueone, which means it can potentially be re-used.
In a particular instance we host, this table recently just went over a billion rows, so we used this number of rows for our test. Also, since in a live system this table receives a constant influx of updates, due to the MVCC we can end up with quite a bit of “bloat” that can skew our analysis. So in order to compare in a clean environment we used brand new installs of the last release of both 12 and 13 pg versions. After loading each version, the numbers are as follows:
Total table Size
PostgreSQL 12: 114 GB PostgreSQL 13: 85 GB
A 25% reduction in the relation size? That’s awesome! 🥳
Digging into specifics we have:
PostgreSQL 12 Table: 42 GB Index: 72 GB PostgreSQL 13 Table: 42 GB Index: 43 GB
As foretold in the release notes, the optimization only applies to the index, and we can reproduce it here. The table size is still the same, but the index size is almost half.
If we enhance it further:
PostgreSQL 12 relation | size --------------------------------------+------------ public.post_timings | 42 GB public.post_timings_unique | 30 GB public.index_post_timings_on_user_id | 21 GB public.post_timings_summary | 21 GB PostgreSQL 13 relation | size --------------------------------------+------------ public.post_timings | 42 GB public.post_timings_unique | 30 GB public.post_timings_summary | 6939 MB public.index_post_timings_on_user_id | 6766 MB
Again, as expected, the
UNIQUEindex that by definition has 0 duplication saw no change in it’s size, but the indexes with repeating values got optimized into just a third of their original size.
Not only index size changes, but also performance. According to the PostgreSQL documentation on the topic:
This significantly reduces the storage size of indexes where each value (or each distinct combination of column values) appears several times on average. The latency of queries can be reduced significantly. Overall query throughput may increase significantly. The overhead of routine index vacuuming may also be reduced significantly.
They also add a caveat that for write-heavy workloads with no duplication will incur a small fixed performance penalty. It’s not our case here, but if it was this would be alleviated by the fact that this is written in a completely async code path in our application: it’s a background request in our client and a non-blocking route in our Rails app that leverages Rack Hijack.
So the prophecy was true: PostgreSQL 13 brings significant improvement to Discourse!
That’s a big deal, because here we saw the effect in one table in a single database, where our database schema has dozens of tables. And we host thousands of Discourse instances, with multiple PostgreSQL instances each for High Availability, so the gains are multiplied many times over.
Discourse ❤️ PostgreSQL
As we said in Discourse Gives Back 2017, Discourse has always been a 100% open source project that builds upon the decades of hard work of many other open source projects to survive. As we grow we’re happy to be able to also contribute directly to funding the projects we rely on the most. That is why last year we made another monetary donation to the PostgreSQL foundation and we aim to do the same every year.
Want to use Discourse but unsure about where to start? This curated list of articles will help enhance your Discourse knowledge right away!
Photo by Dave Catchpole / CC BY
Dive into your first Discourse site after learning how to browse through topics, read posts and participate in civilized discussion!
If you’re a Discourse moderator, this guide will run through most common scenarios in detail and show you how each can be handled with Discourse.
channels? Topics or threads? Posts or messages? Read our nomenclature guide and know the correct term for every situation, every time.
Fascinated by a Discourse that looks nothing like it’s supposed to? Find out more by jumping into the world of themes, theme-components, color palettes and more. Discourse can be customized to nearly any extent, see our diverse list of customer sites if you don’t believe that yet.
All of the advice above is valid for every Discourse instance; regardless of whether you self-host or use our fully managed hosting service. If you have more questions about Discourse, do a quick search or post them on the Discourse Meta, where our helpful community would be happy to assist.
Improved Category Moderation
In this release, category moderators have been granted many more abilities.
In addition to handling the review queue for a category (including flagged posts and topics), category moderators now have the ability to mark topics as solved, close topics, archive topics, move posts between topics, edit the category description, and add staff notices. Category moderators also receive a shield icon on posts within their categories, and have a CSS class on their posts for sites that wish to further customize the style.
Dedicated Category Settings Pages
Categories are one of the primary methods of organization and security in Discourse. There are so many (important!) category level settings that we’ve finally broken out category settings into their own dedicated pages.
This interface improvement makes it much easier to find, link to, and change category level settings.
Skip New User Tutorial
Are you a Discourse power user? Do you have more “first like” badges than you know what to do with? Tired of thinking someone replied to you when all you got was yet another badge notification? You can now opt out of the new user tutorial, and the “new to Discourse” badges, when you sign up.
Existing users can enable or disable this via their
Skip new user onboarding tips and badgesuser preference.
Automatic Continuation of Megatopics
We’re still not sure megatopics are ultimately healthy for communities, but we also want to provide reasonable support for communities that feel they “need” longer ongoing topics – though we generally tend to recommend a parallel live chat system as a healthier option for most communities.
Once the maximum reply limit (which defaults to 10,000) is reached, Discourse will now automatically split off a new discussion and link the two discussions together, rather than just closing the discussion.
Slow Mode for Contentious Topics
One feature we’ve considered for a while now is the ability to enter “slow mode” for contentious topics. In this release, staff has the ability to manually flip on slow mode for a topic via the admin wrench:
Once enabled, slow mode enforces a short (configurable) wait between posts by the same user in busy topics.
Right now this is a bit of an experiment; we’re considering ways to make slow mode automatic depending on topic velocity and number of new users participating in a topic. Thoughtful discussion is always the goal with Discourse, and some topics are inherently more contentious than others.
Font selection, auto dark mode, and WCAG theme
We continue to improve and refine the Discourse Setup Wizard. In this release we’ve added the ability to select the body and heading fonts from the available options at fonts.google.com.
Additionally, we’ve added support for automatic detection of dark mode via the (new) built in browser and operating system support, and a new WCAG compliant high contrast theme now ships with every copy of Discourse.
In this release, we try to be much more flexible and provide specific, detailed feedback as to why a link may not be oneboxing.
And So Much More!
There’s way too much going on in even one Discourse release to cover in a single blog post! View the release-notes tag to get a detailed account of changes in every beta leading up to this release, or see the full release notes.
Easy One Click Upgrade
If you are on our hosting, you’re already upgraded. Otherwise, upgrading is as easy as clicking the Update button linked from your Discourse dashboard.
We have a public exploit bounty program at Hacker One as a part of our security policy. We believe in being secure by default at Discourse, and we diligently follow up on any security issues brought to us. As usual, there are several important security fixes in this release, so we urge everyone to upgrade as soon as possible.
If you don’t have a Discourse to upgrade, why not? Install it yourself in under 30 minutes, or start an absolutely free, no strings attached 14 day hosting trial today!
First and foremost, a huge thanks to our customers. We quite literally could not do this without your direct financial support, and we’re proud to give our open source code back to the world with your help.
You can measure the health of any open source project by one simple metric – its contributors. Thanks for code contributions in this release from:
victorFFFF, Ahmedgagan, Chunhao Zheng, chunhz, fzngagan, angusmcleod , jahan-ggn, Gavin-X, dmcaulay, md-misko, ermolaev, leshik, weallwegot, Enduvar, thorpelawrence, RickyC0626, sbernhard, naineet, siriwatknp, pfaffman, merefield, pavelloz, tricknotes, terrapop, jtsagata, VaperinaDEV, herron9, renato, rogercreagh, buildthomas, mrksu, xronos-i-am, AndreyPnm, etnt93, fauno, wolfgangrittner, dyoung522, danielmai, jelle619, dlecan, jericson, jaydrogers, hyandell, marcinkoziej, mbrrg, AndreaHabib, discoursehosting, bf4, notstephen, haomingw, crydotsnake, MrChrisW, danyang685, TheTripleV, fafa-junhe, nikolai-b, timbocode, pjv, galaktor, marielaSAG, rachel-carvalho, pacharanero, anthonydillon, delarosashane, hawm, alebertacco, ValdikSS, prayagverma, ti0.
Discourse is a tool for communities, and we greatly value our own community of people who run Discourse instances while also helping us out by posting support / bug requests / feedback topics on meta.discourse.org to improve Discourse. If you operate or support a Discourse community, we would love to hear from you!
If you’re wondering what’s coming up for Discourse in version 2.7 and beyond, visit the releases category to get a sneak preview of what we’ll be working on next.