It’s been a long, long time since we wrote about growing our team to 20. The last few years have been good to us and as a result, we’ve grown steadily and have also continued giving back to open source wherever possible. You might have heard of the new arrivals if you frequent the Discourse Meta, but here’s a list anyway.
Meet the rest of the team!
- Daniela Bogazzi - Technical Advocate
- Kyle Mitchell - Lawyer
- Jeff Wong - Software Engineer
- Johani Faris Saeed - Designer
- Ginevra Brown - Community Accounts Specialist
- David Taylor - Software Engineer
- Rishabh Nambiar - Community Team Lead
- Bianca Nenciu - Software Engineer
- Penar Musaraj - Software Engineer
- Saj Goonatilleke - Operations Engineer
- Dan Ungureanu - Software Engineer
- Taylor Henry - Technical Advocate
- Roman Rizzi - Software Engineer
- Justin DiRose - Technical Advocate
- Daniel Waterworth - Software Engineer
- Jarek Radosz - Software Engineer
- Kris Kotlarek - Software Engineer
- Mark VanLandingham - Software Engineer
- Martin Brennan - Software Engineer
- Osioke Itseuwa - Community Advocate
- Will Chau - Customer Success Manager
- Jordan Vidrine - Designer
- Kane York - Software Engineer
- Michelle Vendrame - Technical Advocate
- Tobias Eigen - Teams Product Manager
- Jamie Wilson - Software Engineer
- Michael Fitz-Payne - Operations Engineer
- Osama Sayegh - Software Engineer
- Blake Sorrell - Customer Success Manager
- Eleni Michalaki - Operations Engineer
- Andrei Prigorshnev - Software Engineer
- Alex Reed - Administrative Assistant
While it’s not a prerequisite, it’s clear we love to hire from our community. To read more about each member (ft. glorious drawings) and working with us, check out discourse.org/team.
We’re a fully remote company, working from 19 different countries and 15 different timezones. Does that make you wonder how we coordinate our work?
That’s right, we use Discourse as our primary team coordination tool to build Discourse! As it excels at asynchronous, distributed teamwork, we can keep interruptions like instant messaging, calls, and meetings to a minimum. If that approach sounds interesting, don’t forget to try Discourse for Teams.
Here’s to…the future of Discourse and to our community 🍻
On the other hand, most of our choices proved to be great, with picking PostgreSQL for the database being the finest. To illustrate how happy we are with it, let’s talk about our favorite feature of PostgreSQL’s latest version: B-Tree deduplication.
A little backstory on our hosting service
While Discourse is, of course, 100% open source software, first and foremost we are a hosting company. Since we started our commercial hosting services back in 2014, our hosting has grown to serve over 400 million page views and store over 4 million new posts each month.
All this data is stored in PostgreSQL instances, so as you can imagine we were very interested when the PostgreSQL 13 release notes contained news about “significant improvements to its indexing and lookup system that benefit large databases, including space savings and performance gains for indexes”. It even made us consider breaking from our tradition of skipping the odd-numbered PostgreSQL versions and only upgrading every two years. And in order to make an informed decision, we had to benchmark.
Activate the Shrink Ray
In order to evaluate if the new B-Tree deduplication feature would benefit Discourse in any way, we decided to check if it would have an effect on the largest table in most Discourse instances in our hosting, the post_timings table. This table stores the read time of each user in each post and is defined as:
    discourse=# \d post_timings
                  Table "public.post_timings"
       Column    |  Type   | Collation | Nullable | Default
    -------------+---------+-----------+----------+---------
     topic_id    | integer |           | not null |
     post_number | integer |           | not null |
     user_id     | integer |           | not null |
     msecs       | integer |           | not null |
    Indexes:
        "index_post_timings_on_user_id" btree (user_id)
        "post_timings_summary" btree (topic_id, post_number)
        "post_timings_unique" UNIQUE, btree (topic_id, post_number, user_id)
We are also investigating if we can drop the post_timings_summary index, as its columns are the left-most columns of the post_timings_unique one, which means the latter can potentially be re-used.
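To make the "left-most columns" reasoning concrete, here is a minimal Python sketch that checks which indexes are a strict left-most prefix of another. The column lists are copied from the table definition above; the helper names `is_leftmost_prefix` and `redundant_candidates` are made up for this illustration and are not part of Discourse or PostgreSQL. It only compares column order, so it says nothing about whether dropping an index is actually safe for a given workload.

```python
def is_leftmost_prefix(candidate, other):
    """Return True if `candidate` matches the leading columns of `other`, in order."""
    candidate, other = tuple(candidate), tuple(other)
    return len(candidate) <= len(other) and other[:len(candidate)] == candidate


# Index definitions copied from the post_timings table definition.
indexes = {
    "index_post_timings_on_user_id": ("user_id",),
    "post_timings_summary": ("topic_id", "post_number"),
    "post_timings_unique": ("topic_id", "post_number", "user_id"),
}


def redundant_candidates(indexes):
    """List (narrow, wide) pairs where narrow is a strict left-most prefix of wide."""
    pairs = []
    for narrow, narrow_cols in indexes.items():
        for wide, wide_cols in indexes.items():
            if (
                narrow != wide
                and len(narrow_cols) < len(wide_cols)
                and is_leftmost_prefix(narrow_cols, wide_cols)
            ):
                pairs.append((narrow, wide))
    return pairs


for narrow, wide in redundant_candidates(indexes):
    print(f"{narrow} is covered by the leading columns of {wide}")
```

Only post_timings_summary qualifies: user_id is the last column of post_timings_unique, not the first, so index_post_timings_on_user_id is not a prefix of it.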
In a particular instance we host, this table recently went over a billion rows, so we used this number of rows for our test. Also, since in a live system this table receives a constant influx of updates, MVCC can leave us with quite a bit of “bloat” that would skew our analysis. So in order to compare in a clean environment, we used brand new installs of the latest releases of both PostgreSQL 12 and 13. After loading the data into each version, the numbers are as follows:
Total table Size
PostgreSQL 12: 114 GB
PostgreSQL 13: 85 GB
A 25% reduction in the relation size? That’s awesome! 🥳
Digging into specifics we have:
PostgreSQL 12
Table: 42 GB
Index: 72 GB

PostgreSQL 13
Table: 42 GB
Index: 43 GB
As foretold in the release notes, the optimization only applies to the indexes, and we can reproduce it here. The table size is still the same, but the total index size shrank by about 40%, from 72 GB to 43 GB.
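The percentages are easy to re-derive from the sizes reported above. This is a small Python sketch of the arithmetic; the function name `reduction_pct` and the dictionary layout are just for illustration.

```python
def reduction_pct(before_gb, after_gb):
    """Percentage by which a size shrank going from `before_gb` to `after_gb`."""
    return (before_gb - after_gb) / before_gb * 100


# (PostgreSQL 12, PostgreSQL 13) sizes in GB, as reported in this post.
sizes_gb = {
    "total": (114, 85),
    "table": (42, 42),
    "index": (72, 43),
}

for name, (pg12, pg13) in sizes_gb.items():
    pct = reduction_pct(pg12, pg13)
    print(f"{name}: {pg12} GB -> {pg13} GB ({pct:.1f}% smaller)")
```

The table itself contributes nothing to the savings, so the whole 29 GB difference comes from the indexes.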
If we enhance it further:
    PostgreSQL 12

                   relation               |    size
    --------------------------------------+------------
     public.post_timings                  | 42 GB
     public.post_timings_unique           | 30 GB
     public.index_post_timings_on_user_id | 21 GB
     public.post_timings_summary          | 21 GB

    PostgreSQL 13

                   relation               |    size
    --------------------------------------+------------
     public.post_timings                  | 42 GB
     public.post_timings_unique           | 30 GB
     public.post_timings_summary          | 6939 MB
     public.index_post_timings_on_user_id | 6766 MB
Again, as expected, the UNIQUE index, which by definition has zero duplication, saw no change in its size, but the indexes with repeating values got optimized down to just a third of their original size.
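Why the unique index does not benefit can be seen with a toy model. The Python sketch below is not how PostgreSQL implements B-Tree deduplication; it is a simplified illustration that assumes deduplication amounts to storing each distinct key once alongside a list of row pointers. It ignores page boundaries and per-entry overhead, and the sample data and function names are invented for the example.

```python
from collections import defaultdict


def plain_entries(rows, key_cols):
    """Model without deduplication: one (key, row_pointer) entry per row."""
    return [(tuple(row[c] for c in key_cols), tid) for tid, row in enumerate(rows)]


def deduplicated_entries(rows, key_cols):
    """Model with deduplication: each distinct key stored once with its row pointers."""
    posting = defaultdict(list)
    for key, tid in plain_entries(rows, key_cols):
        posting[key].append(tid)
    return dict(posting)


def keys_stored(rows, key_cols):
    """Return (key copies without deduplication, key copies with deduplication)."""
    return len(plain_entries(rows, key_cols)), len(deduplicated_entries(rows, key_cols))


# Made-up sample: 3 topics x 4 posts x 5 users, so every full triple is unique.
rows = [
    {"topic_id": t, "post_number": p, "user_id": u}
    for t in range(3)
    for p in range(4)
    for u in range(5)
]

for cols in [("user_id",), ("topic_id", "post_number"), ("topic_id", "post_number", "user_id")]:
    before, after = keys_stored(rows, cols)
    print(f"{cols}: {before} key copies -> {after}")
```

In this model the two non-unique indexes store far fewer key copies, while the three-column unique key stays at one copy per row. The row pointers still have to be stored either way, so real space savings are smaller than the drop in key copies suggests.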
Not only does the index size change, but so does performance. According to the PostgreSQL documentation on the topic:
This significantly reduces the storage size of indexes where each value (or each distinct combination of column values) appears several times on average. The latency of queries can be reduced significantly. Overall query throughput may increase significantly. The overhead of routine index vacuuming may also be reduced significantly.
They also add a caveat that write-heavy workloads with no duplication will incur a small fixed performance penalty. That’s not our case here, but if it were, it would be alleviated by the fact that this table is written to in a completely async code path in our application: it’s a background request in our client and a non-blocking route in our Rails app that leverages Rack Hijack.
So the prophecy was true: PostgreSQL 13 brings significant improvement to Discourse!
That’s a big deal, because here we saw the effect in one table in a single database, where our database schema has dozens of tables. And we host thousands of Discourse instances, with multiple PostgreSQL instances each for High Availability, so the gains are multiplied many times over.
Discourse ❤️ PostgreSQL
As we said in Discourse Gives Back 2017, Discourse has always been a 100% open source project that builds upon the decades of hard work of many other open source projects to survive. As we grow we’re happy to be able to also contribute directly to funding the projects we rely on the most. That is why last year we made another monetary donation to the PostgreSQL foundation and we aim to do the same every year.
Want to use Discourse but unsure about where to start? This curated list of articles will help enhance your Discourse knowledge right away!
Dive into your first Discourse site after learning how to browse through topics, read posts and participate in civilized discussion!
If you’re a Discourse moderator, this guide will run through most common scenarios in detail and show you how each can be handled with Discourse.
Categories or channels? Topics or threads? Posts or messages? Read our nomenclature guide and know the correct term for every situation, every time.
Fascinated by a Discourse that looks nothing like it’s supposed to? Find out more by jumping into the world of themes, theme-components, color palettes and more. Discourse can be customized to nearly any extent; see our diverse list of customer sites if you don’t believe that yet.
All of the advice above is valid for every Discourse instance, regardless of whether you self-host or use our fully managed hosting service. If you have more questions about Discourse, do a quick search or post them on the Discourse Meta, where our helpful community would be happy to assist.