Our New Datacenter Cabinet

Jeff Atwood November 12, 2015

As Discourse grows, we’re adding more server capacity and newer servers to make sure our hosting remains blazing fast.

We have one server cabinet at Hurricane Electric, which we have been very happy with, and we just upgraded our account to add another full 42U server cabinet and another gigabit internet connection.

[Photo: the new cabinet’s network switch]

[Photo: the new cabinet’s servers]

Cables, as always, are color-coded:

IPMI VPN, private local intra-server network, switch cross connect, cabinet cross connect

Right now it’s a partially populated cabinet: just our standard Cisco 2960X switch (plus an inexpensive Netgear switch dedicated to the private IPMI management connections) and seven of our new, faster, Skylake-based 1U servers for internal testing. This also lets us exercise our cross-cabinet connection muscles, so if we need to add three, four, or even more cabinets to support our future customers, we’ll be ready.

Colocation isn’t our only plan. We’ve also been pushing for more of a hybrid cloud arrangement, where …

  1. Free trials can be deployed to the cloud for potential customers, and then migrated to our super fast hosted infrastructure when the trial converts.
  2. Enterprise customers who are uncomfortable with our single datacenter can have a backup cloud instance on hot standby that we can automatically switch to in the event that something happens to the he.net datacenter.

The first item is particularly exciting since it would let us scale up our free trials and offer Discourse to many more people at lower cost.

Here’s to an even faster Discourse hosting plan in 2016! Stay tuned!

Our Discourse Hosting Configuration

Michael Brown January 27, 2015

We’ve talked about the Discourse server hardware before. But now let’s talk about the Discourse network. In the physical and software sense, not the social sense, mmmk? How exactly do we host Discourse on our servers?

Here at Discourse, we prefer to host on our own super fast, hand built, colocated physical hardware. The cloud is great for certain things but like Stack Exchange, we made the decision to run on our own hardware. That turned out to be a good decision as Ruby didn’t virtualize well in our testing.

The Big Picture

(Yes, that’s made with Dia – still the quickest thing around for network diagramming, even if the provided template images are from the last century.)

Tie Interceptors

Nothing but the finest bit of Imperial technology sits out front of everything, ready to withstand an assault from the Internet at large on our application:



The tieinterceptor servers handle several important functions:

  • ingress/egress: artisanally handcrafted iptables firewall rules control all incoming traffic and ensure that certain traffic can’t leave the network
  • mail gateway: the interceptors are responsible for helping to ensure that mail reaches its destination: DKIM signing, (potentially) hashcash signing, and header cleanup
  • haproxy: using haproxy allows the interceptors to act as load balancers, dispatching requests to the web tier and massaging the responses
  • keepalived: we use keepalived for its VRRP implementation. We give keepalived rules such as “if haproxy isn’t running, this node shouldn’t have priority” and it performs actions based on those rules – in this case adding or removing a shared IPv4 (and IPv6) address from the Internet-facing network
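
To make that keepalived rule concrete, here is a minimal configuration sketch; the interface name, virtual router ID, and shared address are placeholders rather than our actual production values:

    # /etc/keepalived/keepalived.conf (sketch)
    vrrp_script chk_haproxy {
        script "pidof haproxy"   # non-zero exit means haproxy is not running
        interval 2               # check every 2 seconds
        weight -20               # drop this node's priority by 20 on failure
    }

    vrrp_instance VI_PUBLIC {
        state BACKUP             # both interceptors start as BACKUP; priority decides the MASTER
        interface eth0           # hypothetical Internet-facing interface
        virtual_router_id 51
        priority 100             # the peer uses a different base priority
        advert_int 1
        virtual_ipaddress {
            203.0.113.10/24      # documentation-range placeholder for the shared address
        }
        track_script {
            chk_haproxy          # losing haproxy lowers priority, so the address moves to the peer
        }
    }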

Tie Fighters (web)

The tiefighters represent the mass fleet of our Death Star, our Docker servers. They are small, fast, identical – and there are lots of them.

They run the Discourse server application in stateless docker containers, allowing us to easily set up continuous deployment of Discourse with Jenkins.

What’s running in each Docker container?

  • Nginx: Wouldn’t be web without a webserver, right? We chose Nginx because it is one of the fastest lightweight web servers.
  • Unicorn: We use Unicorn to run the Ruby processes that serve Discourse. More Unicorns = more concurrent requests (see the proxy sketch after this list).
  • Anacron: Server scheduling is handled by Anacron, which keeps track of scheduled commands and scripts, even if the container is rebooted or offline.
  • Logrotate and syslogd: Logs, logs, logs. Every container generates a slew of logs via syslogd and we use Logrotate to handle log rotation and maximum log sizes.
  • Sidekiq: For background server tasks at the Ruby code level we use Sidekiq.
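
As a rough sketch of how the Nginx and Unicorn pieces fit together inside a container, here is a minimal reverse-proxy configuration; the socket path is a placeholder, not our actual container setup:

    # nginx sketch: hand requests to Unicorn workers over a local socket
    upstream discourse {
        server unix:/var/run/unicorn.sock fail_timeout=0;  # hypothetical socket path
    }

    server {
        listen 80;
        server_name _;

        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://discourse;  # more Unicorn workers = more concurrent requests
        }
    }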

Shared data files are persisted on a common GlusterFS filesystem shared between the hosts in the web tier, using a 3-Distribute 2-Replicate setup. Gluster has performed pretty well, but doesn’t seem to tolerate change very well: replacing or rebuilding a node is a bit of a gut-wrenching operation that feels like yanking a disk out of a RAID10 array, plugging a new one in, and hoping the replication goes well. I want to look at Ceph as a distributed filesystem store, since it can provide both an S3-like object interface and a multi-mount POSIX filesystem.
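
For reference, a 3-Distribute 2-Replicate volume is built from six bricks, mirrored in pairs; a sketch with hypothetical hostnames and brick paths:

    # GlusterFS sketch: 3 distribute x 2 replicate = 6 bricks (names are placeholders)
    gluster volume create shared replica 2 \
        web1:/bricks/shared web2:/bricks/shared \
        web3:/bricks/shared web4:/bricks/shared \
        web5:/bricks/shared web6:/bricks/shared
    gluster volume start shared

    # each web host then mounts the volume with the FUSE client
    mount -t glusterfs web1:/shared /shared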

Tie Fighters (database)

We are using three of the Ties with newer SSDs as our Postgres database servers:

  1. One hosts the databases for our business-class containers (a single Discourse application image hosting many sites) and standard-tier plans.
  2. One hosts the databases for the enterprise class instances.
  3. One is the standby for both of these – it takes the streaming replication logs from the primary DBMSes and is ready to be promoted in the event of a serious failure.
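
At the Postgres level, that standby arrangement looks roughly like the sketch below (9.x-era settings; hostnames, credentials, and paths are placeholders, not our actual configuration):

    # postgresql.conf on each primary (sketch)
    wal_level = hot_standby
    max_wal_senders = 3

    # recovery.conf on the standby (sketch); promotion is then a matter of
    # running `pg_ctl promote` or touching the trigger file
    standby_mode = 'on'
    primary_conninfo = 'host=db-primary.internal user=replicator password=...'
    trigger_file = '/var/lib/postgresql/promote.trigger'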

Tie Fighter Prime

The monster of the group is tiefighter1. Unlike all the other Ties, it is provisioned with 8 processors and 128 GB of memory. We’ve been trying to give it more and more to do over the past year, and I’d say that’s been a success.

Although it is something of a utility and VM server, one of the most important jobs it handles is our redis in-memory network cache.

Properly separating redis instances from each other has been on our radar for a while. They were already configured to use separate redis databases for partitioning, but that still meant instances could affect each other, most notably the multisite configurations that connected to the same redis server but used different redis databases.

We had an inspiration: use the password functionality provided by the redis server to automatically drop any connection into an isolated redis backend keyed on its password. A new password automatically creates a new instance tied specifically to that password. Separation, security, ease of use. A few days later, Sam came back with redismux. It’s been chugging along since being moved inside Docker in September.
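
From a client’s point of view this is just an ordinary authenticated redis connection; a sketch with a hypothetical host and passwords (the multiplexer decides which isolated backend each password maps to):

    # redis-cli sketch: the password selects (and, if new, creates) an isolated backend
    redis-cli -h redismux.internal -p 6379 -a secret-for-site-a SET foo bar
    redis-cli -h redismux.internal -p 6379 -a secret-for-site-b GET foo   # (nil) -- a different backend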

Jenkins

Jenkins is responsible for “all” of our internal management routines (why would you do tasks over and over when you can automate them?). The biggest of these, of course, is our build and deployment process. We have a series of jobs set up that run automatically on a GitHub update.

Total duration from a push to the main GitHub repository to the code running in production: 12 minutes, 8 of which are spent building the new master Docker image.
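
Boiled down, the deploy jobs amount to something like the following shell sketch; the registry, image name, and host list are placeholders, and the real jobs include tests, health checks, and rollback steps:

    #!/bin/bash
    # Jenkins deploy job sketch, triggered by a push to master
    set -euo pipefail

    # 1. build a fresh image from the just-pushed code (the ~8 minute step)
    docker build -t registry.internal/discourse:master .
    docker push registry.internal/discourse:master

    # 2. roll the new image out to each web host
    for host in web1 web2 web3; do
        ssh "$host" "set -e
          docker pull registry.internal/discourse:master
          docker rm -f web >/dev/null 2>&1 || true
          docker run -d --name web registry.internal/discourse:master"
    done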

It’s taken us about a year (and many, many betas of Docker and Discourse) to reach this as a reasonably stable configuration for hosting Discourse. We’re sure it will change over time, and we will continue to scale it out as we grow.

The Discourse Servers

Jeff Atwood April 15, 2013

When we moved to our new datacenter, I didn’t elaborate on exactly what sort of hardware we have in the rack. But now I will.

[Photo: the servers in our Hurricane Electric cabinet]

There are 13 servers here now, all variously numbered Tie Fighters — derived from our internal code name for the project while it was a secret throughout 2012.

Tie Fighter 1

This is a very beefy server that we bought first with the idea that we’d do a lot of virtualization on one large, expensive, many-core server.

  • Intel Xeon E5-2680 2.7 GHz / 3.5 GHz 8-core turbo
  • 128 GB DDR3 1333MHz ECC Reg (8 x 16GB)
  • 8 × Samsung 850 Pro 1TB SSD
  • LSI 3ware 9750-8i SAS RAID Controller

Specs:

  • 8 x 2.5″ hot-swap drive bays
  • Dual gigabit ethernet (both connected as peers)
  • Integrated IPMI 2.0 ethernet
  • 330W gold efficiency power supply
  • SuperMicro X9SRE-3F mobo
  • SuperMicro 1017R-MTF case

We didn’t build this one, but purchased it from PogoLinux where it is known as the Iris 1168. We swapped out the HDDs for SSDs in early 2016.

Tie Fighter 2 – 11

Turns out that Ruby is … kind of hard to virtualize effectively, so we ended up abandoning that big iron strategy and going with lots of cheaper, smaller, faster boxes running on bare metal. Hence these:

  • Intel Xeon E3-1280 V2 Ivy Bridge 3.6 GHz / 4.0 GHz quad-core turbo
  • 32 GB DDR3 1600MHz ECC Reg (4 x 8 GB)
  • 2 × Samsung 512 GB SSD in software mirror (830s on Ties 2 – 5, 840 Pros on Ties 6 – 11; see the mirror sketch below)

Specs:

  • 4 x 2.5″ hot-swap drive bays
  • Dual gigabit ethernet (both connected as peers)
  • Integrated IPMI 2.0 ethernet
  • 330W gold efficiency power supply
  • SuperMicro X9SCM-F-O mobo
  • SuperMicro CSE-111TQ-563CB case
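
The software mirror in the drive list above is presumably plain Linux md (mdadm) RAID1; here is a sketch of how such a mirror is created, with hypothetical device names:

    # Linux software RAID1 sketch (device names are placeholders)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    cat /proc/mdstat          # watch the initial sync
    mkfs.ext4 /dev/md0        # then format and mount as usual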

I built these, which I documented in Building Servers for Fun and Prof… OK, Maybe Just for Fun. It’s not difficult, but it is a good idea to amortize the build effort across several servers so you get a decent return for your hours invested.

Tie Bomber

[Photo: the NetApp FAS2240A, aka Tie Bomber]

Our one concession to ‘big iron’, this is a NetApp FAS2240A storage device. It has:

  • two complete, redundant devices with high speed crossover rear connections
  • dual power supplies per device
  • 5 ethernet connections (1 management, 4 data) per device
  • 12 × 7.2K RPM 2 TB drives per device

It’s extremely redundant and handles all our essential customer files for the sites we host.

What about redundancy?

We have good live redundancy with Tie 1 – 11; losing Tie 3 or 4 wouldn’t even be noticed from the outside. The most common failure points for servers are hard drives and PSUs, so just in case, we also keep the following “cold spare” parts on hand, sitting on a shelf at the bottom of the rack:

[Photo: cold spare parts on the shelf at the bottom of the rack]

  • 2 × X306A-R5 2TB drives for Tie Bomber
  • 4 × Samsung 512 GB SSD spares
  • 2 × Samsung 1 TB SSD spares
  • 2 × SuperMicro 330W spare PSU for Tie 1 – 11

It’s OK for the routing servers to be different, since they are almost fixed function devices, but if I could go back and do it over again, I’d spend the money we paid for Tie 1 on three more servers like Tie 2 – 11 instead. The performance would be better, and the greater consistency between servers would simplify things.

Networking

[Photo: the switches in our Hurricane Electric cabinet]

Cables are color-coded:

IPMI VPN, private local intra-server network, incoming Internet, switch cross connect, cabinet cross connect, NetApp file storage device

  • The primary networking duties are handled by a rack mount Cisco Catalyst 2960X 48 port switch.
  • We have a second stacked Catalyst 2960X live and ready to accept connections. (We’re slowly moving half of each bonded server connection to the other switch for additional redundancy; a bonding sketch follows this list.)
  • There’s also a NetGear ProSafe 24-port switch dedicated to IPMI routing duties to each server.
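
Each server’s bonded connection is presumably ordinary Linux ethernet bonding; a minimal sketch in the Ubuntu /etc/network/interfaces style, assuming illustrative interface names, addresses, and an active-backup mode rather than our exact settings:

    # /etc/network/interfaces sketch (names, addresses, and mode are placeholders)
    auto eth0
    iface eth0 inet manual
        bond-master bond0

    auto eth1
    iface eth1 inet manual
        bond-master bond0

    auto bond0
    iface bond0 inet static
        address 10.0.0.21          # placeholder private address
        netmask 255.255.255.0
        bond-mode active-backup    # survives losing either switch
        bond-miimon 100            # link check interval, in ms
        bond-slaves none           # slaves attach via bond-master above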

We use four Tie Shuttle boxes as inexpensive, solid-state Linux OpenVPN access boxes to all the dedicated IPMI KVM-over-Internet ethernet management ports on each server. IPMI 2.0 really works!
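
Once you’re on that VPN, the management ports speak standard IPMI 2.0, so a tool like ipmitool works directly against them; a sketch with placeholder addresses and credentials:

    # IPMI 2.0 over the management VPN (address and credentials are placeholders)
    ipmitool -I lanplus -H 10.1.1.12 -U admin -P 'secret' chassis status
    ipmitool -I lanplus -H 10.1.1.12 -U admin -P 'secret' chassis power cycle
    ipmitool -I lanplus -H 10.1.1.12 -U admin -P 'secret' sol activate   # serial console over the LAN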

But… is it webscale?

During our launch peak loads, the servers were barely awake. We are ridiculously overprovisioned at the moment, and I do mean ridiculously. But that’s OK, because hardware is cheap, and programmers are expensive. We have plenty of room to scale for our partners and the eventual hosting service we plan to offer. If we do need to add more Tie Fighters, we still have a ton of space in the full rack, and ample power available at 15 amps. These servers are almost exclusively Ivy Bridge CPUs, so all quite efficient — the whole rack uses around 6-10 amps in typical daily work.

As for software, we’re still hashing out the exact configuration details, but we’ll be sure to post a future blog entry about that too. I can tell you that all our servers run Ubuntu Server 14.04 LTS x64, and it’s working great for us!
