How We Fixed YAML Comment Preservation in Ruby (And Why We Sponsored It)
Discourse solved a five-year infrastructure headache by sponsoring psych-pure, a Ruby YAML library that preserves comments through programmatic edits - keeping years of hard-won institutional knowledge intact.
For five years, our infrastructure runbooks had a problem: every time they programmatically edited a YAML file, all the comments disappeared.
The Problem
Discourse’s infrastructure team maintains hundreds of YAML configuration files. We also have runbooks (Ruby scripts) that modify these files for operations like database failovers, cluster migrations, and credential rotations.
The issue: Ruby’s standard YAML library (Psych) doesn’t preserve comments.
require 'yaml'
data = YAML.load_file('config.yml')
data['new_key'] = 'value'
File.write('config.yml', YAML.dump(data))
# Every comment in the file is now gone
This isn’t a bug; the YAML spec treats comments as “presentation details” that parsers may discard, and most do.
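The effect is easy to reproduce without touching a real file. The snippet below is a minimal sketch using only Ruby's standard library; the sample document is made up for illustration.

```ruby
require 'yaml'

# Hypothetical config snippet; the comment stands in for real institutional knowledge.
source = <<~YAML
  # DO NOT CHANGE - legacy app expects this exact port
  port: 8080
YAML

data = YAML.load(source)          # the comment is discarded while parsing
data['new_key'] = 'value'
round_tripped = YAML.dump(data)   # so there is nothing left to write back out

puts round_tripped
puts "comment survived? #{round_tripped.include?('DO NOT CHANGE')}"
```

The data round-trips correctly, but the comment never reaches the Ruby hash, so no dumper option can bring it back.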
By 2021, the problem was actively causing pain. Certain runbooks were described as “butchering the docker-hosting files.” Comments gone, indentation changed, strings randomly quoted or unquoted.
Engineers either manually re-added comments after every runbook run, or stopped adding comments because they knew they’d be lost. Neither option is good.
Previous Attempts
A colleague tried to solve this in 2021 using psych-pure, a pure Ruby implementation of Psych. But psych-pure wasn’t stable enough at the time. Parsing would fail on edge cases in our production files. The project stalled.
Other approaches we considered:
- String manipulation: Regex find-and-replace. Breaks on any non-trivial YAML structure.
- Psych AST manipulation: You can coax Psych to preserve some information by working with its internal AST, but it’s undocumented and brittle.
- Port ruamel.yaml from Python: Possible, but a massive undertaking.
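To make the first point concrete, here is a small sketch of how a regex edit can go wrong. The file contents and the intended change are hypothetical.

```ruby
require 'yaml'

# Hypothetical file in which two services both define `port`.
source = <<~YAML
  db:
    port: 5432
  # DO NOT CHANGE - legacy app expects this exact port
  web:
    port: 8080
YAML

# Intent: move the web server to port 9090.
# The pattern knows nothing about nesting, so it rewrites the first match it finds.
edited = source.sub(/port: \d+/, 'port: 9090')

config = YAML.load(edited)
puts "db port:  #{config['db']['port']}"
puts "web port: #{config['web']['port']}"
```

The comment survives because the text was never parsed, but the edit lands on the database port instead of the web port. Making the pattern structure-aware amounts to writing a YAML parser.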
The Solution: Sponsoring psych-pure
By late 2025, Kevin Newton had been actively improving psych-pure. We reached out about sponsoring the remaining work to make it production-ready.
The key insight behind psych-pure: while the YAML spec says comments are “presentation details,” nothing prevents a parser from keeping them. psych-pure attaches comments to their adjacent nodes in the AST. When you serialize back to YAML, the comments come along.
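The idea of attaching comments to adjacent nodes can be illustrated with a toy model. This is not psych-pure's implementation or API. It is a deliberately tiny sketch that only understands flat `key: value` lines and full-line `#` comments, and every name in it (`Entry`, `toy_parse`, `toy_set`, `toy_dump`) is made up for this example.

```ruby
# Toy model only: each entry remembers the comment lines that preceded it.
Entry = Struct.new(:key, :value, :comments)

def toy_parse(source)
  pending = []
  source.each_line(chomp: true).each_with_object([]) do |line, entries|
    if line.start_with?('#')
      pending << line                         # hold the comment until its key appears
    elsif (match = line.match(/\A(\w+):\s*(.*)\z/))
      entries << Entry.new(match[1], match[2], pending)
      pending = []                            # start collecting for the next key
    end
  end
end

def toy_set(entries, key, value)
  entry = entries.find { |e| e.key == key }
  if entry
    entry.value = value                       # the attached comments are untouched
  else
    entries << Entry.new(key, value, [])
  end
end

def toy_dump(entries)
  entries.flat_map { |e| e.comments + ["#{e.key}: #{e.value}"] }.join("\n") + "\n"
end

source = <<~YAML
  # Increased from 512MB after OOM on 2024-03-15, see incident #423
  memory_limit: 1024
  # TODO: Remove after Rails 8 upgrade
  legacy_cookie_format: true
YAML

entries = toy_parse(source)
toy_set(entries, 'memory_limit', '2048')
toy_set(entries, 'new_key', 'value')
output = toy_dump(entries)
puts output
```

Because the comments live on the entries rather than in a separate pass over the text, editing a value or appending a key leaves them in place when the structure is written back out.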
We sponsored Kevin to stabilize the library and fix the edge cases that broke our production files. After a few rounds of testing against 1,736 production YAML files across two repositories, we had zero parsing failures.
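A sweep like that does not need much machinery. The following is a rough sketch of the kind of script involved, not the one we actually ran: `parse_failures` is a hypothetical helper, and its default parser is stdlib Psych so the example runs without extra gems. Any callable that raises on invalid input can be passed in instead.

```ruby
require 'yaml'
require 'tmpdir'

# Returns { path => error message } for every YAML file the parser rejects.
# `parser` is any callable that raises on invalid input (assumption: stdlib Psych by default).
def parse_failures(root, parser: ->(source) { YAML.parse(source) })
  paths = Dir.glob(File.join(root, '**', '*.{yml,yaml}')).sort
  paths.each_with_object({}) do |path, failures|
    begin
      parser.call(File.read(path))
    rescue StandardError => e
      failures[path] = e.message
    end
  end
end

# Demo against a throwaway directory with one valid and one broken file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'good.yml'), "port: 8080\n")
  File.write(File.join(dir, 'bad.yml'), "templates: [unclosed\n")

  failures = parse_failures(dir)
  puts "#{failures.size} parsing failure(s)"
  failures.each_key { |path| puts "  #{File.basename(path)}" }
end
```

Collecting the failures instead of stopping at the first one gives a complete list of edge cases to report after each run.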
Integration: YamlHelper
With psych-pure stable, I built YamlHelper in our ops repo. The API is simple:
YamlHelper.edit_file('container.yml') do |data|
  data['env']['NEW_KEY'] = 'value'
  data['templates'] << 'new_template.yml'
end
This loads the file, yields the parsed data for modification, then writes it back with all comments preserved.
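The load, yield, write shape is small enough to sketch. The module below is a hypothetical stand-in, not the real YamlHelper: `YamlHelperSketch`, `loader`, and `dumper` are names invented for this example. The defaults use stdlib Psych so the snippet runs anywhere, which also means it drops comments. In a version like ours, preservation would come from passing a comment-preserving loader and dumper built on psych-pure.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical sketch of a block-based YAML editing helper.
module YamlHelperSketch
  # loader: callable taking YAML text and returning a mutable Ruby structure
  # dumper: callable taking that structure and returning YAML text
  # Assumption: defaults are stdlib Psych, which does NOT preserve comments.
  def self.edit_file(path, loader: YAML.method(:load), dumper: YAML.method(:dump))
    data = loader.call(File.read(path))
    yield data                          # caller mutates the parsed data in place
    File.write(path, dumper.call(data))
    data
  end
end

Tempfile.create(['container', '.yml']) do |file|
  file.write("env:\n  EXISTING: '1'\ntemplates:\n- base.yml\n")
  file.flush

  YamlHelperSketch.edit_file(file.path) do |data|
    data['env']['NEW_KEY'] = 'value'
    data['templates'] << 'new_template.yml'
  end

  puts File.read(file.path)
end
```

Keeping the parsing and serialization behind two callables means runbooks only ever see the block, so swapping the underlying library does not require touching them.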
Previously, many of our runbooks stripped comments. Now they don’t.
Testing
The test suite has 11 tests covering all the operations these runbooks perform: cluster management, database failover, S3 key rotation, container config edits. All passing.
One expected behavior: comments directly attached to deleted keys are also removed. If you delete database_url, the comment above it explaining what database_url does also disappears. This is correct behavior from psych-pure.
All other comments are preserved exactly.
Why Comments Matter
Comments in infrastructure config aren’t documentation for documentation’s sake. They’re institutional knowledge:
# DO NOT CHANGE - legacy app expects this exact port
port: 8080
# Increased from 512MB after OOM on 2024-03-15, see incident #423
memory_limit: 1024
# TODO: Remove after Rails 8 upgrade
legacy_cookie_format: true
Lose these and you lose context that took years to accumulate. Engineers make worse decisions. Incidents take longer to debug. Tribal knowledge stays tribal.
Automation that destroys comments actively discourages documentation. Why write a comment explaining a workaround if the next script run deletes it?
The Sponsorship Model
Sponsoring Kevin to finish psych-pure unlocks a fully maintained open source library that benefits the entire Ruby ecosystem.
This is the kind of targeted sponsorship that makes sense: a specific problem, a developer already working on a solution, and a clear deliverable. Working with Kevin has also been fantastic.
Using psych-pure
The gem is on RubyGems:
gem install psych-pure
If you’re building infrastructure automation in Ruby and touching YAML files, this solves a real problem.
yaml-janitor
I also built yaml-janitor, a linter around psych-pure that can fix formatting inconsistencies while preserving comments.