WorkOS

Database Reliability Engineer

Reposted 4 Days Ago
Remote
Hiring Remotely in Canada
Senior level
Own reliability, performance, and scalability of PostgreSQL infrastructure. Implement HA, replication, observability, capacity planning, automation, and DR. Support engineering teams with migrations, query optimization, on-call incident response, runbooks, and tooling to enable safe DB operations.
The summary above was generated by AI

About WorkOS 🚀

WorkOS builds modern developer tools and APIs that make it easy for companies to become Enterprise Ready. Our platform powers authentication, identity, authorization, and other critical infrastructure that developers need to securely scale their products to large organizations.
We recently raised a $100M Series C, valuing the company at $2B, led by Meritech and Sapphire with participation from Greenoaks, Craft, Abstract, and Audacious. WorkOS powers enterprise features for many of the fastest-growing AI companies, including OpenAI, Cursor, Perplexity, Vercel, and Plaid.
As AI reshapes software, WorkOS is at the frontier of Human and Agent Authentication, Identity, and Access Control—helping companies answer a new critical question: who are your agents, and what are they allowed to do? Our fast-growing customer base includes hundreds of modern software companies building the next generation of enterprise-ready products.

About the Infrastructure Team

The Infrastructure team ensures the WorkOS platform remains fast, reliable, and resilient at scale. We build the systems and practices that keep everything running smoothly—handling hundreds of millions of requests, minimizing downtime, and continuously improving service performance. Our team works across the stack and collaborates closely with product engineering teams.

As a Database Reliability Engineer on this team, you'll bring specialized database expertise to the Infrastructure organization. You'll own the full lifecycle of database management, from design and capacity planning through performance optimization and disaster recovery, ensuring data durability and scalability as WorkOS grows.

The Role

As a Database Reliability Engineer, you'll be the expert our engineering teams turn to for everything database-related. You'll work across the stack to ensure our PostgreSQL infrastructure (and related data stores) can support WorkOS's growth, from query optimization to capacity planning to incident response. You'll combine the mindset of a software engineer with deep database administration expertise to build automation, improve observability, and make our data layer self-healing wherever possible.

What You'll Do

  • Own the reliability, performance, and scalability of WorkOS's PostgreSQL infrastructure.

  • Analyze and implement best practices for our database clusters, including replication, connection pooling, high availability, and disaster recovery.

  • Build and maintain observability for database metrics (query performance, replication lag, connection saturation, storage growth) and ensure we meet our database SLOs.

  • Provide database expertise to product engineering teams through migration reviews, query optimization guidance, and schema design consultation.

  • Develop automation and self-service tooling that enables engineers to safely interact with databases without bottlenecking on the DBRE team.

  • Participate in on-call rotations and lead incident response for database-related production issues, performing root cause analysis and implementing permanent fixes.

  • Plan and manage database capacity, forecasting growth and ensuring our infrastructure can handle increased workloads.

  • Collaborate with SREs to roll out infrastructure changes to production environments, with a focus on minimizing risk to the data layer.

  • Document operational procedures, runbooks, and architectural decisions so learnings become repeatable actions and eventually automation.

  • Drive improvements to backup and recovery strategies, regularly testing and validating disaster recovery procedures.

About You

  • 5+ years of experience running PostgreSQL in production at scale, with strong knowledge of internals (WAL, MVCC, vacuum tuning, query planner, indexing, replication).

  • Solid software engineering skills. You write production-quality code, not just scripts. Experience with Python, Go, Ruby, or similar languages.

  • Experience with infrastructure-as-code and configuration management (Terraform, Ansible, Chef, or similar).

  • Strong SQL skills and the ability to review and optimize complex queries for high-throughput, low-latency environments.

  • Experience with database high-availability patterns: streaming replication, connection pooling (PgBouncer), failover automation (Patroni or similar).

  • Familiarity with cloud database services on AWS (RDS, Aurora, DynamoDB, ElastiCache) or equivalent platforms.

  • Experience with monitoring and observability tools (Datadog, Prometheus, Grafana, or similar) applied to database workloads.

  • Comfort with on-call responsibilities and a track record of effective incident response.

  • Strong written and verbal communication skills. You document your work and share context proactively.

  • A proactive, ownership-driven mindset. When you see something broken, you fix it. When you see a pattern of toil, you automate it.

Nice to Have

  • Experience with other data stores beyond PostgreSQL (Redis, DynamoDB, ClickHouse, Elasticsearch).

  • Familiarity with Ruby on Rails or Django and how ORMs interact with the database layer.

  • Experience with database migration tooling and blue-green or zero-downtime migration strategies.

  • Contributions to open-source database tooling or the PostgreSQL ecosystem.

  • Background in security-sensitive environments, particularly around data encryption, access controls, and compliance requirements.

Projects You Could Work On

  • Designing and implementing automated failover and self-healing for our PostgreSQL clusters.

  • Building a query performance analysis pipeline that surfaces slow queries and recommends index improvements before they become production issues.

  • Developing a database change management system that lets engineers safely run migrations with automated rollback capabilities.

  • Improving our disaster recovery posture by testing backup restoration, reducing recovery time objectives, and automating DR drills.

  • Creating capacity planning models that forecast database growth and trigger scaling actions proactively.

  • Building internal tooling and dashboards that give engineering teams visibility into their database usage patterns.

  • Optimizing our connection pooling and load balancing strategy across read replicas to improve throughput and reduce latency.

Benefits and Perks (US Only) 💖

At WorkOS, we offer resources that emphasize personal and familial well-being. We offer healthcare coverage for you and your family, including medical, dental, and vision, as well as parental leave, paid time off, and fully remote working arrangements.

Benefits include:

- Competitive pay

- Substantial equity grants

- Healthcare insurance (Medical, Dental and Vision) for you and your family

- 401(k) matching

- Wellness and fitness monthly allowances

- PTO + paid holidays + unlimited sick leave

- Unlimited token usage

Please inquire directly with our recruiting team for benefits available to those working outside the US.

Equal Opportunity Employer

WorkOS is an equal opportunity employer, committed to diversity and inclusiveness. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability or age.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
