Maze (mazehq.com)

Infra/DevOps Engineer

Reposted 7 Days Ago
Remote
28 Locations
Senior level
As an Infra/DevOps Engineer, you'll design, implement, and maintain our infrastructure, ensuring scalability, security, and reliability while collaborating with engineers and optimizing performance.
The summary above was generated by AI

Summary of the Role:

As an Infra/DevOps Engineer at Maze, you'll be the architect of our complex, multi-account Kubernetes infrastructure, building and scaling the foundation that powers our AI-driven cybersecurity platform across isolated enterprise environments. This is a unique opportunity to join as one of the early engineering team members of a well-funded startup building at the intersection of generative AI and cybersecurity. You'll design, code, and maintain sophisticated infrastructure spanning 12-15 AWS accounts, each with dedicated Kubernetes clusters, ensuring complete data segregation for our security-conscious enterprise customers.

You'll take full ownership of our infrastructure-as-code implementation, managing multiple Kubernetes clusters at scale using cutting-edge tools like Karpenter, Flux, and Kustomize. Your success will be measured by infrastructure reliability, deployment velocity, and your ability to build self-managed, distributed systems that scale elegantly as we grow from startup to enterprise scale. This role is perfect for a hands-on infrastructure engineer who has mastered complex Kubernetes deployments at scale, writes production-grade infrastructure code, and thrives on building simple, elegant solutions to complex distributed systems challenges.

Your Contributions to Our Journey:

  • Architect Multi-Cluster Kubernetes Infrastructure: Design, implement, and write infrastructure-as-code for our complex Kubernetes setup spanning multiple AWS accounts, ensuring each cluster is completely isolated for enterprise security requirements while maintaining operational efficiency

  • Build Self-Managed, Distributed Systems: Develop infrastructure that manages itself through GitOps workflows using Flux and Kustomize, creating distributed systems where actions in one place automatically trigger appropriate changes across the infrastructure without manual intervention

  • Scale Kubernetes Operations: Manage and optimize dozens of Kubernetes clusters across our multi-tenant and single-tenant environments, implementing auto-scaling solutions with Karpenter and ensuring seamless scaling as customer workloads grow exponentially

  • Develop Production-Grade Automation: Write robust, maintainable code to build and maintain CI/CD pipelines, custom automation tools, and deployment scripts that enable rapid feature delivery while maintaining the highest reliability standards

  • Ensure Enterprise Security: Implement security best practices and compliance measures that protect our highly sensitive security data, managing firewalls, encryption, IAM policies, and network segregation across our multi-account AWS architecture

  • Optimize Platform Performance: Build comprehensive monitoring, logging, and alerting systems that proactively identify issues, using tools like Prometheus and Grafana to ensure our infrastructure scales efficiently as we handle increasingly complex workloads

  • Enable Engineering Velocity: Work closely with backend and data engineering teams to build self-service infrastructure capabilities, allowing teams to provision databases, deploy services, and scale resources independently without constant infrastructure team involvement

What You Need to Be Successful:

  • Kubernetes Mastery at Scale: 5+ years of infrastructure/DevOps experience with deep, hands-on expertise managing complex Kubernetes deployments—you must have experience with multiple Kubernetes clusters (tens of clusters) in sophisticated setups, not just simple single-cluster environments

  • GitOps and Modern K8s Tooling: Proven production experience with Karpenter (for auto-scaling), Flux (for GitOps), and Kustomize (for configuration management). If you have all three, you'll feel right at home with our infrastructure approach

  • AWS Infrastructure Expertise: Deep knowledge of AWS with hands-on experience managing complex multi-account architectures, understanding how to design for isolation, security, and scalability across numerous AWS accounts with proper networking and IAM configuration

  • Infrastructure-as-Code Excellence: Strong coding skills with production experience using Terraform or CloudFormation, writing maintainable, well-architected infrastructure code that follows best practices and scales with organizational growth. Proficiency in Python is essential for automation, tooling, and infrastructure development

  • Hands-On Coding: Currently active as a developer writing production code in Python for infrastructure automation, custom tooling, and operational scripts—you're not just an architect who delegates implementation

  • Simplicity-Driven Architecture: Proven ability to build simple, elegant solutions to complex infrastructure problems—you instinctively know the "right way" to use tools like Helm charts and avoid over-engineering while maintaining scalability

  • Platform Thinking: Experience building infrastructure with a platform mindset, creating systems that support multiple products and enable team self-service rather than building one-off solutions for individual applications

  • AWS Managed Services Philosophy: Understanding of when to use AWS managed services (RDS, MSK, EMR) versus building custom solutions, with experience scaling startups using managed services efficiently before investing in complex self-hosted infrastructure

  • Distributed Systems Mindset: Deep understanding of distributed systems principles with experience building infrastructure that is decentralized rather than centralized, allowing independent operation across multiple clusters and regions

  • Nice to haves:

    • Experience with AWS auto-scaling across complex, multi-cluster environments

    • Background in security-focused infrastructure or handling sensitive enterprise data

    • Previous experience growing infrastructure at scale-ups that expanded from 20 to 100+ engineers

    • Knowledge of infrastructure observability tools beyond Prometheus/Grafana (e.g., ELK Stack)

    • Track record of building infrastructure that went through SOC2, ISO, or similar compliance certifications

Why Join Us:

  • Ambitious Infrastructure Challenges: We're using generative AI (LLMs and agents) to solve critical cybersecurity challenges, requiring sophisticated infrastructure that handles sensitive security data across isolated enterprise environments. You'll build the foundation for breakthrough AI-powered security solutions at unprecedented scale.

  • Expert Team: We are a team of hands-on leaders with deep experience in Big Tech and Scale-ups. Our team has been part of the leadership teams behind multiple acquisitions and an IPO.

  • Impactful Work: Cybersecurity is a force for good—helping stop cyber attacks ultimately helps deliver better outcomes for all of us. The infrastructure you build will directly enable security teams to protect organizations worldwide from real threats.

  • Build an AI-Native Company: We're building a new company in the AI era with the opportunity to design everything from the ground up—you'll architect infrastructure using cutting-edge Kubernetes practices and establish platform standards that will scale with us from startup through hypergrowth.

  • Technical Leadership Growth: Direct partnership with experienced engineering leadership, significant equity upside, and the opportunity to own and shape the entire infrastructure function as we scale our platform to support the world's largest enterprises.

Top Skills

Ansible
AWS
Azure
Chef
CloudFormation
ELK Stack
GCP
Grafana
Prometheus
Puppet
Terraform
