Intact (intactfc.com)

SRE specialist

Posted 13 Days Ago
In-Office
Montréal, QC, CAN
Senior level
The SRE specialist will lead investigations into incidents, improve resiliency, implement observability tools, and coach teams on best practices within multi-cloud environments.
The summary above was generated by AI

Our employees are at the heart of everything we do. Together, we help people, businesses, and society prosper in good times and be resilient in bad times.

Our employee promise represents Intact’s commitment to you in exchange for living our Values, striving to do your best work, being open to change and investing in your career. In return, we promise to provide support, opportunities and performance-led financial rewards at a workplace where you can shape the future, win as a team and grow with us.

Pay at Intact is about much more than just salary.

  • Flexible work arrangements and a hybrid work model

  • Possibility to purchase up to 5 extra days off per year

  • Multiple benefits offered to support physical and mental wellbeing, including telemedicine, Wellness account and much more

  • Share plan & other savings: up to 12% of salary or even more (ask how you could earn guaranteed income for life)

Salary range (but not limited to): 109,900 - 134,300

Annual bonus target, based on the base salary, with a potential payout of up to double the target (subject to personal and company performance): 15%

As part of our commitment to Win As A Team, we share our success with employees through our annual bonus plan and Employee Share Purchase Plan (ESPP) – with Intact matching 50% of your net shares.

Our pension offerings provide flexibility and long-term security for our employees beyond their careers. We are one of the few companies offering the opportunity to receive guaranteed income for life via our defined benefit pension plan.

Salary for the candidate will be determined by a number of factors, including experience, skills, qualifications, anticipated contribution to the role, and internal equity. The salary range presented above is based on a 35-hour workweek and covers the majority of candidate profiles. However, we encourage candidates who may fall outside this range to apply as well.

About the role

We are seeking a hands-on Site Reliability Engineer within the Intelligent Operations Department’s SRE & Resiliency team. This role operates across Azure, AWS, GCP, and on‑prem environments, embedded in the broader enterprise resiliency and production reliability strategy. The SRE will function as part of a special investigations unit that empowers and enables Applicative Support, Infrastructure Support, and the Incident Management team—coaching, guiding, and leading investigations into active incidents and proactive reliability improvements. Core responsibilities include deep investigations, advanced observability (OpenTelemetry, Dynatrace, Elastic), auto-healing tooling, SLI/SLO stewardship, and business-aligned reliability reporting.


What you'll do here: 

Incidents & Investigations

  • Lead high‑severity investigations and RCA with App/Infra/Incident teams.

  • Proactively find systemic risks and resilience gaps; drive durable fixes.

  • Run blameless post‑mortems and coach teams.

Observability (OTel, Dynatrace, Elastic)

  • Implement end‑to‑end traces/metrics/logs with consistent semantics.

  • Build insights and anomaly detection; create topology‑aware health models.

  • Integrate synthetics, contract tests, and distributed tracing.

Auto‑Healing & Reliability Tooling

  • Build policy‑driven remediation (circuit breakers, throttling, retries).

  • Enable progressive delivery (blue/green, canary) with safe rollbacks.

  • Provide resilience tooling: validation, safeguards, chaos, DR, runbooks.

SLI/SLOs & Reporting

  • Define user‑centric SLIs/SLOs; enforce error budget policies.

  • Publish reliability reports and scorecards; drive continuous improvement.

Coaching & Leadership

  • Upskill support/incident teams; standardize playbooks and training.

  • Promote an automation‑first, data‑driven resilience culture.

Cloud & Platform Reliability

  • Operate across Azure/AWS/GCP/on‑prem; GLB, DNS, TLS, CDN, failover.

  • Improve K8s/mesh (AKS/EKS/GKE, Istio/Linkerd) and data/streaming resilience.

AI for Reliability

  • Use AI for causal analysis and anomaly detection to cut MTTR.

  • Develop reliability copilots; monitor AI systems for reliability and cost.



What you bring to the table: 

  • 8+ years of experience in SRE/Platform/Infrastructure/Software Engineering operating large-scale production systems across multi-cloud and on‑prem.

  • Strong proficiency in:

    • Observability: OpenTelemetry instrumentation and standards; Dynatrace (Davis AI, SmartScape, service-level analysis, baselining); Elastic/ELK (Beats/Agent, ingest pipelines, ILM, Kibana).

    • Reliability engineering: SLIs/SLOs/SLAs, error budgets, alert strategy, capacity modeling, graceful degradation, circuit breaking, retries/backoff.

    • CI/CD and deployment patterns: blue/green, canary, progressive delivery, automated rollback, pipeline safeguards.

    • Kubernetes and service meshes; platform-level resilience and operability.

    • Data and event systems: replication, snapshots/PITR, CDC, streaming (Kafka, RabbitMQ, Pub/Sub) with DLQs/reprocessing; dependency risk management.

    • Networking and traffic: DNS, load balancers, CDN/edge, TLS/mTLS; fundamentals of BGP and global traffic management.

  • Solid software engineering skills in at least one of: Go, Python, or TypeScript; experience with IaC (Terraform), GitOps (Argo CD/Flux), and policy-as-code.

  • Experience running chaos engineering, game days, and DR exercises; ability to design safe experiments and embed learnings into production hardening.

  • Excellent communication (written, visual, verbal); adept at coaching, leading investigations, and presenting to mixed technical/business audiences.

  • Bilingual (French and English): you will interact regularly with English-speaking clients and colleagues across the country.

  • No Canadian work experience is required; however, you must be eligible to work in Canada.

This position will fill an essential role in our team.


We are an equal opportunity employer

At Intact, our Value of respect is founded on seeing diversity as a strength. We strive to create an accessible workplace where employees feel valued, included and encouraged to share their unique perspectives.

We encourage applications from individuals who are members of equity-deserving groups, including but not limited to women, Indigenous peoples, persons with disabilities, Black people, and members of the 2SLGBTQI+ community.

As part of Intact’s commitment to reconciliation, we acknowledge that we work, meet and travel across the land currently called Canada, originally inhabited by First Nations, Métis and Inuit people. This history extends through many centuries and continues to evolve today.

We have policies to ensure equal access and participation for people with disabilities, including providing workplace adjustments (accommodations). A copy of applicable policies is available on request.

If we can provide a specific adjustment to make the recruitment process more accessible for you, please let us know when we reach out about a job opportunity. We’ll work with you to meet your needs.

Learn more about our recruitment process and your candidate journey here.

Please note that Intact does not provide sponsorship or other support for immigration-related matters including but not limited to employer-specific closed work permits. Candidates must be eligible to work in Canada from the anticipated start date and throughout their employment and are solely responsible for maintaining their work eligibility.

If you are an employee of Intact or belairdirect, please apply for this role on the Internal Career Site.

Top Skills

AWS
Azure
Dynatrace
Elastic
GCP
GitOps
Go
Kafka
Kubernetes
OpenTelemetry
Pub/Sub
Python
RabbitMQ
Terraform
TypeScript
