Intelcom | Dragonfly

Site Reliability Engineer (SRE)

Posted 13 Days Ago
In-Office
Montréal, QC
Mid level
As an SRE Specialist, you'll manage incidents, automate tasks, optimize performance, and collaborate with development teams to improve applications and systems. Responsibilities include monitoring application health, disaster recovery planning, and promoting a culture of innovation within the team.
The summary above was generated by AI
Intelcom | Dragonfly 

With more than 100 sorting stations and operations across three continents, Intelcom | Dragonfly is Canada’s leader in last-mile logistics. Our vision is clear: to deliver fast, accurate, and reliable service powered by cutting-edge technology. 


A Strategic Role at the Heart of Logistics

Responsibilities

  • Incident Management: Detect and respond to issues, ensuring rapid recovery to minimize downtime. Current on-call contributors need better coordination and more structured investigations. This role involves responding to off-hours events, though these are cyclical, with quieter periods in between. Define and implement an escalation process, communicate it to stakeholders across the business, and ensure their adherence to it. Document incident reports and conduct post-mortems to support continuous improvement.
  • Collaboration: Work closely with development and operations teams to ensure smooth deployment and operation of applications. Provide primary operational support and engineering for large-scale distributed software applications. Collaborate with development teams to improve services through rigorous testing and release procedures. Participate in system design consulting, platform management, and capacity planning. This requires diligent follow-up and close collaboration with all teams.
  • Influence: Create sustainable systems and services through automation and enhancements. Promote a culture of innovation and continuous improvement within the SRE team and the broader organization. Work with the SRE team manager to establish and execute operational policies that promote agility and scalability. Coordinate and mentor other SRE team members, fostering their professional growth and development.
  • Automation: Automate repetitive tasks to improve efficiency and reduce human error. Improve the reliability, quality, and time-to-market of our software solutions. Measure and optimize system performance in anticipation of business needs.
  • Monitoring and Alerting: Implement and enhance monitoring systems (e.g., Datadog) to track the health and performance of applications and infrastructure; some systems already exist, but additional ones are needed. Monitor and maintain the production environment, ensuring high availability and system health. Gather and process metrics from operating systems and applications to assist in performance tuning and fault finding. Develop a health monitoring dashboard that gives our various stakeholders visibility into the production environment.
  • Disaster Recovery: Prepare and implement disaster recovery plans to manage unexpected outages.
  • Performance Optimization: Continuously improve system performance and scalability.
  • Capacity Planning: Ensure the infrastructure can handle current and future demands.
  • Chaos Engineering: Intentionally introduce failures to test system resilience and improve robustness.

Qualifications

  • Bachelor's degree in software engineering, computer science or equivalent.
  • 3+ years of experience in cloud management, development, and/or SRE responsibilities.
  • Experience with Agile methodology and technical project execution. Knowledge of DevOps concepts, AWS, Azure, GCP, observability tools (Datadog, Cloudflare), Terraform, and PagerDuty, and of how to integrate them.

Other Skills:

  • Strong initiative and resilience, with a demonstrated ability to explore new ideas and innovative approaches to solving complex problems.
  • Excellent interpersonal and communication skills in both French and English.
  • Able and comfortable working in a fast-moving environment.

Schedule: Primarily daytime hours, but on-call availability is required for the initial months to observe and refine existing processes.

Join Our Team

Be part of a dynamic and innovative company at the forefront of the last-mile delivery industry. If you are a strategic thinker and results-driven leader who is passionate about driving business growth, we’d love to hear from you.

Why Join Us? 

At Intelcom | Dragonfly, you’ll thrive in a flexible and stimulating environment, surrounded by passionate talent. You’ll also enjoy a wide range of benefits: 

  • On-site gym with a personal trainer
  • Employer-provided lunch of your choice
  • Comprehensive group insurance
  • Group RRSP plan
  • Wellness days
  • Partial reimbursement for public transportation
  • Employee Assistance Program

…and much more.


Diversity & Inclusion 

At Intelcom | Dragonfly, we move forward guided by strong values: collaboration, innovation, excellence, and responsibility. 

We embrace diversity, ensure equity, and foster a true sense of belonging. 

Accommodation measures are available for individuals with disabilities throughout our recruitment process, in compliance with the law. Please let us know if you have any specific needs. 

Top Skills

AWS
Azure
Cloudflare
Datadog
GCP
PagerDuty
Terraform

Intelcom | Dragonfly Montréal, Québec, Canada Office

200-1380 William Street, Montréal, Quebec, Canada H3C 1R5

Similar Jobs

25 Days Ago
In-Office
2 Locations
Senior level
Software
The Senior Site Reliability Engineer ensures systems reliability, scalability, and performance for classified government projects, focusing on DevOps methodologies and coding expertise.
Top Skills: Elastic, Git, Go, Grafana, Helm, Kubernetes, Linux, PowerShell, Prometheus, Python, Ruby, Shell Scripting, Splunk
6 Days Ago
In-Office
Montréal, QC, CAN
Senior level
Logistics • Transportation
The Senior Site Reliability Engineer will manage incidents, enhance automation, optimize performance, and collaborate with teams for operational efficiency, while mentoring SRE members and developing disaster recovery plans.
Top Skills: AWS, Azure, Cloudflare, Datadog, GCP, PagerDuty, Terraform
8 Days Ago
In-Office or Remote
2 Locations
Senior level
Fintech • Payments • Financial Services
As the Observability SRE, you will oversee observability, monitoring, and reliability for Flinks' products, ensuring compliance with SLIs/SLOs, managing incident responses, and automating processes while collaborating across teams to implement best practices.
Top Skills: APM, C#, ELK, Go, Grafana, Kubernetes, OpenTelemetry, OTel, Prometheus, Pyroscope

What you need to know about the Montreal Tech Scene

With roots dating back to 1642, Montreal is often recognized for its French-inspired architecture and cobblestone streets lined with traditional shops and cafés. But what truly sets the city apart is how it blends its rich tradition with a modern edge, reflected in its evolving skyline and fast-growing tech industry. According to the economic promotion agency Montréal International, the city ranks among the top in North America for investment in artificial intelligence, making it the ideal spot for job seekers who want the best of both worlds.

Key Facts About Montreal Tech

  • Number of Tech Workers: 255,000+ (2024, Tourisme Montréal)
  • Major Tech Employers: SAP, Google, Microsoft, Cisco
  • Key Industries: Artificial intelligence, machine learning, cybersecurity, cloud computing, web development
  • Funding Landscape: $1.47 billion in venture capital funding in 2024 (BetaKit)
  • Notable Investors: CIBC Innovation Banking, BDC Capital, Investissement Québec, Fonds de solidarité FTQ
  • Research Centers and Universities: McGill University, Université de Montréal, Concordia University, Mila Quebec, ÉTS Montréal
