Coveo

Site Reliability Engineer - SRE

Reposted 15 Days Ago
Québec, QC
Mid level
The Site Reliability Engineer will enhance system reliability through automation, incident management, and operational efficiency improvements, while collaborating with developers and driving proactive performance monitoring.

Driving system reliability through automation, monitoring, and proactive incident management.

At Coveo, the Site Reliability Engineer (SRE) for the Index team will focus on improving operational efficiency and automating manual tasks. The SRE's workload will be evenly split, with 50% dedicated to operational tasks such as troubleshooting, debugging, and communication, and 50% focused on improvements, including automation, tool development, dashboard creation, and documentation.
Ultimately, the SRE will help foster a proactive, collaborative culture that anticipates and addresses performance issues.

Here is a glimpse at your responsibilities:

  • Define critical KPIs to monitor system health, and develop centralized dashboards for real-time visibility and performance tracking.
  • Act as the first line for all unplanned requests, maintain awareness of incoming tasks, and ensure progress tracking and proactive communication. Present maintenance summaries at each sprint review.
  • Identify common debugging workflows and create runbooks, tools, and dashboards to streamline the debugging process, improving resolution speed and system predictability.
  • Design and implement automation solutions to reduce manual interventions, and collaborate with developers to improve operational efficiency.
  • Propose and manage system limits to enhance predictability and prevent issues, and identify tools or processes to proactively address customer performance challenges.
  • Establish regular syncs with the Support team to align on priorities, gather feedback, and ensure visibility into ongoing efforts and challenges.

Here is what will qualify you for the role:

  • Solid technical knowledge of programming and scripting, particularly with Python.
  • Strong analytical and problem-solving skills.
  • Great communication skills and the desire to connect with developers and stakeholders.

What would make you stand out:

  • Ability to evaluate the broader impact of actions, balancing innovation with caution.
  • Experience with cloud-based distributed systems.

Do you think you can bring this role to life? 

You don’t need to check every single box; passion goes a long way and we appreciate that skillsets are transferable.

Send us your application; we want to get to know you! Join the Coveolife!

We encourage all qualified candidates to apply regardless of, for example, age, gender, disability, gaps in CV, national or ethnic background. We know that applying for a new role is a lot of work and we really appreciate your time.



Top Skills

Cloud Based Distributed Systems
Python
