Gauss Labs

Site Reliability Engineer (Vancouver)

Posted 10 Days Ago
In-Office or Remote
Hiring Remotely in Vancouver, BC
Senior level
As a Site Reliability Engineer at Gauss Labs, you will ensure system reliability and performance through monitoring, automation, and incident response while collaborating with other teams to optimize operations.
Gauss Labs is seeking a highly skilled Site Reliability Engineer to join our team in Vancouver. As an SRE at Gauss Labs, you will play a critical role in ensuring our industrial AI platform's reliability, performance, and scalability. You will be responsible for building and maintaining a robust solution that supports our growing business at customer sites. This role requires a high level of technical expertise, a collaborative mindset, and a strong desire to continuously improve systems and processes.

Responsibilities

  • Monitoring and Alerting: Creating and maintaining robust monitoring systems to proactively identify and resolve issues before they impact customers. Implementing effective alerting mechanisms to ensure timely response to critical events.
  • Incident Response: Participating in on-call rotations and leading incident response efforts to minimize downtime and restore service quickly.
  • Automation: Developing and implementing automation tools and scripts to streamline operations, reduce manual effort, and improve efficiency.
  • Capacity Planning: Forecasting resource needs, optimizing resource utilization, and ensuring customers' infrastructure can handle increasing workloads.
  • Performance Optimization: Identifying and resolving performance bottlenecks, optimizing system performance, and improving response times.
  • Collaboration: Partnering with software engineers, data scientists, and other teams to ensure alignment and efficient operations.
  • Customer Focus: Working closely with the AI Program Manager and Technical Account Manager to understand customer issues, provide technical support, and improve customer satisfaction.
  • Continuous Improvement: Driving a culture of continuous improvement by identifying opportunities to enhance system reliability, performance, and efficiency.

Basic Qualifications

  • Bachelor's degree in computer science, engineering, or a related discipline
  • 5+ years of industry experience as a Site Reliability Engineer
  • Experience with cloud platforms (AWS, GCP, Azure), containerization technologies (Docker, Kubernetes), and observability and alerting tools (Prometheus, Grafana, Elasticsearch, Jaeger)
  • Experience with scripting languages (Python, Bash)
  • Working knowledge of GitHub, GitHub Actions, and CI/CD concepts
  • Experience in ticket management, issue resolution, and troubleshooting
  • Strong problem-solving and troubleshooting skills
  • Excellent customer communication and interpersonal skills, with fluency in spoken and written English

Preferred Qualifications

  • Knowledge of AI/ML infrastructure and workloads
  • Knowledge of big data technologies (Kafka, Flink)
  • Knowledge of database technologies (MongoDB, PostgreSQL)

Hiring Process

Application review - Phone interview - Virtual onsite interview - VP interview/Core Value interview

Top Skills

AWS
Azure
Bash
CI/CD
Docker
Elasticsearch
Flink
GCP
Git
Grafana
Jaeger
Kafka
Kubernetes
MongoDB
Postgres
Prometheus
Python
