We are seeking a hands-on and forward-thinking AI Infrastructure Engineer to help build and operate the intelligent systems that power Xsolla's infrastructure. As part of our Infrastructure Team, you will implement AI-driven solutions across cloud optimization, security, automation, and developer support — helping us shift from manual and reactive operations to predictive, self-optimizing infrastructure management.
The ideal candidate brings solid infrastructure engineering experience combined with practical knowledge of AI/ML integration. You are comfortable working with LLMs, ML pipelines, and AI automation frameworks, and you know how to apply them to real operational problems at scale. You thrive in environments that require both technical depth and the ability to experiment, iterate, and deliver.
If you're passionate about using AI to transform how infrastructure is built and operated — and want to be part of a team that is driving that transformation at a global gaming company — we'd love to hear from you.
ABOUT US
Xsolla is a global commerce company with robust tools and services designed to help developers solve the inherent challenges of the video game industry. From indie to AAA, companies partner with Xsolla to fund, distribute, market, and monetize their games. Grounded in its belief in the future of video games, Xsolla is resolute in its mission to bring opportunities together and continually make new resources available to creators. Headquartered and incorporated in Los Angeles, California, Xsolla operates as the merchant of record and has helped more than 1,500 game developers reach more players and grow their businesses around the world.
For more information, visit xsolla.com.
Responsibilities:
- Design and implement AI/ML-powered solutions for infrastructure use cases, including predictive autoscaling, anomaly detection, intelligent cost optimization, and automated remediation across GCP and multi-cloud environments
- Build and maintain AI-driven monitoring and observability systems that correlate logs, metrics, and traces to surface root causes, predict bottlenecks, and reduce mean time to resolution (MTTR)
- Develop and operate automated incident response workflows using AI-powered playbooks that diagnose, contain, and resolve infrastructure issues with minimal manual intervention
- Integrate AI tooling into CI/CD pipelines to improve deployment reliability, automate test prediction, score release health, and support rollback automation
- Contribute to the development of internal AI agents and virtual assistants integrated into developer workflows (Slack, IDEs, Confluence) — enabling self-service for provisioning, troubleshooting, and infrastructure guidance
- Implement AI/ML-based anomaly detection and automated vulnerability management workflows to enhance the security posture of Xsolla's infrastructure
- Prototype and productionize Generative AI solutions for infrastructure automation, including auto-generation of Terraform/Puppet modules, IaC configurations, runbooks, and change documentation
- Collaborate with senior engineers and leadership to evolve and execute the infrastructure AI strategy across its implementation phases
- Maintain clear documentation of AI tools, integrations, and automated workflows; share knowledge and best practices across the team
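Several of the responsibilities above center on AI-driven anomaly detection over infrastructure metrics. As a hedged illustration only (not a description of Xsolla's actual tooling), a minimal version of that idea can be sketched with a z-score detector over a metric series; the alert payload, metric names, and threshold are all assumptions:

```python
from statistics import mean, stdev

def zscore_anomalies(samples, threshold=2.0):
    """Flag samples more than `threshold` standard deviations from the
    mean -- a deliberately simple stand-in for the ML-based anomaly
    detection described above. Real systems would use rolling windows,
    seasonality models, or learned baselines instead."""
    if len(samples) < 2:
        return []
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return []
    return [(i, x) for i, x in enumerate(samples)
            if abs(x - mu) / sigma > threshold]

# Hypothetical p95 latency samples (ms) with one spike at index 5
latencies = [120, 118, 122, 119, 121, 480, 120, 117]
print(zscore_anomalies(latencies))  # the spike is the only flagged point
```

In production this logic would typically run against time series pulled from a stack like Prometheus or Datadog, feeding the automated remediation workflows the role describes.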
Qualifications:
- 5–7 years of experience in infrastructure engineering, DevOps, SRE, or a related field
- Hands-on experience with GCP (priority) and/or AWS; solid understanding of cloud resource management, scaling, and cost structures
- Practical experience building or integrating AI/ML-powered tools in an operational context (anomaly detection, predictive models, LLM-based automation, or similar)
- Experience with infrastructure-as-code tools — Terraform, Puppet, Ansible, or equivalent
- Proficiency in Python for scripting, automation, and AI/ML integration; Bash or Go a plus
- Working knowledge of Kubernetes and container orchestration in production environments
- Familiarity with observability and monitoring stacks (Prometheus, Grafana, ELK, Datadog, or similar)
- Familiarity with LLM APIs (OpenAI, Anthropic, or similar) and prompt engineering for operational use cases
- Strong problem-solving mindset with a bias toward automation and eliminating toil
- Fluent in English (written and verbal)
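The qualifications above call out prompt engineering for operational use cases. As a rough, hypothetical sketch of what that can mean in practice (the alert fields, service names, and response schema are all invented for illustration), one common pattern is rendering an alert into a constrained prompt whose output automation can parse:

```python
import json

# Hypothetical alert payload; in practice this would come from the
# monitoring stack (e.g. an Alertmanager or Datadog webhook).
alert = {
    "service": "payments-api",
    "metric": "cpu_utilization",
    "value": 0.97,
    "threshold": 0.80,
    "region": "us-central1",
}

def build_triage_prompt(alert: dict) -> str:
    """Render an alert into a constrained triage prompt for an LLM.
    Pinning down the response format (JSON with fixed keys) is what
    makes the model's output usable by downstream automation."""
    return (
        "You are an SRE assistant. Given this alert, propose one "
        "safe, reversible remediation step.\n"
        f"Alert: {json.dumps(alert, sort_keys=True)}\n"
        "Respond as JSON with keys: diagnosis, action, rollback."
    )

print(build_triage_prompt(alert))
```

The prompt string would then be sent to an LLM API (OpenAI, Anthropic, or similar) and the JSON reply validated before any action is taken.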
Nice To Have:
- Experience with AI workflow orchestration frameworks (LangChain, LlamaIndex, n8n, or similar)
- Exposure to AIOps platforms (Dynatrace, Datadog AI, Moogsoft, BigPanda, or similar)
- Background in FinOps or AI-driven cloud cost optimization
- Familiarity with vector databases (Weaviate, Pinecone, Qdrant) for knowledge retrieval systems
- Experience with VMware or hybrid cloud environments
- GCP and/or AWS cloud certifications
- Prior experience in gaming, high-growth tech, or SaaS platform environments
The duties and responsibilities of this position may evolve over time to support the organization's goals and individual growth.