Huawei Canada

Researcher - Reinforcement Learning

In-Office
Montréal, QC, CAN
Expert/Leader
The role involves advancing reinforcement learning techniques for LLMs, including designing training pipelines, evaluating agentic behaviors, and contributing to scientific publications.

Huawei Canada has an immediate 12-month contract opening for a Reinforcement Learning Researcher.


About the team:

Founded in 2012, the Noah’s Ark Lab has evolved into a prominent research organization with notable achievements in academia and industry. The lab’s mission focuses on advancing artificial intelligence and related fields to benefit the company and society. Driven by impactful, long-term projects, the lab aims to advance state-of-the-art research while integrating innovations into the company's products and services, in areas including LLMs, RL, NLP, computer vision, AI theory, and autonomous driving.

About the job:

  • Enabling Large Language Models (LLMs) to learn from experience, interaction, and environment feedback, moving beyond static fine-tuning toward continual, agentic self-improvement.

  • LLM post-training paradigms (e.g., RLHF, GRPO, and reward-free methods).

  • Agentic reinforcement learning for tool-using and browsing-based LLMs trained in interactive environments.

  • Agentic evaluation and benchmarking, including design of multi-turn, verifiable reasoning tasks.

  • Your work will involve implementing and evaluating new training and evaluation pipelines for reasoning-enhanced LLMs and tool-using agents, scaling experiments on large GPU clusters, and contributing to scientific insights and publications in this emerging area.

About the ideal candidate:

  • PhD in Computer Science or a related field, or a master's degree with comparable experience.

  • Strong foundation in deep learning, including architectures such as Transformers and optimization techniques for large models.

  • Practical or research experience in reinforcement learning, self-supervised learning, or language model fine-tuning.

  • Proven research record in AI, with at least one first-author paper at a top-tier venue such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, or ICRA.

  • Solid proficiency in Python and experience with PyTorch and distributed training frameworks such as DeepSpeed and Megatron.

  • Familiarity with LLM post-training pipelines (RLHF, GRPO/PPO, SFT, LoRA, MoE, etc.) is an asset.

  • Experience with multi-agent RL or with tool-use, browser, or coding agents is an asset.

  • Strong communication and writing skills; enthusiasm for open research and collaborative problem-solving.

Huawei aims to support a French-speaking work environment for its employees in Quebec. We have taken steps to avoid requiring a language other than French for this position. However, proficiency in English is essential for this role for the following reasons:

The person will be required to communicate regularly with colleagues located outside Quebec, where English is the primary language used for communication between offices. In addition, the nature of the tasks related to this position, which falls within a highly specialized field of artificial intelligence, also requires knowledge of English.
