Hyphen Connect Limited
LLM Pre-training & Distributed Engineer (AI Infrastructure)
Design, orchestrate, and optimize large-scale LLM pre-training across 1,000+ GPUs. Implement 3D parallelism, manage GPU clusters (SLURM/Kubernetes), optimize InfiniBand/RDMA networking and memory, and automate checkpointing and failure recovery for long training runs.
We are seeking a highly skilled LLM Pre-training & Distributed Systems Engineer. This role is essential for orchestrating large-scale machine learning training runs and optimizing distributed infrastructure. The ideal candidate will have a deep understanding of GPU clusters and extensive systems engineering experience to keep training runs efficient and reliable.
Responsibilities:
- Orchestrate distributed training runs across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM.
- Optimize InfiniBand/RDMA networking, and tune memory management to prevent out-of-memory errors.
- Automate checkpointing and failure recovery during month-long training runs.
Required Skills:
- Deep expertise in 3D parallelism (Data, Tensor, Pipeline).
- Experience managing SLURM or Kubernetes-based GPU clusters.
- Strong systems engineering background (C++, CUDA, Python).
What you need to know about the Montreal Tech Scene
With roots dating back to 1642, Montreal is often recognized for its French-inspired architecture and cobblestone streets lined with traditional shops and cafés. But what truly sets the city apart is how it blends its rich tradition with a modern edge, reflected in its evolving skyline and fast-growing tech industry. According to economic promotion agency Montréal International, the city ranks among the top places in North America for investment in artificial intelligence, making it le spot idéal for job seekers who want the best of both worlds.
Key Facts About Montreal Tech
- Number of Tech Workers: 255,000+ (2024, Tourisme Montréal)
- Major Tech Employers: SAP, Google, Microsoft, Cisco
- Key Industries: Artificial intelligence, machine learning, cybersecurity, cloud computing, web development
- Funding Landscape: $1.47 billion in venture capital funding in 2024 (BetaKit)
- Notable Investors: CIBC Innovation Banking, BDC Capital, Investissement Québec, Fonds de solidarité FTQ
- Research Centers and Universities: McGill University, Université de Montréal, Concordia University, Mila Quebec, ÉTS Montréal

