CrowdStrike
Sr. Problem Management Engineer – Engineering Service Management (Remote)
As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed: we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. We work on large-scale distributed systems, processing almost 3 trillion events per day. We have 3.44 PB of RAM deployed across our fleet of C* servers, and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.
About the Role:
We are seeking a Senior Engineering Problem Manager to lead the transformation of our Problem Management Engineering function. This strategic role will focus on embedding resilient, automated, and intelligent problem management practices into our engineering, operations, and platform ecosystems. You will be responsible for building technical integrations, leveraging AI/ML for advanced root cause analysis, and driving a culture of continuous learning and operational excellence.
You’ll lead end-to-end delivery of initiatives that reduce incident recurrence, improve service stability, and create measurable business value — with a strong focus on automation, governance, and DevOps alignment.
What You'll Do:
Design and implement modern problem management workflows, tightly integrated into engineering and operations toolchains.
Lead the governance of key problem management deliverables including post-incident action tracking, known error records, and systemic remediation.
Drive continuous evolution of a structured retrospective process that promotes learning and resilience engineering.
Partner with platform, SRE, and observability teams to automate known error workarounds, temporary fixes, and proactive health checks.
Utilize AIOps and ML-driven tooling to correlate events, detect patterns, and identify root causes more effectively.
Work closely with business units and product teams to perform business impact analysis and prioritize problem resolution based on value and risk.
Integrate post-incident review outcomes into continuous improvement loops, product backlogs, and technical roadmaps.
Maintain and evolve the tooling ecosystem supporting problem management, including dashboards, knowledge repositories, and workflows.
Act as a coach and change agent to promote a culture of accountability, proactive risk reduction, and shared ownership of reliability.
Key Focus Areas:
Retrospective Process Management: Facilitate structured reviews and systemic RCA that drive long-term improvements.
Automation of Known Errors & Workarounds: Reduce manual overhead through scripts, workflows, and proactive detection.
AI-Augmented Root Cause Analysis: Integrate ML models and historical telemetry to improve diagnostic speed and accuracy.
Post-Incident Governance: Ensure action items are documented, assigned, and driven to closure with cross-functional visibility.
Business Impact Analysis: Collaborate with stakeholders to prioritize recurring problems based on cost, customer experience, and risk.
Toolchain Integration: Seamlessly embed problem management into DevOps tools (e.g., Jira, ServiceNow, PagerDuty, GitHub).
What You'll Need:
8+ years of experience in Engineering Operations, DevOps, Service Management, or Platform/SRE Engineering.
Strong understanding of ITSM, particularly Problem, Incident, and Change Management.
Experience managing or building post-incident processes, RCAs, and follow-through governance models.
Proven ability to automate operational workflows and known error processes using scripting or platform tooling.
Proficiency with observability platforms and AIOps tools (e.g., Datadog, Splunk, New Relic, Moogsoft, or similar).
Exceptional collaboration and communication skills across technical and non-technical stakeholders.
Data-driven mindset with the ability to perform root cause trend analysis and report on service health metrics.
Experience working in DevOps, cloud-native, or agile environments.
Preferred Qualifications:
Experience with structured problem-solving methodologies (e.g., 5 Whys, Fishbone, Fault Tree).
Familiarity with knowledge management systems, runbooks, and self-healing infrastructure practices.
Background in software engineering, platform reliability, or infrastructure automation.
Certifications in ITIL, SRE, Agile, or SAFe frameworks.
This role will require the candidate to periodically undergo and pass additional background and fingerprint check(s) consistent with government customer requirements.
Benefits of Working at CrowdStrike:
Remote-friendly and flexible work culture
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leaves
Professional development opportunities for all employees regardless of level or role
Employee Resource Groups, geographic neighbourhood groups and volunteer opportunities to build connections
Vibrant office culture with world class amenities
Great Place to Work Certified™ across the globe
CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.
CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.
If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at [email protected] for further assistance.
Find out more about your rights as an applicant.
CrowdStrike participates in the E-Verify program.
CrowdStrike, Inc. is committed to equal pay for equal work in its compensation practices. The base salary range for this position in the U.S. is $155,000 - $255,000 per year + variable/incentive compensation + equity + benefits. A candidate's salary is determined by various factors including, but not limited to, relevant work experience, skills, certifications, job level, supervisory status, and location.