Senior Program Manager - Incident / Problem Management
As a Senior Program Manager, you will drive initiatives that deliver high standards of reliability and availability for MathWorks Online Products. You will drive the problem management process to identify root causes and implement countermeasures that prevent incidents, achieve our SLOs/SLAs, and meet the operational quality goals that are strategic to the success of our Online Products. You will partner with Product Owners, Developers, Platform Engineering/DevOps, and Site Reliability Engineers to define and implement tools, processes, standards, and best practices to plan, build, and run highly reliable Online Products.
- Conduct incident post-mortems and retrospectives after every major incident and create incident close-out reports. Collaborate with incident managers and Development teams to identify action items and track them to closure
- Facilitate root cause analysis to identify countermeasures to prevent similar incidents
- Perform trend analysis on problems, root causes, and countermeasures to identify patterns and themes. Report the analysis to management, along with recommendations for addressing those themes
- Define and implement tools and processes for incident and problem management. Monitor effectiveness of those tools and processes, identify opportunities for improvement, and lead the effort to design and implement them
- Monitor aging problems and countermeasures. Follow up with teams, escalate as needed, and ensure timely completion. Once countermeasures are implemented, validate that the problem is solved and share learnings as applicable
- Proactively identify risks and issues; define and implement mitigation strategies
- A bachelor's degree and 7 years of professional work experience (or a master's degree and 5 years of professional work experience, or a PhD degree, or equivalent experience) is required.
- Strong problem-solving skills. Experience with facilitating root cause analysis and similar problem-solving techniques
- Experience in defining and managing incident management and problem management tools and processes
- Knowledge and application of Site Reliability Engineering, Platform Engineering, and DevOps frameworks and concepts such as Observability, Reliability, Availability, and Performance
- Experience with managing cross-organizational programs focused on building and running highly available and reliable online/SaaS products
- Ability to influence others even when you do not have direct authority over them
- Ability to communicate effectively, both orally and in writing, with senior management
- Experience using work management and collaboration tools like JIRA, Confluence, SharePoint, and Microsoft Teams
It’s the chance to collaborate with bright, passionate people. It’s contributing to software products that make a difference in the world. And it’s being part of a company with an incredible commitment to doing the right thing – for each individual, our customers, and the local community.
MathWorks develops MATLAB and Simulink, the leading technical computing software used by engineers and scientists. The company employs 5000 people in 16 countries, with headquarters in Natick, Massachusetts, U.S.A. MathWorks is privately held and has been profitable every year since its founding in 1984.
The MathWorks, Inc. is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, and other protected characteristics. View the "EEO is the Law" poster and its supplement.
The pay transparency policy is available here.
MathWorks participates in E-Verify. View the E-Verify posters here.