Incident Response Manager - San Jose

San Jose


Responsibilities

TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and its offices include New York, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

Why Join Us
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.

About the team
The Data Systems Infrastructure (DSI) team sits within the ByteDance global technology structure and supports the company's fast growth by building and operating hyper-scale datacenters, managing the life cycle of the server fleet, providing cloud solutions, and developing various infrastructure services, ensuring they are scalable and reliable.

The Incident Response Center (IRC) is the first layer of defense, responsible for quick detection and incident response using various monitoring and automation tools, and for the thorough investigation, classification, and triage of alerts. The Incident Response Manager is responsible for delivering operations within the IROC across all ByteDance datacenter sites in the respective regions. The IRC team is expected to respond to all alarms and alerts set in the Server Automation Operations System (SAOS) and Data Center Infrastructure Management (DCIM), quickly discovering anomalies and engaging Subject Matter Expert (SME) teams to start issue triage. The IRC team also provides business intelligence through rigorous analysis of alerts and issues, which helps reduce and prevent recurring incidents.

Responsibilities
- Deliver global operations within the ByteDance datacenter IROC (Incident Response Operation Center).
- Act as the first responder and first layer of defense, using various monitoring and automation tools for quick detection and incident response, and conduct thorough investigation, classification, and triage of alerts.
- Respond to all infrastructure, facilities, security, and safety events raised through various channels, such as alarms and alerts set in Server Operations and Maintenance, Datacenter Infrastructure Management, Network and Grafana, and other functions.
- Respond to incidents and critical situations in a problem-solving manner, and conduct in-depth investigation of alerts.
- Provide insights into the effectiveness of the incident response and recovery process through regular reports.
- Analyze trends and patterns in events to identify opportunities for improvement and optimization.
- Monitor incident response performance against the agreed-upon SLAs, alerting and notifying stakeholders as needed.
- Manage escalations by notifying or initiating discussions with higher-level support teams and engaging them in resolution processes.
- Identify, assess, and communicate potential risks surfaced through event monitoring that could affect customers' services.
- Support program managers, facilitate project deliverables, and contribute to initiatives that improve overall operational security and engineering.
- The Incident Response team is expected to work at a ByteDance datacenter site. This is an on-site role.

Qualifications

Minimum Qualifications
- Knowledge of technical areas such as server health, the datacenter environment, and IP networks.
- Outstanding verbal and written communication skills; the ability to work with minimal direction and meet goals; attention to detail; and an eye for continuous improvement.

Preferred Qualifications
- Degree in Information Technology.
- 5 years of experience in a service center or similar 24x7 operations center environment.
- 3 years of experience at a technology company or as a team lead, plus experience in operations program management.
- 5 years of experience as an incident and problem manager.
- Good data analytics and presentation skills.
- Ability to successfully interact at all levels of the organization, including with clients, while functioning as a team player.
- Basic working knowledge of data protection policies such as GDPR and the need to keep sensitive information secure.
- Working knowledge and/or certifications in ITIL, CompTIA Server+, Schneider Electric Data Center Certified Associate (DCCA), Data Analytics and Visualization.
- Willingness to be on call including weekends, nights, and holidays.
- Works well under pressure and within time constraints to solve problems and complete deliverables.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://shorturl.at/cdpT2


Job Information

For Pay Transparency: Compensation Description (annually)

The base salary range for this position in the selected city is $109,600 to $218,400 annually.

Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies, experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives and restricted stock units.

Our company benefits are designed to convey company culture and values, to create an efficient and inspiring work environment, and to support our employees in giving their best in both work and life. We offer the following benefits to eligible employees:

We provide 100% premium coverage for employee medical insurance and approximately 75% premium coverage for dependents, and we offer a Health Savings Account (HSA) with a company match. We also offer Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans, in addition to Flexible Spending Account (FSA) options for Health Care, Limited Purpose, and Dependent Care.

Our time off and leave plans are: 10 paid holidays per year, plus 17 days of Paid Personal Time Off (PPTO) (prorated upon hire and increased by tenure), 10 paid sick days per year, 12 weeks of paid parental leave, and 8 weeks of paid Supplemental Disability.

We also provide generous benefits such as mental and emotional health support through our EAP and Lyra, a 401(k) company match, and gym and cellphone service reimbursements. The Company reserves the right to modify or change these benefits programs at any time, with or without notice.

For Los Angeles County (unincorporated) Candidates:

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:

1. Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;

2. Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems; and

3. Exercising sound judgment.


Tags: Analytics Automation Business Intelligence Cloud CompTIA Data Analytics GDPR Grafana Incident response ITIL Monitoring SLAs

Perks/benefits: 401(k) matching Career development Equity / stock options Fitness / gym Flex hours Flexible spending account Flex vacation Health care Insurance Medical leave Parental leave Team events Transparency

Region: North America
Country: United States
