K-id
Lead Site Reliability Engineer
Job type: Full-time
Posted: 8 hours ago
Job description
ABOUT K-ID
k-ID is the global leader in privacy-first compliance and age verification infrastructure. Recognized as one of TIME’s Best Inventions of 2025, named a Tech Pioneer by the World Economic Forum and a winner of Fast Company’s Next Big Things in Tech, we are building the Age Layer for the internet—the fundamental infrastructure that allows digital platforms to verify age and manage compliance globally without friction.
Our core platform, anchored by the Compliance Development Kit (CDK) and AgeKit, is the trusted engine for the world’s largest game publishers and digital ecosystems. We replace fragmented, manual compliance with a unified API that handles age verification, parental consent, and regulatory logic across 200+ markets. Backed by top-tier venture capital firms like a16z and Lightspeed, k-ID is entering a phase of growth to define the standard for global digital safety.
ABOUT THE ROLE
We are hiring a Lead Site Reliability Engineer and Network Operations Center (NOC) Lead to own production reliability and operational excellence across the platform.
This is a senior role for someone who can lead from the front. You will be responsible for the reliability, availability, observability, and operational maturity of k-ID’s systems, while also leading the NOC function. That means you will not just respond to incidents; you will build the systems, processes, tooling, and team standards that make incidents less frequent, less severe, and faster to resolve when they do happen.
This role is more senior than our senior NOC hires. We want someone who can set the operating model for the NOC, raise the technical bar for incident management, partner deeply with engineering leadership, and drive the long-term reliability roadmap for the business. You should be comfortable switching between hands-on technical work, operational leadership, incident command, and team development.
LOCATION & LANGUAGE
- Location: Singapore
- Languages: Proficiency in English
KEY RESPONSIBILITIES
- Own the reliability and operational health of k-ID’s production systems and critical services
- Lead the NOC function, including shift structure, escalation paths, incident handling standards, readiness processes, and operational reporting
- Act as the senior escalation point for major incidents and serve as incident commander for high severity events when needed
- Design and improve monitoring, alerting, and operational tooling so the NOC can detect issues early and respond effectively
- Drive root cause analysis and post-incident review practices that produce real corrective action rather than superficial summaries
- Partner with engineering teams to improve system resilience, deployment safety, service ownership, and production readiness
- Identify systemic risks across infrastructure, services, dependencies, and operational processes, then drive plans to reduce them
- Improve platform performance, availability, and recovery time through architecture changes, better automation, and stronger operating discipline
- Build and maintain runbooks, readiness checklists, service health standards, and escalation playbooks across the organization
- Help define service level objectives, operational metrics, and reliability targets that align with business needs
- Support and mentor senior NOC engineers and other operations team members, helping raise technical depth and decision quality across the function
- Contribute hands-on to infrastructure and reliability engineering work where needed, especially in high-leverage areas
QUALIFICATIONS
- 7 or more years of experience in site reliability engineering, infrastructure engineering, platform engineering, or software engineering with significant production ownership
- Strong experience operating production systems in AWS
- Strong hands-on experience with Kubernetes, containerized services, and modern infrastructure tooling
- Experience building and improving observability across metrics, logs, tracing, alerting, and service health
- Deep understanding of distributed systems, service failure modes, traffic management, capacity planning, and recovery design
- Experience designing or running incident response programs, on-call operations, escalation frameworks, and post-incident review processes
- Experience leading or managing NOC, production operations, or support functions in a high availability environment
- Strong experience with infrastructure as code such as Terraform
- Experience improving CI/CD workflows, release safety, rollback practices, and change management
- Ability to write code or automation in one or more languages such as Go, Python, or TypeScript
- Strong written and verbal communication skills, especially in high pressure operational settings
- Experience working in fast-moving startup environments is strongly preferred
BENEFITS
COMPETITIVE SALARY
- A competitive startup salary aligned with experience and market benchmarks.
- Employee Stock Ownership Plan, so you participate directly in the long-term upside of the company.
HEALTH AND WELLBEING
- Comprehensive family health coverage, including medical, dental, and vision benefits
- Mental health and wellness support benefit
PROFESSIONAL DEVELOPMENT
- Hands-on exposure to key clients in a scaling global tech company
- Opportunities for continuous learning through real ownership rather than formal training alone.
- Direct collaboration with the Founders and the tech leadership team
CULTURE AND WAYS OF WORKING
- A collaborative, inclusive and low-politics work environment.
- Flexible, trust-based working culture shaped by a US startup operating model.
- A mission-driven company focused on improving online experiences for kids and teens globally.
Applicants' Privacy Policy: https://k-id.com/job-applicants-privacy-notice