ClickHouse
Senior Site Reliability Engineer - Remote
At a Glance
- Location: United States
- Work Regime: Remote
- Experience: 8+ years
- Posted: 2026-03-12T05:49:32-04:00
Key Requirements
Required Skills
Domain Knowledge
- Automation
- Cloud
- Engineering
Requirements
Previous experience using ClickHouse in production.
Hands-on experience with Go and/or Python.
Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
Responsibilities
Collaborate with engineering teams across ClickHouse to design and implement scalable, secure, and highly available systems for ClickHouse.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
Ensure all infrastructure components in ClickHouse Cloud (including the Data Plane, Control Plane, and ClickHouse Core) have monitoring and alerting in place so that incidents are detected and resolved promptly.
Enhance and refine incident response processes and post-mortem analysis for outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.
Continuously improve the reliability and performance of our ClickHouse services.
Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities.