ClickHouse

Senior Site Reliability Engineer - Remote

At a Glance

Location
United States
Work Regime
remote
Experience
8+ years
Posted
2026-03-12T05:49:32-04:00

Key Requirements

Required Skills

AWS, Azure, Docker, GCP, Kubernetes, Python, SQL, Terraform

Domain Knowledge

  • Automation
  • Cloud
  • Engineering

Requirements

Previous experience using ClickHouse in production.

Hands-on experience with Go and/or Python.

Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.

Excellent understanding of distributed databases and SQL; familiarity with ClickHouse in particular is a major plus.

Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.

Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.

Responsibilities

Collaborate with engineering teams across ClickHouse to design and implement scalable, secure, and highly available systems.

Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.

Ensure that all infrastructure components in ClickHouse Cloud (including the Data Plane, Control Plane, and ClickHouse Core) have monitoring and alerting in place for timely detection and resolution of incidents.

Enhance and refine incident response processes and post-mortem analysis for outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.

Continuously improve the reliability and performance of our ClickHouse services.

Plan, enable, and drive chaos engineering initiatives across engineering teams based on internal priorities.