clickhouse

Senior Site Reliability Engineer - Remote

At a Glance

Location
Canada
Work Regime
remote
Experience
8+ years
Posted
2026-03-13T02:05:10-04:00

Key Requirements

Required Skills

  • AWS
  • Azure
  • Docker
  • GCP
  • Kubernetes
  • Python
  • SQL
  • Terraform

Domain Knowledge

  • Automation
  • Cloud
  • Engineering

Requirements

Bachelor’s or Master’s degree in Computer Science or a related field.

At least 8 years of experience in Site Reliability Engineering or a related field.

Hands-on experience with Go and/or Python.

Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.

Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.

Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.

Responsibilities

We are committed to providing our customers with reliable and secure services, so we are expanding our central Site Reliability Engineering team. You will be responsible for building and leading processes that ensure the reliability, availability, scalability, and performance of our cloud infrastructure. You will collaborate with teams such as Control Plane, Data Plane, Core, Security, Support, and Operations, and guide them in designing and implementing scalable, secure, highly available, and fault-tolerant distributed systems. You will also own incident management and response, post-mortem analysis (including running blameless post-mortems), and the continuous improvement of our Cloud services. You will apply your software engineering expertise to develop platforms and tools that improve the operational and engineering efficiency of ClickHouse Cloud. This role is a unique opportunity to make a significant impact on ClickHouse Cloud, our elastic, high-performance service built for limitless scale.

Collaborate with engineering teams across ClickHouse to design and implement scalable, secure, and highly available systems.

Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.

Ensure that all infrastructure components in ClickHouse Cloud (including Data Plane, Control Plane, ClickHouse Core, etc.) have monitoring and alerting in place so that incidents are detected and resolved promptly.

Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.