xAI
Site Reliability Engineer - US Government
At a Glance
- Location
- Palo Alto, CA; Washington, D.C.
- Experience
- 5+ years
- Compensation
- Annual Salary Range: $180,000 - $440,000 USD
- Posted
- 2026-03-03T19:52:08-05:00
Key Requirements
Required Skills
Certifications
- CISSP
Domain Knowledge
- Automation
- Engineering
- Government
Requirements
Active Top Secret (TS) security clearance.
5+ years of experience as an Infrastructure Engineer, Site Reliability Engineer, or similar role, with a focus on building and maintaining reliable, scalable systems, preferably in secure or government environments.
Proficiency in managing storage infrastructure with IaC tools such as Pulumi, Terraform, or Ansible.
Deep understanding of the Kubernetes stack, including CNI, CRI, CSI, and related components.
Demonstrated ability to improve system reliability through incident management, postmortems, and defining SLAs/SLOs.
Excellent communication and documentation skills, with the ability to handle sensitive information concisely and accurately.
Compensation & Benefits
$180,000 - $440,000 USD
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Responsibilities
We are seeking a highly skilled Senior Infrastructure Engineer to join our US Government Team, focused on designing, building, and operating secure, scalable infrastructure for critical government projects. In this role, you will develop and manage training and inference clusters, as well as highly reliable applications, across bare metal, classified cloud, and hybrid cloud architectures. You will leverage your expertise in Kubernetes and GPU hardware to deliver robust, secure systems that support large-scale AI workloads while meeting stringent federal compliance requirements. This role demands a passion for automation, observability, and ensuring system integrity in a fast-paced, high-security environment.
Develop and optimize software to provision and manage xAI’s infrastructure across on-premises, virtual machine, and classified cloud environments, enabling efficient scaling for US government initiatives.
Enhance the reliability, performance, and cost-effectiveness of infrastructure to support large-scale AI and application workloads in secure, classified settings.
Collaborate with xAI engineers to understand workload requirements and design tailored solutions that meet government-specific needs and compliance standards.
Implement robust observability, monitoring, and security practices to ensure the integrity, availability, and confidentiality of critical systems, adhering to federal protocols.