scoutai

AI Cloud Infrastructure Engineer - Fury Team


AWS, Azure, Data Engineering, Docker, ETL, GCP, Kubernetes, PyTorch, Python

3+ years of experience in ML infrastructure, MLOps, or large-scale data systems

Proven experience with distributed training (PyTorch DDP, DeepSpeed, Ray, or similar) and workflow orchestration (Kubernetes, Airflow, or equivalent)

Strong proficiency in Python and cloud-native infrastructure (AWS, GCP, or Azure)

Deep understanding of data engineering (ETL pipelines, object storage, data versioning, metadata management)

Familiarity with containerization and deployment (Docker, Kubernetes) and monitoring systems (Prometheus, Grafana)

Experience optimizing GPU cluster utilization, scaling training jobs, and profiling model performance

Competitive base salary and bonus

Meaningful equity

Premium medical, dental, and vision plans with $0 paycheck contribution

Competitive PTO and company holiday calendar

Catered lunch daily and fully stocked kitchen

EV charging

Design and implement data pipelines for ingesting, transforming, and storing petabytes of multimodal data from Fury’s robotic and operator systems

Develop internal tooling for dataset exploration, curation, versioning, and quality monitoring over time

Build and maintain distributed training infrastructure (cloud and on-prem) for large-scale multimodal and foundation model training

Implement job orchestration workflows for launching, tracking, and debugging large-scale model runs

Identify and remediate bottlenecks in compute, memory, storage, and network performance to optimize throughput and cost efficiency

Collaborate with AI, autonomy, and systems teams to ensure data and training infrastructure supports real-time and mission-critical use cases