Braze

Platform Support Engineer


At a Glance

Location: São Paulo
Experience: 1–3 years
Posted: 2026-02-24T11:07:20-05:00

Key Requirements

Required Skills

AWS, Azure, Bash, GCP, Kubernetes, Python

Certifications

ITIL

Domain Knowledge

Advertising
Automation
Engineering
Marketing
Medical

Benefits & Perks

Health Insurance


Requirements

Experience:

1-3 years of experience in technical operations, system administration, or entry-level cloud engineering roles

Familiarity with cloud platforms (AWS, GCP, Azure), Kubernetes, and basic computing, storage, and networking concepts

Experience with monitoring and alerting tools (e.g., Datadog, Prometheus, Grafana) is a plus

Skills:

Strong troubleshooting and problem-solving skills, with the ability to follow processes and escalate appropriately

Responsibilities

Platform Support Engineers (PSEs) are the first line of defense in ensuring the health and availability of Braze’s platform and systems. As part of a global triage team, they actively monitor system performance, respond to alerts, and execute runbooks, SOPs (Standard Operating Procedures), and MOPs (Maintenance Operating Procedures) to address operational issues.

Braze operates at massive scale, with over 3.3 billion monthly active users across our customers, collecting hundreds of billions of data points each month and sending billions of messages to end users daily. We use a diverse technology stack rooted in Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more. The Braze Operations Team optimizes our response mechanisms by centralizing triage and monitoring responsibilities. This allows our other engineering teams to focus on what they do best while we do what we do best. As a Platform Support Engineer at Braze, you will focus on maintaining uptime and reliability, collaborating with engineers to escalate complex issues, and contributing to the continuous improvement of operational processes.

Main responsibilities:

Active System Monitoring:

Use monitoring tools (e.g., Datadog, Prometheus, or similar) to continuously observe the health of platform systems and services

Proactively identify and respond to performance anomalies, outages, or unusual system behavior