Braze
Platform Support Engineer
At a Glance
- Location
- São Paulo
- Experience
- 1–3 years
- Posted
- 2026-02-24T11:07:20-05:00
Key Requirements
Required Skills
Certifications
- ITIL
Domain Knowledge
- Advertising
- Automation
- Engineering
- Marketing
- Medical
Benefits & Perks
ment. From offering comprehensive benefits to fostering hybrid ways of working, we
Requirements
Experience:
1–3 years of experience in technical operations, system administration, or entry-level cloud engineering roles
Familiarity with cloud platforms (AWS, GCP, Azure), Kubernetes, and basic compute, storage, and networking concepts
Experience with monitoring and alerting tools (e.g., Datadog, Prometheus, Grafana) is a plus
Skills:
Strong troubleshooting and problem-solving skills, with the ability to follow processes and escalate appropriately
Responsibilities
Platform Support Engineers (PSEs) are the first line of defense in ensuring the health and availability of Braze’s platform and systems. As part of a global triage team, they actively monitor system performance, respond to alerts, and execute runbooks, SOPs (Standard Operating Procedures), and MOPs (Maintenance Operating Procedures) to address operational issues.
Braze operates at massive scale, with over 3.3 billion monthly active users across our customers. We collect hundreds of billions of data points each month and send billions of messages to end users daily. We use a diverse technology stack rooted in Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more. The Braze Operations Team optimizes our response mechanisms by centralizing triage and monitoring responsibilities. This centralization allows our other engineering teams to focus on what they do best while we do what we do best. As a Platform Support Engineer at Braze, you will focus on maintaining uptime and reliability, collaborating with engineers to escalate complex issues, and contributing to the continuous improvement of operational processes.
Main responsibilities:
Active System Monitoring:
Use monitoring tools (e.g., Datadog, Prometheus, or similar) to continuously observe the health of platform systems and services
Proactively identify and respond to performance anomalies, outages, or unusual system behavior