Samsung Semiconductor
Senior Staff Engineer, Memory Fault Management Architect
At a Glance
- Location: San Jose, California, United States
- Experience: 15+ years
- Posted: 2026-03-20
Key Requirements
Domain Knowledge
- Engineering
- Medical
Requirements
Knowledge of the platform memory subsystem and platform RAS (Reliability, Availability, Serviceability) features such as ECC, page offlining, hPPR (hard Post Package Repair), and hardware sparing.
Experience with ECC design, verification, and reverse engineering.
Understanding of the address mapping between the CPU and memory.
Experience modifying memory controller registers.
Understanding of DRAM and HBM failure modes.
An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
Responsibilities
Drawing on knowledge of SoC memory controllers and memory operation, including RAS features, identify and recommend better solutions to reduce the in-field DRAM failure rate.
Communicate improved ECC schemes to customers based on Samsung DRAM failure modes (DQ and burst failures).
Work with customers to establish the value of enabling an in-field fault management architecture.
Contribute to the standardization of DRAM/HBM failure logging in the OCP (Open Compute Project).
Propose and develop platform RAS algorithms for memory fault management, such as page offlining and hPPR, and conduct proofs of concept (POCs) with known-failing DIMMs in real servers and applications.