Job
Site Reliability Engineer
Location: Raleigh, NC
Country: United States
Salary: $50.00 - $55.00 / hr
Start Date:
Description:
Contract, onsite position
We are looking for experienced Site Reliability Engineers to ensure the reliability, scalability, and performance of mission-critical enterprise platforms. This role is ideal for individuals with deep technical expertise in cloud infrastructure, operating systems, automation, and modern observability practices. The right candidate brings a strong engineering mindset, thrives in high-pressure environments, and drives operational excellence through metrics, automation, and continuous improvement. This is a hands-on engineering role that partners closely with cross-functional teams to maintain and enhance complex, high-availability systems.
Responsibilities
- Design, implement, and maintain reliable, scalable, and secure systems across cloud and on-premises environments.
- Manage and optimize distributed systems running on Azure, Linux (RHEL7+), and Windows Server (2019+).
- Build and enhance automation workflows using Python, Go, and Bash.
- Develop Infrastructure-as-Code solutions with Terraform, Ansible, or similar tools.
- Define, monitor, and improve SLIs, SLOs, and SLAs to maintain consistent service quality.
- Reduce operational toil by implementing automation, improving tooling, and refining processes.
- Integrate systems with modern observability platforms to improve visibility and proactively detect issues.
- Troubleshoot complex incidents, lead structured incident response efforts, and deliver clear post-mortem analyses.
- Work closely with software engineering, infrastructure, and business teams to support resilient and performant services.
- Identify and execute improvements to system reliability, performance, and maintainability with full ownership of problem spaces.
Requirements
- Proven experience as a Site Reliability Engineer, with a background in software engineering, infrastructure, or operations.
- Hands-on experience with cloud platforms such as Azure and with enterprise operating systems such as Linux (RHEL7+) and Windows Server (2019+).
- Solid understanding of networking and storage technologies including NFS, SAN, and NAS.
- Working knowledge of DNS, LDAP, Kerberos, Centrify, and other authentication or naming services.
- Strong scripting and automation skills using Python, Go, and Bash.
- Practical experience with Infrastructure-as-Code tools such as Terraform and Ansible.
- Demonstrated ability to design and monitor SLIs, SLOs, and SLAs, and to drive reliability improvements using metrics and automation.
- Experience integrating with observability platforms providing logs, metrics, and tracing.
- Ability to stay calm, structured, and focused during high-pressure operational events.
- Strong communication skills and the ability to collaborate effectively with cross-functional teams.
- A proactive, ownership-driven mindset with a commitment to continuous improvement.
Skills
Site Reliability Engineering, Cloud Platforms, Azure, Linux RHEL7+, Windows Server 2019+, Networking Fundamentals, NFS, SAN, NAS, DNS, LDAP, Kerberos, Centrify, Python, Go, Bash, Terraform, Ansible, Infrastructure as Code, Observability Platforms, SLIs, SLOs, SLAs, Toil Reduction, Incident Response, Post-Mortems, Automation, Metrics-Driven Engineering, System Reliability, Cross-Functional Collaboration, Communication Skills, Ownership Mindset
Copyright © 2025 OnlyHire.Me