Site Reliability Engineer

Location: Raleigh, NC

Country: United States

Salary: $50.00 - $55.00 / hr

Start Date:

Description:

Onsite contract position

We are looking for experienced Site Reliability Engineers to ensure the reliability, scalability, and performance of mission-critical enterprise platforms. This role is ideal for individuals with deep technical expertise in cloud infrastructure, operating systems, automation, and modern observability practices. The right candidate brings a strong engineering mindset, thrives in high-pressure environments, and drives operational excellence through metrics, automation, and continuous improvement. This is a hands-on engineering role that partners closely with cross-functional teams to maintain and enhance complex, high-availability systems.

Responsibilities

Design, implement, and maintain reliable, scalable, and secure systems across cloud and on-premises environments.
Manage and optimize distributed systems running on Azure, Linux (RHEL7+), and Windows Server (2019+).
Build and enhance automation workflows using Python, Go, and Bash.
Develop Infrastructure-as-Code solutions with Terraform, Ansible, or similar tools.
Define, monitor, and improve SLIs, SLOs, and SLAs to maintain consistent service quality.
Reduce operational toil by implementing automation, improving tooling, and refining processes.
Integrate systems with modern observability platforms to improve visibility and proactively detect issues.
Troubleshoot complex incidents, lead structured incident response efforts, and deliver clear post-mortem analyses.
Work closely with software engineering, infrastructure, and business teams to support resilient and performant services.
Identify and execute improvements to system reliability, performance, and maintainability with full ownership of problem spaces.

Requirements

Proven experience as a Site Reliability Engineer, with a background in software engineering, infrastructure, or operations.
Hands-on experience with cloud platforms such as Azure, and enterprise operating systems like Linux RHEL7+ and Windows Server 2019+.
Solid understanding of networking and storage technologies including NFS, SAN, and NAS.
Working knowledge of DNS, LDAP, Kerberos, Centrify, and other authentication or naming services.
Strong scripting and automation skills using Python, Go, and Bash.
Practical experience with Infrastructure-as-Code tools such as Terraform and Ansible.
Demonstrated ability to design and monitor SLIs, SLOs, and SLAs, and to drive reliability improvements using metrics and automation.
Experience integrating with observability platforms providing logs, metrics, and tracing.
Ability to stay calm, structured, and focused during high-pressure operational events.
Strong communication skills and the ability to collaborate effectively with cross-functional teams.
A proactive, ownership-driven mindset with a commitment to continuous improvement.

Skills

Site Reliability Engineering, Cloud Platforms, Azure, Linux RHEL7+, Windows Server 2019+, Networking Fundamentals, NFS, SAN, NAS, DNS, LDAP, Kerberos, Centrify, Python, Go, Bash, Terraform, Ansible, Infrastructure as Code, Observability Platforms, SLIs, SLOs, SLAs, Toil Reduction, Incident Response, Post-Mortems, Automation, Metrics-Driven Engineering, System Reliability, Cross-Functional Collaboration, Communication Skills, Ownership Mindset
