Site Reliability Engineer
Shape Security
At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.
Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.
The Role
This role serves as a critical hybrid position combining the responsibilities of a skilled Technical Support Engineer with Site Reliability Engineering (SRE) principles. The ideal candidate will embrace automation, observability, and operational excellence to ensure the reliability, scalability, and performance of our AI-powered public SaaS platform. You will operate at the intersection of system optimization and customer success, applying cloud-native technologies and distributed systems methodologies to address challenges at scale.
The role provides an opportunity to run, support, and scale an AI Security SaaS platform engineered for running AI inference across distributed architectures. Success in this role entails strong collaboration, a passion for automation, and the ability to proactively improve system reliability while assisting customers with complex technical inquiries.
Key Responsibilities
Proactive Monitoring, Performance Optimization, and Incident Management
Monitor and measure system behavior: Ensure Service Level Objectives (SLOs) are met using observability tooling such as metrics collection, logging systems, and distributed tracing. Apply data insights proactively to maintain system performance and uptime.
24/7 Support Model: Drive operational excellence to maintain the availability and reliability of SaaS platforms through incident management, root cause analysis, postmortem authorship, and service restoration processes.
Customer-Centric Incident Resolution
Act as the primary point of contact for high-priority technical inquiries and escalations.
Troubleshoot and resolve complex customer-facing issues, applying technical acumen to dissect log files, application traces, and system metrics quickly.
Identify, triage, and address technical problems, ensuring prompt communication and solution delivery.
Automation and Toil Reduction
Build and improve automated workflows through Infrastructure as Code (IaC) frameworks and scripting (e.g., Terraform, Python).
Advocate for and lead automation initiatives across monitoring, deployment processes, configuration management, and repetitive manual tasks to improve efficiency and reliability.
Collaboration with Development Teams and SRE Evolution
Collaborate with cross-functional engineering teams, sharing insights from monitoring systems, metrics, and customer interactions to contribute to improving system design, architecture, and reliability.
Evangelize and introduce SRE principles, methodologies, and best practices (e.g., High Availability frameworks, service mesh, container orchestration).
Contribute directly to improving logging, reporting, and alerting capabilities within the SaaS platform.
Operational Security & Continuous Improvement
Ensure security awareness across operational tasks by integrating security-as-code and configuration management principles into workflows.
Drive continuous service improvement by analyzing patterns of incidents/service disruptions and strategizing immediate and long-term fixes.
Qualifications
Required
Bachelor’s degree in Computer Science, Information Technology, or a related field, or demonstrable equivalent experience.
Technical Experience:
1-3+ years in a technical support, system administration, or cloud operations role.
Foundational knowledge of cloud environments (e.g., AWS, Google Cloud, OpenStack).
Proficiency in scripting languages such as Python or Bash, and familiarity with Infrastructure as Code tools (e.g., Terraform).
Solid understanding of web technologies, protocols, and APIs (e.g., HTTP, REST, JSON).
Basic familiarity with networking, databases (e.g., PostgreSQL), and Linux server administration.
Soft Skills:
Strong analytical and problem-solving skills, with comfort in troubleshooting under pressure.
Excellent communication skills, able to collaborate effectively across teams and engage with customers professionally and empathetically.
Preferred
Direct experience with Kubernetes or container orchestration systems.
Knowledge of monitoring tools such as Prometheus, Grafana, or equivalent observability stacks.
Previous exposure to Site Reliability Engineering principles such as SLO management, automated delivery pipelines, or fault-tolerant architectures.
Familiarity with configuration management tools such as Ansible, Chef, or Puppet.
What Success Looks Like
The successful candidate will embody fundamental SRE principles and demonstrate:
System Reliability Expertise: Navigate, maintain, and scale SaaS applications by identifying optimal fixes to complex issues in production systems.
Observability Innovator: Drive advancements in monitoring, metric collection, logging, and tracing tools to achieve deeper insights into system behaviors and improve reliability.
Future-Focused Problem Solver: Strategically balance immediate tactical solutions with long-term architectural improvements, reducing operational toil and increasing scalability.
Collaborative Leadership: Demonstrate a collaborative mindset, foster innovation, and mentor peers to champion automation initiatives and SRE best practices.
The position will also require participation in an on-call rotation for out-of-hours incident response.
This job description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
Please note that F5 only contacts candidates through an F5 email address (ending with @f5.com) or an automated email notification from Workday (ending with f5.com or @myworkday.com).
Equal Employment Opportunity
It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. To request an accommodation, contact accommodations@f5.com.

