Staff Site Reliability Engineer
Ireland - Remote
Guidewire Software
Elevate your P&C insurance with Guidewire's industry-leading software! Streamline workflows, enhance customer experience, and drive growth. Learn more today!
ESSENTIAL DUTIES AND RESPONSIBILITIES
- Collaborate with development teams to enhance the reliability and efficiency of the Guidewire Cloud Platform (GWCP) and platform services.
- Partner with our platform engineering teams to support the design and implementation of highly available and fault-tolerant systems, from the early stages of development through to deployment and operations. Serve as the primary SRE liaison and reliability consultant within engineering teams.
- Actively guide and contribute to service and tool development by writing code, automating processes, and improving reliability, while conducting code reviews to ensure best practices for scalability and maintainability.
- Work with teams on infrastructure improvements and system design, optimizing performance and scalability while integrating monitoring and alerting.
- Define and implement SLIs, SLOs, and Error Budgets, ensuring systems adhere to agreed-upon reliability standards.
- Establish and refine observability, monitoring, and alerting practices to ensure systems are operating as expected.
- Ensure services are fully prepared for incident management, leading response efforts and post-incident reviews to identify gaps and drive continuous improvement.
- Advocate for "reliability as a feature", embedding best practices into the development process.
- Conduct production readiness reviews, focusing on performance, monitoring, and fault tolerance, while collaborating with stakeholders on operability requirements.
- Mentor and coach engineers on operational best practices, including capacity planning, disaster recovery, and observability.
- Facilitate blameless postmortems, retrospectives, and technical discussions focused on improving system reliability and availability.
- Create and standardize best practices for monitoring, alerting, incident response, and operational procedures.
- Maintain a centralized repository of operational best practices and documentation, ensuring it is standardized, updated, and accessible for continuous improvement.
- Capture and share lessons learned and optimizations to promote continuous learning across engineering teams.
- Establish efficient feedback loops between product development and SRE, ensuring ongoing knowledge transfer and regular status updates to SRE groups.
- Stay up-to-date with the latest tools, technologies, and trends in SRE, DevOps, and cloud infrastructure to introduce new ideas and practices.
Required Technical Skills:
- Bachelor’s Degree in Computer Science or a related field, or equivalent demonstrable experience in a relevant technical role.
- Proficiency in software engineering and automation using Bash, Java, Go, and/or Python, including experience with writing unit tests, integration tests, and automated test frameworks to ensure code quality and reliability.
- Strong background in Linux systems engineering and administration.
- Extensive experience working with cloud environments (preferably AWS) for engineering and automation, with multi-cloud experience as a plus.
- Experience developing and/or supporting microservices architecture in production environments at scale.
- Proven expertise in using Infrastructure-as-Code (IaC) tools to automate and manage infrastructure, with experience in technologies like Crossplane, Terraform, or similar.
- Hands-on experience with DevOps/GitOps tools for managing CI/CD pipelines and automating deployments, preferably with tools such as Git, GitHub, ArgoCD, FluxCD, and TeamCity for gate promotion and production readiness.
- In-depth knowledge of containerization technologies such as Docker, Helm, Kubernetes (EKS), and networking (CNI, Ingress).
- Comprehensive experience with observability tools for logging, metrics, distributed tracing, and performance monitoring, including setting up alerting systems, creating real-time dashboards, and analyzing logs to identify and resolve performance issues.
Desired Technical Skills:
- Familiarity with the Agile software development lifecycle.
- Adept in software development principles such as object-oriented programming, functional programming, and event-driven architectures.
- Proficient in modern software development frameworks and tools for distributed systems and microservices (e.g. Spring Boot, Kubernetes Operators). Expertise with Git and version control strategies for the effective management of large-scale codebases.
- Experience collaborating with or working directly within product development teams on large-scale codebases to embed reliability, scalability, and performance into the software development lifecycle.
- Familiarity with security best practices for cloud environments, including identity and access management (IAM) and data protection strategies.
- Experience with relational databases such as Aurora PostgreSQL and Oracle RDS.
- Strong understanding of Single Sign-On (SSO), SAML, and OAuth (Okta experience is a plus).
- Experience with X.509 certificates and encryption technologies.
- Advanced knowledge of Web UI design, JSON, and application architecture.
- Familiarity with event-driven and stream-processing systems like Kafka or AWS SQS.
- Understanding of Open Application Model (OAM) systems such as KubeVela or Crossplane.
- Experience managing multi-cluster Kubernetes environments, including workload distribution, scaling, and maintaining consistent configurations across clusters.
Personal Qualities & Soft Skills:
- Exceptional communication skills, capable of clearly articulating technical concepts to diverse audiences, both technical and non-technical.
- Passion for mentoring others and fostering a culture of reliability through cross-team collaboration and knowledge sharing.
- Ability to build relationships and influence stakeholders at all levels, with or without formal authority, to drive initiatives and foster collaboration.
- Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving.
- Strong analytical mind with a penchant for process development and enhancement.
- A highly positive, can-do attitude and a desire to be a team player.
- Ability to work independently and proactively identify and address challenges while also thriving in collaborative team environments.
- Strong work ethic with a focus on follow-through, consistently meeting commitments and delivering quality results.
Other Requirements:
- Ability to read, write, and speak English
- We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on-call for weekend production emergencies or to provide rotating weekend operational support
- Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.
As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.
For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.
Guidewire Software Inc. provides equal employment opportunities to all applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. All offers are contingent upon passing a criminal history and other background checks where it's applicable to the position.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
CONSENT and ACKNOWLEDGEMENT
By submitting your application on the following page:
1. You consent to Guidewire collecting, retaining, disclosing and using your Personal Data as outlined above, and to its transfer of your Personal Data outside the country where you live or work, and/or to third parties for the above purposes.
2. In the event that you submit any Sensitive Personal Data, you explicitly consent to Guidewire collecting, retaining, disclosing and transferring your Sensitive Personal Data on the terms and for the same purposes as described above in relation to Personal Data.
3. You acknowledge that you have the right to access your Personal Data and Sensitive Personal Data at any time and have the right to correct any errors.
4. You acknowledge that your Personal Data will be retained for up to 24 months.