IBM Site Reliability Engineer in Austin, Texas

Job Description


IBM Cloud Brokerage Services is IBM’s solution for Hybrid Cloud Enablement, giving our clients’ IT organizations visibility and governance without sacrificing speed and business agility.

Our solution is built on our recent acquisition of Gravitant. We continue to operate with a startup mentality but with access to the tremendous market reach of IBM. We are global in scale, with customers in Europe, North America, South America and Asia Pacific. We are pan-industry in scope, delivering to a client base representing a range of industries including telecommunications, retail, aerospace, financial services and others.

IBM Cloud Brokerage is a purpose-built suite of applications that enables a self-service ability to browse, search, order and fulfill services powered by a comprehensive, curated IT as a Service catalog spanning Public, Private and Hybrid Clouds and Traditional IT providers. It is a core component of IBM’s strategic investment in the IBM Services Platform with Watson (ISPW), a complete and automated IT as a Service environment powered by the unmatched cognitive capability of Watson.

The Cloud Brokerage Site Reliability Engineer will be part of a group deploying and managing complex Enterprise software solutions in the areas of cloud brokerage, cloud management, data center transformation, Enterprise Hybrid Cloud Architectures and IT Governance.

Our delivery organization is made up of functional teams managing (a) Client Advocacy, (b) Client Onboarding and Transformation, (c) Client Solution Engineering and (d) Client Services and Enablement.

The Brokerage Site Reliability Engineer position is responsible for:

• Designing, analyzing, and troubleshooting large-scale distributed systems

• Participating in the on-call rotation

• Engaging with product teams to fix production outages and carrying forward action items to improve ongoing reliability

• Developing effective tooling, alerts, and responses to identify and address reliability risks, including automatic problem detection and mitigation

• Managing end-to-end availability and performance of Cloud Brokerage services and building automation to prevent problem recurrence, with the eventual goal of automating the response to all non-exceptional service conditions

• Designing, writing and delivering software to improve the availability, scalability, latency, and efficiency of Cloud Brokerage services

As a Cloud Brokerage Site Reliability Engineer, you should possess the following skills:

• DevOps Mindset

• You enjoy solving difficult engineering problems and don’t mind getting your hands dirty

• You approach troubleshooting systematically and have a deep sense of ownership for whatever you work on

• Ability to identify the root causes of instability in a high-traffic, distributed system

• Passion for resolving reliability issues and identifying strategies to mitigate them going forward

• Willingness to work in an ever-changing environment

• You are passionate about automation and innovations that improve productivity

Required Technical and Professional Expertise

• Experience with Cloud Computing platforms – IBM Cloud, AWS, Azure, Google Cloud Platform – 3+ years

• Strong Linux system-level analysis capabilities – 5+ years

• Experience in operating highly available distributed systems, in particular microservices, in a cloud environment – 1+ years

• Experience in at least one scripting language, Python preferred – 2+ years

• Sound understanding of CI/CD systems as well as experience in running containerized applications using tools such as Docker and Kubernetes – 1+ years

• Experience with configuration and troubleshooting of Linux, Java/Scala, Docker systems – 1+ years

• Experience in operating RDBMS and NoSQL databases – 3+ years

• Experience in Java, Elasticsearch, Kibana, Logstash, Grafana – 2+ years

• Understanding of large-scale complex systems from a reliability perspective – 5+ years

Preferred Technical and Professional Experience

• Proficiency in algorithms, data structures, complexity analysis and software design, and expertise in Unix/Linux systems, IP networking, performance and application issues – 5+ years

EO Statement

IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.