Site Reliability Engineer @ Capgemini - Malvern, PA

Job Overview

Duration: 4+ Months


Job Description:

  • Adept user of telemetry tools, including CloudWatch, Splunk, and Honeycomb
  • Ability to read and understand application code written in NodeJS, Java, and Python
  • Ability to write and update application code confidently in at least one of the following languages: NodeJS, Java, Python
  • Deep familiarity with SiteMinder, MFA, and OIDC protocols and their implementation (e.g., Kong, Envoy, OPA)
  • Experience debugging production incidents using a combination of logs, metrics, and traces
  • Familiarity with executing performance and chaos tests and analyzing results.
  • Experience building cloud-native applications/platforms
  • Ability to create, interpret, and update technical architecture diagrams


Duties and Responsibilities:

  • Cloud Platform SRE Lead (or a strong hands-on senior cloud engineer)
  • Help Client reach 99.999% availability for our mission-critical applications in support of the Agility pillar of our Enterprise Technology Strategy. You will join a high-performing enterprise cloud container service (ECS) platform team. The platform hosts the majority of the web and batch applications built by Client's application teams, and many of them provide business- and mission-critical functionality for our external sites and services and for internal applications.
  • If you are a great developer and are passionate about increasing the resiliency of our systems, then this is perfect for you!
  • We can provide training on AWS container technology and ECS, and we partner with the CTO SRE team to upskill you on core SRE functions.
  • As a Cloud Compute SRE Lead within the ECS platform team, which is expanding to a global multi-region presence, you'll proactively seek out pain points and opportunities for wide-reaching improvement by analyzing enterprise-wide telemetry data.
  • You will also support reliability-centric tasks for cross-cutting concerns and for applications spanning more than one sub-division.
  • You will be expected to work as a hands-on developer, implementing the opportunities you identify to enhance resiliency and harden the platform.
  • You will partner with other shared-services teams (performance, chaos, security and fraud, and various ops teams) to bring a holistic approach to hardening the platform's security and resiliency posture. In this role, you will:
  • Proactively seek out operational anomalies using Honeycomb, Splunk, CloudWatch, and other telemetry tools
  • Execute chaos experiments and other resilience tests for spinal services and applications with cross-cutting impacts or high criticality
  • Define SLIs and aligned SLOs for platform services. Implement automation via synthetic monitors and formulas to capture platform availability.
  • Determine build/deploy efficiencies to reduce build/deploy times and failures for application workloads. Assess which build/deploy metrics to capture for further refinement and reporting
  • Update application code based on findings to improve resilience, and assist in automating workloads so they can be stood up in a multi-region/out-of-region environment
  • Improve the platform's security posture by easing integration with modernized authorization/authentication protocols and tools (OIDC, Auth0, Kong, Envoy) and by identifying potential vulnerabilities
  • Help product and platform teams and their SRE representatives diagnose complex technical problems, including performance issues and intermittent errors
  • Join and participate in high-severity major incident calls (SEV1s and some cross-cutting SEV2s) to assist with triage and recovery, and participate in post-incident reviews for these incidents
  • Review critical and complex architectures, including facilitating FMEA (failure mode and effects analysis) exercises


The Capgemini Freelancer Gateway is enabled by a cutting-edge software platform that leads the contingent labor world for technology innovation. The software platform leverages Machine Learning and Artificial Intelligence to make sure the right people end up in the right job.


A global leader in consulting, technology services, and digital transformation, Capgemini is at the forefront of innovation to address the entire breadth of clients’ opportunities in the evolving world of cloud, digital, and platforms. Building on its strong 50-year heritage and deep industry-specific expertise, Capgemini enables organizations to realize their business ambitions through an array of services from strategy to operations. Capgemini is driven by the conviction that the business value of technology comes from and through people. It is a multicultural company of over 200,000 team members in more than 40 countries. The Group reported 2018 global revenues of EUR 13.2 billion.