Senior Site Reliability Engineer @ University of Pennsylvania Health System - Philadelphia, PA

Job Overview

8 days ago

The Senior Site Reliability Engineer (SRE) is responsible for production systems enabling custom software development work in support of areas such as Application Development, Informatics, Predictive Healthcare, and Translational Research. The SRE applies software engineering, systems engineering, and DevOps principles to operations; designs and implements cohesive end-to-end systems for comprehensive solutions with measurable patient and clinician outcomes; and builds resilient, self-healing systems. The Senior SRE values proactive automation, expert tool-smithing, and adherence to design and engineering principles over reactive systems management or traditionally siloed systems administration.

  • Designs, builds, and maintains our core infrastructure, while retaining the flexibility to integrate next-generation systems for Predictive Medicine.
  • Applies Systems Engineering and Software Engineering skills to advance core infrastructure, systems design, recurring microservices, tooling, automation, and libraries to “lift all the boats” instead of fragmented support for individual applications.
  • Improves and automates system infrastructure and application deployment processes to be as boring as possible.
  • Implements proactive monitoring and alerting of symptoms, instead of reactive alerting of outages.
  • Establishes best practices for securing automated systems infrastructure and microservices.
  • Advances the maturity of SRE discipline across Penn Medicine.
  • Applies DevOps processes to monitor and stabilize core infrastructure.
  • Prevents incidents, e.g., by reducing baseline noise, streamlining metrics, characterizing expected latency, tuning alert thresholds, ticketing applications without effective health checks, and improving playbooks for issue resolution.
  • Uses playbooks to document actions alongside code in source control to turn initial problem discovery and resolution into automated processes.
  • Participates in the on-call rotation for systems infrastructure and provides subject-matter-expert support for Software Engineers, Data Engineers, and Data Scientists developing, building, testing, deploying, and monitoring their microservices.
  • Collaborates to define service level agreements (SLAs) and objectives (SLOs), and automates measurement of service level indicators (SLIs).
  • Triages production issues across all products and services, and at all levels of the stack.
  • Performs duties in accordance with Penn Medicine and entity values, policies, and procedures.
  • Other duties as assigned to support the unit, department, entity, and health system organization.
Minimum Requirements:
Required Education and Experience
  • Bachelor's Degree required, in a relevant field, including Computer Science, Systems Engineering, Data Science, Mathematics, Statistics. Master's Degree preferred.
  • 3+ years of software engineering experience required.
  • Current internal Penn Medicine Information Services division employees may be considered with proof of active and continued enrollment in an approved bachelor's degree program.
  • 1+ years of experience with infrastructure as code on a cloud provider, or with systems engineering, required.

Required Skills:
  • Demonstrated interpersonal/verbal communication skills are required.
  • Ability to communicate effectively with all levels of staff is required.
  • Demonstrated customer service skills are required.
  • Exceptional design and programming skills in a language such as Go, Python, C, or C++ are required.
  • Exceptional coding skill in ANSI SQL or PL/pgSQL is required.
  • Competency in Linux and the Unix shell, or in equivalent operating systems, is required.

Preferred Skills:
  • Production experience with microservice orchestration (e.g., HashiCorp, Kubernetes), logging, metrics, and alerting (e.g., Loki, Grafana, Prometheus, Kibana, Fluentd).
  • Production experience with infrastructure as code on a cloud provider (e.g., Azure, AWS, Google Cloud Platform), using tools like Terraform and Python.
  • Production experience with DevOps automation directly from source control, semantic versioning, and CI/CD (e.g., GitHub Actions, CircleCI, Travis CI).
  • Development of open-source products.
  • Effective asynchronous communication and documentation of process and code in source control, with an eye toward automation.
  • Practical understanding of Agile software development process with code review and retrospectives.

Additional Information:
As part of our COVID-19 response, this position may currently be offering partial or full remote work. However, in the near future this position will require full or partial on-site work.

Be a part of the exciting and ground-breaking upcoming years for the Penn Medicine Information Services department!
Because growth is essential to continuing to meet the current and future needs of patients, Penn Medicine continues to expand its capabilities.
Penn Medicine's Information Services (IS) Department focuses its efforts on the clinical and financial systems that support the day-to-day operations of six (6) hospitals, several satellite practices, and more than 8,923 physicians.

We believe that the best care for our patients starts with the best care for our employees. Our employee benefits programs help our employees get healthy and stay healthy. We offer a comprehensive compensation and benefits program that includes one of the finest prepaid tuition assistance programs in the region. Penn Medicine employees are actively engaged and committed to our mission. Together we will continue to make medical advances that help people live longer, healthier lives.

Live Your Life's Work

We are an Equal Opportunity and Affirmative Action employer. Candidates are considered for employment without regard to race, ethnicity, color, sex, sexual orientation, gender identity, religion, national origin, ancestry, age, disability, marital status, familial status, genetic information, domestic or sexual violence victim status, citizenship status, military status, status as a protected veteran or any other status protected by applicable law.
