Site Reliability Engineers @ GitHub - New York, NY

Job Overview

8 days ago

GitHub is seeking systems and software engineering professionals to join a new Operations and Reliability team. This team will support, automate and improve the infrastructure that underpins GitHub’s new managed SaaS offering. This role is an opportunity to learn and grow on multiple fronts.

As an operations engineer, you’ll be responsible for the day-to-day monitoring, administration and operations necessary to run GitHub in a cloud environment at scale. As a reliability engineer, you’ll work to automate, iterate on and improve the systems that make this SaaS offering work.

Most importantly, you’ll help grow our culture of inclusion, collaboration and togetherness. We are a remote team and work with teammates across the world and in many different time zones. "Mental health first" is one of our team mantras. In a COVID world, we want to build as healthy and sustainable a team as we can through empathy, flexible schedules and empowerment. Within those bounds, we are passionate about building a team that scales through shared knowledge and well-defined best practices.

Our platform is helping deliver a cloud-native experience for GitHub’s new SaaS offering. Docker, Kubernetes, infrastructure as code, managed cloud services and distributed security are where most of our time will be spent. Experience operating managed public services and knowing how to think about monitoring, automation, day-to-day administration and incident management will be key. Familiarity with pair programming, TDD and kanban, along with a desire to ship many times a day, will fit well with our way of working.

This role at GitHub is an opportunity to blend your system design, empathy, and software engineering skills on an ever-changing set of novel reliability and operations challenges. Join us on this journey and have a meaningful impact on how the world builds software.

Responsibilities:

  • Ensure smooth day-to-day operations of GitHub’s SaaS offering.
  • Automate away as much of the day-to-day work as possible - “Run By Robots” is the goal.
  • Drive organization-wide best practices for monitoring, alerting and incident management.
  • Identify and respond to production and customer issues and incidents, collaborating with support and product teams to resolve them.
  • Help drive new features, capabilities and code changes into the core product with an operations-focused point of view.

Minimum Qualifications:

  • Experience with at least one cloud platform, such as Azure
  • Comfort with the GNU/Linux operating system, particularly Ubuntu
  • Familiarity with Docker and Kubernetes
  • Experience with scripting and automation, particularly bash or PowerShell
  • Experience building infrastructure and automation
  • Experience with monitoring, alerting and operations
  • Experience with distributed systems with high availability requirements
  • Experience with incident response

Preferred Qualifications:

  • Experience with Azure (AAD, Security, ARM, AKS)
  • Experience with Kubernetes
  • Comfort with at least one modern programming language, such as Golang
  • Experience operating highly available systems at scale
  • Experience building and deploying software in a SaaS environment
  • Experience as an incident commander

Please note that this application is for GitHub Infrastructure Engineering and you may have the opportunity to be considered for multiple teams.

The ability to meet GitHub, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. Anyone in this position must pass the Cloud Screen of Microsoft, GitHub’s parent company, upon hire/transfer and every two years thereafter.

Who We Are:

GitHub is the developer company. We make it easier for developers to be developers: to work together, to solve challenging problems, and to create the world’s most important technologies. We foster a collaborative community that can come together—as individuals and in teams—to create the future of software and make a difference in the world.

Leadership Principles:

Customer Obsessed - Trust by Default - Ship to Learn - Own the Outcome - Growth Mindset - Global Product, Global Team - Anything is Possible - Practice Kindness

Why You Should Join:

At GitHub, we constantly strive to create an environment that allows our employees (Hubbers) to do the best work of their lives. We've designed one of the coolest workspaces in San Francisco (HQ), where many Hubbers work, snack, and create daily. The rest of our Hubbers work remotely around the globe. An updated list of where we can hire is available here: https://github.com/about/careers/remote

We are also committed to keeping Hubbers healthy, motivated, focused and creative. We've designed our top-notch benefits program with these goals in mind. In a nutshell, we've built a place where we truly love working, and we think you will too.

GitHub is made up of people from a wide variety of backgrounds and lifestyles. We embrace diversity and invite applications from people of all walks of life. We don't discriminate against employees or applicants based on gender identity or expression, sexual orientation, race, religion, age, national origin, citizenship, disability, pregnancy status, veteran status, or any other differences. Also, if you have a disability, please let us know if there's any way we can make the interview process better for you; we're happy to accommodate!

Please note that benefits vary by country. If you have any questions, please don't hesitate to ask your Talent Partner.
