Senior Site Reliability Engineer--REMOTE @ Splunk - Philadelphia, PA

Job Overview

4 months ago

Join us as we pursue our disruptive new vision to make data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we’re committed to our work, our customers, having fun, and most importantly to each other’s success. Learn more about Splunk careers and how you can become a part of our journey!

Role

As part of Splunk's Cloud-First mission, Site Reliability Engineering (SRE) is accountable for the overall reliability of services running in our cloud production environments. We are software engineers who engage with product and infrastructure teams at every level, from directly embedding on their teams to tagging in for the gnarliest of production challenges. SREs combine the skills of software development with infrastructure engineering. Our goal is to make Splunk's production environments more transparent, more predictable, and less cognitively demanding for Splunk's service owners to operate their services in.

You Will:

  • Design and develop software to maximize the engineering velocity of teams and increase the reliability of Splunk products.
  • Mentor new software engineers to achieve more than they thought possible.
  • Work across the organization to deliver quality products that delight Splunk's passionate users.

Qualifications:

  • 5 years of related experience with a technical Bachelor’s degree, or equivalent practical experience.
  • Blend of software engineering experience and infrastructure and automation experience with a focus on how software runs in production.
  • A passion for automating away common tasks and processes.
  • You are dedicated to writing well tested and maintainable code.
  • Understanding of observability libraries and the ability to instrument code to expose new application metrics.
  • You enjoy making other teams successful and are fulfilled through the success of others.
  • You enjoy designing, developing, and maintaining distributed systems at scale in production.
  • You understand the challenges and trade-offs to be made when building and deploying systems to production.
  • Knowledge of standard methodologies related to security, performance, and disaster recovery.
  • Skilled in identifying performance bottlenecks, detecting anomalous system behavior, and resolving the root cause of service issues.

Preferred skills:

  • Experience with Golang, GitLab CI, Qbec, OpenAPI/Swagger.
  • Experience with Kubernetes environments and understanding of multi-tenancy and security implications.
  • Experience working with container deployment and orchestration technologies with knowledge of fundamentals including service discovery, deployments, monitoring, scheduling, load balancing.
  • Experience with development and deployment in a hosted cloud environment, preferably AWS & GCP.
  • Experience with distributed cloud service development, infrastructure, traffic management and architecture.
  • Experience with optimized and scalable software that operates on a large number of nodes.

Splunk provides a constant stream of new things to learn. We're always expanding into new areas and exploring new technologies. We believe in growing engineers through ownership and leadership, and we provide a mentorship program that encourages growth for both mentors and mentees.

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.

(Colorado only*) Minimum base salary of $115,000.00. You may also be eligible for incentive pay + equity + benefits. *Note: Disclosure per SB19-085 (8-5-201 et seq.).
