Sr. Engineer - System (Site Reliability)
6325 Peachtree Dunwoody Rd Atlanta, GA | Contract To Hire
This Platform Engineer will be part of the Site Reliability Engineering (SRE) team. The SRE team is an innovative team devoted to providing automated solutions and services that help our client measure, evaluate, and plan for visible, reliable application delivery and maintenance. As a member of the SRE team, you will work with development teams to help create the automated pipelines and solutions required for continuous delivery in an Agile DevOps environment. The tools and use cases are diverse, and our challenge is to increase development velocity by optimizing various parts of the pipeline while also increasing application stability. This is an opportunity to create automation, monitoring, and pipelines that improve deployment and response times across the board. We are looking for engineers who are passionate about infrastructure as code and continuous deployment to build scalable, highly reliable applications.
If you love figuring out how all the pieces fit together, and if building automation and tools to monitor and manage your applications sounds interesting to you, we want to talk to you.
What you will do:
- Automate anything and everything! (infrastructure build-out, testing, deployment, monitoring, etc.)
- Design and assist in the authoring of software tools that reliably manage application delivery
- Design and assist in the setup and maintenance of application monitoring and alerting
- Engage with Development/Capability Teams to ensure best practices are implemented
- Improve predictability and reliability of software releases, workflows and operating software.
- Reduce application deployment windows by leading the company toward a Continuous Deployment environment
- Reduce mean time to recovery (MTTR) by helping with troubleshooting, monitoring, alerting, and automated recovery.
The skills we require:
- Python, Ruby, Go, or another systems programming language (moderate skill level required)
- Experience with configuration management systems (Octopus, Chef, Puppet)
- Experience rolling out redundant, mission-critical applications in a Windows environment
- Experience with version control systems (Git or SVN)
- Experience with cloud computing platforms (Amazon AWS, Kubernetes, Heroku, etc.)
- Experience with continuous integration tools (Jenkins, CircleCI, etc.) and Artifactory (or Nexus)
- Excellent written communication, problem-solving, and process management skills
- Desire to work in a fast-paced, evolving, growing, dynamic environment
The skills we prefer:
- Linux system engineering expertise
- VMware, VirtualBox experience
- Experience supporting Ruby or Java applications
- Experience supporting database server infrastructure (MySQL, Postgres, etc.)
- Networking knowledge
- Experience with HashiCorp tools (Vagrant, Terraform, Packer, etc.) and Linux containers (Docker, rkt)
- Experience with Java build tools such as Ant, Maven, Gant, or Gradle
- Experience with agile development, continuous integration, and automated testing
- Experience with dashboarding and monitoring