Engage in and improve the whole lifecycle of software development services, from inception and design through deployment, operation, and refinement.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health in a 24x7 environment.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and blameless postmortems.
Bind and orchestrate the system infrastructure with the application layer to enable High Availability/Clustering, load balancing, and integration;
Provide technical guidance or support for the development or troubleshooting of systems;
Establish end-to-end monitoring and alerting on all critical aspects of every system to ensure SLOs, SLIs, and SLAs are met and to provide proactive notification of possible issues;
Develop automated solutions to address potential problems before they result in a service interruption and demonstrate a passion for automation, including CI/CD automation;
Establish performance baselines and capacity thresholds, correlate events, and define monitoring/alerting criteria.
Bachelor of Science degree in Computer Science, Engineering, or equivalent relevant experience.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive;
6+ years of overall experience in one or more of the following:
Experience with continuous integration tools (Jenkins, SonarQube, JIRA, Nexus, Confluence, Git/Bitbucket, Maven, Gradle, Rundeck) is a plus;
You've created automation using Chef, Puppet, or another SCM tool; experience with Docker and container scheduler services such as ECS or Kubernetes is desirable;
You've worked with Nginx, Tomcat, HAProxy, Redis, Elasticsearch, MongoDB, RabbitMQ, Kafka, and ZooKeeper;
Experience as SCM/release engineer, or in a position with similar skill sets and responsibilities (Software Engineer, Systems Engineer, Systems Administrator);
Experience in configuring and administering Java EE application servers (Tomcat, WebSphere, WebLogic, etc.);
Experience with scripting languages (Unix shells such as bash and ksh, Python, Perl);
Experience in configuring, building, and supporting apps and operations in a public cloud environment (AWS, Azure, GCP);
Experience with monitoring and logging tools (Elasticsearch, ELK, AppDynamics, Splunk, etc.);
Collaborate well with team members, developers, QA, and ownership teams to resolve issues;
Knowledge of Agile / Scrum methodologies and principles;
A real passion for and the ability to learn new technologies.