Linux Foundation Certified Kubernetes Application Developer, Administrator and Security Specialist (3 x Kubernetes Certified) with 12 years of experience in tech, working with web servers, mail systems, High Performance Computing clusters, blockchain and AI/ML, and a passion for automation and open source technologies.

Skills

IaC and Configuration Management Tools

  • Ansible
  • Terraform
  • Pulumi
  • Chef
  • Packer

Version Control

  • Git
  • GitHub
  • GitLab
  • SVN

Cloud

  • AWS
  • DigitalOcean

Container

  • Docker
  • Kubernetes
  • Nomad
  • Consul

Scripting

  • Bash
  • Python
  • Groovy

CI and Build Tools

  • Jenkins
  • GitHub Actions
  • Buildkite
  • Bazel
  • Make
  • CMake

Linux Operating System

  • RHEL
  • CentOS
  • Ubuntu Desktop
  • Rocky Linux

Databases

  • PostgreSQL
  • MySQL

Web Servers and LBs

  • NGINX
  • Apache HTTPD
  • Caddy Server
  • Fabio

Security

  • OpenSSL
  • PKI
  • SSL/TLS Certificates

Collaboration Tools

  • JIRA
  • Confluence

Observability Tools

  • CloudWatch
  • Graylog
  • Sensu
  • Prometheus
  • Grafana
  • OpenTelemetry
  • Splunk
  • Nagios/Icinga
  • LibreNMS

Blockchain

  • Hyperledger Fabric

Artifact Repositories

  • JFrog Artifactory
  • Nexus

Work Experience (7)

Senior DevOps Engineer
Wonolo Inc.
Mar 2022 - Current

Wonolo Inc. is a SaaS company providing a two-sided tech marketplace that connects companies with workers. The platform helps workers find flexible work based on their schedule.

Part of a small CloudOps team using an IaC and DevOps approach to support Wonolo's AWS infrastructure. The role also involves working closely with several internal teams, providing technical support in resolving critical business challenges.

  • Migrated Wonolo's configuration management system from Chef to Ansible. Wrote the Packer AMI build for creating base images, which ran Ansible using the `ansible-pull` method. Later wrote Ansible roles for deploying Buildkite Agents and the OpenTelemetry Collector, tested with the Molecule testing framework and goss.rocks server validation tooling. Performed the migration in the production environment with no customer impact while ensuring stability of the existing platform.

  • Improved Buildkite CI build time by deploying AWS Lambda functions to manage queues and workload. Before this change, autoscaling was configured based on CPU utilization, which meant that scale-out events didn't take place until the threshold was met, causing wait times of sometimes up to 5 minutes. With the Lambda functions deployed, wait time was cut to 45 seconds on average. This reduction resulted in quicker builds and, in turn, quicker deployments to production.

  • Upgraded the AWS Aurora PostgreSQL database behind Wonolo's SaaS platform from version 10 to 14. This required months of testing to ensure that the critical production database upgrade went smoothly, that no data corruption occurred and that the application still worked with the newer version of the PostgreSQL engine.

  • Created new AWS resources and managed existing ones using Terraform. Regularly wrote Terraform code for AWS LBs, Lambdas, Autoscaling groups, Launch Templates, Security Groups, RDS, EC2, Translation, CloudWatch, etc.

  • As a cost-saving measure, worked to remove older generation EC2 instances from Terraform deployments. Moved DB clusters to AWS Graviton instances, taking advantage of better price performance. Also eliminated the need for AWS Database Migration Service for syncing production data to lower environments by switching to `pgsync`, an open source tool that could be run on existing infrastructure at no additional cost.

  • Performed several AWS account migrations for businesses acquired by Wonolo. This included moving services like AWS S3, Athena and MongoDB to Wonolo-managed accounts. As part of this effort, also migrated Git repositories from gitlab.com to github.com.

  • Wonolo uses AWS EC2 spot instances to run production workloads and keep infrastructure costs low. The production workload runs on HashiCorp Nomad, which has issues handling EC2 spot interruptions, where AWS terminates instances when its platform needs capacity. I wrote a custom spot interruption handler so that, when a spot interruption notice was issued, the Nomad worker was cordoned to stop new workload being scheduled on the node, providing a graceful termination scenario. The handler queried the AWS EC2 instance metadata endpoint for a termination time; if one was provided, the node was set ineligible to run new workload while existing workload was drained, avoiding a sudden loss of workload capacity.

  • Improved overall monitoring posture by integrating CloudWatch to send Slack and OnCall alerts for critical resources. Also set up metrics and dashboards in Grafana for AWS resources and CI.
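
The spot interruption handling described above can be sketched roughly as follows. This is a minimal illustration, not the handler that ran in production: the function names, the injected `fetch` and `run` callables and the 15 second safety margin are all assumptions made for the example. The metadata path and the `nomad node eligibility` and `nomad node drain` subcommands are the documented ones.

```python
import json
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple

# Documented EC2 instance metadata path for spot interruption notices.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_termination_time(status: int, body: str) -> Optional[datetime]:
    """Return the scheduled interruption time, or None if no notice is pending."""
    if status != 200:  # the endpoint returns 404 until a notice is issued
        return None
    notice = json.loads(body)
    if notice.get("action") not in ("terminate", "stop", "hibernate"):
        return None
    parsed = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def drain_command(now: datetime, termination: datetime, margin_s: int = 15) -> List[str]:
    """Build a drain command whose deadline ends shortly before termination."""
    remaining = int((termination - now).total_seconds()) - margin_s
    deadline = max(remaining, 1)  # never hand Nomad a zero or negative deadline
    return ["nomad", "node", "drain", "-enable", "-self", "-yes",
            "-deadline", f"{deadline}s"]


def check_once(
    fetch: Callable[[str], Tuple[int, str]],  # hypothetical HTTP helper: url -> (status, body)
    run: Callable[[List[str]], None],         # hypothetical command runner, e.g. subprocess.run
    now: datetime,
) -> bool:
    """Poll the metadata endpoint once; cordon and drain if a notice exists."""
    status, body = fetch(SPOT_ACTION_URL)
    termination = parse_termination_time(status, body)
    if termination is None:
        return False
    # Stop new allocations landing on this node, then migrate existing ones.
    run(["nomad", "node", "eligibility", "-disable", "-self"])
    run(drain_command(now, termination))
    return True
```

In a real deployment `check_once` would be called in a loop every few seconds, with `fetch` handling IMDSv2 session tokens and `run` wrapping `subprocess.run`; both are injected here so the decision logic can be exercised without an EC2 instance or a Nomad agent.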

Senior Site Reliability Specialist
Kira Systems Inc.
Oct 2021 - Mar 2022

Kira Systems is a SaaS provider whose clients include the world's largest law firms and corporations. It uses Artificial Intelligence and Machine Learning to help them understand the contents of their legal documents.

Part of the Core Operations team supporting both Development and Production workloads. Working closely with internal teams, providing engineering support on demand.

  • Maintained Infrastructure as Code deployments for Development environments and supporting services.

  • Increased observability of critical development infrastructure using Prometheus/Alertmanager, ensuring development teams remained focused on high-business-value tasks.

  • Updated Jenkins Infrastructure and CI pipelines to decrease build failures and provide higher reliability.

  • Reduced CI build time in Jenkins for the main monorepo, consisting of more than 80 individual components, from 1 hour 40 minutes to 58 minutes by rewriting the pipeline from parallel batches of interdependent components to a fully parallelized, lock-based approach. This kept compute resources in use for as much of the build as possible and cut CI build time by roughly 40%, accelerating the software development cycle and resulting in faster releases to production.

  • Successfully spearheaded several initiatives to reduce technical debt.

  • Wrote GitHub Actions automation bots in Python to enforce security requirements in the development CI process.

  • As the senior team member, regularly provided technical assistance to other team members.
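
The move from batched builds to a fully parallelized, lock-based build can be illustrated with a small sketch. It is not the Jenkins pipeline itself (which would be written in Groovy); `build_all`, its parameters and the example component names are hypothetical, and the dependency graph is assumed to be acyclic. Every component starts immediately and blocks only on its own dependencies, so nothing waits for an unrelated slow component in the same batch.

```python
import threading
from typing import Callable, Dict, List


def build_all(
    deps: Dict[str, List[str]],
    build: Callable[[str], None],
    max_parallel: int = 4,
) -> List[str]:
    """Build each component as soon as its dependencies finish.

    Returns the order in which builds completed. Assumes an acyclic graph.
    """
    for name, needs in deps.items():
        for dep in needs:
            if dep not in deps:
                raise ValueError(f"{name} depends on unknown component {dep}")

    done = {name: threading.Event() for name in deps}
    slots = threading.Semaphore(max_parallel)  # stand-in for available executors
    failed = threading.Event()
    finished: List[str] = []
    finished_lock = threading.Lock()

    def worker(name: str) -> None:
        for dep in deps[name]:
            done[dep].wait()  # block only on this component's own dependencies
        try:
            if not failed.is_set():
                with slots:  # take an executor slot only once ready to run
                    build(name)
                with finished_lock:
                    finished.append(name)
        except Exception:
            failed.set()  # later components skip their build
        finally:
            done[name].set()  # always release waiters so nothing hangs

    threads = [threading.Thread(target=worker, args=(name,)) for name in deps]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if failed.is_set():
        raise RuntimeError("at least one component failed to build")
    return finished
```

For example, with `deps = {"core": [], "api": ["core"], "ui": ["core"], "app": ["api", "ui"]}`, `api` and `ui` both start the moment `core` completes, and `app` starts as soon as the slower of the two is done.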

Manager, Verified.Me Support
SecureKey Technologies Inc.
May 2019 - Oct 2021

SecureKey is an identity and authentication provider with Canadian federal and provincial governments and fintechs as clients and partners.

Led the Level 3 support team for SecureKey's Verified.Me Identity Verification Service. Member of the engineering team tasked with implementing automation and resolving day-to-day technical challenges. Contributed heavily to technical projects, ensuring timely achievement of objectives.

  • Started as a Level 3 Team Lead and was promoted to Manager within a few months.

  • Involved in calls to help high value clients, addressing technical and integration issues.

  • Part of project management team, working through several coordinated upgrades and migrations.

  • Assisted developers in solving deployment related issues with docker-compose.

  • Converted docker-compose based local development deployment to Kubernetes compatible Kustomize deployment.

  • Wrote and maintained CI/CD pipelines to accelerate both test and development processes, enabling faster releases to Production.

  • Implemented CI pipelines for deployment code repositories, a stepping stone towards GitOps.

  • Created an analytics pipeline using only server and application logs, which provided much-needed visibility into user transaction flows and filled gaps in the software implementation. This pipeline was further used for troubleshooting and discovering issues within the distributed system.

  • Created an analytics dashboard for Production system usage. Python code extracted information from the database, transformed the data into the required format and exposed it via an API endpoint.

DevOps Engineer
SecureKey Technologies Inc.
Oct 2016 - May 2019

Part of a small DevOps team tasked with developing the platform and writing Infrastructure-as-Code and configuration management, enabling deployment of SecureKey's new Blockchain based Identity Verification Service, Verified.Me.

  • Used tools such as Git, Terraform, Ansible, Jenkins and Docker for deployments.

  • Wrote and maintained Infrastructure as Code repositories for automated deployments on both on-prem and cloud infrastructure.

  • Built and shipped secure Docker images to clients and partners as part of the regular release process.

  • Assisted clients with technical issues ranging from service-related problems to clients' internal networking issues.

  • Assisted in redesigning client network architecture.

  • As Subject Matter Expert of Verified.Me, I was regularly consulted for L2/L3 support.

Systems Administrator
SecureKey Technologies Inc.
Jun 2016 - Oct 2016

Joined as a Systems Administrator supporting SecureKey's Government Sign-in by Verified.Me Service. In recognition of my skills and passion for automation, I was offered the role of DevOps Engineer and joined the Engineering team after 4 months on the Production Support Team.

  • Day-to-day operations and maintenance of the server infrastructure supporting cloud services, with a focus on Red Hat Linux, Java application servers, MySQL and VMware ESX.

  • Implemented production changes in out-of-hours maintenance windows.

  • On-call for critical incident support on a rotational basis.

  • Provided input to post implementation reviews and RCAs.

  • Managed issues using an ITIL-compliant ticketing system.

Linux Systems Engineer
Integra Technologies FZE
Aug 2015 - Mar 2016

Integra Technologies is a Red Hat Advanced Partner and AWS Standard Consulting Partner that focuses on Linux based solutions for enterprises.

Part of a small team providing consultancy services to a wide range of clientele such as Telecom Service Providers, Educational Institutions, Airports, Airlines, Automotive Dealerships, Food Services and the Healthcare sector.

  • Installed and provided support for all Red Hat products such as Red Hat Enterprise Linux, Red Hat Satellite, Red Hat High Availability Clusters.

  • Met with potential clients to discuss their IT enterprise needs.

  • Provided 24/7 on-call support to clients on a daily basis.

  • Performed remote troubleshooting to effectively resolve technical issues faced by the clients.

  • Provided clients with project specifications and technical documentation.

Linux System Administrator
Science and Technology Facilities Council
Jul 2012 - Jul 2015

One of Europe’s largest multidisciplinary research organizations.

Part of a small but dedicated operations team administering hundreds of thousands of servers that formed part of a High Performance Computing cluster. These resources were used within the organization and by the wider UK academic research community, including prestigious research universities.

  • Administered CPU and GPU RHEL based compute clusters for UK academia.

  • Provisioned new hardware including PDUs, servers and network switches.

  • Benchmarked new High Performance Computing cluster to determine peak performance.

  • Handled a wide range of tasks on a daily basis, from simple login issues and expired server and client certificates to more involved work such as troubleshooting MTU misconfiguration on network switches.

  • Liaised with vendors for replacement of faulty hardware.

Education (1)

Bachelor of Engineering
Communications Engineering
Coventry University
2008 - 2012

Certification (5)

Certified Kubernetes Administrator (CKA)
CNCF
2021
Certified Kubernetes Application Developer (CKAD)
CNCF
2021
Certified Kubernetes Security Specialist (CKS)
CNCF
2022
Red Hat Certified Engineer (RHCE)
Red Hat
2014
Red Hat Certified System Administrator (RHCSA)
Red Hat
2014