TRACTIAN

Senior DevOps Engineer

Job Posted 1 year ago

Job Description

Engineering at TRACTIAN

The Engineering team at TRACTIAN builds and operates the cloud-native backbone that powers our industrial IoT platform. We design for massive scale, high reliability, and security across AWS, Azure AKS, and Oracle Cloud (OCI) Kubernetes clusters.

What you'll do

- Own end-to-end delivery pipelines—from GitHub commit to production—running on GitHub Actions, ECS Fargate, AKS, and OCI Kubernetes.
- Evolve our multi-cloud, multi-cluster architecture (AWS + OCI) with zero-trust networking.
- Write and maintain IaC (Terraform + Terragrunt), Helm charts, and Kubernetes operators to automate everything.
- Optimize observability: build dashboards and alerts using the Grafana OSS stack, Prometheus, Loki, Tempo, and Datadog.
- Troubleshoot complex incidents involving microservices, monoliths in containers, and AI workloads on GPU nodes.
- Improve security posture—harden images, manage secrets, enforce policies, and audit compliance.
- Coach other engineers on DevOps best practices and drive continuous improvement.

Responsibilities

  • Apply DevOps practices to increase deployment speed, security, and quality.
  • Architect and run CI/CD workflows in GitHub Actions (matrix builds, reusable workflows, OIDC federation).
  • Design, build, and maintain Terraform/Terragrunt modules for VPCs, subnets, security groups, site-to-site VPNs, and private links.
  • Manage container orchestration on ECS Fargate and Kubernetes (AWS & OCI) with Helm and KEDA.
  • Implement autoscaling, blue-green / canary releases, and cost optimization for GPU and CPU workloads.
  • Diagnose performance bottlenecks across network, compute, storage, and application layers.
  • Maintain high-quality documentation.

Requirements

  • B.S. in Computer Engineering, Information Systems, or equivalent experience.
  • Strong scripting skills (Python, Bash); Go or Rust a plus.
  • Hands-on CI/CD with GitHub Actions and experience running production workloads on:
      • AWS: ECS Fargate, S3, RDS, CloudWatch, VPC networking.
      • Kubernetes: OCI OKE, Helm, Istio, KEDA.
  • IaC expertise with Terraform and Terragrunt in multi-account/multi-cloud setups.
  • Solid networking foundations: VPC design, subnets, routing, VPN/IPSec tunnels, security groups, load balancers.
  • Observability stack experience (Grafana, Prometheus, Loki, Tempo, Datadog).
  • Familiarity with container security, SBOMs, image scanning, secret management, and least-privilege IAM.
  • Excellent problem-solving skills, ownership mindset, and ability to work autonomously within a distributed team.
