Aman amanycodes

Aman Yadav

project-copacetic/copacetic — A CLI tool written in Go and based on buildkit that can be used to directly patch container images without full rebuilds.
prequel-dev/cre — A database of Common Reliability Enumerations (CREs) developed by the community.
prequel-dev/preq — preq (pronounced "preek") is a free, open, community-driven reliability problem detector. It detects bugs, misconfigurations, anti-patterns, and known issues identified by a community of practitioners.
prequel-dev/reliability-research-agent (private) - CrewAI automation that renders Helm templates and audits them against 10 reliability best practices: PodDisruptionBudget, TopologySpreadConstraints, HorizontalPodAutoscaler, PriorityClass, CPU/memory requests and limits, and liveness/readiness probes.

pine-gate — AI gateway written in Go that serves as a unified entry point for large language model (LLM) backends such as Ollama, OpenAI, and local models. Supports multi-model routing, API-key-based access control, and real-time observability via Prometheus and Jaeger.
chashma — AI observability stack built for model monitoring and latency tracking. Integrates Prometheus for metrics, Jaeger for tracing, and OpenTelemetry for structured telemetry.

Bulk Image Patching on project-copacetic/copacetic — Implemented bulk patching, enabling batch updates across different container architectures and improving CI/CD automation for image maintenance. Improved runtime performance by optimizing patch layer handling.
Non-Zero Exit Codes on project-copacetic/copacetic — Improved reliability by adding structured exit codes and failure classifications. Improved observability by exposing patching errors as Prometheus metrics and integrating them into test pipelines.

Distributed systems: understanding coordination, consistency, and fault tolerance in large-scale architectures.
Kubernetes internals: exploring networking, scheduling, and how kube-scheduler decisions affect workload performance, resource efficiency, and service connectivity in multi-node clusters.
Reliability research: studying how to detect, classify, and mitigate reliability issues in distributed infrastructure by correlating metrics, traces, and logs to uncover failure patterns and prevent outages.