2025 was the year agentic AI went from interesting experiment to daily reality. AI agents stopped being autocomplete tools and started understanding entire code bases, refactoring across files, writing tests, debugging their own mistakes, managing infrastructure, and handling operational tasks. For many developers and ops engineers, the way they work fundamentally changed. 2026 is the year to take this seriously, not just for application developers, but for DevOps engineers, SREs, and platform teams. The tools are mature enough now. The productivity gains are real. If you're not integrating AI agents into your workflow, you're leaving significant value on the table.

But here's the thing: agentic AI doesn't replace everything else. You still need solid foundations: internal developer platforms, testing frameworks, scripting languages, development environments. These tools still matter. What's changed is that AI now intersects with all of them. So this year's recommendations cover both the AI tools that emerged in 2025 and the non-AI tools that remain essential. I spent 2025 testing all of them in real projects, real workflows, real problems, not quick demos. What I am sharing today are my recommendations for 2026. It's not a comprehensive list, nor a neutral comparison. These are the tools I actually use, the ones that survived months of daily work, and the ones I think you should seriously consider. I will also point out what didn't make the cut and why. Some of my choices will be obvious, others might surprise you. A few will probably make you disagree, and that's fine. The goal isn't to tell you what to think; it's to give you a practitioner's perspective so you can make better decisions for your own stack.
The AI model landscape is evolving at a pace that makes any definitive ranking obsolete within months, maybe even weeks. The model that leads benchmarks today will likely be surpassed tomorrow. New releases drop constantly, capabilities leap forward, and pricing changes overnight. Any recommendation here comes with a built-in expiration date. That said, some patterns are emerging. Certain model families consistently deliver top results for software engineering tasks: understanding code, generating accurate implementations, reasoning through debugging scenarios, and working with configurations and manifests. The leaders in coding benchmarks tend to stay near the top even as new models appear, which suggests that some providers have figured out what matters for engineering work.

For open source options that you can run locally or self-host on your own servers: Llama 4 offers a massive 10 million token context window, Qwen delivers strong coding performance with an Apache 2.0 license, and DeepSeek provides impressive capabilities at a fraction of the price. Those models are good and they're getting better, but they're still not quite on par with closed models. They might be close, but they're not there yet. The other challenge is compute requirements. You might not have the hardware needed today, and those requirements are likely to increase as models grow larger. Among the proprietary alternatives, OpenAI's GPT remains widely used with the largest ecosystem. Mistral offers a strong European alternative with frontier performance at reduced cost. Cohere focuses on enterprise RAG use cases, and xAI's Grok brings real-time data access.

That being said, Anthropic's models are my choice for software engineering work. Claude consistently leads SWE-bench and other coding benchmarks, often by a meaningful margin. This isn't accidental. Anthropic has made software engineering a core focus in a way that other providers haven't. Their first AI conference was entirely dedicated to coding and developers. Claude Code is the best terminal-based agent available. Cursor, the leading AI-native IDE, uses Claude as its default model, or at least it used to; maybe that changed recently, I'm not sure. And the results speak for themselves. Claude understands code bases deeply, generates accurate implementations, and reasons through complex debugging scenarios better than alternatives. For infrastructure as code, Kubernetes manifests, and configuration files, the difference is noticeable. The model understands context and produces code that works on the first try more often than competitors. Claude 4 now supports a 1 million token context window, matching Gemini's previous advantage in this area. Now, the trade-off is cost. Claude's API pricing is higher than Gemini's, and rate limiting on the consumer plans can be frustrating for heavy users, but for professional software engineering work, the quality difference justifies the price.

Google Gemini is a close second. Cost effectiveness is excellent, with a generous free tier and competitive API pricing. Multimodal capabilities are strong, really good, if you need to work with images, diagrams, or documentation that includes visuals. Now, Gemini can feel less refined than Claude for some coding tasks. The output quality is good, but not quite at Claude's level for complex software engineering work. That said, the gap has been narrowing, and for many use cases Gemini delivers excellent results at lower cost. If budget is a primary concern, Gemini is a solid choice.
We are witnessing a seismic shift in how software engineers work. AI agents have moved from novelty to necessity for many application developers. They're not just autocompleting lines anymore. They are understanding entire code bases, refactoring across multiple files, writing tests, and even debugging their own mistakes. The productivity gains are real, and they're substantial. However, this shift hasn't fully reached the ops world yet. DevOps engineers, SREs, and platform teams are still largely working the way they did before AI agents became mainstream. Part of this is the nature of the work. Ops tasks often involve production systems where mistakes have immediate consequences, complex debugging across distributed systems, and tribal knowledge about why things are configured a certain way. AI agents struggle with context that lives in runbooks, Slack threads, and people's heads. That said, 2026 is likely the year this changes. The tools are maturing rapidly, context windows are expanding, and agents are getting better at understanding infrastructure as code, Kubernetes manifests, and cloud configurations. If you're in ops and haven't started experimenting with AI agents, now is the time to get familiar. The learning curve exists, and you don't want to be climbing it when everyone else has already integrated these tools into their workflows. Now, two directions are emerging in this space. The first follows traditional development patterns through IDEs, where AI augments the familiar editing experience with inline completions, chat panels, and context-aware suggestions.
The second is terminal-based agents that take a different approach, often operating more autonomously and integrating better with the command-line workflows that ops teams already use. Another key distinction is model flexibility. Some agents are tightly coupled to specific models, which can mean better integration with that model's strengths, but also vendor lock-in. Others are model-agnostic, letting you swap providers or use local models at the cost of potentially less optimization.

In the IDE camp, GitHub Copilot remains the most widely adopted, with seamless editor integration and reliable completions, though it struggles with large-scale refactoring. Windsurf offers a great beginner experience with unlimited agent access and a planning mode for multi-step tasks. Open source IDE options like Continue and Cline provide model-agnostic alternatives but lack the polish of commercial offerings. For terminal-based agents tied to specific models, Gemini CLI brings Google's massive 1 million token context window to the command line with a generous free tier, though benchmark scores lag behind competitors. OpenAI Codex offers flexible reasoning levels and strong GitHub integration, but has UX challenges. Cloud-specific options like Amazon Q Developer excel within their ecosystems but offer limited value outside them.

My recommendation comes down to three tools that cover different use cases. I prefer terminal-based agents, but some people prefer working in IDEs, so no judgment on that one, at least not today. Within the terminal camp, some want the best experience regardless of vendor lock-in, while others prioritize model flexibility. All three are valid choices depending on your priorities. For IDE users, Cursor is the clear winner. It's a VS Code fork with AI deeply integrated into the editing experience. You get inline completions, chat, and precise context control. The UI-first experience is polished and fast. The downsides are rate limits that can be frustrating for heavy users and pricing changes that have upset some of the community. But if you live in your IDE and want AI assistance without leaving it, Cursor is the best option available, hands down. For terminal users who want the best experience, Claude Code is in a league of its own. It has the highest SWE-bench scores, a true 200K token context window, and excels at autonomous multi-file operations. It understands entire code bases, and it can work through complex tasks with minimal handholding. The catch is that it only works with Anthropic models. If you're comfortable with that lock-in, Claude Code delivers results that other terminal agents cannot match. I use it daily. Now, for terminal users who want model flexibility, OpenCode is the best choice. It is a true open-source Claude Code alternative that works with any model provider. It's still behind Claude Code in capabilities, and it has a smaller community, but it is the right choice if you want to avoid being locked to a specific family of models. As the model landscape continues to shift, having the freedom to switch providers without changing your tooling has real value.
We have entered the phase where companies need to build their own AI agents. Off-the-shelf agents are great for general-purpose tasks, but every organization has unique workflows, internal tools, and domain knowledge that generic agents cannot tap into. This is especially true for internal developer platforms. If application developers are increasingly using AI agents as their primary interface for getting work done, then platform teams need to expose platform capabilities to those agents. Otherwise, developers will be constantly context-switching between their AI assistant and the platform portal, and that defeats the purpose of both. The way to bridge this gap is through Model Context Protocol (MCP) servers and custom agents. MCP allows AI agents to discover and use tools exposed by your platform. Custom agents can encode your organization-specific workflows, policies, and tribal knowledge. Together they enable scenarios like developers asking their AI agent to provision a new environment, check deployment status, or investigate a production incident, all without leaving their normal workflow, because, remember, their workflows are agentic.

Now, building custom agents requires SDKs and frameworks. The landscape here is still maturing, with options ranging from low-level SDKs that give you maximum control to high-level frameworks that handle orchestration, memory, and tool management for you. The right choice depends on how much complexity you want to manage yourself versus how much you want abstracted away. Given how fast the model space is moving, I believe we should be building agents in a way that allows us to switch models at any time. That makes SDKs tied to a specific vendor a bad choice, even if they offer features that might not be available elsewhere. Until clear, long-term winners in the model space emerge, our agents must be, and I repeat, must be agnostic. That rules out vendor-specific options like the Anthropic SDK, Google ADK, and OpenAI Agents SDK as primary choices, despite, and I repeat, despite their polish and tight integration with their respective models. Microsoft's Semantic Kernel and AutoGen offer more flexibility but still lean heavily into the Azure ecosystem.
For multi-agent orchestration, CrewAI has gained some traction with its intuitive role-based approach. LangGraph extends LangChain with a graph-based architecture offering low latency and time-travel debugging. For simpler use cases with type safety, Pydantic AI offers a FastAPI-like experience. If you're building for Kubernetes environments specifically, kagent is the first open-source agentic AI framework designed for Kubernetes, and now a CNCF project, with built-in integrations for Argo, Helm, Istio, and Prometheus. However, I think kagent leaves a lot to be desired. It's useful for those wanting to create agents quickly, but not for custom agents that can be taken seriously. There's more to agents than a way to define a system prompt and connect it to MCPs. kmcp is more interesting, as it helps you build, test, and deploy MCP servers to Kubernetes with proper life cycle management through CRDs. That's an interesting choice when building custom agents.
Vercel AI SDK stands out as the best choice. The primary reason is model agnosticism. It supports dozens of providers, like OpenAI, Anthropic, Google, Cohere, DeepSeek, and many more, through a unified API. You can swap providers without changing your code, which aligns with the principle that our agents must not be locked to specific models. Beyond provider flexibility, Vercel AI SDK offers a simplicity that other frameworks lack. Where LangChain requires instantiating objects and managing abstractions, Vercel AI SDK uses simple function calls with less boilerplate. The learning curve is gentler, streaming is built in, and React hooks like useChat and useCompletion make real-time UIs trivial to implement. It works across frameworks, including Next.js, React, Svelte, Vue (however that is pronounced), Nuxt, and Node.js. The SDK is actively maintained, with AI SDK 5 adding agentic loop control and type-safe chat and tool enhancements. The documentation is excellent, and if you need LangChain's complex orchestration for specific use cases, Vercel AI SDK integrates with it, giving you the best of both worlds. The downside is that it is TypeScript-first. If you're not familiar with TypeScript, you will need to either learn it or look for an alternative. That being said, if you're an experienced developer, picking up a new language shouldn't be a significant barrier, right?
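The model-agnosticism principle itself is language-neutral and easy to see in code. Here is a minimal sketch in Python; all names are illustrative and not taken from Vercel AI SDK or any real provider library. The point is that the agent depends only on a tiny interface, so switching providers never touches agent logic.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only thing the agent knows about a model provider."""
    def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Stand-in for a wrapper around OpenAI, Anthropic, Google, etc."""
    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt}"


class Agent:
    def __init__(self, model: ChatModel):
        self.model = model  # swap providers without changing agent code

    def run(self, task: str) -> str:
        return self.model.complete(task)


agent = Agent(FakeProvider("provider-a"))
print(agent.run("provision a dev environment"))
# -> [provider-a] answer to: provision a dev environment

# Switching providers is a one-line change:
agent.model = FakeProvider("provider-b")
```

A unified API like Vercel AI SDK's plays the role of the `ChatModel` interface here: your code targets the abstraction, and the provider behind it is a configuration detail.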
Now, despite the name, this category isn't just about reviewing application code. It covers AI-powered review of anything you push to Git: Kubernetes manifests, Terraform configurations, Helm charts, CI/CD pipelines, shell scripts, documentation. If it lives in a repository and goes through a pull request, these tools can review it. This is important. AI code review is a no-brainer adoption. The time investment is close to zero, since you just enable it on your repositories and it starts working. It's not obtrusive either. While these tools can often suggest fixes you can apply with one click, their main focus is not that. It is providing recommendations. You can take them into account or dismiss them. There's no forced workflow change, no new tool to learn, no context switching. The AI reviews your PR in parallel with human reviewers, and you decide what feedback is valuable. Easy. For ops teams, this is particularly useful. Misconfigurations in Kubernetes manifests, security issues in Terraform, missing best practices in Helm charts: these are exactly the kinds of issues that slip through human review, because reviewers are focused on the logic, not the YAML structure. AI reviewers don't get tired, and they don't skip the boring parts.

The options range from simpler tools like Sourcery and CodeAnt AI that handle basic reviews to more sophisticated solutions. Traycer identifies edge cases and performance issues but requires a paid plan after the trial. Greptile builds a full semantic graph of your repo for cross-file bug detection with SOC 2 compliance, though it can be noisy, very noisy. Cubic focuses on speed and learns from your feedback, but is GitHub-only. Qodo Merge, on the other hand, which was formerly PR-Agent, or something like that, is open source and benchmarks as the fastest and most thorough, with RAG-powered search across all your repos.

Now, my choice is something else. My choice is CodeRabbit. The primary reason is ease of adoption. Setup takes minutes, or seconds, with minimal configuration. It works across GitHub, GitLab, Azure DevOps, and Bitbucket. The reviews provide line-by-line feedback that resembles what you would get from a senior developer, not just high-level summaries. It learns from your interactions over time, adapting to your codebase and team preferences. What sets CodeRabbit apart is its MCP server integration. That's the part I like the most. You can connect it to Claude Code, Cursor, or any other MCP-compatible agent. This means you can write code in your agent, create a PR, get CodeRabbit's review, then ask your agent to fetch those review comments and implement the fixes, all without leaving your workflow. The loop between writing code and addressing review feedback stays within a single agent session. For teams already using AI agents for development, this integration is significant. Qodo Merge is a solid alternative, especially if you need self-hosting for strict security requirements or you want open-source transparency. It uses RAG to search across repositories for context. But CodeRabbit's ease of setup, MCP integration, and broad platform support make it the better default choice for most teams. The pricing is reasonable, with a free tier for basics and paid tiers for more features.
The database landscape has shifted beyond traditional SQL and NoSQL databases. There's now a pressing need to provide data to AI models through agents. Due to context limitations, you cannot just dump all your data into an LLM and hope for the best. You need to find the data that matters for each specific query, which means semantic search through embeddings. This is where vector databases come in. They store embeddings, which are numerical representations of text, code, images, or any other content, and let you find semantically similar items quickly. When a developer asks an AI agent about a production incident, the agent can search your runbooks, past incidents, and documentation using semantic similarity rather than keyword matching. The results are dramatically better. And the market has responded in two ways. Dedicated vector databases have emerged, purpose-built for storing and querying embeddings at scale. At the same time, existing databases are adding vector capabilities, so you don't have to manage another system. If your data already lives in PostgreSQL, adding pgvector might make more sense than migrating to a dedicated solution.

If you want to add vector capabilities to databases you already run, pgvector extends PostgreSQL, and Cloudflare Vectorize offers edge-native serverless vectors if you're in the Cloudflare ecosystem. These work well for smaller scale and when you want to avoid managing another database. For dedicated vector databases, Chroma offers the best developer experience for rapid prototyping, but it is limited to smaller data sets. Milvus scales to billions of vectors but requires engineering expertise. Weaviate was first to market with a deep feature set, including hybrid search and multimodal support. Pinecone is the easiest fully managed option, with zero ops and strong compliance certifications, but costs can grow quickly at scale.

My choice is Qdrant, for vector database work at least. It hits the right balance between performance, features, and cost. Written in Rust, it delivers excellent query speeds with minimal latency. The filtering capabilities are what set it apart. You can filter on payload values before the vector search happens, not after. This means queries like "find similar documents from the last 30 days" or "find similar incidents in the production environment" are fast, really fast, regardless of how many vectors you have. And the open source model matters here. You can run Qdrant locally for development, self-host it in your own infrastructure, or use Qdrant Cloud if you want a managed option. This flexibility is important for platform teams who need to keep data in-house or control costs at scale. Pinecone is easier to get started with, but the costs grow quickly as your data grows. Qdrant is significantly cheaper at scale while delivering comparable performance. Now, there are trade-offs. Qdrant has a steeper learning curve than simply adding pgvector to your existing PostgreSQL. If you only need basic vector search on a small data set, pgvector is simpler. Weaviate is better if you need sophisticated hybrid search combining text and vector queries. But for most AI agent use cases, where you need fast filtered semantic search at reasonable cost, Qdrant is the best choice.
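To make the "filter first, then search by similarity" idea concrete, here's a toy Python sketch. It is not Qdrant code; the vectors, payloads, and the `search` helper are all made up, but the order of operations mirrors what a vector database with payload pre-filtering does.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy "collection": each point has an embedding vector and a payload,
# similar in shape to points stored in a vector database.
points = [
    {"vector": [0.9, 0.1], "payload": {"env": "production", "text": "pod OOMKilled"}},
    {"vector": [0.8, 0.2], "payload": {"env": "staging", "text": "pod CrashLoopBackOff"}},
    {"vector": [0.1, 0.9], "payload": {"env": "production", "text": "invoice template"}},
]


def search(query_vector, env, limit=1):
    # Filter on payload values FIRST, then rank only the survivors by
    # similarity -- the pre-filtering order described above.
    candidates = [p for p in points if p["payload"]["env"] == env]
    ranked = sorted(candidates, key=lambda p: cosine(query_vector, p["vector"]), reverse=True)
    return [p["payload"]["text"] for p in ranked[:limit]]


print(search([1.0, 0.0], env="production"))  # -> ['pod OOMKilled']
```

In a real system the vectors come from an embedding model and the database maintains indexes so this stays fast at millions of points, but the query shape, a payload filter plus a similarity ranking, is the same.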
Internal developer platform, or IDP, is one of the most misunderstood concepts in our industry. Too many people equate IDP with a portal, a fancy web UI where developers click buttons. That's not a platform. That's just a front end. To understand what a platform really is, look at public cloud providers like AWS, Azure, or Google Cloud. They follow a clear pattern: services that do something, like EC2 spinning up VMs, S3 storing objects, or RDS managing databases; APIs that expose those services; and finally, user interfaces that consume those APIs, whether a web console, a CLI, an SDK, Terraform, or whatever else. The UI is just one of many ways to interact with the platform. It's not the platform itself. An IDP must follow the same pattern. You need services that actually do things, like provision infrastructure, deploy applications, manage secrets, and enforce policies. Those services must be exposed through APIs. Then you can build whatever user interface makes sense, whether that's a portal, a CLI, GitOps workflows, or all of the above. If you start with a portal, you're building a house starting with the roof.

So where do you run these services? Kubernetes controllers are the most logical choice. They're designed to reconcile desired state with actual state, which is exactly what platform services need to do. How do you expose APIs? Well, Kubernetes custom resource definitions, or CRDs, give you a declarative API for free. How do you interact with those APIs? Any way you already interact with Kubernetes: kubectl, Helm, GitOps tools like Argo CD or Flux, dashboards, or custom web UIs. The portal becomes just another client, not the center of the universe.

Now, many options in this space offer partial solutions or are built on what I would consider obsolete foundations. If you only need, only and exclusively, a developer portal, Roadie, Port, or Cortex can work, but the portal alone is not the platform. Commercial-only solutions like, let's say, Harness, Mia-Platform, or Qovery bundle various capabilities together, but often with proprietary architectures that don't align with how modern platforms should be built.
Humanitec and Northflank get closer to true platform orchestration but still come with vendor lock-in concerns. Actually, scratch Humanitec. Don't do it. Northflank, let's say. Now, if you're building a platform in 2026, you should be building it on Kubernetes with Kubernetes-native components from the CNCF ecosystem. Services should be controllers, APIs should be CRDs, and the entire stack should follow the patterns that Kubernetes established. Anything else is either a partial solution or a step backward.

And that's when we come to the real deal, the BACK stack: Backstage, Argo CD, Crossplane, and Kyverno. That's my choice for building internal developer platforms. Each component has established itself as the leader in its domain. Backstage is probably the only widely adopted portal solution, with massive contributions from thousands of organizations. It provides the developer-facing UI layer. Argo CD, together with Flux, is the de facto standard for GitOps, handling continuous delivery and keeping cluster state synchronized with Git repositories. Crossplane is the most mature and widely adopted solution for building platform services as Kubernetes controllers with APIs exposed through CRDs. And finally, Kyverno has established itself as the standard for defining and enforcing policies across Kubernetes resources. All four projects are open source and owned by the CNCF. Argo CD and Crossplane are graduated projects, while Backstage and Kyverno are incubating and on track for graduation. They're mature, widely adopted, and well-maintained. The ecosystem integration between them is strong, with Crossplane providers spanning cloud platforms, databases, and SaaS applications, all manageable through Backstage portals, with Kyverno enforcing security policies.

The only significant piece missing from the BACK stack for a complete IDP is workflows, or CI pipelines. That space is already covered by a plethora of tools that have existed for a very long time, and they all do more or less the same thing. Whether you choose GitHub Actions, GitLab CI, Jenkins, Tekton, or any other CI tool matters less than getting the platform foundations right. Now, the investment required to build a BACK stack IDP is real, but less daunting than it might seem. Most companies running Kubernetes are already using Argo CD for deployments and Kyverno for policies. Extending their usage beyond current workloads is easier with Crossplane than with non-Kubernetes-native tools, since the patterns and workflows are already familiar. And when it comes to developer portals, there is no real alternative to Backstage. With the BACK stack, you get full control, no vendor lock-in, and a platform built on the same patterns as the public cloud.
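To make the services-as-controllers, APIs-as-CRDs idea concrete, here's what a platform API built this way might look like to a developer. This is an illustrative sketch only; the API group, kind, and fields are made up, and in practice they would be whatever your Crossplane compositions define:

```yaml
# A hypothetical platform API served by a Crossplane composition.
# A developer (or their AI agent) applies this resource; controllers
# reconcile it into real infrastructure, and Kyverno policies can
# validate it before it's admitted.
apiVersion: platform.example.org/v1alpha1
kind: Database
metadata:
  name: team-a-db
spec:
  size: small        # the platform maps this to concrete instance types
  region: us-east1
  backups: enabled
```

The developer never sees cloud provider details; they interact with the platform through kubectl, GitOps, a portal, or an agent, all of which are just clients of this same API.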
The gap between local development and production Kubernetes environments has always been painful. You can run a local Kubernetes cluster with Kind or minikube, but that doesn't help when your service needs to talk to dozens of other services, databases, message queues, and external APIs that only exist in a shared environment. You end up mocking everything, which means you're not actually testing against real dependencies, or you deploy to a shared dev cluster for every change, which is slow and creates conflicts with other developers. This category covers tools that bridge that gap. The approaches vary. Some create VPN tunnels to remote clusters. Some intercept traffic and route it to your local machine. Some spin up isolated virtual clusters. And some automate the build, deploy, test cycle. The goal is the same: let developers work locally with the speed and convenience of their own machine while still interacting with real services in a real cluster. For platform teams, this is an important piece of the developer experience puzzle. If developers cannot easily test their changes against a realistic environment, they will either skip testing, which leads to production issues, or demand expensive dedicated environments, which leads to infrastructure madness. The right tooling here pays for itself quickly.

The oldest approach is VPN-based tunneling. Telepresence, for example, pioneered this space, but it comes with a finicky setup, compatibility issues with service meshes and corporate VPNs, and a requirement for root access. Gefyra (or however it's pronounced) was born from Telepresence frustration, offering a simpler Docker-based approach that doesn't modify running workloads. For automating the build, deploy, test cycle, Skaffold brings Google-backed maturity with declarative YAML configuration. Tilt offers a browser UI showing build status and logs, with Starlark configuration for flexibility. DevSpace provides file sync, port forwarding, and dev containers across all major clouds. At a higher level, Signadot creates intelligent sandboxes integrated with CI/CD for PRs.
vCluster takes a different approach, though, by spinning up virtual Kubernetes clusters with fast provisioning and stronger isolation than namespaces. Now, my choice is mirrord, and that choice is for bridging local development with remote Kubernetes environments. The key difference from Telepresence and similar tools is that mirrord works at the process level rather than the network level. Instead of creating a VPN tunnel to your cluster, mirrord intercepts your local process's system calls and proxies them to a temporary agent running in your cluster. This approach has several advantages. No cluster installation is required. mirrord uses the Kubernetes API directly, so all you need is a configured kubeconfig. It creates a temporary pod when it runs and cleans up automatically when it's done. No operators, no daemons, no permanent changes to your cluster. And no root access is needed on your machine either, unlike Telepresence, which requires elevated privileges to create network tunnels. mirrord only affects the running process. The rest of your machine remains untouched. This makes it easier to adopt in corporate environments where developers don't have admin rights. Traffic mirroring instead of interception is a big deal for shared environments. Telepresence intercepts traffic, meaning requests intended for the remote service get redirected to your local machine, which disrupts others using that environment. mirrord can mirror traffic instead, sending a copy to your local process while the original requests are handled normally by the remote service. What else? Oh yeah, environment configuration is automatic. mirrord proxies network access, file access, and environment variables uniformly. Your local process sees the same environment variables, can read the same files, and connects to the same services as the remote pod.
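As a rough illustration of how this is set up, here's a minimal mirrord configuration sketch. The target name is made up, and the exact schema may differ between mirrord versions, so treat this as a shape rather than a reference:

```json
{
  "target": "deployment/my-service",
  "feature": {
    "network": { "incoming": "mirror" },
    "env": true,
    "fs": "read"
  }
}
```

With something like this saved as a mirrord config file, you run your service locally through the mirrord CLI, and the process sees the pod's environment variables and files while incoming traffic is mirrored to it rather than stolen from the shared environment.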
As the pattern of building platforms on
Kubernetes with services exposed through
CRDs becomes more prevalent, testing
Kubernetes resources becomes more
critical. This isn't just about testing
applications running on Kubernetes. It's
about testing the platform itself, the
controllers, the operators, the custom
resources, the policies, and the
integrations between them. When you
combine this with the Gentic AI that
interacts with your platform through
these APIs, the stakes get higher. They
get very high. An AI agent provisioning
infrastructure or modifying
configurations needs to work correctly
every single time. The API it calls need
to behave as documented. The controllers
behind those APIs need to reconcile
state reliably. You cannot manually
verify this at scale. You need automated
end-to-end tests that exercise the full
life cycle of your custom resources. The
tooling in this space has matured
significantly. Declarative testing
frameworks let you define expected
states and assertions in YAML rather
than writing tests code. Conformance
testing ensures your clusters meet
Kubernetes standards. The best tools
make it easy to turn a bug report into
regression test by simply copying
manifests. Now, within that space: for
basic Helm chart validation, Helm test is
built in, but it's limited to simple
pass/fail checks. helm-unittest adds
BDD-style unit testing as a Helm plugin.
KUTTL was, I think, the original
declarative Kubernetes test tool, but
development has slowed significantly,
to almost non-existent. For cluster
conformance testing, Sonobuoy ensures
your cluster meets Kubernetes standards
with non-destructive diagnostics. The
official Kubernetes E2E framework
provides code-based testing libraries
with automatic cluster lifecycle
management, though it requires Go
knowledge. But Go is okay; Go is a cool
language. Kyverno Chainsaw is my choice
for testing Kubernetes platforms. It
builds on ideas from KUTTL and improves
on them significantly. The core idea is
declarative testing: you define tests in
YAML rather than writing Go code or Bash
scripts. Now, this matters for platform
teams. When you build a platform with
custom controllers and CRDs, you need to
test that resources reconcile
correctly, that policies are enforced,
that the entire life cycle works as
expected. Writing this in Go means
maintaining more test code than actual
platform code. Chainsaw lets you define
test cases by simply providing the
manifests you want to apply and the
expected state you want to verify. The
workflow for turning bug reports into
regression tests is remarkably simple.
Someone reports that a specific manifest
causes unexpected behavior. Cool. You
copy that manifest into a test case, add
the expected outcome and you have a
regression test. No code to write. Each
test step is isolated which makes CI
debugging easier. When a test fails, you
know exactly which step failed and you
can see the relevant logs without going
through the monolithic test output. The
documentation is detailed and actively
maintained with frequent releases. Now
complex assertion logic might require
scripting blocks rather than pure YAML,
but that's a rare edge case. For most
platform testing scenarios, declarative
YAML definitions are sufficient and
dramatically simpler than the
alternatives.
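To make the bug-report-to-regression-test workflow concrete, here is a minimal sketch of what a Chainsaw test can look like (the test name and file names are hypothetical): you point one step at the manifest from the report and another at the state you expect after reconciliation.

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: reported-bug
spec:
  steps:
    - try:
        # Apply the manifest copied from the bug report.
        - apply:
            file: manifest-from-bug-report.yaml
        # Assert the expected reconciled state; the test fails if the
        # cluster never converges to it.
        - assert:
            file: expected-state.yaml
```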
Every DevOps engineer faces the same
dilemma. Should they write this in bash
or should they use a real programming
language like Python or Go? Bash is
everywhere and perfect for quick
glue-type scripts, but it falls apart
when dealing with structured data, error
handling, or anything beyond simple
string manipulation. Python is powerful
and it's readable, but suddenly you're
managing virtual environments, you're
managing dependencies, and you're
wondering if the target system even has
the right Python version. Go gives you
static binaries, but the overhead of
writing, compiling, and maintaining
compiled code for a simple automation
task feels like overkill. The truth is
most modern DevOps work increasingly
involves structured data: JSON from
APIs, YAML configs, log parsing, cloud
CLI outputs. Traditional shells force
you into awkward pipelines of jq, awk,
sed, and grep. Meanwhile, real
languages require too much ceremony for
what should be a 10-line script. This
category explores shells that bridge
that gap, offering the immediacy of
shell scripting with modern language
features like structured data handling,
proper error messages, and sane syntax.
This isn't a comprehensive list of all
shells, just the ones I explored while
looking for something better than Bash
that doesn't require spinning up a full
development environment. Now, I'll skip
the alternatives and just go straight
into it. After trying many solutions, I
settled on Nushell, at least for
scripting. It delivers the best of both
worlds: quick to write like Bash, but
with proper data types and type checking
like Go and TypeScript. The key
difference is that Nushell treats data as
structured tables, records, and lists
rather than text streams. You can
filter, sort, and transform JSON, YAML,
CSV, Excel, SQLite, whatever you want,
with the same commands. No more jq or
sed pipelines. Expressions like
`where status == running` are much more
intuitive than parsing text with
regexes. Now, to be fair, PowerShell
pioneered this structured-data approach
and it deserves credit for that, but
Nushell improves on the concept. It's
faster; PowerShell has historically been
very slow. The syntax is cleaner and
less verbose, with no verb-noun
conventions forcing awkward naming, and
it's built for Unix-like systems first
rather than treating them as
second-class citizens. Nushell is
written in Rust, so it's performant with
no runtime dependencies. The error
messages actually tell you what went
wrong and how to fix it. Now, I should
be clear: I don't use Nushell as my
interactive shell. I still use Zsh for
that because it's POSIX compliant and
works everywhere. I use Nushell
exclusively for scripting, where its
structured data handling really shines.
The fact that Nushell isn't commonly
installed on servers isn't a problem for
me because I use Nix through Devbox.
Every project has a devbox.json that
brings in all needed tools, including
Nushell. That said, I wouldn't use
Nushell for scripts that need to run
directly on servers where I don't
control the environment. I don't have
many of those.
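As a sketch of what that per-project setup can look like (the package list is hypothetical), a minimal devbox.json just declares the tools the project needs:

```json
{
  "packages": [
    "nushell@latest",
    "kubectl@latest"
  ]
}
```

With that in place, `devbox shell` drops you into an environment where Nushell is available, regardless of what the host has installed.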
So, for me, not an issue. Now, the
trade-offs are real. No job control yet.
It's still pre-1.0, so the syntax can
change between versions, and you cannot
copy-paste Bash scripts. But for DevOps
work involving structured data, Nushell
has replaced both Bash and Python for
most of my scripting needs. Nushell
users may not be the majority yet, but
it's too good to ignore. I strongly
recommend it.
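To give a flavor of the kind of one-liner this enables (the file and field names here are hypothetical), filtering structured data in Nushell reads like a query rather than a text-munging pipeline:

```
# Load a JSON file, keep only running pods, sort by name,
# and print the result as a table with just two columns.
open pods.json
| where status == "Running"
| sort-by name
| select name status
```

The same pipeline works unchanged if the input were YAML or CSV, because `open` parses the file into the same table structure.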
So, here are my recommendations for
2026. For AI models, go with Anthropic's
Claude for software engineering work. If
budget is a concern, Gemini is a strong
second choice. For AI agents, pick based
on your workflow: Cursor if you prefer
IDEs, Claude Code if you work in the
terminal and you want the best
experience, and open code if model
flexibility matters more than polish.
For building custom agents, use Versel's
AI SDK. Model agnosticism is critical
when the landscape is shifting this
fast. For automated code reviews, adopt
CodeRabbit. The MCP integration alone
makes it worth it. For vector databases,
choose Qdrant. It's the best balance
of performance, features, and cost. For
internal developer platforms, build on
the BACK Stack. It's Kubernetes-native,
there is no vendor lock-in, and all are
CNCF projects. For Kubernetes
development environments, use mirrord. It
solves the local to remote gap without
the pain of alternatives. For platform
testing, adopt Kyverno Chainsaw, which
is the tool for declarative testing that
actually works. For scripting, please
try Nushell. It offers structured data
handling without the ceremony of real
languages. Those tools survived
real-world use in 2025. They're ready for
2026. That's what I will be using this
year. Thank you for watching. See you in
the next one. Cheers.
This video presents a practitioner's guide to the most essential developer tools for 2026, covering both the AI tools and the foundational technologies that remain critical. Rather than offering a neutral comparison, it shares battle-tested recommendations based on months of real-world use across AI models, coding agents, custom agent development, code review automation, vector databases, internal developer platforms, Kubernetes development environments, platform testing, and modern shell scripting. Key recommendations include Anthropic's Claude for AI-powered software engineering, Cursor or Claude Code for coding agents depending on your workflow preference, Vercel AI SDK for building custom agents with model flexibility, CodeRabbit for automated code reviews with MCP integration, Qdrant for vector database needs, the BACK Stack for building internal developer platforms on Kubernetes, mirrord for bridging local and remote development environments, Kyverno Chainsaw for declarative platform testing, and Nushell for modern scripting with structured data handling. The video emphasizes that while agentic AI has transformed how developers work, solid foundations like testing frameworks, development environments, and platform architecture still matter—AI now intersects with all of them rather than replacing them. #DevOps #AITools #Kubernetes Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join ▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/devops/top-10-devops-tools-you-must-use-in-2026 ▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below). 
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/ ▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox ▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 DevOps and AI Tools 2026 02:10 Best AI Models for Software Engineering 06:19 Best AI Coding Agents 11:45 Building Custom AI Agents 17:15 AI Code Review Tools 21:09 Vector Databases for AI 25:06 Internal Developer Platforms 30:26 Kubernetes Dev Environments 34:56 Kubernetes Platform Testing 39:31 Modern Shell Scripting 42:46 What to Use in 2026