Writing
Thoughts on data engineering, Rust, AI, and cloud architecture
75 posts
Our Nemesis: TPC-DS Query 72 and the Limits of a Custom SQL Engine
One query. Ten tables. Twelve times slower than Trino. Everything we tried, what worked, what didn't, and where the ceiling is.
DataFusion 53, a Vendored Fork, and 40% Faster Queries
We upgraded SQE from DataFusion 52 to 53 by forking and rebasing iceberg-rust ourselves. The result: 27-40% faster across every benchmark suite.
How Agentic AI Helped Us Beat Trino
221 queries, 7 suites, one week — how an AI assistant running automated benchmarks drove a major performance breakthrough.
Auditing a Data Platform with AI: Iterative Security Hardening
Four audit rounds with escalating AI personas found 42 security issues and removed 6,500 lines of dead code.
43 Findings, Zero Deferred: A Production Security Audit of a Rust SQL Engine
We ran a full production sign-off audit against SQE and found 43 issues across security, runtime safety, logic bugs, and code quality. Then we fixed all of them in one session.
When the Ground Shifts Faster Than People Can Stand
A companion piece to The Human Side of the Machine Shift — exploring why AI disruption hits different minds differently, and how leadership can turn discomfort into team strength.
Five Layers of Caching and an 8.8x Speedup Over Trino
How multi-layer caching took SQE from slower than Trino to 2.5-8.8x faster across every benchmark suite.
Streaming Writes, Sort Order Safety, and the IN (Subquery) Workaround
Fixing OOM in CTAS, safe Iceberg sort order for mixed writers, and working around DataFusion limitations.
Databricks-Style Column Profiling in the Data Explorer
Inline column statistics with distribution sparklines computed from real SQL queries, not just Iceberg manifest metadata.
Dropping Radix UI: Going Fully Custom for React 19
How a React 19 infinite re-render loop led us to replace all third-party UI primitives with 660 lines of custom components.
From 63% to 95%: Building Trino SQL Compatibility in a Single Day
Implementing 70+ UDFs, Iceberg time travel, metadata TVFs, and engine-level SQL features for Trino drop-in replacement.
Docker Dev Mode: Hot Reload for the Full Stack
A docker-compose.dev.yml overlay for instant feedback on frontend and backend changes without rebuilding containers.
A Visual Editor for Open Data Contracts
A full ODCS v3.1.0 contract editor with tabbed forms, live YAML preview, catalog import with AI-suggested quality checks, and dbt export.
AI Flows: Langflow as the Platform's Intelligence Layer
Embedding Langflow as a managed AI service with auto-switching flows, MCP tool integration, and context-aware assistance.
The Art of Agents: Sun Tzu's Principles for Building Agentic AI Systems
By Jacob Verhoeks, March 2026. The cost of building software has collapsed. Claude Code ships...
Bringing dbt to the Data Platform: A Browser-Based IDE
Integrating dbt Core into Chameleon with a full workspace IDE, git operations, lineage visualization, and AI-assisted model development.
Building a Comprehensive SQL Benchmark Suite
Seven benchmark suites, 222 queries, and the infrastructure to measure performance honestly.
How We Build Software with AI Assistants
From brainstorm to production in four phases — structured AI collaboration that produces better software than either alone.
Making SQE Work Everywhere: Pluggable Auth and Catalogs
How we're turning a single-vendor query engine into something that runs against any identity provider, any catalog, and any cloud.
When Your SQL Engine Understands Meaning
SQL engines know table shapes. We're adding ontologies, property graphs, vector search, and AI-native interfaces.
We Replaced Our Trino Fork with a Rust SQL Engine
How we went from maintaining a 2M-line Java fork to shipping a 50MB binary that runs every query as the authenticated user.
The AI Development Workflow: A Complete System for Working with AI Agents
A continuous cycle of ideation, planning, execution, and refinement — all driven by issues, feedback...
From Extension to Orchestration: Who Wins (and Who Gets Left Behind) in the Claude 4.6 Era
This morning I published "From Dark Flow to Real Momentum: Why Claude Opus 4.6 Feels Like an...
From Dark Flow to Real Momentum: Why Claude Opus 4.6 Feels Like an Extension of Me
The recent article from fast.ai, titled "Breaking the Spell of Vibe Coding" by Rachel Thomas...
Efficient Agentic AI Development Guide (Early 2026)
A practical guide for working effectively with AI coding agents, especially Claude Code as of early...
OpenClaw: The Open-Source Agent That Feels Too Alive
It hits like the moment teenagers first logged onto BBS boards or IRC channels in the late '80s and...
I Stopped Maintaining Terraform Examples and Tests Separately. Here's Why.
TL;DR: Every time you update a Terraform example, you should also update its corresponding test. Stop...
Your Terraform Examples Are Broken (And You Don't Know It Yet)
TL;DR: Stop maintaining separate examples and tests. Test your examples directly. One source of...
Scaling Terraform Across Many Teams: A Native Framework for Platform Engineering
TL;DR: A pure Terraform framework that lets 50+ teams self-service infrastructure by...
Redesigning This Blog (While Writing About It)
A meta-journey through redesigning a blog's layout - avoiding AI design clichés and building something with actual character
The Hidden Dangers in Our Software Supply Chain: Why It's Bigger Than You Think
In today's fast-paced digital world, software powers everything from the apps on our phones to the...
Why Debian packages are safer than npm and PyPI
In the world of software development, package repositories are critical for distributing libraries...
Setting Up IOMete: A Cloud-Independent Data Platform Based on Spark
IOMete is a powerful, cloud-independent data platform built on Apache Spark, designed to enable...
DuckDB and S3 Tables with Iceberg, Using the Iceberg REST API
I wrote my previous article about Duckberg, a combination of PyIceberg with DuckDB to access Iceberg...
Duckberg!
I previously wrote a short post about PyIceberg and the Glue Iceberg REST API. This week I saw the...
Collibra Protect, Snowflake and Iceberg tables
Iceberg is gaining traction, and Collibra is expanding its presence as a data governance tool,...
AWS Glue vulnerabilities in default packages
Securing AWS Glue: A Guide to Identifying and Fixing Python Package Vulnerabilities ...
Glue SBOM inspector
Glue SBOM exporter and inspector
Unity Catalog Iceberg REST API and PyIceberg
Access Unity tables via the Iceberg REST API. After working with the Glue catalog in the...
Unity Iceberg REST API and PyIceberg
Using the Unity Iceberg REST API with PyIceberg
re:Invent 2024 News
re:Invent 2024 news
Glue Iceberg REST API and PyIceberg
Access Glue Iceberg tables via the Iceberg REST API. AWS silently released an Iceberg REST API...
Glue Iceberg REST API and PyIceberg
Using the Glue Iceberg REST API with PyIceberg
The Database Evolution: Breaking Free from Monolithic Thinking
A technical analysis of AWS's latest database innovations, from Aurora's Graviton4 support to OpenSearch Serverless improvements. Learn how these changes are reshaping data architecture patterns and enabling more efficient, distributed database systems with real-world implementation strategies.
The Silent Revolution in Disaster Recovery: How AWS is Changing the Game
Explore how AWS's new zonal shift capabilities and Security Incident Response features are transforming traditional disaster recovery. This technical deep-dive reveals how automatic failover, self-healing systems, and integrated security responses are making DR more reliable and cost-effective than ever before.
The Human Touch in Digital Transformation: Beyond Just Technology
An architect's perspective on AWS's latest customer experience innovations, including Amazon Connect's AI capabilities and Polly's new synthetic voices. Discover how these technologies are making digital interactions more natural while improving efficiency and customer satisfaction.
Infrastructure Efficiency: The Hidden Environmental Impact of AWS's Latest Announcements
Discover how AWS's latest infrastructure updates, particularly the expansion of Graviton4 processors and R8g instances, are quietly revolutionizing cloud computing sustainability. Learn about real-world performance improvements, cost reductions, and environmental benefits of these strategic changes.
Bridging Clouds: A Guide to Connecting AWS Glue Tables with Snowflake
In today's data-driven world, organizations often find themselves working with multiple cloud...
Glue SBOM exporter and vulnerabilities
Glue SBOM exporter and vulnerabilities
Sustainability on AWS
Sustainability is the new topic on AWS. While it has a lot in common with FinOps, it adds emphasis to...
re:Invent 2023 News
re:Invent 2023 news
Purge a Glue table from the CLI
Sometimes you want to purge a Glue table from S3 and delete all files and versions. Lately I had to...
Data Engineering and ChatGPT (Part 2)
In my previous post I must have hit some issue with ChatGPT. It's way more impressive than I...
Data Engineering and ChatGPT
ChatGPT is out and everybody is trying it. Blogs and songs are easy, but can it help with Data...
re:Invent 2022 Releases sorted
An overview of all the releases sorted per group. This re:Invent was very focused on Data &...
New EC2 Models re:Invent 2022
AWS released 5 new EC2 models at re:Invent 2022. Hereby an...
re:Invent 2022 Releases sorted
re:Invent 2022 Releases sorted
Python and modules for ETL jobs on AWS
Python and modules for ETL jobs on AWS
Overview of the features released for Step Functions
At the Dutch AWS Community Day 2022, I gave the following talk. An overview of all the features...
AWSUG.nl talk about Step Functions
AWSUG.nl Community Day talk about Step Functions, with an overview of the released features
Secrets and AWS Glue Custom Connectors
For a project I had to retrieve data from Teradata using a Glue job. A quick Google search gave me this:...
AWSUG.nl talk about EventBridge
AWSUG.nl talk about EventBridge, with an overview of the released features
AWS SageMaker Canvas Remove
How to remove AWS SageMaker Canvas
Docker on Mac with Colima
Use Colima, a free alternative, to run Docker on a Mac
AWS Community Builder
Part of the AWS Community Builders Program
AWS Links 2 EventBridge
AWS Links 2 EventBridge
CDK Day 2022
CDK Day 26 May 2022
Votes and Views Part 1
Votes and Views Part 1
AWS Tools / Resources Part 1
Make your Jupyter Notebook/IPython full width
Make your Jupyter Notebook/IPython full width
Make your Jupyter Notebook/IPython full width
Learning with AWS Workshops
Overview AWS Workshops
Maker's Schedule, Manager's Schedule
Maker's Schedule, Manager's Schedule and why they don't combine.
Autocomplete IAM in VS Code
Autocomplete IAM in VS Code
AWS Glue with custom Python libraries
Using external Python modules in AWS Glue
An ON-AIR sign with IoT
Building a sign with IoT to show when you are in a meeting