Writing

Thoughts on data engineering, Rust, AI, and cloud architecture

Tags: aws (37), ai (12), iceberg (9), data-engineering (9), glue (9), performance (6), claude (5), react (5), terraform (5), serverless (5), eventbridge (5), trino (4)

75 posts

Latest

Our Nemesis: TPC-DS Query 72 and the Limits of a Custom SQL Engine

One query. Ten tables. Twelve times slower than Trino. Everything we tried, what worked, what didn't, and where the ceiling is.

DataFusion 53, a Vendored Fork, and 40% Faster Queries

We upgraded SQE from DataFusion 52 to 53 by forking and rebasing iceberg-rust ourselves. The result: 27-40% faster across every benchmark suite.

How Agentic AI Helped Us Beat Trino

221 queries, 7 suites, one week — how an AI assistant running automated benchmarks drove a major performance breakthrough.

Auditing a Data Platform with AI: Iterative Security Hardening

Four audit rounds with escalating AI personas found 42 security issues and removed 6,500 lines of dead code.

43 Findings, Zero Deferred: A Production Security Audit of a Rust SQL Engine

We ran a full production sign-off audit against SQE and found 43 issues across security, runtime safety, logic bugs, and code quality. Then we fixed all of them in one session.

When the Ground Shifts Faster Than People Can Stand

A companion piece to The Human Side of the Machine Shift — exploring why AI disruption hits different minds differently, and how leadership can turn discomfort into team strength.

Five Layers of Caching and an 8.8x Speedup Over Trino

How multi-layer caching took SQE from slower than Trino to 2.5-8.8x faster across every benchmark suite.

Streaming Writes, Sort Order Safety, and the IN (Subquery) Workaround

Fixing OOM in CTAS, safe Iceberg sort order for mixed writers, and working around DataFusion limitations.

Databricks-Style Column Profiling in the Data Explorer

Inline column statistics with distribution sparklines computed from real SQL queries, not just Iceberg manifest metadata.

Dropping Radix UI: Going Fully Custom for React 19

How a React 19 infinite re-render loop led us to replace all third-party UI primitives with 660 lines of custom components.

From 63% to 95%: Building Trino SQL Compatibility in a Single Day

Implementing 70+ UDFs, Iceberg time travel, metadata TVFs, and engine-level SQL features for Trino drop-in replacement.

Docker Dev Mode: Hot Reload for the Full Stack

A docker-compose.dev.yml overlay for instant feedback on frontend and backend changes without rebuilding containers.

A Visual Editor for Open Data Contracts

A full ODCS v3.1.0 contract editor with tabbed forms, live YAML preview, catalog import with AI-suggested quality checks, and dbt export.

AI Flows: Langflow as the Platform's Intelligence Layer

Embedding Langflow as a managed AI service with auto-switching flows, MCP tool integration, and context-aware assistance.

The Art of Agents: Sun Tzu's Principles for Building Agentic AI Systems

By Jacob Verhoeks, March 2026. The cost of building software has collapsed. Claude Code ships...

Bringing dbt to the Data Platform: A Browser-Based IDE

Integrating dbt Core into Chameleon with a full workspace IDE, git operations, lineage visualization, and AI-assisted model development.

Building a Comprehensive SQL Benchmark Suite

Seven benchmark suites, 222 queries, and the infrastructure to measure performance honestly.

How We Build Software with AI Assistants

From brainstorm to production in four phases — structured AI collaboration that produces better software than either alone.

Making SQE Work Everywhere: Pluggable Auth and Catalogs

How we're turning a single-vendor query engine into something that runs against any identity provider, any catalog, and any cloud.

When Your SQL Engine Understands Meaning

SQL engines know table shapes. We're adding ontologies, property graphs, vector search, and AI-native interfaces.

We Replaced Our Trino Fork with a Rust SQL Engine

How we went from maintaining a 2M-line Java fork to shipping a 50MB binary that runs every query as the authenticated user.

The AI Development Workflow: A Complete System for Working with AI Agents

A continuous cycle of ideation, planning, execution, and refinement — all driven by issues, feedback...

From Extension to Orchestration: Who Wins (and Who Gets Left Behind) in the Claude 4.6 Era

This morning I published "From Dark Flow to Real Momentum: Why Claude Opus 4.6 Feels Like an...

From Dark Flow to Real Momentum: Why Claude Opus 4.6 Feels Like an Extension of Me

The recent article from fast.ai, titled "Breaking the Spell of Vibe Coding" by Rachel Thomas...

Efficient Agentic AI Development Guide (Early 2026)

A practical guide for working effectively with AI coding agents, especially Claude Code as of early...

OpenClaw: The Open-Source Agent That Feels Too Alive

It hits like the moment teenagers first logged onto BBS boards or IRC channels in the late '80s and...

I Stopped Maintaining Terraform Examples and Tests Separately. Here's Why.

TL;DR: Every time you update a Terraform example, you should also update its corresponding test. Stop...

Your Terraform Examples Are Broken (And You Don't Know It Yet)

TL;DR: Stop maintaining separate examples and tests. Test your examples directly. One source of...

Scaling Terraform Across Many Teams: A Native Framework for Platform Engineering

TL;DR: A pure Terraform framework that lets 50+ teams self-service infrastructure by...

Redesigning This Blog (While Writing About It)

A meta-journey through redesigning a blog's layout: avoiding AI design clichés and building something with actual character.

The Hidden Dangers in Our Software Supply Chain: Why It's Bigger Than You Think

In today's fast-paced digital world, software powers everything from the apps on our phones to the...

Why Debian Packages Are Safer Than NPM and PyPI

In the world of software development, package repositories are critical for distributing libraries...

Setting Up IOMete: A Cloud-Independent Data Platform Based on Spark

IOMete is a powerful, cloud-independent data platform built on Apache Spark, designed to enable...

DuckDB, S3 Tables with Iceberg Using the Iceberg REST API

My previous article was about Duckberg, a combination of PyIceberg and DuckDB to access Iceberg...

Duckberg!

I previously wrote a short post about PyIceberg and the Glue Iceberg REST API. This week I saw the...

Collibra Protect, Snowflake and Iceberg Tables

Iceberg is gaining traction, and Collibra is expanding its presence as a data governance tool,...

AWS Glue vulnerabilities in default packages

Securing AWS Glue: A Guide to Identifying and Fixing Python Package Vulnerabilities ...

Glue SBOM inspector

Glue SBOM exporter and inspector

Unity Catalog Iceberg REST API and PyIceberg

Access Unity tables via the Iceberg REST API. After working with the Glue catalog in the...

Unity Iceberg REST API and PyIceberg

Using the Unity Iceberg REST API with PyIceberg

re:Invent 2024 News

re:Invent 2024 news

Glue Iceberg REST API and PyIceberg

Access Glue Iceberg tables via the Iceberg REST API. AWS silently released the Iceberg REST API...

Glue Iceberg REST API and PyIceberg

Using the Glue Iceberg REST API with PyIceberg

The Database Evolution: Breaking Free from Monolithic Thinking

A technical analysis of AWS's latest database innovations, from Aurora's Graviton4 support to OpenSearch Serverless improvements. Learn how these changes are reshaping data architecture patterns and enabling more efficient, distributed database systems with real-world implementation strategies.

The Silent Revolution in Disaster Recovery: How AWS is Changing the Game

Explore how AWS's new zonal shift capabilities and Security Incident Response features are transforming traditional disaster recovery. This technical deep-dive reveals how automatic failover, self-healing systems, and integrated security responses are making DR more reliable and cost-effective than ever before.

The Human Touch in Digital Transformation: Beyond Just Technology

An architect's perspective on AWS's latest customer experience innovations, including Amazon Connect's AI capabilities and Polly's new synthetic voices. Discover how these technologies are making digital interactions more natural while improving efficiency and customer satisfaction.

Infrastructure Efficiency: The Hidden Environmental Impact of AWS's Latest Announcements

Discover how AWS's latest infrastructure updates, particularly the expansion of Graviton4 processors and R8g instances, are quietly revolutionizing cloud computing sustainability. Learn about real-world performance improvements, cost reductions, and environmental benefits of these strategic changes.

Bridging Clouds: A Guide to Connecting AWS Glue Tables with Snowflake

In today's data-driven world, organizations often find themselves working with multiple cloud...

Glue SBOM exporter and vulnerabilities

Glue SBOM exporter and vulnerabilities

Sustainability on AWS

Sustainability is the new topic on AWS. While it has a lot in common with FinOps, it adds emphasis to...

re:Invent 2023 News

re:Invent 2023 news

Purge a Glue Table from the CLI

Sometimes you want to purge a Glue table from S3 and delete all files and versions. Lately I had to...

Data Engineering and ChatGPT (Part 2)

In my previous post I must have hit an issue with ChatGPT. It's far more impressive than I...

Data Engineering and ChatGPT

ChatGPT is out and everybody is trying it. Blogs and songs are easy, but can it help with Data...

re:Invent 2022 Releases Sorted

An overview of all the releases, sorted by group. This re:Invent was very focused on Data &...

New EC2 Models at re:Invent 2022

AWS released 5 new EC2 models at re:Invent 2022. Here is an...

re:Invent 2022 Releases Sorted

re:Invent 2022 releases, sorted

Python and modules for ETL jobs on AWS

Python and modules for ETL jobs on AWS

Overview of the features released for Step Functions

At the Dutch AWS Community Day 2022, I gave the following talk: an overview of all the features...

AWSUG.nl talk about Step Functions

AWSUG.nl Community Day talk about Step Functions, with an overview of the released features

Secrets and AWS Glue Custom Connectors

For a project I had to retrieve data from Teradata using a Glue job. A quick Google search gave me this:...

AWSUG.nl talk about EventBridge

AWSUG.nl talk about EventBridge, with an overview of the released features

Removing AWS SageMaker Canvas

How to remove AWS SageMaker Canvas

Docker on Mac with Colima

Use Colima, a free alternative, to run Docker on Mac

AWS Community Builder

Part of the AWS Community Builders Program

AWS Links 2 EventBridge

AWS Links 2 EventBridge

CDK Day 2022

CDK Day 26 May 2022

Votes and Views Part 1

Votes and Views Part 1

AWS Tools / Resources Part 1

Make your Jupyter Notebook/IPython full width

Make your Jupyter Notebook/IPython full width

Learning with AWS Workshops

An overview of AWS Workshops

Maker's Schedule, Manager's Schedule

Maker's Schedule, Manager's Schedule and why they don't combine.

Autocomplete IAM in VS Code

Autocomplete IAM in VS Code

AWS Glue with Custom Python Libraries

Using external Python modules in AWS Glue

An ON-AIR sign with IoT

Building a sign with IoT to show when you are in a meeting