Over-engineer and you're maintaining complexity nobody understands. Under-engineer and you're rewriting everything in two years. Answer 6 questions about your data, team, and goals — get an architecture that actually fits.
Design my data pipeline →
High-quality teams do design reviews. Some don't. And even when they do, the right people aren't always in the room. The architecture ends up reflecting whoever was involved, not what the data actually needs.
A Spark person builds Spark pipelines. A Snowflake shop puts everything in Snowflake. The architecture follows the engineer's comfort zone — not the problem.
One-off requirements get built like production systems. Production systems get treated as one-offs. Both become maintenance burdens that slow the whole team down.
Despite all the tooling, data is still in silos. Finance, product, and operations are still waiting on reports that should be self-serve.
The inputs that actually change the recommendation — not surface-level ones.
What you're trying to do with the data — the business question, what's currently broken, and what good looks like.
Where the data lives today — databases, files, APIs, lakehouses. How often it arrives and in what shape.
Cloud provider, tools already in use, and anything off the table. The design works with what you have.
Volume, variety, and processing frequency — these three inputs drive most of the architectural decisions.
Budget, compliance requirements, team size, and who owns it after it ships. Determines what's actually buildable.
Specific tools, estimated costs, complexity rating, and the first steps to build it — no filler, no generic advice.
You don't need to know the stack — you need to know the problem. This tool handles the rest.
I started as a big data test engineer — finding bugs in pipelines before they reached production. What I learned quickly was that the bugs weren't in the code. They were in the design. Flawed assumptions about the data, the wrong tool for the volume, architecture that made sense for one engineer but couldn't be handed off.
That insight drove me into data engineering. I built big data pipelines on Spark and AWS EMR, and spent time at Amazon's AWS Billing Data Warehouse — a system collecting and maintaining 45–50 PB of data. I saw firsthand how data challenges compound at scale, what causes pipelines to fail under pressure, and how the best teams think about resilience, contracts, and maintainability from day one.
More recently I've been building and maintaining modern data stacks — Fivetran, Airflow, Snowflake, dbt, AWS Glue. The consistent pattern across all of it: teams either over-engineer because they don't understand the requirements, or under-engineer assuming it's a one-off. Both become legacy. This tool exists to make that decision less dependent on one person's experience.
Connect on LinkedIn →
Takes 3 minutes. No account required. Use the access code below.