Your Airflow Vendor Wants Your DAGs to Be Expensive

MWAA, Cloud Composer, and Astronomer all make money when your Airflow consumes more compute. SkaleData doesn't. The difference shapes their roadmaps, their defaults, and your bill.

Here's the thing nobody at AWS, Google, or Astronomer will say out loud: they make more money when your Airflow is inefficient.

This isn't a conspiracy. It's just how their business model works. The compute that runs your DAGs is their product. The more of it you consume, the more they get paid. Whether your DAGs are tight, well-tuned, parallelism-aware Airflow code, or a tire fire of long-running tasks that should have been ten-line SQL queries — they get paid either way, and they get paid more for the second one.

We built SkaleData differently. Here's how the math actually works.

How MWAA makes money

AWS bills MWAA in three ways:

  1. Per-environment hourly fee for the scheduler/webserver (small/medium/large)
  2. Per-worker hourly fee for every additional worker beyond the baseline
  3. Metadata-DB storage by the GB-month, plus CloudWatch ingestion for logs

The first one is fixed. The other two scale with consumption.

When MWAA's autoscaler decides whether to spin up another worker, the formula is explicitly task-count driven: (running_tasks + queued_tasks) / tasks_per_worker = required_workers, capped at your max. There is no cost term in that equation. If your team writes a DAG with 200 parallel tasks where 20 sequential ones would suffice, MWAA scales up to handle them, and AWS bills you for every additional worker-hour that takes. Nobody at AWS is going to ship a feature that detects "you probably meant to write this as a single batched operation."
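To make that concrete, here is a minimal Python sketch of the formula as described above. It is an illustration, not MWAA's actual code: the function name, the rounding-up step, the figure of 10 tasks per worker, and the cap of 25 workers are all assumptions chosen for the example.

```python
import math


def required_workers(running_tasks: int, queued_tasks: int,
                     tasks_per_worker: int, max_workers: int) -> int:
    """Task-count-driven worker estimate. Note that no cost term appears."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    # Assumption: fractional demand is rounded up to a whole worker.
    demand = math.ceil((running_tasks + queued_tasks) / tasks_per_worker)
    return min(demand, max_workers)


# Hypothetical settings: 10 tasks per worker, cap of 25 workers.
# A DAG that fans out 200 tasks at once:
wide = required_workers(running_tasks=0, queued_tasks=200,
                        tasks_per_worker=10, max_workers=25)
# The same work written as sequential tasks, so only one runs at a time:
sequential = required_workers(running_tasks=1, queued_tasks=0,
                              tasks_per_worker=10, max_workers=25)

print(wide, sequential)
```

Under these assumed settings the wide DAG asks for 20 workers and the sequential one asks for 1. Nothing in the function knows, or needs to know, which of the two was the better design.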

Why would they? Their job is to run the workers you ask for.

(AWS launched MWAA Serverless in late 2025 with per-task-second billing, which is structurally closer to the right alignment — you pay only for time tasks actually run. It's a real improvement. It's also still AWS in the middle, marking up the underlying compute and pocketing the difference. Better incentives than the standard tier, same fundamental shape.)

How Cloud Composer makes money

Cloud Composer 3 (the default since March 2025) runs the Airflow components in a Google-managed tenant project — you don't see the underlying GKE cluster anymore. GCP rolled the old per-vCPU / per-memory / per-storage SKUs into a single abstract unit called a Data Compute Unit (DCU-hour), which blends across workers, scheduler, DAG processor, triggerer, and webserver. The billing surface area:

  1. Per-environment management fee for the orchestration layer
  2. DCU-hours for everything that runs Airflow under the hood
  3. Database storage as part of the managed-tenant package
  4. Networking, GCS for DAGs/logs, Cloud Logging — all standard GCP line items on top

Same shape as MWAA underneath — the autoscaler is the same Celery/KEDA-based worker scaler, scaling on queued + running tasks with no cost-aware logic. The DCU is just a more abstract unit for the same equation.

What's distinct about Composer is the ~$400/month floor even when idle, which tells you what this product actually is: GCP selling reserved capacity dressed up as managed Airflow. Whether to optimize for "fewer DCU-hours" or "faster task throughput" is a tradeoff, and GCP, structurally, gets paid more for the second one. So that's where the defaults land.

How Astronomer makes money

This one's the purest version of the model. Astro Hosted defaults to Astronomer-operated multi-tenant Kubernetes clusters — shared nodes that they run on AWS, GCP, or Azure and pay the cloud provider for. Your workers are pods scheduled on those shared nodes, in a namespace fenced off from other tenants. (A single-tenant "Dedicated" cluster is offered at a higher hourly rate; the multi-tenant Standard cluster is the default.) The pods are billed back to you at a fixed Astronomer rate per worker size — and Astronomer never publishes the spread between that rate and what they pay their cloud provider for the underlying node-hour.

That undisclosed spread is the margin. It's reselling compute with an opaque markup. You can't see the underlying cloud cost, you can't right-size against it, and you can't shop it around — because the compute isn't yours, it's Astronomer's, and they're charging you for letting you borrow some of it. (Egress networking is the one line item they pass through transparently.)

Astronomer's BYOC offering (historically "Astro Hybrid," now largely rebranded as "Astro Private Cloud" and "Remote Execution") splits the bill differently: you pay Astronomer a platform fee, and you pay your cloud provider directly for the actual compute. That's structurally closer to the right alignment — but those BYOC variants are the enterprise tier, and Hosted is what most teams actually buy.

Workers do scale to zero when idle, so you're not billed for stopped pipelines — but every active pod-second is billable, and Astronomer has built the slickest UX in the market on top of that incentive structure. To their credit, they ship serious observability tooling — task duration baselines, lineage, anomaly alerts — that could help you find inefficient DAGs. Whether the team building those features is being judged on "did this reduce customer pod consumption" or "did this make Astronomer stickier" is a question only their PMs can answer.

The structural problem

Step back from the individual vendors. Every managed Airflow offering on the market — except SkaleData — has the same shape: they run the compute, they charge you for the compute, more compute = more revenue. Every product decision they make is filtered through a question that should not be the customer's problem: does this change increase or decrease the compute we get to bill?

Most of the time, those incentives align with yours. They want fast schedulers, reliable workers, good UX — and so do you. But the moment efficiency conflicts with throughput, or "do less work" conflicts with "do work faster," their incentive points one way and yours points the other.

You will never see MWAA ship a feature called "automatically detect and merge redundant tasks." You will never see Cloud Composer surface a "this DAG could have been a SQL view, here's the SQL" suggestion. Not because the engineering is hard (it isn't), but because the business case for AWS or Google to ship that feature is "we will make less money."

How SkaleData makes money

SkaleData runs on a different shape entirely.

When you deploy a SkaleData cluster, the cluster lives in your AWS, GCP, or Azure account. The Kubernetes nodes, the Postgres instance, the object storage, the networking — all of it is billed to you directly by your cloud provider at standard rates. There's no markup, because we're not in the middle of that transaction.

What SkaleData charges is a flat platform fee per cluster. Run one DAG a week, run a million DAGs a week — we make the same money. Add a worker, remove a worker, autoscale to zero overnight — we make the same money.

This isn't generosity — it's an architectural choice, and it changes our incentives completely.

When we ship a Kubernetes autoscaling improvement that lets a customer run the same DAGs on 30% fewer worker nodes, that costs SkaleData nothing. We get paid the same. The customer's cloud bill goes down. That's a feature we ship without hesitation.
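A back-of-the-envelope sketch of that difference, in Python. Every number here is hypothetical: the billed rate, the underlying cloud cost, the platform fee, and the worker-hour totals are made up for illustration and are not any vendor's real pricing. It also assumes worker-hours drop in proportion to worker nodes.

```python
def resale_margin(worker_hours: int, billed_rate: float, cloud_cost: float) -> float:
    """Vendor margin when the vendor resells compute: spread times usage."""
    return worker_hours * (billed_rate - cloud_cost)


def flat_fee_revenue(worker_hours: int, platform_fee: float) -> float:
    """Vendor revenue under a flat per-cluster fee: usage does not enter into it."""
    return platform_fee


# Hypothetical month: 10,000 worker-hours before an efficiency improvement,
# and 30% fewer after it.
before_hours = 10_000
after_hours = before_hours * (100 - 30) // 100  # 7,000

# Hypothetical rates: vendor bills $0.50 per worker-hour and pays $0.25 for it.
resale_before = resale_margin(before_hours, billed_rate=0.50, cloud_cost=0.25)
resale_after = resale_margin(after_hours, billed_rate=0.50, cloud_cost=0.25)

# Hypothetical flat platform fee of $2,000 per cluster per month.
flat_before = flat_fee_revenue(before_hours, platform_fee=2000.0)
flat_after = flat_fee_revenue(after_hours, platform_fee=2000.0)

print(resale_before, resale_after)  # margin shrinks as the customer gets efficient
print(flat_before, flat_after)      # unchanged
```

With these made-up inputs, the reseller's margin falls from 2,500 to 1,750 when the customer gets 30% more efficient, while the flat-fee vendor's revenue stays exactly where it was. The specific numbers don't matter; the sign of the change does.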

When a customer asks us "why is my Airflow expensive?" we get to actually answer the question. We can look at their DAGs, point at three things that are wasting compute, and tell them how to fix it. Doing that for an AWS, GCP, or Astronomer customer is asking a salesperson to talk you out of buying more of the thing they sell.

The honest caveat

This argument cuts in our favor, so we're motivated to make it. You should be skeptical of self-serving arguments — including this one. Two things to check it against:

One: look at what we ship. Our defaults skew toward efficiency. The platform doesn't try to upsell you on bigger node sizes; it tries to right-size what you've got. Our docs talk openly about how to scale down. That's not what you'd expect from a vendor whose revenue scales with compute, because ours doesn't.

Two: look at the structure, not the marketing. We can't price ourselves on compute margins because the compute is not ours to mark up. Even if we wanted to extract more revenue per customer, the architecture doesn't give us a lever. That's the whole point of BYOC.

So what

If you're picking a managed Airflow vendor, you're not just picking features. You're picking whose interests are aligned with yours.

MWAA, Cloud Composer, and Astronomer all build genuinely good products. They also all sit on the wrong side of a structural misalignment that you, the customer, end up paying for — in higher cloud bills, in defaults tuned for throughput over efficiency, and in roadmaps that will never prioritize "use less of our product."

SkaleData is the only managed Airflow option on the market where your bill going down is genuinely fine with us.

If that sounds like the kind of vendor relationship you want, request early access. We'd rather make less margin per customer and have you stay forever than make more and have you leave the moment you do the math.