Semantic Model Power Query vs Dataflow Gen2 in Fabric
In my recent blog post, Building Power BI Reports: Desktop vs Fabric, I talked about the evolving authoring experience in Microsoft Fabric and how report development is increasingly moving into the browser. But that conversation inevitably leads to another architectural question that many teams overlook: where should your Power Query logic actually live? Should it stay inside the semantic model, or should it move upstream into a Dataflow Gen2? At first glance the choice seems trivial—they both use Power Query, they both transform data, and the editors even look almost identical. But once you adopt a lakehouse-first architecture and start using Direct Lake, that decision becomes far more important than it appears. You’re no longer just deciding where to rename columns or fix a date format. You’re deciding where your organization’s data preparation logic lives, who owns it, how reusable it becomes, and whether your architecture will scale cleanly—or slowly turn into a maze of hidden transformations scattered across reports.
Why this choice matters in a Direct Lake world
Here’s the bottom line: once you move to a lakehouse-first pattern and Direct Lake becomes your default, you’re no longer just choosing where to “clean some columns.” You’re deciding where your team’s data logic lives, who owns it, how reusable it is, and how painful it will be six months from now when someone asks, “Why doesn’t this report match the data warehouse?”
Direct Lake is designed to load data into memory directly from Delta tables in OneLake and then query it with the VertiPaq engine, giving you Import-like performance without copying the entire dataset into the model. A Direct Lake refresh is mostly metadata framing, not a full data reload, which keeps refresh overhead low compared to Import refreshes.
And there’s a subtle but important architectural nudge in the official guidance: Direct Lake moves data preparation upstream into OneLake, using tools like dataflows, pipelines, Spark, and T-SQL, so the logic is reusable and centralized.
That’s where the comparison gets interesting.
Two approaches, one common trap
Let’s define the two options as they exist today in Fabric, without the marketing fog.
Semantic Model Power Query (PQ) means you’re using Power Query editing inside the semantic model experience (web modeling). This is real Power Query: connect, transform, and load. But there’s a key limitation that directly affects Direct Lake: Power Query editing for semantic models is supported for Import storage mode tables, not for Direct Lake tables. In other words, you can’t “Transform data” on Direct Lake tables the way you can in Import.
Dataflow Gen2 is a separate artifact designed for low-code ingestion and transformation with Power Query Online, then loading the results into destinations like Lakehouse tables and Warehouse. It’s explicitly positioned for transformation and loading, and it includes scheduling, monitoring, and pipeline integration.
A simple diagram helps show why these feel similar in the editor, but behave very differently in real life:
```text
Option A: Semantic Model PQ (Import shaping)

Source -> Power Query in semantic model -> Import into model -> Report
          (local to one model)

Option B: Lakehouse-first with Dataflow Gen2 + Direct Lake

Source -> Dataflow Gen2 (PQ) -> Lakehouse tables (Delta in OneLake) -> Direct Lake semantic model -> Reports
          (reusable layer)      (shared, durable)
```
If you’ve ever thought, “But they both use Power Query… aren’t they basically the same?” you’re not alone. They look alike. They do not behave alike.
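The editor similarity is easy to demonstrate. A simple cleanup query is written in identical M in both experiences; the only thing that changes is where the result lands. A minimal sketch, with a hypothetical source, table, and column names invented for illustration:

```powerquery-m
// Identical M in both editors; only the destination differs.
let
    // Hypothetical Azure SQL source for illustration.
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Renamed = Table.RenameColumns(Orders, {{"ord_dt", "OrderDate"}}),
    Typed = Table.TransformColumnTypes(Renamed, {{"OrderDate", type date}})
in
    Typed
```

In Semantic Model PQ, `Typed` becomes an Import table inside one model. In Dataflow Gen2, the same query can land a Delta table in a Lakehouse that many models consume. Same syntax, very different blast radius.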
Semantic Model PQ: great for last-mile shaping, risky as your data prep layer
When Semantic Model PQ is the right tool, it feels magical. You’re building a report, you find a messy column, you fix it, and you move on with your life. That speed matters, especially for BI developers and report authors.
With Direct Lake increasingly the default choice, the pros and cons of Semantic Model PQ are worth spelling out.
Pros
Semantic Model PQ is fast to start and very close to the reporting experience. The semantic model editing experience in the service supports Power Query editing for Import models and lets you build the model and report quickly in the browser.
It also fits the “I can’t change upstream” scenario. The Direct Lake guidance even calls out that when the semantic model author can’t modify the source item managed by IT, adding Import tables and using Power Query in the model can be a practical workaround.
Cons
Scope and reuse are the first big problem. Semantic Model PQ logic lives inside one semantic model. That’s fine until you have three semantic models doing the same transformation three different ways (and nobody notices until you’re in a meeting).
Direct Lake is the second big problem. Power Query editing (Transform data) isn’t supported for Direct Lake tables. Direct Lake uses an Edit tables experience, where you select which Lakehouse/Warehouse tables the semantic model uses, not how to transform them.
So if you try to do “real” transformation work inside the semantic model, you tend to drift into Import mode tables. And now you’ve brought back the heavy part Direct Lake was trying to remove: full refresh cycles, capacity CPU and memory pressure during refresh, and duplicated data storage in the model. The Power BI data refresh guidance explicitly calls out the need to plan for extra memory during semantic model refresh to prevent refresh failures.
Important nuance: Composite models (Direct Lake + Import) are a valid and intentional design choice in Fabric. They work well when used deliberately — for example, keeping large fact tables in Direct Lake while using Import for small, highly flexible dimension tables that require Power Query shaping, calculated columns, or hierarchies. The real risk isn’t using Import tables; it’s drifting into them accidentally and re‑introducing refresh cost and data duplication without realizing it.
Scheduling and monitoring are also more limited compared to Dataflow Gen2. Yes, semantic models can be scheduled up to 48 times per day on Fabric capacity, but that’s still “dataset refresh scheduling,” not an ETL monitoring experience.
There’s also an honest ownership issue. When transformation logic lives inside the semantic model, it tends to be owned by the report team (or one heroic BI developer). That can work, but it often creates “shadow ETL,” where business-critical shaping happens in a place the data/platform team doesn’t naturally govern, test, or monitor.
One nuance worth knowing: Fabric does provide OneLake integration that can automatically write Import table data to Delta tables in OneLake (automatic OneLake integration for Import models). That’s useful, but it still doesn’t magically turn semantic model PQ into a governed, reusable engineering layer. The logic and lifecycle are still tied to the semantic model.
If you keep Semantic Model PQ focused on last-mile shaping, it stays your friend. If you let it become your data prep layer, it eventually becomes your weekend plan (and not in a fun way).
Dataflow Gen2: built for reuse and Direct Lake, with real operational trade-offs
If Semantic Model PQ is “fix it right where you see it,” Dataflow Gen2 is “build it once so everyone stops re-building it.”
Pros
Dataflow Gen2 is designed for ingesting and transforming data with a low-code interface, then loading it into destinations, including Fabric Lakehouse tables.
In a lakehouse-first pattern, this lines up perfectly with how Fabric Lakehouse works: tables are stored in Delta Lake format by default, which is the preferred table format for Fabric Lakehouse.
That means Dataflow Gen2 can land curated Delta tables in OneLake, and your semantic model can consume those tables via Direct Lake. This is exactly the “do prep upstream, analyze downstream” pattern Direct Lake is nudging you toward.
Operationally, Dataflow Gen2 has first-class monitoring. Refresh history and the monitoring hub provide run details (status, duration, trigger type), activity-level information, and even downloadable detailed logs from the mashup engine for troubleshooting. It also tracks capacity used for refresh, which is a big deal when you’re trying to manage Fabric cost and performance.
It also has a real CI/CD story. Dataflow Gen2 supports Git integration and deployment pipelines, and Microsoft provides solution architecture guidance for parameterization and stage-specific configuration.
Finally, Dataflow Gen2 is built to scale better than it looks. Under the hood, it runs on Fabric‑managed compute. Microsoft Fabric uses a combination of Spark‑based and SQL‑based execution paths depending on the source, transformations, and destination, and automatically creates managed staging Lakehouse and Warehouse items to optimize performance. Those staging items are managed by the platform and aren’t meant to be manually accessed or modified.
Cons
Dataflow Gen2 introduces another layer, and layers come with responsibilities. You now need to orchestrate refresh order (dataflow first, then downstream consumers). That’s manageable, but it’s real work. The reward is isolation: a semantic model refresh doesn’t have to also be an ingestion job, and failures can be investigated where they occur (in the dataflow run details).
There are also platform limits and throttling behavior to respect. For example, Fabric documents per-dataflow refresh limits per 24-hour window and notes that system-level throttling may reject bursty refresh patterns to protect service stability.
And then there’s the “invisible complexity” part: staging items exist, they can show up in some experiences, and Microsoft strongly advises you not to touch them because the data may not be consistent and changes can cause unexpected behavior. If you’ve ever wondered why a workspace suddenly has mysterious staging artifacts, it’s not you. It’s the platform doing platform things.
A final trade-off: Dataflow Gen2 is best when you treat its output schema as a contract. If downstream semantic models depend on stable columns and types, you need change management discipline. The tooling helps, but it can’t replace good habits.
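One lightweight way to make that contract explicit is to end each Dataflow Gen2 query with an explicit column selection and type assertion, so an accidental upstream change fails the refresh loudly instead of silently reshaping the output. A sketch, assuming `Shaped` is the result of earlier transformation steps and the column names are hypothetical:

```powerquery-m
// Final step of a Gen2 query: pin the output schema explicitly.
let
    // Hypothetical: the shaped table produced by the query's earlier steps.
    Source = Shaped,
    // Table.SelectColumns errors by default if an expected column is missing,
    // turning a silent schema drift into a visible refresh failure.
    Contract = Table.SelectColumns(Source, {"OrderId", "OrderDate", "Amount"}),
    Typed = Table.TransformColumnTypes(Contract,
        {{"OrderId", Int64.Type}, {"OrderDate", type date}, {"Amount", type number}})
in
    Typed
```

It isn't a substitute for change management, but it moves schema breaks from "the downstream report looks wrong" to "the dataflow run failed, and the run details say why."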
The good news is this: those are the kinds of growing pains you want. They mean you’re building a reusable data layer, not just a one-off report.
Governance, lineage, CI/CD, and cost: the practical differences people feel
Here’s the part that usually convinces teams (or scares them, depending on the week).
Governance and ownership
Direct Lake guidance explicitly encourages upstream preparation to maximize reusability across the architecture. That’s naturally aligned with a data/platform team owning Dataflow Gen2, while BI teams focus on semantic modeling and measures.
Lineage
Fabric and Power BI both provide lineage views to help understand dependencies across workspace artifacts, including semantic models and dataflows. That visibility is much stronger when your transformations exist as distinct workspace items (like dataflows and Lakehouse tables), rather than being hidden inside a single semantic model.
There is a nuance: correct lineage between semantic models and dataflows is only guaranteed when the connection is set up through the Get Data UI with the Dataflows connector, not when someone hand-writes a mashup query against the underlying storage. This is one of those "it works until it doesn't" details worth standardizing early.
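In practice, that means the connection query should keep the navigation shape the Dataflows connector generates. Roughly, a generated query looks like the sketch below (the real generated code uses workspace and dataflow GUIDs; the placeholder IDs and the `Orders` entity name here are hypothetical):

```powerquery-m
// Approximate shape of a query generated by the Get Data UI's Dataflows connector.
// Lineage tracking depends on this connector-based navigation, not hand-written mashups.
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceId = "<workspace-guid>"]}[Data],
    Dataflow = Workspace{[dataflowId = "<dataflow-guid>"]}[Data],
    Orders = Dataflow{[entity = "Orders", version = ""]}[Data]
in
    Orders
```

If a team habitually rewrites these queries by hand, the lineage view can quietly stop reflecting reality, which defeats much of the point of moving the logic upstream.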
CI/CD
Both semantic models and Dataflow Gen2 are supported items for Fabric Git integration (currently marked preview for some Power BI items).
But Dataflow Gen2 goes further with explicit CI/CD documentation, deployment pipeline automation guidance, and patterns for stage-specific configuration.
Semantic models do have strong ALM support through deployment pipelines, but there are gotchas: deployment pipelines move metadata, not data, and don’t copy things like refresh schedules or credentials. That matters in real deployments more than people expect.
Cost and capacity
Direct Lake reduces the heavy refresh cost because framing is low-cost compared to Import refresh that copies full data. Dataflow Gen2 shifts cost into the ingestion/transform layer (which is usually where you want it), and its monitoring surfaces capacity usage during refresh so you can manage that cost deliberately.
This is where the mentor advice kicks in: don’t optimize for “fewer artifacts.” Optimize for “fewer surprises.”
Summary table and a decision gut-check
Here’s a comparison table:
| Aspect | Semantic Model PQ | Dataflow Gen2 |
|---|---|---|
| Scope | One semantic model | Workspace-wide |
| Reusable | No (logic stays in one model) | Yes (shared tables/destinations can feed many) |
| Writes Delta to OneLake | No (not as a first-class curated layer) | Yes (can land into Lakehouse tables) |
| Good for Direct Lake | Risky if heavy (tends to drag you into Import) | Ideal (prep Delta upstream, consume via Direct Lake) |
| ETL ownership | Report owner | Data / Platform team |
| Scheduling & monitoring | Limited (semantic model refresh features) | First-class (run history, monitoring hub, logs) |
| Primary purpose | Model shaping (report-friendly structure) | Data prep / engineering |
If you want one practical rule that won’t embarrass you later, it’s this:
If the transformation is meant to be shared, trusted, and reused, put it in Dataflow Gen2 and land it in Lakehouse tables, then use Direct Lake for the semantic model.
If the transformation is truly report-specific, small, and unlikely to be reused, Semantic Model PQ can be fine, but keep it small on purpose.
And if you’re unsure which camp it’s in, that’s your answer. Put it upstream. Your future self (and your on-call rotation) will thank you.
