Introduction to Power Query
Imagine this: Your data is messy, scattered across Excel files, databases, and websites. Power Query is your data cleaning assistant built right into Power BI! It helps you gather, clean, and organize data automatically so you can focus on creating amazing reports.
What is Power Query?
Think of Power Query as a smart data robot that lives inside Power BI and Excel. Instead of manually copying, cleaning, and organizing data for hours, you teach Power Query how to do it once -- and it remembers! Every action you take is saved as a "step," so when your data updates next month, Power Query automatically repeats all those steps in seconds.
Why it matters: No more repetitive work! Whether you're combining 50 Excel files or cleaning messy sales data, Power Query does it automatically. Your work becomes faster, more consistent, and far less error-prone. It's like having a data assistant that never gets tired!
Why Use Power Query?
Have you ever spent hours cleaning the same report every week? Copying data, removing blanks, fixing dates -- only to do it all over again next Monday? That's exactly what Power Query solves! Instead of repeating boring tasks, you set up your cleaning process once, and Power Query remembers every step. Next time your data updates, just click "Refresh" and watch the magic happen in seconds!
Set It & Forget It
Clean your data once, Power Query repeats those exact steps automatically whenever you refresh -- no manual work needed!
See Every Step
No black boxes! Every transformation is listed clearly. You can edit, delete, or reorder steps anytime -- full control!
Connect to Everything
Excel files? Databases? Websites? Cloud storage? Power Query connects to 100+ sources -- your data, anywhere!
Opening Power Query Editor
Ready to start cleaning data? Power Query Editor opens in its own window, separate from your main Power BI report. Think of it as entering a "data cleaning room" where you can work on your data without affecting your reports. When you're done, you simply close it and return to building your visuals!
Home Ribbon → Transform Data
In Power BI Desktop, go to the Home ribbon and click the "Transform Data" button. This is the quickest way to open the Power Query Editor.
Right-Click Query → Edit Query
If you already have queries loaded, right-click any query name in the Fields pane (on the right) and select "Edit Query" to open it in Power Query Editor.
Keyboard Shortcut
Press Alt + H, then Q to open Power Query Editor. This works when you have a query selected.
Get Data → Choose Source
When importing data, click "Get Data" → select your source, then click "Transform Data" (instead of "Load") to open the editor immediately after connecting.
Power Query Editor Interface (Your Control Center)
Let's take a tour! The Power Query Editor has 4 main areas, like a car dashboard. Once you know where everything is, you'll navigate like a pro!
Queries Pane
Left Sidebar
Your query library!
- See all your queries listed
- Organize into folders
- Check load status
- Manage relationships
Preview Pane
Center Stage
See your data live!
- First 1,000 rows
- Column headers visible
- Data types shown
- Instant updates
Applied Steps
Right Panel
Your transformation diary!
- Every action recorded
- Time-travel through steps
- Edit or delete steps
- Reorder transformations
Formula Bar
Top Bar
Peek behind the curtain!
- View M code
- Current step shown
- Edit code directly
- Advanced power
The ETL Workflow (Your Data Journey)
ETL sounds fancy, but it's just 3 simple steps: Extract (grab your data), Transform (clean & organize it), Load (put it in Power BI). Think of it like cooking: 1) Get ingredients 2) Chop & prepare 3) Serve the dish. Every Power Query follows this recipe!
| Phase | Description | Power Query Tasks |
|---|---|---|
| Extract | Connect to data sources and import raw data | Use connectors (Excel, SQL, Web, APIs), configure connection settings, authenticate |
| Transform | Clean, reshape, and prepare data for analysis | Remove columns, filter rows, split/merge, pivot/unpivot, change data types, merge queries |
| Load | Load transformed data into the data model | Close & Apply, configure refresh settings, set load destination (data model or connection only) |
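The three phases map directly onto steps in a single M query. Here's a minimal sketch (the file path and column names are illustrative):

```
let
    // EXTRACT: connect to the source and read raw data
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ","]),

    // TRANSFORM: clean and shape
    Promoted = Table.PromoteHeaders(Source),
    Typed = Table.TransformColumnTypes(Promoted, {{"Amount", type number}}),
    Filtered = Table.SelectRows(Typed, each [Amount] > 0)
in
    // LOAD: the step returned here is what Close & Apply loads into the model
    Filtered
```

Each Applied Step you click in the editor corresponds to one of these named variables.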
Applied Steps: Your Action Replay
Every click you make gets saved as a step! It's like having a video replay of everything you did to your data. Click "Removed Columns" step? You'll see your data before those columns were removed. Delete a step? That action disappears. It's the ultimate UNDO feature -- with full visibility! Think of it as breadcrumbs showing exactly how you got from messy data to clean data.
Practice Questions: Introduction to Power Query
Test your understanding with these hands-on exercises.
Task: What are the three phases of the ETL workflow in Power Query?
Show Solution
Answer: The three phases are:
- Extract: Connect to data sources and import raw data using connectors
- Transform: Clean, reshape, and prepare data through transformations
- Load: Load the transformed data into Power BI's data model for visualization
Why it matters: Understanding the ETL workflow helps you organize queries logically, optimize performance, and follow industry best practices for data preparation.
Task: Explain how to use the Applied Steps pane to manage transformations and improve query maintainability.
Show Solution
Answer: The Applied Steps pane records every transformation as a named step, creating a transparent history. You can:
- Click any step to preview data at that stage of transformation
- Edit step names for better documentation
- Delete steps to undo transformations
- Reorder steps (with caution) to optimize or fix logic
- View the M code for each step by selecting it and checking the formula bar
Practical tip: Always rename cryptic auto-generated step names like "Changed Type1" to descriptive names like "Set Sales Amount to Number" for better maintainability.
Task: Explain when to disable loading for a query and how this affects query dependencies and the data model.
Show Solution
Answer: Disable loading for intermediate or helper queries that:
- Are referenced by other queries but don't need to appear in the data model as separate tables
- Contain parameter definitions or reusable functions
- Store staging or temporary transformations used in multiple queries
Impact on dependencies: The query still executes during refresh (as long as it's referenced by a loaded query), but it won't create a table in the data model. This reduces model size and keeps the field list clean while maintaining functional dependencies.
Example: If you create a "DateTable Helper" query that generates a date range, and then reference it in your main "Sales" query, you can disable loading for "DateTable Helper" since you only need the final Sales table in your model.
Data Sources & Connectors
Your data is everywhere! Sales in Excel, customer info in SQL databases, website stats online, receipts in Google Sheets... Power Query can connect to all of them! With 100+ connectors, it's like having a universal adapter that plugs into any data source you can imagine.
Common Data Source Categories
Where does your data live? Power Query organizes connectors into 4 main neighborhoods. Let's explore each one!
File Connectors
Local & Network Files
- Excel workbooks
- CSV & Text files
- JSON & XML data
- Access databases
- Folder combine
Databases
Relational Systems
- SQL Server
- MySQL
- PostgreSQL
- Oracle
- OData feeds
Cloud Services
SaaS Platforms
- SharePoint
- Google Sheets
- Salesforce
- Google Analytics
- Dynamics 365
Web & APIs
Online Services
- Web pages
- REST APIs
- OData feeds
- JSON/XML APIs
- Web URLs
Connecting to Excel Files
Excel is one of the most common data sources in business environments. Power Query can import from workbooks, worksheets, tables, and named ranges. The connector automatically detects the structure and provides options for data import.
Get Data
From the Home ribbon, select Get Data → Excel Workbook
Browse File
Navigate to your Excel file and click Open to select the file location
Navigator
Select the sheet, table, or named range you want to import
- Preview pane shows first few rows
- Check "Select multiple items" to import multiple sheets
Load or Transform
- Load: Imports data directly into the model
- Transform Data: Opens Power Query Editor for transformations first (recommended)
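Whichever option you choose, the Navigator generates M code like the following behind the scenes (the workbook path and table name "Sales" are illustrative):

```
let
    // Read the workbook; true promotes the first row to headers
    Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), true),
    // Navigate to a specific table (Kind can also be "Sheet" or "DefinedName")
    SalesTable = Source{[Item = "Sales", Kind = "Table"]}[Data]
in
    SalesTable
```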
Connecting to SQL Server
SQL Server connections allow you to import data from on-premises or Azure SQL databases. Power Query supports both Import and DirectQuery modes, giving you flexibility in how data is accessed.
Server Name
Enter the SQL Server instance name
Example: localhost, SERVER01, or myserver.database.windows.net
Database (Optional)
Specify database name or leave blank to see all databases
Specifying the database name speeds up connection and reduces Navigator clutter
Data Connectivity Mode
- Import: Data is cached in Power BI (faster queries, scheduled refresh needed)
- DirectQuery: Live connection to database (always current, queries run on database)
Authentication
Choose Windows or Database authentication
Windows authentication uses your logged-in credentials; Database requires SQL username/password
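The connection dialog generates a short M expression. A sketch, with illustrative server, database, and table names:

```
let
    // Specifying the database up front keeps the Navigator focused
    Source = Sql.Database("SERVER01", "SalesDB"),
    // Navigate to a specific table by schema and name
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data]
in
    Orders
```

Credentials are stored separately (File → Options → Data source settings), so they never appear in the M code itself.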
Connecting to Web Data
The Web connector allows you to import HTML tables from web pages or consume data from REST APIs. Power Query automatically detects tables on web pages, making it easy to import online data.
Example: Importing Wikipedia Table
1. Get Data → Web
2. Enter URL: https://en.wikipedia.org/wiki/List_of_countries_by_population
3. Navigator shows all detected tables
4. Select "Countries" table
5. Transform Data to clean and format
Power Query automatically:
- Detects table structure
- Assigns column headers
- Suggests data types
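Under the hood, the Web connector wraps the URL in M like this (which detected table you want may vary by page, so the index `0` here is illustrative):

```
let
    // Web.Page returns one row per table detected on the page
    Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/List_of_countries_by_population")),
    // Take the first detected table; adjust the index to match the Navigator selection
    PopulationTable = Source{0}[Data]
in
    PopulationTable
```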
Folder Connector: Combining Multiple Files
The Folder connector is incredibly powerful for scenarios where you need to combine multiple files with the same structure into a single table. This is common with monthly reports, daily logs, or distributed data files.
Access Folder Connector
From the Home ribbon, select Get Data → Folder and browse to the folder containing your files
Review File List
Power Query displays all files with metadata including name, extension, date modified, and file size
Combine & Transform
Click the "Combine & Transform Data" button to begin the combining process
Select Template
Select an example file and the sheet/table to use as the structure template for all files
Automatic Combination
Power Query automatically combines all matching files into one table and adds a "Source.Name" column showing which file each row came from
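The "Combine & Transform Data" button generates several helper queries, but the core logic boils down to something like this sketch (folder path, sheet name, and the expanded column names are illustrative assumptions):

```
let
    Source = Folder.Files("C:\Reports\Monthly"),
    // Filter to relevant files before combining
    XlsxOnly = Table.SelectRows(Source, each [Extension] = ".xlsx"),
    // Parse each file's binary Content into a table
    WithData = Table.AddColumn(XlsxOnly, "Data",
        each Table.PromoteHeaders(
            Excel.Workbook([Content]){[Item = "Sheet1", Kind = "Sheet"]}[Data])),
    // Keep the file name so each row stays traceable, then expand the tables
    Combined = Table.ExpandTableColumn(
        Table.SelectColumns(WithData, {"Name", "Data"}),
        "Data", {"Date", "Amount"})
in
    Combined
```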
Connection Properties & Refresh Settings
After establishing a connection, you can configure properties like refresh schedules, authentication credentials, and privacy levels. These settings control how and when data is updated.
| Property | Purpose | Options |
|---|---|---|
| Connection Name | Identify the data source in queries | Descriptive name (default is server/file name) |
| Privacy Level | Control data sharing between sources | Private, Organizational, Public |
| Credentials | Authentication method | Windows, Database, OAuth, API Key |
| Gateway Connection | Enable refresh in Power BI Service | On-premises gateway configuration |
| Refresh Frequency | Scheduled update interval (Service only) | Up to 8 times per day (Pro), 48 times (Premium) |
Practice Questions: Data Sources & Connectors
Test your understanding with these hands-on exercises.
Task: Explain the difference between "Load" and "Transform Data" when connecting to a data source, and when to use each approach.
Show Solution
Answer:
- Load: Imports data directly into the Power BI data model without opening Power Query Editor. Use this only when data is already clean and properly structured.
- Transform Data: Opens Power Query Editor first, allowing you to inspect and transform data before loading. This is the recommended approach for most scenarios.
Best Practice: Always use "Transform Data" for new connections to set proper data types, remove unnecessary columns, and apply initial cleaning steps.
Task: Describe how to efficiently combine 50 Excel files with identical structure from a network folder.
Show Solution
Answer: Use the Folder connector with "Combine & Transform Data" feature:
- Get Data → Folder → Browse to the network folder
- Click "Combine & Transform Data" (not "Combine & Load")
- Select an example file to define the structure template
- Power Query combines all files automatically
- A "Source.Name" column tracks which file each row originated from
Requirements: All files must have matching column names and structure. Misaligned columns typically surface as nulls or step-level errors in the combined output.
Performance Tip: Filter the file list to only relevant extensions (e.g., .xlsx) before combining to avoid processing unnecessary files.
Task: Compare DirectQuery and Import modes for SQL Server connections, including trade-offs for each approach.
Show Solution
Import Mode:
- Pros: Fast queries (cached data), supports all DAX functions, works offline, reduces database load
- Cons: Requires scheduled refresh, data may be stale, consumes storage in Power BI
- Use when: Data doesn't change frequently, you need complex DAX, database is slow, or you need offline access
DirectQuery Mode:
- Pros: Always shows current data, no storage limits, suitable for very large datasets
- Cons: Slower performance (queries hit database), limited DAX support, requires live connection, increases database load
- Use when: Real-time data is critical, the dataset exceeds Power BI import size limits (1 GB per dataset on Pro; far larger with Premium's large dataset storage), or you need row-level security enforced at the database level
Hybrid Approach: Use Composite Models to combine Import and DirectQuery tables in the same report, getting benefits of both modes where appropriate.
Data Transformations
This is where the magic happens! Raw data is usually messy -- wrong types, duplicate rows, columns you don't need, dates in weird formats. Power Query's transformations are like data cleaning superpowers! Rename, remove, split, merge, filter -- all with simple clicks. No coding required! Let's learn the essential moves...
Column Operations (Shaping Your Data)
Think of columns as the ingredients in your data recipe. Sometimes you need to remove extras, rename them for clarity, change their type (text vs number), or split them apart. These 4 operations are your daily bread -- you'll use them in almost every Power Query!
Remove Columns
Delete unnecessary columns to reduce model size and improve performance.
- How: Select columns → Right-click → Remove Columns
- Shortcut: Select columns → Home ribbon → Remove Columns
- Tip: Use "Remove Other Columns" to keep only selected columns
Rename Columns
Give columns clear, descriptive names for better report usability.
- How: Double-click column header → Type new name → Enter
- Alternative: Right-click column → Rename
- Tip: Avoid spaces; use underscores (e.g., Sales_Amount)
Change Data Type
Set correct data types (Text, Number, Date, etc.) for proper operations.
- How: Click data type icon (ABC/123) in column header
- Options: Text, Whole Number, Decimal, Date, Date/Time, True/False
- Critical: Always set correct types before loading to model
Split Column
Divide a column into multiple columns based on delimiters or positions.
- By Delimiter: Split "FirstName LastName" by space
- By Position: Extract first 3 characters from Product Code
- Tip: Use advanced options for custom split patterns
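Each of these four operations is a single M function call behind the scenes. A sketch, assuming a previous step named Source with illustrative column names:

```
let
    // Remove Columns
    Removed = Table.RemoveColumns(Source, {"InternalID", "Notes"}),
    // Rename Columns
    Renamed = Table.RenameColumns(Removed, {{"Amt", "Sales_Amount"}}),
    // Change Data Type
    Typed = Table.TransformColumnTypes(Renamed,
        {{"Sales_Amount", type number}, {"Order Date", type date}}),
    // Split Column by delimiter (space)
    Split = Table.SplitColumn(Typed, "Customer Name",
        Splitter.SplitTextByDelimiter(" "), {"FirstName", "LastName"})
in
    Split
```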
Filtering and Removing Rows (Cutting the Clutter)
Not all data is useful! Maybe you only need 2024 sales (not 2020-2023), or you want to remove blank rows that mess up your charts. Row operations let you keep what matters and toss what doesn't. It's like sorting through your closet -- keep the good stuff, donate the rest!
Keep Rows
- Keep Top Rows: Retain first N rows (useful for sampling)
- Keep Bottom Rows: Retain last N rows
- Keep Range: Keep rows from position X to Y
Remove Rows
- Remove Top Rows: Delete first N rows (e.g., remove header rows)
- Remove Duplicates: Keep only unique rows based on selected columns
- Remove Blank Rows: Delete rows with all empty values
- Remove Errors: Filter out rows containing error values
Filter Rows
- Basic Filter: Click column dropdown → Check/uncheck values
- Text Filters: Contains, Begins With, Ends With, Equals
- Number Filters: Greater Than, Less Than, Between, Top N
- Date Filters: In the Last, Before, After, Between dates
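In M, these row operations chain together naturally. A sketch, assuming a previous step named Source with illustrative column names:

```
let
    // Remove Top Rows: skip 2 junk header rows
    Skipped = Table.Skip(Source, 2),
    // Remove Duplicates: keep unique rows based on OrderID
    Deduped = Table.Distinct(Skipped, {"OrderID"}),
    // Remove Errors: drop rows containing error values
    NoErrors = Table.RemoveRowsWithErrors(Deduped),
    // Filter Rows: keep only 2024 orders
    Recent = Table.SelectRows(NoErrors, each Date.Year([Order Date]) = 2024)
in
    Recent
```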
Merge and Append Queries
Combining data from multiple queries is essential for creating comprehensive datasets. Power Query supports two primary methods: merging (like SQL joins) and appending (like SQL UNION).
| Operation | Purpose | SQL Equivalent | Use Case |
|---|---|---|---|
| Merge Queries | Combine queries horizontally (add columns) | JOIN | Add customer details to sales transactions |
| Append Queries | Combine queries vertically (add rows) | UNION | Combine sales from multiple regions into one table |
Merge Queries: Join Types
Merging queries requires specifying matching columns and the join type. Power Query supports all standard SQL join types, each serving different analytical needs.
Left Outer Join (Most Common)
Returns: All rows from left table + matching rows from right table
Example: Keep all sales transactions, add customer details where available (unmatched sales show NULL for customer fields)
Inner Join
Returns: Only rows with matches in both tables
Example: Sales transactions with valid customer records only (excludes orphaned or invalid transactions)
Full Outer Join
Returns: All rows from both tables, with NULLs where no match exists
Example: Find all customers AND all transactions, even if some customers never purchased or some transactions lack customer info
Left Anti Join
Returns: Rows from left table with NO match in right table
Example: Find sales transactions that don't have a matching customer record (orphaned sales)
Right Anti Join
Returns: Rows from right table with NO match in left table
Example: Find customers who have never made a purchase (no matching sales records)
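In M, every merge is a Table.NestedJoin call followed by an expand step; only the JoinKind changes. A sketch, assuming queries named Sales and Customers joined on CustomerID:

```
let
    // Other kinds: JoinKind.Inner, JoinKind.FullOuter,
    // JoinKind.LeftAnti, JoinKind.RightAnti
    Merged = Table.NestedJoin(
        Sales, {"CustomerID"},
        Customers, {"CustomerID"},
        "CustomerDetails",
        JoinKind.LeftOuter),
    // Expand the nested table into regular columns
    Expanded = Table.ExpandTableColumn(Merged, "CustomerDetails",
        {"Name", "Region"})
in
    Expanded
```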
Grouping and Aggregation
Group By operation allows you to aggregate data, similar to SQL's GROUP BY clause. You can calculate sums, counts, averages, min/max, and more for groups of rows.
Example: Sales Summary by Region and Product
1. Select columns to group by: Region, Product Category
2. Home ribbon → Group By
3. Add aggregations:
- Total Sales: Sum of SalesAmount
- Order Count: Count Rows
- Average Order: Average of SalesAmount
- Max Order: Maximum of SalesAmount
4. Result: One row per Region-Product combination with aggregated metrics
Common Aggregations:
- Sum: Total of numeric column
- Count: Number of rows in group
- Average: Mean value of numeric column
- Min/Max: Minimum or maximum value
- Count Distinct: Number of unique values
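The numbered steps above compile down to a single Table.Group call. A sketch, assuming a previous step named Source (inside each aggregation, `_` refers to the subtable for that group):

```
= Table.Group(Source, {"Region", "Product Category"}, {
    {"Total Sales", each List.Sum([SalesAmount]), type number},
    {"Order Count", each Table.RowCount(_), Int64.Type},
    {"Average Order", each List.Average([SalesAmount]), type number},
    {"Max Order", each List.Max([SalesAmount]), type number}
})
```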
Pivot and Unpivot Operations
Reshaping data between wide and long formats is crucial for analysis. Pivot converts rows to columns (wide format), while Unpivot converts columns to rows (long format).
Unpivot Columns
Purpose: Convert wide format (months as columns) to long format (one row per month)
Before (Wide):
Product | Jan | Feb | Mar
Laptop  | 100 | 150 | 120
Phone   | 200 | 180 | 210
After (Long):
Product | Month | Sales
Laptop  | Jan   | 100
Laptop  | Feb   | 150
Phone   | Jan   | 200
How: Select columns to keep → Transform ribbon → Unpivot Columns (or Unpivot Other Columns)
Pivot Column
Purpose: Convert long format (months in rows) to wide format (months as columns)
Before (Long):
Product | Month | Sales
Laptop  | Jan   | 100
Laptop  | Feb   | 150
Phone   | Jan   | 200
After (Wide):
Product | Jan | Feb
Laptop  | 100 | 150
Phone   | 200 | NULL
How: Select column to pivot → Transform ribbon → Pivot Column → Choose value column and aggregation
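Both reshaping operations are one-line M calls. A sketch, assuming a previous step named Source with the Product/Month/Sales columns shown above:

```
// Unpivot: months-as-columns → one row per Product/Month
= Table.UnpivotOtherColumns(Source, {"Product"}, "Month", "Sales")

// Pivot: months-in-rows → months-as-columns, summing Sales where
// a Product/Month combination appears more than once
= Table.Pivot(Source, List.Distinct(Source[Month]), "Month", "Sales", List.Sum)
```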
Practice Questions: Data Transformations
Test your understanding with these hands-on exercises.
Task: Explain the difference between "Remove Columns" and "Remove Other Columns" operations.
Show Solution
Answer:
- Remove Columns: Deletes the selected columns, keeps all others
- Remove Other Columns: Keeps only the selected columns, deletes all others
Use Case: When you need to keep only 5 columns out of 50, use "Remove Other Columns" after selecting the 5 you want. This is faster than selecting and removing 45 columns individually.
Tip: "Remove Other Columns" is more resilient to source changes. If new columns are added to the source, they'll be automatically excluded, maintaining your intended column set.
Task: You need to combine sales data from North, South, East, and West regions into one table. Which operation should you use and why?
Show Solution
Answer: Use Append Queries because you're combining tables with the same structure (same columns) vertically (adding rows).
Steps:
- Home ribbon → Append Queries → Append Queries as New (creates a new combined query)
- Select "Three or more tables"
- Move North, South, East, and West queries to the "Tables to append" list
- Click OK
Result: A single query containing all rows from all four regions with matching column structure.
Tip: Add a custom column before appending to identify which region each row came from, or retain the query name as a source column during append configuration.
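In M, the append is a single Table.Combine call. A sketch, assuming the four region queries exist with those names and applying the tip above to tag each row first:

```
let
    // Tag each table so rows stay traceable after the append
    N = Table.AddColumn(North, "Region", each "North"),
    S = Table.AddColumn(South, "Region", each "South"),
    E = Table.AddColumn(East, "Region", each "East"),
    W = Table.AddColumn(West, "Region", each "West"),
    // Stack all rows into one table (columns are matched by name)
    AllRegions = Table.Combine({N, S, E, W})
in
    AllRegions
```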
Task: You have a Sales table and a Products table. Find all products that have never been sold. Which join type should you use?
Show Solution
Answer: Use a Right Anti Join with Sales as the left table and Products as the right table.
Explanation:
- An Anti Join returns rows from one table that have NO match in the other table
- Right Anti Join keeps rows from the right table (Products) with no match in the left table (Sales)
- This gives you products that don't appear in any sales records
Steps:
- Start with Sales query
- Home ribbon → Merge Queries
- Select Products as the second table
- Match on ProductID in both tables
- Join Kind: Right Anti
- Result shows only products with no sales
Alternative: You could also use Left Anti Join by starting with Products and merging with Sales, which would be logically equivalent but conceptually clearer.
M Language Basics
Meet M: Power Query's secret language! Don't worry -- you can use Power Query without knowing M at all! Every button you click gets translated into M code automatically. But learning a bit of M is like learning magic spells -- it unlocks advanced powers that the buttons can't do. Think of it as optional superpowers for when you're ready!
What is M Language?
Here's the cool part: Every time you click "Remove Column" or "Filter Rows," Power Query writes a line of M code in the background! M is the actual language that does the work. It's like when you order a burger -- you just say "I'll have a cheeseburger," but the kitchen follows a detailed recipe. The buttons are your menu, M is the recipe!
Why learn M: 90% of tasks? Just use buttons! But for that special 10% -- custom calculations, dynamic column names, advanced error handling -- M lets you do things no button can. It's optional, but powerful when you need it!
M Language Basics (The Structure)
Every M query follows a simple recipe: Start with let (where you list all your steps), end with in (which step to show as the final result). It's like a cooking show: "Let me show you the steps... and HERE'S the final dish!" Each step builds on the previous one, like stacking building blocks.
// Simple M query structure
let
// Step 1: Define source
Source = Excel.Workbook(File.Contents("C:\Sales.xlsx")),
// Step 2: Navigate to sheet
SalesSheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
// Step 3: Promote headers
PromotedHeaders = Table.PromoteHeaders(SalesSheet),
// Step 4: Change data types
ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {
{"Date", type date},
{"Amount", type number},
{"Product", type text}
}),
// Step 5: Filter rows
FilteredRows = Table.SelectRows(ChangedTypes, each [Amount] > 1000)
in
FilteredRows
The let...in pattern: the let section defines variables (steps), and the in section specifies which variable to return as the query result.
Common M Functions (Your Toolbox)
M has hundreds of pre-built functions -- like tools in a toolbox! You don't need to memorize them all (nobody does!). Just know the categories: Table.* for table operations, Text.* for text cleaning, Date.* for date work, List.* for lists. Think of the dot as opening a toolbox drawer -- Text.Upper means "open the Text drawer and grab the Upper tool!"
| Category | Functions | Example Usage |
|---|---|---|
| Table Functions | Table.SelectRows, Table.RemoveColumns, Table.TransformColumnTypes | Manipulate table structure and filter data |
| Text Functions | Text.Upper, Text.Trim, Text.Split, Text.Length | Clean and transform text values |
| Date Functions | Date.Year, Date.AddDays, Date.StartOfMonth | Extract and manipulate date components |
| List Functions | List.Sum, List.Max, List.Distinct, List.Transform | Aggregate and transform lists |
| Logical Functions | if...then...else, and, or, not | Conditional logic and branching |
Creating Custom Columns with M
Need a calculation that no button can do? Custom columns are your answer! They're like Excel formulas but more powerful. Want to combine first and last name? Calculate days since an order? Add categories based on sales amount? Custom columns make it happen! Use each to say "do this for EVERY row" -- it's shorthand for "repeat this step on each row in the table."
// Add custom column: Full Name
= Table.AddColumn(ChangedTypes, "Full Name",
each [First Name] & " " & [Last Name]
)
// Add custom column: Sales Category (conditional logic)
= Table.AddColumn(FilteredRows, "Sales Category",
each if [Amount] >= 10000 then "High"
else if [Amount] >= 5000 then "Medium"
else "Low"
)
// Add custom column: Days Since Sale
= Table.AddColumn(ChangedTypes, "Days Since Sale",
each Duration.Days(DateTime.LocalNow() - [Order Date])
)
// Add custom column: Discount Amount (10% of price)
= Table.AddColumn(ChangedTypes, "Discount",
each [Price] * 0.10
)
The each keyword is shorthand for row-by-row operations: each [Column] is equivalent to (currentRow) => currentRow[Column], a function applied to every row.
Error Handling in M (Don't Let Errors Stop You!)
What happens when data has errors? Maybe someone typed "ABC" in an age field. Normally, Power Query would stop and show an error. But with try...otherwise, you can say: "TRY to convert this to a number, OTHERWISE just put 0 (or null)." It's like having a safety net -- errors don't break your query, they get handled gracefully!
Use try _ otherwise null to replace errors with blank values, or try _ otherwise 0 for default numbers. Inside each, the underscore _ refers to the current value -- shorthand that keeps your code clean!
// Replace errors with null
= Table.TransformColumns(ChangedTypes, {
{"Amount", each try _ otherwise null}
})
// Replace errors with default value
= Table.TransformColumns(ChangedTypes, {
{"Price", each try _ otherwise 0}
})
// Remove rows with errors in a specific column
// (the built-in helper is simpler and safer than a try-based filter,
// which would also drop rows with genuine nulls)
= Table.RemoveRowsWithErrors(ChangedTypes, {"Amount"})
// Try...otherwise pattern for complex operations
= try ComplexOperation otherwise DefaultValue
Practice Questions: M Language Basics
Test your understanding with these hands-on exercises.
Task: Describe the purpose of the "let...in" structure in M language.
Show Solution
Answer: The let...in structure is the foundation of every M query:
- let section: Defines variables (steps) that represent transformations
- in section: Specifies which variable to return as the final query output
Example:
let
Source = Excel.Workbook(...), // Variable 1
FilteredRows = Table.SelectRows(...) // Variable 2
in
FilteredRows // Return this variable
Key Point: Each Applied Step you see in the GUI corresponds to a variable in the let section. The last step is always the variable referenced in the in section.
Task: Write M code to create a custom column that categorizes sales as "Q1", "Q2", "Q3", or "Q4" based on the month.
Show Solution
Answer:
= Table.AddColumn(ChangedTypes, "Quarter",
each
let month = Date.Month([Order Date])
in
if month <= 3 then "Q1"
else if month <= 6 then "Q2"
else if month <= 9 then "Q3"
else "Q4"
)
A more concise alternative using Number.RoundUp:
= Table.AddColumn(ChangedTypes, "Quarter",
each "Q" & Text.From(Number.RoundUp(Date.Month([Order Date])/3))
)
Explanation: The second approach divides the month (1-12) by 3 and rounds up, giving quarters 1-4, then concatenates "Q" prefix. This is more concise but less explicit than the if-then-else chain.
Advanced Query Techniques
Want your queries to run at lightning speed? These advanced techniques will take you from beginner to power user! Learn how to make databases do the heavy lifting (query folding), create reusable functions, use parameters for flexibility, and optimize performance like a pro. Ready to level up?
Query Folding Explained (Turbo Mode Activated!)
Here's a secret: Power Query can be super smart or super slow -- it depends on query folding! When folding works, Power Query tells your database "Hey, do this filtering for me!" The database (which is built for speed) does the work, then sends back only the results. Without folding, Power Query downloads EVERYTHING first, then does the work locally (much slower). It's like asking a chef to pre-cut vegetables vs. you chopping them yourself at home!
What is Query Folding?
Imagine ordering from a restaurant: With Folding = You say "bring me only vegetarian pizzas" and the kitchen only makes those. Without Folding = The kitchen makes ALL pizzas, delivers everything to your table, and THEN you pick out the vegetarian ones (wasting time and space!). Query folding means Power Query sends smart requests to your database: "Filter this, sort that" -- so the database does the hard work and returns only what you need.
Real Impact: You have 1 million rows in SQL Server but only need 1,000 after filtering. With folding: SQL returns 1,000 rows (instant!). Without folding: Download all 1 million, then filter locally (slow!). That's 1000x faster!
Operations That Fold (Fast Lane!)
- Filtering rows (WHERE clause)
- Removing columns (SELECT specific columns)
- Renaming columns
- Changing data types (CAST)
- Sorting (ORDER BY)
- Grouping and aggregation (GROUP BY)
- Merging queries (JOIN)
- Basic date/time operations
Operations That Break Folding (Slow Lane)
- Adding custom columns with complex M
- Merging queries from different sources
- Unpivoting columns
- Using Table.Buffer()
- Text operations (split, extract, etc.)
- Conditional column logic
- Replacing errors/values
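Step order matters: keep the foldable steps first so the database does the heavy lifting, and push folding-breakers to the end. A sketch against an assumed SQL source (server, database, and column names are illustrative):

```
let
    Source = Sql.Database("SERVER01", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // These steps fold: SQL Server receives WHERE and SELECT clauses
    Filtered = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)),
    Trimmed = Table.SelectColumns(Filtered, {"OrderID", "OrderDate", "Amount"}),
    // Conditional logic in a custom column may break folding, so keep it last
    Categorized = Table.AddColumn(Trimmed, "Size",
        each if [Amount] >= 10000 then "Large" else "Small")
in
    Categorized
```

To check whether a step still folds, right-click it in Applied Steps and look for "View Native Query" -- if it's greyed out, folding has stopped at that point.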
Parameters for Dynamic Queries
Parameters make your queries flexible and reusable. Instead of hard-coding values like file paths or date ranges, you create a parameter that can be changed without editing the query. This is essential for monthly reports, switching between test/production environments, or creating user-configurable dashboards.
Step 1: Create a Parameter
First, define a parameter that will hold the dynamic value. In this example, we'll create a parameter for a file path so you can easily switch between different Excel files.
1. Home ribbon → Manage Parameters → New Parameter
2. Name: FilePath
3. Type: Text
4. Current Value: C:\Data\Sales.xlsx
5. Click OK
Step 2: Use the Parameter in Your Query
Once created, reference the parameter in your M code by its name. The parameter value replaces any hard-coded values, making the query dynamic.
// Before (hard-coded):
Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"))
// After (using parameter):
Source = Excel.Workbook(File.Contents(FilePath))
Step 3: Change Parameter Values
When you need to use a different file, simply update the parameter value. All queries using that parameter automatically use the new value on refresh.
1. Home ribbon → Manage Parameters
2. Select FilePath parameter
3. Change "Current Value" to: C:\Data\SalesQ2.xlsx
4. Click OK and refresh your queries
Custom Functions (Reusable Recipes)
Do you clean phone numbers the same way in 5 different queries? Stop repeating yourself! Custom functions are like saving a recipe -- write the cleaning steps once, then use that function anywhere. It's copy-paste on steroids: change the recipe in one place, and all dishes (queries) get updated automatically! Think of it as teaching Power Query a new trick that it can perform on command.
// Create a custom function to clean phone numbers
let
    CleanPhoneNumber = (phoneNumber as text) as text =>
        let
            // Remove all non-numeric characters
            cleaned = Text.Select(phoneNumber, {"0".."9"}),
            // Format as (XXX) XXX-XXXX
            formatted =
                if Text.Length(cleaned) = 10 then
                    "(" & Text.Middle(cleaned, 0, 3) & ") " &
                    Text.Middle(cleaned, 3, 3) & "-" &
                    Text.Middle(cleaned, 6, 4)
                else
                    cleaned
        in
            formatted
in
    CleanPhoneNumber
// Use the function:
= Table.TransformColumns(Source, {
    {"Phone", each CleanPhoneNumber(_)}
})
Performance Optimization Tips
Slow queries driving you crazy? These 6 proven strategies will transform sluggish data refreshes into lightning-fast operations. Performance optimization is about working smarter, not harder -- small changes in query design can yield massive speed improvements.
Filter Early
The #1 performance rule! Apply filters as early as possible in your query. Reducing from 1 million rows to 10,000 at the start means all subsequent transformations work with 99% less data.
Remove Columns Early
Less is more! Delete unnecessary columns immediately after import. Fewer columns mean less memory usage and faster processing. Use "Remove Other Columns" to keep only what you need.
Maximize Query Folding
Push work to the source! Arrange transformation steps to maintain query folding as long as possible. Move folding-breakers like unpivot or custom columns to the end of your sequence.
Disable Load for Helpers
Keep models clean! Reference queries used by other queries? Right-click and uncheck "Enable Load" so they work behind the scenes without cluttering your data model.
Avoid Table.Buffer()
Memory hog alert! Table.Buffer() loads entire tables into RAM and kills query folding. Only use when absolutely necessary (very rare). In 99% of cases, avoid it entirely.
Incremental Refresh
For massive datasets! Update only new or changed rows instead of reloading millions of historical records every refresh. Power BI Premium feature that's essential for large-scale implementations.
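Three of the tips above fit in a single hedged M sketch (server, table, and column names are illustrative):

```
let
    Source = Sql.Database("myserver", "SalesDB"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // DON'T wrap Sales in Table.Buffer() here -- it would pull the whole
    // table into RAM and stop every later step from folding.
    // Filter early: the predicate folds to the server
    Filtered = Table.SelectRows(Sales, each [Year] = 2024),
    // Remove columns early: "Remove Other Columns" generates this step
    Trimmed = Table.SelectColumns(Filtered, {"ProductID", "Amount", "SaleDate"})
in
    Trimmed
```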
Practice Questions: Advanced Query Techniques
Test your understanding with these hands-on exercises.
Task: Explain how to check if query folding is occurring in your query.
Show Solution
Answer: Right-click any step in the Applied Steps pane and look for the "View Native Query" option:
- If available (not greyed out): Query folding is working up to that step. Click it to see the SQL or native query being sent to the data source.
- If greyed out: Query folding has stopped, either at this step or a previous one. Power Query is processing transformations locally instead of at the source.
Tip: Check after each transformation to identify which step breaks folding. Consider reordering steps or using alternative approaches to maintain folding for better performance.
Task: You have 5 queries that all connect to the same SQL table with identical transformation steps. How can you make this more maintainable?
Show Solution
Answer: Create a custom function that encapsulates the transformation logic:
- Create the base query with all transformations
- Right-click the query → Create Function
- Add parameters (e.g., table name, filter criteria)
- Call the function from each of your 5 queries with different parameters
Benefits:
- Update logic in one place, automatically applies to all queries
- Reduces code duplication and maintenance burden
- Ensures consistent transformations across queries
- Makes intent clearer (named function vs repeated steps)
Alternative: Create a base query with common steps, then reference it 5 times and add query-specific transformations to each reference.
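As a hedged sketch, the function generated by "Create Function" might look like this once you add a table-name parameter (all names here are illustrative, not a real schema):

```
// Shared function "GetCleanTable" -- written once, invoked by all 5 queries
(tableName as text) as table =>
let
    Source = Sql.Database("myserver", "SalesDB"),
    Target = Source{[Schema = "dbo", Item = tableName]}[Data],
    // The shared transformation steps live here, in one place
    Filtered = Table.SelectRows(Target, each [IsActive] = true),
    Typed = Table.TransformColumnTypes(Filtered, {{"Amount", Currency.Type}})
in
    Typed

// Each of the 5 queries then becomes a one-liner:
// = GetCleanTable("Sales2024")
```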
Task: Your query imports 10 million rows from SQL Server and takes 30 minutes to refresh. The first 10 steps fold, but step 11 (adding a custom column) breaks folding. How would you optimize this?
Show Solution
Solution Strategy:
- Analyze the custom column logic: Can it be rewritten as a SQL calculation?
- Push calculation to source: If possible, create a database view or stored procedure that includes the calculation, maintaining full query folding
- Filter before custom column: If you only need a subset of rows, apply filters in steps 1-10 (which fold) to reduce dataset before the custom column calculation
- Simplify custom column: Break complex logic into multiple simpler steps that might fold individually
- Use a DAX calculated column instead: If the calculation doesn't need to happen in Power Query, move it to a DAX calculated column in the data model. It's still computed at refresh, but by the VertiPaq engine after load, so query folding in Power Query stays intact
Example: If custom column is "SalesCategory = IF Amount > 1000 THEN 'High' ELSE 'Low'", you could add this logic to the SQL view:
CREATE VIEW vw_Sales AS SELECT *, CASE WHEN Amount > 1000 THEN 'High' ELSE 'Low' END as SalesCategory FROM Sales
Result: Query folding preserved, calculation runs on SQL Server (optimized engine), refresh time potentially reduced from 30 minutes to seconds.
Key Takeaways
Automated ETL Workflows
Power Query automates the Extract-Transform-Load process with repeatable, transparent steps that update automatically when data refreshes. No more manual Excel manipulations or error-prone copy-paste workflows.
Applied Steps = Transparency
Every transformation is recorded as an Applied Step with viewable M code. This creates an audit trail, enables easy debugging, and allows you to modify or reorder transformations at any time.
Universal Data Connectivity
With 100+ connectors spanning files, databases, cloud services, and APIs, Power Query can connect to virtually any data source. Combine disparate sources seamlessly with merge and append operations.
Merge & Append Fundamentals
Merge queries combine tables horizontally (like SQL joins) with 5 join types. Append queries stack tables vertically for consolidation. Mastering these operations is essential for relational data integration.
M Language for Advanced Control
While the GUI covers 90% of scenarios, M language (Power Query Formula Language) unlocks advanced transformations, custom functions, dynamic logic, and error handling that point-and-click can't achieve.
Query Folding Optimization
Query folding pushes transformations to the data source (e.g., SQL Server) instead of processing locally. Check "View Native Query" to verify folding and ensure optimal performance for large datasets.
Practice Questions: Optimizing Slow Refresh & Restoring Query Folding
Master performance optimization techniques for large datasets with these advanced scenarios.
Scenario: Your query on 5 million rows from SQL Server has these steps:
1. Source = SQL Server table
2. Filter rows where Year = 2024
3. Remove unnecessary columns
4. Add Custom Column = [Amount] * 1.1
5. Rename columns
6. Change data types
Question: Which step breaks query folding? Why?
Show Solution
Answer: Step 4 - Add Custom Column
Why it breaks folding:
- Steps 1-3 fold: Source, filter, and column removal translate directly to SQL (FROM, WHERE, SELECT)
- Step 4 breaks: Custom columns use M logic that SQL Server can't evaluate, so Power Query downloads the rows and computes the column locally
- Steps 5-6 don't fold back: Once folding breaks, all subsequent steps also process locally
Impact: Here the damage is limited because the year filter folds before the break, so only the 2024 rows are downloaded. But if the custom column came before the filter, Power Query would download all 5 million rows, calculate locally, then filter -- 100x+ slower. Always keep folding-breakers as late as possible!
Example: If you need Amount * 1.1 calculation, create a SQL view instead:
CREATE VIEW vw_SalesAdjusted AS
SELECT
SalesID,
ProductID,
Amount,
CAST(Amount AS FLOAT) * 1.1 as AdjustedAmount,
YEAR(SaleDate) as SalesYear
FROM Sales
WHERE YEAR(SaleDate) = 2024
Result: Query folding maintained throughout! Calculation happens on SQL Server (the optimized engine). Power Query downloads only 2024 data with adjusted amounts pre-calculated. Refresh: 100x faster!
Solution: Move custom column to the end OR use a SQL query/view to pre-calculate on the server side.
Problem: Your refresh takes 45 minutes for a 50 million row dataset from SQL Server. Current query:
Step 1: Source = SQL Server table (50M rows)
Step 2: Filter = Filtered to current year only (1.2M rows)
Step 3: Custom Column = Calculate profit margin
Step 4: Filter = Remove rows where margin < 5%
Step 5: Unpivot monthly columns
Step 6: Group By = Aggregate by category
Question: How would you refactor this query to optimize refresh time and maximize query folding?
Show Solution
Optimized Approach (folding maximized, breakers pushed last):
Step 1: Source = SQL Server (SELECT only needed columns) ✓ Folds
Step 2: Filter = WHERE Year = 2024 ✓ Folds (50M → 1.2M rows at the source)
Step 3: Remove unnecessary columns ✓ Folds
Step 4: Change data types ✓ Folds
Step 5: Custom Column = profit margin ✗ Breaks folding (but now processes only 1.2M rows)
Step 6: Filter = margin ≥ 5% (processes 1.2M locally, filters to 300K)
Step 7: Unpivot ✗ Local, on 300K rows only
Step 8: Group By ✗ Local, on the already-reduced set
Key improvements:
- Push filtering to SQL: Make sure the year filter folds (or bake `WHERE Year = 2024` into the source query) so SQL Server returns 1.2M rows instead of 50M
- Keep foldable steps before the custom column: Everything that can fold should run first; the margin filter can't fold because it depends on the custom column
- Minimize folding breakers: Keep custom columns and unpivot at the end, where they process the fewest rows
- Use SQL functions: Better still, calculate the margin in a view or source query, so both the calculation and the margin filter run on the server
Expected improvement: From 45 minutes to 2-3 minutes by:
- Reducing downloaded rows from 50M → 1.2M (early filter)
- Further reducing to 300K before custom column (double filtering)
- Having server handle numeric calculations (SQL is faster)
Example: If your custom column calculates profit margin = (Amount - Cost) / Cost, push this to SQL:
CREATE VIEW vw_Sales_Optimized AS
SELECT
SalesID,
ProductID,
Amount,
Cost,
CASE
WHEN YEAR(SaleDate) = 2024 THEN 1
ELSE 0
END as IsCurrentYear,
CAST((Amount - Cost) AS FLOAT) / NULLIF(Cost, 0) as ProfitMargin,
CASE
WHEN (Amount - Cost) / NULLIF(Cost, 0) > 0.05 THEN 'Pass'
ELSE 'Filter Out'
END as MarginStatus
FROM Sales
WHERE YEAR(SaleDate) >= 2023
Result: Query folding preserved, calculation runs on SQL Server (optimized engine), and refresh time drops from 45 minutes to a few minutes. Power Query now only downloads 300K pre-filtered rows with margins already calculated!
Scenario: You have a SQL Server query with 8 transformation steps. When you right-click Step 3 (Filter rows), "View Native Query" is available. But when you right-click Step 5 (Added Custom Column), it's greyed out.
Question: What does this tell you? How would you fix it?
Show Solution
What it tells you:
- Steps 1-3 fold: The native SQL query is being constructed up to the filter
- Step 4 or 5 breaks folding: One of those steps contains an operation that can't be translated to SQL
- All subsequent steps don't fold: Once folding breaks, steps 6, 7, 8 also process locally
How to diagnose the exact breaking point:
- Right-click Step 4 → Check if "View Native Query" is available
- If available: Step 4 still folds, so Step 5 (custom column) broke it
- If greyed out: Step 4 broke it (might be a text operation, unpivot, or complex transformation)
Solutions to restore folding:
- Option 1 - Reorder: Move the folding-breaking step (custom column) to the very end, after all folding operations
- Option 2 - SQL approach: Calculate in your SQL source query instead of in Power Query
- Option 3 - Use separate query: Create a helper query that does the foldable steps, then reference it for the custom column
- Option 4 - Avoid Table.Buffer(): If used, remove it immediately (it completely kills folding)
Performance impact: Restoring folding by reordering might reduce refresh time from 30 minutes to 5 minutes for large datasets!
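Option 3 can be sketched as two queries — the foldable steps in a base query, the breaker isolated in a query that references it (a referenced query's folded steps still fold; names are illustrative):

```
// Query "SalesBase": foldable steps only
let
    Source = Sql.Database("myserver", "SalesDB"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    Filtered = Table.SelectRows(Sales, each [Year] = 2024)
in
    Filtered

// Query "SalesWithMargin": references SalesBase, then adds the breaker
let
    Source = SalesBase,
    WithMargin = Table.AddColumn(Source, "Margin",
        each ([Amount] - [Cost]) / [Cost])
in
    WithMargin
```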
Complex scenario: You're building a sales analytics query that:
- Merges Sales table (100M rows) with Products table (50K rows) on ProductID
- Merges with Categories table (200 rows) on CategoryID
- Filters to current year only
- Calculates revenue per unit sold
- Unpivots monthly sales columns
- Groups by category and month
The refresh takes 2 hours. How would you optimize this?
Show Solution
Optimization Strategy:
1. Push filtering to the source (Game changer!):
// DON'T do this (unfiltered native query, then a local filter -- native queries don't fold):
Source = Sql.Database("server", "database", [Query = "SELECT * FROM Sales"]),
Filtered = Table.SelectRows(Source, each [Year] = 2024)
// DO this instead (filter inside the native query):
Source = Sql.Database("server", "database",
    [Query = "SELECT * FROM Sales WHERE Year = 2024"])
Impact: Download 100M → 10M rows (90% reduction!)
2. Merge with smallest tables first:
Step 1: Sales (10M filtered rows) - source
Step 2: Merge with Categories (200 rows) → result: 10M rows
Step 3: Merge with Products (50K rows) → result: 10M rows
(Both merges work on reasonable dataset sizes)
3. Keep foldable operations early:
Step 1: SQL Source (filtered) ✓ Folds to SQL
Step 2: Remove unnecessary columns ✓ Folds
Step 3: Merge with Categories (can fold if using merge-as-join)
Step 4: Merge with Products (might break folding)
Step 5: Change data types ✓ Still folds (probably)
Step 6: Custom column (revenue calc) ✗ Breaks folding
4. Group/Aggregate on SQL if possible:
-- Consider pre-aggregating on SQL Server instead of in Power Query:
SELECT
CategoryID,
Month,
SUM(Quantity) as TotalQty,
SUM(Amount) as TotalAmount,
SUM(Amount) / NULLIF(SUM(Quantity), 0) as AvgPrice
FROM Sales
WHERE Year = 2024
GROUP BY CategoryID, Month
Example: Create a pre-aggregated view for Power Query to consume:
CREATE VIEW vw_SalesSummary AS
SELECT
s.CategoryID,
c.CategoryName,
YEAR(s.SaleDate) as SalesYear,
MONTH(s.SaleDate) as SalesMonth,
COUNT(DISTINCT s.OrderID) as OrderCount,
SUM(s.Quantity) as TotalQuantity,
SUM(s.Amount) as TotalRevenue,
SUM(s.Amount - s.Cost) as TotalProfit,
CAST(SUM(s.Amount - s.Cost) AS FLOAT) / NULLIF(SUM(s.Amount), 0) as ProfitMargin,
CAST(SUM(s.Amount) AS FLOAT) / NULLIF(COUNT(DISTINCT s.OrderID), 0) as AvgOrderValue
FROM Sales s
INNER JOIN Categories c ON s.CategoryID = c.CategoryID
WHERE YEAR(s.SaleDate) >= 2023
GROUP BY s.CategoryID, c.CategoryName, YEAR(s.SaleDate), MONTH(s.SaleDate)
Result: Instead of Power Query downloading 100M rows and performing unpivot + grouping, SQL Server returns pre-aggregated data (potentially just 12-24 rows per category). Power Query only needs to load, rename columns, and format. Refresh time: 2 hours → 5-10 minutes!
Expected improvements:
| Optimization | Impact | Effort |
|---|---|---|
| Push year filter to SQL | 100M → 10M rows (10x faster) | Low |
| Merge before unpivot | 30% faster merging | Low |
| Pre-aggregate on SQL | 10M → 1K rows final (100x faster) | Medium |
| Total optimization | 2 hours → 5-10 minutes (10-20x faster!) | Medium |
Pro tip: Ask yourself: "Can SQL Server do this?" If yes, let it! Databases are 100x faster than Power Query at filtering, joining, and aggregating.
Challenge: You need to merge data from two different sources with different folding capabilities:
- Source A: SQL Server table (10M rows) - supports query folding
- Source B: Excel file (5K rows) - does NOT support query folding
- You need to merge A and B together
Question: How would you structure this query to maximize performance?
Show Solution
Answer: Preserve SQL folding as long as possible
DO THIS (Optimal):
Step 1: SQL Source = Query with filters applied ✓ Folds
Step 2: Filter = WHERE conditions ✓ Still folds
Step 3: Remove columns ✓ Still folds
Step 4: Excel Source = Load the 5K row file (separate query)
Step 5: Merge SQL (after folding optimizations) with Excel
✓ Folding breaks here (mixing sources)
✓ But you've minimized SQL data FIRST (e.g., 1M rows instead of 10M)
DON'T DO THIS (Inefficient):
Step 1: SQL Source = Get all 10M rows (no filters)
Step 2: Excel = Load 5K rows
Step 3: Merge = Download all 10M + process locally ✗ SLOW
Step 4: Then filter ✗ Way too late
Why the difference matters:
- Optimal: SQL filters 10M → 1M rows, so only 1M rows are transferred and merged locally with the 5K lookup
- Inefficient: All 10M rows are transferred and merged locally -- roughly 10x the data to download, hold in memory, and scan
Key principle: Maximize folding with queryable sources BEFORE merging with non-foldable sources
Advanced optimization:
- Create separate queries for each source (SQL and Excel)
- Apply heavy filtering to the SQL query alone
- Only after optimization, merge the two queries
- Do additional transformations (custom columns, unpivot) AFTER merge
Example: Create an optimized SQL view that does the heavy lifting:
CREATE VIEW vw_Orders_Optimized AS
SELECT
o.OrderID,
o.CustomerID,
o.Amount,
o.SaleDate,
CASE
WHEN o.Amount > 5000 THEN 'High Value'
WHEN o.Amount > 1000 THEN 'Medium Value'
ELSE 'Standard'
END as ValueCategory,
YEAR(o.SaleDate) as OrderYear
FROM Orders o
WHERE YEAR(o.SaleDate) = 2024
AND o.Status = 'Active'
AND o.Amount > 100
Then in Power Query, use this optimized view:
// Query 1: OptimizedOrders (pre-filtered from SQL view)
let
    Source = Sql.Database("server", "db",
        [Query = "SELECT * FROM vw_Orders_Optimized"]),
    RemovedColumns = Table.RemoveColumns(Source, {"Debug"})
in
    RemovedColumns
// Query 2: CustomerLookup (from Excel -- navigate to the sheet and promote headers)
let
    Source = Excel.Workbook(File.Contents("C:\Lookups\Customers.xlsx")),
    // Assumes the workbook has a sheet named "Customers"
    CustomersSheet = Source{[Item = "Customers", Kind = "Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(CustomersSheet)
in
    PromotedHeaders
// Query 3: MergedData (combine after optimization)
let
    Merged = Table.NestedJoin(OptimizedOrders, "CustomerID", CustomerLookup, "ID", "Lookup"),
    ExpandLookup = Table.ExpandTableColumn(Merged, "Lookup", {"Name"}, {"CustomerName"}),
    CustomCalc = Table.AddColumn(ExpandLookup, "Profit", each [Amount] * 0.3)
in
    CustomCalc
Result: SQL handles filtering and calculations (optimized engine). Power Query only downloads 1M pre-filtered rows + 5K lookup = 1M total operations instead of 10M. Query runs in minutes instead of hours!
Scenario: Your report refresh takes 45 minutes. You don't know which query is slow. How would you systematically identify and fix the bottleneck?
Question: What's your testing methodology?
Show Solution
Systematic Bottleneck Identification Process:
Phase 1: Isolate the slow query
- In Power BI Desktop, go to File → Options → Performance Analyzer
- Enable it and refresh your workbook
- The Performance Analyzer shows which queries are slowest
- Focus on the query(ies) taking >5 minutes
Phase 2: Check query folding status
In Power Query Editor:
1. Open the slow query
2. Go through each Applied Step
3. Right-click each step → "View Native Query"
4. Note where it becomes greyed out
5. This is where folding breaks
Phase 3: Quantify the impact
Test by temporarily removing folding-breaking steps:
1. Delete the steps after folding breaks (you can re-add them afterwards)
2. Refresh and note the time
3. Re-add steps one by one
4. Identify which step causes the biggest slowdown
5. Watch the number of rows downloaded -- it's the clearest indicator of folding success
Phase 4: Apply optimization strategies
| Problem Identified | Solution | Expected Improvement |
|---|---|---|
| No filters before custom column | Add filter before custom column step | 50-70% faster |
| Merging large table first | Reorder to merge small table first | 20-40% faster |
| Downloading all rows unnecessarily | Add WHERE clause in SQL source | 50-90% faster (biggest gain!) |
| Using Table.Buffer() | Remove it | 30-60% faster |
| Complex M calculations | Pre-calculate on SQL server | 40-80% faster |
Phase 5: Validate improvement
- Make one optimization change
- Refresh and measure time (note in Performance Analyzer)
- If improvement >10%, keep it
- Apply next optimization
- Repeat until acceptable performance (<5 min preferred)
Real-world example:
Starting: 45 minutes (all 50M rows downloaded)
After Step 1 (add WHERE to SQL): 15 minutes (67% faster; rows downloaded cut ~90%)
After Step 2 (reorder merges): 12 minutes (20% reduction)
After Step 3 (move custom column): 8 minutes (33% reduction)
Final: 8 minutes total (5.6x improvement!)
ROI: 30-40 minutes saved per refresh = 2.5+ hours per week
Key takeaway: Most slow queries have a single root cause (usually downloading all rows instead of filtering). Find it with Performance Analyzer, fix it with SQL filtering, and watch your refresh time collapse!