we should load a replica of the identical information into Energy Question. Energy Question provides two methods to get the identical information twice: Duplicate and Reference. Let’s take a look at the distinction between these two options and when to make use of one over the opposite.
Introduction
I’ll have to load the identical information twice into Energy Question and subsequently into Energy BI.
This may occur after I should break up the information columns or carry out different transformations on the information, or after I have to extract information from a desk in two other ways.
Energy Question provides us two options to perform this:
- Duplicate:
This duplicates the M-Code for the desk and creates a brand new desk. - Reference:
This takes the output of a desk and creates a brand new desk. All adjustments made to the supply desk are additionally seen within the referencing desk.
You would possibly argue that after I use Reference, the information is learn as soon as from the supply, as I take the output of 1 desk and reuse it for a special output.
That is what this text is about: Is that this true or false?
Getting ready the Instruments
I exploit SQL Server as the information supply and SQL Profiler to research what occurs within the database.
SQL Profiler is a Instrument that may intercept all of the visitors on a SQL Server occasion.
Luckily, SQL Profiler is a part of SQL Server Management Studio (SSMS) and is free to make use of.
You may learn this piece on Medium for a extra detailed description of SQL Server Profiler: Mastering SQL Server Profiler: A Step-by-Step Guide to Unlocking Database Insights
One other approach to analyze the conduct of those two Options is Energy Question Diagnostics.
I wrote this piece on Medium about Energy Question Diagnostics: Analyzing Power Query with Load Traces
I invite you to learn it to search out out extra about this device.
However let’s return to SQL Server Profiler and how one can begin it and put together it for our particular state of affairs.
I can begin SQL profile from the Begin Menu or immediately from SSMS:
After beginning it, I have to choose the connection to my native SQL Server Occasion:

Subsequent, I arrange the Hint.
- I give it a Title and choose the TSQL template to trace the queries coming from Energy Question.
- I activate the “Save to File” possibility and choose the folder for the Hint file.
I can later open this hint file in Profiler and look into it in additional element if I want. - I swap to the second web page, “Occasion Choice”
- I activate the 2 choices “Present all occasions”.
- Within the record of all Occasions, I choose SQL:StmtStarting and SQL:StmtCompleted to get the SQL code from the queries.
- I deselect all occasions, besides the three beneath SQL.
- I deselect most columns besides these to trace the Question Textual content, Begin and Finish Time, Length, and different statistics.
That is the way it takes care of the setup (With the choice “Present all occasions” deactivated):

Lastly, I arrange a filter on my supply database to hint solely the visitors on that database:

With out this filter, I’ll get the visitors on all databases. This shall be overwhelming for a manufacturing occasion as there shall be a variety of visitors from different purposes and customers. I’d even add a filter to limit the hint to observe just for visitors from my NTUserName (My Home windows Consumer ID) to exclude all different visitors on the database.
Now I click on on Run to start out the Hint.
Importing the Knowledge into Energy Question
I exploit a View within the database named FactOnlineSales_withCustomer as my supply.
I import this View into Energy Question with out every other transformation steps. This may trigger Energy Question to get the information with a easy SQL Question from the database.
I can discover this question with out problem within the Hint Log.
Create a Duplicate and verify what occurs.
After importing the information into Energy Question, I create a Duplicate of the imported desk and cargo the information into Energy BI:

As anticipated, I see the identical Question executed twice in SQL Profiler:

You may see that the information has been retrieved twice with the identical variety of rows (The final two strains within the hint.
I anticipated this to occur, as Duplicate copies the M-Code to create a brand new desk.
One other key column is SPID. That is the inner session ID on the SQL Server occasion. Two completely different SPIDs point out that Energy Question began to separate connections to get the information twice.
This column shall be necessary when analyzing the visitors from a Referencing desk.
Create a Reference and verify what occurs.
Now, I attempt the Reference function.
I first delete the desk “FactOnlineSales_WithCustomer_Duplicate” and create a Reference from the unique “FactOnlineSales_WithCustomer” desk:

In SQL Profiler, I can clear the view to see solely new entries by clicking on the eraser button to clear the hint (This is not going to delete any information from the saved Hint file):

After refreshing the Knowledge from Energy BI, I get this lead to SQL Profiler:

Astonishingly, the information was learn twice within the database.
I can see that there are positively two connections, because the column SPID (Session ID) has two completely different numbers for the 2 SQL:StmtCompleted entries.
Because of this, from the load visitors perspective, there isn’t any distinction between duplicating and referencing a desk.
However when each trigger the identical visitors on the supply, why ought to I exploit Duplicate over Reference in Energy Question?
When utilizing Reference and when Duplicate
A while in the past, I wrote an article about changing a flat desk to a Star Schema with Energy Question: Converting a Flat Table to a Good Data Model in Power Query
On this article, I described that some operations usually are not attainable when creating a brand new desk by referencing an current desk.
For instance, Energy Question doesn’t permit merging a referencing desk with the unique desk due to a round reference.
In such a case, I have to duplicate the unique desk.
It is because a referencing desk is at all times based mostly on the final step of the referenced desk.
That is the important thing distinction between “Duplicate” and “Reference” in Energy Question:
- Duplicate is a wholly new load with out dependency on the unique desk. Modifications to the unique desk don’t have an effect on the duplicated desk.
- A Referencing desk relies on the end result of the referenced desk. Consequently, adjustments utilized to the referenced desk are robotically utilized to the referencing desk.
To be exact, the adjustments usually are not utilized, however the enter desk adjustments due to the change within the referenced desk.
However when it is advisable extract a subset from the unique desk with out altering the unique desk, Reference is the way in which to go, particularly when it’s very important to at all times get the output from the referencing desk.
In order for you a desk from the identical supply however don’t need adjustments to the unique desk utilized to the brand new desk, then you need to duplicate the unique desk.
Bear in mind that Duplicate means a duplication of the Load logic. Because of this whenever you apply a change to the unique desk, you would possibly want to repeat the logic to the duplicated desk as properly.
Potential for Conflicts throughout load
One other potential challenge is that load conflicts can happen when loading information from some sources. Excel is one among these sources that may trigger issues.
The supply of the issue is that Energy Question tries to load the information in parallel. Some sources usually are not capable of deal with parallel connections.
In such a case, you need to change a parameter to keep away from parallel loading:

The default worth is 4.
If issues happen, you would possibly have to set both a decrease Customized worth or set it to “One (disable parallel loading)” to keep away from any points altogether.
Conclusion
In Energy Question, there isn’t any distinction between “Duplicate” and “Reference” concerning load efficiency or community visitors.
Each load the information independently from the supply with a separate connection.
Subsequently, I debunked the parable that “Reference” can enhance load efficiency.
Nonetheless, realizing these two options’ variations is important, as they provide distinct potentialities when loading and reworking information.
Anyway, when loading information from a relational database, I’d create two queries or two views for the 2 tables, as an alternative of offloading any transformation to Energy Question.
In keeping with Roche’s maxim of Data Transformation:
Knowledge must be remodeled as far upstream as attainable, and as far downstream as obligatory.
However when loading textual content, Excel recordsdata, or different sources to which I can’t ship a question to get the information in the way in which I want it, I have to use both “Duplicate” or “Reference” in response to the required final result.
References
Like in my earlier articles, I exploit the Contoso pattern dataset. You may obtain the ContosoRetailDW Dataset at no cost from Microsoft here.
The Contoso Knowledge might be freely used beneath the MIT License, as described in this document.
I modified the dataset to shift the information to up to date dates.