Originally, Excel.Workbook used to read the entire workbook data to accurately assign types to the columns. This was EXTREMELY slow on large workbooks.
Having this in mind, we added this flag to delay the behavior. When set to “true”, we don’t infer any column types. Instead, the UI uses its normal inference algorithm to generate a Table.TransformColumnTypes step with the inferred types and if it inferred wrongly, the user can update the transformation explicitly.
I also posted a question on the Power Query MSDN forum and got some more details from Colin Banfield, one of the many M experts who hang out there, as well as doing some investigations of my own. This post summarises what I found out.
First of all, a simple demonstration of what it does. Consider the following Excel table:
Connecting to this table in the December 2018 release of Power BI Desktop generates the following M query:
Notice that the delayTypes option is now automatically set to true and that there is a “Changed Type” step. The output of this query, as seen in the Power Query Editor, is what you would expect:
Compare this with the output of the following query which has delayTypes set to false and has no “Changed Type” step:
There are two things to notice:
- With delayTypes set to false no “Changed Type” step is necessary, the data types on the columns are set by Excel.Workbook(). With delayTypes set to true all the columns returned by Excel.Workbook() have their data type set to Any and a subsequent “Changed Type” step is necessary to set data types to something useful.
- With delayTypes set to true the IntegerColumn and the DecimalColumn columns have different data types assigned to them by the “ChangedType” step: IntegerColumn becomes Int64.Type whereas DecimalColumn becomes type number. With delayTypes set to false both columns are set to number. So it looks like Excel.Workbook with delayTypes set to false will only convert columns to primitive data types.
The second question is: if it is, primarily, a performance optimisation how much of an impact does it have? I built four queries that read all the data from an 80MB xlsx file to test the performance of all of the variations of delayTypes true/false and having a “Changed Type” step present or not (just in case there was an overhead to having a “Changed Type” step). Here are the results – the main timings are in minutes and seconds, and in brackets is the length of the “Evaluating” phase of query execution:
|no “Changed Type” step||2:08
|“Changed Type” present||2:19
I wouldn’t read too much into these exact timings because in my experience timings for the same query can fluctuate quite a lot, but one thing is clear: setting delayTypes to true results in much better performance. Indeed with delayTypes set to false nearly half the time was spent in the “Evaluating” phase whereas with delayTypes set to true there was no “Evaluating” phase at all. As a result I can’t think of a reason not to set the delayTypes option to true.
It’s worth noting that delayTypes is set to false by default if you don’t specify this option, so be aware of this if you are writing your own M code.