Enhancing SUMX Iterator through Callbacks Optimization
Oct 29, 2023 5:45 PM

by HubSite 365 about SQLBI

Learn how to optimize SUMX iterators in Microsoft DAX by reducing callbacks, improving query performance and simplifying analysis.

The YouTube video presented by "SQLBI" focuses on optimizing SUMX iterations in DAX. SUMX is a commonly used iterator, and some expressions evaluated inside it require callbacks, which can hurt performance. The author shows that pushing calculations down to the VertiPaq storage engine brings significant benefits.

Importantly, DAX developers are advised not to fear iterators, because they perform impressively when the computed expression is pushed down to the VertiPaq storage engine. A simple measure like Sales Amount, which multiplies quantity by net price, is used to illustrate the point.

In a test query run on the Contoso 100M dataset (containing 200M rows), the measure performed very well: only 200ms of storage engine CPU was used, and the query was resolved within a single xmSQL query. Different hardware would yield different numbers, but this execution was deemed optimal.

Part of the speed comes from the fact that the VertiPaq storage engine can evaluate the aggregated expression itself, since it is a basic multiplication. A minor change in the iterated expression, however, can have a significant negative effect on performance. To illustrate, the video adds a ROUND function that rounds the net price to one decimal place.

The storage engine cannot execute ROUND, so the formula engine has to step in through VertiCalc and a callback. Although this has a noticeable cost in speed, the query can still handle a massive workload thanks to parallelism.

To improve performance, the author shares tips on reducing the number of callbacks. The video stresses that callbacks play a crucial part in Tabular and are beneficial when used appropriately. Still, understanding where ROUND is evaluated, and reducing the number of times it has to run, helps achieve better performance. Overall, the author suggests carefully analyzing the expression being aggregated to find opportunities for optimization.

Additional Insights

As a general note, knowing how to optimize callbacks in SUMX iterators is a valuable skill for anyone writing DAX. Being able to push calculations down to the VertiPaq storage engine, and knowing when that is possible, makes queries over large models considerably more efficient. It is also important not to fear iterators and instead to focus on how the iterated expression can be pushed down to the storage engine. Finally, using ROUND carefully, and as few times as possible, can make a large difference to the number of callbacks required and therefore to overall query performance.

Optimizing callbacks in a SUMX iterator

If you aim to gain proficiency in optimizing callbacks, understanding fundamental database concepts and the workings of the VertiPaq storage engine is essential. Training courses on the storage engine, DAX expression writing, and SQL querying can therefore be beneficial.

When SUMX iterators aren’t performing up to the mark, a careful assessment of the aggregated expression can suggest viable optimizations, such as exploiting the capabilities of the VertiPaq storage engine whenever feasible. In general, the more of the computation that can be pushed down to the storage engine, the better the chances of obtaining a good query plan.

DAX developers should not dread iterators: they perform very well as long as the expression computed during the iteration can be pushed down to the storage engine. Take a typical measure like Sales Amount:

DEFINE
    MEASURE Sales[Sales Amount] =
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
EVALUATE
SUMMARIZECOLUMNS ( 'Product'[Color], "Sales Amount", [Sales Amount] )

On a large model such as Contoso 100M, this yields excellent performance: minimal storage engine CPU usage and resolution in a single xmSQL query. Results vary across hardware, but the query typically runs at close to peak efficiency.

Next, consider a slight alteration to the iterated expression, which can have a strongly negative effect on performance. Here a ROUND function is added to round the net price to one decimal place:

DEFINE
    MEASURE Sales[Sales Amount] =
        SUMX ( Sales, Sales[Quantity] * ROUND ( Sales[Net Price], 1 ) )
EVALUATE
SUMMARIZECOLUMNS ( 'Product'[Color], "Sales Amount", [Sales Amount] )

Unfortunately, the storage engine cannot execute ROUND, so it needs the formula engine's intervention, which results in a callback. Callbacks are expensive. One workaround is to store the result of ROUND in a calculated column, avoiding its computation at query time. This technique is seldom applicable, however, because it increases the size of the fact table with a potentially large column.
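The calculated-column workaround could look roughly like the sketch below. The column name Sales[Net Price Rounded] is hypothetical (it does not come from the video), and the sketch assumes the column has already been added to the Sales table in the model, so that rounding happens at refresh time rather than at query time:

```dax
-- Assumption: a calculated column with this (hypothetical) name exists in the model:
--     Sales[Net Price Rounded] = ROUND ( Sales[Net Price], 1 )

DEFINE
    MEASURE Sales[Sales Amount] =
        -- Plain multiplication of two stored columns, with no ROUND call at query time
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price Rounded] )
EVALUATE
SUMMARIZECOLUMNS ( 'Product'[Color], "Sales Amount", [Sales Amount] )
```

With the rounded value stored as a column, the iterated expression is a simple multiplication again, at the cost of the extra column in the fact table.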

Callbacks play an essential role in Tabular and contribute to good performance when used sensibly, but reducing them is usually a step towards better performance. In this case, the key lies in analyzing the aggregated expression:

MEASURE Sales[Sales Amount] = SUMX ( Sales, Sales[Quantity] * ROUND ( Sales[Net Price], 1 ) )

It’s crucial to note that ROUND is called for every row of Sales, so reducing the number of calls improves performance. The analysis reveals an essential insight: there is no need to round every row, because Sales[Net Price] has far fewer distinct values than Sales has rows. The rounding can therefore be computed once per distinct net price, and each rounded price multiplied by the total quantity sold at that price.
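One possible way to express this insight in DAX is sketched below. It is an illustration rather than necessarily the exact code shown in the video: it assumes that iterating VALUES ( Sales[Net Price] ) and using CALCULATE to obtain the quantity for each price is an acceptable formulation.

```dax
DEFINE
    MEASURE Sales[Sales Amount] =
        SUMX (
            -- Iterate the distinct net prices instead of every row of Sales
            VALUES ( Sales[Net Price] ),
            -- ROUND runs once per distinct price
            ROUND ( Sales[Net Price], 1 )
                -- Context transition: total quantity sold at the current price
                * CALCULATE ( SUM ( Sales[Quantity] ) )
        )
EVALUATE
SUMMARIZECOLUMNS ( 'Product'[Color], "Sales Amount", [Sales Amount] )
```

The result is the same as in the row-by-row version, because every row sharing a net price contributes the same rounded price multiplied by its own quantity.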

Although this approach may not match the performance of the original Sales Amount without rounding, it is much faster than the callback version because it curtails the calls to ROUND. ROUND now runs in the formula engine over the small set of distinct prices, so the storage engine no longer needs to call back into the formula engine during its scan. With callbacks no longer needed, the query can also make much better use of Tabular's cache.

Keep in mind, though, that some attempts to make the code more readable and better structured, such as introducing variables, may unintentionally hurt performance. Be careful when incorporating constructs that require extra computation.

In conclusion, callbacks can throttle the performance of DAX code that iterates over large tables. Even so, careful scrutiny of the expression being calculated may reduce the number of iterations and cut down the function calls that need the formula engine.

Though not always attainable, this optimization usually delivers excellent results. Keep it handy in your DAX optimization toolkit, and use it whenever possible.
