Fabric: Live Pools, Profiles & Fixes

by HubSite 365 about Reza Rad (RADACAD) [MVP]

Microsoft expert unpacks Fabric Spark performance, Custom Live Pools, Resource Profiles and Power BI compute strategies.

Key insights

Fabric Spark vs Apache Spark: Microsoft uses a customized Spark in Fabric to serve both data engineering and data science workloads. These customizations target faster interactive performance and tighter integration with Fabric services.
Custom Live Pools: pre-hydrated Spark clusters that start notebooks in about 5 seconds during a scheduled window. They reduce wait times compared with standard custom pools, which can take minutes to provision.
Resource Profiles and pool sizing: define how many clusters stay warm, which environment they use, and which workloads (for example writeHeavy or readHeavyForPBI) they optimize. Correct sizing prevents over- or under-provisioning and improves concurrency handling.
Compute Strategy: choose between fixed, autoscale, or hybrid approaches, and understand that "bursting" is not the same as autoscale. Match the strategy to predictable schedules and peak concurrency for the best cost and performance.
Common performance issues: cold starts, missed session sharing, and capacity underutilization often cause slow experiences. To troubleshoot slow runs, use diagnostics, check for throttling, and keep the hard ceiling of Max Job Lifetime in mind.
Operational guidance: live pools require a paid capacity SKU and a schedule before they can be published; republish environments after library changes and hydrate clusters before use. Apply 60–90 minute schedule buffers, group environments with common libraries, and tune idle timeouts using utilization metrics for a controlled rollout.
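
The 60–90 minute schedule buffer comes down to simple time arithmetic: start the pool's scheduled window early enough that hydration can finish before users arrive. The following is a minimal Python sketch of that idea. The helper `pool_window` and the example peak hours are hypothetical illustrations, not part of any Fabric API.

```python
from datetime import datetime, timedelta

def pool_window(peak_start, peak_end, buffer_minutes=60):
    """Return (schedule_start, schedule_end) for a live pool window.

    Hypothetical helper, not a Fabric API. The window opens `buffer_minutes`
    before the expected peak so hydration can complete before users connect.
    """
    if buffer_minutes < 0:
        raise ValueError("buffer_minutes must not be negative")
    if peak_end <= peak_start:
        raise ValueError("peak_end must be after peak_start")
    return peak_start - timedelta(minutes=buffer_minutes), peak_end

# Assumed example: users work 09:00 to 17:00, and a 90-minute buffer is applied.
start, end = pool_window(
    datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 17, 0), buffer_minutes=90
)
print(start.time(), end.time())  # prints 07:30:00 17:00:00
```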

In a recent episode of the Fabric Insider series, Reza Rad (RADACAD) [MVP] interviews Santhosh Kumar Ravindran from the Microsoft Fabric Data Engineering team about Spark performance in Microsoft Fabric. The conversation focuses on practical fixes and new features that aim to reduce notebook startup times and improve interactive experiences for data engineers and data scientists. As the episode explains, perceived slowness is often rooted in configuration choices rather than in the underlying Fabric Spark runtime itself. Consequently, the discussion highlights both product capabilities and the operational tradeoffs teams must weigh.

Episode Overview and Context

The episode, titled “Fabric Spark: Custom Live Pools, Resource Profiles & Performance Fixes,” frames its discussion around three central themes: pre-warmed compute, workload-aware sizing, and diagnostic practices. Reza Rad and Santhosh walk through how Microsoft has adapted open-source Spark to fit Fabric’s notebook-first experience and why those adaptations matter for interactive analytics. They also explain how Fabric’s governance model layers capacity, workspace, environment, and session-level settings to control compute behavior. Thus, listeners get both conceptual background and operational guidance in a single session.

Custom Live Pools: How They Work and When to Use Them

One of the flagship features discussed is custom live pools, which keep clusters pre-hydrated on a schedule so notebook sessions can start in roughly five seconds instead of minutes. This approach reduces cold start delays by preparing nodes and environments before users connect, and it works best for predictable, recurring notebook workloads such as scheduled runs or routine explorations. However, teams must accept tradeoffs: live pools require paid capacity SKUs and mandatory schedules, and they support only notebook-based sessions rather than all Spark job types.

Furthermore, Santhosh explains that while live pools improve interactivity, they shift some responsibilities to capacity planning and scheduling. Organizations must choose how many clusters to keep warm and what environments to attach, which increases predictability but also locks in reserved resources during the scheduled window. Therefore, teams that prioritize low-latency interactive work will value the responsiveness, while those focused on cost minimization may prefer on-demand pools despite longer startup times.
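
To put a rough number on the responsiveness side of this tradeoff, the waiting time saved per day can be estimated from the session count and the cold-start duration. The sketch below is illustrative only: the roughly five-second live-pool start comes from the episode, while the session count and the three-minute cold start are assumed figures, and `daily_wait_saved_minutes` is a hypothetical helper.

```python
def daily_wait_saved_minutes(sessions_per_day, cold_start_seconds, live_start_seconds=5.0):
    """Estimate total user wait time (in minutes) saved per day by a live pool."""
    # Never report a negative saving if the cold start is already faster.
    saved_per_session = max(cold_start_seconds - live_start_seconds, 0.0)
    return sessions_per_day * saved_per_session / 60.0

# Assumed figures: 40 notebook sessions a day, 3-minute cold start on a standard custom pool.
saved = daily_wait_saved_minutes(sessions_per_day=40, cold_start_seconds=180)
print(f"about {saved:.0f} minutes of waiting saved per day")
```

The result has to be weighed against the capacity reserved for the whole scheduled window, which this estimate does not model.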

Resource Profiles and Capacity Strategy

Alongside live pools, the episode introduces Resource Profiles as a way to tune cluster sizing and behavior for different workload patterns. These profiles let administrators set characteristics like read-heavy or write-heavy behaviors, which in turn influence how many clusters stay warm and what instance types are used. By matching profiles to usage patterns, organizations can balance concurrency, cost, and startup time more effectively.
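
As an illustration of matching profiles to usage patterns, the sketch below maps a workload label to one of the profile names mentioned in the episode summary (`writeHeavy`, `readHeavyForPBI`). The workload labels, the fallback default, and the helper `choose_profile` are assumptions made for this example. The Spark property name in the trailing comment is also an assumption and should be checked against the current Fabric documentation before use.

```python
# Profile names are the examples named in the episode summary.
# The workload labels on the left are hypothetical.
PROFILE_BY_WORKLOAD = {
    "ingestion": "writeHeavy",
    "powerbi_reporting": "readHeavyForPBI",
}

def choose_profile(workload, default="writeHeavy"):
    """Pick a resource profile name for a workload label (illustrative only)."""
    return PROFILE_BY_WORKLOAD.get(workload, default)

profile = choose_profile("powerbi_reporting")
print(profile)  # prints readHeavyForPBI

# In a Fabric notebook, where a `spark` session already exists, the chosen
# profile would then be applied as a Spark setting. The property name here is
# an assumption to verify against the Fabric documentation:
# spark.conf.set("spark.fabric.resourceProfile", profile)
```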

Yet, configuring resource profiles requires careful testing and observation. Santhosh recommends starting with conservative max cluster counts and using utilization metrics to refine idle timeouts and schedules, because over-provisioning wastes capacity while under-provisioning leaves users waiting for sessions. As a result, the best strategy combines telemetry-driven adjustments with an initial schedule buffer to guarantee hydration completes before peak use.
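
One input for choosing a cluster count is the peak number of sessions that actually overlap. Below is a minimal sketch of computing it from session start and end times with a sweep over sorted events. The session data is made up, and `peak_concurrency` is a hypothetical helper rather than a Fabric metric.

```python
def peak_concurrency(sessions):
    """Return the maximum number of overlapping sessions.

    `sessions` is a list of (start, end) pairs in minutes since midnight.
    A session ending at minute t does not overlap one starting at t.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # session opens
        events.append((end, -1))    # session closes
    # Sort by time; at equal times, closes (-1) are processed before opens (+1).
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

# Made-up session log: three notebooks overlap shortly after 09:00.
observed = [(540, 600), (545, 620), (550, 560), (600, 660), (700, 730)]
print(peak_concurrency(observed))  # prints 3
```

Comparing this peak with the configured maximum over several days gives a simple signal for whether the initial conservative setting is too low or too high.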

Performance Fixes, Diagnostics, and Tradeoffs

The episode also addresses why Spark sometimes “feels slow” and how policy choices can create cold starts, missed session sharing, and capacity underutilization. Diagnostics and throttling resolution are core topics, and Santhosh points to session-level instrumentation and capacity metrics as primary tools to pinpoint bottlenecks. Moreover, the team highlights limits like the Max Job Lifetime that impose hard ceilings and must be considered when designing long-running workloads.

Tradeoffs emerge clearly: aggressive session-sharing and longer idle timeouts improve utilization but can increase contention for CPU and memory, while strict isolation improves predictability at the cost of resource efficiency. Therefore, architects must weigh the need for fast interactive response against the financial and operational consequences of reserved capacity. The episode presents practical diagnostic steps and emphasizes iterative tuning rather than one-size-fits-all settings.

Practical Rollout and Roadmap

Finally, Reza and Santhosh outline an actionable rollout plan for organizations that want to adopt these features without disruption. They recommend pilot projects that target a subset of notebook workloads, measure real-world hydration times, and then expand schedules and profiles based on observed concurrency and utilization. This phased approach reduces risk and surfaces hidden constraints such as library management and environment publishing requirements.
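
During a pilot, the measured hydration times can be compared against the schedule buffer to see whether the buffer leaves enough headroom. The sketch below assumes hydration durations have been recorded in minutes. The sample values, the 1.2 safety factor, and the helper `buffer_covers_hydration` are illustrative assumptions, not figures from the episode.

```python
import statistics

def buffer_covers_hydration(hydration_minutes, buffer_minutes, safety_factor=1.2):
    """Return True if the buffer covers the slowest observed hydration with headroom."""
    worst = max(hydration_minutes)
    return worst * safety_factor <= buffer_minutes

# Made-up pilot measurements, in minutes.
pilot_runs = [22.0, 31.5, 27.0, 44.0, 35.5]

print("median:", statistics.median(pilot_runs), "worst:", max(pilot_runs))
print("60-minute buffer sufficient:", buffer_covers_hydration(pilot_runs, buffer_minutes=60))
```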

Looking ahead, the conversation teases ongoing roadmap work to improve Fabric Spark’s management surfaces and broaden automation options, though some current limitations remain, like the need to publish environments in the portal after library updates. In short, the episode offers a practical blend of new product capabilities, configuration patterns, and realistic tradeoffs for teams aiming to make Spark notebooks snappier in Microsoft Fabric.

Overall, the Fabric Insider episode led by Reza Rad and guest Santhosh Kumar Ravindran provides a concise, usable guide for teams balancing responsiveness, cost, and manageability in notebook-driven Spark workloads. By combining feature walkthroughs with diagnostic tips and a sensible rollout plan, the conversation helps teams decide when to use custom live pools and Resource Profiles and how to measure success as they scale. Consequently, organizations can move from trial and error to repeatable practices that improve developer productivity and platform efficiency.
