SP1 - Details for Query 1131

Submitted Time: 2026/02/24 11:26:35
Duration: 0.6 s
Succeeded Jobs: 1570 1571

digraph G {
  0 [labelType="html" label=" AdaptiveSparkPlan "];
  subgraph cluster1 {
    isCluster="true";
    label="WholeStageCodegen (2)\n \nduration: 0 ms";
    2 [labelType="html" label="HashAggregate time in aggregation build: 0 ms number of output rows: 1"];
  }
  3 [labelType="html" label="Exchange shuffle records written: 7 local merged chunks fetched: 0 shuffle write time total (min, med, max (stageId: taskId)) 2 ms (0 ms, 0 ms, 0 ms (stage 2642.0: task 2876)) remote merged bytes read: 0.0 B local merged blocks fetched: 0 corrupt merged block chunks: 0 remote merged reqs duration: 0 ms remote merged blocks fetched: 0 records read: 7 local bytes read: 413.0 B fetch wait time: 0 ms remote bytes read: 0.0 B merged fetch fallback count: 0 local blocks read: 7 remote merged chunks fetched: 0 remote blocks read: 0 data size total (min, med, max (stageId: taskId)) 112.0 B (16.0 B, 16.0 B, 16.0 B (stage 2642.0: task 2876)) local merged bytes read: 0.0 B number of partitions: 1 remote reqs duration: 0 ms remote bytes read to disk: 0.0 B shuffle bytes written total (min, med, max (stageId: taskId)) 413.0 B (59.0 B, 59.0 B, 59.0 B (stage 2642.0: task 2876))"];
  subgraph cluster4 {
    isCluster="true";
    label="WholeStageCodegen (1)\n \nduration: total (min, med, max (stageId: taskId))\n2.4 s (282 ms, 292 ms, 477 ms (stage 2642.0: task 2877))";
    5 [labelType="html" label="HashAggregate time in aggregation build total (min, med, max (stageId: taskId)) 2.4 s (282 ms, 292 ms, 477 ms (stage 2642.0: task 2877)) number of output rows: 7"];
    6 [labelType="html" label=" Project "];
    7 [labelType="html" label="Generate number of output rows: 1,393"];
    8 [labelType="html" label=" Project "];
    9 [labelType="html" label="Filter number of output rows: 7"];
  }
  10 [labelType="html" label="Scan binaryFile number of output rows: 7 number of files read: 7 metadata time: 0 ms size of files read: 992.7 KiB"];
  2->0;
  3->2;
  5->3;
  6->5;
  7->6;
  8->7;
  9->8;
  10->9;
}
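
The totals in the plan graph metrics can be cross-checked against the per-task values with a few lines of arithmetic. The sketch below only re-uses numbers copied from the graph. The task count is an assumption: it is inferred from "shuffle records written: 7", on the reading that each map task writes one partial-count record.

```python
# Values copied from the plan graph metrics.
shuffle_records_written = 7
data_size_total_b = 112.0
data_size_per_task_b = 16.0        # min = med = max in the graph
shuffle_bytes_total_b = 413.0
shuffle_bytes_per_task_b = 59.0    # min = med = max in the graph
files_read = 7
generate_output_rows = 1393

# Assumption (not stated in the dump): one map task per shuffle record written.
num_tasks = shuffle_records_written

# With uniform per-task values, total should equal tasks * per-task value.
data_size_ok = num_tasks * data_size_per_task_b == data_size_total_b
shuffle_bytes_ok = num_tasks * shuffle_bytes_per_task_b == shuffle_bytes_total_b

# Average number of exploded rows (lines) per input file.
avg_rows_per_file = generate_output_rows / files_read

print(data_size_ok, shuffle_bytes_ok, avg_rows_per_file)
```

If either check came out false, the uniform per-task reading or the assumed task count would need revisiting.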

AdaptiveSparkPlan isFinalPlan=true

HashAggregate(keys=[], functions=[count(1)])

WholeStageCodegen (2)

Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=51947]

HashAggregate(keys=[], functions=[partial_count(1)])

Project

Generate explode(COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509), false, [COL_4594B752_72F8_457F_B37D_B0AB041DF9AC#365541]

Project [str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n) AS COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509]

Filter ((size(str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n), true) > 0) AND isnotnull(str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n)))

WholeStageCodegen (1)

FileScan binaryFile [content#364727] Batched: false, DataFilters: [(size(str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n), true) > 0), isnotnull(s..., Format: org.apache.spark.sql.execution.datasources.binaryfile.BinaryFileFormat@51fd5c1e, Location: InMemoryFileIndex(7 paths)[file:/data/input/depot/binary/execution/A225B276_202D_4198_B6C6_5BF504..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<content:binary>

Details

== Physical Plan ==
AdaptiveSparkPlan (17)
+- == Final Plan ==
   * HashAggregate (9)
   +- ShuffleQueryStage (8), Statistics(sizeInBytes=112.0 B, rowCount=7)
      +- Exchange (7)
         +- * HashAggregate (6)
            +- * Project (5)
               +- * Generate (4)
                  +- * Project (3)
                     +- * Filter (2)
                        +- Scan binaryFile  (1)
+- == Initial Plan ==
   HashAggregate (16)
   +- Exchange (15)
      +- HashAggregate (14)
         +- Project (13)
            +- Generate (12)
               +- Project (11)
                  +- Filter (10)
                     +- Scan binaryFile  (1)


(1) Scan binaryFile 
Output [1]: [content#364727]
Batched: false
Location: InMemoryFileIndex [file:/data/input/depot/binary/execution/A225B276_202D_4198_B6C6_5BF504CB2545/current/1000388971_FR240100933.PDF, ... 6 entries]
ReadSchema: struct<content:binary>

(2) Filter [codegen id : 1]
Input [1]: [content#364727]
Condition : ((size(str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n), true) > 0) AND isnotnull(str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n)))

(3) Project [codegen id : 1]
Output [1]: [str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n) AS COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509]
Input [1]: [content#364727]

(4) Generate [codegen id : 1]
Input [1]: [COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509]
Arguments: explode(COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509), false, [COL_4594B752_72F8_457F_B37D_B0AB041DF9AC#365541]

(5) Project [codegen id : 1]
Output: []
Input [1]: [COL_4594B752_72F8_457F_B37D_B0AB041DF9AC#365541]

(6) HashAggregate [codegen id : 1]
Input: []
Keys: []
Functions [1]: [partial_count(1)]
Aggregate Attributes [1]: [count#365696L]
Results [1]: [count#365697L]

(7) Exchange
Input [1]: [count#365697L]
Arguments: SinglePartition, ENSURE_REQUIREMENTS, [plan_id=51947]

(8) ShuffleQueryStage
Output [1]: [count#365697L]
Arguments: 0

(9) HashAggregate [codegen id : 2]
Input [1]: [count#365697L]
Keys: []
Functions [1]: [count(1)]
Aggregate Attributes [1]: [count(1)#365693L]
Results [1]: [count(1)#365693L AS count#365694L]

(10) Filter
Input [1]: [content#364727]
Condition : ((size(str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n), true) > 0) AND isnotnull(str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n)))

(11) Project
Output [1]: [str_split_from_regex(bin_content_str(pdf, content#364727), \r?\n) AS COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509]
Input [1]: [content#364727]

(12) Generate
Input [1]: [COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509]
Arguments: explode(COL_469EFA20_191C_4DC7_9CD8_0A53C7DD5971#365509), false, [COL_4594B752_72F8_457F_B37D_B0AB041DF9AC#365541]

(13) Project
Output: []
Input [1]: [COL_4594B752_72F8_457F_B37D_B0AB041DF9AC#365541]

(14) HashAggregate
Input: []
Keys: []
Functions [1]: [partial_count(1)]
Aggregate Attributes [1]: [count#365696L]
Results [1]: [count#365697L]

(15) Exchange
Input [1]: [count#365697L]
Arguments: SinglePartition, ENSURE_REQUIREMENTS, [plan_id=51927]

(16) HashAggregate
Input [1]: [count#365697L]
Keys: []
Functions [1]: [count(1)]
Aggregate Attributes [1]: [count(1)#365693L]
Results [1]: [count(1)#365693L AS count#365694L]

(17) AdaptiveSparkPlan
Output [1]: [count#365694L]
Arguments: isFinalPlan=true
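
For orientation, the final plan can be read as: split each file's text on line breaks, keep files whose split is non-null and non-empty, explode to one row per line, count rows per task, then sum the partial counts in a single partition. A minimal pure-Python sketch of that flow follows. It is not Spark code. `split_lines`, `partial_count`, `final_count`, and the sample byte strings are hypothetical stand-ins; in particular, `split_lines` only imitates what `str_split_from_regex(bin_content_str(pdf, ...), \r?\n)` appears to do, and decodes bytes as UTF-8 instead of extracting PDF text.

```python
import re
from typing import Iterable, List, Optional

LINE_BREAK = re.compile(r"\r?\n")  # same pattern as shown in the plan


def split_lines(content: Optional[bytes]) -> Optional[List[str]]:
    """Hypothetical stand-in for str_split_from_regex(bin_content_str(...))."""
    if content is None:
        return None
    # Assumption: plain UTF-8 text; the real function presumably extracts PDF text.
    return LINE_BREAK.split(content.decode("utf-8", errors="replace"))


def partial_count(files: Iterable[Optional[bytes]]) -> int:
    """One task: Filter (2) -> Project (3) -> Generate (4) -> partial_count (6)."""
    count = 0
    for content in files:
        lines = split_lines(content)
        # Filter: size(...) > 0 AND isnotnull(...)
        if lines is None or len(lines) == 0:
            continue
        # Generate explode(...): one row per array element, then count the rows.
        for _ in lines:
            count += 1
    return count


def final_count(partials: Iterable[int]) -> int:
    """Exchange SinglePartition (7) followed by the final HashAggregate (9)."""
    return sum(partials)


# Hypothetical sample input: three tasks, one file each.
tasks = [[b"a\nb\r\nc"], [b"x"], [None]]
partials = [partial_count(files) for files in tasks]
total = final_count(partials)
print(partials, total)
```

The two-step count mirrors the plan's `partial_count(1)` and `count(1)` pair: only one small record per task crosses the shuffle, which matches the 7 shuffle records reported for 7 input files.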

SQL / DataFrame Properties

Name	Value
spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources	parquet,orc,geoparquet