loom.etl.backends.spark

PySpark backend for the Loom ETL framework.

Public API:

  • SparkSourceReader: SourceReader protocol implementation

  • SparkTargetWriter: TargetWriter protocol implementation

class loom.etl.backends.spark.SparkSourceReader(spark, locator=None, *, route_resolver=None, file_locator=None)[source]

Bases: SourceReader

Spark source reader - reads Delta tables and files directly.

read(spec, params_instance)[source]

Read the given source spec and return a DataFrame.

Return type:

pyspark.sql.DataFrame
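
A minimal usage sketch. The concrete shapes of the spec and params objects are defined elsewhere in the framework, so source_spec and params below are hypothetical placeholders:

    from pyspark.sql import SparkSession

    from loom.etl.backends.spark import SparkSourceReader

    spark = SparkSession.builder.getOrCreate()
    reader = SparkSourceReader(spark)  # locator/resolver arguments are optional

    source_spec = ...  # hypothetical: a source spec built by the pipeline config
    params = ...       # hypothetical: the params instance for the current run
    df = reader.read(source_spec, params)
    df.show()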

execute_sql(frames, query)[source]

Execute a SQL query against the given backend frames.

Return type:

pyspark.sql.DataFrame
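
A sketch of running SQL over backend frames, continuing the example above. It assumes frames is a mapping from view names to Spark DataFrames that the query references by name; that shape is an assumption, not documented here:

    orders = spark.createDataFrame([(1, 9.99), (2, 4.50)], ["order_id", "amount"])

    result = reader.execute_sql(
        {"orders": orders},  # assumed shape: view name -> DataFrame
        "SELECT order_id, amount FROM orders WHERE amount > 5.0",
    )
    result.show()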

class loom.etl.backends.spark.SparkTargetWriter(spark, locator=None, *, route_resolver=None, missing_table_policy=MissingTablePolicy.SCHEMA_MODE, file_locator=None)[source]

Bases: _WritePolicy[DataFrame, DataFrame, SparkPhysicalSchema]

Spark target writer using Delta Lake.

append(frame, table_ref, params_instance, *, streaming=False)[source]

Append the frame to the target table (legacy API; creates the table on the first write).

Parameters:
  • frame (pyspark.sql.DataFrame)

  • table_ref (Any)

  • params_instance (Any)

  • streaming (bool)

Return type:

None
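
A minimal append sketch. The table_ref and params_instance arguments are typed Any above, so the placeholders below are hypothetical:

    from loom.etl.backends.spark import SparkTargetWriter

    writer = SparkTargetWriter(spark)  # policy/locator arguments are optional

    frame = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    table_ref = ...  # hypothetical: a resolved table reference
    params = ...     # hypothetical: the params instance for the current run

    # Per the docstring, the target table is created on the first write.
    writer.append(frame, table_ref, params, streaming=False)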

to_frame(records, /)[source]

Convert execution records into a Spark DataFrame.

Parameters:

records (Sequence[PipelineRunRecord | ProcessRunRecord | StepRunRecord])

Return type:

pyspark.sql.DataFrame
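
A sketch of materializing run records for inspection, continuing the example above; records stands in for the PipelineRunRecord / ProcessRunRecord / StepRunRecord instances collected during a run:

    records = ...  # hypothetical: run records gathered by the orchestrator
    audit_df = writer.to_frame(records)
    audit_df.show()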