loom.etl.backends.spark¶
PySpark backend for the Loom ETL framework.
Public API:
- SparkSourceReader: SourceReader protocol implementation
- SparkTargetWriter: TargetWriter protocol implementation
- class loom.etl.backends.spark.SparkSourceReader(spark, locator=None, *, route_resolver=None, file_locator=None)[source]¶
Bases: SourceReader
Spark source reader: reads Delta tables and files directly.
- Parameters:
spark (SparkSession)
locator (str | TableLocator | None)
route_resolver (TableRouteResolver | None)
file_locator (FileLocator | None)
- read(spec, params_instance)[source]¶
Read source spec and return DataFrame.
- Parameters:
spec (TableSourceSpec | FileSourceSpec | TempSourceSpec)
params_instance (Any)
- Return type:
pyspark.sql.DataFrame
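A minimal usage sketch under stated assumptions: the import path and constructor arguments for TableSourceSpec are not documented in this reference and are shown here purely for illustration, as is the use of a plain string locator.

```python
from pyspark.sql import SparkSession
from loom.etl.backends.spark import SparkSourceReader
# Assumed import path and constructor for the spec type:
from loom.etl.specs import TableSourceSpec

spark = SparkSession.builder.getOrCreate()

# locator accepts a string or a TableLocator; a plain string is assumed
# here to resolve table names against a fixed catalog/schema.
reader = SparkSourceReader(spark, locator="prod_catalog")

spec = TableSourceSpec(table="sales.orders")  # hypothetical arguments
df = reader.read(spec, params_instance=None)  # params_instance is typed Any
df.show()
```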
- class loom.etl.backends.spark.SparkTargetWriter(spark, locator=None, *, route_resolver=None, missing_table_policy=MissingTablePolicy.SCHEMA_MODE, file_locator=None)[source]¶
Bases: _WritePolicy[DataFrame, DataFrame, SparkPhysicalSchema]
Spark target writer using Delta Lake.
- Parameters:
spark (SparkSession)
locator (str | TableLocator | None)
route_resolver (TableRouteResolver | None)
missing_table_policy (MissingTablePolicy)
file_locator (FileLocator | None)
- append(frame, table_ref, params_instance, *, streaming=False)[source]¶
Append a frame to the target table (legacy API; creates the table on the first write).
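A hedged sketch of append, continuing from the reader example above. The accepted form of table_ref is not documented here; a dotted table name string is assumed for illustration.

```python
from loom.etl.backends.spark import SparkTargetWriter

# `spark` as constructed in the reader sketch above.
writer = SparkTargetWriter(spark, locator="prod_catalog")

frame = spark.createDataFrame([(1, "new")], ["id", "status"])

# table_ref is shown as a dotted table name; the accepted form is assumed.
writer.append(frame, "sales.order_status", params_instance=None)
```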
- to_frame(records, /)[source]¶
Convert execution records into a Spark DataFrame.
- Parameters:
records (Sequence[PipelineRunRecord | ProcessRunRecord | StepRunRecord])
- Return type:
pyspark.sql.DataFrame
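For illustration, a sketch of converting run records into a DataFrame and persisting it with the same writer. The record types' fields and how records are obtained are not covered by this reference; get_completed_records is a hypothetical helper standing in for a real pipeline run.

```python
# `records` would come from an actual pipeline execution; their
# construction is not documented here.
records = get_completed_records()  # hypothetical helper

audit_df = writer.to_frame(records)
audit_df.printSchema()

# The resulting DataFrame can then be persisted with the writer itself;
# the audit table name is an assumption.
writer.append(audit_df, "audit.run_log", params_instance=None)
```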