loom.etl.schema

Schema, contracts, and table-reference primitives for ETL.

loom.etl.schema.resolve_schema(contract)[source]

Convert a schema contract to a tuple[ColumnSchema, ...].

Parameters:

contract (tuple[ColumnSchema, ...] | type[Any]) – Either a tuple[ColumnSchema, ...] (returned as-is) or an annotated class whose fields map to column schemas.

Returns:

Tuple of ColumnSchema entries.

Raises:

TypeError – If contract is not a supported schema contract form.

Return type:

tuple[ColumnSchema, ...]
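The two accepted contract forms can be illustrated with a minimal, self-contained sketch. It does not import loom: ColumnSchema here is a local stand-in that copies only the signature shown on this page, and resolve_schema_sketch and the _PY_TO_DTYPE mapping are hypothetical names invented for illustration. How loom actually maps annotations to dtypes is not documented here, so treat the mapping as an assumption.

```python
from dataclasses import dataclass
from typing import Any, get_type_hints


@dataclass(frozen=True)
class ColumnSchema:
    """Local stand-in mirroring the ColumnSchema(name, dtype, nullable=True) signature."""

    name: str
    dtype: Any
    nullable: bool = True


# Hypothetical mapping from Python annotations to dtype labels (an assumption).
_PY_TO_DTYPE = {int: "Int64", float: "Float64", str: "Utf8", bool: "Boolean"}


def resolve_schema_sketch(contract: Any) -> tuple[ColumnSchema, ...]:
    # Form 1: a tuple of ColumnSchema entries is returned as-is.
    if isinstance(contract, tuple) and all(isinstance(c, ColumnSchema) for c in contract):
        return contract
    # Form 2: an annotated class whose fields map to column schemas.
    if isinstance(contract, type):
        hints = get_type_hints(contract)
        if hints and all(t in _PY_TO_DTYPE for t in hints.values()):
            return tuple(ColumnSchema(n, _PY_TO_DTYPE[t]) for n, t in hints.items())
    # Anything else is not a supported contract form.
    raise TypeError(f"unsupported schema contract: {contract!r}")


class Order:
    order_id: int
    amount: float


resolved = resolve_schema_sketch(Order)
```

In this sketch a tuple passes through untouched (the same object is returned), while the annotated class is expanded field by field in declaration order.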

loom.etl.schema.resolve_json_type(contract)[source]

Convert a JSON column contract to a LoomType.

Parameters:

contract (Any) – A LoomType (returned as-is), an annotated class (converted to StructType), or a list[X] generic alias (converted to ListType).

Returns:

LoomType for use with Polars str.json_decode or Spark from_json.

Raises:

TypeError – If contract cannot be converted to a LoomType.

Return type:

LoomDtype | ListType | ArrayType | StructType | DecimalType | DatetimeType | DurationType | CategoricalType | EnumType
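The three conversions described above (pass-through, annotated class to StructType, list[X] to ListType) might look roughly like the following sketch. It is self-contained and does not use loom: ListType, StructType and StructField are local stand-ins shaped after the signatures on this page, and resolve_json_type_sketch and the _SCALARS mapping for leaf types are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class ListType:
    inner: Any


@dataclass(frozen=True)
class StructField:
    name: str
    dtype: Any
    nullable: bool = True


@dataclass(frozen=True)
class StructType:
    fields: tuple


# Hypothetical leaf mapping used when recursing into fields and list elements.
_SCALARS = {int: "Int64", float: "Float64", str: "Utf8", bool: "Boolean"}


def resolve_json_type_sketch(contract: Any) -> Any:
    # Already a structural type: returned as-is.
    if isinstance(contract, (ListType, StructType)):
        return contract
    # list[X] generic alias: converted to ListType with X resolved recursively.
    if get_origin(contract) is list:
        args = get_args(contract)
        if len(args) == 1:
            return ListType(inner=resolve_json_type_sketch(args[0]))
    elif isinstance(contract, type):
        # Assumed scalar leaves (not described in the reference above).
        if contract in _SCALARS:
            return _SCALARS[contract]
        # Annotated class: converted to StructType, one field per annotation.
        hints = get_type_hints(contract)
        if hints:
            return StructType(
                fields=tuple(
                    StructField(n, resolve_json_type_sketch(t)) for n, t in hints.items()
                )
            )
    raise TypeError(f"cannot convert {contract!r} to a JSON column type")


class Address:
    street: str
    zip: str


resolved = resolve_json_type_sketch(list[Address])
```

The generic-alias check runs before the class check because, on some Python versions, list[X] also passes isinstance(..., type).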

class loom.etl.schema.ColumnSchema(name, dtype, nullable=True)[source]

Bases: object

Backend-agnostic definition of a single table column.

Used by TableDiscovery to describe table schemas and by backend writers to align frames before writing.

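Parameters:
  • name (str) – Column name.

  • dtype (LoomType) – Column data type, either a LoomDtype member or one of the structural types (for example DecimalType or ListType).

  • nullable (bool) – Whether the column may contain nulls. Defaults to True.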

Example:

schema = (
    ColumnSchema("order_id", LoomDtype.INT64, nullable=False),
    ColumnSchema("amount",   DecimalType(precision=18, scale=4)),
    ColumnSchema("tags",     ListType(inner=LoomDtype.UTF8)),
    ColumnSchema("address",  StructType(fields=(
        StructField("street", LoomDtype.UTF8),
        StructField("zip",    LoomDtype.UTF8),
    ))),
)

class loom.etl.schema.LoomDtype(value)[source]

Bases: StrEnum

Canonical data types understood by loom’s schema system.

Values match Polars naming conventions to minimise translation friction in the Polars backend, while remaining backend-agnostic at the protocol level.

For complex or parametrised types, use the dedicated structural classes (ListType, ArrayType, StructType, DecimalType, DatetimeType, DurationType, CategoricalType, EnumType). The enum members LIST, STRUCT, etc. are kept for coarse (unparametrised) schema declarations; they accept any inner type during validation.

exception loom.etl.schema.SchemaNotFoundError[source]

Bases: Exception

Raised when a write is attempted but no schema is registered for the table.

Register the schema via update_schema() before the first write, or use a backend that creates the table explicitly (for example with OVERWRITE on first write).

exception loom.etl.schema.SchemaError[source]

Bases: Exception

Raised when a frame is incompatible with the registered table schema.
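The difference between the two exceptions can be shown with a toy, in-memory registry. Everything in this sketch is hypothetical: ToyCatalog is not part of loom, the exception classes are local stand-ins, and the update_schema signature used here is an assumption, since this page only mentions the method by name.

```python
class SchemaNotFoundError(Exception):
    """Local stand-in: no schema is registered for the table."""


class SchemaError(Exception):
    """Local stand-in: the data does not match the registered schema."""


class ToyCatalog:
    """Hypothetical in-memory schema registry, for illustration only."""

    def __init__(self):
        self._schemas = {}

    def update_schema(self, table, columns):
        # Register (or replace) the column names expected for a table.
        self._schemas[table] = tuple(columns)

    def write(self, table, rows):
        # A write without a registered schema is rejected outright.
        if table not in self._schemas:
            raise SchemaNotFoundError(f"no schema registered for {table!r}")
        expected = set(self._schemas[table])
        # Every row must carry exactly the registered columns.
        for row in rows:
            if set(row) != expected:
                raise SchemaError(
                    f"row columns {sorted(row)} do not match {sorted(expected)}"
                )
        return len(rows)


catalog = ToyCatalog()
try:
    catalog.write("raw.orders", [{"order_id": 1}])
except SchemaNotFoundError:
    # Register the schema before the first write, then retry.
    catalog.update_schema("raw.orders", ["order_id"])
written = catalog.write("raw.orders", [{"order_id": 1}])
```

The first exception signals a missing registration and is fixed by registering a schema; the second signals a mismatch between the data and a schema that already exists.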

class loom.etl.schema.ListType(inner)[source]

Bases: object

Homogeneous list column — List[inner].

Parameters:

inner (LoomDtype | ListType | ArrayType | StructType | DecimalType | DatetimeType | DurationType | CategoricalType | EnumType) – Element type. May itself be a StructType, ListType, or any other LoomType.

Example:

ListType(inner=LoomDtype.UTF8)
ListType(inner=StructType(fields=(StructField("x", LoomDtype.INT64),)))

class loom.etl.schema.ArrayType(inner, width)[source]

Bases: object

Fixed-width array column — Array[inner, width].

Parameters:
  • inner (LoomType) – Element type.

  • width (int) – Number of elements in each array.

Example:

ArrayType(inner=LoomDtype.FLOAT64, width=3)

class loom.etl.schema.StructType(fields)[source]

Bases: object

Record / struct column with named fields.

Parameters:

fields (tuple[StructField, ...]) – Ordered tuple of StructField definitions.

Example:

StructType(fields=(
    StructField("lat", LoomDtype.FLOAT64),
    StructField("lon", LoomDtype.FLOAT64),
))

class loom.etl.schema.StructField(name, dtype, nullable=True)[source]

Bases: object

A single named field within a StructType.

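Parameters:
  • name (str) – Field name.

  • dtype (LoomType) – Field data type.

  • nullable (bool) – Whether the field may contain nulls. Defaults to True.

class loom.etl.schema.DecimalType(precision=None, scale=None)[source]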

Bases: object

Fixed-precision decimal column.

Parameters:
  • precision (int | None) – Total number of significant digits (None = backend default).

  • scale (int | None) – Number of digits after the decimal point (None = backend default).

Example:

DecimalType(precision=18, scale=4)

class loom.etl.schema.DatetimeType(time_unit='us', time_zone=None)[source]

Bases: object

Datetime column with explicit time unit and optional timezone.

Parameters:
  • time_unit (str) – "us" (microseconds, default), "ns", or "ms".

  • time_zone (str | None) – IANA timezone string (e.g. "UTC") or None for naive.

Example:

DatetimeType("us", "UTC")
DatetimeType("ns")          # naive, nanosecond precision

class loom.etl.schema.DurationType(time_unit='us')[source]

Bases: object

Duration (timedelta) column with explicit time unit.

Parameters:

time_unit (str) – "us" (microseconds, default), "ns", or "ms".

Example:

DurationType("ms")

class loom.etl.schema.CategoricalType[source]

Bases: object

Dictionary-encoded categorical string column.

Example:

ColumnSchema("region", CategoricalType())

class loom.etl.schema.EnumType(categories)[source]

Bases: object

Fixed-vocabulary enum column.

Parameters:

categories (tuple[str, ...]) – Ordered tuple of allowed string values.

Example:

EnumType(categories=("low", "medium", "high"))

class loom.etl.schema.TableRef(ref)[source]

Bases: object

Logical table identifier used by the ETL declarative DSL.

Parameters:

ref (str) – Dotted logical table reference (for example "raw.orders").

property ref: str

Raw dotted table reference.

property c: _ColumnNamespace

Column namespace for bound references.

qualify(default_catalog)[source]

Return a catalog-qualified reference when the ref is 2-part.

If the ref already contains 3 parts (catalog.schema.table) or default_catalog is empty, the original reference is returned unchanged.

Parameters:

default_catalog (str) – Catalog name to prepend when the ref is 2-part.

Returns:

New TableRef with catalog prefix, or self if already qualified.

Return type:

TableRef
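The qualification rule can be sketched as follows. TableRefSketch is a hypothetical stand-in, not loom's TableRef, and it implements only the behaviour stated above. Returning refs that are neither 2-part nor 3-part unchanged is an assumption, since that case is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRefSketch:
    """Hypothetical stand-in for TableRef, holding only the dotted reference."""

    ref: str

    def qualify(self, default_catalog: str) -> "TableRefSketch":
        parts = self.ref.split(".")
        # Unchanged when the catalog is empty or the ref is not 2-part
        # (covers the documented 3-part case; other lengths are an assumption).
        if not default_catalog or len(parts) != 2:
            return self
        # 2-part ref: prepend the default catalog.
        return TableRefSketch(f"{default_catalog}.{self.ref}")


two_part = TableRefSketch("raw.orders")
three_part = TableRefSketch("main.raw.orders")
qualified = two_part.qualify("main")
```

Here qualified.ref is "main.raw.orders", while three_part.qualify("main") and two_part.qualify("") both return the original object.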

loom.etl.schema.col(name)[source]

Return an unbound column reference for DSL predicates.

Parameters:

name (str) – Name of the column to reference.

Return type:

UnboundColumnRef