Pipeline Editor

A visual, node-based interface for designing data pipelines. Drag and drop nodes, connect them with edges, and execute pipelines against Databricks Delta Live Tables.

Overview

The Pipeline Editor provides a no-code/low-code interface for building data pipelines. Users can visually design ETL workflows by dragging transformation nodes onto a canvas and connecting them to define data flow.

Pipelines are stored in the application database and can be executed against Databricks Delta Live Tables (DLT) for production workloads. The editor integrates with the Data Catalog for table selection and the SQL Editor for custom transformations.

Key Benefits

  • Visual pipeline design without writing code
  • Drag-and-drop node palette with common transformations
  • Real-time execution preview with sample data
  • Integration with Databricks Delta Live Tables
  • Pipeline sharing and collaboration
  • Version history and rollback (planned enhancement)

How It Works

The Pipeline Editor is built with React Flow for the visual canvas, Zustand for state management, and TanStack Query for persistence. Pipeline definitions are stored in PostgreSQL and can be executed via Databricks DLT APIs.
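
The wiring between these pieces can be sketched with a minimal canvas component. This is an illustrative sketch, not the production component: it assumes the Zustand store shown under State Management below is exported from a pipelineStore module, and it feeds React Flow's change events back into that store.

// Illustrative canvas component (component name and import path are assumptions)
import ReactFlow, {
  Background,
  Controls,
  MiniMap,
  applyNodeChanges,
  applyEdgeChanges,
  type NodeChange,
  type EdgeChange,
} from 'reactflow';
import 'reactflow/dist/style.css';

// usePipelineStore is the store shown under "State Management" below
import { usePipelineStore } from './pipelineStore';

export function PipelineCanvas() {
  const nodes = usePipelineStore((s) => s.nodes);
  const edges = usePipelineStore((s) => s.edges);
  const connectNodes = usePipelineStore((s) => s.connectNodes);
  const selectNode = usePipelineStore((s) => s.selectNode);

  // Apply React Flow's change events (drag, select, remove) back to the store
  const onNodesChange = (changes: NodeChange[]) =>
    usePipelineStore.setState((s) => ({ nodes: applyNodeChanges(changes, s.nodes) }));
  const onEdgesChange = (changes: EdgeChange[]) =>
    usePipelineStore.setState((s) => ({ edges: applyEdgeChanges(changes, s.edges) }));

  return (
    <ReactFlow
      nodes={nodes}
      edges={edges}
      onNodesChange={onNodesChange}
      onEdgesChange={onEdgesChange}
      onConnect={connectNodes}
      onNodeClick={(_, node) => selectNode(node)}
      fitView
    >
      <Background />
      <MiniMap />
      <Controls />
    </ReactFlow>
  );
}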

Architecture

Data Model

Pipelines are stored as JSON documents containing React Flow nodes and edges, along with metadata like name, description, and access controls.

-- Pipeline storage schema
CREATE TABLE pipelines (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name VARCHAR(255) NOT NULL,
  description TEXT,
  nodes JSONB NOT NULL,           -- React Flow nodes
  edges JSONB NOT NULL,           -- React Flow edges
  organization_id INTEGER REFERENCES organizations(id),
  created_by_id INTEGER REFERENCES users(id),
  access VARCHAR(50) DEFAULT 'private',  -- private, organization, public
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

// Example node structure (one entry in the nodes column)
{
  "id": "source-1",
  "type": "source",
  "position": { "x": 100, "y": 100 },
  "data": {
    "label": "Customer Data",
    "catalog": "main",
    "schema": "sales",
    "table": "customers"
  }
}
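
Edges are stored alongside the nodes in the same document; each edge simply references the ids of its source and target nodes. A minimal sketch (the ids are illustrative; the shape follows React Flow's Edge type):

// Example edge connecting the source node above to a downstream transform
import type { Edge } from 'reactflow';

const edge: Edge = {
  id: 'edge-source-1-filter-1',   // illustrative id
  source: 'source-1',             // upstream node id
  target: 'filter-1',             // downstream node id
};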

User Experience

The Pipeline Editor features a multi-panel layout with a node palette, canvas, properties panel, and execution console.

Features

  • Drag-and-drop node palette with common data transformations
  • Visual canvas with zoom, pan, and minimap navigation
  • Properties panel for configuring selected nodes
  • Edge connections representing data flow between nodes
  • Execution console with live output and logs
  • Save, rename, and share pipelines
  • Parallel sampling for data preview at each stage
  • Undo/redo for design changes

Node Types

Source Nodes

  • Unity Catalog Table
  • Delta Sharing Table
  • File Upload (CSV, JSON)
  • External Database

Transform Nodes

  • Select/Project Columns
  • Filter Rows
  • Join Tables
  • Aggregate/Group By
  • Custom SQL

Sink Nodes

  • Delta Table
  • Unity Catalog Table
  • File Export
  • Webhook/API
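
In the editor's TypeScript code, each node type carries its own payload in node.data. The sketch below shows what those payloads could look like; the source fields, the filter condition, and outputName mirror examples elsewhere on this page, while the remaining field names are assumptions.

// Illustrative data payloads for a few node types
import type { Node } from 'reactflow';

interface BaseNodeData {
  label: string;
  outputName: string;          // table this node produces (used by the DLT conversion below)
}

interface SourceNodeData extends BaseNodeData {
  catalog: string;
  schema: string;
  table: string;
}

interface FilterNodeData extends BaseNodeData {
  condition: string;           // SQL boolean expression, e.g. "country = 'US'"
}

interface CustomSqlNodeData extends BaseNodeData {
  query: string;               // SELECT statement authored in the SQL Editor
}

interface SinkNodeData extends BaseNodeData {
  catalog: string;
  schema: string;
  table: string;
  writeMode: 'append' | 'overwrite';
}

// A pipeline node is a React Flow node carrying one of these payloads
type SourceNode = Node<SourceNodeData>;
type FilterNode = Node<FilterNodeData>;
type CustomSqlNode = Node<CustomSqlNodeData>;
type SinkNode = Node<SinkNodeData>;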

Canvas Interactions

Mouse Controls

  • Drag from palette: Add new node
  • Click node: Select and show properties
  • Drag node: Reposition on canvas
  • Drag from handle: Create edge connection
  • Scroll wheel: Zoom in/out
  • Click + drag canvas: Pan view

Keyboard Shortcuts

  • Delete/Backspace: Remove selected node
  • Cmd+Z: Undo
  • Cmd+Shift+Z: Redo
  • Cmd+S: Save pipeline
  • Cmd+Enter: Run pipeline
  • Escape: Deselect
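
Shortcut handling can be sketched as a small hook. This is a minimal sketch assuming the editor supplies undo, redo, save, and run callbacks; Delete/Backspace removal can be delegated to React Flow's deleteKeyCode prop rather than handled here.

// Hypothetical shortcut hook; the callbacks are assumed to come from the editor component
import { useEffect } from 'react';

interface ShortcutHandlers {
  undo: () => void;
  redo: () => void;
  save: () => void;
  run: () => void;
}

export function useEditorShortcuts({ undo, redo, save, run }: ShortcutHandlers) {
  useEffect(() => {
    const onKeyDown = (e: KeyboardEvent) => {
      const mod = e.metaKey || e.ctrlKey;   // Cmd on macOS, Ctrl elsewhere
      if (!mod) return;
      const key = e.key.toLowerCase();

      if (key === 'z' && e.shiftKey) {
        e.preventDefault();
        redo();                             // Cmd+Shift+Z
      } else if (key === 'z') {
        e.preventDefault();
        undo();                             // Cmd+Z
      } else if (key === 's') {
        e.preventDefault();
        save();                             // Cmd+S
      } else if (key === 'enter') {
        e.preventDefault();
        run();                              // Cmd+Enter
      }
    };
    window.addEventListener('keydown', onKeyDown);
    return () => window.removeEventListener('keydown', onKeyDown);
  }, [undo, redo, save, run]);
}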

Backend Architecture

Pipeline definitions are persisted to PostgreSQL through the API routes below, while Zustand keeps the in-memory editing state on the client. Execution is handled through the Databricks Delta Live Tables APIs.

API Routes

Route                                 Method  Description
/api/pipelines                        GET     List user's pipelines
/api/pipelines                        POST    Create new pipeline
/api/pipelines/{id}                   GET     Load pipeline definition
/api/pipelines/{id}                   PUT     Save pipeline definition
/api/pipelines/{id}/clone             POST    Clone pipeline
/api/databricks/pipelines/{id}/start  POST    Trigger DLT execution
/api/databricks/pipelines/{id}/stop   POST    Stop running pipeline
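
A sketch of how the frontend might call these routes with TanStack Query follows; the hook names, fetch wrappers, and Pipeline shape are illustrative, while the paths and methods come from the table above.

// Illustrative TanStack Query hooks for loading and saving a pipeline
import { useQuery, useMutation, useQueryClient } from '@tanstack/react-query';
import type { Edge, Node } from 'reactflow';

interface Pipeline {
  id: string;
  name: string;
  nodes: Node[];
  edges: Edge[];
}

export function usePipeline(id: string) {
  // GET /api/pipelines/{id} — load a pipeline definition
  return useQuery({
    queryKey: ['pipelines', id],
    queryFn: async (): Promise<Pipeline> => {
      const res = await fetch(`/api/pipelines/${id}`);
      if (!res.ok) throw new Error(`Failed to load pipeline ${id}`);
      return res.json();
    },
  });
}

export function useSavePipeline(id: string) {
  const queryClient = useQueryClient();
  // PUT /api/pipelines/{id} — save the current nodes and edges
  return useMutation({
    mutationFn: async (definition: Pick<Pipeline, 'nodes' | 'edges'>) => {
      const res = await fetch(`/api/pipelines/${id}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(definition),
      });
      if (!res.ok) throw new Error(`Failed to save pipeline ${id}`);
      return res.json() as Promise<Pipeline>;
    },
    onSuccess: () => queryClient.invalidateQueries({ queryKey: ['pipelines', id] }),
  });
}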

State Management

The Pipeline Editor uses Zustand for client-side state management, providing a simple API for updating nodes, edges, and selection state.

// Zustand store for pipeline state
import { create } from 'zustand';
import { addEdge, type Connection, type Edge, type Node } from 'reactflow';

// Simplified shapes for illustration; the stored pipeline record also carries
// name, description, and access metadata (see the schema above)
interface NodeData {
  label: string;
  [key: string]: unknown;
}

interface Pipeline {
  nodes: Node[];
  edges: Edge[];
}

interface PipelineStore {
  nodes: Node[];
  edges: Edge[];
  selectedNode: Node | null;

  // Node operations
  addNode: (node: Node) => void;
  updateNodeData: (nodeId: string, data: Partial<NodeData>) => void;
  removeNode: (nodeId: string) => void;

  // Edge operations
  connectNodes: (connection: Connection) => void;
  removeEdge: (edgeId: string) => void;

  // Selection
  selectNode: (node: Node | null) => void;

  // Persistence
  loadPipeline: (pipeline: Pipeline) => void;
  savePipeline: () => Pipeline;
}

export const usePipelineStore = create<PipelineStore>((set, get) => ({
  nodes: [],
  edges: [],
  selectedNode: null,

  addNode: (node) => set((state) => ({
    nodes: [...state.nodes, node]
  })),

  updateNodeData: (nodeId, data) => set((state) => ({
    nodes: state.nodes.map(n =>
      n.id === nodeId ? { ...n, data: { ...n.data, ...data } } : n
    )
  })),

  removeNode: (nodeId) => set((state) => ({
    // Remove the node and any edges attached to it
    nodes: state.nodes.filter(n => n.id !== nodeId),
    edges: state.edges.filter(e => e.source !== nodeId && e.target !== nodeId),
    selectedNode: state.selectedNode?.id === nodeId ? null : state.selectedNode,
  })),

  connectNodes: (connection) => set((state) => ({
    edges: addEdge(connection, state.edges)
  })),

  removeEdge: (edgeId) => set((state) => ({
    edges: state.edges.filter(e => e.id !== edgeId)
  })),

  selectNode: (node) => set({ selectedNode: node }),

  loadPipeline: (pipeline) => set({
    nodes: pipeline.nodes,
    edges: pipeline.edges,
  }),

  savePipeline: () => ({
    nodes: get().nodes,
    edges: get().edges,
  }),
}));

Delta Live Tables Execution

When a pipeline is executed, the visual definition is converted to DLT code and submitted to Databricks for processing.

// Convert visual pipeline to DLT code
import type { Edge, Node } from 'reactflow';

const convertToDLT = (nodes: Node[], edges: Edge[]): string => {
  const dltCode: string[] = [];

  // Process nodes in topological order so inputs are defined before use
  const sortedNodes = topologicalSort(nodes, edges);

  for (const node of sortedNodes) {
    switch (node.type) {
      case 'source':
        dltCode.push(`
@dlt.table(name="${node.data.outputName}")
def ${node.id.replace(/-/g, '_')}():
    return spark.table("${node.data.catalog}.${node.data.schema}.${node.data.table}")
`);
        break;

      case 'filter': {
        const inputNode = getInputNode(node, edges, nodes);
        dltCode.push(`
@dlt.table(name="${node.data.outputName}")
def ${node.id.replace(/-/g, '_')}():
    return dlt.read("${inputNode.data.outputName}").filter("${node.data.condition}")
`);
        break;
      }

      // ... more node types
    }
  }

  return dltCode.join('\n');
};
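
convertToDLT relies on topologicalSort and getInputNode, which are not defined above. Sketches of both follow, assuming the node list is passed to getInputNode alongside the edges (as in the call above); the signatures and error handling are illustrative, and the ordering uses Kahn's algorithm.

// topologicalSort orders nodes so each node appears after its inputs
const topologicalSort = (nodes: Node[], edges: Edge[]): Node[] => {
  const inDegree = new Map<string, number>(nodes.map((n) => [n.id, 0]));
  for (const edge of edges) {
    inDegree.set(edge.target, (inDegree.get(edge.target) ?? 0) + 1);
  }

  // Kahn's algorithm: repeatedly emit nodes with no remaining inputs
  const queue = nodes.filter((n) => inDegree.get(n.id) === 0);
  const sorted: Node[] = [];
  while (queue.length > 0) {
    const node = queue.shift()!;
    sorted.push(node);
    for (const edge of edges.filter((e) => e.source === node.id)) {
      const remaining = (inDegree.get(edge.target) ?? 0) - 1;
      inDegree.set(edge.target, remaining);
      if (remaining === 0) {
        queue.push(nodes.find((n) => n.id === edge.target)!);
      }
    }
  }
  return sorted; // a cyclic graph leaves nodes out; validate before executing
};

// getInputNode resolves the upstream node feeding a single-input transform
const getInputNode = (node: Node, edges: Edge[], nodes: Node[]): Node => {
  const incoming = edges.find((e) => e.target === node.id);
  if (!incoming) throw new Error(`Node ${node.id} has no input edge`);
  return nodes.find((n) => n.id === incoming.source)!;
};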

Enhancement Opportunities

The Pipeline Editor can be extended with additional features to improve productivity and enable more advanced use cases.

Version History

Track changes to pipeline definitions over time with ability to view diffs, compare versions, and rollback to previous states.

Real-time Collaboration

Enable multiple users to edit pipelines simultaneously with cursors, presence indicators, and conflict resolution.

Pipeline Templates

Provide pre-built templates for common patterns like ETL, CDC, medallion architecture, and ML feature pipelines.

Execution Scheduling

Schedule pipeline runs with cron expressions, dependencies, and monitoring with alerts for failures.

Data Quality Rules

Add data quality expectations to nodes with automatic validation and alerting when constraints are violated.

Custom Node Types

Allow developers to create custom node types with custom UIs for organization-specific transformations.