Pipeline Editor
A visual, node-based interface for designing data pipelines. Drag and drop nodes, connect them with edges, and execute pipelines against Databricks Delta Live Tables.
Overview
The Pipeline Editor provides a no-code/low-code interface for building data pipelines. Users can visually design ETL workflows by dragging transformation nodes onto a canvas and connecting them to define data flow.
Pipelines are stored in the application database and can be executed against Databricks Delta Live Tables (DLT) for production workloads. The editor integrates with the Data Catalog for table selection and the SQL Editor for custom transformations.
Key Benefits
- Visual pipeline design without writing code
- Drag-and-drop node palette with common transformations
- Real-time execution preview with sample data
- Integration with Databricks Delta Live Tables
- Pipeline sharing and collaboration
- Version history and rollback (planned enhancement; see Enhancement Opportunities)
How It Works
The Pipeline Editor is built with React Flow for the visual canvas, Zustand for state management, and TanStack Query for persistence. Pipeline definitions are stored in PostgreSQL and can be executed via Databricks DLT APIs.
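To illustrate the persistence layer, here is a minimal sketch of how a pipeline definition could be loaded and saved with TanStack Query against the /api/pipelines/{id} routes documented under API Routes below. The hook name and the simplified Pipeline type are assumptions for illustration, not the actual implementation.

// Illustrative sketch: pipeline load/save with TanStack Query (assumed hook name)
import { useQuery, useMutation, useQueryClient } from '@tanstack/react-query';
import type { Node, Edge } from 'reactflow';

// Simplified shape; the real record also carries metadata (name, access, etc.)
interface Pipeline {
  id: string;
  name: string;
  nodes: Node[];
  edges: Edge[];
}

export function usePipeline(pipelineId: string) {
  const queryClient = useQueryClient();

  // Load the pipeline definition from the API
  const query = useQuery({
    queryKey: ['pipeline', pipelineId],
    queryFn: async (): Promise<Pipeline> => {
      const res = await fetch(`/api/pipelines/${pipelineId}`);
      if (!res.ok) throw new Error(`Failed to load pipeline ${pipelineId}`);
      return res.json();
    },
  });

  // Persist edits back to the API and refresh the cached copy
  const save = useMutation({
    mutationFn: async (pipeline: Pipeline) => {
      const res = await fetch(`/api/pipelines/${pipeline.id}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(pipeline),
      });
      if (!res.ok) throw new Error('Failed to save pipeline');
      return res.json();
    },
    onSuccess: () =>
      queryClient.invalidateQueries({ queryKey: ['pipeline', pipelineId] }),
  });

  return { ...query, save };
}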
Architecture
Data Model
Pipelines are stored as JSON documents containing React Flow nodes and edges, along with metadata like name, description, and access controls.
-- Pipeline storage schema
CREATE TABLE pipelines (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
nodes JSONB NOT NULL, -- React Flow nodes
edges JSONB NOT NULL, -- React Flow edges
organization_id INTEGER REFERENCES organizations(id),
created_by_id INTEGER REFERENCES users(id),
access VARCHAR(50) DEFAULT 'private', -- private, organization, public
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Example node structure
{
"id": "source-1",
"type": "source",
"position": { "x": 100, "y": 100 },
"data": {
"label": "Customer Data",
"catalog": "main",
"schema": "sales",
"table": "customers"
}
}
User Experience
The Pipeline Editor features a multi-panel layout with a node palette, canvas, properties panel, and execution console.
Features
- Drag-and-drop node palette with common data transformations
- Visual canvas with zoom, pan, and minimap navigation
- Properties panel for configuring selected nodes
- Edge connections representing data flow between nodes
- Execution console with live output and logs
- Save, rename, and share pipelines
- Parallel sampling for data preview at each stage
- Undo/redo for design changes
Node Types
Source Nodes
- Unity Catalog Table
- Delta Sharing Table
- File Upload (CSV, JSON)
- External Database
Transform Nodes
- Select/Project Columns
- Filter Rows
- Join Tables
- Aggregate/Group By
- Custom SQL
Sink Nodes
- Delta Table
- Unity Catalog Table
- File Export
- Webhook/API
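One way to model these node categories on the client is a discriminated union keyed by node kind. The sketch below is illustrative: the catalog/schema/table, outputName, and condition fields mirror examples elsewhere on this page, while the remaining names are assumptions rather than the actual type definitions.

// Illustrative sketch: node configuration as a discriminated union (assumed types)
type BaseNodeData = {
  label: string;
  outputName: string; // name of the table this node produces (see DLT conversion)
};

type SourceNodeData = BaseNodeData & {
  kind: 'source';
  catalog: string;
  schema: string;
  table: string;
};

type FilterNodeData = BaseNodeData & {
  kind: 'filter';
  condition: string; // SQL boolean expression, e.g. "country = 'US'"
};

type SinkNodeData = BaseNodeData & {
  kind: 'sink';
  target: string; // e.g. a fully qualified Delta table name
};

type PipelineNodeData = SourceNodeData | FilterNodeData | SinkNodeData;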
Canvas Interactions
Mouse Controls
- Drag from palette: Add new node
- Click node: Select and show properties
- Drag node: Reposition on canvas
- Drag from handle: Create edge connection
- Scroll wheel: Zoom in/out
- Click + drag canvas: Pan view
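These interactions map onto standard React Flow props. The following is a minimal sketch of the canvas component, assuming the Zustand store shown under State Management below and an assumed module path for it; it is not the actual component.

// Illustrative sketch: wiring canvas interactions to React Flow callbacks
import ReactFlow, { Background, Controls, MiniMap } from 'reactflow';
import 'reactflow/dist/style.css';
import { usePipelineStore } from './store'; // assumed path to the Zustand store

function PipelineCanvas() {
  const { nodes, edges, connectNodes, selectNode } = usePipelineStore();

  return (
    <ReactFlow
      nodes={nodes}
      edges={edges}
      onConnect={connectNodes}                    // drag from handle: create edge
      onNodeClick={(_, node) => selectNode(node)} // click node: select, show properties
      onPaneClick={() => selectNode(null)}        // click empty canvas: deselect
      fitView
    >
      <Background />
      <Controls />  {/* zoom in/out buttons */}
      <MiniMap />   {/* minimap navigation */}
    </ReactFlow>
  );
}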
Keyboard Shortcuts
- Delete/Backspace: Remove selected node
- Cmd+Z: Undo
- Cmd+Shift+Z: Redo
- Cmd+S: Save pipeline
- Cmd+Enter: Run pipeline
- Escape: Deselect
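A minimal sketch of how these shortcuts could be wired up with a global keydown listener follows. The handler names are assumptions; the actual bindings live in the editor component.

// Illustrative sketch: keyboard shortcuts via a keydown listener (assumed handlers)
import { useEffect } from 'react';

function usePipelineShortcuts(handlers: {
  save: () => void;
  run: () => void;
  undo: () => void;
  redo: () => void;
  deleteSelection: () => void;
  deselect: () => void;
}) {
  useEffect(() => {
    const onKeyDown = (e: KeyboardEvent) => {
      const cmd = e.metaKey || e.ctrlKey; // Cmd on macOS, Ctrl elsewhere
      if (cmd && e.key === 's') { e.preventDefault(); handlers.save(); }
      else if (cmd && e.key === 'Enter') { e.preventDefault(); handlers.run(); }
      else if (cmd && e.shiftKey && e.key.toLowerCase() === 'z') { e.preventDefault(); handlers.redo(); }
      else if (cmd && e.key === 'z') { e.preventDefault(); handlers.undo(); }
      else if (e.key === 'Delete' || e.key === 'Backspace') handlers.deleteSelection();
      else if (e.key === 'Escape') handlers.deselect();
    };
    window.addEventListener('keydown', onKeyDown);
    return () => window.removeEventListener('keydown', onKeyDown);
  }, [handlers]);
}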
Backend Architecture
Pipeline definitions are persisted to PostgreSQL with Zustand managing the in-memory state. Execution is handled via Databricks Delta Live Tables APIs.
API Routes
| Route | Method | Description |
|---|---|---|
| /api/pipelines | GET | List user's pipelines |
| /api/pipelines | POST | Create new pipeline |
| /api/pipelines/{id} | GET | Load pipeline definition |
| /api/pipelines/{id} | PUT | Save pipeline definition |
| /api/pipelines/{id}/clone | POST | Clone pipeline |
| /api/databricks/pipelines/{id}/start | POST | Trigger DLT execution |
| /api/databricks/pipelines/{id}/stop | POST | Stop running pipeline |
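Server-side, the start route forwards the request to Databricks. A minimal sketch, assuming the workspace host and token come from environment variables, that the application has already mapped its own pipeline record to a Databricks pipeline ID, and that the Databricks Pipelines REST API's updates endpoint is used:

// Illustrative sketch: starting a DLT update via the Databricks Pipelines API.
// DATABRICKS_HOST / DATABRICKS_TOKEN and the app-to-Databricks ID mapping are assumptions.
async function startDltUpdate(databricksPipelineId: string): Promise<{ update_id: string }> {
  const res = await fetch(
    `${process.env.DATABRICKS_HOST}/api/2.0/pipelines/${databricksPipelineId}/updates`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.DATABRICKS_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ full_refresh: false }), // incremental update
    }
  );
  if (!res.ok) {
    throw new Error(`Databricks returned ${res.status} when starting the update`);
  }
  return res.json();
}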
State Management
The Pipeline Editor uses Zustand for client-side state management, providing a simple API for updating nodes, edges, and selection state.
// Zustand store for pipeline state
import { create } from 'zustand';
import { addEdge } from 'reactflow';
import type { Node, Edge, Connection } from 'reactflow';

// Node configuration payload; concrete fields depend on the node type
type NodeData = Record<string, unknown>;

// Simplified pipeline shape used for load/save
interface Pipeline {
  nodes: Node[];
  edges: Edge[];
}

interface PipelineStore {
  nodes: Node[];
  edges: Edge[];
  selectedNode: Node | null;
  // Node operations
  addNode: (node: Node) => void;
  updateNodeData: (nodeId: string, data: Partial<NodeData>) => void;
  removeNode: (nodeId: string) => void;
  // Edge operations
  connectNodes: (connection: Connection) => void;
  removeEdge: (edgeId: string) => void;
  // Selection
  selectNode: (node: Node | null) => void;
  // Persistence
  loadPipeline: (pipeline: Pipeline) => void;
  savePipeline: () => Pipeline;
}

const usePipelineStore = create<PipelineStore>((set, get) => ({
  nodes: [],
  edges: [],
  selectedNode: null,
  addNode: (node) => set((state) => ({
    nodes: [...state.nodes, node],
  })),
  updateNodeData: (nodeId, data) => set((state) => ({
    nodes: state.nodes.map((n) =>
      n.id === nodeId ? { ...n, data: { ...n.data, ...data } } : n
    ),
  })),
  removeNode: (nodeId) => set((state) => ({
    // Drop the node along with any edges attached to it
    nodes: state.nodes.filter((n) => n.id !== nodeId),
    edges: state.edges.filter((e) => e.source !== nodeId && e.target !== nodeId),
  })),
  connectNodes: (connection) => set((state) => ({
    edges: addEdge(connection, state.edges),
  })),
  removeEdge: (edgeId) => set((state) => ({
    edges: state.edges.filter((e) => e.id !== edgeId),
  })),
  selectNode: (node) => set({ selectedNode: node }),
  loadPipeline: (pipeline) => set({
    nodes: pipeline.nodes,
    edges: pipeline.edges,
  }),
  savePipeline: () => ({
    nodes: get().nodes,
    edges: get().edges,
  }),
}));
Delta Live Tables Execution
When a pipeline is executed, the visual definition is converted to DLT code and submitted to Databricks for processing.
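The converter below relies on two helpers, topologicalSort and getInputNode, that are referenced but not shown. A minimal sketch of both follows, assuming standard React Flow Node and Edge shapes; getInputNode takes the node list as an extra argument so it can resolve the upstream node object.

// Illustrative sketch: helpers assumed by convertToDLT below
import type { Node, Edge } from 'reactflow';

// Kahn's algorithm: order nodes so every node appears after its inputs
function topologicalSort(nodes: Node[], edges: Edge[]): Node[] {
  const inDegree = new Map(nodes.map((n) => [n.id, 0]));
  for (const e of edges) {
    inDegree.set(e.target, (inDegree.get(e.target) ?? 0) + 1);
  }
  const queue = nodes.filter((n) => inDegree.get(n.id) === 0);
  const sorted: Node[] = [];
  while (queue.length > 0) {
    const node = queue.shift()!;
    sorted.push(node);
    for (const e of edges.filter((edge) => edge.source === node.id)) {
      const remaining = (inDegree.get(e.target) ?? 1) - 1;
      inDegree.set(e.target, remaining);
      if (remaining === 0) {
        queue.push(nodes.find((n) => n.id === e.target)!);
      }
    }
  }
  return sorted;
}

// Returns the upstream node feeding the given node (single-input case)
function getInputNode(node: Node, edges: Edge[], nodes: Node[]): Node {
  const incoming = edges.find((e) => e.target === node.id);
  if (!incoming) throw new Error(`Node ${node.id} has no input`);
  return nodes.find((n) => n.id === incoming.source)!;
}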
// Convert visual pipeline to DLT code
const convertToDLT = (nodes: Node[], edges: Edge[]): string => {
  const dltCode: string[] = [];
  // Process nodes in topological order so each table is defined after its inputs
  const sortedNodes = topologicalSort(nodes, edges);
  for (const node of sortedNodes) {
    switch (node.type) {
      case 'source':
        dltCode.push(`
@dlt.table(name="${node.data.outputName}")
def ${node.id.replace(/-/g, '_')}():
    return spark.table("${node.data.catalog}.${node.data.schema}.${node.data.table}")
`);
        break;
      case 'filter': {
        const inputNode = getInputNode(node, edges, sortedNodes);
        dltCode.push(`
@dlt.table(name="${node.data.outputName}")
def ${node.id.replace(/-/g, '_')}():
    return dlt.read("${inputNode.data.outputName}").filter("${node.data.condition}")
`);
        break;
      }
      // ... more node types
    }
  }
  return dltCode.join('\n');
};
Enhancement Opportunities
The Pipeline Editor can be extended with additional features to improve productivity and enable more advanced use cases.
Version History
Track changes to pipeline definitions over time, with the ability to view diffs, compare versions, and roll back to previous states.
Real-time Collaboration
Enable multiple users to edit pipelines simultaneously with cursors, presence indicators, and conflict resolution.
Pipeline Templates
Provide pre-built templates for common patterns like ETL, CDC, medallion architecture, and ML feature pipelines.
Execution Scheduling
Schedule pipeline runs with cron expressions and dependencies, plus monitoring and alerts for failures.
Data Quality Rules
Add data quality expectations to nodes with automatic validation and alerting when constraints are violated.
Custom Node Types
Allow developers to create their own node types, with bespoke configuration UIs, for organization-specific transformations.