Data Catalog
A hierarchical browser for Unity Catalog that lets users explore catalogs, schemas, tables, and columns.
Overview
The Data Catalog provides a native React interface for browsing Unity Catalog metadata. Its hierarchical tree view loads data on demand as users expand nodes and caches results on the client, so revisiting a node does not trigger another request.
Like the SQL Editor, the Data Catalog is implemented as a native component rather than an iframe, enabling tight integration with other features like SQL autocomplete and pipeline node configuration. All API calls use the SSO-SPN authentication pattern.
Key Benefits
- Hierarchical navigation (Catalogs → Schemas → Tables → Columns)
- Lazy loading for fast initial render
- Client-side caching to minimize API calls
- Integration with SQL Editor for autocomplete
- Support for BYOD (Bring Your Own Data) Delta Sharing catalogs
How It Works
The Data Catalog fetches Unity Catalog metadata through Next.js API routes. Each level of the hierarchy has a dedicated endpoint, and metadata is loaded on demand when users expand tree nodes.
Architecture
Lazy Loading Strategy
The tree view uses a lazy loading strategy to ensure fast initial render and efficient API usage:
- Initial load: Only top-level catalogs are fetched
- Expand catalog: Schemas for that catalog are fetched
- Expand schema: Tables in that schema are fetched
- Select table: Column details are fetched and displayed
- Cache hit: If data is already cached, no API call is made
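The steps above amount to a keyed cache in front of each fetch. A minimal sketch follows, assuming a hypothetical `LazyCache` helper and a hypothetical key format such as `schemas:<catalog>`; neither is taken from the Data Catalog source.

```typescript
// Hypothetical helper: remembers the result of a loader per key, so that
// expanding the same tree node twice triggers only one fetch.
class LazyCache<T> {
  private entries = new Map<string, Promise<T>>();

  // Returns the cached promise on a hit; otherwise runs the loader once.
  load(key: string, loader: () => Promise<T>): Promise<T> {
    const cached = this.entries.get(key);
    if (cached) return cached;
    const pending = loader().catch((err) => {
      // Drop failed loads so a later expand can retry.
      this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }

  // True when a load for this key has been started or completed.
  has(key: string): boolean {
    return this.entries.has(key);
  }
}
```

Storing the promise rather than the resolved value means two near-simultaneous expands of the same node share one request.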
User Experience
The Data Catalog features a two-panel interface with tree navigation on the left and detailed metadata display on the right.
Features
- Hierarchical tree view (Catalogs → Schemas → Tables → Columns)
- Expand/collapse nodes with click or keyboard navigation
- Detailed metadata panel showing column names, types, and descriptions
- Two view modes: compact (editor sidebar) and full (dedicated page)
- Client-side caching to prevent redundant API calls
- Search/filter within each level of the hierarchy
- Quick copy of fully-qualified table names
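As an illustration of the quick-copy feature, the sketch below builds a three-level name from its parts. The function names are hypothetical, and the quoting rule (backticks around anything that is not a plain alphanumeric/underscore name, with embedded backticks doubled) is an assumption about how the copied name should look, not documented behavior.

```typescript
// Wrap an identifier in backticks unless it is a plain name.
// A backtick inside the name is escaped by doubling it.
function quoteIdentifier(name: string): string {
  if (/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) return name;
  return "`" + name.replace(/`/g, "``") + "`";
}

// Join catalog, schema, and table into catalog.schema.table.
function qualifiedTableName(catalog: string, schema: string, table: string): string {
  return [catalog, schema, table].map(quoteIdentifier).join(".");
}
```

In the browser, the resulting string could be passed to `navigator.clipboard.writeText`.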
Metadata Display
When a table is selected, the metadata panel shows detailed information:
Table Information
- Name: Full three-level name (catalog.schema.table)
- Type: MANAGED, EXTERNAL, or VIEW
- Format: DELTA, PARQUET, CSV, etc.
- Location: Storage path (for external tables)
- Owner: User or group that owns the table
- Comment: Table description if provided
Column Details
- Name: Column identifier
- Type: Data type (STRING, INT, TIMESTAMP, etc.)
- Nullable: Whether NULL values are allowed
- Comment: Column description if provided
- Partition: Whether column is a partition key
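The column fields listed above could be derived from the API payload along these lines. This is a sketch under assumptions: `ApiColumn` lists only the fields used here, with names following the Unity Catalog column payload as commonly documented (`type_text`, `partition_index`), and `ColumnRow` and `toColumnRow` are hypothetical names. Verify the field names against the API reference before relying on them.

```typescript
// Subset of the column payload used here (assumed field names).
interface ApiColumn {
  name: string;
  type_text: string;
  nullable?: boolean;
  comment?: string;
  partition_index?: number;
}

// Display model for one row of the Column Details list (hypothetical).
interface ColumnRow {
  name: string;
  type: string;
  nullable: boolean;
  comment: string;
  isPartition: boolean;
}

function toColumnRow(col: ApiColumn): ColumnRow {
  return {
    name: col.name,
    type: col.type_text.toUpperCase(),
    // Assumption: a missing flag is treated as nullable.
    nullable: col.nullable ?? true,
    comment: col.comment ?? "",
    // Assumption: a column is a partition key when it carries a partition index.
    isPartition: col.partition_index !== undefined,
  };
}
```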
Backend Architecture
The Data Catalog calls Unity Catalog REST APIs through Next.js API routes, with one route per level of the hierarchy. The table below maps each route to the Databricks endpoint it calls.
API Routes
| Route | Description | Databricks API |
|---|---|---|
| /api/databricks/unity-catalog/catalogs | List all accessible catalogs | GET /api/2.1/unity-catalog/catalogs |
| /api/databricks/unity-catalog/schemas | List schemas in a catalog | GET /api/2.1/unity-catalog/schemas |
| /api/databricks/unity-catalog/tables | List tables in a schema | GET /api/2.1/unity-catalog/tables |
| /api/databricks/unity-catalog/table-details | Get column details for a table | GET /api/2.1/unity-catalog/tables/{full_name} |
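A client could build these route URLs with a small helper like the one sketched here. `buildRoute` and `CatalogLevel` are hypothetical names. Only the `catalog_name` parameter on the schemas route appears in this document; any other query parameter names (for example `schema_name`) are assumptions.

```typescript
type CatalogLevel = "catalogs" | "schemas" | "tables" | "table-details";

const BASE = "/api/databricks/unity-catalog";

// Build a route URL, URL-encoding every query parameter.
function buildRoute(level: CatalogLevel, params: Record<string, string> = {}): string {
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
  return query ? `${BASE}/${level}?${query}` : `${BASE}/${level}`;
}
```

For example, `buildRoute("schemas", { catalog_name: "main" })` yields the schemas URL for the `main` catalog.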
Client-Side Caching
The Data Catalog maintains a client-side cache to avoid redundant API calls. This cache is shared across components, enabling the SQL Editor to use cached metadata for autocomplete.
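// Catalog metadata cache structure.
// Illustrative shape only: TypeScript requires the fixed keys (schemas, tables)
// to be compatible with the sibling index signatures, so a real declaration
// would need a union type or separate maps.
interface CatalogMetadataCache {
  [catalogName: string]: {
    schemas?: Schema[];
    [schemaName: string]: {
      tables?: Table[];
      [tableName: string]: {
        columns?: Column[];
      };
    };
  };
}

// Example: Load schemas with caching
const handleExpandCatalog = async (catalogName: string) => {
  // Check cache first
  if (catalogCache[catalogName]?.schemas) {
    return; // Already loaded
  }
  // Fetch from API (encode the name so special characters survive the query string)
  const response = await fetch(
    `/api/databricks/unity-catalog/schemas?catalog_name=${encodeURIComponent(catalogName)}`
  );
  if (!response.ok) {
    throw new Error(`Failed to load schemas for ${catalogName}: ${response.status}`);
  }
  const { schemas } = await response.json();
  // Update cache
  setCatalogCache(prev => ({
    ...prev,
    [catalogName]: { ...prev[catalogName], schemas }
  }));
};
BYOD Integration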
For organizations using Bring Your Own Data (BYOD), the Data Catalog supports Delta Sharing catalogs. These are external catalogs shared from other Databricks workspaces or providers.
BYOD Catalog Flow
- Organization admin configures Delta Sharing provider in settings
- System validates provider credentials and discovers available shares
- Admin selects which catalogs to mount for their organization
- Users see shared catalogs in the Data Catalog alongside regular catalogs
- Queries against shared catalogs use Delta Sharing protocol
Catalog Validation
// POST /api/sso-spn/byod/databricks/catalogs
// Validates that Delta Sharing catalogs are properly configured
const validateCatalogs = async (orgId: string) => {
  // Uses global admin SPN to validate Delta Sharing providers
  const adminToken = await getGlobalAdminToken();
  // List providers and their shares
  const providers = await listDeltaSharingProviders(adminToken);
  // Assumption: getConfiguredCatalogs is a hypothetical helper that loads
  // the catalogs configured for this organization.
  const catalogs = await getConfiguredCatalogs(orgId);
  // Validate each configured catalog still exists
  const validCatalogs = catalogs.map((catalog) => {
    const exists = providers.some(
      p => p.sharingCode === catalog.providerCode &&
        p.shares.includes(catalog.shareName)
    );
    return { ...catalog, valid: exists };
  });
  // Update cache with validation status
  await updateCatalogCache(orgId, validCatalogs);
  return validCatalogs;
};
Enhancement Opportunities
The Data Catalog can be extended with additional features to improve data discovery and governance.
Full-Text Search
Add search across all catalog objects (catalogs, schemas, tables, columns) to quickly find data assets by name or description.
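One possible starting point is a case-insensitive substring match over metadata that is already cached on the client. The sketch below assumes the cache has been flattened into a list of entries; `SearchEntry` and `searchCatalog` are hypothetical names, and a real implementation might instead rely on a server-side index.

```typescript
// Flattened view of one catalog object (hypothetical shape).
interface SearchEntry {
  kind: "catalog" | "schema" | "table" | "column";
  fullName: string;
  comment?: string;
}

// Return entries whose name or description contains the query text.
function searchCatalog(entries: SearchEntry[], query: string): SearchEntry[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return entries.filter(
    (entry) =>
      entry.fullName.toLowerCase().indexOf(needle) !== -1 ||
      (entry.comment ?? "").toLowerCase().indexOf(needle) !== -1
  );
}
```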
Data Lineage
Visualize table dependencies and data flow using Unity Catalog lineage APIs to understand how data moves through pipelines.
Data Preview
Sample rows from selected tables directly in the catalog interface for quick data exploration without writing queries.
Permissions View
Display user/group permissions on catalog objects to help users understand their access and request additional permissions.
Data Quality
Show data quality metrics, freshness indicators, and validation status for tables to help users trust their data.
Favorites & Tags
Allow users to bookmark frequently-used tables and add custom tags for organization-specific categorization.