FireFly Analytics

Data Catalog

A hierarchical browser for Unity Catalog, allowing users to explore catalogs, schemas, tables, and columns with a modern, intuitive interface.

Overview

The Data Catalog provides a native React interface for browsing Unity Catalog metadata. Its hierarchical tree view loads data on demand as users expand nodes, and it caches results on the client so that revisiting a node does not trigger another request.

Like the SQL Editor, the Data Catalog is implemented as a native component rather than an iframe, enabling tight integration with other features like SQL autocomplete and pipeline node configuration. All API calls use the SSO-SPN authentication pattern.

Key Benefits

  • Hierarchical navigation (Catalogs → Schemas → Tables → Columns)
  • Lazy loading for fast initial render
  • Client-side caching to minimize API calls
  • Integration with SQL Editor for autocomplete
  • Support for BYOD (Bring Your Own Data) Delta Sharing catalogs

How It Works

The Data Catalog fetches Unity Catalog metadata through Next.js API routes. Each level of the hierarchy has a dedicated endpoint, and metadata is loaded on demand when users expand tree nodes.

Architecture

Lazy Loading Strategy

The tree view uses a lazy loading strategy to ensure fast initial render and efficient API usage:

  1. Initial load: Only top-level catalogs are fetched
  2. Expand catalog: Schemas for that catalog are fetched
  3. Expand schema: Tables in that schema are fetched
  4. Select table: Column details are fetched and displayed
  5. Cache hit: If data is already cached, no API call is made
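
The steps above can be sketched as a small loader that fetches one level at a time and remembers what it has already seen. This is a minimal illustration, not the component's actual code: `createLazyTree`, `loadChildren`, and the `FetchLevel` callback are hypothetical names, and the callback stands in for whatever calls the API routes.

```typescript
// Hypothetical names throughout; the component's real types are not shown here.
type Level = "catalogs" | "schemas" | "tables" | "columns";
const LEVELS: Level[] = ["catalogs", "schemas", "tables", "columns"];

// Fetches the children at `level` under `parentPath`,
// e.g. ("tables", ["main", "sales"]).
type FetchLevel = (level: Level, parentPath: string[]) => Promise<string[]>;

function createLazyTree(fetchLevel: FetchLevel) {
  const cache = new Map<string, string[]>();

  // Returns the children of a node, calling the API only on a cache miss.
  async function loadChildren(parentPath: string[]): Promise<string[]> {
    // Path length selects the level: [] -> catalogs, [catalog] -> schemas, ...
    const level = LEVELS[parentPath.length];
    if (level === undefined) {
      throw new Error("Columns are leaf nodes and have no children");
    }

    const key = JSON.stringify(parentPath); // unambiguous even if names contain dots
    const cached = cache.get(key);
    if (cached !== undefined) return cached; // step 5: cache hit, no API call

    const children = await fetchLevel(level, parentPath);
    cache.set(key, children);
    return children;
  }

  return { loadChildren };
}
```

Calling `loadChildren([])` corresponds to the initial load, and `loadChildren(["main"])` to expanding a catalog. Two overlapping calls for the same uncached node would both reach the API in this sketch; caching the in-flight promise instead of the result would avoid that.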

User Experience

The Data Catalog features a two-panel interface with tree navigation on the left and detailed metadata display on the right.

Features

  • Hierarchical tree view (Catalogs → Schemas → Tables → Columns)
  • Expand/collapse nodes with click or keyboard navigation
  • Detailed metadata panel showing column names, types, and descriptions
  • Two view modes: compact (editor sidebar) and full (dedicated page)
  • Client-side caching to prevent redundant API calls
  • Search/filter within each level of the hierarchy
  • Quick copy of fully-qualified table names
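
Two of these features, per-level filtering and copying a fully-qualified name, are simple enough to sketch as pure functions. The function names are hypothetical, and whether the real component quotes identifiers is an assumption; the backtick quoting shown follows the Databricks SQL rule for delimited identifiers.

```typescript
// Quote a name with backticks only when it is not a plain identifier.
// An embedded backtick is escaped by doubling it.
function quoteIdentifier(name: string): string {
  if (/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) return name;
  return "`" + name.replace(/`/g, "``") + "`";
}

// Build the three-level name that a "copy" action might place on the clipboard.
function fullyQualifiedName(catalog: string, schema: string, table: string): string {
  return [catalog, schema, table].map(quoteIdentifier).join(".");
}

// Case-insensitive substring filter over the names at one level of the tree.
function filterNodes(names: string[], query: string): string[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return names;
  return names.filter((name) => name.toLowerCase().includes(needle));
}
```

For example, `fullyQualifiedName("main", "my-schema", "orders")` quotes only the middle segment, because the hyphen makes it an invalid plain identifier.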

Metadata Display

When a table is selected, the metadata panel shows detailed information:

Table Information

  • Name: Full three-level name (catalog.schema.table)
  • Type: MANAGED, EXTERNAL, or VIEW
  • Format: DELTA, PARQUET, CSV, etc.
  • Location: Storage path (for external tables)
  • Owner: User or group that owns the table
  • Comment: Table description if provided

Column Details

  • Name: Column identifier
  • Type: Data type (STRING, INT, TIMESTAMP, etc.)
  • Nullable: Whether NULL values are allowed
  • Comment: Column description if provided
  • Partition: Whether column is a partition key
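
The fields listed above map naturally onto two record types. The sketch below is illustrative only: the interface and property names are assumptions, not the shapes returned by the API routes, and `describeColumn` is a hypothetical helper showing how one row of the panel might be rendered as text.

```typescript
// Assumed shapes mirroring the "Table Information" and "Column Details" lists.
interface TableInfo {
  fullName: string; // catalog.schema.table
  tableType: "MANAGED" | "EXTERNAL" | "VIEW";
  format?: string; // DELTA, PARQUET, CSV, etc.
  location?: string; // storage path, external tables only
  owner?: string;
  comment?: string;
}

interface ColumnInfo {
  name: string;
  type: string; // STRING, INT, TIMESTAMP, etc.
  nullable: boolean;
  comment?: string;
  isPartitionKey: boolean;
}

// Render one column as a single line for a compact view.
function describeColumn(col: ColumnInfo): string {
  const parts = [col.name, col.type, col.nullable ? "NULL" : "NOT NULL"];
  if (col.isPartitionKey) parts.push("(partition key)");
  if (col.comment) parts.push(`-- ${col.comment}`);
  return parts.join(" ");
}
```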

Backend Architecture

The Data Catalog calls the Unity Catalog REST APIs through Next.js API routes, with one dedicated route per level of the hierarchy. The table below maps each route to the Databricks API it calls.

API Routes

Route | Description | Databricks API
/api/databricks/unity-catalog/catalogs | List all accessible catalogs | GET /api/2.1/unity-catalog/catalogs
/api/databricks/unity-catalog/schemas | List schemas in a catalog | GET /api/2.1/unity-catalog/schemas
/api/databricks/unity-catalog/tables | List tables in a schema | GET /api/2.1/unity-catalog/tables
/api/databricks/unity-catalog/table-details | Get column details for a table | GET /api/2.1/unity-catalog/tables/{name}
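
To make the mapping concrete, the sketch below builds the upstream Databricks URL for each route. It is an assumption about how a route handler might translate requests, not the actual handler code: `buildUpstreamUrl` and its parameter object are hypothetical, while `catalog_name` and `schema_name` are the query parameters the Unity Catalog list endpoints expect.

```typescript
type CatalogRoute = "catalogs" | "schemas" | "tables" | "table-details";

interface RouteParams {
  catalog?: string;
  schema?: string;
  table?: string;
}

// Throw early when a route is called without a parameter it depends on.
function required(value: string | undefined, name: string): string {
  if (!value) throw new Error(`Missing required parameter: ${name}`);
  return value;
}

// Translate a local route plus parameters into the Databricks REST URL.
function buildUpstreamUrl(workspaceHost: string, route: CatalogRoute, params: RouteParams): string {
  const base = `${workspaceHost.replace(/\/+$/, "")}/api/2.1/unity-catalog`;
  switch (route) {
    case "catalogs":
      return `${base}/catalogs`;
    case "schemas": {
      const query = new URLSearchParams({ catalog_name: required(params.catalog, "catalog") });
      return `${base}/schemas?${query.toString()}`;
    }
    case "tables": {
      const query = new URLSearchParams({
        catalog_name: required(params.catalog, "catalog"),
        schema_name: required(params.schema, "schema"),
      });
      return `${base}/tables?${query.toString()}`;
    }
    case "table-details": {
      // The table is addressed by its three-level name in the URL path.
      const fullName = [
        required(params.catalog, "catalog"),
        required(params.schema, "schema"),
        required(params.table, "table"),
      ].join(".");
      return `${base}/tables/${encodeURIComponent(fullName)}`;
    }
  }
}
```

Keeping the URL construction in a pure function like this makes it testable without a workspace or credentials; authentication headers would be added separately by the caller.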

Client-Side Caching

The Data Catalog maintains a client-side cache to avoid redundant API calls. This cache is shared across components, enabling the SQL Editor to use cached metadata for autocomplete.

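// Catalog metadata cache structure.
// Named fields and index signatures live in separate objects: TypeScript
// rejects `schemas?: Schema[]` next to a string index signature whose value
// type is an object. The `schemaEntries` and `tableEntries` field names are
// illustrative.
interface TableCacheEntry {
  columns?: Column[];
}

interface SchemaCacheEntry {
  tables?: Table[];
  tableEntries?: { [tableName: string]: TableCacheEntry };
}

interface CatalogCacheEntry {
  schemas?: Schema[];
  schemaEntries?: { [schemaName: string]: SchemaCacheEntry };
}

interface CatalogMetadataCache {
  [catalogName: string]: CatalogCacheEntry;
}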

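// Example: Load schemas with caching
const handleExpandCatalog = async (catalogName: string) => {
  // Check cache first
  if (catalogCache[catalogName]?.schemas) {
    return; // Already loaded
  }

  // Fetch from API; encode the name so special characters survive the query string
  const response = await fetch(
    `/api/databricks/unity-catalog/schemas?catalog_name=${encodeURIComponent(catalogName)}`
  );
  if (!response.ok) {
    // Do not cache a failed response; surface the error to the caller
    throw new Error(`Failed to load schemas for ${catalogName}: ${response.status}`);
  }
  const { schemas } = await response.json();

  // Update cache, preserving anything already stored for this catalog
  setCatalogCache(prev => ({
    ...prev,
    [catalogName]: { ...prev[catalogName], schemas }
  }));
};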
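
Because the cache is shared with the SQL Editor, autocomplete can be answered from it without further API calls. The sketch below shows one way that lookup might work for a dotted name typed in the editor. It uses a simplified cache shape of its own (`MetadataCache`, holding names only) and a hypothetical `suggest` function; neither is taken from the actual implementation.

```typescript
// Simplified, assumed cache shape: names only, nested by level.
interface CachedSchema {
  tables?: string[];
}
interface CachedCatalog {
  schemas?: Record<string, CachedSchema>;
}
type MetadataCache = Record<string, CachedCatalog>;

// Suggest completions for the last segment of a dotted name such as "main.sa".
// Only metadata already in the cache is considered.
function suggest(cache: MetadataCache, typed: string): string[] {
  const parts = typed.split(".");
  const [catalog = "", schema = ""] = parts;
  const prefix = (parts[parts.length - 1] ?? "").toLowerCase();

  let candidates: string[] = [];
  if (parts.length === 1) {
    candidates = Object.keys(cache);
  } else if (parts.length === 2) {
    candidates = Object.keys(cache[catalog]?.schemas ?? {});
  } else if (parts.length === 3) {
    candidates = cache[catalog]?.schemas?.[schema]?.tables ?? [];
  }

  return candidates
    .filter((name) => name.toLowerCase().startsWith(prefix))
    .sort();
}
```

A schema that has not been expanded yet yields no table suggestions here; an editor integration could trigger a fetch at that point instead of returning an empty list.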

BYOD Integration

For organizations using Bring Your Own Data (BYOD), the Data Catalog supports Delta Sharing catalogs. These are external catalogs shared from other Databricks workspaces or providers.

BYOD Catalog Flow

  1. The organization admin configures a Delta Sharing provider in settings
  2. The system validates the provider credentials and discovers the available shares
  3. The admin selects which catalogs to mount for their organization
  4. Users see the shared catalogs in the Data Catalog alongside regular catalogs
  5. Queries against shared catalogs use the Delta Sharing protocol

Catalog Validation

// POST /api/sso-spn/byod/databricks/catalogs
// Validates that Delta Sharing catalogs are properly configured

// `catalogs` is the organization's configured BYOD catalog list. The original
// snippet used it without defining it, so it is passed in explicitly here
// (an assumption about where it comes from).
const validateCatalogs = async (
  orgId: string,
  catalogs: { providerCode: string; shareName: string }[]
) => {
  // Uses global admin SPN to validate Delta Sharing providers
  const adminToken = await getGlobalAdminToken();

  // List providers and their shares
  const providers = await listDeltaSharingProviders(adminToken);

  // Mark each configured catalog as valid only if its provider still exposes
  // the share. The check is synchronous, so Promise.all is not needed.
  const validCatalogs = catalogs.map((catalog) => {
    const exists = providers.some(
      p => p.sharingCode === catalog.providerCode &&
           p.shares.includes(catalog.shareName)
    );
    return { ...catalog, valid: exists };
  });

  // Update cache with validation status
  await updateCatalogCache(orgId, validCatalogs);

  return validCatalogs;
};
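
The existence check at the core of this validation can be isolated as a pure function, which makes it easy to exercise without an admin token or network access. The field names (`sharingCode`, `shares`, `providerCode`, `shareName`) come from the snippet above; the `Provider` and `ConfiguredCatalog` interfaces and the `markValidCatalogs` name are assumptions made for this sketch.

```typescript
interface Provider {
  sharingCode: string;
  shares: string[];
}

interface ConfiguredCatalog {
  providerCode: string;
  shareName: string;
}

// A catalog is valid when some provider with a matching code still lists its share.
function markValidCatalogs(
  catalogs: ConfiguredCatalog[],
  providers: Provider[]
): (ConfiguredCatalog & { valid: boolean })[] {
  return catalogs.map((catalog) => ({
    ...catalog,
    valid: providers.some(
      (p) => p.sharingCode === catalog.providerCode && p.shares.includes(catalog.shareName)
    ),
  }));
}

// Sample data: the second catalog points at a share the provider no longer exposes.
const sampleProviders: Provider[] = [{ sharingCode: "acme", shares: ["sales_share"] }];
const sampleCatalogs: ConfiguredCatalog[] = [
  { providerCode: "acme", shareName: "sales_share" },
  { providerCode: "acme", shareName: "retired_share" },
];
const sampleResult = markValidCatalogs(sampleCatalogs, sampleProviders);
```

With this sample data, the first entry in `sampleResult` is marked valid and the second is not.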

Enhancement Opportunities

The Data Catalog can be extended with additional features to improve data discovery and governance.

Full-Text Search

Add search across all catalog objects (catalogs, schemas, tables, columns) to quickly find data assets by name or description.
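
One possible starting point, sketched under the assumption that catalog objects have already been flattened into a list of entries, is a simple in-memory match on name and description. `SearchEntry` and `searchCatalog` are hypothetical; a production search would more likely run server-side against an index.

```typescript
interface SearchEntry {
  kind: "catalog" | "schema" | "table" | "column";
  path: string; // dotted name, e.g. main.sales.orders
  description?: string;
}

// Return entries whose path or description contains every whitespace-separated term.
function searchCatalog(entries: SearchEntry[], query: string): SearchEntry[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return entries.filter((entry) => {
    const haystack = `${entry.path} ${entry.description ?? ""}`.toLowerCase();
    return terms.every((term) => haystack.includes(term));
  });
}
```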

Data Lineage

Visualize table dependencies and data flow using Unity Catalog lineage APIs to understand how data moves through pipelines.

Data Preview

Show sample rows from selected tables directly in the catalog interface for quick data exploration without writing queries.

Permissions View

Display user/group permissions on catalog objects to help users understand their access and request additional permissions.

Data Quality

Show data quality metrics, freshness indicators, and validation status for tables to help users trust their data.

Favorites & Tags

Allow users to bookmark frequently-used tables and add custom tags for organization-specific categorization.