Security Architecture

A guide to the FireFly Analytics security architecture, covering authentication, encryption, multi-tenant isolation, access control, and audit trails.

Overview

Security is a foundational principle of FireFly Analytics. The platform implements defense-in-depth with multiple security layers protecting user data, credentials, and system integrity. The SSO-SPN architecture inherently provides security benefits by separating user identity from Databricks access.

This document covers the key security mechanisms, from authentication and encryption to multi-tenant isolation and comprehensive audit trails.

Security Highlights

  • OAuth 2.0 + PKCE: Industry-standard authentication with proof key for code exchange
  • AES-256-GCM: Authenticated encryption (confidentiality and integrity) for sensitive data at rest
  • TLS 1.3: All data encrypted in transit with modern cryptographic protocols
  • Multi-tenant isolation: Complete data separation between organizations
  • Unity Catalog permissions: Fine-grained access control at the data layer
  • Comprehensive audit trails: Full traceability from user action to data access

Security Layers Overview

FireFly implements four distinct security layers that work together to protect the platform and its data:

Authentication Layer

SSO/OIDC integration, session management, and OAuth PKCE ensure only verified users can access the platform.

Encryption Layer

TLS 1.3 for data in transit, AES-256-GCM for data at rest, and secure key management protect sensitive information.

Access Control Layer

Role-based access control, organization isolation, and Unity Catalog permissions enforce least-privilege access.

Audit Layer

Comprehensive logging at application and Databricks levels provides full traceability for compliance.

Authentication Security

FireFly uses a two-layer authentication model that separates user identity from Databricks access. Users authenticate via your organization's identity provider (Okta, Azure AD, Auth0), while Databricks access uses organization-specific Service Principals.

Authentication Flow

The following diagram shows the complete authentication flow, from initial SSO login through session creation and API access:

OAuth 2.0 with PKCE

PKCE (Proof Key for Code Exchange) prevents authorization code interception attacks, which matters most for public clients such as browser-based applications:

1. Code Verifier Generation

A cryptographically random 43-128 character string is generated client-side and kept private until the token exchange. It is not included in the authorization request.

2. Code Challenge Creation

The code challenge is a SHA-256 hash of the verifier, base64url encoded. This is sent to the authorization server.

3. Token Exchange

When exchanging the authorization code for tokens, the original verifier is sent. The server hashes it and compares to the stored challenge.

4. Attack Prevention

Even if an attacker intercepts the authorization code, they cannot exchange it without the original code verifier.
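
The four steps above can be sketched with Node's built-in crypto module. This is a minimal illustration of the S256 method from RFC 7636, not FireFly's actual implementation; the function names are hypothetical.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Step 1: 32 random bytes encode to a 43-character base64url verifier.
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// Step 2: challenge = BASE64URL(SHA-256(verifier)), i.e. the "S256" method.
function deriveCodeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Step 3 (authorization server side): recompute the challenge from the
// presented verifier and compare it with the stored one in constant time.
function verifierMatchesChallenge(verifier: string, storedChallenge: string): boolean {
  const recomputed = Buffer.from(deriveCodeChallenge(verifier));
  const stored = Buffer.from(storedChallenge);
  return recomputed.length === stored.length && timingSafeEqual(recomputed, stored);
}
```

The client sends only the challenge with the authorization request and presents the verifier with the token request, so an intercepted authorization code alone is not enough to obtain tokens.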

Session Security

Sessions are managed with multiple security controls:

HttpOnly

Session cookies cannot be read by JavaScript, so an XSS flaw cannot be used to steal the session token

Secure

Cookies are only transmitted over HTTPS, so the session token is never sent in cleartext

SameSite=Lax

Withholds the cookie on cross-site subrequests and form POSTs, mitigating CSRF while still allowing top-level navigation

32-byte Token

Cryptographically random session identifier with 256 bits of entropy

30-day Expiry

Sessions automatically expire after 30 days of inactivity
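
As a rough sketch, the controls above could translate into a session token and a Set-Cookie header value like the following. The cookie name `firefly_session` and both function names are assumptions made for illustration, not taken from the FireFly codebase.

```typescript
import { randomBytes } from "node:crypto";

const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// 32 random bytes = 256 bits of entropy, hex-encoded to 64 characters.
function newSessionToken(): string {
  return randomBytes(32).toString("hex");
}

// Builds the Set-Cookie header value with the attributes listed above.
// The default cookie name is hypothetical.
function buildSessionCookie(token: string, name = "firefly_session"): string {
  return [
    `${name}=${token}`,
    "Path=/",
    "HttpOnly",
    "Secure",
    "SameSite=Lax",
    `Max-Age=${THIRTY_DAYS_SECONDS}`,
  ].join("; ");
}
```

Because Max-Age counts from the moment the cookie is set, an inactivity-based expiry like the one described would require reissuing the cookie on each authenticated request, or tracking last activity server-side.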

Session Metadata

Each session stores security metadata for anomaly detection:

  • IP Address: Client IP for geographic validation
  • User Agent: Browser/device fingerprinting
  • Created At: Session creation timestamp
  • Last Activity: Most recent request timestamp
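
One way to use this metadata is sketched below: an inactivity check against Last Activity, and a comparison of the stored IP address and user agent with the current request. The type and function names are hypothetical, and how a mismatch is handled (alert, re-authentication, termination) is a policy decision this sketch leaves open.

```typescript
interface SessionRecord {
  ipAddress: string;
  userAgent: string;
  createdAt: Date;
  lastActivity: Date;
}

const INACTIVITY_LIMIT_MS = 30 * 24 * 60 * 60 * 1000;

// True when more than 30 days have passed since the last request.
function isExpired(session: SessionRecord, now: Date): boolean {
  return now.getTime() - session.lastActivity.getTime() > INACTIVITY_LIMIT_MS;
}

// Returns the names of metadata fields that differ from the stored session.
function changedMetadata(
  session: SessionRecord,
  request: { ipAddress: string; userAgent: string },
): string[] {
  const changed: string[] = [];
  if (session.ipAddress !== request.ipAddress) changed.push("ipAddress");
  if (session.userAgent !== request.userAgent) changed.push("userAgent");
  return changed;
}
```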

Data Encryption

All sensitive data is encrypted both in transit and at rest using industry-standard cryptographic algorithms.

Encryption in Transit

All network communication uses TLS 1.3, the current version of the TLS protocol:

TLS 1.3 Features

  • Reduced handshake latency (1-RTT)
  • Forward secrecy by default
  • Removed legacy cipher suites
  • Encrypted handshake messages

Protected Channels

  • Browser to Next.js (HTTPS)
  • Next.js to PostgreSQL (TLS)
  • Next.js to Databricks (HTTPS)
  • Go Proxy to Databricks (HTTPS)
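
For outbound calls made through Node's `https` module, a TLS 1.3 floor can be enforced on the client side as well. The snippet below is a minimal sketch; the variable names are hypothetical, and it does not cover the PostgreSQL driver or the Go proxy, which take their own TLS settings.

```typescript
import { Agent } from "node:https";

// Hypothetical shared options: refuse to negotiate anything older than TLS 1.3.
const strictTlsOptions = { minVersion: "TLSv1.3" as const };

// Agent to pass to https.request / https.get for outbound calls.
const strictTlsAgent = new Agent(strictTlsOptions);
```

A request made with `{ agent: strictTlsAgent }` fails the handshake if the server cannot speak TLS 1.3, rather than silently falling back to an older version.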

Encryption at Rest

All sensitive data stored in PostgreSQL is encrypted using AES-256-GCM before storage:

AES-256-GCM Details

  • Algorithm: AES (Advanced Encryption Standard)
  • Key Size: 256 bits (32 bytes)
  • Mode: GCM (Galois/Counter Mode)
  • IV Size: 96 bits (12 bytes, unique per encryption)
  • Auth Tag: 128 bits (16 bytes)
  • Property: Authenticated encryption (confidentiality + integrity)
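
The parameters above map directly onto Node's crypto API. The following is a minimal sketch, assuming a payload layout of IV, then auth tag, then ciphertext, base64-encoded; that layout and the function names are illustrative choices, not necessarily what FireFly's `token-encryption` module does.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // 96-bit IV, freshly generated for every encryption
const TAG_BYTES = 16; // 128-bit authentication tag

function encryptValue(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Assumed layout: IV || tag || ciphertext
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptValue(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match, i.e. the data was modified.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, a modified ciphertext fails at decryption instead of yielding corrupted plaintext, which is the integrity property listed above.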

What Gets Encrypted

Service Principal Credentials

Client ID and Client Secret for Databricks SPNs

OAuth Access Tokens

Bearer tokens for Databricks API access

OAuth Refresh Tokens

Long-lived tokens for obtaining new access tokens

Key Management

  • Encryption keys stored in environment variables, never in code
  • Keys should be rotated periodically (recommended: quarterly)
  • Production keys different from development keys
  • Key access logged for security auditing
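
A small startup check can catch a missing or malformed key early. The sketch below assumes the key is supplied base64-encoded in a variable named `FIREFLY_ENCRYPTION_KEY`; that name and the encoding are assumptions, since the source does not specify either.

```typescript
// Reads a base64-encoded 256-bit key from the given environment map.
// The default variable name is hypothetical.
function loadEncryptionKey(
  env: Record<string, string | undefined>,
  name = "FIREFLY_ENCRYPTION_KEY",
): Buffer {
  const encoded = env[name];
  if (!encoded) {
    throw new Error(`${name} is not set`);
  }
  const key = Buffer.from(encoded, "base64");
  if (key.length !== 32) {
    throw new Error(`${name} must decode to 32 bytes, got ${key.length}`);
  }
  return key;
}
```

Calling `loadEncryptionKey(process.env)` once at startup makes a misconfigured environment fail fast instead of surfacing later as a decryption error.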

Multi-Tenant Isolation

FireFly is designed as a multi-tenant platform where multiple organizations share the same infrastructure while maintaining complete data isolation. This is achieved through a combination of application-level controls and Databricks platform features.

Isolation Architecture

Isolation Layers

1. Session Isolation

Each user session is bound to a specific organization. The session context includes the organization ID, and all database queries are filtered by this context. Users cannot access data outside their active organization.

2. Database Isolation

All organization data is stored with an organization ID foreign key. Database queries automatically filter by the session's organization context, preventing cross-organization data leakage.

3. Service Principal Isolation

Each organization has its own Databricks Service Principal with specific Unity Catalog permissions. Organizations cannot access data outside their assigned catalogs.

4. Unity Catalog Isolation

At the Databricks level, Unity Catalog enforces data access based on SPN permissions. Even if application-level controls fail, Databricks prevents unauthorized access.
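
The session and database layers can be illustrated with an in-memory sketch in which every read takes the session context and filters on its organization ID. The types and function names here are hypothetical; a real implementation would apply the same filter in the SQL `WHERE` clause.

```typescript
interface SessionContext {
  userId: string;
  organizationId: string;
}

interface OrgOwnedRow {
  id: string;
  organizationId: string;
}

// Every read goes through the session's organization, never a caller-supplied one.
function listForOrg<T extends OrgOwnedRow>(rows: T[], session: SessionContext): T[] {
  return rows.filter((row) => row.organizationId === session.organizationId);
}

// A lookup by ID still applies the organization filter, so guessing another
// organization's row ID returns nothing.
function getForOrg<T extends OrgOwnedRow>(
  rows: T[],
  session: SessionContext,
  id: string,
): T | undefined {
  return listForOrg(rows, session).find((row) => row.id === id);
}
```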

Defense in Depth

Multiple isolation layers ensure that a failure in one layer is caught by another:

  • Application bug: Unity Catalog still enforces SPN permissions
  • Stolen session: Session bound to specific organization
  • Compromised SPN: Limited to assigned catalogs only
  • Network breach: All data encrypted at rest

Access Control

Access control in FireFly operates at two levels: application-level role-based access control (RBAC) and data-level Unity Catalog permissions.

Application RBAC

FireFly supports three roles within each organization:

Owner

Full administrative control

  • Manage organization settings
  • Add/remove members
  • Configure SPN credentials
  • Delete organization

Admin

Administrative access

  • Manage members
  • View audit logs
  • Configure settings
  • Cannot delete org

Member

Standard user access

  • Browse catalogs
  • Execute queries
  • Use applications
  • No admin functions
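
The role descriptions above can be expressed as a simple permission map. This sketch assumes roles are cumulative (Owner includes Admin, Admin includes Member) and that only Owners configure SPN credentials, since the source lists that capability under Owner alone; the action names are hypothetical.

```typescript
type Role = "owner" | "admin" | "member";

type Action =
  | "browseCatalogs"
  | "executeQueries"
  | "useApplications"
  | "manageMembers"
  | "viewAuditLogs"
  | "manageSettings"
  | "configureSpn"
  | "deleteOrganization";

const MEMBER_ACTIONS: Action[] = ["browseCatalogs", "executeQueries", "useApplications"];
const ADMIN_ACTIONS: Action[] = [...MEMBER_ACTIONS, "manageMembers", "viewAuditLogs", "manageSettings"];
const OWNER_ACTIONS: Action[] = [...ADMIN_ACTIONS, "configureSpn", "deleteOrganization"];

const ROLE_ACTIONS: Record<Role, Action[]> = {
  owner: OWNER_ACTIONS,
  admin: ADMIN_ACTIONS,
  member: MEMBER_ACTIONS,
};

// Returns true when the role is allowed to perform the action.
function can(role: Role, action: Action): boolean {
  return ROLE_ACTIONS[role].includes(action);
}
```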

Unity Catalog Permissions

At the data layer, Unity Catalog provides fine-grained access control:

Permission Hierarchy

Permissions flow down the Unity Catalog hierarchy:

Catalog → Schema → Table/Volume

Common Permissions

  • USE_CATALOG: Access catalog metadata
  • USE_SCHEMA: Access schema metadata
  • SELECT: Read table data
  • MODIFY: Insert/update/delete data
  • READ_VOLUME: Read files from volume
  • WRITE_VOLUME: Write files to volume
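
To make the hierarchy concrete, the sketch below models grants in memory: a privilege granted on a parent securable applies to its children, and reading a table needs USE_CATALOG, USE_SCHEMA, and SELECT together. It is a simplified model for illustration only; Unity Catalog performs the real enforcement, and the types and functions here are hypothetical.

```typescript
type Privilege =
  | "USE_CATALOG"
  | "USE_SCHEMA"
  | "SELECT"
  | "MODIFY"
  | "READ_VOLUME"
  | "WRITE_VOLUME";

interface Grant {
  principal: string;
  securable: string; // "catalog", "catalog.schema", or "catalog.schema.table"
  privilege: Privilege;
}

// True if the privilege is granted on the securable itself or on any parent.
function hasPrivilege(grants: Grant[], principal: string, securable: string, privilege: Privilege): boolean {
  const parts = securable.split(".");
  for (let depth = parts.length; depth >= 1; depth--) {
    const scope = parts.slice(0, depth).join(".");
    if (grants.some((g) => g.principal === principal && g.securable === scope && g.privilege === privilege)) {
      return true;
    }
  }
  return false;
}

// Reading a table requires all three privileges along the path to it.
function canSelect(grants: Grant[], principal: string, table: string): boolean {
  const parts = table.split(".");
  if (parts.length !== 3) return false;
  const catalog = parts[0];
  const schema = `${parts[0]}.${parts[1]}`;
  return (
    hasPrivilege(grants, principal, catalog, "USE_CATALOG") &&
    hasPrivilege(grants, principal, schema, "USE_SCHEMA") &&
    hasPrivilege(grants, principal, table, "SELECT")
  );
}
```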

Global Admin SPN Architecture

FireFly uses a two-tier Service Principal architecture to separate administrative operations from user data access. This design follows the principle of least privilege by ensuring users never have direct access to elevated administrative credentials.

Credential Separation

Critical Security Design: The Global Admin SPN credentials are stored in environment variables and are never exposed to users or stored per-organization. Users authenticate with their own limited-scope SPNs for data access.

Two-Tier SPN Architecture

Global Admin SPN Role

The Global Admin SPN (FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_ID and FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_SECRET) is used exclusively for administrative operations that require elevated privileges:

Unity Catalog Administration

Creating and managing Delta Sharing catalogs on behalf of organizations:

  • Create Catalogs: Mount Delta Sharing catalogs from providers
  • Delete Catalogs: Unmount catalogs when no longer needed
  • Grant Permissions: Assign catalog permissions to user SPNs
  • List Providers/Shares: Discover available Delta Sharing resources
  • Validate Catalogs: Verify catalog configurations exist and match

SCIM & Group Management

Managing workspace groups and membership verification:

  • Verify Group Existence: Check if organization groups exist in workspace
  • Check Group Membership: Validate user SPNs are in correct groups
  • Storage Settings Verification: Validate organization storage configurations

Schema & Volume Operations

Managing uploads schemas and user volumes:

  • Check Schema Existence: Verify uploads schema exists in catalogs
  • List Volumes: Enumerate volumes in schemas
  • Query Permissions: Check catalog permissions for groups

Global Admin Token Generation
// From: src/app/api/sso-spn/byod/databricks/catalogs/route.ts

async function getGlobalAdminToken(workspaceUrl: string) {
  // Credentials from environment variables (never stored in database)
  const clientId = process.env.FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_ID;
  const clientSecret = process.env.FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_SECRET;

  if (!clientId || !clientSecret) {
    return { success: false, error: "Global admin SPN credentials not configured" };
  }

  // OAuth 2.0 client credentials flow
  const tokenUrl = `${workspaceUrl}/oidc/v1/token`;
  const basicAuth = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");

  const response = await fetch(tokenUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Basic ${basicAuth}`,
    },
    body: new URLSearchParams({
      grant_type: "client_credentials",
      scope: "all-apis",  // Databricks OAuth scope covering every API the SPN is permitted to call
    }),
  });

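  // Surface token endpoint failures instead of returning an undefined token
  if (!response.ok) {
    return { success: false, error: `Token request failed with status ${response.status}` };
  }

  const data = await response.json();
  return { success: true, accessToken: data.access_token };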
}

User SPN Role

Each user has their own Service Principal with limited permissions. User SPNs are stored per-user in the userSpns table and are only used for data access operations:

User SPN Capabilities

  • Execute SQL queries via Serverless SQL
  • Browse assigned catalogs and schemas
  • Read data from tables with SELECT permission
  • Read files from volumes with READ_VOLUME permission
  • Write to tables/volumes if MODIFY/WRITE_VOLUME granted

User SPN Restrictions

  • Cannot create or delete catalogs
  • Cannot modify catalog permissions
  • Cannot access other organizations' data
  • Cannot perform SCIM operations
  • Limited to assigned catalogs only

Granting Permissions to User SPN
// From: src/app/api/sso-spn/byod/databricks/catalogs/mount/route.ts

// After creating a catalog with Global Admin SPN:
// Grant permissions to the user's service principal

async function mountCatalog(orgId: string, userEmail: string, catalogName: string) {
  // workspaceUrl, provider, and share are assumed to be in scope
  // from the surrounding route handler.

  // 1. Get global admin token (elevated privileges)
  const admin = await getGlobalAdminToken(workspaceUrl);
  if (!admin.success) {
    throw new Error(admin.error);
  }

  // 2. Create the Delta Sharing catalog
  await createDeltaSharingCatalog(workspaceUrl, admin.accessToken, catalogName, provider, share);

  // 3. Get the user's SPN from database
  const userSpn = await db.query.userSpns.findFirst({
    where: eq(userSpns.email, userEmail),
  });
  if (!userSpn) {
    throw new Error(`No service principal found for ${userEmail}`);
  }

  // 4. Grant limited permissions to user's SPN (not admin permissions)
  await updateCatalogPermissions(workspaceUrl, admin.accessToken, catalogName, [
    {
      principal: userSpn.clientId,
      add: ["BROWSE", "EXECUTE", "READ_VOLUME", "SELECT", "USE_CATALOG", "USE_SCHEMA"],
      // Note: No MODIFY, CREATE_*, or administrative permissions
    },
  ]);
}

Security Benefits

Least Privilege

User SPNs only have permissions required for data access. They cannot perform administrative operations even if compromised.

Credential Isolation

Global Admin credentials are in environment variables, not the database. A database breach doesn't expose admin credentials.

Audit Trail Separation

Administrative operations are clearly identifiable in audit logs by the Global Admin SPN identity vs. user SPN identities.

Blast Radius Limitation

A compromised user SPN can only access that user's assigned data. It cannot escalate to administrative access.

Environment Variables

The Global Admin SPN is configured via environment variables:

FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_ID=your_global_admin_client_id
FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_SECRET=your_global_admin_secret

These credentials should be rotated periodically and stored in a secrets manager for production deployments.

API Routes Using Global Admin SPN

The following API routes use the Global Admin SPN for administrative operations:

| API Route | Operation | Why Admin Required |
|---|---|---|
| /api/sso-spn/byod/databricks/catalogs | List & validate catalogs | Requires listing all catalogs |
| /api/sso-spn/byod/databricks/catalogs/mount | Create catalog & grant permissions | Creates catalogs, modifies permissions |
| /api/sso-spn/byod/databricks/catalogs/unmount | Delete catalog | Requires catalog deletion permission |
| /api/sso-spn/byod/databricks/providers | List providers & shares | Requires listing all providers |
| /api/sso-spn/storage-settings/verify-group | Verify group membership | Requires SCIM API access |

Audit & Compliance

Comprehensive audit trails enable security monitoring, incident investigation, and compliance reporting. FireFly logs events at multiple levels to provide full traceability.

Audit Architecture

Audit Levels

Application-Level Audit

FireFly logs all user actions with full context:

  • User identity (ID, email, organization)
  • Action type (login, query, file upload, etc.)
  • Timestamp (UTC)
  • Request metadata (IP, user agent)
  • Request/response summary (sanitized)

API-Level Audit

All Databricks API calls are logged:

  • Target API endpoint
  • SPN identity used
  • Request parameters
  • Response status and duration

Databricks-Level Audit

Databricks Unity Catalog audit logs capture:

  • SPN identity making requests
  • Data objects accessed (tables, volumes)
  • Operations performed (SELECT, INSERT, etc.)
  • Query text (for SQL operations)

Audit Correlation

By correlating logs across all levels, you can trace any data access back to the originating user:

Audit Trail Example
// User action → Data access trace

1. Application Log:
   User: alice@company.com (ID: user_123)
   Action: Execute SQL Query
   Organization: Company Inc (ID: org_456)
   Timestamp: 2024-01-15T14:30:00Z
   Query: SELECT * FROM sales.transactions LIMIT 100

2. API Log:
   Endpoint: POST /api/2.0/sql/statements
   SPN: spn_company_inc
   Request ID: req_789
   Duration: 450ms
   Status: 200

3. Databricks Unity Catalog Log:
   Principal: spn_company_inc
   Action: SELECT
   Object: sales.transactions
   Timestamp: 2024-01-15T14:30:00.123Z
   Rows Returned: 100
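
The application-to-API join in such a trace can be sketched as a lookup on a shared request ID. This assumes the application log records the same Request ID that appears in the API log, which the example above does not show explicitly; the types and function name are hypothetical. Matching the Databricks log would then rely on the SPN identity and a timestamp window.

```typescript
interface AppLogEntry {
  requestId: string;
  userEmail: string;
  organizationId: string;
  action: string;
  timestamp: string;
}

interface ApiLogEntry {
  requestId: string;
  spn: string;
  endpoint: string;
  status: number;
  durationMs: number;
}

interface CorrelatedTrace {
  requestId: string;
  userEmail: string;
  action: string;
  spn: string;
  endpoint: string;
  status: number;
}

// Joins API log entries to the user action that triggered them.
function correlate(appLogs: AppLogEntry[], apiLogs: ApiLogEntry[]): CorrelatedTrace[] {
  const byRequestId = new Map<string, AppLogEntry>();
  for (const entry of appLogs) {
    byRequestId.set(entry.requestId, entry);
  }
  const traces: CorrelatedTrace[] = [];
  for (const api of apiLogs) {
    const app = byRequestId.get(api.requestId);
    if (!app) continue; // API call with no matching user action
    traces.push({
      requestId: api.requestId,
      userEmail: app.userEmail,
      action: app.action,
      spn: api.spn,
      endpoint: api.endpoint,
      status: api.status,
    });
  }
  return traces;
}
```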

Compliance Support

The audit system supports various compliance requirements:

  • SOC 2: User access logging and monitoring
  • GDPR: Data access tracking and user consent
  • HIPAA: PHI access audit trails
  • PCI DSS: Cardholder data access logging

Security Best Practices

Follow these best practices to maximize the security of your FireFly deployment:

Authentication

  • Enable MFA (multi-factor authentication) in your identity provider
  • Use strong password policies (minimum 12 characters, complexity)
  • Implement session timeout for inactive users
  • Review and revoke unused sessions regularly

Access Control

  • Follow least-privilege principle for SPN permissions
  • Review Unity Catalog grants quarterly
  • Use separate SPNs for production and development
  • Audit organization membership regularly

Encryption

  • Rotate encryption keys quarterly
  • Use separate keys for different environments
  • Store keys in a secrets manager (not environment files)
  • Enable storage-level encryption for PostgreSQL (disk or volume encryption, or TDE where your distribution or managed service provides it)

Monitoring

  • Set up alerts for unusual access patterns
  • Monitor failed authentication attempts
  • Track SPN token usage across organizations
  • Review audit logs weekly for anomalies

Future Improvements for Production

While FireFly Analytics implements robust security measures, there are additional enhancements recommended for production deployments handling sensitive data at scale. This section outlines key improvements for enterprise-grade security.

Current State

The following sensitive data is currently stored without column-level encryption in PostgreSQL. While the database connection uses TLS and the database itself can be configured with Transparent Data Encryption (TDE), application-level encryption provides an additional security layer.

  • userSpns.clientSecret - Per-user Service Principal credentials
  • byodDatabricksSpns.clientSecret - Organization BYOD SPN credentials
  • account.accessToken - OAuth access tokens
  • account.refreshToken - OAuth refresh tokens
  • account.idToken - OIDC ID tokens

SPN Credential Encryption

Service Principal credentials (client ID and client secret) should be encrypted at the application level before storage. This protects against:

  • Database backup exposure
  • SQL injection attacks that bypass application logic
  • Unauthorized database administrator access
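  • Data breaches from database compromise

Recommended Implementation
// Example: Encrypting SPN credentials before storage
// (db, byodDatabricksSpns, and generateId come from the application's own modules)
import { eq } from "drizzle-orm";
import { encryptToken, decryptToken } from "@/lib/token-encryption";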

// When storing SPN credentials
async function createOrganizationSpn(orgId: string, clientId: string, clientSecret: string) {
  const encryptedSecret = encryptToken(clientSecret);

  await db.insert(byodDatabricksSpns).values({
    id: generateId(),
    organizationId: orgId,
    clientId: clientId,           // Client ID can remain plaintext
    clientSecret: encryptedSecret, // Encrypted with AES-256-GCM
    createdAt: new Date(),
    updatedAt: new Date(),
  });
}

// When retrieving SPN credentials
async function getOrganizationSpn(orgId: string) {
  const spn = await db.query.byodDatabricksSpns.findFirst({
    where: eq(byodDatabricksSpns.organizationId, orgId),
  });

  if (spn) {
    return {
      ...spn,
      clientSecret: decryptToken(spn.clientSecret), // Decrypt on read
    };
  }
  return null;
}

Per-Tenant Encryption Keys

For maximum security isolation, each organization (tenant) should have its own encryption key. This ensures that a compromised key only affects one organization, not the entire platform.

Current: Global Key

Single encryption key for all tenants

  • Simpler key management
  • Single point of compromise
  • Key rotation affects all data

Recommended: Per-Tenant Keys

Unique encryption key per organization

  • Blast radius limited to one org
  • Independent key rotation
  • Better compliance posture

Per-Tenant Key Architecture
// Database schema addition for tenant keys
export const organizationKeys = pgTable("organization_keys", {
  id: text("id").primaryKey(),
  organizationId: text("organization_id")
    .notNull()
    .references(() => organization.id),
  keyVersion: integer("key_version").notNull().default(1),
  encryptedKey: text("encrypted_key").notNull(), // Wrapped with master key
  createdAt: timestamp("created_at").notNull().defaultNow(),
  rotatedAt: timestamp("rotated_at"),
});

// Key hierarchy:
// 1. Master Key (HSM or KMS) - Never stored in database
// 2. Tenant Keys - Encrypted with master key, stored in DB
// 3. Data - Encrypted with tenant key

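async function getTenantKey(orgId: string): Promise<Buffer> {
  // Fetch the newest key version for this organization
  const keyRecord = await db.query.organizationKeys.findFirst({
    where: eq(organizationKeys.organizationId, orgId),
    orderBy: desc(organizationKeys.keyVersion),
  });
  if (!keyRecord) {
    throw new Error(`No encryption key found for organization ${orgId}`);
  }

  // Decrypt tenant key using master key from KMS
  // (kms.getKey and unwrapKey stand in for your KMS client)
  const masterKey = await kms.getKey("firefly-master-key");
  return unwrapKey(keyRecord.encryptedKey, masterKey);
}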
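
The wrap and unwrap steps of this key hierarchy can be sketched locally with AES-256-GCM. In the sketch below an in-process master key stands in for the HSM or KMS, which in a real deployment would perform the wrapping without ever exposing the master key; `wrapKey` and `createTenantKey` are hypothetical helpers, and the IV, tag, ciphertext layout is an assumption.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts (wraps) a tenant key under the master key.
function wrapKey(tenantKey: Buffer, masterKey: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const wrapped = Buffer.concat([cipher.update(tenantKey), cipher.final()]);
  // Assumed layout: IV (12 bytes) || auth tag (16 bytes) || wrapped key
  return Buffer.concat([iv, cipher.getAuthTag(), wrapped]).toString("base64");
}

// Decrypts (unwraps) a stored tenant key; throws if it was modified
// or if the wrong master key is supplied.
function unwrapKey(encryptedKey: string, masterKey: Buffer): Buffer {
  const raw = Buffer.from(encryptedKey, "base64");
  const decipher = createDecipheriv("aes-256-gcm", masterKey, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
}

// Generates a fresh 256-bit tenant key and the wrapped form to store in the database.
function createTenantKey(masterKey: Buffer): { key: Buffer; encryptedKey: string } {
  const key = randomBytes(32);
  return { key, encryptedKey: wrapKey(key, masterKey) };
}
```

Only `encryptedKey` would be persisted in the `organization_keys` table; rotating a tenant's key then means inserting a new row with a higher `keyVersion` and re-encrypting that tenant's data.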

Key Management Service Integration

For production deployments, integrate with a cloud KMS:

  • AWS KMS: Use envelope encryption with CMKs
  • Azure Key Vault: Managed HSM for key protection
  • Google Cloud KMS: Hardware-backed key storage
  • HashiCorp Vault: Self-hosted secrets management

OAuth Token Encryption

OAuth tokens in the account table should be encrypted at rest. These tokens provide direct access to user accounts and must be protected:

Access Tokens

Short-lived but provide immediate API access. Encrypt to prevent unauthorized use during their validity window.

Refresh Tokens

Long-lived and can generate new access tokens. Critical to encrypt as compromise allows persistent access.

ID Tokens

Contain user identity claims. Encrypt to protect PII and prevent identity spoofing.

Token Encryption Integration
// Better-Auth plugin for automatic token encryption
// (verify the hook names against the Better-Auth version you run)
import { encryptToken, decryptToken } from "@/lib/token-encryption";

const tokenEncryptionPlugin = {
  name: "token-encryption",
  hooks: {
    // Encrypt tokens before database write
    beforeCreateAccount: async (account) => ({
      ...account,
      accessToken: account.accessToken
        ? encryptToken(account.accessToken)
        : null,
      refreshToken: account.refreshToken
        ? encryptToken(account.refreshToken)
        : null,
      idToken: account.idToken
        ? encryptToken(account.idToken)
        : null,
    }),

    // Decrypt tokens after database read
    afterGetAccount: async (account) => ({
      ...account,
      accessToken: account.accessToken
        ? decryptToken(account.accessToken)
        : null,
      refreshToken: account.refreshToken
        ? decryptToken(account.refreshToken)
        : null,
      idToken: account.idToken
        ? decryptToken(account.idToken)
        : null,
    }),
  },
};

Production Readiness Checklist

Before deploying to production with sensitive data, ensure the following security enhancements are implemented:

1. Database Encryption

  • Enable storage-level encryption for PostgreSQL (disk or volume encryption, or TDE where your distribution or managed service provides it)
  • Implement column-level encryption for SPN client secrets
  • Encrypt OAuth tokens (access, refresh, ID tokens)
  • Encrypt any PII stored in user tables

2. Key Management

  • Integrate with cloud KMS (AWS KMS, Azure Key Vault, etc.)
  • Implement per-tenant encryption keys
  • Establish key rotation procedures and schedule
  • Remove encryption keys from environment variables

3. Access Controls

  • Restrict database access to application service accounts only
  • Implement database query logging and monitoring
  • Use separate database credentials per environment
  • Enable row-level security where applicable

4. Monitoring & Compliance

  • Set up alerts for encryption key access
  • Monitor for bulk data access patterns
  • Document encryption practices for compliance audits
  • Test key rotation and disaster recovery procedures

Conclusion

FireFly Analytics implements a comprehensive security architecture that protects data at every layer. The combination of strong authentication, encryption, multi-tenant isolation, and audit trails provides:

Confidentiality

Data is encrypted in transit and at rest, accessible only to authorized users and systems.

Integrity

Authenticated encryption and access controls prevent unauthorized modification of data.

Accountability

Comprehensive audit trails enable full traceability from user action to data access.
