Security Architecture
A comprehensive guide to FireFly Analytics security architecture, covering authentication, encryption, multi-tenant isolation, access control, and audit trails.
Overview
Security is a foundational principle of FireFly Analytics. The platform implements defense-in-depth with multiple security layers protecting user data, credentials, and system integrity. The SSO-SPN architecture inherently provides security benefits by separating user identity from Databricks access.
This document covers the key security mechanisms, from authentication and encryption to multi-tenant isolation and comprehensive audit trails.
Security Highlights
- OAuth 2.0 + PKCE: Industry-standard authentication with proof key for code exchange
- AES-256-GCM: Authenticated encryption for sensitive data at rest
- TLS 1.3: All data encrypted in transit with modern cryptographic protocols
- Multi-tenant isolation: Complete data separation between organizations
- Unity Catalog permissions: Fine-grained access control at the data layer
- Comprehensive audit trails: Full traceability from user action to data access
Security Layers Overview
FireFly implements four distinct security layers that work together to protect the platform and its data:
Authentication Layer
SSO/OIDC integration, session management, and OAuth PKCE ensure only verified users can access the platform.
Encryption Layer
TLS 1.3 for data in transit, AES-256-GCM for data at rest, and secure key management protect sensitive information.
Access Control Layer
Role-based access control, organization isolation, and Unity Catalog permissions enforce least-privilege access.
Audit Layer
Comprehensive logging at application and Databricks levels provides full traceability for compliance.
Authentication Security
FireFly uses a two-layer authentication model that separates user identity from Databricks access. Users authenticate via your organization's identity provider (Okta, Azure AD, Auth0), while Databricks access uses organization-specific Service Principals.
Authentication Flow
The following diagram shows the complete authentication flow, from initial SSO login through session creation and API access:
OAuth 2.0 with PKCE
PKCE (Proof Key for Code Exchange) prevents authorization code interception attacks, which is critical for web applications:
1. Code Verifier Generation
A cryptographically random 43-128 character string is generated client-side and kept private. It is not included in the authorization request and is only sent later, during the token exchange.
2. Code Challenge Creation
The code challenge is a SHA-256 hash of the verifier, base64url encoded. This is sent to the authorization server.
3. Token Exchange
When exchanging the authorization code for tokens, the original verifier is sent. The server hashes it and compares to the stored challenge.
4. Attack Prevention
Even if an attacker intercepts the authorization code, they cannot exchange it without the original code verifier.
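The four steps above can be sketched with Node's built-in crypto module. This is a minimal illustration of the S256 method from RFC 7636, not FireFly's actual implementation; the function names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes encode to a 43-character base64url string,
// the minimum verifier length allowed by RFC 7636.
export function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 challenge: BASE64URL(SHA-256(verifier)), without padding.
export function createCodeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// What the authorization server does at token exchange: hash the
// presented verifier and compare it with the challenge stored earlier.
export function verifierMatchesChallenge(
  verifier: string,
  storedChallenge: string,
): boolean {
  return createCodeChallenge(verifier) === storedChallenge;
}
```

An attacker who only holds the intercepted authorization code has no verifier that hashes to the stored challenge, so the final check fails.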
Session Security
Sessions are managed with multiple security controls:
- HttpOnly: Session cookies cannot be read by JavaScript, so an XSS flaw cannot steal them
- Secure: Cookies are only transmitted over HTTPS, which protects them from interception on the network
- SameSite: Mitigates CSRF attacks while still allowing top-level navigation
- Session ID: Cryptographically random session identifier with 256 bits of entropy
- Expiry: Sessions automatically expire after 30 days of inactivity
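As a rough sketch of how the controls listed above map onto a Set-Cookie header: the cookie name `session`, the `SameSite=Lax` value, and the helper names are assumptions made for illustration, not FireFly's actual configuration.

```typescript
import { randomBytes } from "node:crypto";

const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// 32 random bytes = 256 bits of entropy, hex-encoded to 64 characters.
export function newSessionId(): string {
  return randomBytes(32).toString("hex");
}

// Builds a Set-Cookie header value with the attributes described above.
// The cookie name "session" is a placeholder.
export function sessionCookie(sessionId: string): string {
  return [
    `session=${sessionId}`,
    "Path=/",
    "HttpOnly", // not readable from JavaScript
    "Secure", // only sent over HTTPS
    "SameSite=Lax", // sent on top-level navigation, not on cross-site subrequests
    `Max-Age=${THIRTY_DAYS_SECONDS}`,
  ].join("; ");
}
```

Max-Age alone gives a fixed lifetime. Expiry after inactivity additionally requires the server to track the last request time and re-issue or invalidate the session accordingly.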
Session Metadata
Each session stores security metadata for anomaly detection:
- IP Address: Client IP for geographic validation
- User Agent: Browser/device fingerprinting
- Created At: Session creation timestamp
- Last Activity: Most recent request timestamp
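A minimal sketch of how this metadata could feed an anomaly check. The field names, the reason strings, and the policy itself (flagging any IP or user-agent change) are assumptions for illustration; a real deployment would likely tolerate some changes, such as mobile clients switching networks.

```typescript
interface SessionMetadata {
  ipAddress: string;
  userAgent: string;
  createdAt: Date;
  lastActivity: Date;
}

const INACTIVITY_LIMIT_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// Returns the reasons a request looks suspicious for this session.
// An empty array means nothing was flagged.
export function sessionAnomalies(
  session: SessionMetadata,
  request: { ipAddress: string; userAgent: string },
  now: Date = new Date(),
): string[] {
  const reasons: string[] = [];
  if (now.getTime() - session.lastActivity.getTime() > INACTIVITY_LIMIT_MS) {
    reasons.push("inactive-too-long");
  }
  if (request.ipAddress !== session.ipAddress) reasons.push("ip-changed");
  if (request.userAgent !== session.userAgent) reasons.push("user-agent-changed");
  return reasons;
}
```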
Data Encryption
All sensitive data is encrypted both in transit and at rest using industry-standard cryptographic algorithms.
Encryption in Transit
All network communication uses TLS 1.3, the latest version of the TLS protocol:
TLS 1.3 Features
- Reduced handshake latency (1-RTT)
- Forward secrecy by default
- Removed legacy cipher suites
- Encrypted handshake messages
Protected Channels
- Browser to Next.js (HTTPS)
- Next.js to PostgreSQL (TLS)
- Next.js to Databricks (HTTPS)
- Go Proxy to Databricks (HTTPS)
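One way a Node.js service can refuse older protocol versions on outbound HTTPS calls is to set a minimum TLS version on the agent. This is a sketch only; the document does not say how FireFly enforces TLS 1.3, and the exported names are hypothetical.

```typescript
import * as https from "node:https";

// Options that reject any protocol version older than TLS 1.3.
export const strictTlsOptions = { minVersion: "TLSv1.3" as const };

// An agent that can be passed to https.request() for outbound calls,
// for example from the Next.js server to Databricks.
export const strictTlsAgent = new https.Agent(strictTlsOptions);
```

With this agent, a handshake with a server that only offers TLS 1.2 fails instead of silently downgrading.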
Encryption at Rest
All sensitive data stored in PostgreSQL is encrypted using AES-256-GCM before storage:
AES-256-GCM Details
- Algorithm: AES (Advanced Encryption Standard)
- Key Size: 256 bits (32 bytes)
- Mode: GCM (Galois/Counter Mode)
- IV Size: 96 bits (12 bytes, unique per encryption)
- Auth Tag: 128 bits (16 bytes)
- Property: Authenticated encryption (confidentiality + integrity)
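The parameters above translate fairly directly into Node's crypto API. The following is a minimal sketch, not FireFly's actual encryption module; the function names and the `iv || tag || ciphertext` payload layout are assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12; // 96-bit IV, freshly generated for every encryption
const TAG_BYTES = 16; // 128-bit authentication tag

// Output layout (an assumption of this sketch): base64(iv || tag || ciphertext).
export function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

export function decrypt(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match, i.e. the data was modified.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because the IV is random per call, encrypting the same secret twice yields different payloads, and any change to the stored bytes causes decryption to fail rather than return corrupted data.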
What Gets Encrypted
Service Principal Credentials
Client ID and Client Secret for Databricks SPNs
OAuth Access Tokens
Bearer tokens for Databricks API access
OAuth Refresh Tokens
Long-lived tokens for obtaining new access tokens
Key Management
- Encryption keys stored in environment variables, never in code
- Keys should be rotated periodically (recommended: quarterly)
- Production keys different from development keys
- Key access logged for security auditing
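A small sketch of loading and validating a key from the environment at startup. The variable name `FIREFLY_ENCRYPTION_KEY` and the base64 encoding are assumptions; the document does not name the variable FireFly uses.

```typescript
// Reads a base64-encoded 256-bit key from the environment and validates it.
// The variable name FIREFLY_ENCRYPTION_KEY is hypothetical.
export function loadEncryptionKey(
  env: Record<string, string | undefined> = process.env,
  name = "FIREFLY_ENCRYPTION_KEY",
): Buffer {
  const encoded = env[name];
  if (!encoded) throw new Error(`${name} is not set`);
  const key = Buffer.from(encoded, "base64");
  if (key.length !== 32) {
    throw new Error(`${name} must decode to 32 bytes, got ${key.length}`);
  }
  return key;
}
```

Failing fast on a missing or truncated key avoids starting a service that would otherwise encrypt with a malformed key or fail on the first write.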
Multi-Tenant Isolation
FireFly is designed as a multi-tenant platform where multiple organizations share the same infrastructure while maintaining complete data isolation. This is achieved through a combination of application-level controls and Databricks platform features.
Isolation Architecture
Isolation Layers
1. Session Isolation
Each user session is bound to a specific organization. The session context includes the organization ID, and all database queries are filtered by this context. Users cannot access data outside their active organization.
2. Database Isolation
All organization data is stored with an organization ID foreign key. Database queries automatically filter by the session's organization context, preventing cross-organization data leakage.
3. Service Principal Isolation
Each organization has its own Databricks Service Principal with specific Unity Catalog permissions. Organizations cannot access data outside their assigned catalogs.
4. Unity Catalog Isolation
At the Databricks level, Unity Catalog enforces data access based on SPN permissions. Even if application-level controls fail, Databricks prevents unauthorized access.
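The session and database isolation layers can be illustrated with a wrapper that applies the organization filter on every read, so callers never pass an organization ID themselves. This is a sketch under assumptions: the in-memory array stands in for PostgreSQL, and the type and class names are hypothetical.

```typescript
interface SessionContext {
  userId: string;
  organizationId: string;
}

interface OrgRow {
  id: string;
  organizationId: string;
}

// Wraps a table so every read is filtered by the session's organization.
export class OrgScopedTable<T extends OrgRow> {
  private readonly rows: readonly T[];
  private readonly session: SessionContext;

  constructor(rows: readonly T[], session: SessionContext) {
    this.rows = rows;
    this.session = session;
  }

  list(): T[] {
    return this.rows.filter((r) => r.organizationId === this.session.organizationId);
  }

  // A row that belongs to another organization is treated as not found.
  findById(id: string): T | undefined {
    return this.list().find((r) => r.id === id);
  }
}
```

Centralizing the filter in one place means a forgotten WHERE clause in a single handler cannot leak another organization's rows.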
Defense in Depth
Multiple isolation layers ensure that a failure in one layer is caught by another:
- Application bug: Unity Catalog still enforces SPN permissions
- Stolen session: Session bound to specific organization
- Compromised SPN: Limited to assigned catalogs only
- Network breach: Traffic is protected by TLS in transit, and sensitive data is encrypted at rest
Access Control
Access control in FireFly operates at two levels: application-level role-based access control (RBAC) and data-level Unity Catalog permissions.
Application RBAC
FireFly supports three roles within each organization:
Owner
Full administrative control
- Manage organization settings
- Add/remove members
- Configure SPN credentials
- Delete organization
Admin
Administrative access
- Manage members
- View audit logs
- Configure settings
- Cannot delete org
Member
Standard user access
- Browse catalogs
- Execute queries
- Use applications
- No admin functions
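The role descriptions above can be expressed as a permission map. This sketch makes several assumptions that the list does not state: that higher roles inherit Member capabilities, that Owners can also view audit logs, and that only Owners can configure SPN credentials. The action names are hypothetical.

```typescript
type Role = "owner" | "admin" | "member";

type Action =
  | "browse_catalogs"
  | "execute_queries"
  | "use_applications"
  | "manage_members"
  | "view_audit_logs"
  | "manage_settings"
  | "configure_spn"
  | "delete_org";

const MEMBER: Action[] = ["browse_catalogs", "execute_queries", "use_applications"];
// Assumption: admins keep all member capabilities.
const ADMIN: Action[] = [...MEMBER, "manage_members", "view_audit_logs", "manage_settings"];
// Assumption: owners keep all admin capabilities.
const OWNER: Action[] = [...ADMIN, "configure_spn", "delete_org"];

const PERMISSIONS: Record<Role, ReadonlySet<Action>> = {
  owner: new Set(OWNER),
  admin: new Set(ADMIN),
  member: new Set(MEMBER),
};

export function can(role: Role, action: Action): boolean {
  return PERMISSIONS[role].has(action);
}
```

A route handler would call `can(session.role, "delete_org")` before performing the operation and return a 403 response when it is false.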
Unity Catalog Permissions
At the data layer, Unity Catalog provides fine-grained access control:
Permission Hierarchy
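Permissions flow down the Unity Catalog hierarchy: a privilege granted on a catalog is inherited by its schemas, and a privilege granted on a schema is inherited by the tables and volumes it contains.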
Common Permissions
- USE_CATALOG: Access catalog metadata
- USE_SCHEMA: Access schema metadata
- SELECT: Read table data
- MODIFY: Insert/update/delete data
- READ_VOLUME: Read files from volume
- WRITE_VOLUME: Write files to volume
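To make the combination of these privileges concrete, here is a simplified model of the check for reading a table: the principal needs USE_CATALOG on the catalog, USE_SCHEMA on the schema, and SELECT on the table, where each privilege may be granted directly or inherited from a parent object. The model deliberately ignores ownership, groups, and other privileges, and the type and function names are hypothetical.

```typescript
type Privilege =
  | "USE_CATALOG"
  | "USE_SCHEMA"
  | "SELECT"
  | "MODIFY"
  | "READ_VOLUME"
  | "WRITE_VOLUME";

// Grants for one principal, keyed by object name:
// "catalog", "catalog.schema", or "catalog.schema.table".
type Grants = Record<string, Privilege[]>;

// A privilege holds on an object if it is granted on the object or any ancestor.
function holds(grants: Grants, objectName: string, privilege: Privilege): boolean {
  const parts = objectName.split(".");
  for (let i = parts.length; i >= 1; i--) {
    const name = parts.slice(0, i).join(".");
    if (grants[name]?.includes(privilege)) return true;
  }
  return false;
}

export function canSelect(grants: Grants, table: string): boolean {
  const [catalog, schema] = table.split(".");
  return (
    holds(grants, catalog, "USE_CATALOG") &&
    holds(grants, `${catalog}.${schema}`, "USE_SCHEMA") &&
    holds(grants, table, "SELECT")
  );
}
```

Note that SELECT on a table alone is not sufficient: without USE_CATALOG and USE_SCHEMA on the parents, the read is denied.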
Global Admin SPN Architecture
FireFly uses a two-tier Service Principal architecture to separate administrative operations from user data access. This design follows the principle of least privilege by ensuring users never have direct access to elevated administrative credentials.
Credential Separation
Critical Security Design: The Global Admin SPN credentials are stored in environment variables and are never exposed to users or stored per-organization. User data access goes through separate, limited-scope SPNs.
Two-Tier SPN Architecture
Global Admin SPN Role
The Global Admin SPN (FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_ID and FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_SECRET) is used exclusively for administrative operations that require elevated privileges:
Unity Catalog Administration
Creating and managing Delta Sharing catalogs on behalf of organizations:
- Create Catalogs: Mount Delta Sharing catalogs from providers
- Delete Catalogs: Unmount catalogs when no longer needed
- Grant Permissions: Assign catalog permissions to user SPNs
- List Providers/Shares: Discover available Delta Sharing resources
- Validate Catalogs: Verify catalog configurations exist and match
SCIM & Group Management
Managing workspace groups and membership verification:
- Verify Group Existence: Check if organization groups exist in workspace
- Check Group Membership: Validate user SPNs are in correct groups
- Storage Settings Verification: Validate organization storage configurations
Schema & Volume Operations
Managing uploads schemas and user volumes:
- Check Schema Existence: Verify uploads schema exists in catalogs
- List Volumes: Enumerate volumes in schemas
- Query Permissions: Check catalog permissions for groups
// From: src/app/api/sso-spn/byod/databricks/catalogs/route.ts
async function getGlobalAdminToken(workspaceUrl: string) {
  // Credentials from environment variables (never stored in database)
  const clientId = process.env.FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_ID;
  const clientSecret = process.env.FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_SECRET;
  if (!clientId || !clientSecret) {
    return { success: false, error: "Global admin SPN credentials not configured" };
  }

  // OAuth 2.0 client credentials flow
  const tokenUrl = `${workspaceUrl}/oidc/v1/token`;
  const basicAuth = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  const response = await fetch(tokenUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Basic ${basicAuth}`,
    },
    body: new URLSearchParams({
      grant_type: "client_credentials",
      scope: "all-apis", // Admin scope for all API access
    }),
  });

  // Report token endpoint failures instead of returning an undefined token
  if (!response.ok) {
    return { success: false, error: `Token request failed with status ${response.status}` };
  }
  const data = await response.json();
  return { success: true, accessToken: data.access_token };
}
User SPN Role
Each user has their own Service Principal with limited permissions. User SPNs are stored per-user in the userSpns table and are only used for data access operations:
User SPN Capabilities
- Execute SQL queries via Serverless SQL
- Browse assigned catalogs and schemas
- Read data from tables with SELECT permission
- Read files from volumes with READ_VOLUME permission
- Write to tables/volumes if MODIFY/WRITE_VOLUME granted
User SPN Restrictions
- Cannot create or delete catalogs
- Cannot modify catalog permissions
- Cannot access other organizations' data
- Cannot perform SCIM operations
- Limited to assigned catalogs only
// From: src/app/api/sso-spn/byod/databricks/catalogs/mount/route.ts
// After creating a catalog with Global Admin SPN:
// Grant permissions to the user's service principal
// Note: workspaceUrl, provider, and share are not defined in this excerpt.
async function mountCatalog(orgId: string, userEmail: string, catalogName: string) {
  // 1. Get global admin token (elevated privileges)
  const admin = await getGlobalAdminToken(workspaceUrl);
  if (!admin.success || !admin.accessToken) {
    throw new Error(admin.error ?? "Failed to obtain global admin token");
  }
  const adminToken = admin.accessToken;

  // 2. Create the Delta Sharing catalog
  await createDeltaSharingCatalog(workspaceUrl, adminToken, catalogName, provider, share);

  // 3. Get the user's SPN from database
  const userSpn = await db.query.userSpns.findFirst({
    where: eq(userSpns.email, userEmail),
  });
  if (!userSpn) {
    throw new Error(`No service principal found for ${userEmail}`);
  }

  // 4. Grant limited permissions to user's SPN (not admin permissions)
  await updateCatalogPermissions(workspaceUrl, adminToken, catalogName, [
    {
      principal: userSpn.clientId,
      add: ["BROWSE", "EXECUTE", "READ_VOLUME", "SELECT", "USE_CATALOG", "USE_SCHEMA"],
      // Note: No MODIFY, CREATE_*, or administrative permissions
    },
  ]);
}
Security Benefits
Least Privilege
User SPNs only have permissions required for data access. They cannot perform administrative operations even if compromised.
Credential Isolation
Global Admin credentials are in environment variables, not the database. A database breach doesn't expose admin credentials.
Audit Trail Separation
Administrative operations are clearly identifiable in audit logs by the Global Admin SPN identity vs. user SPN identities.
Blast Radius Limitation
A compromised user SPN can only access that user's assigned data. It cannot escalate to administrative access.
Environment Variables
The Global Admin SPN is configured via environment variables:
FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_ID=your_global_admin_client_id
FIREFLY_SPN_GLOBAL_ADMIN_CLIENT_SECRET=your_global_admin_secret
These credentials should be rotated periodically and stored in a secrets manager for production deployments.
API Routes Using Global Admin SPN
The following API routes use the Global Admin SPN for administrative operations:
| API Route | Operation | Why Admin Required |
|---|---|---|
| /api/sso-spn/byod/databricks/catalogs | List & validate catalogs | Requires listing all catalogs |
| /api/sso-spn/byod/databricks/catalogs/mount | Create catalog & grant permissions | Creates catalogs, modifies permissions |
| /api/sso-spn/byod/databricks/catalogs/unmount | Delete catalog | Requires catalog deletion permission |
| /api/sso-spn/byod/databricks/providers | List providers & shares | Requires listing all providers |
| /api/sso-spn/storage-settings/verify-group | Verify group membership | Requires SCIM API access |
Audit & Compliance
Comprehensive audit trails enable security monitoring, incident investigation, and compliance reporting. FireFly logs events at multiple levels to provide full traceability.
Audit Architecture
Audit Levels
Application-Level Audit
FireFly logs all user actions with full context:
- User identity (ID, email, organization)
- Action type (login, query, file upload, etc.)
- Timestamp (UTC)
- Request metadata (IP, user agent)
- Request/response summary (sanitized)
API-Level Audit
All Databricks API calls are logged:
- Target API endpoint
- SPN identity used
- Request parameters
- Response status and duration
Databricks-Level Audit
Databricks Unity Catalog audit logs capture:
- SPN identity making requests
- Data objects accessed (tables, volumes)
- Operations performed (SELECT, INSERT, etc.)
- Query text (for SQL operations)
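The three log levels above can be joined to attribute a Databricks event to a user. The sketch below assumes the application and API logs share a request ID (a hypothetical field) and matches API calls to Databricks events by SPN identity and time window; all type and field names are illustrative.

```typescript
interface AppLog {
  requestId: string;
  userEmail: string;
  timestamp: string; // ISO 8601, UTC
}

interface ApiLog {
  requestId: string;
  spn: string;
  timestamp: string; // when the Databricks API call started
  durationMs: number;
}

interface CatalogLog {
  principal: string;
  object: string;
  timestamp: string;
}

// Returns the users whose API calls were in flight, under the same SPN,
// when the Databricks event occurred.
export function usersForCatalogEvent(
  event: CatalogLog,
  apiLogs: ApiLog[],
  appLogs: AppLog[],
): string[] {
  const eventTime = Date.parse(event.timestamp);
  const requestIds = new Set(
    apiLogs
      .filter((call) => call.spn === event.principal)
      .filter((call) => {
        const start = Date.parse(call.timestamp);
        return eventTime >= start && eventTime <= start + call.durationMs;
      })
      .map((call) => call.requestId),
  );
  return appLogs.filter((log) => requestIds.has(log.requestId)).map((log) => log.userEmail);
}
```

The function returns a list because several users can share one SPN, so overlapping requests may produce more than one candidate; narrowing further would need the query text or a Databricks-side correlation tag.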
Audit Correlation
By correlating logs across all levels, you can trace any data access back to the originating user:
// User action → Data access trace
1. Application Log:
   User: alice@company.com (ID: user_123)
   Action: Execute SQL Query
   Organization: Company Inc (ID: org_456)
   Timestamp: 2024-01-15T14:30:00Z
   Query: SELECT * FROM sales.transactions LIMIT 100
2. API Log:
   Endpoint: POST /api/2.0/sql/statements
   SPN: spn_company_inc
   Request ID: req_789
   Duration: 450ms
   Status: 200
3. Databricks Unity Catalog Log:
   Principal: spn_company_inc
   Action: SELECT
   Object: sales.transactions
   Timestamp: 2024-01-15T14:30:00.123Z
   Rows Returned: 100
Compliance Support
The audit system supports various compliance requirements:
- SOC 2: User access logging and monitoring
- GDPR: Data access tracking and user consent
- HIPAA: PHI access audit trails
- PCI DSS: Cardholder data access logging
Security Best Practices
Follow these best practices to maximize the security of your FireFly deployment:
Authentication
- Enable MFA (multi-factor authentication) in your identity provider
- Use strong password policies (minimum 12 characters, complexity)
- Implement session timeout for inactive users
- Review and revoke unused sessions regularly
Access Control
- Follow least-privilege principle for SPN permissions
- Review Unity Catalog grants quarterly
- Use separate SPNs for production and development
- Audit organization membership regularly
Encryption
- Rotate encryption keys quarterly
- Use separate keys for different environments
- Store keys in a secrets manager (not environment files)
- Enable encryption at rest for the PostgreSQL data volume (TDE where your PostgreSQL distribution or managed service offers it)
Monitoring
- Set up alerts for unusual access patterns
- Monitor failed authentication attempts
- Track SPN token usage across organizations
- Review audit logs weekly for anomalies
Future Improvements for Production
While FireFly Analytics implements robust security measures, there are additional enhancements recommended for production deployments handling sensitive data at scale. This section outlines key improvements for enterprise-grade security.
Current State
The following sensitive data is currently stored without column-level encryption in PostgreSQL. While the database connection uses TLS and the database itself can be configured with Transparent Data Encryption (TDE), application-level encryption provides an additional security layer.
- userSpns.clientSecret: Per-user Service Principal credentials
- byodDatabricksSpns.clientSecret: Organization BYOD SPN credentials
- account.accessToken: OAuth access tokens
- account.refreshToken: OAuth refresh tokens
- account.idToken: OIDC ID tokens
SPN Credential Encryption
Service Principal client secrets should be encrypted at the application level before storage. The client ID is an identifier rather than a secret and can remain in plaintext. Encrypting the secret protects against:
- Database backup exposure
- SQL injection attacks that bypass application logic
- Unauthorized database administrator access
- Data breaches from database compromise
// Example: Encrypting SPN credentials before storage
// (db, byodDatabricksSpns, and generateId come from the application's own modules)
import { eq } from "drizzle-orm";
import { encryptToken, decryptToken } from "@/lib/token-encryption";

// When storing SPN credentials
async function createOrganizationSpn(orgId: string, clientId: string, clientSecret: string) {
  const encryptedSecret = encryptToken(clientSecret);
  await db.insert(byodDatabricksSpns).values({
    id: generateId(),
    organizationId: orgId,
    clientId: clientId, // Client ID can remain plaintext
    clientSecret: encryptedSecret, // Encrypted with AES-256-GCM
    createdAt: new Date(),
    updatedAt: new Date(),
  });
}

// When retrieving SPN credentials
async function getOrganizationSpn(orgId: string) {
  const spn = await db.query.byodDatabricksSpns.findFirst({
    where: eq(byodDatabricksSpns.organizationId, orgId),
  });
  if (spn) {
    return {
      ...spn,
      clientSecret: decryptToken(spn.clientSecret), // Decrypt on read
    };
  }
  return null;
}
Per-Tenant Encryption Keys
For maximum security isolation, each organization (tenant) should have its own encryption key. This ensures that a compromised key only affects one organization, not the entire platform.
Current: Global Key
Single encryption key for all tenants
- Simpler key management
- Single point of compromise
- Key rotation affects all data
Recommended: Per-Tenant Keys
Unique encryption key per organization
- Blast radius limited to one org
- Independent key rotation
- Better compliance posture
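The per-tenant approach relies on envelope encryption: each tenant key is itself encrypted ("wrapped") under a master key before it is stored. A minimal local sketch of wrapping and unwrapping with AES-256-GCM follows. The function names and payload layout are assumptions, and with a real KMS the wrap and unwrap calls would typically be made to the service so the master key never leaves it.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Wraps (encrypts) a tenant key under the master key.
// Output layout (an assumption): base64(iv || tag || wrappedKey).
export function wrapKey(tenantKey: Buffer, masterKey: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const wrapped = Buffer.concat([cipher.update(tenantKey), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), wrapped]).toString("base64");
}

// Recovers the tenant key; throws if the master key is wrong
// or the stored value was modified.
export function unwrapKey(encryptedKey: string, masterKey: Buffer): Buffer {
  const raw = Buffer.from(encryptedKey, "base64");
  const decipher = createDecipheriv("aes-256-gcm", masterKey, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
}
```

Rotating the master key then only requires re-wrapping the tenant keys, not re-encrypting every tenant's data.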
// Database schema addition for tenant keys
import { desc, eq } from "drizzle-orm";
import { integer, pgTable, text, timestamp } from "drizzle-orm/pg-core";

export const organizationKeys = pgTable("organization_keys", {
  id: text("id").primaryKey(),
  organizationId: text("organization_id")
    .notNull()
    .references(() => organization.id),
  keyVersion: integer("key_version").notNull().default(1),
  encryptedKey: text("encrypted_key").notNull(), // Wrapped with master key
  createdAt: timestamp("created_at").notNull().defaultNow(),
  rotatedAt: timestamp("rotated_at"),
});

// Key hierarchy:
// 1. Master Key (HSM or KMS) - Never stored in database
// 2. Tenant Keys - Encrypted with master key, stored in DB
// 3. Data - Encrypted with tenant key
// (kms and unwrapKey are placeholders for your KMS client and unwrapping helper)
async function getTenantKey(orgId: string): Promise<Buffer> {
  // Latest key version for this organization
  const keyRecord = await db.query.organizationKeys.findFirst({
    where: eq(organizationKeys.organizationId, orgId),
    orderBy: desc(organizationKeys.keyVersion),
  });
  if (!keyRecord) {
    throw new Error(`No encryption key found for organization ${orgId}`);
  }
  // Decrypt tenant key using master key from KMS
  const masterKey = await kms.getKey("firefly-master-key");
  return unwrapKey(keyRecord.encryptedKey, masterKey);
}
Key Management Service Integration
For production deployments, integrate with a cloud KMS:
- AWS KMS: Use envelope encryption with CMKs
- Azure Key Vault: Managed HSM for key protection
- Google Cloud KMS: Hardware-backed key storage
- HashiCorp Vault: Self-hosted secrets management
OAuth Token Encryption
OAuth tokens in the account table should be encrypted at rest. These tokens provide direct access to user accounts and must be protected:
Access Tokens
Short-lived but provide immediate API access. Encrypt to prevent unauthorized use during their validity window.
Refresh Tokens
Long-lived and can generate new access tokens. Critical to encrypt as compromise allows persistent access.
ID Tokens
Contain user identity claims. Encrypt to protect PII and prevent identity spoofing.
// Better-Auth plugin for automatic token encryption
// Note: the hook names below are illustrative. Check them against the
// hook points exposed by your Better-Auth version before use.
import { encryptToken, decryptToken } from "@/lib/token-encryption";

const tokenEncryptionPlugin = {
  name: "token-encryption",
  hooks: {
    // Encrypt tokens before database write
    beforeCreateAccount: async (account) => ({
      ...account,
      accessToken: account.accessToken
        ? encryptToken(account.accessToken)
        : null,
      refreshToken: account.refreshToken
        ? encryptToken(account.refreshToken)
        : null,
      idToken: account.idToken
        ? encryptToken(account.idToken)
        : null,
    }),
    // Decrypt tokens after database read
    afterGetAccount: async (account) => ({
      ...account,
      accessToken: account.accessToken
        ? decryptToken(account.accessToken)
        : null,
      refreshToken: account.refreshToken
        ? decryptToken(account.refreshToken)
        : null,
      idToken: account.idToken
        ? decryptToken(account.idToken)
        : null,
    }),
  },
};
Production Readiness Checklist
Before deploying to production with sensitive data, ensure the following security enhancements are implemented:
1. Database Encryption
- Enable encryption at rest for PostgreSQL (TDE where your distribution or managed service offers it, otherwise volume-level encryption)
- Implement column-level encryption for SPN client secrets
- Encrypt OAuth tokens (access, refresh, ID tokens)
- Encrypt any PII stored in user tables
2. Key Management
- Integrate with cloud KMS (AWS KMS, Azure Key Vault, etc.)
- Implement per-tenant encryption keys
- Establish key rotation procedures and schedule
- Remove encryption keys from environment variables
3. Access Controls
- Restrict database access to application service accounts only
- Implement database query logging and monitoring
- Use separate database credentials per environment
- Enable row-level security where applicable
4. Monitoring & Compliance
- Set up alerts for encryption key access
- Monitor for bulk data access patterns
- Document encryption practices for compliance audits
- Test key rotation and disaster recovery procedures
Conclusion
FireFly Analytics implements a comprehensive security architecture that protects data at every layer. The combination of strong authentication, encryption, multi-tenant isolation, and audit trails provides:
Confidentiality
Data is encrypted in transit and at rest, accessible only to authorized users and systems.
Integrity
Authenticated encryption and access controls prevent unauthorized modification of data.
Accountability
Comprehensive audit trails enable full traceability from user action to data access.