
Unstructured Data Portal

Next.js, Fastify, PostgreSQL, BullMQ, Redis, Docker

A full-stack, self-hosted content aggregation and search platform for ingesting, processing, and exploring heterogeneous data from across the web: articles, products, job listings, videos, documents, and more.

Built as a monorepo with Next.js, Fastify, PostgreSQL (Supabase), BullMQ, and Redis. The system normalizes disparate content into a unified schema and runs it through a multi-stage worker pipeline before serving it through a web UI with faceted search, infinite scroll, and detail drawers.
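Here is a minimal sketch of the normalization step. It assumes a hypothetical unified `ContentItem` shape, since the project's actual schema is not described here. Source-specific field names such as `position` or `link` are mapped onto common `title`, `url`, and `body` fields. Every field that was not used is preserved under `metadata`. `normalize`, `pickField`, and the candidate key lists are illustrative names, not the project's code.

```typescript
// Hypothetical unified schema; the real column names are not documented.
type ContentKind = "article" | "product" | "job" | "video" | "document";

interface ContentItem {
  kind: ContentKind;
  url: string;
  title: string;
  body: string;
  metadata: Record<string, unknown>; // source fields with no dedicated column
}

// Raw payloads arrive with source-specific field names.
type RawRecord = Record<string, unknown>;

interface Picked {
  value: string;
  key?: string; // which raw key supplied the value, if any
}

// Return the first non-empty string found under any candidate key.
function pickField(raw: RawRecord, candidates: string[]): Picked {
  for (const key of candidates) {
    const value = raw[key];
    if (typeof value === "string" && value.trim() !== "") {
      return { value: value.trim(), key };
    }
  }
  return { value: "" };
}

// Map a raw record onto the unified schema. Only the keys actually used for
// url/title/body are consumed; everything else is kept in `metadata`.
function normalize(kind: ContentKind, raw: RawRecord): ContentItem {
  const url = pickField(raw, ["url", "link", "href"]);
  const title = pickField(raw, ["title", "name", "headline", "position"]);
  const body = pickField(raw, ["body", "content", "description", "summary"]);

  const used = new Set<string>();
  for (const picked of [url, title, body]) {
    if (picked.key !== undefined) used.add(picked.key);
  }

  const metadata: Record<string, unknown> = {};
  for (const key of Object.keys(raw)) {
    if (!used.has(key)) metadata[key] = raw[key];
  }

  return { kind, url: url.value, title: title.value, body: body.value, metadata };
}
```

For example, a job listing that uses `position` and `link` comes out with `title` and `url` populated, and its `salary` field is kept in `metadata`.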

Search

  • Hybrid search combining Postgres full-text search (BM25), trigram fuzzy matching, and vector embeddings for semantic retrieval
  • Automatic fallback across search strategies
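The page does not say how results from the three strategies are merged. The sketch below assumes Reciprocal Rank Fusion (RRF) as the merge step. It treats "fallback" as skipping any strategy that throws or returns nothing. Each strategy is modeled as a plain synchronous function that returns ranked IDs. `hybridSearch`, `SearchStrategy`, and `reciprocalRankFusion` are hypothetical names, and `k = 60` is the constant commonly used with RRF.

```typescript
// Ranked document IDs from one strategy, best match first.
type RankedIds = string[];

// Hypothetical strategy wrapper: full-text, trigram, or vector search.
interface SearchStrategy {
  name: string;
  run: (query: string) => RankedIds;
}

// Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a document's score,
// so documents ranked well by several strategies rise to the top.
function reciprocalRankFusion(lists: RankedIds[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // RRF ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties broken by ID for a deterministic order.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}

// Run every strategy, skipping any that throw or return no hits, then fuse
// whatever succeeded. `used` records which strategies contributed.
function hybridSearch(
  query: string,
  strategies: SearchStrategy[],
): { ids: string[]; used: string[] } {
  const lists: RankedIds[] = [];
  const used: string[] = [];
  for (const strategy of strategies) {
    try {
      const hits = strategy.run(query);
      if (hits.length > 0) {
        lists.push(hits);
        used.push(strategy.name);
      }
    } catch {
      // e.g. the embedding provider is unreachable: degrade instead of failing.
    }
  }
  return { ids: reciprocalRankFusion(lists), used };
}
```

RRF uses only ranks, not raw scores. This avoids comparing full-text rank scores, trigram similarity, and vector distances, which sit on incomparable scales.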

AI Enrichment

  • Multi-stage worker pipeline: normalize, enrich, embed, media, index
  • AI-powered generation of titles, summaries, categories, and text embeddings via OpenRouter
  • Configurable prompt overrides and cost tracking
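This sketch covers stage ordering, prompt overrides, and cost tracking, under these assumptions:

- The five stages run strictly in the order listed.
- Per-call cost comes from the `prompt_tokens`/`completion_tokens` counts in the OpenAI-compatible `usage` object that OpenRouter responses follow.

The per-million-token prices and the names `nextStage`, `callCostUsd`, and `resolvePrompt` are hypothetical. In a BullMQ worker, a function like `nextStage` could decide which queue a finished job is handed to next.

```typescript
// Stage order as listed above; treating it as strictly linear is an assumption.
const STAGES = ["normalize", "enrich", "embed", "media", "index"] as const;
type Stage = (typeof STAGES)[number];

// Stage to run after `stage`, or null once indexing has finished.
function nextStage(stage: Stage): Stage | null {
  const i = STAGES.indexOf(stage);
  return i >= 0 && i < STAGES.length - 1 ? STAGES[i + 1] : null;
}

// Token counts from an OpenAI-compatible `usage` object.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Hypothetical per-model prices in USD per million tokens.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost of one call: tokens times price, converted from per-million rates.
function callCostUsd(usage: Usage, pricing: Pricing): number {
  return (
    (usage.prompt_tokens * pricing.inputPerMillion +
      usage.completion_tokens * pricing.outputPerMillion) /
    1_000_000
  );
}

// An admin-configured override wins over the built-in default prompt.
function resolvePrompt(
  task: string,
  defaults: Record<string, string>,
  overrides: Record<string, string>,
): string {
  return overrides[task] ?? defaults[task] ?? "";
}
```

For example, 2,000 prompt tokens and 500 completion tokens cost $0.00175 at $0.50/$1.50 per million tokens. Summing these per-call values by day or model would feed the spend analytics.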

Security & Multi-Tenancy

  • Multi-tenancy enforced by PostgreSQL Row-Level Security (RLS), with per-request scoped database connections
  • SSRF-hardened ingestion with URL validation, HMAC-signed webhooks, and scoped API key authentication
  • SSO via a central auth portal with iron-session cookies and passkey/social login support
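Here is a minimal sketch of the URL-validation half of SSRF hardening, assuming ingestion accepts only http(s) URLs. It rejects:

- other schemes and embedded credentials;
- `localhost` and `.internal` names;
- IPv6 literals, wholesale, for brevity;
- IPv4 literals in private, loopback, link-local, CGNAT, and multicast ranges.

`isSafeIngestUrl` and `isBlockedIPv4` are hypothetical names. A string check alone is not sufficient, because a public hostname can resolve to a private address. A production version would also validate the resolved IP, connect to that same IP, and re-check every redirect.

```typescript
// True for IPv4 literals in ranges an ingestion fetch should never reach.
function isBlockedIPv4(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const [a, b] = parts.map(Number);
  return (
    a === 0 ||                            // 0.0.0.0/8 "this network"
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 (CGNAT)
    (a === 169 && b === 254) ||           // link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168) ||           // 192.168.0.0/16
    a >= 224                              // multicast and reserved
  );
}

function isSafeIngestUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (url.username !== "" || url.password !== "") return false; // no embedded credentials
  // WHATWG URL parsing lowercases the host and canonicalizes numeric IPv4
  // forms (e.g. decimal or hex) to dotted quads before this check runs.
  const host = url.hostname.replace(/\.$/, "");
  if (host === "" || host.startsWith("[")) return false; // IPv6 literals rejected wholesale here
  if (host === "localhost" || host.endsWith(".localhost") || host.endsWith(".internal")) return false;
  return !isBlockedIPv4(host);
}
```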

Operations & UX

  • Admin dashboard with pipeline monitoring, health checks, AI spend analytics, and audit logs
  • Telegram bot for mobile content capture with real-time pipeline progress
  • Media pipeline with Puppeteer screenshots, thumbnail downloads, and LQIP blur placeholders
  • Progressive web UX with keyboard shortcuts, density toggle, bulk operations, and RSS/JSON/CSV export
  • Accessibility-first: skip-to-content link, aria-live regions, 44px touch targets, and responsive design
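For the CSV export, here is a minimal sketch of RFC 4180 quoting, assuming each exported item is a flat record. Fields containing a comma, double quote, CR, or LF are wrapped in quotes with embedded quotes doubled, and rows are joined with CRLF. `toCsv`, `csvField`, and the column names in the example are hypothetical.

```typescript
// Quote a field per RFC 4180 when it contains a comma, quote, CR, or LF,
// doubling any embedded double quotes. null/undefined become empty fields.
function csvField(value: unknown): string {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize rows in a fixed column order with a header line; RFC 4180 uses CRLF.
function toCsv<T extends Record<string, unknown>>(
  rows: T[],
  columns: (keyof T & string)[],
): string {
  const header = columns.map(csvField).join(",");
  const lines = rows.map((row) => columns.map((c) => csvField(row[c])).join(","));
  return [header, ...lines].join("\r\n");
}
```

If exports are opened in spreadsheet software, it may also be worth neutralizing cells that begin with `=`, `+`, `-`, or `@` to avoid formula injection. That step is omitted here.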