How It Works
Real-time monitoring and AI research architecture
- Ingestion Rate: 100-200 items/hour
- Entity Extraction: 50-100 entities/min
- API Response: <100 ms (cached)
- Database Size: ~50 MB (bounded by retention)
- AI Research: 10-30 s per entity
- Data Sources: ~123 total (about 60 subreddits + 63 RSS feeds)
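The sub-100 ms figure refers to cache hits on the FastAPI server (port 8080, shown in the architecture diagram below). A minimal client-side sketch follows; the `/api/entities` route and its query parameters are hypothetical stand-ins, since the actual route names are not listed here:

```python
import requests

# Hypothetical route and parameters; only the port (8080) and the ~30 s
# response cache are documented in this section.
resp = requests.get(
    "http://localhost:8080/api/entities",      # assumed route
    params={"timeframe": "7d", "limit": 20},   # assumed parameters
    timeout=5,
)
resp.raise_for_status()
print(resp.json())
```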
```mermaid
graph LR
    Reddit["📡 Reddit API<br/>~60 subreddits"]
    RSS["📰 RSS Feeds<br/>~63 feeds"]
    Ingest["📥 Ingestion Layer<br/>Every 1 min ±15s"]
    Extract["⚙️ Entity Extraction<br/>spaCy NER<br/>Every 2 min ±30s"]
    Aggregate["📊 Aggregation<br/>1d/7d/30d rollups<br/>Every 5 min"]
    DB["💾 SQLite Database<br/>4 main tables"]
    API["🔗 FastAPI Server<br/>Port 8080"]
    Dashboard["🎨 Dashboard<br/>Real-time UI"]
    Perplexity["🔎 Perplexity API<br/>Web search"]
    OpenAI["✨ OpenAI API<br/>Structured output"]
    Research["🎯 AI Research<br/>10-30 seconds"]

    Reddit --> Ingest
    RSS --> Ingest
    Ingest --> DB
    DB --> Extract
    Extract --> DB
    DB --> Aggregate
    Aggregate --> DB
    DB --> API
    API --> Dashboard
    API --> Research
    Research --> Perplexity
    Perplexity --> OpenAI
    OpenAI --> Research
    Research --> DB
```
~60 subreddits"] RSS["📰 RSS Feeds
~63 feeds"] Ingest["📥 Ingestion Layer
Every 1 min ±15s"] Extract["⚙️ Entity Extraction
spaCy NER
Every 2 min ±30s"] Aggregate["📊 Aggregation
1d/7d/30d rollups
Every 5 min"] DB["💾 SQLite Database
4 main tables"] API["🔗 FastAPI Server
Port 8080"] Dashboard["🎨 Dashboard
Real-time UI"] Perplexity["🔎 Perplexity API
Web search"] OpenAI["✨ OpenAI API
Structured output"] Research["🎯 AI Research
10-30 seconds"] Reddit --> Ingest RSS --> Ingest Ingest --> DB DB --> Extract Extract --> DB DB --> Aggregate Aggregate --> DB DB --> API API --> Dashboard API --> Research Research --> Perplexity Perplexity --> OpenAI OpenAI --> Research Research --> DB
```mermaid
graph TD
    S1["Step 1: Ingestion<br/>Timing: Every ~1 min<br/>Action: Poll Reddit/RSS<br/>Output: source_items table<br/>Retention: 30 minutes"]
    S2["Step 2: Entity Extraction<br/>Timing: Every 2 min<br/>Action: spaCy NER processing<br/>Output: entities table<br/>Retention: CASCADE delete"]
    S3["Step 3: Aggregation<br/>Timing: Every 5 min<br/>Action: Calculate metrics<br/>Output: entity_metrics table<br/>Retention: Permanent"]
    S4["Step 4: API & Dashboard<br/>Timing: On demand / 30s cache<br/>Action: FastAPI queries<br/>Output: JSON response<br/>Users: Real-time visualization"]
    S5["Step 5: AI Research<br/>Timing: 10-30 seconds<br/>Action: Perplexity + OpenAI<br/>Output: entity_research table<br/>Fields: 48 comprehensive"]

    S1 --> S2
    S2 --> S3
    S3 --> S4
    S4 -.->|On demand| S5
    S5 --> S3

    style S1 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S2 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S3 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S4 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S5 fill:#f8f8f8,stroke:#F9BF3B,stroke-width:2px,color:#1a1a1a
```
```mermaid
erDiagram
    source_items ||--o{ entities : contains
    entities ||--o| entity_metrics : aggregated
    entities ||--o| entity_research : researched

    source_items {
        int id
        string item_id UK
        string source
        string title
        text content
        timestamp created_at
    }
    entities {
        int id
        int source_item_id FK
        string text
        string normalized UK
        string entity_type
        timestamp extracted_at
    }
    entity_metrics {
        int id
        string normalized UK
        string entity_type
        string timeframe
        int mention_count
        float velocity_index
        timestamp calculated_at
    }
    entity_research {
        int id
        string normalized UK
        string entity_type
        text summary
        json swot_analysis
        json quantitative_metrics
        int use_case_fit_score
        json jira_tickets
        timestamp generated_at
    }
```
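The ER diagram maps onto four SQLite tables. A DDL sketch derived from it; exact types, indexes, and constraints are assumptions (SQLite stores the json and timestamp columns as TEXT, and the CASCADE delete noted in Step 2 requires the foreign_keys pragma):

```python
import sqlite3

conn = sqlite3.connect("trends.db")       # assumed filename
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
conn.executescript("""
CREATE TABLE IF NOT EXISTS source_items (
    id         INTEGER PRIMARY KEY,
    item_id    TEXT UNIQUE,
    source     TEXT,
    title      TEXT,
    content    TEXT,
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS entities (
    id             INTEGER PRIMARY KEY,
    source_item_id INTEGER REFERENCES source_items(id) ON DELETE CASCADE,
    text           TEXT,
    normalized     TEXT,  -- marked UK above; assumed an indexed lookup key here
    entity_type    TEXT,
    extracted_at   TEXT
);
CREATE TABLE IF NOT EXISTS entity_metrics (
    id             INTEGER PRIMARY KEY,
    normalized     TEXT UNIQUE,
    entity_type    TEXT,
    timeframe      TEXT,
    mention_count  INTEGER,
    velocity_index REAL,
    calculated_at  TEXT
);
CREATE TABLE IF NOT EXISTS entity_research (
    id                   INTEGER PRIMARY KEY,
    normalized           TEXT UNIQUE,
    entity_type          TEXT,
    summary              TEXT,
    swot_analysis        TEXT,  -- JSON stored as text
    quantitative_metrics TEXT,  -- JSON stored as text
    use_case_fit_score   INTEGER,
    jira_tickets         TEXT,  -- JSON stored as text
    generated_at         TEXT
);
CREATE INDEX IF NOT EXISTS idx_entities_normalized ON entities(normalized);
""")
```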