How It Works
Real-time monitoring and AI research architecture
- Ingestion Rate: 100-200 items/hour
- Entity Extraction: 50-100 entities/min
- API Response: <100 ms (cached)
- Database Size: ~50 MB (bounded by retention)
- AI Research: 10-30 s per entity
- Data Sources: ~123 total (about 60 subreddits + 63 RSS feeds)
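The sub-100 ms figure refers to cache hits on the FastAPI server (port 8080, shown in the architecture diagram below). A minimal client-side sketch follows; the `/api/entities` route and its query parameters are hypothetical stand-ins, since the actual route names are not listed here:

```python
import requests

# Hypothetical route and parameters; only the port (8080) and the ~30 s
# response cache are documented in this section.
resp = requests.get(
    "http://localhost:8080/api/entities",      # assumed route
    params={"timeframe": "7d", "limit": 20},   # assumed parameters
    timeout=5,
)
resp.raise_for_status()
print(resp.json())
```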
```mermaid
graph LR
    Reddit["📡 Reddit API<br/>~60 subreddits"]
    RSS["📰 RSS Feeds<br/>~63 feeds"]
    Ingest["📥 Ingestion Layer<br/>Every 1 min ±15s"]
    Extract["⚙️ Entity Extraction<br/>spaCy NER<br/>Every 2 min ±30s"]
    Aggregate["📊 Aggregation<br/>1d/7d/30d rollups<br/>Every 5 min"]
    DB["💾 SQLite Database<br/>4 main tables"]
    API["🔗 FastAPI Server<br/>Port 8080"]
    Dashboard["🎨 Dashboard<br/>Real-time UI"]
    Perplexity["🔎 Perplexity API<br/>Web search"]
    OpenAI["✨ OpenAI API<br/>Structured output"]
    Research["🎯 AI Research<br/>10-30 seconds"]

    Reddit --> Ingest
    RSS --> Ingest
    Ingest --> DB
    DB --> Extract
    Extract --> DB
    DB --> Aggregate
    Aggregate --> DB
    DB --> API
    API --> Dashboard
    API --> Research
    Research --> Perplexity
    Perplexity --> OpenAI
    OpenAI --> Research
    Research --> DB
```
~60 subreddits"] RSS["📰 RSS Feeds
~63 feeds"] Ingest["📥 Ingestion Layer
Every 1 min ±15s"] Extract["⚙️ Entity Extraction
spaCy NER
Every 2 min ±30s"] Aggregate["📊 Aggregation
1d/7d/30d rollups
Every 5 min"] DB["💾 SQLite Database
4 main tables"] API["🔗 FastAPI Server
Port 8080"] Dashboard["🎨 Dashboard
Real-time UI"] Perplexity["🔎 Perplexity API
Web search"] OpenAI["✨ OpenAI API
Structured output"] Research["🎯 AI Research
10-30 seconds"] Reddit --> Ingest RSS --> Ingest Ingest --> DB DB --> Extract Extract --> DB DB --> Aggregate Aggregate --> DB DB --> API API --> Dashboard API --> Research Research --> Perplexity Perplexity --> OpenAI OpenAI --> Research Research --> DB
```mermaid
graph TD
    S1["Step 1: Ingestion<br/>Timing: Every ~1 min<br/>Action: Poll Reddit/RSS<br/>Output: source_items table<br/>Retention: 30 minutes"]
    S2["Step 2: Entity Extraction<br/>Timing: Every 2 min<br/>Action: spaCy NER processing<br/>Output: entities table<br/>Retention: CASCADE delete"]
    S3["Step 3: Aggregation<br/>Timing: Every 5 min<br/>Action: Calculate metrics<br/>Output: entity_metrics table<br/>Retention: Permanent"]
    S4["Step 4: API & Dashboard<br/>Timing: On demand / 30s cache<br/>Action: FastAPI queries<br/>Output: JSON response<br/>Users: Real-time visualization"]
    S5["Step 5: AI Research<br/>Timing: 10-30 seconds<br/>Action: Perplexity + OpenAI<br/>Output: entity_research table<br/>Fields: 48 comprehensive"]

    S1 --> S2
    S2 --> S3
    S3 --> S4
    S4 -.->|On demand| S5
    S5 --> S3

    style S1 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S2 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S3 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S4 fill:#f8f8f8,stroke:#67809F,stroke-width:2px,color:#1a1a1a
    style S5 fill:#f8f8f8,stroke:#F9BF3B,stroke-width:2px,color:#1a1a1a
```
```mermaid
erDiagram
    source_items ||--o{ entities : contains
    entities ||--o| entity_metrics : aggregated
    entities ||--o| entity_research : researched

    source_items {
        int id
        string item_id UK
        string source
        string title
        text content
        timestamp created_at
    }
    entities {
        int id
        int source_item_id FK
        string text
        string normalized UK
        string entity_type
        timestamp extracted_at
    }
    entity_metrics {
        int id
        string normalized UK
        string entity_type
        string timeframe
        int mention_count
        float velocity_index
        timestamp calculated_at
    }
    entity_research {
        int id
        string normalized UK
        string entity_type
        text summary
        json swot_analysis
        json quantitative_metrics
        int use_case_fit_score
        json jira_tickets
        timestamp generated_at
    }
```
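The ER diagram maps onto four SQLite tables. A DDL sketch derived from it; exact types, indexes, and constraints are assumptions (SQLite stores the json and timestamp columns as TEXT, and the CASCADE delete noted in Step 2 requires the foreign_keys pragma):

```python
import sqlite3

conn = sqlite3.connect("trends.db")       # assumed filename
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
conn.executescript("""
CREATE TABLE IF NOT EXISTS source_items (
    id         INTEGER PRIMARY KEY,
    item_id    TEXT UNIQUE,
    source     TEXT,
    title      TEXT,
    content    TEXT,
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS entities (
    id             INTEGER PRIMARY KEY,
    source_item_id INTEGER REFERENCES source_items(id) ON DELETE CASCADE,
    text           TEXT,
    normalized     TEXT,  -- marked UK above; assumed an indexed lookup key here
    entity_type    TEXT,
    extracted_at   TEXT
);
CREATE TABLE IF NOT EXISTS entity_metrics (
    id             INTEGER PRIMARY KEY,
    normalized     TEXT UNIQUE,
    entity_type    TEXT,
    timeframe      TEXT,
    mention_count  INTEGER,
    velocity_index REAL,
    calculated_at  TEXT
);
CREATE TABLE IF NOT EXISTS entity_research (
    id                   INTEGER PRIMARY KEY,
    normalized           TEXT UNIQUE,
    entity_type          TEXT,
    summary              TEXT,
    swot_analysis        TEXT,  -- JSON stored as text
    quantitative_metrics TEXT,  -- JSON stored as text
    use_case_fit_score   INTEGER,
    jira_tickets         TEXT,  -- JSON stored as text
    generated_at         TEXT
);
CREATE INDEX IF NOT EXISTS idx_entities_normalized ON entities(normalized);
""")
```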