MD Market Miner
v1.0
ETL Pipeline Dashboard
Monitor the automated e-commerce discovery process.
Seeds
not started
Generate search queries in Romanian (RO), Russian (RU), and English (EN) to find potential e-commerce domains.
Generates keyword combinations
Uses platform-specific footprints
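As a rough illustration of the Seeds stage, the sketch below crosses per-language keywords with place names and appends a few footprint queries. The vocabulary, the footprint strings, the `site:.md` restriction, and the name `build_seed_queries` are all assumptions made for this example; the real seed lists are not shown on this page.

```python
from itertools import product

# Hypothetical vocabulary: the pipeline's actual keyword lists are not documented here.
VOCAB = {
    "ro": {"terms": ["magazin online", "livrare"], "places": ["Moldova", "Chișinău"]},
    "ru": {"terms": ["интернет магазин", "доставка"], "places": ["Молдова", "Кишинёв"]},
    "en": {"terms": ["online shop", "delivery"], "places": ["Moldova", "Chisinau"]},
}

# Hypothetical platform footprints: strings that tend to appear in a platform's markup.
FOOTPRINTS = ['"wp-content/plugins/woocommerce"', '"cdn.shopify.com"']


def build_seed_queries():
    """Return one dict per search query, tagged with its language."""
    queries = []
    for lang, vocab in VOCAB.items():
        # Every keyword is combined with every place name for that language.
        for term, place in product(vocab["terms"], vocab["places"]):
            queries.append({"lang": lang, "query": f"{term} {place}"})
    for footprint in FOOTPRINTS:
        # Footprint queries are language-independent; site:.md is an assumed filter.
        queries.append({"lang": None, "query": f"site:.md {footprint}"})
    return queries
```

With the sample vocabulary this yields 12 keyword queries plus 2 footprint queries.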
Discover
not started
Use SERP APIs or Playwright to discover candidate URLs based on generated seeds.
Respects API rate limits
Supports Bing, SerpAPI, Google CSE
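One simple way to respect an API rate limit is to enforce a minimum interval between calls. The sketch below is an assumption about how that could look, not the dashboard's actual implementation: `RateLimiter`, `discover`, and the `search` callable are hypothetical names, and `search` stands in for whichever backend (Bing, SerpAPI, Google CSE) is configured.

```python
import time


class RateLimiter:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def discover(seeds, search, limiter):
    """Run each seed through `search` and return unique candidate URLs in first-seen order."""
    seen = {}
    for seed in seeds:
        limiter.wait()
        for url in search(seed):
            seen.setdefault(url, seed)  # remember which seed found the URL first
    return list(seen)
```

Passing the clock and sleep functions in as parameters keeps the limiter deterministic under test; in production the defaults apply.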
Crawl
not started
Asynchronously crawl candidate sites, fetching home, delivery, and contact pages.
Respects robots.txt
Per-domain concurrency limiting
Caches HTTP responses
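The three crawl guarantees above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: `Crawler` is a hypothetical class, `fetch` is an injected coroutine (in practice it would wrap an HTTP client), the robots.txt bodies are assumed to be downloaded already, and the cache is a plain in-memory dict.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class Crawler:
    def __init__(self, fetch, robots_txt_by_host, per_domain=2, user_agent="md-market-miner"):
        self.fetch = fetch            # hypothetical coroutine: url -> response body
        self.user_agent = user_agent  # assumed user-agent string
        self.cache = {}               # url -> body, filled on first fetch
        # One semaphore per host, created lazily inside the running event loop.
        self.semaphores = defaultdict(lambda: asyncio.Semaphore(per_domain))
        self.robots = {}
        for host, text in robots_txt_by_host.items():
            parser = RobotFileParser()
            parser.parse(text.splitlines())
            self.robots[host] = parser

    def allowed(self, url):
        # Hosts without a known robots.txt are treated as allowed in this sketch.
        parser = self.robots.get(urlsplit(url).netloc)
        return parser is None or parser.can_fetch(self.user_agent, url)

    async def get(self, url):
        if url in self.cache:
            return self.cache[url]
        if not self.allowed(url):
            return None
        async with self.semaphores[urlsplit(url).netloc]:
            # Re-check the cache: another task may have fetched it while we waited.
            if url not in self.cache:
                self.cache[url] = await self.fetch(url)
        return self.cache[url]

    async def crawl(self, urls):
        return await asyncio.gather(*(self.get(u) for u in urls))
```

A call such as `asyncio.run(crawler.crawl(urls))` returns the bodies in input order, with `None` for URLs that robots.txt disallows.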
Detect
not started
Apply a scoring algorithm and GenAI to identify shops and their platforms.
Scores based on cart buttons, checkout paths, schema.org
Detects WooCommerce, Shopify, etc.
AI-enhanced detection for ambiguous cases
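A scoring pass over the signals listed above might look like the following sketch. The regular expressions, weights, thresholds, and the `detect` function are illustrative assumptions; the actual algorithm is not described on this page. The GenAI step is not implemented here: pages that land in the "ambiguous" band are the ones that would be handed to a model.

```python
import re

# Hypothetical signals: name -> (pattern, weight).
SIGNALS = {
    "cart_button": (re.compile(r"add[\s_-]?to[\s_-]?cart|adaugă în coș|в корзину", re.I), 3),
    "checkout_path": (re.compile(r'href="[^"]*/(checkout|cart)\b', re.I), 2),
    "schema_product": (re.compile(r"schema\.org/(Product|Offer)", re.I), 3),
}

# Hypothetical platform footprints found in page markup.
PLATFORMS = {
    "WooCommerce": re.compile(r"wp-content/plugins/woocommerce", re.I),
    "Shopify": re.compile(r"cdn\.shopify\.com", re.I),
}


def detect(html, shop_threshold=5, ambiguous_threshold=2):
    """Score a page and classify it as shop, ambiguous, or not_shop."""
    hits = [name for name, (pattern, _) in SIGNALS.items() if pattern.search(html)]
    score = sum(SIGNALS[name][1] for name in hits)
    platform = next((n for n, p in PLATFORMS.items() if p.search(html)), None)
    if score >= shop_threshold:
        verdict = "shop"
    elif score >= ambiguous_threshold:
        verdict = "ambiguous"  # candidate for the AI-enhanced check
    else:
        verdict = "not_shop"
    return {"score": score, "signals": hits, "platform": platform, "verdict": verdict}
```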
Enrich
not started
Extract metadata such as language, currency, delivery info, and product estimates.
Checks for Moldova delivery markers
Parses sitemaps for product count
Extracts contact details
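The enrichment checks above could be approximated as follows. Everything in this sketch is an assumption made for illustration: the delivery phrases, the `/product` URL hint used to count products in a sitemap, and the function names. Only the sitemap XML namespace and Moldova's +373 country code are standard. Language and currency detection are left out.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical delivery phrases in RO, RU, and EN.
DELIVERY_MARKERS = re.compile(
    r"livrare în (toată )?moldova|доставка по молдове|delivery (in|to|across) moldova",
    re.I,
)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# +373 followed by 8 digits, optionally separated by spaces or hyphens.
PHONE = re.compile(r"\+373(?:[\s-]?\d){8}")


def estimate_products(sitemap_xml, product_hint="/product"):
    """Count sitemap URLs whose location contains the (assumed) product path hint."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text or "" for el in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return sum(product_hint in loc for loc in locs)


def enrich(page_text, sitemap_xml):
    """Collect delivery, contact, and product-count metadata for one site."""
    return {
        "delivers_to_moldova": bool(DELIVERY_MARKERS.search(page_text)),
        "emails": EMAIL.findall(page_text),
        "phones": PHONE.findall(page_text),
        "product_estimate": estimate_products(sitemap_xml),
    }
```

Counting by a path hint is only an estimate; shops that use other URL schemes would need a different hint or per-platform rules.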
Dedupe
not started
Canonicalize domains and merge data from duplicate entries (e.g., www vs. non-www).
Prioritizes HTTPS
Merges signals from multiple sources
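A minimal sketch of this stage, assuming each record is a dict with a `url` (including its scheme) and a list of `signals`. The record shape and the names `canonical_domain` and `dedupe` are hypothetical.

```python
from urllib.parse import urlsplit


def canonical_domain(url):
    """Lower-case the host and drop a leading 'www.' so both variants share one key."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe(records):
    """Merge records that share a canonical domain, preferring an HTTPS URL."""
    merged = {}
    for rec in records:
        key = canonical_domain(rec["url"])
        current = merged.get(key)
        if current is None:
            merged[key] = {"domain": key, "url": rec["url"], "signals": set(rec.get("signals", ()))}
            continue
        # Union the signals seen across all duplicates.
        current["signals"].update(rec.get("signals", ()))
        # Replace the stored URL only when this one upgrades HTTP to HTTPS.
        if rec["url"].startswith("https://") and not current["url"].startswith("https://"):
            current["url"] = rec["url"]
    return [{**entry, "signals": sorted(entry["signals"])} for entry in merged.values()]
```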
Export
not started
Generate the final report and export the data to CSV and Parquet formats.
Creates summary report in Markdown
Saves raw data for analysis
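The CSV export and Markdown summary can be sketched with the standard library. The row fields (`domain`, `platform`), the report layout, and the function names are assumptions for this example. Parquet output is omitted because it needs a third-party library such as pyarrow.

```python
import csv
import io
from collections import Counter


def to_csv(rows, fieldnames):
    """Serialize rows (a list of dicts) to CSV text with a header line."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown_summary(rows):
    """Build a short Markdown report with shop counts per platform."""
    by_platform = Counter(r.get("platform") or "unknown" for r in rows)
    lines = [
        "# MD Market Miner summary",
        "",
        f"Shops found: {len(rows)}",
        "",
        "| Platform | Shops |",
        "| --- | --- |",
    ]
    lines += [f"| {name} | {count} |" for name, count in by_platform.most_common()]
    return "\n".join(lines) + "\n"
```

Both functions return strings, so the caller decides where the files are written.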