๐Ÿ› ๏ธ Batch Data Processing with Python, MySQL, Docker, Airflow, and Streamlit

## 📘 Project Overview

This project showcases an automated data pipeline that scrapes trending posts from technology-focused subreddits via the Reddit API and presents them in an interactive Streamlit web application. The backend uses Airflow for scheduling, MySQL for storage, and Docker for reproducible deployment.

## 🧠 What the App Does

## 🧰 Technologies Used

๐Ÿ Python: Scripting and backend logic
๐Ÿ—„๏ธ MySQL: Structured data storage
๐Ÿณ Docker: Containerized environment for scraper and frontend
๐Ÿ•น๏ธ Airflow: DAG automation for scraping and exporting
๐Ÿ“Š Streamlit: Interactive UI and visualization
๐Ÿ› ๏ธ PRAW: Reddit API wrapper
๐Ÿง  Gemini API: Contextual explanations of technical terms

## 📈 Data Flow Summary

Reddit posts are scraped daily, stored in a MySQL database, and exported to CSV for the Streamlit dashboard. Airflow DAGs schedule the scraper, and Docker keeps the environment consistent across systems. A separate container hosts the frontend app, letting users explore the latest in data science and technology.
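The MySQL-to-CSV export step described above could be sketched as follows. The column names and rows are hypothetical stand-ins: in the real pipeline the rows would come from a MySQL query (e.g. via a `mysql-connector-python` cursor), but they are stubbed here so the shape of the CSV is easy to see.

```python
import csv

# Hypothetical column layout for the posts table (an assumption, not the
# project's actual schema).
FIELDNAMES = ["subreddit", "title", "score", "url"]

def export_posts_to_csv(rows, csv_path):
    """Write scraped Reddit posts to the CSV consumed by the Streamlit app.

    `rows` is an iterable of dicts keyed by FIELDNAMES -- in production this
    would be the result set of a SELECT against the MySQL posts table.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)

# Stubbed result set standing in for a MySQL query.
sample_rows = [
    {"subreddit": "datascience", "title": "Trending post", "score": 1234,
     "url": "https://example.com/post"},
]
export_posts_to_csv(sample_rows, "posts.csv")
```

Keeping the export as a small pure function like this makes it easy to call from an Airflow task while testing it with stubbed rows.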

## 🚀 Live Demo & Code

🔗 Live App: Reddit_News

## 🧪 Other Projects

## 📚 Table of Contents