📡 Real-Time Data Processing with Kafka, AWS Cloud, MySQL, and Looker Studio

📘 Project Overview

In this project, I scraped data from YouTube's tech niche using `yt_dlp`, streamed it through Kafka, cleaned and stored it in a MySQL database hosted on AWS RDS, and visualized the insights with Looker Studio.
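The scraping step can be sketched as below, a minimal example assuming `yt_dlp` is installed (`pip install yt-dlp`); the `to_row` helper and its field selection are illustrative, not the exact schema used in `producer.py`:

```python
def fetch_metadata(url):
    """Extract a video's metadata without downloading the media."""
    import yt_dlp  # imported lazily so the pure helper below has no dependency

    opts = {"quiet": True, "skip_download": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        return ydl.extract_info(url, download=False)

def to_row(info):
    """Keep only the fields the pipeline stores downstream.

    Field names (id, title, channel, view_count) follow yt_dlp's info dict.
    """
    return {
        "video_id": info.get("id"),
        "title": info.get("title"),
        "channel": info.get("channel"),
        "views": info.get("view_count", 0) or 0,
    }
```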

📊 What the Dashboard Shows

Dashboard Screenshot

🔗 View Live Dashboard

🧱 Project Structure


https://github.com/JoyKimaiyo/youtubetechvideos
├── producer.py        🚀 Scrapes & streams data
├── consumer.py        📥 Consumes and stores data
├── config.py          ⚙️  DB & Kafka settings
└── requirements.txt   📦 Dependencies

โ˜๏ธ MySQL Cloud Setup & Connection

AWS RDS provides scalable, secure MySQL hosting. I configured my instance via the AWS Console and connected using the MySQL CLI. This setup simplifies remote database management and scales with production workloads.
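The same connection can be opened from Python. This is a minimal sketch assuming `mysql-connector-python` and the environment-variable names shown here (`DB_HOST`, `DB_USER`, etc.), which are placeholders rather than the exact keys in `config.py`:

```python
import os

def rds_config():
    """Build MySQL connection settings from environment variables."""
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "port": int(os.environ.get("DB_PORT", "3306")),
        "user": os.environ.get("DB_USER", "admin"),
        "password": os.environ.get("DB_PASSWORD", ""),
        "database": os.environ.get("DB_NAME", "youtube"),
    }

def connect():
    """Open a connection to the RDS-hosted MySQL instance."""
    import mysql.connector  # requires mysql-connector-python
    return mysql.connector.connect(**rds_config())
```

Keeping credentials in the environment rather than in source avoids committing secrets to the repository.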

๐Ÿ” Kafka Streaming

I launched Zookeeper and Kafka to manage real-time streaming. The producer captured live YouTube metadata and sent it to a Kafka topic. The consumer subscribed to this topic, processed the data, and saved it to the cloud-hosted MySQL database.
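The producer side of that flow might look like the sketch below, assuming `kafka-python`, a broker on `localhost:9092`, and a topic named `youtube_tech` (all assumptions, not values taken from the repo). The JSON round trip is what ties producer and consumer together:

```python
import json

def serialize(record):
    """Encode one metadata dict as UTF-8 JSON for the Kafka topic."""
    return json.dumps(record).encode("utf-8")

def deserialize(payload):
    """Inverse of serialize, applied on the consumer side."""
    return json.loads(payload.decode("utf-8"))

def stream(records, topic="youtube_tech", bootstrap="localhost:9092"):
    """Send records to Kafka; needs kafka-python and a running broker."""
    from kafka import KafkaProducer  # lazy import: broker-side dependency

    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=serialize,
    )
    for record in records:
        producer.send(topic, record)
    producer.flush()  # block until all buffered records are delivered
```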

📈 Connecting to Looker Studio

Looker Studio was used for live dashboards. I connected it to MySQL via a connector and visualized high-level metrics like total views per niche, most popular channels, and trending keywords. These insights aid in content strategy for creators and marketers.
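A metric like "most popular channels" boils down to an aggregation query. The sketch below runs it against in-memory SQLite for portability; on the dashboard the equivalent SQL runs on the RDS MySQL instance, and the `videos` table and column names are assumptions about the pipeline's schema:

```python
import sqlite3

# Aggregation behind a "most popular channels" chart.
TOP_CHANNELS_SQL = """
    SELECT channel, SUM(views) AS total_views
    FROM videos
    GROUP BY channel
    ORDER BY total_views DESC
"""

def top_channels(conn):
    """Return (channel, total_views) pairs, most viewed first."""
    return conn.execute(TOP_CHANNELS_SQL).fetchall()
```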
