SQL Portfolio Project: Analyzing Digital Marketing Metrics

By Joyce Kimaiyo | 6 min read

Business Question

The digital marketing dataset from Kaggle (https://www.kaggle.com/datasets/sinderpreet/analyze-the-marketing-spending) was used for this project to analyze marketing expenditure, revenue generated, and customer behavior—specifically, identifying who clicked on the ad banner, who signed up, and who completed a purchase. Additionally, various marketing metrics were calculated to assess the effectiveness of the campaign and identify key parameters relevant for further analysis. The project aimed to answer the following questions:

Overall ROMI (Return on marketing investment) ROMI by campaigns
Performance of the campaign depending on the date — on which date did the company spend the most money on advertising, when did the company get the biggest revenue when conversion rates were high and low?
What were the average order values by the campaign?
When buyers are more active? What is the average revenue on weekdays and weekends?
Which types of campaigns work best — social, banner, influencer, or search?
Which geo locations are better for targeting — tier 1 or tier 2 cities?

Data Cleaning

In the Campaign Name category, two types of information were grouped into one, requiring the identification of geo-locations that were more effective for targeting—Tier 1 or Tier 2 cities. It was necessary to separate "Facebook_tier1" from "Facebook" and "tier1," which was achieved by first altering the table and adding a new column. The column was then updated, and the substring_index string function was used to separate the two words using the underscore as a delimiter. Additionally, there was an issue with inconsistent spelling in some instances of the word "Facebook," such as "facebOOK." To standardize the data, the .lower string function was applied to ensure all values in the column were converted to lowercase. Lastly, since MySQL expects date values in the YYYY-MM-DD format for proper date and time operations, the existing date format, which was in MM-DD-YYYY, needed to be converted accordingly.

Data Analysis

Data Analysis Screenshot

  • Overall ROMI (Return on Marketing Investment)
    To calculate overall ROMI, the total revenue is summed, and the total marketing expenditure is subtracted.
  • ROMI by Campaigns
    To calculate ROMI by campaign category, the data was grouped by campaign category by adding a category column to the SELECT clause.
  • Date with the Highest Advertising Spend
    To determine the date on which the company spent the most on advertising, the date and market spend values can be selected, ordered by revenue in descending order, and limited to one result.
  • Dates with the Highest Revenue and Conversion Rates
    To identify the date with the highest revenue, the date and revenue values were selected, ordered by revenue in descending order (from largest to smallest). This helps determine the periods when the company achieved high conversion rates from its campaigns.
  • Average Order Values by Campaign
    To determine the average order value by campaign, the category and average number of orders are selected and then grouped by category.
  • Weekdays or weekends?

    Buyer Activity on Weekdays vs. Weekends

    To differentiate between weekdays and weekends, the DAYOFWEEK() function is used, which returns a number from 1 to 7 representing the day of the week (1 = Sunday, 7 = Saturday). First, the table is altered to add a new column, day_type, to store the values. The table is then updated using a CASE function, categorizing days 1 and 7 as weekends and the rest as weekdays. Finally, buyer activity is analyzed by selecting day_type and orders, then grouping by day_type. The analysis indicates that weekdays have more active buyers than weekends.

    Most Effective Campaign Types

    Weekdays or weekends?>

    To evaluate campaign effectiveness, ROMI per category is used, revealing that influencer campaigns yield the best results.

    Best Geo-Locations for Targeting: Tier 1 vs. Tier 2 Cities

    To determine the most effective geo-locations for targeting, overall ROMI per geo-location is analyzed. First, the table is altered to include a new column, ROMI, which stores the difference between revenue and marketing spend. The table is then updated with calculated values, and the geo-locations are selected, with ROMI summed as total_ROMI, grouped by geo-location, and ordered in descending order. The analysis indicates that Tier 1 cities yield a higher return on investment compared to Tier 2 cities.

    📌 Complete portfolio available at: GitHub Repository

    Other Projects

    Table of Contents