Basic Data Analysis: Advertisement vs Sales

Author: Iqbal Hossain

Case: A prominent company would like to know how many products it could sell in the near future if it advertises in different electronic media. The analyst ran a survey and collected some primary data. Can these data help management estimate the sales volume? Yes: it is possible to estimate, or predict, the sales volume from the collected data. In this post we will try to understand how to analyze data in Python, using a basic machine learning process to predict sales volume. Let’s view a few records from the sample data first:

TV Radio Newspaper Sales
230.1 37.8 69.2 22.1
44.5 39.3 45.1 10.4
17.2 45.9 69.3 9.3
151.5 41.3 58.5 18.5
180.8 10.8 58.4 12.9

The data set has 200 records and 4 columns, as in the table above. Because we want to predict sales, Sales is our response; TV, Radio and Newspaper are our features. We will use these three numeric fields to estimate sales for a given amount spent on each medium.

Features:
1. TV: advertising money spent on TV for a single product in a given market (in thousands of dollars)
2. Radio: advertising money spent on radio
3. Newspaper: advertising money spent on newspapers

Response:
1. Sales: sales of a single product in a given market (in thousands of pieces)

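To make the split between features and response concrete, here is a minimal sketch that uses only the five sample rows shown in the table. The column names follow the table header; the CSV file loaded later in this post appears to use lowercase names such as 'radio' and 'sales', so adjust them to match your file. The variable names `sample`, `feature_cols`, `X` and `y` are my own choices for illustration.

```python
import pandas as pd

# The five sample rows from the table above (not the full 200-record data set).
sample = pd.DataFrame(
    {
        "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
        "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
        "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
        "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
    }
)

# Features: the columns we use as inputs to the model.
feature_cols = ["TV", "Radio", "Newspaper"]
X = sample[feature_cols]

# Response: the column we want to predict.
y = sample["Sales"]

print(X.shape)  # rows x feature columns
print(y.shape)  # one response value per row
```
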
Questions About the Advertising Data
1. Is there a relationship between ads and sales?
2. How strong is that relationship?
3. Which ad types contribute to sales?
4. What is the effect of each ad type on sales?
5. Given ad spending in a particular market, can sales be predicted?

To answer these questions we will use a linear regression model and a p-value test, and plot the data for observation. Start Python and write the following code; I will explain it step by step. I used PyCharm as my IDE and installed the pandas, scikit-learn and Matplotlib libraries. The code below also imports statsmodels, so that needs to be installed as well.

import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

df = pd.read_csv('advertising.csv')
print(df.head())
tvData = pd.DataFrame(df, columns=['TV', 'sales'])
radioData = pd.DataFrame(df, columns=['radio', 'sales'])
newspaperData = pd.DataFrame(df, columns=['newspaper', 'sales'])

# Fit one simple linear regression (ordinary least squares) per medium with statsmodels
lmTv = smf.ols(formula='sales ~ TV', data=df).fit()
# print(lmTv.params)

lmRadio = smf.ols(formula='sales ~ radio', data=df).fit()
# print(lmRadio.params)

lmNewspaper = smf.ols(formula='sales ~ newspaper', data=df).fit()
# print(lmNewspaper.params)

figure = plt.figure()
ax1 = figure.add_subplot(1, 3, 1)
ax2 = figure.add_subplot(1, 3, 2)
ax3 = figure.add_subplot(1, 3, 3)

ax1.scatter(x=tvData['TV'], y=tvData['sales'])
ax1.plot(tvData['TV'], lmTv.predict(tvData['TV']), c='red')
ax1.set(title="TV vs Sales", xlabel="dollars spent (in thousands)", ylabel="sales (in thousands)")

ax2.scatter(x=radioData['radio'], y=radioData['sales'])
ax2.plot(radioData['radio'], lmRadio.predict(radioData['radio']), c='red')
ax2.set(title="Radio vs Sales", xlabel="dollars spent (in thousands)", ylabel="sales (in thousands)")

ax3.scatter(x=newspaperData['newspaper'], y=newspaperData['sales'])
ax3.plot(newspaperData['newspaper'], lmNewspaper.predict(newspaperData['newspaper']), c='red')
ax3.set(title="Newspaper vs Sales", xlabel="dollars spent (in thousands)", ylabel="sales (in thousands)")
plt.show()


def predict_sales(category, money):
    # money is the ad spend in thousands of dollars; each model predicts sales in thousands.
    # predict() returns a pandas Series, so take its single value with .iloc[0] before int().
    if category == "TV":
        print("Spending $" + str(money*1000) + " to TV Adv. may grow sales to "
              + str(int(lmTv.predict(pd.DataFrame({'TV': [money]})).iloc[0] * 1000)))
    if category == "Radio":
        print("Spending $" + str(money*1000) + " to Radio Adv. may grow sales to "
              + str(int(lmRadio.predict(pd.DataFrame({'radio': [money]})).iloc[0] * 1000)))
    if category == "Newspaper":
        print("Spending $" + str(money*1000) + " to Newspaper Adv. may grow sales to "
              + str(int(lmNewspaper.predict(pd.DataFrame({'newspaper': [money]})).iloc[0] * 1000)))


predict_sales("TV", 500)
predict_sales("Radio", 500)
predict_sales("Newspaper", 500)
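
The questions listed earlier mention a p-value test and the strength of the relationship, but the code above only fits and plots the models. As a minimal sketch of how those numbers could be read from a statsmodels fit, the example below fits one model with all three media together and prints the coefficients, p-values, R-squared and confidence intervals. Please note the assumptions: the data frame `demo` is synthetic stand-in data that I generate here (it is not the survey data), and the formula with three features is my own extension of the single-feature models above. To try it on the real data, replace `demo` with `df`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption for illustration, NOT the survey data).
# By construction, sales depends on TV and radio spend but not on newspaper spend.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame(
    {
        "TV": rng.uniform(0, 300, n),
        "radio": rng.uniform(0, 50, n),
        "newspaper": rng.uniform(0, 100, n),
    }
)
demo["sales"] = 3.0 + 0.05 * demo["TV"] + 0.2 * demo["radio"] + rng.normal(0, 1.5, n)

# Ordinary least squares with all three media as features.
lm_all = smf.ols(formula="sales ~ TV + radio + newspaper", data=demo).fit()

# Effect of each ad type: estimated change in sales per unit of spend.
print(lm_all.params)

# P-value per coefficient: a small value (commonly below 0.05) suggests
# that the feature has a real relationship with sales.
print(lm_all.pvalues)

# R-squared: share of the variance in sales explained by the model (0 to 1).
print(lm_all.rsquared)

# 95% confidence interval for each coefficient.
print(lm_all.conf_int())
```

A feature with a large p-value is a candidate to drop from the model, and a higher R-squared means a stronger overall relationship. Calling `lm_all.summary()` prints the same information as one table.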