Pareto analysis, also known as the 80/20 rule, is a statistical decision-making technique for prioritising the tasks that have the most significant effect. It is based on the idea that 80 percent of the benefits can come from doing 20 percent of the work. In this short write-up I use plain Python and a few common libraries to generate a Pareto chart from very simple test data. I have tried to keep it easy to understand and to show the code in full so that you can reuse it in your own project.

Suppose we are analysing health-care data and trying to identify the clinical errors that occur after a doctor's visit and the issuing of a prescription. Our survey returned the following data.

Errors | Cases found |
---|---|
Wrong prescription | 2 |
Over dose intake | 20 |
Low dose intake | 10 |
Repeated doses intake | 5 |
Medicine wrong time intake | 42 |
Patient unawareness | 21 |
Intake forgotten | 3 |
Wrong medicine intake | 1 |

Whether the number of cases is high or low, it is not wise to judge from the raw counts alone which errors are more significant and which are less. We should not prioritise based on frequency by itself. In such cases the Pareto 80/20 rule can help us identify which errors are critical and which are trivial. The following code is written in plain Python so that it is self-explanatory to the reader.
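Before the script, it may help to state the two quantities it computes. The symbols here are introduced only for illustration and do not appear in the code: with the categories sorted by count in descending order, $f_i$ is the count of the $i$-th category and $N$ is the number of categories.

$$
r_i = 100 \cdot \frac{f_i}{\sum_{j=1}^{N} f_j},
\qquad
c_k = \sum_{i=1}^{k} r_i
$$

Here $r_i$ is the relative frequency in percent and $c_k$ is the cumulative frequency up to category $k$. The script rounds each $r_i$ to two decimals before accumulating, and it treats category $k$ as one of the critical few while $c_k < 80$. For the survey table, the largest category (42 of 104 cases) gives $r_1 \approx 40.4$.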

    import matplotlib as mpl
    mpl.use('TkAgg')  # interactive backend; change or remove this line if Tk is not installed
    from matplotlib import pyplot as plt
    import pandas as pd
    import numpy as np

    # survey data from the table above; "\n" keeps the tick labels narrow
    error_cat = ["Wrong\nprescription", "Over\ndose\nintake", "Low\ndose\nintake",
                 "Repeated\ndoses\nintake", "Wrong\ntime\nintake", "Patient\nunawareness",
                 "Intake\nforgotten", "Wrong\nmedicine\nintake"]
    error_freq = [2, 20, 10, 5, 42, 21, 3, 1]

    # make dataframe
    data = list(zip(error_cat, error_freq))
    df = pd.DataFrame(data, columns=['category', 'frequency'])

    # sort by frequency and re-index the rows
    df = df.sort_values(['frequency'], ascending=False).reset_index(drop=True)
    sum_for_frequency = df["frequency"].sum()

    # calculate relative frequency and cumulative frequency (both in percent)
    df["relative_frequency"] = ((df["frequency"] / sum_for_frequency) * 100).round(2)
    df["cumulative_frequency"] = np.cumsum(df["relative_frequency"])

    # prepare plot
    fig, axes1 = plt.subplots()

    # prepare axes
    axes1.set_ylim(0, df["frequency"].max() + 10)
    axes1.set_ylabel("frequency", color='black')
    axes1.spines['left'].set_visible(True)
    axes1.spines['top'].set_visible(False)
    axes1.yaxis.set_visible(True)
    axes1_bars = axes1.bar(df['category'], df['frequency'], color="#818380", zorder=10)
    # style the tick labels after plotting, so the settings apply to the category labels
    axes1.tick_params(axis='x', labelrotation=0, labelsize=8)

    # the critical few are the categories whose cumulative frequency stays below 80%
    df_20 = df.loc[df["cumulative_frequency"] < 80]
    critical_few = df_20.shape[0]
    trivial_many = df.shape[0] - critical_few

    # mark critical bars and show frequency on top of the bar
    for i in range(critical_few):
        bar = axes1_bars[i]
        bar.set_color("#173F5F")
        axes1.text(bar.get_x() + bar.get_width() / 3, bar.get_height() + 1,
                   str(bar.get_height()), fontsize=9, color='black', zorder=20)

    # share x-axis and prepare axes2
    axes2 = axes1.twinx()
    axes2.set_ylim(0, 105)
    axes2.set_ylabel("cumulative frequency in %", color="gray")
    axes2.spines['left'].set_visible(False)
    axes2.spines['top'].set_visible(False)
    axes2.tick_params(axis='y', colors='gray')
    axes2.plot(df['category'], df['cumulative_frequency'], "--o", color='#767775', linewidth=.8)

    # share y-axis and prepare axes3 for 80/20 marks
    ax3 = axes2.twiny()
    ax3.spines['left'].set_visible(False)
    ax3.spines['top'].set_visible(True)
    ax3.spines['right'].set_visible(False)
    ax3.yaxis.set_visible(False)
    ax3.xaxis.set_visible(False)
    ax3.set_xlim([0, df.shape[0]])

    # draw intersection lines and mark critical and trivial
    # (axhline/axvline take their start and end as fractions of the axes, from 0 to 1)
    ax3.axhline(80, critical_few / df.shape[0], 1, color="red", linestyle="--", linewidth=.5)
    ax3.axvline(critical_few, 0, 80 / 105, color="red", linestyle="--", linewidth=.5)
    ax3.text(critical_few - 2, 100, "critical (20%)", fontsize=10, color="red")
    ax3.text(critical_few + 2, 60, "trivial (80%)", fontsize=10, color="#818380")

    # set title and show plot
    title = plt.title("critical vs trivial clinical errors : n = %s" % df["frequency"].sum(),
                      loc="center", fontsize=10)
    plt.setp(title, color="black")
    fig.tight_layout()
    plt.show()
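If you only need the ranking and not the chart, the sorting and cut-off step can be isolated in a few lines of standard-library Python. This is a minimal sketch rather than part of the original script: the function name `pareto_split` and its `threshold` parameter are hypothetical, and it assumes the same convention as the script, where a category counts as critical while the cumulative percentage is strictly below the threshold.

```python
from itertools import accumulate


def pareto_split(counts, threshold=80.0):
    """Rank categories by count and flag the 'critical few'.

    counts: dict mapping category name -> number of cases.
    threshold: cumulative percentage cut-off (80 for the 80/20 rule).
    Returns (rows, critical_few), where rows is a list of
    (category, count, cumulative_percent) sorted by count, descending.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one case")

    # Largest count first; ties keep their original order (sorted is stable).
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)

    # Running total of counts, converted to a percentage of all cases.
    running = accumulate(count for _, count in ranked)
    rows = [(name, count, 100.0 * cum / total)
            for (name, count), cum in zip(ranked, running)]

    # Assumed convention: critical while the cumulative share is
    # strictly below the threshold.
    critical_few = sum(1 for _, _, cum_pct in rows if cum_pct < threshold)
    return rows, critical_few


# Counts taken from the survey table in this article.
survey = {
    "Wrong prescription": 2,
    "Over dose intake": 20,
    "Low dose intake": 10,
    "Repeated doses intake": 5,
    "Medicine wrong time intake": 42,
    "Patient unawareness": 21,
    "Intake forgotten": 3,
    "Wrong medicine intake": 1,
}

rows, critical_few = pareto_split(survey)
for name, count, cum_pct in rows[:critical_few]:
    print(f"{name:<28}{count:>4}{cum_pct:>8.1f}%")
```

With the counts from the survey table, the strict cut-off flags three categories (wrong time intake, patient unawareness and over dose intake), which together account for about 79.8% of all cases. Some authors also include the category that first crosses the threshold; if you prefer that reading, the comparison on `cum_pct` is the only line that needs to change.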

You can modify the code to suit your needs. You are also welcome to improve it and send your contribution to my email address so that I can include it here.