Stats Vizuals
Creating histograms
- Arrange data points in ascending order
- Calculate range = Max value - Min value
- Decide number of classes for histogram (number of groups)
- Calculate class width = Range / Number of classes (Round up if needed)
- Determine groups:
- Group1: Min Val to (MinVal + Class Width)
- Group 2: Group1 Upper Val to (Group1 Upper Val+ Class Width)
- …
Whisker Plots

Density Curves
- The area under a density curve will always represent 100 % of the data, or 1.0
- The curve will never dip below the x-axis.
QQ Plot (Quantile-Quantile Plot)
- Used to determine whether a data set is distributed in a certain way
- Unless otherwise specified, it usually showcases how the data fits a normal distribution
- The Y-axis shows the values and the X-axis shows Theoretical Quantiles
- The diagonal line represnts the normal distribution
Sample Code
import scipy.stats as stats
import pylab
# first parameter is the data to plot
# second parameter is the type of plot
stats.probplot(df_ts_base1.spx, plot = pylab)
pylab.show()
Trendline
- Scatterplot with line indicating the trend
- Since this is a scatterplot, date values will not work in the x axis
- We need to use numeric values (e.g. indices) instead
- Can be linear or polynomial
Sample Code
import seaborn as sns
# Linear Trendline
sns.regplot(x = df_ts_2.index.values, y=df_ts_2.spx)
# Polynomial Trendline
# use the order parameter to generate polynomial trendlines
# default is order =1 and plots a linear trendline
sns.regplot(x = df_ts_2.index.values, y=df_ts_2.spx, order=2)
Lag PLots
- Used to understand the effect of lag periods on current value
Sample Code
# Helps analyze correlation w.r.t a single lag period
from pandas.plotting import lag_plot
lag_plot(df_ts_3.spx)
Autocorrelation PLots
- Can be used to analyze correlation w.r.t. all previous periods
Sample Code
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(df_ts_3.spx)