There are two classes for linear models:
- Weighted Least Squares (WLS)
- Ordinary Least Squares (OLS)
OLS is the core model class in Appelpy. Many of the model diagnostics available are built for OLS models.
The classes are built upon Statsmodels. Features that are available in Appelpy include standardized estimates for models.
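To illustrate what a standardized estimate is, here is a minimal, dependency-free sketch for a single regressor. It does not use Appelpy or Statsmodels: the function names and toy data are hypothetical, and it assumes the common definition in which the raw coefficient is rescaled by the ratio of the regressor's standard deviation to that of the dependent variable.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def standardized_slope(x, y):
    """Rescale the raw slope into standard-deviation units."""
    return ols_slope(x, y) * stdev(x) / stdev(y)


# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

raw = ols_slope(x, y)
beta = standardized_slope(x, y)

# Changing the units of x changes the raw slope but not the standardized one
x_rescaled = [100 * xi for xi in x]
raw_rescaled = ols_slope(x_rescaled, y)
beta_rescaled = standardized_slope(x_rescaled, y)
```

Because the standardized slope does not depend on the units of the variables, it is a convenient way to compare the relative strength of regressors measured on different scales.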
Fit a model
The 10 Minutes To Appelpy notebook fits an OLS model to the California Test Score dataset.
- Interactive experience of the 10 Minutes to Appelpy tutorial via Binder.
- Static render of the 10 Minutes to Appelpy notebook.
    from appelpy.linear_model import OLS

    model1 = OLS(df, y_list, X_list).fit()
    model1.results_output  # returns summary results
The fit method must be called in order to set attributes for the model object.
There are three important parameters for initialising any model class in Appelpy:
- df: the dataframe to use for modelling. This must have no NaN values, no infinite values, etc.
- y_list: a list with the dependent variable column name.
- X_list: a list of column names for independent variables (regressors).
Other keyword arguments can be used when initialising the model class, e.g. cov_type to specify the type of standard errors in the model.
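Since df must contain no NaN or infinite values, it can help to check the data before initialising a model. The following is a dependency-free sketch of such a check: a plain dict of columns stands in for the dataframe, and the helper name and data are hypothetical. Real code would run the equivalent check on the dataframe itself.

```python
import math


def find_non_finite(columns):
    """Return {column name: [row positions]} for NaN or infinite entries."""
    problems = {}
    for name, values in columns.items():
        bad_rows = [i for i, v in enumerate(values) if not math.isfinite(v)]
        if bad_rows:
            problems[name] = bad_rows
    return problems


# Hypothetical data: a dict of columns standing in for a dataframe
data = {
    "score": [650.0, 661.2, float("nan"), 640.5],
    "income": [22.7, float("inf"), 9.8, 15.3],
    "class_size": [17.9, 21.5, 18.7, 17.4],
}

problems = find_non_finite(data)

# Keep only the rows in which every column holds a finite value
bad_rows = {i for rows in problems.values() for i in rows}
clean = {
    name: [v for i, v in enumerate(values) if i not in bad_rows]
    for name, values in data.items()
}
```

Dropping rows is only one option. Whether to drop, impute or investigate problem values depends on the dataset.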
Here are some attributes available for linear models:
- y and X: the dataframes of the dependent and independent variables.
- y_standardized and X_standardized: the standardized versions of y and X.
- results_output for the Statsmodels summary of the model. Note: the Statsmodels results object is also stored on the model object.
- results_output_standardized for the standardized estimates of the model.
- model_selection_stats: dictionary of key statistics on the model fit, including the root mean square error (root MSE).
- resid for the model residuals, which are also available in standardized form.
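As a reminder of what the root MSE measures, here is a small sketch that computes it from a list of residuals. It assumes the conventional definition (the square root of the residual sum of squares divided by the residual degrees of freedom) and a model with an intercept. The text above does not confirm that Appelpy computes the statistic in exactly this way, and the function name and data are hypothetical.

```python
import math


def root_mse(residuals, n_regressors):
    """Root MSE with a degrees-of-freedom correction.

    Assumes the model has an intercept, so the residual degrees of
    freedom are n - n_regressors - 1.
    """
    df_resid = len(residuals) - n_regressors - 1
    if df_resid <= 0:
        raise ValueError("not enough observations for the number of regressors")
    ssr = sum(e ** 2 for e in residuals)  # residual sum of squares
    return math.sqrt(ssr / df_resid)


residuals = [0.5, -1.0, 0.5, 1.0, -1.0]  # hypothetical residuals
value = root_mse(residuals, n_regressors=2)
```

The result is in the units of the dependent variable, which makes it easy to interpret as a typical size of the model's errors.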
For all linear model classes there is a significant_regressors method that returns a list of the significant independent variables of a model, given a significance level alpha.
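The idea behind selecting significant regressors can be sketched without Appelpy as a filter over p-values. The helper below is hypothetical and is not the library's implementation; in particular, the strict comparison with alpha is an assumption.

```python
def pick_significant(p_values, alpha=0.05):
    """Names of regressors whose p-value is below alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return [name for name, p in p_values.items() if p < alpha]


# Hypothetical p-values keyed by regressor name
p_values = {"income": 0.0004, "class_size": 0.031, "computers": 0.27}

at_5_percent = pick_significant(p_values, alpha=0.05)
at_1_percent = pick_significant(p_values, alpha=0.01)
```

A smaller alpha is a stricter threshold, so the list can only shrink as alpha decreases.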
After calling fit to fit a model, pass a NumPy array to predict in order to make predictions. The predict method checks whether the regressor values passed to it are 'within sample' before returning predictions. By default, predictions are only returned if all the regressor values are 'within sample'.
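One plausible reading of 'within sample' is that each regressor value lies between the minimum and maximum observed for that regressor in the data used to fit the model. The sketch below implements that reading in plain Python. The interpretation, the function name and the data are all assumptions rather than a description of Appelpy's internals.

```python
def within_sample(training_rows, new_row):
    """True if every value in new_row lies inside the training range of its regressor."""
    columns = list(zip(*training_rows))  # one tuple of values per regressor
    if len(columns) != len(new_row):
        raise ValueError("new_row must have one value per regressor")
    return all(
        min(col) <= value <= max(col)
        for col, value in zip(columns, new_row)
    )


# Hypothetical training data: two regressors per observation
training_rows = [(14.0, 5.3), (19.5, 22.7), (25.8, 55.3)]

inside = within_sample(training_rows, (20.0, 30.0))
outside = within_sample(training_rows, (30.0, 30.0))  # first value exceeds 25.8
```

A guard of this kind protects against extrapolation, where a linear model is asked about regressor values it has never seen.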
Diagnostic plot method
There is also a convenient diagnostic_plot method for OLS model objects, which returns plots such as:
- P-P plot (pp_plot)
- Q-Q plot (qq_plot)
- Residuals vs fitted values plot (rvf_plot)
- Residuals vs predictor values plot
Here is a useful recipe for producing a 2x2 grid of diagnostic plots with Matplotlib:
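    import matplotlib.pyplot as plt

    # The grid positions below are one possible arrangement.
    fig, axarr = plt.subplots(2, 2, figsize=(10, 10))
    model.diagnostic_plot('pp_plot', ax=axarr[0, 0])
    model.diagnostic_plot('qq_plot', ax=axarr[1, 0])
    model.diagnostic_plot('rvf_plot', ax=axarr[1, 1])
    axarr[0, 1].axis('off')  # leave the unused panel blank
    plt.tight_layout()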