These are some reasons why Appelpy adds so much more to the vanilla Python data stack:

General principles

  • Fluent interface and API design make it easier to build pipelines for modelling & data pre-processing.
  • Appelpy is a more accessible Python library for Stata users, by having a simple syntax and having most functionality rely on a fitted model object, while realising the benefits of object-oriented programming.

Data transformation pipe

nbviewer: click the badge to see a demonstration of model pipelines with Appelpy.

Data transformation

  • Useful encoders for transforming datasets, e.g.:
    • DummyEncoder: encode dummy columns from variables based on different policies for handling NaN values.
    • InteractionEncoder: encode columns (in a batch) for the interaction effect between two variables.
  • get_dataframe_columns_diff method makes it easier to compare the columns between two dataframes when building pipelines for transforming data.

Interaction encoder

Exploratory data analysis (EDA)

  • One method to examine all statistical_moments in a dataset: mean, variance, skewness and kurtosis.
  • Correlation heatmap plot with triangular matrix to maximise the data–ink ratio.

Model estimation

Model objects take on a special importance in Appelpy. More attributes and methods are available through these model objects. These are examples of what model objects contain:

  • Standardized model estimates (Beta coefficients): model.results_output_standardized.
  • Standardized y and X values.
  • Access the main metrics of a model via model.model_selection_stats, e.g. Root MSE for OLS models.
  • Odds ratio available for Logit models.

Standardized model estimates

Model diagnostics

It's easy enough to fit a model in Statsmodels. Where Appelpy shines is the ability to do diagnostics using only the model object, instead of having to call multiple methods scattered across a library. These are some diagnostics available:

  • One method for calling diagnostic plots to assess whether OLS assumptions hold in a model, e.g. model.diagnostic_plot('rvf_plot'). The plots are similar to commands available in Stata, but here they are accessible within the Appelpy OLS model object.
  • BadApples class does a decomposition of influence analysis into three parts: leverage, outlier and influence measures. The leverage vs residuals squared plot (similar to Stata's lvr2plot) can also be called from an instance of the class.
  • Identify extreme observations in a model based on common heuristics.
  • Perform diagnostics not implemented in the main Python libraries, e.g. studentized Breusch–Pagan test of heteroskedasticity.
  • Joint hypothesis testing, using built-in Python data structures (lists & dicts).

OLS diagnostics plot

Model usage

  • Model prediction methods have the option to return predictions only for observations that are 'within sample'.