Understand the structure of your data: which observations group together, what drives variance, and how to create balanced comparison groups.
Highlight functions
CreateKMeansClusters
K-means clustering with automatic visualization of cluster assignments. Returns the labeled DataFrame and a cluster profile summary.
from analysistoolbox.descriptive_analytics import CreateKMeansClusters
df_clustered = CreateKMeansClusters(
    dataframe=df,
    list_of_clustering_variables=["recency", "frequency", "monetary"],
    number_of_clusters=4
)
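To make the description above concrete, here is a minimal sketch of what "labeled DataFrame plus cluster profile" can look like, written directly against scikit-learn. It is not the implementation of `CreateKMeansClusters`. The synthetic data, the standardization step, and the `cluster` and `size` column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic RFM-style data (assumption: invented purely for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "recency": rng.integers(1, 365, size=200),
    "frequency": rng.integers(1, 50, size=200),
    "monetary": rng.gamma(2.0, 150.0, size=200),
})
features = ["recency", "frequency", "monetary"]

# Standardize so that no single variable dominates the distance calculation.
# (Assumption: the toolbox may or may not scale inputs itself.)
scaled = StandardScaler().fit_transform(df[features])

# Fit k-means and attach the cluster label to each row.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
df_clustered = df.assign(cluster=model.fit_predict(scaled))

# Cluster profile: per-cluster mean of each variable, plus cluster size.
profile = df_clustered.groupby("cluster")[features].mean()
profile["size"] = df_clustered.groupby("cluster").size()
print(profile)
```

Reading the profile row by row is usually the quickest way to give each cluster a plain-language label, such as "recent, frequent, high spend".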
CreateHierarchicalClusters
Agglomerative hierarchical clustering with dendrogram output. Useful when you don't know the number of clusters in advance.
from analysistoolbox.descriptive_analytics import CreateHierarchicalClusters
CreateHierarchicalClusters(
    dataframe=df,
    list_of_clustering_variables=["feature_1", "feature_2", "feature_3"]
)
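The idea behind choosing the number of clusters after the fact can be sketched with SciPy: build the full merge tree once, then cut it at whatever level looks sensible. This is an illustration of the technique, not the internals of `CreateHierarchicalClusters`. The synthetic blobs, the Ward linkage, and the cut at three clusters are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.preprocessing import StandardScaler

# Three synthetic, well-separated blobs (assumption: illustrative data only).
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [5, 5, 5], [10, 0, 5]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])
df = pd.DataFrame(points, columns=["feature_1", "feature_2", "feature_3"])

scaled = StandardScaler().fit_transform(df)

# Agglomerative clustering: start with every row as its own cluster and
# repeatedly merge the pair whose union adds the least within-cluster variance.
merges = linkage(scaled, method="ward")

# The dendrogram encodes the whole merge history. no_plot=True returns the
# layout as a dict instead of drawing it, so matplotlib is not required here.
tree = dendrogram(merges, no_plot=True)

# Choose the number of clusters afterwards by cutting the tree.
labels = fcluster(merges, t=3, criterion="maxclust")
df_labeled = df.assign(cluster=labels)
print(df_labeled["cluster"].value_counts().sort_index())
```

Because the tree is built once, trying a different number of clusters only requires another `fcluster` call with a new `t`, with no refitting.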
ConductPropensityScoreMatching
Creates matched control and treatment groups using propensity scores, a widely used approach for estimating causal effects from observational data.
from analysistoolbox.descriptive_analytics import ConductPropensityScoreMatching
df_matched = ConductPropensityScoreMatching(
    dataframe=df,
    treatment_variable="received_treatment",
    list_of_covariates=["age", "income", "baseline_score"]
)
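For readers new to the technique, the following sketch shows one common form of propensity score matching: a logistic regression estimates each row's probability of treatment, and each treated row is paired with the control row whose score is closest. It is a simplified illustration under stated assumptions, not the algorithm `ConductPropensityScoreMatching` necessarily uses. The synthetic data, the `propensity` column, and 1-to-1 nearest-neighbor matching with replacement are all choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Synthetic observational data (assumption: illustrative only).
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(60, 15, n),  # in thousands
    "baseline_score": rng.normal(50, 10, n),
})
# Older people are more likely to be treated, so the raw groups are unbalanced.
p_treat = 1 / (1 + np.exp(-(df["age"] - 45) / 10))
df["received_treatment"] = (rng.random(n) < p_treat).astype(int)

covariates = ["age", "income", "baseline_score"]

# Step 1: estimate the propensity score, P(treated | covariates).
ps_model = LogisticRegression(max_iter=1000)
ps_model.fit(df[covariates], df["received_treatment"])
df["propensity"] = ps_model.predict_proba(df[covariates])[:, 1]

treated = df[df["received_treatment"] == 1]
control = df[df["received_treatment"] == 0]

# Step 2: pair each treated row with the control row that has the closest
# propensity score. A control row may be reused (matching with replacement).
nn = NearestNeighbors(n_neighbors=1).fit(control[["propensity"]])
_, idx = nn.kneighbors(treated[["propensity"]])
matched_control = control.iloc[idx.ravel()]
df_matched = pd.concat([treated, matched_control])

# Step 3: check covariate balance before and after matching.
gap_before = abs(treated["age"].mean() - control["age"].mean())
gap_after = abs(treated["age"].mean() - matched_control["age"].mean())
print(f"Mean age gap before matching: {gap_before:.2f}")
print(f"Mean age gap after matching:  {gap_after:.2f}")
```

Matching only balances the covariates included in the score model. Any confounder left out of `covariates` can still bias the comparison, so the balance check in step 3 is a necessary diagnostic rather than proof of a valid causal estimate.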
ConductPrincipalComponentAnalysis
Principal component analysis (PCA) with a scree plot, the explained variance per component, and a 2D projection of the data.
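The three outputs named above can be sketched with scikit-learn as follows. This does not call `ConductPrincipalComponentAnalysis`, whose parameters are not shown in this section. The synthetic data and the names `explained` and `projection` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Five observed columns driven by two latent factors plus a little noise
# (assumption: illustrative data only).
rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.1 * rng.normal(size=(300, 5))
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(1, 6)])

# Standardize first so each column contributes on the same scale.
scaled = StandardScaler().fit_transform(df)
pca = PCA().fit(scaled)

# Explained variance per component: these are the values a scree plot shows.
explained = pd.DataFrame({
    "component": np.arange(1, pca.n_components_ + 1),
    "explained_variance_ratio": pca.explained_variance_ratio_,
})
explained["cumulative"] = explained["explained_variance_ratio"].cumsum()
print(explained)

# 2D projection: coordinates of every row on the first two components.
projection = pca.transform(scaled)[:, :2]
```

A common way to read the table is to look for the "elbow" where the explained variance ratio drops sharply. Components after that point add little, which suggests how many are worth keeping.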
All functions
| Function | Description |
|---|---|
| ConductManifoldLearning | Non-linear dimensionality reduction (UMAP, t-SNE) |
| ConductPrincipalComponentAnalysis | PCA with visualization |
| ConductPropensityScoreMatching | Balanced groups for causal inference |
| CreateAssociationRules | Market basket / affinity analysis |
| CreateGaussianMixtureClusters | Soft probabilistic clustering |
| CreateHierarchicalClusters | Hierarchical clustering with dendrogram |
| CreateKMeansClusters | K-means clustering |
| GenerateEDAWithLIDA | AI-powered EDA using Microsoft LIDA |