18 functions

Data Processing

Clean, transform, and prepare data for analysis.

Functions

Add temporal components (year, quarter, month, day, day of week) as separate columns to a DataFrame.

Pad numeric or string values with leading zeros to achieve a fixed string length.

Add a sequential row count column within groups based on specified sorting criteria.

Calculate elapsed time periods from the earliest date in a DataFrame.

Identify and flag statistical outliers using Tukey's IQR fence method.

Remove leading and trailing whitespace from all string columns in a DataFrame.

Detect multivariate anomalies using z-score-based probability analysis.

Match and link records across two DataFrames using fuzzy string matching algorithms.

Convert odds to probability values in a pandas DataFrame.

Calculate the count of missing values within specified groups.

Discretize continuous numeric data into discrete bins using specified strategies.

Generate a comprehensive technical summary and data dictionary for a DataFrame.

Randomly assign DataFrame records to a specified number of groups.

Consolidate low-frequency categorical values into a single catch-all category.

Partition a DataFrame into balanced groups using stratified random assignment.

Geocode U.S. addresses into latitude and longitude coordinates using the U.S. Census Bureau service.

Fill missing numeric data using K-Nearest Neighbors imputation.

Verify and enforce a unique level of granularity for a DataFrame.
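To make a few of the descriptions above concrete, here is a minimal sketch of four of the listed operations (zero-padding, row counts within groups, Tukey's IQR fences, and odds-to-probability conversion) written in plain pandas. It does not use this library's own functions, whose names and signatures are not shown on this page. The sample data and column names are hypothetical, and the odds are assumed to be "in favor" ratios.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "zip": [501, 2134, 60614, 94105, 730, 10001],
    "sales": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0],
    "odds": [1.0, 3.0, 0.25, 4.0, 9.0, 1.0],
})

# Leading zeros: cast to string, then left-pad to a fixed width of 5.
df["zip_padded"] = df["zip"].astype(str).str.zfill(5)

# Row count within groups: sort first, then number rows per group from 1.
df = df.sort_values(["store", "sales"]).reset_index(drop=True)
df["row_in_store"] = df.groupby("store").cumcount() + 1

# Tukey's fences: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
df["sales_outlier"] = (df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)

# Odds to probability, assuming "in favor" odds: p = odds / (1 + odds).
df["prob"] = df["odds"] / (1 + df["odds"])
```

The 1.5 multiplier is the conventional Tukey fence width. Whether the library exposes it as a parameter is not stated here.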

The most-used module in the library, covering the full data preparation workflow — from first look to analysis-ready dataset.
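As a rough illustration of an early "first look" pass, the sketch below strips whitespace from string columns, counts missing values per group, and lumps low-frequency categories into a catch-all value. It uses plain pandas only and is not this module's API. The data, the column names, the threshold of 2, and the "Other" label are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw data with stray whitespace and missing amounts.
df = pd.DataFrame({
    "region": [" east", "east ", "west", "west", "north", "south"],
    "amount": [1.0, None, 3.0, None, None, 6.0],
})

# Strip leading and trailing whitespace from every string column.
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]
for c in text_cols:
    df[c] = df[c].str.strip()

# Count missing values of "amount" within each region.
missing_by_region = df["amount"].isna().groupby(df["region"]).sum()

# Consolidate categories seen fewer than 2 times into "Other".
counts = df["region"].value_counts()
rare = counts[counts < 2].index
df["region_lumped"] = df["region"].where(~df["region"].isin(rare), "Other")
```

Stripping whitespace before grouping matters here: without it, " east" and "east " would be counted as separate regions and both would fall below the frequency threshold.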