tidypyverse/tidypandas

tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

The tidypandas Python package provides a minimal, pythonic API for common data manipulation tasks:

  • The tidyframe class (a wrapper over a pandas dataframe) provides a dataframe with a simplified index structure (no more resetting indexes or handling multi-indexes)
  • Consistent ‘verbs’ (select, arrange, distinct, …) as methods of the tidyframe class, most of which return a tidyframe
  • Unified interface for summarize (aggregation) and mutate (assign) operations across groups
  • Utilities for pandas dataframes and series
  • Uses simple Python data structures: no esoteric classes, no pipes, no non-standard evaluation
  • No-copy data conversion between tidyframe and pandas dataframes
  • An accessor to apply tidyframe verbs to simple pandas dataframes

Example

  • tidypandas code:
df.filter(lambda x: x['col_1'] > x['col_1'].mean(), by = 'col_2')
  • equivalent pandas code:
(df.groupby('col_2')
   .apply(lambda x: x.loc[x['col_1'] > x['col_1'].mean(), :])
   .reset_index(drop = True)
   )
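
To make the filtering logic concrete, here is a minimal pandas-only sketch of the same "keep rows above their group's mean" operation. The sample data is hypothetical, and the sketch uses `groupby(...).transform` rather than `apply`, which is an alternative formulation chosen here because it behaves the same across recent pandas versions; it is not taken from the tidypandas documentation.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the example above.
df = pd.DataFrame({
    "col_1": [1, 5, 2, 8, 3, 9],
    "col_2": ["a", "a", "a", "b", "b", "b"],
})

# Mean of col_1 within each col_2 group, broadcast back to every row
# so it lines up with df's original index.
group_mean = df.groupby("col_2")["col_1"].transform("mean")

# Keep only the rows whose col_1 exceeds their own group's mean,
# then drop the leftover index so the result is numbered 0..n-1.
result = df.loc[df["col_1"] > group_mean].reset_index(drop=True)
print(result)
```

Note that even in this form the pandas version needs an explicit `reset_index(drop = True)` to get a clean index, which is the bookkeeping the tidypandas one-liner is meant to hide.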

Why use tidypandas

tidypandas is for you if:

  • you frequently write data manipulation code using pandas
  • you prefer to stay in the pandas ecosystem (see accessor)
  • you prefer to remember a limited set of methods
  • you do not want to write (or be surprised by) reset_index, rename_axis often
  • you prefer writing free-flowing, expressive code in dplyr style
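
As an illustration of the `reset_index` point above, the following pandas-only sketch shows how a grouped aggregation moves the grouping column into the index, so that it has to be reset before it can be used as an ordinary column again. The data and column names (`team`, `score`, `mean_score`) are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"team": ["x", "x", "y"], "score": [1, 2, 3]})

# Named aggregation: the grouping key 'team' becomes the index of the result.
summary = df.groupby("team").agg(mean_score=("score", "mean"))
print(summary.columns.tolist())  # 'team' is not among the columns

# reset_index moves 'team' back into a regular column.
flat = summary.reset_index()
print(flat.columns.tolist())
```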

tidypandas relies on the amazing pandas library and offers a consistent API with a different philosophy.

Presentation

Learn more about tidypandas (presentation)

Installation

  1. Install the release version from PyPI using pip:

    pip install tidypandas
    
  2. For offline installation, use the whl/tar file from the releases page on GitHub.

Contribution/bug fixes/Issues:

  1. Open an issue, suggestion, or bugfix on the GitHub issues page.

  2. Use the master branch of the GitHub repo to submit your PR.