---
output: md_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
![](docs/logo.png)
[![PyPI version](https://badge.fury.io/py/tidypandas.svg)](https://badge.fury.io/py/tidypandas)
# `tidypandas`
> A **grammar of data manipulation** for [pandas](https://pandas.pydata.org/docs/index.html) inspired by [tidyverse](https://tidyverse.tidyverse.org/)
The `tidypandas` Python package provides a *minimal, pythonic* API for common data manipulation tasks:
- The `tidyframe` class (a wrapper over a pandas dataframe) provides a dataframe with a simplified index structure (no more resetting indexes and multi-indexes)
- Consistent 'verbs' (`select`, `arrange`, `distinct`, ...) as methods of the `tidyframe` class, which mostly return a `tidyframe`
- A unified interface for summarize (aggregation) and mutate (assign) operations across groups
- Utilities for pandas dataframes and series
- Uses simple Python data structures: no esoteric classes, no pipes, no non-standard evaluation
- No-copy data conversion between `tidyframe` and pandas dataframes
- An accessor to apply `tidyframe` verbs to plain pandas dataframes
- ...
## Example
- `tidypandas` code:
```{python, eval = FALSE}
df.filter(lambda x: x['col_1'] > x['col_1'].mean(), by = 'col_2')
```
- equivalent pandas code:
```{python, eval = FALSE}
(df.groupby('col_2')
   .apply(lambda x: x.loc[x['col_1'] > x['col_1'].mean(), :])
   .reset_index(drop = True)
   )
```
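To make the comparison concrete, here is a minimal, runnable sketch of the same group-wise filter in plain pandas. The sample data is hypothetical, and the sketch swaps `groupby(...).apply(...)` for `groupby(...).transform('mean')`, an alternative formulation that keeps rows in their original order rather than grouped order. It does not use `tidypandas` itself.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the example above.
df = pd.DataFrame({
    "col_1": [1, 5, 2, 8, 3, 4],
    "col_2": ["a", "a", "a", "b", "b", "b"],
})

# Per-group mean of col_1, broadcast back onto the original rows.
group_mean = df.groupby("col_2")["col_1"].transform("mean")

# Keep only rows whose col_1 exceeds the mean of their own group.
result = df[df["col_1"] > group_mean].reset_index(drop=True)
print(result)
```

Even in this shorter pandas form, the grouping, the comparison and the index reset are three separate steps, which is the bookkeeping the single `filter(..., by = ...)` call above is meant to hide.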
## Why use `tidypandas`
`tidypandas` is for you if:
- you *frequently* write data manipulation code using pandas
- you prefer to stay in the pandas ecosystem (see accessor)
- you *prefer* to remember a [limited set of methods](https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428)
- you do not want to write (or be surprised by) [`reset_index`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html), [`rename_axis`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename_axis.html) often
- you prefer writing free-flowing, expressive code in [dplyr](https://dplyr.tidyverse.org/) style
> `tidypandas` relies on the amazing `pandas` library and offers a consistent API with a different [philosophy](https://tidyverse.tidyverse.org/articles/manifesto.html).
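The `reset_index` point in the list above can be illustrated with plain pandas. This is a small sketch on hypothetical data, showing default pandas behavior only (no `tidypandas` calls): a grouped aggregation moves the grouping column into the index, and `reset_index` is needed to get it back as an ordinary column.

```python
import pandas as pd

# Hypothetical sample data, not taken from the tidypandas documentation.
df = pd.DataFrame({
    "col_1": [1, 5, 2, 8, 3, 4],
    "col_2": ["a", "a", "a", "b", "b", "b"],
})

# By default, a grouped aggregation moves the grouping column into the index.
summary = df.groupby("col_2")["col_1"].agg(["mean", "max"])
print(summary.index.name)      # col_2
print(list(summary.columns))   # ['mean', 'max']

# reset_index turns col_2 back into an ordinary column.
flat = summary.reset_index()
print(list(flat.columns))      # ['col_2', 'mean', 'max']
```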
## Presentation
Learn more about `tidypandas` in this [presentation](https://github.com/talegari/tidypandas/blob/master/docs/tp_pres.html).
## Installation
1. Install the release version from PyPI using pip:
   `pip install tidypandas`
2. For offline installation, use the whl/tar file from the [releases page](https://github.com/talegari/tidypandas/releases) on GitHub.
## Contributions, bug fixes and issues
1. Report an issue, suggestion or bug on the GitHub [issues](https://github.com/talegari/tidypandas/issues) page.
2. Use the master branch of the [GitHub](https://github.com/talegari/tidypandas) repo to submit your PR.
------------------------------------------------------------------------