DISCUSSION: Propose topics for pandas tutorials #140
Comments
I'm thinking of something along the lines of data cleaning: we can start with a real-world example of a messy dataset (with duplicated rows, missing values, unnecessary columns/rows...) and end up with a tidy one. I think this could be useful for people who use pandas to clean their datasets, especially when the data gets too large for their software to handle and ends up slowing down their process. However, I can imagine there are many ways to define what a messy dataset is, and since we're looking to address a specific problem, we might end up trying to solve too many problems at once. I did run a workshop on this topic (notebook here, though it's in Indonesian) and we covered duplicated rows, missing values, removing columns/rows, and renaming columns on one real-world dataset. Would love to hear your thoughts on whether this use case is worth a tutorial. Looking forward to discussing other use cases as well.
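To make the cleaning steps listed in this comment concrete, here is a minimal sketch covering duplicated rows, missing values, removing a column, and renaming columns. The dataset, its column names, and the choice to fill missing ages with the median are all assumptions made for illustration; they are not taken from the workshop notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical messy dataset: one duplicated row, missing values,
# an unneeded column, and inconsistent column names.
raw = pd.DataFrame({
    "Cust Name": ["Ana", "Ana", "Budi", None, "Citra"],
    "AGE": [34, 34, np.nan, 29, 41],
    "notes": ["x", "x", "y", "z", "w"],
})

clean = (
    raw.drop_duplicates()                      # remove exact duplicate rows
       .drop(columns=["notes"])                # remove an unnecessary column
       .rename(columns={"Cust Name": "name", "AGE": "age"})  # tidy column names
       .dropna(subset=["name"])                # drop rows with no name at all
       # Fill remaining missing ages with the median (an illustrative choice).
       .assign(age=lambda df: df["age"].fillna(df["age"].median()))
       .reset_index(drop=True)
)

print(clean)
```

Whether to drop or fill missing values depends on the dataset, which is part of why a tutorial would probably need to fix one concrete definition of "messy" up front.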
@datapythonista In text preprocessing, pandas plays a big role in giving some structure to the data. It's blissful to simply apply functions along columns. @galuhsahid I think it's a good idea to use a real-world dataset, and the use case is worth it from my perspective.
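As a rough illustration of applying functions along a text column, here is a small sketch that normalizes free text with the `.str` accessor and `apply`. The `reviews` data, the column names, and the specific normalization steps are hypothetical; a real tutorial would pick steps suited to its dataset.

```python
import pandas as pd

# Hypothetical free-text column, including a missing entry.
reviews = pd.DataFrame({
    "text": ["  Great product!! ", "Terrible... would NOT buy again", None],
})

reviews["clean_text"] = (
    reviews["text"]
    .fillna("")                                  # treat missing text as empty
    .str.lower()                                 # normalize case
    .str.replace(r"[^a-z\s]", " ", regex=True)   # replace non-letters with spaces
    .str.split()                                 # split on runs of whitespace
    .str.join(" ")                               # rejoin with single spaces
)

# Apply an ordinary Python function to every value in the column.
reviews["n_words"] = reviews["clean_text"].apply(lambda s: len(s.split()))

print(reviews[["clean_text", "n_words"]])
```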
I agree that an end-to-end tutorial is always better. Also, as mentioned by @WuraolaOyewusi, showing pandas applied to text preprocessing would be another good use case. Most pandas tutorials we see cover numerical analysis, so a text analysis tutorial would be a plus.
In the pandas documentation, we would like to add tutorials that cover end-to-end, real-world use cases of pandas. This should make things much easier for first-time users trying to address a specific problem with pandas.
Based on my personal experience, these are the kinds of problems I usually address:
I'm sure people are doing other cool things with pandas, so it would be great to brainstorm and find more use cases that are worth a tutorial.