Welcome to the code repository for the book Fast Python
Fast Python is your guide to optimizing every part of your Python-based data analysis process, from the pure Python code you write to managing the resources of modern hardware and GPUs. You'll learn to rewrite inefficient data structures, improve underperforming code with multithreading, and simplify your datasets without sacrificing accuracy.
This repository contains the code for the book. Below is a chapter-oriented roadmap; because the book is in early access, the repository is also under construction. A few short, illustrative sketches for some of the roadmap topics follow the list.
- Profiling applications with both IO and computing workloads
- Profiling code to detect performance bottlenecks
- Optimizing basic data structures for speed: lists, sets, dictionaries
- Finding excessive memory allocation
- Using laziness and generators for big-data pipelining
- Writing the scaffold of an asynchronous server
- Implementing the first MapReduce engine
- Implementing a concurrent version of a MapReduce engine
- Using multiprocessing to implement MapReduce
- Tying it all together: an asynchronous, multithreaded, multiprocessing MapReduce server
- Understanding NumPy from a performance perspective
- Using array programming
- Tuning NumPy's internal architecture for performance
- A whirlwind tour of Cython
- Profiling Cython code
- Optimizing array access with Cython memoryviews
- Writing NumPy generalized universal functions in Cython
- Advanced array access in Cython
- Parallelism in Cython
- How modern hardware architectures impact Python performance
- Efficient data storage with Blosc
- Accelerating NumPy with NumExpr
- The performance implications of using the local network
- Optimizing memory and time when loading data
- Techniques to increase data analysis speed
- Pandas on top of NumPy, Cython and NumExpr
- Reading data into Pandas with Arrow
- Using Arrow interop to delegate work to more efficient languages and systems
- A unified interface for file access: fsspec
- Parquet: an efficient format to store columnar data
- Dealing with larger-than-memory datasets the old-fashioned way
- Zarr for large array persistence
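
The sketches below illustrate several of the roadmap topics. They are minimal, stand-alone examples rather than the book's code; file names, URLs, and sample values are placeholders.

Profiling with the standard library: a quick sketch of locating the hot spots in a compute-bound function with `cProfile` and `pstats`.

```python
import cProfile
import pstats

def compute(n):
    # A deliberately CPU-bound function to profile
    return sum(i * i for i in range(n))

cProfile.run("compute(1_000_000)", "compute.prof")
stats = pstats.Stats("compute.prof")
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
```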
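
Choosing the right basic data structure: membership tests are linear on a list but constant time on average for a set, which `timeit` makes easy to see.

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# Worst-case lookup: an O(n) scan on the list, an O(1) hash lookup on the set
print(timeit.timeit(lambda: 99_999 in haystack_list, number=1_000))
print(timeit.timeit(lambda: 99_999 in haystack_set, number=1_000))
```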
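
Finding excessive memory allocation with `tracemalloc`, which attributes allocations to the source lines that made them.

```python
import tracemalloc

tracemalloc.start()
data = [str(i) * 10 for i in range(100_000)]  # deliberately wasteful allocation
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:  # biggest allocation sites first
    print(stat)
```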
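
Laziness and generators for pipelining: each stage yields one item at a time, so arbitrarily large inputs never have to fit in memory. The file name is a placeholder.

```python
def read_numbers(path):
    # Lazily yield one number per line instead of loading the whole file
    with open(path) as src:
        for line in src:
            yield float(line)

def running_total(numbers):
    total = 0.0
    for value in numbers:
        total += value
        yield total

# Nothing is read until the pipeline is consumed:
# for total in running_total(read_numbers("numbers.txt")):
#     print(total)
```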
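
A possible scaffold for an asynchronous server using `asyncio`; this echo handler merely stands in for real request handling.

```python
import asyncio

async def handle(reader, writer):
    data = await reader.readline()  # read one request line
    writer.write(data)              # echo it back
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```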
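
The idea behind a MapReduce engine, reduced to a sequential sketch: map each input to key/value pairs, group by key, then reduce each group. The concurrent and multiprocessing variants replace the plain loop with thread or process pools.

```python
from collections import defaultdict

def map_reduce(data, mapper, reducer):
    # Shuffle step: group every mapped value under its key
    groups = defaultdict(list)
    for item in data:
        for key, value in mapper(item):
            groups[key].append(value)
    # Reduce step: collapse each group to a single value
    return {key: reducer(key, values) for key, values in groups.items()}

counts = map_reduce(
    ["the quick fox", "the lazy dog"],
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda key, values: sum(values),
)
print(counts)  # {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```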
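
Array programming with NumPy: expressing the computation as whole-array operations moves the inner loop from the interpreter into compiled code.

```python
import numpy as np

values = np.random.default_rng(42).random(1_000_000)

# Interpreted loop: one Python-level multiplication and addition per element
slow = sum(v * v for v in values)

# Vectorized version: a single expression evaluated in C
fast = np.sum(values * values)
```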
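
Compressing an array buffer with Blosc; a rough sketch, and the exact keyword arguments may vary between python-blosc versions.

```python
import blosc
import numpy as np

arr = np.arange(10_000_000, dtype=np.int64)
packed = blosc.compress(arr.tobytes(), typesize=arr.itemsize)
print(len(packed) / arr.nbytes)  # compression ratio achieved

restored = np.frombuffer(blosc.decompress(packed), dtype=arr.dtype)
```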
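
Accelerating an element-wise expression with NumExpr, which evaluates the whole formula in cache-sized blocks across several threads and avoids the temporary arrays plain NumPy would allocate.

```python
import numexpr as ne
import numpy as np

a = np.random.rand(5_000_000)
b = np.random.rand(5_000_000)

# NumExpr picks up a and b from the local scope and compiles the expression
result = ne.evaluate("2 * a + 3 * b ** 2")
```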
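
Cutting memory and load time when reading data with pandas by selecting only the needed columns and using compact dtypes; the file and column names are hypothetical.

```python
import pandas as pd

df = pd.read_csv(
    "trips.csv",
    usecols=["duration", "passengers"],                     # skip unneeded columns
    dtype={"duration": "float32", "passengers": "uint8"},   # compact dtypes
)
print(df.memory_usage(deep=True))
```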
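
Reading data into pandas through Arrow: pyarrow's multithreaded CSV reader builds an Arrow table that converts cheaply to a DataFrame, and the same Arrow data can be handed to other Arrow-aware systems. The file name is a placeholder.

```python
from pyarrow import csv

table = csv.read_csv("trips.csv")  # multithreaded parse into an Arrow table
df = table.to_pandas()             # convert to a pandas DataFrame
```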
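
A unified interface for file access with fsspec: the same `open` call works for local paths, HTTP, S3 and other backends. The URL is a placeholder.

```python
import fsspec

with fsspec.open("https://example.com/data/trips.csv", "rt") as f:
    header = f.readline()  # read just the header line, wherever the file lives
```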
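
Storing columnar data in Parquet with pandas (backed by pyarrow): the columnar layout lets a later read touch only the columns it needs.

```python
import pandas as pd

df = pd.DataFrame({"duration": [10.5, 3.2], "passengers": [1, 4]})
df.to_parquet("trips.parquet")

# Only the requested column is read back from disk
durations = pd.read_parquet("trips.parquet", columns=["duration"])
```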
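
The old-fashioned way of handling larger-than-memory datasets: process the file in chunks so only one chunk is resident at a time. The file name is a placeholder.

```python
import pandas as pd

total = 0.0
for chunk in pd.read_csv("trips.csv", chunksize=100_000):
    total += chunk["duration"].sum()  # aggregate chunk by chunk
```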
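
Persisting a large array with Zarr as a chunked, compressed store on disk; a sketch in the zarr v2 style API, with arbitrary shape and chunk sizes.

```python
import numpy as np
import zarr

z = zarr.open("example.zarr", mode="w",
              shape=(10_000, 10_000), chunks=(1_000, 1_000), dtype="f4")
z[0, :] = np.arange(10_000, dtype="f4")  # only the touched chunks are written
print(z.info)
```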