Welcome to the code repository for the book Fast Python
Fast Python is your guide to optimizing every part of your Python-based data analysis process, from the pure Python code you write to managing the resources of modern hardware and GPUs. You'll learn to rewrite inefficient data structures, improve underperforming code with multithreading, and simplify your datasets without sacrificing accuracy.
This repository contains the code for the book. Below is a chapter-oriented roadmap; because the book is in early access, the repository is also under construction. A few short, illustrative sketches for some of the roadmap topics follow the list.
- Profiling applications with both IO and computing workloads
- Profiling code to detect performance bottlenecks
- Optimizing basic data structures for speed: lists, sets, dictionaries
- Finding excessive memory allocation
- Using laziness and generators for big-data pipelining
- Writing the scaffold of an asynchronous server
- Implementing the first MapReduce engine
- Implementing a concurrent version of a MapReduce engine
- Using multiprocessing to implement MapReduce
- Tying it all together: an asynchronous, multithreaded, multiprocessing MapReduce server
- Understanding NumPy from a performance perspective
- Using array programming
- Tuning NumPy's internal architecture for performance
- A whirlwind tour of Cython
- Profiling Cython code
- Optimizing array access with Cython memoryviews
- Writing NumPy generalized universal functions in Cython
- Advanced array access in Cython
- Parallelism in Cython
- How modern hardware architectures impact Python performance
- Efficient data storage with Blosc
- Accelerating NumPy with NumExpr
- The performance implications of using the local network
- Optimizing memory and time when loading data
- Techniques to increase data analysis speed
- Pandas on top of NumPy, Cython and NumExpr
- Reading data into Pandas with Arrow
- Using Arrow interop to delegate work to more efficient languages and systems
- A unified interface for file access: fsspec
- Parquet: an efficient format to store columnar data
- Dealing with larger-than-memory datasets the old-fashioned way
- Zarr for large array persistence
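
The sketches below illustrate several of the roadmap topics. They are minimal, stand-alone examples rather than the book's code; file names, URLs, and sample values are placeholders.

Profiling with the standard library: a quick sketch of locating the hot spots in a compute-bound function with `cProfile` and `pstats`.

```python
import cProfile
import pstats

def compute(n):
    # A deliberately CPU-bound function to profile
    return sum(i * i for i in range(n))

cProfile.run("compute(1_000_000)", "compute.prof")
stats = pstats.Stats("compute.prof")
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
```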
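
Choosing the right basic data structure: membership tests are linear on a list but constant time on average for a set, which `timeit` makes easy to see.

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# Worst-case lookup: an O(n) scan on the list, an O(1) hash lookup on the set
print(timeit.timeit(lambda: 99_999 in haystack_list, number=1_000))
print(timeit.timeit(lambda: 99_999 in haystack_set, number=1_000))
```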
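
Finding excessive memory allocation with `tracemalloc`, which attributes allocations to the source lines that made them.

```python
import tracemalloc

tracemalloc.start()
data = [str(i) * 10 for i in range(100_000)]  # deliberately wasteful allocation
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:  # biggest allocation sites first
    print(stat)
```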
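
Laziness and generators for pipelining: each stage yields one item at a time, so arbitrarily large inputs never have to fit in memory. The file name is a placeholder.

```python
def read_numbers(path):
    # Lazily yield one number per line instead of loading the whole file
    with open(path) as src:
        for line in src:
            yield float(line)

def running_total(numbers):
    total = 0.0
    for value in numbers:
        total += value
        yield total

# Nothing is read until the pipeline is consumed:
# for total in running_total(read_numbers("numbers.txt")):
#     print(total)
```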
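
A possible scaffold for an asynchronous server using `asyncio`; this echo handler merely stands in for real request handling.

```python
import asyncio

async def handle(reader, writer):
    data = await reader.readline()  # read one request line
    writer.write(data)              # echo it back
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```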
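
The idea behind a MapReduce engine, reduced to a sequential sketch: map each input to key/value pairs, group by key, then reduce each group. The concurrent and multiprocessing variants replace the plain loop with thread or process pools.

```python
from collections import defaultdict

def map_reduce(data, mapper, reducer):
    # Shuffle step: group every mapped value under its key
    groups = defaultdict(list)
    for item in data:
        for key, value in mapper(item):
            groups[key].append(value)
    # Reduce step: collapse each group to a single value
    return {key: reducer(key, values) for key, values in groups.items()}

counts = map_reduce(
    ["the quick fox", "the lazy dog"],
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda key, values: sum(values),
)
print(counts)  # {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```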
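
Array programming with NumPy: expressing the computation as whole-array operations moves the inner loop from the interpreter into compiled code.

```python
import numpy as np

values = np.random.default_rng(42).random(1_000_000)

# Interpreted loop: one Python-level multiplication and addition per element
slow = sum(v * v for v in values)

# Vectorized version: a single expression evaluated in C
fast = np.sum(values * values)
```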
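
Compressing an array buffer with Blosc; a rough sketch, and the exact keyword arguments may vary between python-blosc versions.

```python
import blosc
import numpy as np

arr = np.arange(10_000_000, dtype=np.int64)
packed = blosc.compress(arr.tobytes(), typesize=arr.itemsize)
print(len(packed) / arr.nbytes)  # compression ratio achieved

restored = np.frombuffer(blosc.decompress(packed), dtype=arr.dtype)
```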
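
Accelerating an element-wise expression with NumExpr, which evaluates the whole formula in cache-sized blocks across several threads and avoids the temporary arrays plain NumPy would allocate.

```python
import numexpr as ne
import numpy as np

a = np.random.rand(5_000_000)
b = np.random.rand(5_000_000)

# NumExpr picks up a and b from the local scope and compiles the expression
result = ne.evaluate("2 * a + 3 * b ** 2")
```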
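
Cutting memory and load time when reading data with pandas by selecting only the needed columns and using compact dtypes; the file and column names are hypothetical.

```python
import pandas as pd

df = pd.read_csv(
    "trips.csv",
    usecols=["duration", "passengers"],                     # skip unneeded columns
    dtype={"duration": "float32", "passengers": "uint8"},   # compact dtypes
)
print(df.memory_usage(deep=True))
```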
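
Reading data into pandas through Arrow: pyarrow's multithreaded CSV reader builds an Arrow table that converts cheaply to a DataFrame, and the same Arrow data can be handed to other Arrow-aware systems. The file name is a placeholder.

```python
from pyarrow import csv

table = csv.read_csv("trips.csv")  # multithreaded parse into an Arrow table
df = table.to_pandas()             # convert to a pandas DataFrame
```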
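
A unified interface for file access with fsspec: the same `open` call works for local paths, HTTP, S3 and other backends. The URL is a placeholder.

```python
import fsspec

with fsspec.open("https://example.com/data/trips.csv", "rt") as f:
    header = f.readline()  # read just the header line, wherever the file lives
```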
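
Storing columnar data in Parquet with pandas (backed by pyarrow): the columnar layout lets a later read touch only the columns it needs.

```python
import pandas as pd

df = pd.DataFrame({"duration": [10.5, 3.2], "passengers": [1, 4]})
df.to_parquet("trips.parquet")

# Only the requested column is read back from disk
durations = pd.read_parquet("trips.parquet", columns=["duration"])
```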
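
The old-fashioned way of handling larger-than-memory datasets: process the file in chunks so only one chunk is resident at a time. The file name is a placeholder.

```python
import pandas as pd

total = 0.0
for chunk in pd.read_csv("trips.csv", chunksize=100_000):
    total += chunk["duration"].sum()  # aggregate chunk by chunk
```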
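
Persisting a large array with Zarr as a chunked, compressed store on disk; a sketch in the zarr v2 style API, with arbitrary shape and chunk sizes.

```python
import numpy as np
import zarr

z = zarr.open("example.zarr", mode="w",
              shape=(10_000, 10_000), chunks=(1_000, 1_000), dtype="f4")
z[0, :] = np.arange(10_000, dtype="f4")  # only the touched chunks are written
print(z.info)
```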