Bumping version to 0.3.0
igorborgest committed Feb 4, 2020
1 parent f045512 commit ee1809a
Showing 37 changed files with 8,620 additions and 112 deletions.
95 changes: 48 additions & 47 deletions README.md
@@ -2,73 +2,74 @@

> DataFrames on AWS
-[![Release](https://img.shields.io/badge/release-0.2.6-brightgreen.svg)](https://pypi.org/project/awswrangler/)
+[![Release](https://img.shields.io/badge/release-0.3.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Downloads](https://img.shields.io/pypi/dm/awswrangler.svg)](https://pypi.org/project/awswrangler/)
[![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Documentation Status](https://readthedocs.org/projects/aws-data-wrangler/badge/?version=latest)](https://aws-data-wrangler.readthedocs.io/en/latest/?badge=latest)
[![Coverage](https://img.shields.io/badge/coverage-89%25-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/awslabs/aws-data-wrangler.svg)](http://isitmaintained.com/project/awslabs/aws-data-wrangler "Average time to resolve an issue")
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

-**[Read the Docs!](https://aws-data-wrangler.readthedocs.io)**
+## [Read the Docs](https://aws-data-wrangler.readthedocs.io)

-**[Read the Tutorials](https://github.com/awslabs/aws-data-wrangler/tree/master/tutorials): [Catalog & Metadata](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/catalog_and_metadata.ipynb) | [Athena Nested](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/athena_nested.ipynb) | [S3 Write Modes](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/s3_write_modes.ipynb)**
+## [Read the Tutorials](https://github.com/awslabs/aws-data-wrangler/tree/master/tutorials)
+- [Catalog & Metadata](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/catalog_and_metadata.ipynb)
+- [Athena Nested](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/athena_nested.ipynb)
+- [S3 Write Modes](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/s3_write_modes.ipynb)

---

-*Contents:* **[Use Cases](#Use-Cases)** | **[Installation](#Installation)** | **[Examples](#Examples)** | **[Diving Deep](#Diving-Deep)** | **[Step By Step](#Step-By-Step)** | **[Contributing](#Contributing)**

---
+## Contents
+- [Use Cases](#Use-Cases)
+- [Installation](#Installation)
+- [Examples](#Examples)
+- [Diving Deep](#Diving-Deep)
+- [Step By Step](#Step-By-Step)
+- [Contributing](#Contributing)

## Use Cases

### Pandas

-* Pandas -> Parquet (S3) (Parallel)
-* Pandas -> CSV (S3) (Parallel)
-* Pandas -> Glue Catalog Table
-* Pandas -> Athena (Parallel)
-* Pandas -> Redshift (Append/Overwrite/Upsert) (Parallel)
-* Pandas -> Aurora (MySQL/PostgreSQL) (Append/Overwrite) (Via S3) (NEW :star:)
-* Parquet (S3) -> Pandas (Parallel)
-* CSV (S3) -> Pandas (One shot or Batching)
-* Glue Catalog Table -> Pandas (Parallel)
-* Athena -> Pandas (One shot, Batching or Parallel)
-* Redshift -> Pandas (Parallel)
-* CloudWatch Logs Insights -> Pandas
-* Aurora -> Pandas (MySQL) (Via S3) (NEW :star:)
-* Encrypt Pandas Dataframes on S3 with KMS keys
-* Glue Databases Metadata -> Pandas (Jupyter output compatible)
-* Glue Table Metadata -> Pandas (Jupyter output compatible)
+| FROM | TO | Features |
+|------|----|----------|
+| Pandas DataFrame | Amazon S3 | Parquet, CSV, Partitions, Parallelism, Overwrite/Append/Partitions-Upsert modes,<br>KMS Encryption, Glue Metadata (Athena, Spectrum, Spark, Hive, Presto) |
+| Amazon S3 | Pandas DataFrame | Parquet (Pushdown filters), CSV, Partitions, Parallelism,<br>KMS Encryption, Multiple files |
+| Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, Encryption, and two different engines:<br><br>- ctas_approach=False **->** Batching for restricted-memory environments<br>- ctas_approach=True **->** Blazing fast, parallelism and enhanced data types |
+| Pandas DataFrame | Amazon Redshift | Blazing fast, using parallel Parquet on S3 behind the scenes<br>Append/Overwrite/Upsert modes |
+| Amazon Redshift | Pandas DataFrame | Blazing fast, using parallel Parquet on S3 behind the scenes |
+| Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL<br>Blazing fast, using parallel CSV on S3 behind the scenes<br>Append/Overwrite modes |
+| Amazon Aurora | Pandas DataFrame | Supported engines: MySQL<br>Blazing fast, using parallel CSV on S3 behind the scenes |
+| CloudWatch Logs Insights | Pandas DataFrame | Query results |
+| Glue Catalog | Pandas DataFrame | List and get table details. Good fit with Jupyter Notebooks. |
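
For context on the Pandas rows above, a minimal sketch of the round trip, assuming the module-level API of the 0.3.x docs (`wr.pandas.to_parquet` and `wr.pandas.read_sql_athena`; the bucket, database, and table names are placeholders):

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"id": [1, 2], "name": ["foo", "boo"]})

# Pandas DataFrame -> Parquet on S3, with the table registered in the Glue Catalog
wr.pandas.to_parquet(
    dataframe=df,
    database="my_database",           # placeholder Glue database
    path="s3://my-bucket/my_table/",  # placeholder S3 path
    mode="overwrite",
)

# Amazon Athena -> Pandas DataFrame; ctas_approach=True is the "blazing fast"
# engine in the table, ctas_approach=False the batching-friendly one
df2 = wr.pandas.read_sql_athena(
    sql="SELECT * FROM my_table",
    database="my_database",
    ctas_approach=True,
)
```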

### PySpark

-* PySpark -> Redshift (Parallel)
-* Register Glue table from Dataframe stored on S3
-* Flatten nested DataFrames
+| FROM | TO | Features |
+|------|----|----------|
+| PySpark DataFrame | Amazon Redshift | Blazing fast, using parallel Parquet on S3 behind the scenes<br>Append/Overwrite/Upsert modes |
+| PySpark DataFrame | Glue Catalog | Register Parquet or CSV DataFrame on Glue Catalog |
+| Nested PySpark DataFrame | Flat PySpark DataFrames | Flatten structs and break up arrays in child tables |
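
A sketch of the flatten row, under two assumptions about the 0.3.x Spark module: that `Session` accepts an existing SparkSession via `spark_session`, and that `session.spark.flatten` returns a dict of flat DataFrames:

```python
from pyspark.sql import SparkSession
import awswrangler

spark = SparkSession.builder.appName("wrangler-flatten").getOrCreate()
nested_df = spark.createDataFrame(
    [(1, (1, 2))],
    "id INT, payload STRUCT<a: INT, b: INT>",
)

# Assumption: Session(spark_session=...) and spark.flatten(...) as named here
session = awswrangler.Session(spark_session=spark)

# Nested PySpark DataFrame -> flat PySpark DataFrames (structs flattened,
# arrays broken out into child tables)
dfs = session.spark.flatten(dataframe=nested_df)
for name, flat_df in dfs.items():
    flat_df.printSchema()
```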

### General

-* List S3 objects (Parallel)
-* Delete S3 objects (Parallel)
-* Delete listed S3 objects (Parallel)
-* Delete NOT listed S3 objects (Parallel)
-* Copy listed S3 objects (Parallel)
-* Get the size of S3 objects (Parallel)
-* Get CloudWatch Logs Insights query results
-* Load partitions on Athena/Glue table (repair table)
-* Create EMR cluster (For humans)
-* Terminate EMR cluster
-* Get EMR cluster state
-* Submit EMR step(s) (For humans)
-* Get EMR step state
-* Athena query to receive the result as python primitives (*Iterable[Dict[str, Any]]*)
-* Load and Unzip SageMaker jobs outputs
-* Load and Unzip SageMaker models
-* Redshift -> Parquet (S3)
-* Aurora -> CSV (S3) (MySQL) (NEW :star:)
-* Get Glue Metadata
+| Feature | Details |
+|---------|---------|
+| List S3 objects | e.g. `wr.s3.list_objects("s3://...")` |
+| Delete S3 objects | Parallel |
+| Delete listed S3 objects | Parallel |
+| Delete NOT listed S3 objects | Parallel |
+| Copy listed S3 objects | Parallel |
+| Get the size of S3 objects | Parallel |
+| Get CloudWatch Logs Insights query results | |
+| Load partitions on Athena/Glue table | Through "MSCK REPAIR TABLE" |
+| Create EMR cluster | "For humans" |
+| Terminate EMR cluster | "For humans" |
+| Get EMR cluster state | "For humans" |
+| Submit EMR step(s) | "For humans" |
+| Get EMR step state | "For humans" |
+| Query Athena to receive python primitives | Returns *Iterable[Dict[str, Any]]* |
+| Load and Unzip SageMaker jobs outputs | |
+| Dump Amazon Redshift as Parquet files on S3 | |
+| Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine |
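
And a sketch for the general helpers: the `wr.s3.list_objects` call is quoted verbatim in the table; `wr.s3.delete_listed_objects` and `wr.athena.repair_table` are assumed names for the matching rows:

```python
import awswrangler as wr

# List objects under a prefix (parallelized internally), per the table above
keys = wr.s3.list_objects("s3://my-bucket/my-prefix/")
print(f"{len(keys)} objects found")

# Assumed names for the "Delete listed S3 objects" and
# "Load partitions on Athena/Glue table" rows above
wr.s3.delete_listed_objects(objects_paths=keys)
wr.athena.repair_table(database="my_database", table="my_table")
```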

## Installation

2 changes: 1 addition & 1 deletion awswrangler/__version__.py
@@ -1,4 +1,4 @@
__title__ = "awswrangler"
__description__ = "DataFrames on AWS."
__version__ = "0.2.6"
__version__ = "0.3.0"
__license__ = "Apache License 2.0"
2 changes: 1 addition & 1 deletion awswrangler/pandas.py
@@ -831,7 +831,7 @@ def _cast_pandas(dataframe: pd.DataFrame, cast_columns: Dict[str, str]) -> pd.DataFrame:
            elif pandas_type == "date":
                dataframe[col] = pd.to_datetime(dataframe[col]).dt.date.replace(to_replace={pd.NaT: None})
            else:
-               dataframe[col] = dataframe[col].astype(pandas_type, skipna=True)
+               dataframe[col] = dataframe[col].astype(pandas_type)
        return dataframe

@staticmethod
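
The `skipna` removal above tracks pandas itself: pandas 1.0.0 shipped a few days before this commit, and under pandas 1.0+ `Series.astype` accepts only `dtype`, `copy`, and `errors`, so the extra keyword raises `TypeError`. Null-safe integer casts are handled by nullable extension dtypes instead, e.g.:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

# s.astype("int64", skipna=True) -> TypeError under pandas 1.0+
print(s.astype("Int64"))  # nullable Int64 keeps the missing value as <NA>
```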
7 changes: 7 additions & 0 deletions docs/source/api/awswrangler.dynamodb.rst
@@ -0,0 +1,7 @@
+awswrangler.dynamodb module
+===========================
+
+.. automodule:: awswrangler.dynamodb
+    :members:
+    :undoc-members:
+    :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/api/awswrangler.rst
@@ -10,6 +10,7 @@ Submodules
   awswrangler.aurora
   awswrangler.cloudwatchlogs
   awswrangler.data_types
+   awswrangler.dynamodb
   awswrangler.emr
   awswrangler.exceptions
   awswrangler.glue