Releases: aws/aws-sdk-pandas
AWS Data Wrangler 2.5.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ pip install pyarrow==2 awswrangler
Documentation
- New HTML tutorials #551
- Use bump2version for changing version numbers #573
- Mishandling of wildcard characters in read_parquet #564
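The wildcard issue above can be illustrated with the standard library alone. This is a minimal sketch of the general pitfall, not the library's actual code: the helper `match_keys` and the sample keys are hypothetical. An object key that contains `[`, `*` or `?` is read as a pattern unless it is escaped first.

```python
import fnmatch
import glob


def match_keys(keys, pattern, literal=False):
    """Return the keys matching pattern; escape wildcards first when literal=True."""
    if literal:
        # glob.escape wraps *, ? and [ in brackets so they match themselves.
        pattern = glob.escape(pattern)
    return [k for k in keys if fnmatch.fnmatchcase(k, pattern)]


# Hypothetical object keys; the first one contains literal brackets.
keys = ["data/part[1].parquet", "data/part1.parquet", "data/part2.parquet"]

# Treated as a pattern, "[1]" is a character class that matches only "1".
wildcard_hits = match_keys(keys, "data/part[1].parquet")

# Escaped, the same string matches only the key with literal brackets.
literal_hits = match_keys(keys, "data/part[1].parquet", literal=True)
```

Without escaping, the lookup silently returns a different file than the one requested, which is why this kind of bug is easy to miss.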
Enhancements
- Support for ExpectedBucketOwner #562
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @impredicative, @adarsh-chauhan, @Malkard.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.4.0 (Docs updated)
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. EMR, Glue PySpark Job):
➡️ pip install pyarrow==2 awswrangler
Documentation
New Functionalities
- Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
- S3 Upload/download files #506
- Include dataset BUCKETING for s3 datasets writing #443
- Enable Merge Upsert for existing Glue Tables on Primary Keys #503
- Support Requester Pays S3 Buckets #430
- Add botocore Config to wr.config #535
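For readers new to dataset bucketing, the idea can be sketched as follows: each row is assigned to one of a fixed number of buckets by hashing the bucketing column, so rows with equal keys always land in the same file group. This is an illustration only. The helpers `bucket_for` and `split_into_buckets` are hypothetical, and the CRC32 hash is an assumption chosen for determinism; it is not claimed to match the hash used by the library or by Athena/Hive.

```python
import zlib


def bucket_for(value, number_of_buckets):
    """Map a value to a bucket index in [0, number_of_buckets)."""
    # Assumption: CRC32 of the string form; stable across runs, unlike hash().
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % number_of_buckets


def split_into_buckets(rows, key, number_of_buckets):
    """Group row dicts by the bucket of their key column."""
    buckets = {i: [] for i in range(number_of_buckets)}
    for row in rows:
        buckets[bucket_for(row[key], number_of_buckets)].append(row)
    return buckets


# Hypothetical rows bucketed on "user_id" into 4 buckets.
rows = [{"user_id": i % 7, "value": i} for i in range(20)]
buckets = split_into_buckets(rows, "user_id", 4)
```

Because the assignment depends only on the key, a reader that filters on one key value needs to scan a single bucket.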
Enhancements
- Pandas 1.2.1 support #525
- Numpy 1.20.0 support
- Apache Arrow 3.0.0 support #531
- Python 3.9 support #454
Bug Fix
- Return DataFrame with unique index for Athena CTAS queries #527
- Remove unnecessary schema inference. #524
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana, @dragonH, @nikwerhypoport, @hwangji.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.4.0
New Functionalities
- Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
- S3 Upload/download files #506
- Include dataset BUCKETING for s3 datasets writing #443
- Enable Merge Upsert for existing Glue Tables on Primary Keys #503
- Support Requester Pays S3 Buckets #430
- Add botocore Config to wr.config #535
Enhancements
- Pandas 1.2.1 support #525
- Numpy 1.20.0 support
- Apache Arrow 3.0.0 support #531
- Python 3.9 support #454
Bug Fix
- Return DataFrame with unique index for Athena CTAS queries #527
- Remove unnecessary schema inference. #524
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.3.0
New Functionalities
- DynamoDB support #448
- SQLServer support (Driver must be installed separately) #356
- Excel files support #419 #509
- Amazon S3 Access Point support #393
- Amazon Chime initial support #494
- Write compressed CSV and JSON files on S3 #308 #359 #412
Enhancements
- Add query parameters for Athena #432
- Add metadata caching for Athena #461
- Add suffix filters for s3.read_parquet_table() #495
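A suffix filter of the kind mentioned above can be sketched in plain Python. The function `filter_keys` and its parameter names are hypothetical, chosen for illustration; they are not the library's signature. The sketch keeps keys that end with an allowed suffix and drops keys that end with an ignored one.

```python
def _as_tuple(value):
    """Normalize None, a single string, or a sequence of strings to a tuple."""
    if value is None:
        return ()
    if isinstance(value, str):
        return (value,)
    return tuple(value)


def filter_keys(keys, suffix=None, ignore_suffix=None):
    """Keep keys ending with suffix (if given) and not ending with ignore_suffix."""
    keep = _as_tuple(suffix)
    drop = _as_tuple(ignore_suffix)
    result = []
    for key in keys:
        if keep and not key.endswith(keep):
            continue  # an allow-list was given and this key is not on it
        if drop and key.endswith(drop):
            continue  # explicitly ignored suffix
        result.append(key)
    return result


# Hypothetical listing of a table prefix that holds non-data files too.
keys = ["t/a.parquet", "t/_SUCCESS", "t/b.parquet.crc", "t/c.snappy.parquet"]
data_files = filter_keys(keys, suffix=".parquet")
without_markers = filter_keys(keys, ignore_suffix=["_SUCCESS", ".crc"])
```

Filtering like this is useful when other tools leave marker or checksum files next to the Parquet data.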
Bug Fix
- Fix keep_files behavior for failed Redshift COPY executions #505
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @danielwo, @jiteshsoni, @gvermillion, @rodalarcon, @imanebosch, @dwbelliston, @tochandrashekhar, @kylepierce, @njdanielsen, @jasadams, @gtossou, @JasonSanchez, @kokes, @hanan-vian, @igorborgest.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!
AWS Data Wrangler 2.2.0
New Functionalities
- Add aws_access_key_id, aws_secret_access_key, aws_session_token and boto3_session for Redshift copy/unload #484
Bug Fix
- Remove dtype print statement #487
Thanks
We thank the following contributors/users for their work on this release:
@danielwo, @thetimbecker, @njdanielsen, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 2.1.0
New Functionalities
- Add secretsmanager module and support for database connections #402
con = wr.redshift.connect(secret_id="my-secret", dbname="my-db")
df = wr.redshift.read_sql_query("SELECT ...", con=con)
con.close()
Bug Fix
- Fix connection attributes quoting for wr.*.connect() #481
- Fix parquet table append for nested struct columns #480
Thanks
We thank the following contributors/users for their work on this release:
@danielwo, @nmduarteus, @nivf33, @kinghuang, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 2.0.1
New Functionalities
- New wr.timestream.create_database() function
- New wr.timestream.create_table() function
- New wr.timestream.delete_database() function
- New wr.timestream.delete_table() function
- New ignore_empty argument to ignore 0 bytes files for:
Enhancements
- Automatically roll back in case of failed queries for:
Thanks
We thank the following contributors/users for their work on this release:
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 2.0.0
Breaking changes
- sqlalchemy and psycopg2 dependencies replaced by redshift_connector and pg8000
- All wr.db.* functions were distributed into wr.redshift.*, wr.postgresql.* and wr.mysql.* (Tutorial)
- Redshift COPY and UNLOAD functions were refactored into wr.redshift.* (Tutorial)
- wr.catalog.get_engine() was replaced by wr.redshift.connect(), wr.postgresql.connect() and wr.mysql.connect() (Tutorial)
New Functionalities
Enhancements
- Improved general S3 I/O performance by removing the eventual consistency guardrails (Reference)
- Add retry with decorrelated jitter for Athena and Glue Catalog calls to overcome throttling in high concurrency scenarios.
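Retry with decorrelated jitter can be sketched as below. This is a minimal illustration of the general technique, in which each delay is drawn uniformly between a base value and three times the previous delay, then capped. The function names, defaults and the `is_throttled` predicate are assumptions for this example and do not reflect the library's internal implementation.

```python
import random
import time


def decorrelated_jitter_delays(base, cap, attempts, rng=random):
    """Yield sleep times where each is min(cap, uniform(base, 3 * previous))."""
    delay = base
    for _ in range(attempts):
        delay = min(cap, rng.uniform(base, delay * 3))
        yield delay


def call_with_retry(func, is_throttled, base=0.5, cap=10.0, max_attempts=5,
                    sleep=time.sleep):
    """Call func, retrying throttling errors up to max_attempts total calls."""
    delays = decorrelated_jitter_delays(base, cap, max_attempts - 1)
    while True:
        try:
            return func()
        except Exception as exc:
            delay = next(delays, None)
            # Give up when retries are exhausted or the error is not throttling.
            if delay is None or not is_throttled(exc):
                raise
            sleep(delay)
```

Randomizing each delay relative to the previous one spreads concurrent clients apart, so they do not all retry at the same instant after a throttling response.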
Docs
- Updates regarding all new functionalities
- Add Amazon Timestream tutorial
- Add Amazon Timestream tutorial 2
AWS re:Invent related news
- AWS Lambda now supports up to 10 GB of memory and 6 vCPU cores
- Amazon S3 now delivers strong read-after-write consistency
- AWS Lambda now supports container images as a packaging format
- Serverless Batch Scheduling with AWS Batch and AWS Fargate
Thanks
We thank the following contributors/users for their work on this release:
@Brooke-white, @danielwo, @sapientderek, @pmleveque, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 1.10.1
New Functionalities
Enhancements
Bug Fix
- Fix Athena read with ctas_approach=False and chunksize=True #458
- Fix overwriting for not enforced configs #450
Docs
Thanks
We thank the following contributors/users for their work on this release:
@tuannguyen0901, @bryanyang0528, @czagoni, @jesusch, @danielwo, @DonghanYang, @eric-valente, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 1.10.0
New Functionalities
- Add configurable Endpoint URL for AWS services #418
- Add global environment configuration for Athena workgroups #437
Enhancements
- Support for Apache Arrow 2.0.0 #436
- Allow Decimal to float casting for wr.db.read_sql_query() #431
- Allow unsafe conversions for wr.db.read_sql_query() #427
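Casting Decimal values to float trades exactness for convenience, which is presumably why it is opt-in. The sketch below uses only the standard library to show the effect on query-like rows. The helper `decimals_to_floats` and the sample rows are hypothetical and unrelated to the library's internals.

```python
from decimal import Decimal


def decimals_to_floats(rows):
    """Return rows with every Decimal replaced by a float; other values pass through."""
    return [
        tuple(float(v) if isinstance(v, Decimal) else v for v in row)
        for row in rows
    ]


# Hypothetical rows as a database driver might return them.
rows = [(1, Decimal("19.99")), (2, Decimal("0.1")), (3, None)]
converted = decimals_to_floats(rows)

# A float is a binary approximation, so the round trip is not exact.
exact = Decimal("0.1")
approx = Decimal(float(exact))  # exact value of the float nearest to 0.1
```

Here `approx` differs from `exact` by a tiny amount, which is harmless for analytics but matters for values such as currency amounts that must stay exact.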
Bug Fix
- QuickSight functions now allow usernames with "/" #434
- Fix duplicated carriage return for wr.s3.to_csv() running on Windows platform.
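One common way a duplicated carriage return appears in Python CSV output is newline translation applied on top of a writer that already emits "\r\n". The sketch below reproduces that general mechanism on any platform by forcing Windows-style translation. It is an assumption that this resembles the fixed bug, and `write_csv` is a hypothetical helper.

```python
import csv
import io


def write_csv(rows, newline):
    """Write rows through a text wrapper with the given newline mode; return raw bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text).writerows(rows)  # csv.writer ends each row with "\r\n"
    text.flush()
    data = raw.getvalue()
    text.detach()  # leave the BytesIO open instead of closing it with the wrapper
    return data


rows = [["a", "b"], ["1", "2"]]

# newline="\r\n" mimics a Windows text-mode file: each "\n" becomes "\r\n" again.
doubled = write_csv(rows, newline="\r\n")

# newline="" disables translation, as the csv module documentation recommends.
clean = write_csv(rows, newline="")
```

The first call produces rows ending in "\r\r\n", while the second produces the expected "\r\n" endings.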
Thanks
We thank the following contributors/users for their work on this release:
@martinSpears-ECS, @imanebosch, @Eric-He-98, @brombach, @Thomas-Hirsch, @vuchetichbalint, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!