Releases: ArthurHeitmann/arctic_shift
2024 February
Download options:
- Torrent
- Speeds initially might not be as fast
- original .zst_blocks files: https://academictorrents.com/details/1dc131c38d09d8f3912a0040a9a7434ffccc1c78
- .zst re-encoded: https://academictorrents.com/details/5969ae3e21bb481fea63bf649ec933c222c1f824
- Direct download https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVhdnTkPDmZAeuOhk
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA256 Hashes:
RC_2024-02.zst_blocks | ab1e21a622e856d334f7e22d3b2270749f988c7378eddc40dad8f33a98b49e0e |
RS_2024-02.zst_blocks | 37dcace0c0597a3bc55d35617b80ce615df8bd9ccbaddb6ac3d6ba970264c5c4 |
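A downloaded file can be checked against the hashes listed above. The following is a minimal sketch, not part of the project's own tooling: the helper names (`sha256_of_file`, `parse_hash_list`, `verify`) are hypothetical, and it assumes the files sit in a single local directory under their original names.

```python
import hashlib
from pathlib import Path

# Hash list copied from this release, in the same "name | hash |" layout.
HASHES = """
RC_2024-02.zst_blocks | ab1e21a622e856d334f7e22d3b2270749f988c7378eddc40dad8f33a98b49e0e |
RS_2024-02.zst_blocks | 37dcace0c0597a3bc55d35617b80ce615df8bd9ccbaddb6ac3d6ba970264c5c4 |
"""


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large dumps never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_hash_list(text):
    """Turn 'name | hash |' lines into a {name: hash} dictionary."""
    expected = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) >= 2 and parts[0] and parts[1]:
            expected[parts[0]] = parts[1].lower()
    return expected


def verify(directory, expected):
    """Return {name: True/False}; False means missing file or hash mismatch."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of_file(path) == want
    return results
```

Calling `verify(".", parse_hash_list(HASHES))` checks the files in the current directory. The same helpers work for any other release by pasting that release's hash lines into `HASHES`.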
Subreddits 2024-01
Information and statistics of 18 million subreddits, retrieved in January 2024.
Of those, 2 million were no longer available (private, banned, quarantined, etc.). Those are stored separately in subreddits_meta_only_2024-01.zst and contain only the name, id, possibly the subscriber count, and statistics.
Statistics contain aggregate information from the Pushshift and Arctic Shift datasets: date of the earliest post and comment, number of posts and comments, and when that data was last updated.
JSON schemas are available here.
Download options:
- Torrent https://academictorrents.com/details/c902f4b65f0e82a5e37db205c3405f02a028ecdf
- Direct download https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVhdnTkPDmZAeuOhk
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA256 hashes:
subreddits_2024-01.zst | 5088dc88977820fdd3e42a6a8a4c8dbcd674831ab34eb833f2ebc8101160917b |
subreddits_meta_only_2024-01.zst | 0f21248ca6d3a19d7f9cdfe0fced8f936c544622508b581ee10e5149a1898148 |
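The .zst files can be streamed without decompressing them to disk first. The sketch below rests on several assumptions that this page does not confirm: that each file holds one JSON object per line, that the third-party `zstandard` package is installed, and that a large decompression window (`max_window_size=2**31`) is needed. The function names are hypothetical.

```python
import io
import json


def iter_json_lines(text_stream):
    """Yield one parsed object per non-empty line of a text stream."""
    for line in text_stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def open_zst_text(path):
    """Open a .zst file as a UTF-8 text stream (requires `pip install zstandard`)."""
    import zstandard  # third-party; imported lazily so the parser above works without it

    fh = open(path, "rb")
    # Assumption: the archive may use a long window, so allow up to 2 GiB.
    dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
    reader = dctx.stream_reader(fh, read_across_frames=True)
    return io.TextIOWrapper(reader, encoding="utf-8")


def count_records(path):
    """Count the JSON objects in a .zst file while holding one line at a time."""
    with open_zst_text(path) as stream:
        return sum(1 for _ in iter_json_lines(stream))
```

For example, `count_records("subreddits_2024-01.zst")` would walk the whole file once. This does not apply to the .zst_blocks files, which use a different container format.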
2024 January
Download options:
- Torrent
- Speeds initially might not be as fast
- original .zst_blocks files: https://academictorrents.com/details/c440a293602270f03a47e3110a174365b965a093
- .zst re-encoded: https://academictorrents.com/details/ac88546145ca3227e2b90e51ab477c4527dd8b90
- Direct download https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVhdnTkPDmZAeuOhk
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA256 Hashes:
RC_2024-01.zst_blocks | 5968e4bf0356b41a70d78b5e478d8d70eaf6bee5ccb9398fa3135037646097a9 |
RS_2024-01.zst_blocks | cd69c33348eb45a17f8c0b3ae6f6ce2afde1394935d26caacbc60e5d60658031 |
2023 December
Download options:
- Torrent
- Speeds initially might not be as fast
- original .zst_blocks files: https://academictorrents.com/details/0d0364f8433eb90b6e3276b7e150a37da8e4a12b
- .zst re-encoded: https://academictorrents.com/details/9c263fc85366c1ef8f5bb9da0203f4c8c8db75f4
- Direct download https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVhdnTkPDmZAeuOhk
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA256 Hashes:
RC_2023-12.zst_blocks | 064cea33a0cf82e474a0e7d5d62232c4148f5e7634e33516ff7ce8b6c12514e2 |
RS_2023-12.zst_blocks | 0b8cae80418d3b992403db50e0b99cd62a8e7a68153489ea6aeab0245468c314 |
2023 November
Download options:
- Torrent
- Speeds initially might not be as fast
- original .zst_blocks files: https://academictorrents.com/details/425b791647cdb2752f921351828452ca8e09aef8
- .zst re-encoded: https://academictorrents.com/details/aee7728b787892d3cce4d6df3c86c2728e2be1d7
- Direct download https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVhdnTkPDmZAeuOhk
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA256 Hashes:
RC_2023-11.zst_blocks | ac0e7e297e9f2f65d002300b1c95c28a0558cfa917495e99dc5a8ad55a69bc70 |
RS_2023-11.zst_blocks | bb7c13033db49bb19910e5bbbe9ffe93f863f172eaf0c68764b0e5ed6098b41e |
All data is now being archived a second time and merged with the original. As a result, fields like score or num_comments are now more accurate. For all new changes, see here.
2023 October
Download options:
- Torrent
- Speeds initially might not be as fast
- original .zst_blocks files: https://academictorrents.com/details/52e18b6a61f243e6ae42a1f2fc8aaf9fd9c9dbdb
- .zst re-encoded: https://academictorrents.com/details/9a3f77cf1b16f064b8f82e75ee8d470b49c90512
- Direct download https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVhdnTkPDmZAeuOhk
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA256 Hashes:
RC_2023-10.zst_blocks | ff538290b85ea26025bdb7f8a234c8cdcf4561ad1f3c80c14c64e9caa096e70a |
RS_2023-10.zst_blocks | fdcd1a591b8841b80d27d367e1aad55ee38bcf8db9d49fe7fe114a8a9289ced0 |
2023 September + revised comments
Download options:
- Torrent: https://academictorrents.com/details/7810d20b3651c0060cb670032ec33818230f654d
- Speeds initially might not be as fast
- Direct download https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVhdnTkPDmZAeuOhk
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA256 Hashes (all files, old and new):
RC_2023-04.zst_blocks | 29739c03ffd2f4364fb278a7b3f935f607926374f9525e99aaefdc3941a777c5 |
RC_2023-05.zst_blocks | c19a5d47728963b2fc1dc6f9a2a7db4bf348dff4108ce0e5ca9a58fec5d5b864 |
RC_2023-06.zst_blocks | 0812eb2364f77c3c86a08f8f92684f4c4e5a954ece1ceae704295112a3d0dbba |
RC_2023-07.zst_blocks | d1c46b9ad00a0455548b1f399b405b7d57afed0bb2e73c3c6754a338e4d1a48e |
RC_2023-08.zst_blocks | 21ead74fe2eb36f9a915356d6b822d16bade5360bda76b1591598c956d6fdbff |
RC_2023-09.zst_blocks | e8edb4ce8593fd66921a27e660cc43f8c1b600d6e144f010abca0150d47ca24a |
RS_2023-04.zst_blocks | bdde9571668c7c88e1d630fb9c3d5064ba6a6a88119fbf2d0a42abef6ccd5c74 |
RS_2023-05.zst_blocks | 6e662c2933513bcc12911b1e33decec9e269e9484e197965641d4bf7c0b18c14 |
RS_2023-06.zst_blocks | 13d0ffd35ea31248430ecf33cd4006525265b33ed28ff2b54e58934b288af3d4 |
RS_2023-07.zst_blocks | 9e770eea7bd0dbc3879a3e98be2a1043d506ffcde2b774dab7c87e48579fdeef |
RS_2023-08.zst_blocks | ee6fdd7cd6ac520d7ddcf98de9c0d1e72ecb8cca8cdd23d6e7c7c2197e970a80 |
RS_2023-09.zst_blocks | 1cab685cb0fd14381ecb47827f28d70becb57c42693ef6040559ec8081076752 |
I've (again) made some small changes to the previous files. Affected are:
- RC_2023-04.zst_blocks
- RC_2023-05.zst_blocks
- RC_2023-06.zst_blocks
- RC_2023-07.zst_blocks
- RC_2023-08.zst_blocks
- RS_2023-07.zst_blocks
For the comment files 04, 05, 06 and 08, about 0.1% of records were duplicates; for 07, 11% were duplicates. For RS_2023-07, I only changed the sort order for consistency: from sorted by ID to sorted first by created_utc and then by ID. If you previously downloaded these files and don't mind a small number of duplicate records, you can ignore this.
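If you keep an older copy of the affected files, the duplicates can be dropped and the new sort order applied locally. This is a rough sketch under stated assumptions, not the procedure used to build the release: it treats records with the same `id` as duplicates, keeps the first occurrence, assumes `created_utc` is numeric, and compares IDs as plain strings. The function name is hypothetical.

```python
def dedupe_and_sort(records):
    """Drop records with a repeated id, then sort by (created_utc, id)."""
    seen_ids = set()
    unique = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # assumption: a repeated id means a duplicate record
        seen_ids.add(record["id"])
        unique.append(record)
    # Sort first by creation time, then by ID to break ties deterministically.
    unique.sort(key=lambda r: (r["created_utc"], r["id"]))
    return unique


sample = [
    {"id": "b", "created_utc": 2},
    {"id": "a", "created_utc": 2},
    {"id": "c", "created_utc": 1},
    {"id": "a", "created_utc": 2},  # duplicate of the second record
]
cleaned = dedupe_and_sort(sample)
```

This version holds every record in memory, so it only suits small samples. A full monthly file would need an external sort or a database.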
2023 April, May, June, July, August
Download instructions:
- Download from here
- Download one file at a time, using the "Normal download" option (not "ZIP download" or "Download folder", those are too slow)
SHA 256 hashes:
RC_2023-04.zst_blocks | 54b2c0014174579bc7df663dc1b1071d9f6327992c57712070a6876a1f489921 |
RC_2023-05.zst_blocks | a380c39ccde8627909848d42b39cf113d803c07be09e9613c8bba9a7913280f5 |
RC_2023-06.zst_blocks | c3cdbee87dc20f9b39991459d1994a61e5efd66bc0f13379f6786a718b4782f5 |
RC_2023-07.zst_blocks | 165a6b07afa262ce54428af83f26ae5750eede81dbdbfb83718713635eb8d0ff |
RC_2023-08.zst_blocks | b434eab9ed21097b2039cd79095f7a0d46200bf08a075baae6c7e5656e23b6cc |
RS_2023-04.zst_blocks | bdde9571668c7c88e1d630fb9c3d5064ba6a6a88119fbf2d0a42abef6ccd5c74 |
RS_2023-05.zst_blocks | 6e662c2933513bcc12911b1e33decec9e269e9484e197965641d4bf7c0b18c14 |
RS_2023-06.zst_blocks | 13d0ffd35ea31248430ecf33cd4006525265b33ed28ff2b54e58934b288af3d4 |
RS_2023-07.zst_blocks | 55b920a65d3314bdc5bb0e0cdd174751a9d68495b4ae205aaba7078e7aa9117c |
RS_2023-08.zst_blocks | ee6fdd7cd6ac520d7ddcf98de9c0d1e72ecb8cca8cdd23d6e7c7c2197e970a80 |
Fixes from previous release:
- The objects are sorted by ["created_utc", "id"]
- &amp;, &lt;, &gt; have been replaced with &, < and >
- Removed trailing new line characters
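The text fixes in the list above can be sketched as follows. This is one interpretation rather than the actual release script: it assumes the replacements and the newline removal were applied to individual string values, and `clean_text` is a hypothetical name.

```python
def clean_text(text):
    """Decode the three basic HTML entities and strip trailing newlines."""
    # Replace &amp; last so that "&amp;lt;" becomes "&lt;" and is not decoded twice.
    text = text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")
    # Assumption: "trailing new line characters" means "\n" at the end of the value.
    return text.rstrip("\n")


example = clean_text("a &lt; b &amp;&amp; c &gt; d\n")
```

Only these three entities are handled, matching the list. A general decoder such as `html.unescape` would also convert other entities, which may or may not be wanted.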
About 30 million unavailable, partially deleted or fully deleted comments were recovered with data from before the reddit blackouts. Big thank you to FlyingPackets for providing that data.
2023-07 (DEPRECATED)
There are some issues with this data that have been fixed in the following release (2023 April, May, June, July, August).
Download from here (normal download one file at a time, not the entire folder, to speed things up).
SHA 256 hashes:
RS_2023-07.zst_blocks | 6ad9c550f2065a26e8b75a0a2196238687cb86a6727053b401833105e4e58012 |
RC_2023-07.zst_blocks | 162b7a5a59fcdc10ee3ec21a7c1b8cf9fa9350abe028ab29ddceaa507bc9f95b |
Posts info:
- total rows: 44216130
- uncompressed size: 205775254685
Comments info:
- total rows: 243749352
- uncompressed size: 426159816976
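The row counts and sizes above give a quick way to estimate storage and memory needs. The snippet below is a small sketch that assumes the uncompressed sizes are in bytes, which this page does not state; the helper name is hypothetical.

```python
GIB = 1024 ** 3

# Figures copied from the lists above.
POSTS = {"rows": 44216130, "size": 205775254685}
COMMENTS = {"rows": 243749352, "size": 426159816976}


def summarize(info):
    """Return (average size per row, total size in GiB), assuming sizes are bytes."""
    return info["size"] / info["rows"], info["size"] / GIB


for label, info in (("posts", POSTS), ("comments", COMMENTS)):
    avg_bytes, total_gib = summarize(info)
    print(f"{label}: {avg_bytes:.0f} bytes per row on average, {total_gib:.1f} GiB uncompressed")
```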