Commit

update schemas and links
ArthurHeitmann committed May 6, 2024
1 parent cce36d5 commit 4ec369d
Showing 13 changed files with 17,205 additions and 1,462 deletions.
12 changes: 1 addition & 11 deletions README.md
@@ -6,7 +6,6 @@ Interact with the data through large dumps, an API or web interface.

## Downloads

### New dumps

All download links are organized [here](./download_links.md). Once a new dump is available, it will
also be added on the [releases page](https://github.com/ArthurHeitmann/arctic_shift/releases).
@@ -15,15 +14,6 @@ Alternatively for downloading data of users or smaller subreddits, you can use [

For information on how the data was collected and modified, see [here](./file_content_explanations.md).

### Original dumps

These dumps are available thanks to Pushshift.

- [2005-06 - 2022-12 (academic torrents)](https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee)
- [2023-01 (academic torrents)](https://academictorrents.com/details/c861d265525c488a9439fb874bd9c3fc38dcdfa5)
- [2023-02 (academic torrents)](https://academictorrents.com/details/9971c68d2909843a100ae955c6ab6de3e09c04a1)
- [2023-03 (archive.org)](https://archive.org/details/pushshift-reddit-2023-03/)

## API

Depending on your use case, you can try my (limited) [API](./api). For manual queries, you can use [this tool](https://arctic-shift.photon-reddit.com/search).
@@ -36,7 +26,7 @@ Generally I'd recommend to work with the compressed files instead of unpacking t
course you have seemingly infinite disk space.

With the helper scripts in this repository you can quickly get started. If you don't want to
use those files or want to use a CLI tool, head over to the [zst_blocks repository](https://github.com/ArthurHeitmann/zst_blocks_format).
use those files or want to use a CLI tool, head over to the [zst_blocks repository](https://github.com/ArthurHeitmann/zst_blocks_format) for processing .zst_blocks files.

For using the helper scripts:

8 changes: 5 additions & 3 deletions download_links.md
@@ -18,7 +18,7 @@ https://drive.filen.io/f/fb67389b-2eb2-42e8-9d2f-474ca153e105#cgZ5eW2NWXuS9n9rVh

## Torrents

Torrents are available either in the original zst_blocks format or as recompressed zst files. Zst files are reformatted and updated with additional data sources. For questions specifically regarding zst files, ask [u/Watchful1](https://www.reddit.com/user/Watchful1/).
Torrents are available either in the original zst_blocks format or as recompressed zst files. Zst files up to 2024-03 are reformatted and updated with additional data sources. For questions specifically regarding zst files, ask [u/Watchful1](https://www.reddit.com/user/Watchful1/). Starting from 2024-04, archives will only be released as zst files by me.

Please seed the torrents for as long as possible. Shortly after release downloads will be a bit slow.

@@ -30,8 +30,10 @@ Please seed the torrents for as long as possible. Shortly after release download
| 2023-11 | [Academic Torrents (superseded)](https://academictorrents.com/details/aee7728b787892d3cce4d6df3c86c2728e2be1d7) | [Academic Torrents](https://academictorrents.com/details/425b791647cdb2752f921351828452ca8e09aef8) |
| 2023-12 | - | [Academic Torrents](https://academictorrents.com/details/0d0364f8433eb90b6e3276b7e150a37da8e4a12b) |
| 2005-06 - 2023-12 | [Academic Torrents](https://academictorrents.com/details/9c263fc85366c1ef8f5bb9da0203f4c8c8db75f4) | - |
| 2023-01 | [Academic Torrents](https://academictorrents.com/details/ac88546145ca3227e2b90e51ab477c4527dd8b90) | [Academic Torrents](https://academictorrents.com/details/c440a293602270f03a47e3110a174365b965a093) |
| 2023-02 | [Academic Torrents](https://academictorrents.com/details/5969ae3e21bb481fea63bf649ec933c222c1f824) | [Academic Torrents](https://academictorrents.com/details/1dc131c38d09d8f3912a0040a9a7434ffccc1c78) |
| 2024-01 | [Academic Torrents](https://academictorrents.com/details/ac88546145ca3227e2b90e51ab477c4527dd8b90) | [Academic Torrents](https://academictorrents.com/details/c440a293602270f03a47e3110a174365b965a093) |
| 2024-02 | [Academic Torrents](https://academictorrents.com/details/5969ae3e21bb481fea63bf649ec933c222c1f824) | [Academic Torrents](https://academictorrents.com/details/1dc131c38d09d8f3912a0040a9a7434ffccc1c78) |
| 2024-03 | [Academic Torrents](https://academictorrents.com/details/deef710de36929e0aa77200fddda73c86142372c) | [Academic Torrents](https://academictorrents.com/details/ca989aa94cbd0ac5258553500d9b0f3584f6e4f7) |
| 2024-04 | [Academic Torrents](https://academictorrents.com/details/ad4617a3e9c1f52405197fc088b28a8018e12a7a) | discontinued |

### Other

5 changes: 5 additions & 0 deletions file_content_explanations.md
@@ -34,3 +34,8 @@ the new content are merged, in the following way:
- If a thing could not be retrieved a 2nd time (usually because the subreddit was banned),
`_meta.note` is set to `"no_2nd_retrieval"`
- If a thing was initially unavailable, but now is, `_meta.note` is set to `"initially_unavailable"`

## 2024-04+

Archives will now only be released as .zst files, since I'm changing my database and API architecture
and am no longer using .zst_blocks files.
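
The two `_meta.note` values described in the `file_content_explanations.md` hunk above can be told apart with a small helper. This is only a sketch: `ThingMeta` and `describeNote` are hypothetical names that are not part of the repository, and the returned strings are paraphrases of the explanations, not values found in the data.

```typescript
// Hypothetical helper; only the two note values come from the documentation.
type MetaNote = "no_2nd_retrieval" | "initially_unavailable";

interface ThingMeta {
  note?: MetaNote;
  retrieved_2nd_on?: number;
}

// Turn a `_meta.note` value into a short human-readable description.
function describeNote(meta: ThingMeta | undefined): string {
  switch (meta?.note) {
    case "no_2nd_retrieval":
      // Usually because the subreddit was banned in the meantime.
      return "could not be retrieved a second time";
    case "initially_unavailable":
      return "initially unavailable, available on the later retrieval";
    default:
      return "no note set";
  }
}

console.log(describeNote({ note: "no_2nd_retrieval" }));
```

Handling a missing `_meta` explicitly seems prudent because the general `RedditComment` schema marks `_meta` as optional.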
2 changes: 1 addition & 1 deletion schemas/RC.ts
@@ -2,7 +2,7 @@ interface RedditComment {
_meta?: {
is_edited?: boolean,
note?: "no_2nd_retrieval"|"initially_unavailable",
removal_type?: "deleted"|"removed",
removal_type?: "deleted"|"removed"|"removed by reddit",
retrieved_2nd_on?: number,
was_deleted_later?: boolean,
was_initially_deleted?: boolean,
2 changes: 1 addition & 1 deletion schemas/RC/2024.ts
@@ -2,7 +2,7 @@ interface RedditComment_2024 {
_meta: {
is_edited?: boolean,
note?: "no_2nd_retrieval"|"initially_unavailable",
removal_type?: "deleted"|"removed",
removal_type?: "deleted"|"removed"|"removed by reddit",
retrieved_2nd_on?: number,
was_deleted_later?: boolean,
was_initially_deleted?: boolean,
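
Code that consumes `removal_type` may need updating for the new `"removed by reddit"` value. As a rough sketch, the tally below uses a `Record` keyed by the union, so the compiler flags the counter object if a value is missing. `RemovalType`, `CommentLike`, `countRemovalTypes` and the sample data are illustrative assumptions, not part of the repository; only the three string literals come from the schema.

```typescript
// Union copied from the updated schema; every other name here is illustrative.
type RemovalType = "deleted" | "removed" | "removed by reddit";

interface CommentLike {
  _meta?: { removal_type?: RemovalType };
}

// Count comments per removal_type; comments without one are counted as "none".
function countRemovalTypes(
  comments: CommentLike[],
): Record<RemovalType | "none", number> {
  const counts: Record<RemovalType | "none", number> = {
    deleted: 0,
    removed: 0,
    "removed by reddit": 0,
    none: 0,
  };
  for (const comment of comments) {
    counts[comment._meta?.removal_type ?? "none"] += 1;
  }
  return counts;
}

// Made-up sample data, for demonstration only.
const sample: CommentLike[] = [
  { _meta: { removal_type: "removed by reddit" } },
  { _meta: { removal_type: "deleted" } },
  { _meta: {} },
  {},
];
console.log(countRemovalTypes(sample));
```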