auto-import from our Zenodo community #114

Draft
wants to merge 2 commits into
base: main
3 changes: 2 additions & 1 deletion .gitignore
@@ -21,4 +21,5 @@ docs/29_algorithm_validation/ideas.ipynb
docs/29_algorithm_validation/solution for exercise - metrics to investigate segmentation results.ipynb
docs/22_feature_extraction/blobs_analysis.csv

Untitled*
Untitled*
notebooks/generate_link_lists.py
337 changes: 337 additions & 0 deletions notebooks/import_from_zenodo_communities.ipynb
@@ -0,0 +1,337 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4a41efb5-2f4a-41dd-98ba-eeecf01fdb68",
"metadata": {},
"source": [
"# Importing new entries from Zenodo communities\n",
"This notebook imports entries from Zenodo communities. It does not re-import entries that are already in our database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "22ccdd0e-5d18-448d-a343-1ab634771d26",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import bia_bob\n",
"import shutil\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7065b8ba-f726-41f0-ab49-48c9767fcda1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'./generate_link_lists.py'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# workaround: Until our utilities are a python library, we need to copy it here.\n",
"shutil.copy('../scripts/generate_link_lists.py', './generate_link_lists.py')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a1863536-3d4e-4144-a86e-92eff0cca25a",
"metadata": {},
"outputs": [],
"source": [
"from generate_link_lists import load_dataframe\n",
"from generate_link_lists import update_yaml_file"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f6c69566-1d52-4ec0-8733-2cc11e002cac",
"metadata": {},
"outputs": [],
"source": [
"token = os.getenv('ZENODO_API_KEY')\n",
"community = 'nfdi4bioimage'\n",
"\n",
"response = requests.get('https://zenodo.org/api/records',\n",
Collaborator

There seems to be no logical error. What about adding some error handling, for example for API rate limits or other HTTP-related errors?

Maybe like:
try:
    response = requests.get('https://zenodo.org/api/records',
                            params={'communities': community,
                                    'access_token': token})

    # Raises an HTTPError for bad responses
    response.raise_for_status()
    online_data = response.json()

except requests.exceptions.RequestException as e:
    print(f"Error fetching data: {e}")

    # Ensuring online_data is defined in case of an error
    online_data = {}

Member Author

online_data = {}

If there is no online data, running the rest of the code is pointless.

But I agree some error handling would be nice
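
One way to reconcile the two points above, as a minimal sketch rather than the PR's actual code: let request errors propagate so the notebook stops at the failing cell instead of continuing with an empty `online_data`. The helper name `fetch_community_hits` and its `get` and `timeout` parameters are hypothetical and not part of this PR; `get` is assumed to behave like `requests.get`, and the `hits.hits` lookup mirrors the response shape the notebook already relies on.

```python
def fetch_community_hits(get, community, token=None,
                         url='https://zenodo.org/api/records', timeout=30):
    """Return the list of records for a Zenodo community, or raise.

    `get` is any callable with the same interface as `requests.get`
    (a hypothetical injection point, so the sketch can be exercised
    without network access).
    """
    params = {'communities': community}
    if token:
        # Only send the token when one is configured.
        params['access_token'] = token

    response = get(url, params=params, timeout=timeout)

    # With requests, this raises HTTPError for 4xx/5xx responses
    # (including 429 when rate-limited), which stops the notebook here.
    response.raise_for_status()

    online_data = response.json()
    try:
        return online_data["hits"]["hits"]
    except (KeyError, TypeError) as e:
        # The body was valid JSON but not the structure we expect.
        raise RuntimeError("Unexpected response shape: no hits.hits") from e
```

In the notebook this could be called as `hits = fetch_community_hits(requests.get, community, token)`. Because nothing is caught, a failed request surfaces as a traceback in that cell and the later cells that write to `imported.yml` never run on missing data.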

Collaborator

Should I also mark this as viewed?

Member Author

I'm not sure. I never marked any files as viewed. Let me know what it does to the file :-)

Collaborator

Like this: viewed and approved.

" params={'communities': community,\n",
" 'access_token': token})"
]
},
{
"cell_type": "markdown",
"id": "ecd9050d-8dcd-4b69-9d1f-00fa1a37db80",
"metadata": {},
"source": [
"## That's what's listed in the community"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "efa9ffa0-159c-4996-a6bd-c5e1f4197c2a",
"metadata": {},
"outputs": [],
"source": [
"online_data = response.json()\n",
"hits = online_data[\"hits\"][\"hits\"]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "1c59f531-ab9d-480a-81a2-7e5dbb9f8609",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"25"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(hits)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "28cbcc3a-a43c-47ee-b249-30032957411a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['https://zenodo.org/records/11548617',\n",
" 'https://zenodo.org/records/11501662',\n",
" 'https://zenodo.org/records/11350689',\n",
" 'https://zenodo.org/records/11235513',\n",
" 'https://zenodo.org/records/11109616',\n",
" 'https://zenodo.org/records/11031747',\n",
" 'https://zenodo.org/records/10939520',\n",
" 'https://zenodo.org/records/10886750',\n",
" 'https://zenodo.org/records/10808486',\n",
" 'https://zenodo.org/records/10793700',\n",
" 'https://zenodo.org/records/10730424',\n",
" 'https://zenodo.org/records/10687659',\n",
" 'https://zenodo.org/records/10389955',\n",
" 'https://zenodo.org/records/10083555',\n",
" 'https://zenodo.org/records/10008465',\n",
" 'https://zenodo.org/records/8414319',\n",
" 'https://zenodo.org/records/8349563',\n",
" 'https://zenodo.org/records/8340248',\n",
" 'https://zenodo.org/records/8329306',\n",
" 'https://zenodo.org/records/8139354',\n",
" 'https://zenodo.org/records/8070038',\n",
" 'https://zenodo.org/records/8019760',\n",
" 'https://zenodo.org/records/7928333',\n",
" 'https://zenodo.org/records/7890311',\n",
" 'https://zenodo.org/records/7394675']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"urls = [u[\"links\"][\"self_html\"] for u in hits]\n",
"urls"
]
},
{
"cell_type": "markdown",
"id": "658e8be1-37bd-4830-8e0d-4a6d6b79452d",
"metadata": {},
"source": [
"## Checking what we already have"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1bba931f-1766-4b4c-b700-9550efdc0fec",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Adding blog_posts.yml\n",
"Adding events.yml\n",
"Adding imported.yml\n",
"Adding materials.yml\n",
"Adding nfdi4bioimage.yml\n",
"Adding papers.yml\n",
"Adding workflow-tools.yml\n",
"Adding youtube_channels.yml\n"
]
}
],
"source": [
"df = load_dataframe(\"../resources/\")\n",
"\n",
"all_urls = str(df[\"url\"].tolist())\n",
"#all_urls"
]
},
{
"cell_type": "markdown",
"id": "5183ae2c-5144-4bff-a08f-d7a81f89ccef",
"metadata": {},
"source": [
"## Identifying entries we are still missing"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "4d91a514-be3f-4629-85fa-fda961ebe275",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['https://zenodo.org/records/11501662',\n",
" 'https://zenodo.org/records/11350689',\n",
" 'https://zenodo.org/records/11235513',\n",
" 'https://zenodo.org/records/11031747',\n",
" 'https://zenodo.org/records/10939520',\n",
" 'https://zenodo.org/records/10886750',\n",
" 'https://zenodo.org/records/10808486',\n",
" 'https://zenodo.org/records/10793700',\n",
" 'https://zenodo.org/records/10730424',\n",
" 'https://zenodo.org/records/10687659',\n",
" 'https://zenodo.org/records/10389955',\n",
" 'https://zenodo.org/records/8414319',\n",
" 'https://zenodo.org/records/8349563',\n",
" 'https://zenodo.org/records/8340248',\n",
" 'https://zenodo.org/records/8139354',\n",
" 'https://zenodo.org/records/8070038',\n",
" 'https://zenodo.org/records/8019760',\n",
" 'https://zenodo.org/records/7928333',\n",
" 'https://zenodo.org/records/7890311',\n",
" 'https://zenodo.org/records/7394675']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_urls = []\n",
"for url in urls:\n",
" if url not in all_urls:\n",
" new_urls.append(url)\n",
"\n",
"new_urls"
]
},
{
"cell_type": "markdown",
"id": "11fe5ad3-b68f-46cb-b9d6-3c9e6875cb64",
"metadata": {},
"source": [
"## Saving new entries"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4eee8bc1-9d9e-481c-b3ac-5867407eed96",
"metadata": {},
"outputs": [],
"source": [
"with open('../resources/imported.yml', 'a') as file:\n",
" for url in new_urls:\n",
" file.write(\"- url: \" + url + '\\n')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "c3e92c51-0b07-4b6b-a095-91b79252f96c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://zenodo.org/api/records/11548617\n",
"https://zenodo.org/api/records/11501662\n",
"https://zenodo.org/api/records/11350689\n",
"https://zenodo.org/api/records/11235513\n",
"https://zenodo.org/api/records/11031747\n",
"https://zenodo.org/api/records/10939520\n",
"https://zenodo.org/api/records/10886750\n",
"https://zenodo.org/api/records/10808486\n",
"https://zenodo.org/api/records/10793700\n",
"https://zenodo.org/api/records/10730424\n",
"https://zenodo.org/api/records/10687659\n",
"https://zenodo.org/api/records/10389955\n",
"https://zenodo.org/api/records/8414319\n",
"https://zenodo.org/api/records/8349563\n",
"https://zenodo.org/api/records/8340248\n",
"https://zenodo.org/api/records/8139354\n",
"https://zenodo.org/api/records/8070038\n",
"https://zenodo.org/api/records/8019760\n",
"https://zenodo.org/api/records/7928333\n",
"https://zenodo.org/api/records/7890311\n",
"https://zenodo.org/api/records/7394675\n"
]
}
],
"source": [
"update_yaml_file(\"../resources/imported.yml\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8fa55de8-b51a-45ce-934c-75c090166dab",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}