This script renames PDF files according to their citation in Chicago bibliography style and adds the citation to the PDF metadata. It uses OpenAI's GPT-4o to generate the citation from text extracted from each PDF. If the citation generated from the text sample disagrees with the embedded metadata, or if any citation data is missing, the script confirms the complete and accurate citation through the Crossref API.
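The decision to consult Crossref can be pictured with a minimal sketch like the one below. It is not the script's actual code: the function name `needs_crossref_check`, the field names in `REQUIRED_FIELDS`, and the normalization rule are all assumptions made for illustration.

```python
# Hypothetical field names; the script's real JSON schema may differ.
REQUIRED_FIELDS = ("author", "title", "year")


def _normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(str(value or "").lower().split())


def needs_crossref_check(generated, embedded):
    """Return True if the generated citation is incomplete or conflicts
    with the metadata embedded in the PDF."""
    # Any missing or empty required field triggers a lookup.
    if any(not _normalize(generated.get(field)) for field in REQUIRED_FIELDS):
        return True
    # A conflict only counts when the PDF actually embeds a value for the field.
    for field in REQUIRED_FIELDS:
        embedded_value = _normalize(embedded.get(field))
        if embedded_value and embedded_value != _normalize(generated[field]):
            return True
    return False
```

Under these assumptions, a complete citation whose title differs from the embedded title only in letter case would not trigger a lookup, while an empty year would.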
- Extracts text from the first 5 pages of each PDF file (this value is a variable you can change; it affects the number of tokens processed).
- Generates a citation in JSON format using OpenAI's GPT-4o.
- Renames PDF files to their citation in Chicago bibliography style.
- Adds bibliographic metadata to PDF files.
- Supports user choice for renaming files, adding metadata, or both.
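As a rough illustration of the renaming step, the sketch below turns a citation dictionary into a filesystem-safe file name. The function name `citation_to_filename`, the keys `author`, `year`, and `title`, the author-year-title ordering, and the 150-character limit are assumptions, not details taken from the script.

```python
import re


def citation_to_filename(citation, max_length=150):
    """Build a filesystem-safe '<Author>. <Year>. <Title>.pdf' name."""
    # Hypothetical keys; missing or None values are simply skipped.
    parts = [str(citation.get(key) or "").strip() for key in ("author", "year", "title")]
    name = ". ".join(part.rstrip(".") for part in parts if part)
    # Replace characters that are invalid in Windows file names.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "-", name)
    # Collapse runs of whitespace and trim stray dots or spaces at the ends.
    name = re.sub(r"\s+", " ", name).strip(" .")
    # Truncate long titles and fall back to a placeholder if nothing is left.
    return (name[:max_length].rstrip(" .") or "untitled") + ".pdf"


print(citation_to_filename(
    {"author": "Doe, Jane", "year": "2020", "title": "Citations: A Study"}
))
# prints: Doe, Jane. 2020. Citations- A Study.pdf
```

Replacing reserved characters rather than dropping them keeps the name readable while remaining valid on Windows, macOS, and Linux.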
- Python 3.6+
- `openai` library
- `PyMuPDF` (also known as `fitz`)
- `dotenv` library
- `habanero` library (for Crossref API access)
- Clone the repository:
    git clone https://github.com/yourusername/pdf-citation-renamer.git
    cd pdf-citation-renamer
- Create a virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install the required packages:
    pip install -r requirements.txt
- Set up your OpenAI API key:
  Create a `.env` file in the root directory of the project and add your OpenAI API key:
    OPENAI_API_KEY=your_openai_api_key_here
- Run the script:
    python renamepdf.py
- Follow the prompts:
  - Enter the directory path containing the PDF files.
  - Choose how to add bibliographic information:
    - 1: Rename files only
    - 2: Add metadata only
    - 3: Both rename files and add metadata
    - 4: Quit
Enter the directory path containing the PDF files: /path/to/pdf/files
Found 5 PDF files in the directory.
Do you want to proceed with processing the files? (y/n): y
What bibliographic information would you like to add:
1. File name
2. Metadata
3. Both
4. Quit
Enter your choice (1, 2, 3, 4): 3
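
The menu shown in the session above could be handled by a small mapping from the entered choice to the two actions. This is only a sketch: the function name `parse_choice` and the tuple layout are hypothetical, and the script itself may validate input differently.

```python
def parse_choice(raw):
    """Map a menu entry to (rename_files, add_metadata); None means quit."""
    actions = {
        "1": (True, False),   # rename files only
        "2": (False, True),   # add metadata only
        "3": (True, True),    # both
        "4": None,            # quit
    }
    choice = raw.strip()
    if choice not in actions:
        raise ValueError(f"Expected 1, 2, 3, or 4, got {raw!r}")
    return actions[choice]
```

With this layout, entering `3` yields `(True, True)`, so the caller can check each flag independently instead of comparing against menu numbers throughout the code.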