A web application that scrapes web pages, extracts main content, and uses OpenLLaMA to convert the content into specified formats.
- Web interface for URL input and format selection
- Playwright-based web scraping
- Content extraction and HTML cleanup
- OpenLLaMA integration for content transformation
- Flask-based web server
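As a rough illustration of the "content extraction and HTML cleanup" step listed above, the sketch below strips non-content elements and keeps the visible text, using only the standard library. It is not the project's actual implementation: the names `SKIPPED_TAGS`, `TextExtractor`, and `extract_main_text` are hypothetical, and the choice of which tags count as non-content is an assumption.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-content (an assumption for this sketch).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only when we are outside every skipped element.
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_main_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A tag-based filter like this is deliberately simple; a real extractor would likely also score or select the main content container rather than keep all remaining text.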
- Install uv (if not already installed):
pip install uv
- Clone the repository:
git clone https://github.com/arman-bd/www2any.git
cd www2any
- Create a virtual environment and install dependencies using uv:
uv venv
source .venv/bin/activate
# Windows: .venv\Scripts\activate
uv pip sync pyproject.toml
- Install Playwright browsers:
playwright install
- Install OpenLLaMA:
Follow the instructions on the OpenLLaMA website to install the OpenLLaMA API server.
For development, install additional development dependencies:
uv pip install --editable ".[dev]"
- Start the Flask server:
uv run www2any
- Open your browser and navigate to http://localhost:5000
- Enter a URL and select your desired output format
- Click "Process" to get the transformed content
Create a .env file in the project root with the following settings:
OPENLLAMA_API_URL=http://localhost:8080
FLASK_ENV=development
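The sketch below shows one way settings like these could be read, assuming a simple `KEY=VALUE` file format. How the project itself loads `.env` is not stated here, so `parse_dotenv` and `load_settings` are hypothetical helpers, and the fallback values (`http://localhost:8080` and `production`) are assumptions rather than documented defaults.

```python
import os


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env=None):
    """Build a settings dict from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        # Trailing slash removed so paths can be appended consistently.
        "openllama_api_url": env.get("OPENLLAMA_API_URL", "http://localhost:8080").rstrip("/"),
        # Assumed fallback when FLASK_ENV is not set.
        "flask_env": env.get("FLASK_ENV", "production"),
    }
```

For example, `load_settings(parse_dotenv(open(".env").read()))` would combine the two helpers for a file in the current directory.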
Run tests using pytest:
uv run pytest
Ruff is used for code formatting, linting, and import sorting. Here are the common commands:
- Format code:
ruff format src
- Lint and fix code:
ruff check --fix src
- Run tests:
pytest
Add these settings to your .vscode/settings.json:
{
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.fixAll.ruff": "explicit",
    "source.organizeImports.ruff": "explicit"
  },
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff"
  },
  "python.analysis.typeCheckingMode": "basic",
  "python.testing.pytestEnabled": true,
  "python.testing.unittestEnabled": false,
  "python.testing.nosetestsEnabled": false,
  "python.testing.pytestArgs": [
    "tests"
  ]
}
MIT License