An interactive CLI tool for browser automation, built on the browser-use library. It lets you control a browser by typing natural language commands at an interactive prompt.
- Multiple LLM Provider Support:
  - OpenAI GPT-4o (default)
  - Anthropic Claude 3.5 Sonnet (20241022)
  - Azure OpenAI Services
  - Gemini (coming soon)
  - DeepSeek-V3 (coming soon)
  - DeepSeek-R1 (coming soon)
  - Ollama (coming soon)
- Configurable System Behaviors:
  - Default mode for standard automation
  - Safety First mode with enhanced security
  - Data Collection mode for comprehensive gathering
  - Research mode for systematic exploration
  - Wikipedia First mode for research tasks
- Advanced Logging and Recording:
  - Automatic screenshots of elements
  - Session recordings
  - Comprehensive conversation logs
  - Structured data storage
  - Debug-level thought process logging
- Customizable Browser Settings:
  - Non-headless mode for visibility
  - Optimized window sizing
  - Network idle waiting
  - Trace and debug capabilities
  - Connect to existing Chrome instance
  - Support for cloud browser providers
- Custom Actions:
  - User confirmations
  - Search result saving
  - Element screenshots
  - Structured data handling
  - Table data extraction
  - File downloads
  - Content extraction
The tool provides several built-in custom functions that can be enabled or disabled:
- confirm: Ask for user confirmation before actions
- save_search: Save structured search results
- screenshot: Take screenshots of specific elements
- extract_content: Save page content
- extract_table: Extract and save table data as CSV
- download: Download files from URLs
You can exclude specific functions using the EXCLUDED_ACTIONS environment variable:
# Exclude file downloads and table extraction
EXCLUDED_ACTIONS=["download", "extract_table"]
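Because EXCLUDED_ACTIONS holds a JSON array, it can be parsed with the standard json module. The following is a minimal sketch of how such a value might be applied, not the tool's actual implementation: the helper name `enabled_actions` and the `BUILT_IN_ACTIONS` list are hypothetical, with the action IDs copied from the list above.

```python
import json

# Action IDs as listed in this README; the tool's real registry may differ.
BUILT_IN_ACTIONS = [
    "confirm", "save_search", "screenshot",
    "extract_content", "extract_table", "download",
]


def enabled_actions(env: dict) -> list:
    """Return the built-in actions that remain after applying EXCLUDED_ACTIONS."""
    raw = env.get("EXCLUDED_ACTIONS", "").strip() or "[]"
    excluded = json.loads(raw)  # expects a JSON array such as ["download"]
    if not isinstance(excluded, list):
        raise ValueError("EXCLUDED_ACTIONS must be a JSON array")
    unknown = set(excluded) - set(BUILT_IN_ACTIONS)
    if unknown:
        raise ValueError(f"Unknown action IDs: {sorted(unknown)}")
    return [name for name in BUILT_IN_ACTIONS if name not in excluded]


print(enabled_actions({"EXCLUDED_ACTIONS": '["download", "extract_table"]'}))
```

Rejecting unknown IDs is a design choice in this sketch: a typo in the variable then fails loudly instead of silently leaving an action enabled.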
The tool supports structured output formats using Pydantic models. Currently available formats:
from pydantic import BaseModel

class Post(BaseModel):
    post_title: str
    post_url: str
    num_comments: int
    hours_since_post: int
Enable structured output by setting the OUTPUT_FORMAT environment variable:
# Use structured posts format
OUTPUT_FORMAT=posts
- API Keys Required:
  - OpenAI API Key (default provider, for GPT-4o)
  - Anthropic API Key (optional, for Claude 3.5 Sonnet)
  - Azure OpenAI credentials (optional)
  - Browser Use API Key (optional but recommended)
- Clone this repository:
Windows:
git clone https://github.com/PierrunoYT/browser-use-script
cd browser-use-script
macOS/Linux:
git clone https://github.com/PierrunoYT/browser-use-script
cd browser-use-script
- Install dependencies:
Windows:
python -m pip install -r requirements.txt
macOS/Linux:
pip3 install -r requirements.txt
- Install playwright browsers:
All platforms:
playwright install
- Configure environment:
Windows:
copy .env.example .env
macOS/Linux:
cp .env.example .env
- Edit .env with your settings:
# Required: Choose your LLM provider and add API key
LLM_PROVIDER=openai # Options: openai, anthropic, azure
OPENAI_API_KEY=your_key_here
# Optional: Configure system behavior
SYSTEM_PROMPT=default # Options: default, safety, collection
# Optional: Alternative LLM providers
ANTHROPIC_API_KEY=your_key_here # Required for Claude 3.5 Sonnet
AZURE_OPENAI_ENDPOINT=your_endpoint_here
AZURE_OPENAI_KEY=your_key_here
# Optional: Telemetry settings
ANONYMIZED_TELEMETRY=true
- Start the CLI:
Windows:
python main.py
macOS/Linux:
python3 main.py
- The tool will display your current configuration:
Welcome to Browser Use CLI!
Using LLM Provider: OPENAI
System Prompt: DEFAULT
Enter your tasks and watch the browser automation in action.
Press Ctrl+C to exit.
- Enter your tasks in natural language. Examples:
  - "Search for the latest AI news and save the results"
  - "Go to Wikipedia and find information about quantum computing"
  - "Visit a tech blog and take screenshots of interesting articles"
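The configuration step above pairs each LLM_PROVIDER value with its own credentials. A startup check along the following lines could report missing variables before the browser launches. This is a sketch under assumptions: the helper name `missing_provider_vars` and the `REQUIRED_VARS` mapping are hypothetical, inferred from the .env example rather than taken from the tool's code.

```python
# Assumed mapping from LLM_PROVIDER to the variables it needs,
# inferred from the .env example in this README.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY"],
}


def missing_provider_vars(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    provider = env.get("LLM_PROVIDER", "openai").strip().lower()
    if provider not in REQUIRED_VARS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


# Example: Azure selected, but only the key is set.
print(missing_provider_vars({"LLM_PROVIDER": "azure", "AZURE_OPENAI_KEY": "k"}))
```

In a real run the function would typically receive `dict(os.environ)` after python-dotenv has loaded the .env file.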
Default mode:
- Standard browser automation behavior
- Balanced between functionality and safety
Safety First mode:
- Enhanced security and privacy features
- Requires confirmation for form submissions
- Respects robots.txt and terms of service
- Prevents automated logins without permission
- Avoids suspicious or untrusted links
Data Collection mode:
- Focused on comprehensive data gathering
- Automatic search result saving
- Screenshot capture of relevant content
- Organized data storage with timestamps
- Detailed URL documentation
The tool automatically creates and organizes various outputs:
- logs/conversation_*.json: Detailed conversation history
- logs/results/*.json: Structured search results
- logs/screenshots/*.png: Element screenshots
- logs/recordings/: Browser session recordings
- logs/traces/: Debug trace files
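Since the conversation logs follow the conversation_*.json naming pattern, the most recent one can be located with a glob. The sketch below assumes the default logs directory; the helper name `latest_conversation_log` is hypothetical and not part of the tool.

```python
from pathlib import Path


def latest_conversation_log(log_dir="logs"):
    """Return the most recently modified conversation_*.json file, or None."""
    candidates = sorted(
        Path(log_dir).glob("conversation_*.json"),
        key=lambda path: path.stat().st_mtime,  # oldest first
    )
    return candidates[-1] if candidates else None


print(latest_conversation_log())
```

Sorting by modification time avoids depending on whatever the `*` part of the file name encodes, which this README does not specify.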
Here are some example tasks you can try:
- "Go to Reddit, search for 'browser-use' and return the first post's title"
- "Search for flights on kayak.com from New York to London"
- "Go to Google Docs and create a new document titled 'Meeting Notes'"
- "Visit GitHub and star the browser-use repository"
- langchain-openai
- langchain-anthropic
- browser-use
- playwright
- python-dotenv
- pydantic
Contributions are welcome! Feel free to open issues for bugs or feature requests.
This project is licensed under the MIT License - see the LICENSE file for details.
The default configuration launches a new browser instance with customizable settings:
# .env configuration
BROWSER_HEADLESS=false
BROWSER_VIEWPORT_WIDTH=1280
BROWSER_VIEWPORT_HEIGHT=1100
Connect to your real Chrome browser with existing profiles and logged-in sessions:
# .env configuration
CHROME_INSTANCE_PATH=C:\Program Files\Google\Chrome\Application\chrome.exe # Windows
CHROME_INSTANCE_PATH=/Applications/Google Chrome.app/Contents/MacOS/Google Chrome # macOS
CHROME_INSTANCE_PATH=/usr/bin/google-chrome # Linux
Connect to cloud-based browser services for enhanced reliability:
# .env configuration
# WebSocket connection (wss)
BROWSER_WSS_URL=wss://your-provider.com/browser
# Chrome DevTools Protocol (CDP)
BROWSER_CDP_URL=http://your-cdp-provider.com
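With three ways to attach to a browser (a CDP URL, a WSS URL, or a local Chrome path), it helps to decide which one wins when several are set. The README does not state the tool's precedence, so the order used below is purely an assumption, and `browser_connection` is a hypothetical helper.

```python
def browser_connection(env: dict) -> tuple:
    """Pick one connection mode; the precedence order here is an assumption."""
    candidates = (
        ("cdp", "BROWSER_CDP_URL"),
        ("wss", "BROWSER_WSS_URL"),
        ("local_chrome", "CHROME_INSTANCE_PATH"),
    )
    for mode, variable in candidates:
        value = env.get(variable, "").strip()
        if value:
            return mode, value
    # Nothing configured: fall back to launching a fresh browser instance.
    return "launch", None


print(browser_connection({"BROWSER_WSS_URL": "wss://your-provider.com/browser"}))
```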
Fine-tune browser behavior with these settings:
# .env configuration
# Page Load Settings
MIN_PAGE_LOAD_TIME=0.5
NETWORK_IDLE_TIME=1.0
MAX_PAGE_LOAD_TIME=5.0
# Security Settings
BROWSER_DISABLE_SECURITY=true
IGNORE_HTTPS_ERRORS=true
JAVASCRIPT_ENABLED=true
# Display Settings
HIGHLIGHT_ELEMENTS=true
VIEWPORT_EXPANSION=500
BROWSER_LOCALE=en-US
# URL Restrictions
ALLOWED_DOMAINS=["example.com","another-domain.com"]
# Debug and Recording
SAVE_RECORDING_PATH=logs/recordings
TRACE_PATH=logs/traces
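# Visible browser with security checks disabled
BROWSER_HEADLESS=false
BROWSER_DISABLE_SECURITY=true
USE_VISION=true
# Headless browser with security checks on, restricted to listed domains
BROWSER_HEADLESS=true
BROWSER_DISABLE_SECURITY=false
USE_VISION=true
ALLOWED_DOMAINS=["trusted-domain.com"]
# Existing Chrome installation with a persistent context
CHROME_INSTANCE_PATH=/path/to/chrome
USE_PERSISTENT_CONTEXT=true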
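To illustrate what an ALLOWED_DOMAINS restriction means in practice, the sketch below checks a URL against the JSON array. It is not the tool's or browser-use's actual matching logic: the helper `is_allowed` is hypothetical, and both the subdomain matching and the "empty list means unrestricted" rule are assumptions.

```python
import json
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains_json: str) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    allowed = json.loads(allowed_domains_json or "[]")
    if not allowed:
        return True  # assumption: an empty list means no restriction
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed
    )


print(is_allowed("https://docs.example.com/page", '["example.com"]'))
print(is_allowed("https://example.com.evil.net/", '["example.com"]'))
```

Matching on the parsed hostname with a leading dot, rather than a plain substring test, keeps look-alike hosts such as example.com.evil.net from passing.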
In addition to the basic configuration, you can customize:
# Exclude specific functions
EXCLUDED_ACTIONS=[] # JSON array of action IDs
# Output format
OUTPUT_FORMAT= # Options: posts, or leave empty for text
# Enable debug logging for model thoughts
LOG_LEVEL=DEBUG
# Save browser recordings
SAVE_RECORDING_PATH=logs/recordings
TRACE_PATH=logs/traces
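A LOG_LEVEL value such as DEBUG maps directly onto Python's logging levels. The following sketch shows one way to translate it; the helper name `resolve_log_level` is hypothetical, and falling back to INFO for empty or unrecognized values is an assumption rather than documented behavior.

```python
import logging


def resolve_log_level(name, default=logging.INFO):
    """Translate a LOG_LEVEL string into a logging level constant."""
    level = getattr(logging, (name or "").strip().upper(), None)
    # Only accept real numeric levels such as logging.DEBUG or logging.WARNING.
    return level if isinstance(level, int) else default


logging.basicConfig(level=resolve_log_level("DEBUG"))
logging.getLogger(__name__).debug("debug logging is enabled")
```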
The tool organizes outputs in the following structure:
logs/
├── browser_use.log       # Main log file
├── conversation_*.json   # Conversation history
├── results/              # Structured search results
├── screenshots/          # Element screenshots
├── content/              # Extracted page content
├── tables/               # CSV table data
├── downloads/            # Downloaded files
├── recordings/           # Browser session recordings
└── traces/               # Debug trace files