web-data-extraction

Here are 21 public repositories matching this topic...

MohamedHmini / iww

AI-based web wrapper for web content extraction

python data-mining library ai information-extraction web-scraping web-mining web-content-extractor web-data-extraction

Updated Feb 6, 2023
Python

codercurious / crunchbase-scraper

Scrape Crunchbase companies, people, investors, and acquisitions data, including website URLs, social URLs, emails, phone numbers, employee count, funding information, etc.

leads crunchbase investors web-scrapers web-data-extraction lead-generation scraping-web scraper-api crunchbase-api crunchbase-scraper company-scraper leads-scraper

Updated Jan 15, 2024

luminati-io / java-web-scraping

A quick guide, with a code example, showing how to use Java for web scraping

java maven scraping-websites web-data-extraction

Updated Sep 24, 2024

DemonMartin / scrappey-wrapper

An API wrapper for Scrappey.com written in Node.js (Cloudflare bypass and solver)

web-scraping data-extraction web-data-extraction scraping-framework scraping-tool cloudflare-bypass web-scraping-solution cloudflare-solver api-scraping scraping-solution website-data-extraction scraping-library cloudflare-anti-bot scraping-service data-scraping-tool website-scraping-tool turnstile-solver

Updated Jan 10, 2024
JavaScript

jjonescz / awe

AI-based web extractor

deep-learning information-extraction web-scraping web-data-extraction structured-web-data

Updated Feb 25, 2023
Python

Boomslet / Web_Crawler

Open-source web crawler

python url html open-source website opensource links web-crawler urls free data-extraction webcrawler web-crawling web-data-extraction urllib web-crawler-python

Updated Jul 21, 2018
Python

dstark5 / gnews-scraper

GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making the scraped information convenient to access and use.

typescript web-scraping json-parsing web-crawling google-news data-scraping google-news-scraper web-data-extraction web-automation keyword-search gnews news-scraping gnews-api article-extraction gnews-scraper

Updated Aug 19, 2023
TypeScript

wbsg-uni-mannheim / WDCFramework

Java framework used by the Web Data Commons project to extract Microdata, Microformats and RDFa data, Web graphs, and HTML tables from the web crawls provided by the Common Crawl Foundation.

schema-org json-ld microdata web-data-extraction

Updated Dec 13, 2022
Java

kaizenplatform / FacebookInsightsConnector

The Tableau Web Data Connector for Facebook Insights API

facebook tableau facebook-insights web-data-extraction

Updated Jun 26, 2017
JavaScript

lekhmanrus / real-shot-pdf

RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.

Updated Mar 1, 2024
TypeScript

oxpath / oxpath

OXPath from Oxford

scraper web ajax web-data-extraction

Updated May 20, 2022
Java

wbsg-uni-mannheim / schemaorg-tables

This repository contains the code and data download links to reproduce the building process of the 2021 Schema.org Table Corpus.

schema-org web-data-extraction web-tables

Updated May 12, 2021
Python

hoxhaeris / get_muitiple

Get and process multiple resources from the web, using asyncio (aiohttp) to fetch the data and multiprocessing/multithreading to process it.

python3 web-scraping asyncio web-data-extraction

Updated Mar 4, 2021
Python

ranajahanzaib / wdx

A web data extraction library written in golang.

scraper mongodb nextjs web-data-extraction go-scraper

Updated Apr 19, 2024
Go

wbsg-uni-mannheim / wdc-page

This repository contains the source files of the Web Data Commons website and is used to maintain the site. The Web Data Commons project extracts structured data from the Common Crawl.

web-data-extraction

Updated Jul 3, 2024
HTML

sc10ntech / extract-site-metadata

Metadata extractor for the sprawling web ⚙️

metadata-extraction web-data-extraction open-graph-protocol

Updated Jan 8, 2023
TypeScript
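
Open Graph metadata, one of the formats tagged on this package, is carried in `<meta property="og:...">` tags in a page's head. The following is a minimal Python sketch using only the standard library's `html.parser`; it is not this package's API (the package itself is TypeScript), and the names `OpenGraphParser` and `extract_open_graph` are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute values may be None
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        # Keep only Open Graph properties that carry a value; the first occurrence wins.
        if prop.startswith("og:") and content is not None:
            self.metadata.setdefault(prop, content)


def extract_open_graph(html: str) -> dict[str, str]:
    """Return the og:* properties found in an HTML document."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


if __name__ == "__main__":
    page = (
        '<head><meta property="og:title" content="Example Domain">'
        '<meta name="description" content="not an Open Graph tag"></head>'
    )
    print(extract_open_graph(page))
```

Self-closing `<meta ... />` tags are also handled, because `HTMLParser` routes them through `handle_starttag` by default.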

dariga-sm / Word-Frequency-in-Moby-Dick

Scrape the novel Moby Dick from the Project Gutenberg website using the Python package requests, extract the words from this web data using BeautifulSoup, and then analyze the distribution of words using the Natural Language Toolkit (nltk).

python requests beautifulsoup nlp-machine-learning case-study web-data-extraction

Updated Oct 21, 2019
HTML
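
The word-frequency pipeline described above can be sketched roughly as follows. This is an illustration, not the repository's code: it assumes the HTML has already been downloaded (for example with requests), and it swaps BeautifulSoup and nltk for the standard library's `html.parser`, a simple regular-expression tokenizer, and `collections.Counter`. The names `TextExtractor` and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Accumulates text content, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html: str, stopwords: frozenset = frozenset()) -> Counter:
    """Count lowercase words in the page text, ignoring the given stopwords."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks).lower()
    # Simple regex tokenizer standing in for nltk's tokenizers (an assumption).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(w for w in words if w not in stopwords)


if __name__ == "__main__":
    sample = "<p>Call me Ishmael.</p><p>Call me again.</p>"
    print(word_frequencies(sample, frozenset({"me"})).most_common(3))
```

On the full downloaded novel, `word_frequencies(html).most_common(10)` would list the ten most frequent tokens; nltk adds more careful tokenization and ready-made stopword lists on top of this basic counting step.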

mibrahimbashir / customer_reviews

A comprehensive script to extract customer reviews for machine learning

python scrapy web-data-extraction

Updated Sep 26, 2024
Python

gonzalopezgil / scraping-interface

Python-based desktop app for effortless web scraping

desktop-app python cross-platform pyqt5 web-scraping xpath web-data-extraction browsing web-pages user-friendly-interface

Updated Jun 26, 2023
Python

wbsg-uni-mannheim / StructuredDataProfiler

Java project for profiling the results of the yearly Web Data Commons extraction of structured data with RDFa, Microdata, Microformats, and embedded JSON-LD annotations.

schema-org json-ld microdata profiling web-data-extraction

Updated Oct 17, 2022
Java
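
Embedded JSON-LD, one of the annotation formats profiled here, is carried in `<script type="application/ld+json">` elements. Below is a minimal standard-library Python sketch of pulling those annotations out of a single page. It is an illustration under stated assumptions, not how the Web Data Commons extraction (a Java framework run over Common Crawl) is implemented, and the names `JsonLdParser` and `extract_json_ld` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the raw bodies of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            self._in_json_ld = script_type == "application/ld+json"
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_json_ld(html: str) -> list:
    """Parse every JSON-LD block in the page into Python objects."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    items = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed annotations instead of failing the whole page
        # A single script element may hold one object or a list of objects.
        items.extend(data if isinstance(data, list) else [data])
    return items


if __name__ == "__main__":
    page = ('<script type="application/ld+json">'
            '{"@type": "Organization", "name": "Example"}</script>')
    print(extract_json_ld(page))
```

Microdata and RDFa are attribute-based (`itemprop`, `property` and related attributes) and need a different walk over the document tree; this sketch covers only JSON-LD.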
