Configurable database names #420
Commits on Nov 14, 2019
-
WebURLTupleBinding, Fetcher and WebURL Post
Added POST support to WebURL and PageFetcher, and included the new WebURL attributes in WebURLTupleBinding. I suggest deprecating newHttpUriRequest(String) in PageFetcher because it does not allow passing POST parameters.
Commit e9004f8
-
DocIDServer aware of POST data
DocIDServer is now aware of POST data and allows visiting the same URL again when the POST parameters differ (for instance, filling in a form with different years). I suggest deprecating getDocId, getNewDocID, addUrlAndDocId and isSeenBefore, since they do not accept POST parameters. WebURL can now encode itself into a single unique string. NOTE: This serialization SHOULD be improved.
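A minimal sketch of the idea behind a POST-aware key, for illustration only. `PostAwareKey` and `encode` are hypothetical names and this is not the serialization used in the commit; it assumes POST parameters are available as a map of non-null strings and requires Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of crawler4j): builds a dedup key from a URL plus POST parameters. */
final class PostAwareKey {
    private PostAwareKey() {}

    /** Returns the URL itself for plain GET requests, or URL + '\n' + sorted, percent-encoded parameters. */
    static String encode(String url, Map<String, String> postParams) {
        if (postParams == null || postParams.isEmpty()) {
            return url; // no POST data: the URL alone identifies the request
        }
        StringBuilder key = new StringBuilder(url).append('\n');
        boolean first = true;
        // TreeMap orders entries by parameter name, so insertion order does not affect the key.
        for (Map.Entry<String, String> e : new TreeMap<>(postParams).entrySet()) {
            if (!first) {
                key.append('&');
            }
            key.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            first = false;
        }
        return key.toString();
    }

    // Percent-encoding keeps '&' and '=' inside names or values from colliding with the separators.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

With a key like this, the same form URL submitted with `year=2018` and with `year=2019` yields two distinct keys, so a seen-URL store keyed on the string would treat them as different documents.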
Commit 5ac44bb
-
WebCrawler uses new DocIDServer Post capabilities
The WebCrawler now passes the WebURL to the DocIDServer instead of a String URL. We assume GET on redirections; I'm not 100% sure this is always true. The case !curURL.getURL().equals(fetchResult.getFetchedUrl()) still uses the old methods and should be reviewed.
Commit f0b2219
-
addSeenUrl and addSeed with WebURL parameter
Added addSeenUrl(WebURL) and addSeed(WebURL) methods to CrawlController. The original methods are untouched, although I'd suggest making them create WebURLs and pass them to the new methods.
Commit 20e646c
-
SUGGESTION: addSeed and addSeenUrl call the WebURL methods
addSeed and addSeenUrl now create a WebURL and pass it to the newly created methods. This makes it easier for the user to override those methods.
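The delegation described here can be sketched roughly as follows. `SeedUrl`, `SeedRegistry` and `HttpsOnlyRegistry` are hypothetical stand-ins for WebURL, CrawlController and a user subclass; the real classes have more state and different signatures, so treat this only as an illustration of the pattern.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for WebURL: wraps just the URL string. */
final class SeedUrl {
    private final String url;

    SeedUrl(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

/** Hypothetical stand-in for CrawlController. */
class SeedRegistry {
    private final List<SeedUrl> seeds = new ArrayList<>();

    /** String overload: only wraps the argument and delegates. */
    public void addSeed(String url) {
        addSeed(new SeedUrl(url));
    }

    /** Object overload: the single place where a seed is actually registered. */
    public void addSeed(SeedUrl seed) {
        seeds.add(seed);
    }

    public List<SeedUrl> getSeeds() {
        return seeds;
    }
}

/** A subclass overrides only the object overload; calls to the String overload pass through it too. */
class HttpsOnlyRegistry extends SeedRegistry {
    @Override
    public void addSeed(SeedUrl seed) {
        if (seed.getUrl().startsWith("https://")) {
            super.addSeed(seed);
        }
    }
}
```

Because the String overload contains no logic of its own, a subclass that overrides one method changes the behavior of both entry points.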
Commit 2eed8de
-
PageFetchResult now contains the POST info
Suggested deprecating the old fetchedUrl attribute and introduced fetchedWebUrl, which is a WebURL. Some tab style fixes.
Commit 855cbc4
-
Commit 300c07f
-
Commit 72cd50f
Commits on Nov 15, 2019
-
Commit dc96fd9
-
Commit f768616
-
Commit 1d29aab
-
Commit 98aff77
Commits on Nov 16, 2019
-
The database names can now be configured from the CrawlController constructor.
Commit e2da488
-
Commit 0d0cd15
-
Extracted interfaces from Parser and PageFetcher
Extracted interfaces from Parser and PageFetcher to make it easier to create fully custom classes.
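As a rough illustration of why extracted interfaces help, the sketch below wires a tiny crawler against two interfaces and swaps in custom implementations. All names here (`Fetcher`, `LinkParser`, `CannedFetcher`, `AnchorCountingParser`, `MiniCrawler`) are hypothetical and much simpler than the interfaces introduced in the commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified fetcher contract. */
interface Fetcher {
    String fetch(String url);
}

/** Hypothetical, simplified parser contract. */
interface LinkParser {
    int countLinks(String html);
}

/** Custom fetcher that serves canned pages from memory instead of the network. */
final class CannedFetcher implements Fetcher {
    private final Map<String, String> pages = new HashMap<>();

    CannedFetcher with(String url, String html) {
        pages.put(url, html);
        return this;
    }

    @Override
    public String fetch(String url) {
        return pages.getOrDefault(url, ""); // unknown URLs yield an empty page
    }
}

/** Custom parser that naively counts occurrences of "<a " in the markup. */
final class AnchorCountingParser implements LinkParser {
    @Override
    public int countLinks(String html) {
        int count = 0;
        int from = html.indexOf("<a ");
        while (from != -1) {
            count++;
            from = html.indexOf("<a ", from + 3);
        }
        return count;
    }
}

/** Depends only on the interfaces, so any implementation can be plugged in. */
final class MiniCrawler {
    private final Fetcher fetcher;
    private final LinkParser parser;

    MiniCrawler(Fetcher fetcher, LinkParser parser) {
        this.fetcher = fetcher;
        this.parser = parser;
    }

    int linksOn(String url) {
        return parser.countLinks(fetcher.fetch(url));
    }
}
```

Since `MiniCrawler` never names a concrete class, replacing the fetcher or parser needs no subclassing of the defaults.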
Commit 761513b
-
Commit a64580c
-
Commit ad5ebc3
Commits on Dec 14, 2019
-
WebURLs can now be used as seeds, so we enforce docId to be < 0 unless stated otherwise.
Commit 0497657
Commits on Jan 9, 2020
-
Commit a5d72ab
-
Commit d76a045
-
Commit 30007ca