error when trying to extend bundesanzeiger search #75
Comments
I just learned that there is a new format called ESEF. Reports in this format do not have a captcha that needs to be solved, which is why the soup.find() call returns None.
Thanks for looking into this. Does this mean we need to adapt or extend our code?
Hi, did you solve it? I guess I have the same problem. It would be super nice to adapt the code! :)
Well, you could add an additional check here for whether a captcha is present:
if soup.find("div", {"class": "captcha_wrapper"}) is not None:
    # solve the captcha here
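The check suggested above could be wrapped roughly like this. This is only a sketch: `has_captcha` and `load_report` are hypothetical names, not part of the deutschland package, and `solve_captcha` / `parse_report` stand in for whatever the existing code already does. `soup` is assumed to be a BeautifulSoup object (anything with a compatible `find` method works).

```python
from typing import Any, Callable


def has_captcha(soup: Any) -> bool:
    """Return True if the parsed page contains the captcha wrapper div."""
    return soup.find("div", {"class": "captcha_wrapper"}) is not None


def load_report(
    soup: Any,
    solve_captcha: Callable[[Any], Any],  # hypothetical: returns the page behind the captcha
    parse_report: Callable[[Any], Any],   # hypothetical: extracts the report from a page
) -> Any:
    # Only run the captcha solver when a captcha is actually on the page.
    # Pages without one (e.g. ESEF reports, per this thread) are parsed directly.
    if has_captcha(soup):
        soup = solve_captcha(soup)
    return parse_report(soup)
```

Keeping the check in its own function means the solver is never called on a page where soup.find() would return None.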
Hi, we kind of solved it, or rather implemented a workaround for our use case. I don't have the code here with me but will provide it after the holidays.
Hi, thank you very much. It would be great if you could share the function for the ESEF viewer. Do you know if there are plans to make a PR for this feature? Thanks in advance.
Hi @time4breakfast,
Hi jurekmff, current situation: I switched companies and do not have access to the code anymore. But I have found a test sample on my machine (which unfortunately imports my own, corrected version of the deutschland API that I don't have anymore -.-) that I will share with you. In theory, what you need to do to fix the error is:
The ESEF report works a little differently from the old format. Opened in a standard browser (as a normal user would view it), it loads inside its very own "viewer" implementation, which took forever to load on my machine and usually slowed down the whole computer. The format is subdivided into several page-like sections with different contents and use cases. Using the code sample I'll provide further down in this post, you can try accessing the ESEF report(s) yourself. The sample was written for Deutsche Bank. You should be fine replacing the first import, "from handelsregister_updates import Bundesanzeiger", with "from deutschland.bundesanzeiger import Bundesanzeiger" and making the necessary try-except adaptation. Also, keep in mind that the domain bundesanzeiger.de will change to unternehmensregister.de in the future. It is the same company in the background, but they are trying to separate the data, domains and everything else more clearly. Hope that helps. If you have any further questions, don't hesitate to ask. I hope to be able to answer more quickly in the future. Best regards
I thought about contributing to your package by adding extended search functionality (i.e. not only searching all documents, but also being able to limit the search to certain types of documents).
Unfortunately, this only works for some companies, while for others the captcha solver always fails. Any ideas why that might be?
(E.g. it works without errors for "Deutsche Bahn AG" but keeps failing for "Deutsche Bank AG".)
Change:
Add area_select=22 to the search request:
response = self.session.get(
f"https://www.bundesanzeiger.de/pub/de/start?0-2.-top%7Econtent%7Epanel-left%7Ecard-form=&fulltext={company_name}&area_select=22&search_button=Suchen"
)
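For illustration, the same search URL can be assembled with urllib.parse so that the company name is always URL-encoded. This is a sketch rather than the package's actual code: `build_search_url` and `BASE_URL` are hypothetical names, and the meaning of the value 22 is taken from the request above, not from any documentation.

```python
from urllib.parse import urlencode

# Assumption: same endpoint as in the request above.
BASE_URL = "https://www.bundesanzeiger.de/pub/de/start"


def build_search_url(company_name: str, area_select: str = "22") -> str:
    """Build the search URL, limiting results via the area_select parameter."""
    params = [
        # The form key from the original request; urlencode leaves "~" unescaped,
        # which is equivalent to the "%7E" spelling used above.
        ("0-2.-top~content~panel-left~card-form", ""),
        ("fulltext", company_name),  # spaces and umlauts are encoded here
        ("area_select", area_select),
        ("search_button", "Suchen"),
    ]
    return f"{BASE_URL}?{urlencode(params)}"
```

The result can be passed to self.session.get(...) in place of the f-string. Whether the server treats the unescaped "~" the same as "%7E" is assumed here and worth checking against a real request.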