Update Categories #274

Open
wants to merge 2 commits into
base: main
Conversation

cooukiez

Description of changes:
I removed the old category.yaml file and replaced it with the complete category list, which I scraped from kleinanzeigen.de.
These should be all categories as of March 2024.
I built a category tree, so all leaf categories are included in the YAML file (each with its name and path).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Delete old categories file for new file
I have collected all categories from kleinanzeigen.de
Format is:
category_name:
    subcategory_name:
         subsubcategory_name: path
...
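
For readers who want to work with this layout, here is a minimal sketch of how the nested tree could be flattened into a leaf name to path lookup. It assumes the file has already been loaded into nested Python dicts; the function name `flatten_categories` and the placeholder parent names are hypothetical and not part of this PR.

```python
from typing import Any, Dict


def flatten_categories(tree: Dict[str, Any]) -> Dict[str, str]:
    """Collect every leaf as {leaf_name: path}, descending through nested dicts."""
    leaves: Dict[str, str] = {}
    for name, value in tree.items():
        if isinstance(value, dict):
            # Inner node: recurse into the subcategory level.
            leaves.update(flatten_categories(value))
        else:
            # Leaf: the value is the category path string.
            leaves[name] = value
    return leaves


# The two leaf entries are quoted from this thread; the parent names are
# placeholders taken from the format description, not real category names.
sample_tree = {
    "category_name": {
        "subcategory_name": {
            "Babyspielzeug": "17/23/babyspielzeug",
            "Barbie & Co": "17/23/barbie",
        },
    },
}

flat = flatten_categories(sample_tree)
```

Note that in this sketch a leaf name that appears under two different parents would be overwritten by the later entry, so a real lookup might need to key on the full name chain instead.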
@TylonHH

TylonHH commented Apr 21, 2024

Is this request related to this? #281 (comment)

@cooukiez
Author

cooukiez commented Apr 21, 2024

Yes, as I said, I directly scraped every category from kleinanzeigen.de

bdbf1ce#r1573870001

Babyspielzeug: 17/23/babyspielzeug
Barbie & Co: 17/23/barbie
Dreirad & Co: 17/23/dreirad
Gesellschaftsspiele: 17/23/gesellschaftsspiele
Author

This is your category.


@TylonHH TylonHH May 12, 2024

So I'm a little bit confused.
I replaced the category in the YAML with 17/23/babyspielzeug,
but this category was not set.

So I published by hand in the same category and downloaded the ad. There, the category was set to 17l463/23l463.
When I try to publish this ad, this category is also not set.

The link on the web is: https://www.kleinanzeigen.de/p-kategorie-aendern.html#?path=17/23/babyspielzeug&isParent=undefined

What am I doing wrong?

Author

This is my base URL when selecting categories:
https://www.kleinanzeigen.de/p-kategorie-aendern.html#?path=&isParent=true

image

Then I can add the path and it works fine:
https://www.kleinanzeigen.de/p-kategorie-aendern.html#?path=17/23/babyspielzeug&isParent=true

image

(You cannot access these URLs directly.)
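
As a small illustration of how the two URLs above are composed, here is a sketch that puts a category path into the fragment. The helper name `category_url` is hypothetical; the URL shape is copied from the links in this comment, and since the pages cannot be opened directly, this only shows the string construction.

```python
BASE_URL = "https://www.kleinanzeigen.de/p-kategorie-aendern.html"


def category_url(path: str = "", is_parent: bool = True) -> str:
    """Build the category-selection URL with the path placed in the fragment."""
    # The fragment layout (#?path=...&isParent=...) mirrors the URLs in this thread.
    flag = "true" if is_parent else "false"
    return f"{BASE_URL}#?path={path}&isParent={flag}"


base = category_url()
leaf = category_url("17/23/babyspielzeug")
```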


As far as I can see, the problem occurs when saving/downloading an ad. It saves the category as category: 161l463/225l463 instead of 161/225/festplatten_laufwerke.
Do we need a mapping table or something? Or are you able to download and (re)publish the downloaded ads?
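
One possible shape for such a mapping, as a rough sketch only: it assumes that the digits before the `l` in each segment of the saved value are the same numeric IDs used in the new paths. That assumption is inferred solely from the two pairs in this thread (17l463/23l463 with 17/23/babyspielzeug, and 161l463/225l463 with 161/225/festplatten_laufwerke). The function names are hypothetical.

```python
from typing import List


def numeric_prefix(saved_category: str) -> str:
    """Turn '161l463/225l463' into '161/225' by keeping the text before each 'l'."""
    return "/".join(segment.split("l", 1)[0] for segment in saved_category.split("/"))


def candidate_paths(saved_category: str, known_paths: List[str]) -> List[str]:
    """Return every known path that starts with the numeric prefix of the saved value."""
    prefix = numeric_prefix(saved_category) + "/"
    return [path for path in known_paths if path.startswith(prefix)]


# Paths quoted from this thread.
known = [
    "17/23/babyspielzeug",
    "17/23/barbie",
    "161/225/festplatten_laufwerke",
]

toys = candidate_paths("17l463/23l463", known)
drives = candidate_paths("161l463/225l463", known)
```

With the sample data, the saved value for the toy category matches more than one leaf, so the numeric prefix alone would not be enough to recover the exact path; some extra information from the downloaded ad would still be needed.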

@cooukiez
Author

Someone would need to replace the special German characters; then the builds won't fail, I think.

@provinzio
Contributor

@cooukiez I realized that the encoding of the new categories.yaml file wasn't UTF-8 (perhaps Python's open(...) was used without encoding="utf-8" as a parameter?). I fixed the encoding and opened another PR with a fixed pipeline.

Have you used a script to scrape the new categories? It might be nice to save it somewhere nearby in case we need it again. Good work. :)
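
To illustrate the encoding point above, here is a minimal sketch of writing and reading a file with an explicit `encoding="utf-8"`. Without that argument, Python's `open` may fall back to a platform-dependent default encoding, which is one way non-ASCII category names can end up in a different encoding. The sample line is made up for illustration and is not taken from the real category list.

```python
import tempfile
from pathlib import Path

# Hypothetical sample content with an umlaut; not a real entry from the file.
sample_yaml = "Zubehör: placeholder/path\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "categories.yaml"

    # Pass the encoding explicitly instead of relying on the platform default.
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(sample_yaml)

    # Inspect the raw bytes, then read the text back with the same encoding.
    raw_bytes = target.read_bytes()
    with open(target, "r", encoding="utf-8") as handle:
        round_trip = handle.read()
```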
