-
Notifications
You must be signed in to change notification settings - Fork 531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ottolenghi scraper #1209
base: main
Are you sure you want to change the base?
ottolenghi scraper #1209
Conversation
Hi @jacksgreen , thanks for the contribution! Just a few things to fix up here The tests failing require a few fixes
Improvements: There is additional information available on this page that would be great to include in the output! Some of which may require a custom implementation.
|
@jknndy Thanks for the feedback, I've updated everything in the pr! |
Just one last thing and then it should be good to go! For sites that have |
A note for anyone who gets a bit confused by this, as I did briefly: we recently added support for |
Hey @jknndy, |
@@ -0,0 +1,69 @@ | |||
from ._abstract import AbstractScraper | |||
from ._grouping_utils import group_ingredients | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from ._utils import get_minutes, get_yields | |
def yields(self): | ||
return ( | ||
self.soup.find("div", class_="c-recipe-header__timings") | ||
.find("span") | ||
.get_text(strip=True) | ||
) | ||
|
||
def prep_time(self): | ||
return ( | ||
self.soup.find("div", class_="c-recipe-header__timings") | ||
.find_all("span")[1] | ||
.get_text(strip=True) | ||
) | ||
|
||
def cook_time(self): | ||
return ( | ||
self.soup.find("div", class_="c-recipe-header__timings") | ||
.find_all("span")[2] | ||
.get_text(strip=True) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def yields(self): | |
return ( | |
self.soup.find("div", class_="c-recipe-header__timings") | |
.find("span") | |
.get_text(strip=True) | |
) | |
def prep_time(self): | |
return ( | |
self.soup.find("div", class_="c-recipe-header__timings") | |
.find_all("span")[1] | |
.get_text(strip=True) | |
) | |
def cook_time(self): | |
return ( | |
self.soup.find("div", class_="c-recipe-header__timings") | |
.find_all("span")[2] | |
.get_text(strip=True) | |
) | |
def _extract_timing_elements(self, prefix): | |
timings = self.soup.find("div", class_="c-recipe-header__timings").find_all("span") | |
for timing in timings: | |
if prefix.lower() in timing.get_text().lower(): | |
return timing.get_text() | |
def yields(self): | |
yield_text = self._extract_timing_elements("serves") | |
return get_yields(yield_text) | |
def prep_time(self): | |
prep_text = self._extract_timing_elements("prep") | |
return get_minutes(prep_text) | |
def cook_time(self): | |
cook_text = self._extract_timing_elements("cook") | |
if cook_text and '5o' in cook_text: | |
cook_text = cook_text.replace('5o', '50') | |
return get_minutes(cook_text) |
What do you think about refactoring this to use a shared helper and using the starting text as the matching parameter instead of position?
Also I implemented two existing utils get_minutes
and get_yields
to normalize the fields outputs. this will require some changes to the test JSONs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in cook_time
I added coverage for an error on the recipe page where 50
is displayed as 5o
No description provided.