Add clinical trials search tool #777

Open
wants to merge 3 commits into base: main


mskarlin (Collaborator) commented Dec 23, 2024

This upstreams FutureHouse's clinical trials search tool to make it open source. The tool is not enabled by default; I'll follow up with more docs on how to use it.

I also found many typing regressions, so I've added the type annotations back in where necessary.

dosubot (bot) added the size:XXL (This PR changes 1000+ lines, ignoring generated files) and enhancement (New feature or request) labels on Dec 23, 2024
@@ -33,8 +39,20 @@

POPULATE_FROM_SETTINGS = None

DEFAULT_TOOL_NAMES: list[str] = [
Collaborator

I like this, nice.

What do you think of moving this to be co-located with AVAILABLE_TOOL_NAME_TO_CLASS in tools?
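
As a rough sketch of what that co-location could look like: the stand-in classes and the `paper_search` name below are hypothetical, not the project's actual definitions. Only `TOOL_FN_NAME`, `DEFAULT_TOOL_NAMES`, and `AVAILABLE_TOOL_NAME_TO_CLASS` are names taken from this PR.

```python
class NamedTool:
    """Hypothetical stand-in for the real base class."""

    TOOL_FN_NAME: str = "# unpopulated"


class PaperSearch(NamedTool):
    TOOL_FN_NAME = "paper_search"


class ClinicalTrialsSearch(NamedTool):
    TOOL_FN_NAME = "clinical_trials_search"


# Registry of every tool that can be enabled, keyed by function name.
AVAILABLE_TOOL_NAME_TO_CLASS: dict[str, type[NamedTool]] = {
    cls.TOOL_FN_NAME: cls for cls in (PaperSearch, ClinicalTrialsSearch)
}

# Defaults sit right next to the registry so they are easy to compare.
# The clinical trials tool is opt-in, so it is left out here.
DEFAULT_TOOL_NAMES: list[str] = [PaperSearch.TOOL_FN_NAME]

# Fail fast at import time if a default name is not a registered tool.
unknown = set(DEFAULT_TOOL_NAMES) - set(AVAILABLE_TOOL_NAME_TO_CLASS)
assert not unknown, f"Default tools missing from the registry: {unknown}"
```

Keeping both definitions in one module also makes a consistency check like the final assertion cheap to add.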

class ClinicalTrialsSearch(NamedTool):
    TOOL_FN_NAME = "clinical_trials_search"

    model_config = ConfigDict(extra="forbid", arbitrary_types_allowed=True)
Collaborator

Why arbitrary_types_allowed=True? I don't think we need it (I may be wrong here, though).

"""
)

async def clinical_trials_search(self, query: str, state: EnvironmentState):
Collaborator

Suggested change
-    async def clinical_trials_search(self, query: str, state: EnvironmentState):
+    async def clinical_trials_search(self, query: str, state: EnvironmentState) -> str:

Even though we don't use the return type now, it's still nice to have for IDE usage.
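
For illustration, a small sketch of what the annotation buys: anything that introspects the signature (IDEs, type checkers, `typing.get_type_hints`) can see the return type without awaiting the coroutine. `EnvironmentState` here is an empty hypothetical stand-in, and the function body is a placeholder rather than the real tool logic.

```python
import asyncio
import typing


class EnvironmentState:
    """Hypothetical stand-in for the real state object."""


async def clinical_trials_search(query: str, state: EnvironmentState) -> str:
    # Placeholder body; the real tool queries clinicaltrials.gov.
    return f"Searched clinical trials for {query!r}."


# The declared return type is available to tooling without running anything.
return_type = typing.get_type_hints(clinical_trials_search)["return"]

# Running the coroutine yields a value matching the annotation.
result = asyncio.run(clinical_trials_search("melanoma", EnvironmentState()))
```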

settings: Settings = Field(default_factory=Settings)

# Gather evidence tool must be modified to understand the new evidence
GATHER_EVIDENCE_TOOL_PROMPT_OVERRIDE: ClassVar[str] = (
Collaborator

Did you try printing this? I think it will have extra spaces unless you do """Gather evidence...
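
To make the concern concrete, here is a minimal sketch using a made-up prompt (not the actual override text). A triple-quoted string whose text starts on its own indented line keeps the leading newline and indentation, whereas starting the text right after the opening quotes, or using implicit concatenation, avoids it.

```python
import textwrap

# Opening quotes followed by a line break: the newline and the indentation
# of each line become part of the string.
PADDED = (
    """
    Gather evidence from papers and clinical trials.
    """
)

# Text starts right after the quotes, so nothing precedes it.
TIGHT = """Gather evidence from papers and clinical trials."""

# Adjacent string literals are concatenated at compile time.
CONCATENATED = (
    "Gather evidence from papers "
    "and clinical trials."
)

# textwrap.dedent plus strip is another way to clean up the padded form.
CLEANED = textwrap.dedent(PADDED).strip()

# repr makes the otherwise invisible whitespace easy to spot.
print(repr(PADDED))
```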

    },
) as response:
    if response.status == MALFORMATTED_QUERY_STATUS:
        # the 400s from clinicaltrials.gov are not JSON
Collaborator

Do you mind moving this comment to live with MALFORMATTED_QUERY_STATUS, or renaming MALFORMATTED_QUERY_STATUS to CLINICAL_TRIALS_DIDNT_GIVE_JSON = 400?
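
The first option might look roughly like the sketch below. The helper `describe_failure` and its messages are hypothetical, added only to show the constant in use; the 400 value and the note about non-JSON bodies come from the diff context above.

```python
from http import HTTPStatus
from typing import Optional

# clinicaltrials.gov answers malformed queries with a 400 whose body is not
# JSON, so callers must not try to JSON-decode the response for this status.
MALFORMATTED_QUERY_STATUS: int = HTTPStatus.BAD_REQUEST.value


def describe_failure(status: int, raw_body: str) -> Optional[str]:
    """Return an error message for a failed search, or None on success.

    Hypothetical helper, not part of the PR.
    """
    if status == MALFORMATTED_QUERY_STATUS:
        # Plain-text body: pass it through instead of parsing it.
        return f"Malformed query: {raw_body.strip()}"
    if status >= HTTPStatus.BAD_REQUEST:
        return f"Request failed with status {status}."
    return None
```

With the explanation attached to the constant, the call site needs no extra comment.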

Returns:
    tuple[int, int, str | None]:
        Total number of trials found, number of trials added, and error message if any.
Collaborator

Suggested change



# SEE: https://regex101.com/r/L0L5MH/1
CLINICAL_STATUS_SEARCH_REGEX_PATTERN: str = (
Collaborator

Can you co-locate this with the status generation code? Makes sense to have next to each other in the code
