Details of the AI3SD Proposal #2

Open
edwintse opened this issue Jul 22, 2019 · 1 comment

@edwintse
Collaborator

edwintse commented Jul 22, 2019

The text below was used in an application for funding submitted by @mattodd to the AI3SD network in Feb 2019. The title was "Predicting the Activity of Drug Candidates when there is No Target". The aim was to use an open approach to provide a real-world example of how new methods in AI/machine learning can actually impact drug discovery, and to do this by tackling a common and difficult problem: predicting actives in a phenotypic drug discovery project.

The Problem
We aim to use diverse AI approaches to develop new ways to solve one of the biggest challenges in drug discovery: the prediction of activity of drug candidates in the absence of a biological target. We aim to do this using a public competition and open data.

In modern drug discovery, drug candidates are frequently optimised in the absence of a known biological target – so-called phenotypic drug discovery. [Nat. Rev. Drug Disc. 2017, 16, 531] In many therapeutic areas such an approach is seen as superior since it focuses efforts on those compounds known to be effective vs. whole cells or organisms. A common situation is that we have structure-activity relationship data (i.e. a collection of molecules and their associated biological activities) but we do not have information about the binding interactions of those molecules with a biological target. Yet we must be able to predict which molecules to make next, in order to allocate resources wisely. To date, the vast majority of such prediction has been based on the intuition of the medicinal chemists involved. This highly valuable resource has limitations of bias, or of imagination, or in some cases of resources: many small-scale drug discovery projects (particularly in academia, or in start-up companies) may have few people examining the data, meaning good hypotheses may be missed, or key insights overlooked. Manual organic synthesis of individual compounds designed in response to hypotheses – during the so-called Lead Optimisation phase – is among the most expensive areas of drug discovery. It is not unusual for the synthesis of one molecule to require two weeks of a postdoctoral researcher’s time, equating to ca. £2K per compound. If we are to identify the medicines society most needs, we must become significantly more efficient at the prediction of phenotypic potency.

Drug discovery is a complicated, multi-faceted process involving a range of expertise that varies according to the stage of a project. The design of a compound intended to achieve a biological end involves disciplines across organic chemistry, medicinal chemistry, pharmacokinetics, computational chemistry and, usually, biology of the relevant organism (be that a pathogen, or human biology). Strategic planning through project management is also required, along with a delicate balance of resources vs. potential gain, i.e. when to “kill” a project based on perceived likely return on investment. All of these roles are required in the specific drug discovery project at the heart of this research proposal. Open Source Malaria (OSM) involves scientists at UCL and The University of Sydney, but also scientists from elsewhere in the world such as those from the 20 other institutions that contributed to OSM’s first research paper. The preliminary attempts at solving the present research problem (described below under preliminary data) have come from the US, UK and Australia, from both the public and private sectors and also from citizen scientists. This project involves people from the broadest range of professional backgrounds.

The project concerns predicting biological activity for Open Source Malaria Series 4, currently the most pressing research problem for OSM. Over 200 molecules are known in this series, with potencies against the malaria parasite ranging from inactive to sub-10 nM. Yet it is still the case that weeks of laboratory-based effort may be expended in making reasonable-looking molecules that are found to have zero potency. Series 4 is highly promising: several members have cured malaria in the mouse model of the disease. This is the closest an open source series has ever been to the clinical phase of investigation.

The biological target is thought to be the ion pump PfATP4, an essential part of the parasite’s machinery in maintaining ion balance when inside a red blood cell. Despite extensive effort, the structure of this large, complex membrane-bound protein remains unsolved. The target is implicated by genetic changes found in resistant mutants. Understanding PfATP4 is crucial because it is the supposed target of the newest antimalarial to reach Phase III clinical trials, KAE609 (Cipargamin), developed by Novartis. Mysteriously, PfATP4 is also inhibited by a bewildering array of unrelated chemotypes.[Int. J. Parasitol. 2015, 5, 149] It is unclear how this is possible.

OSM announced a competition in 2016, which has since been run and concluded. All available data were curated for the community and submissions of models, using any methodology, were encouraged from OSM’s contributor network. Six diverse, fully-fledged entries were submitted, each accompanied by full details. These models were evaluated against a test dataset (the MMV Pathogen Box) that had not been disclosed. Evaluation of the entries by a scientific advisory panel led to the award of the prize to two equally well-performing models. These models are not yet highly predictive, despite the quality of the input data and the relevant expertise of the entrants. This proposal now aims to build on this significant preliminary work. All original submitters are willing to improve the models and wish to publish the work, providing an excellent community starting point for this proposal.
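As an illustration of what evaluating a model against an undisclosed test set can look like, here is a minimal Python sketch. It is not the scoring scheme the advisory panel used, which is not described here: the choice of precision and recall, the function name and the compound IDs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float  # fraction of predicted actives that were truly active
    recall: float     # fraction of true actives that the model found


def score_active_predictions(predicted_active, truly_active):
    """Compare a model's predicted-active compound IDs with measured actives.

    Both arguments are iterables of compound IDs (hypothetical identifiers).
    """
    predicted = set(predicted_active)
    actual = set(truly_active)
    hits = predicted & actual
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return Scores(precision, recall)


# Hypothetical example: the model flags four compounds, three are truly active.
scores = score_active_predictions(["c1", "c2", "c3", "c4"], ["c2", "c3", "c5"])
print(scores)
```

Precision matters most when each synthesis is expensive, since every false positive is a compound made for nothing; recall indicates how many genuinely potent compounds a model would have missed.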

Our aim is to become more predictive of potency by building new models by whatever means possible. This research project will apply new AI methods, as part of an open competition, to generate a high quality predictive model for this important antimalarial series. There have been clear and exciting advances in recent years in applying computational approaches (e.g. matched pairs analysis) to the prediction of biological activity. However, the time is right for a broader exploration of approaches to compound prediction, using newer methods of machine learning that are being trialled by some of the leading companies in the field of AI who have joined this application. It is time that clear examples of the potential impact of machine learning methods in real drug discovery projects were available to the scientific community, so that we might more clearly understand the impact of such new technologies and clearly distinguish the current state of the art from hype. To achieve this, and to discover the best ideas from any quarter, we propose to mix the application of new methods from the private sector with a public competition to which anyone may submit solutions. What has been missing to date from the OSM team, and what is missing in the vast majority of drug discovery projects around the world, is AI. Essentially no drug discovery project other than those taking place in the largest pharma companies, or those taking place at new AI-centric companies, involves any significant element of AI at all. This is an astonishing situation, one that we hope to reverse through the outcomes of this public-facing project.


Project Aims
To develop a general AI-enabled approach to solving the prediction of biological activity in phenotypic drug discovery, through the use of a public competition. Objectives are as follows:

  • Data curation and sharing of everything that is known of the activity of molecules in OSM Series 4.
  • Invitation to our co-applicant partners to submit models, as well as an invitation for the public to contribute.
  • Evaluation of the predictive models, including via synthesis of several of the predicted compounds. Winners are determined and modest prizes awarded.
  • Post-mortem collaborative discussion of the outcomes at a physical workshop.
  • Publication of a summary article, including a description of how the project can be continued.

Project Method

  • Data on many OSM Series 4 compounds, and other compounds also targeting PfATP4, are already available. The dataset will be updated with new compounds from OSM, the general literature and the MMV Pathogen Box. A small subset will be held from public view. A GitHub post will be written to announce the project, along with the rules for participation. We will disseminate widely, including through the AI3SD network.
  • Details of working and methods can be uploaded to lab notebooks such as LabArchives or Zenodo. Commercial partners may need to keep back proprietary details (what they term “magic sauce”) but they have committed to keep this to a minimum.
  • An expert scientific advisory group will assess the models both on their intrinsic performance and against the small unpublished dataset.
  • Results are announced, and predictions of new compounds are solicited from the best-performing models.
  • Synthesis is carried out of five new molecules predicted to be potent. This will use existing in-kind resources available in Todd’s lab (postdoctoral researcher as part of OSM 2019–20, plus chemicals and consumables). The molecules will be assessed for potency at the University of Dundee, and tested in an ion regulation model of PfATP4 at the Australian National University, Canberra (Prof. Kiaran Kirk). All activities funded in-kind.
  • A workshop is held in London to which competition participants will be invited (depending on the number of entries, costs will be covered), and the wider AI3SD community will be invited. The workshop will carry out a post-mortem of the competition. Two small prizes are awarded, one for a private sector entry and one for a public sector entry. Interested parties will collaborate on ways forward and funding.
  • A full write-up of this project will be submitted to the Beilstein Journal of Organic Chemistry, for the thematic issue on Medicinal Chemistry, for which Prof. Todd is the invited Editor (to be completed 2019). The paper will be a new style of “Challenge Paper” the journal seeks to pioneer, in which the paper is framed as a “living project” to which others are invited to contribute the next stages.
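
The hold-out and assessment steps in the method above could be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the 10% hold-out fraction, the use of mean absolute error on pIC50 values as the ranking score, and all function, model and compound names are hypothetical and are not taken from the proposal.

```python
import math
import random


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nM to pIC50, i.e. -log10 of the IC50 in mol/L."""
    return 9.0 - math.log10(ic50_nm)


def split_holdout(compound_ids, holdout_fraction=0.1, seed=0):
    """Return (public_ids, hidden_ids) using a reproducible seeded shuffle."""
    ids = sorted(compound_ids)
    random.Random(seed).shuffle(ids)
    n_hidden = max(1, round(len(ids) * holdout_fraction))
    return ids[n_hidden:], ids[:n_hidden]


def mean_absolute_error(predicted, measured):
    """Average |prediction - measurement| over every held-out compound.

    Raises KeyError if an entry omits a held-out compound, so an entry
    cannot improve its score by skipping difficult molecules.
    """
    return sum(abs(predicted[c] - measured[c]) for c in measured) / len(measured)


def rank_entries(entries, measured):
    """Order entry names from lowest (best) to highest error."""
    return sorted(entries, key=lambda name: mean_absolute_error(entries[name], measured))


# Hypothetical data: 20 compound IDs and two competition entries.
public_ids, hidden_ids = split_holdout([f"cmpd-{i}" for i in range(20)], seed=1)
measured = {"a": pic50_from_nm(10.0), "b": pic50_from_nm(1000.0)}
entries = {
    "model_x": {"a": 7.5, "b": 6.5},
    "model_y": {"a": 5.0, "b": 6.0},
}
ranking = rank_entries(entries, measured)
```

Fixing the seed makes the hidden subset reproducible for the organisers while it stays undisclosed to entrants. In practice the advisory group may prefer a different metric, or a split that keeps close structural analogues on the same side.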
@mattodd
Member

mattodd commented Sep 12, 2019

Added link to this Issue to the wiki, so can be closed when needed, but still useful for participants at the moment.
