---
layout: default
title:  'Gymrek Lab - Research'
---

<h2>Research challenges</h2>

Our overall goal is to understand complex genetic variants that underlie human disease. We are particularly interested in repetitive DNA variants known as short tandem repeats (STRs) as a model for complex variation. We currently focus on the specific areas described below.

<table class="softwaretable">
<tbody>
<tr>
	<td width=30%>
		<img class="researchpic" src="images/research_figure1.jpeg" alt="research1">
	</td>
	<td width=70%>
		<h4>Analyzing and visualizing repetitive genetic variation</h4>
		Analyzing repeats from next-generation sequencing data is challenging due to the limitations of short reads and higher error rates at repeats. We previously published a tool, <a href="http://lobstr.teamerlich.org">lobSTR</a>, to overcome these challenges at short tandem repeats (STRs). We are leveraging new sequencing technologies and novel bioinformatic methods to access longer and more complex repetitive regions that are traditionally filtered from sequencing studies. See the <a href="http://gymreklab.com/software.html">resources page</a> for more info!
	</td>
</tr>
<tr>
	<td width=30%>
		<img class="researchpic" src="images/research_figure2.jpeg" alt="research2">
	</td>
	<td width=70%>
		<h4>Dissecting the contribution of repetitive regions to complex traits</h4>
		Although we are learning more and more about specific genetic variants involved in regulating gene expression or associated with certain diseases, repetitive variation such as STRs is often not well captured by such studies. We have shown that STRs play an important role in gene expression, and thus are likely to be important in complex traits. We are now building on this observation to develop statistical methods that incorporate analysis of repeats into genome-wide association studies, with a particular focus on psychiatric disease.
	</td>
</tr>
<tr>
	<td width=30%>
		<img class="researchpic" src="images/research_figure3.jpeg" alt="research3">
	</td>
	<td width=70%>
		<h4>Predicting the impact of non-coding variation</h4>
		The vast majority of genetic variants identified by genome-wide association studies lie in regions of the genome that do not code for proteins, and thus are difficult to interpret. We are combining machine learning techniques that predict the regulatory impact of non-coding variants with patterns of genetic variation in the population to assess the impact of individual mutations.
	</td>
</tr>
</tbody>
</table>