Merge branch 'CW-3072_duplicate_reads' into 'dev'
CW-3072 Duplicate reads

Closes CW-3072

See merge request epi2melabs/workflows/wf-single-cell!142
nrhorner committed Feb 2, 2024
2 parents 6124c4b + acb1db7 commit af379d5
Showing 2 changed files with 10 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]
+### Fixed
+- More informative error message upon read duplicate detection.
 ### Updated
 - Remove duplicate fastcat call.

9 changes: 8 additions & 1 deletion bin/workflow_glue/cluster_umis.py
@@ -212,7 +212,14 @@ def process_records(df, args):

 def main(args):
     """Run entry point."""
-    df_tags = pd.read_csv(args.read_tags, sep='\t', index_col=0)
+    df_tags = pd.read_csv(args.read_tags, sep='\t', index_col='read_id')
+
+    dups = df_tags[df_tags.index.duplicated(keep='first')]
+    if not dups.empty:
+        raise ValueError(
+            f"One or more input reads are duplicated, please rectify.\n"
+            f"Duplicated reads: {list(set(dups.index))[:20]}")
+
     df_features = pd.read_csv(
         args.feature_assigns, sep='\t', index_col=0)
     # Merge genes and transcripts onto tags.
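
The duplicate check in this hunk can be tried in isolation. The following is a minimal sketch, not the workflow's actual code: the `load_tags` helper, the `CB`/`UB` column names and the sample rows are hypothetical, and only the `read_id` index column and the error wording come from the diff. It also swaps `list(set(...))` for `Index.unique()`, an assumption made here so that the reported IDs appear in a stable, first-seen order.

```python
import io

import pandas as pd

# Hypothetical tags table; only the `read_id` column name is taken from the diff.
TSV = (
    "read_id\tCB\tUB\n"
    "read_1\tAAAC\tGGTT\n"
    "read_2\tAAAG\tCCTT\n"
    "read_1\tAAAC\tGGTT\n"
    "read_3\tTTTG\tAACC\n"
    "read_2\tAAAG\tCCTT\n"
)


def load_tags(handle):
    """Load a tags TSV indexed by read_id, failing on duplicated read IDs."""
    df_tags = pd.read_csv(handle, sep="\t", index_col="read_id")
    # duplicated(keep='first') flags every occurrence after the first one.
    dup_mask = df_tags.index.duplicated(keep="first")
    if dup_mask.any():
        # unique() keeps first-seen order, so the message is reproducible.
        dup_ids = df_tags.index[dup_mask].unique().tolist()
        raise ValueError(
            "One or more input reads are duplicated, please rectify.\n"
            f"Duplicated reads: {dup_ids[:20]}")
    return df_tags


try:
    load_tags(io.StringIO(TSV))
except ValueError as err:
    message = str(err)
    print(message)
```

Naming the index column explicitly, as the commit does with `index_col='read_id'`, means the check does not depend on `read_id` being the first column of the file.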