Difference in number of removed annotations #26

Open
Cogitarian opened this issue Apr 6, 2020 · 1 comment
Comments

@Cogitarian

I just wanted to remove the empty annotations in one EAF file. Surprisingly, the method remove_annotation removes more annotations than ELAN does with TIER > REMOVE ANNOTATIONS > EMPTY ANNOTATIONS. I also tried removing the rows in a data frame built from the ELAN annotations, and that worked exactly as in ELAN. But I'm still not sure how the method remove_annotation works.

Code to reproduce:

#R
library(reticulate)
library(magrittr, lib.loc = "/Library/Frameworks/R.framework/Versions/3.6/Resources/library")
conda_list()[[1]][1] %>% 
  use_condaenv(required = TRUE)
#### PYTHON ####
# coding: utf-8
# -*- coding: utf-8 -*-
import codecs
import pympi    # Import pympi to work with elan files
import os, fnmatch
import glob
import json
import csv
import sys
import re
import numpy as np
import pandas as pd
# setwd() is an R function; in the Python part the working directory is changed with os.chdir()
os.chdir("/Volumes/MAXI RUGGED/Google Drive/2020UAM/INFORMATYKA/scRiPting/Py/!PYMPI!/TRANS2020")

eaffile02235 = "000-22-35-S1.mp3.audioenhance.eaf"
eaffile12235 = "001-22-35-S1.mp3.audioenhance.eaf"
eaf_file = pympi.Eaf(eaffile02235) 
eaf_tiers = eaf_file.get_tier_names()
eaf_tiers

t = 'COACH'
anotacje_COACH = eaf_file.get_annotation_data_for_tier(t)
len(anotacje_COACH)
eaf_file.to_file(eaffile02235)

# Remove every empty annotation on the tier, passing a time point 1 ms inside it
for a in range(0,len(anotacje_COACH)):
  if len(anotacje_COACH[a][2])==0:
    eaf_file.remove_annotation(t,anotacje_COACH[a][0]+1,anotacje_COACH[a][1]-1)
    
anotacje_COACH = eaf_file.get_annotation_data_for_tier('COACH')
len(anotacje_COACH) #64

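# Alternative: filter out the empty annotations in a data frame instead.
# Note: anotacje_uczestnik is not defined in this snippet; it presumably holds
# annotation data read from a tier before any removal.
aupd = pd.DataFrame(anotacje_uczestnik)
aupu = aupd[aupd[2].map(len) > 0]   # keep only rows with a non-empty value
aupu = aupu.to_records(index=False)
aupu = list(aupu)
len(aupu) # 181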

eaf_file.remove_tier(t)
eaf_file.add_tier(t)

for a in range(0,len(aupu)):
  eaf_file.add_annotation(t,aupu[a][0],aupu[a][1], value= aupu[a][2])
eaf_file.to_file(eaffile02235)

000-22-35-S1.mp3.audioenhance.eaf.zip
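
The data-frame step above can also be written without pandas. This is only a minimal sketch: `drop_empty` and `rebuild_tier` are hypothetical helper names, `eaf` is assumed to be a `pympi.Eaf` object, and the only methods called on it are the ones already used in the snippet (`get_annotation_data_for_tier`, `remove_tier`, `add_tier`, `add_annotation`).

```python
def drop_empty(annotations):
    """Return the annotations whose value (third field) is not the empty string."""
    return [a for a in annotations if a[2] != ""]


def rebuild_tier(eaf, tier):
    """Rewrite `tier` so that it only holds its non-empty annotations.

    `eaf` is assumed to be a pympi.Eaf object (hypothetical helper, not part
    of pympi); only methods that appear in the snippet above are called.
    """
    kept = drop_empty(eaf.get_annotation_data_for_tier(tier))
    eaf.remove_tier(tier)
    eaf.add_tier(tier)
    for a in kept:
        eaf.add_annotation(tier, a[0], a[1], value=a[2])
    return len(kept)


# Plain tuples in the (start, end, value) layout used in the snippet above
sample = [(0, 1000, "hello"), (1000, 2000, ""), (2000, 3000, "bye")]
print(drop_empty(sample))
```

Because this never asks pympi to look annotations up by time, the result does not depend on how overlaps are tested. Recreating the tier this way may not preserve its original attributes, so it is worth checking the written file in ELAN.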

@dopefishh
Owner

pympi removes all annotations that overlap the given time. The overlap test is inclusive (<= and >= are used). Maybe ELAN uses an exclusive test (< and >)?
If I have time I'll check it soon, but feel free to put it to the test.
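
To make the inclusive/exclusive distinction concrete, here is a small stand-alone sketch. It does not call pympi; `overlaps` and `matches` are hypothetical helpers that only mimic the comparison described above, and the annotations are made-up (start, end, value) tuples.

```python
def overlaps(start, end, time, inclusive=True):
    """Check whether `time` falls within [start, end] (inclusive) or (start, end) (exclusive)."""
    if inclusive:
        return start <= time <= end
    return start < time < end


def matches(annotations, time, inclusive=True):
    """Return the annotations that a removal at `time` would hit."""
    return [a for a in annotations if overlaps(a[0], a[1], time, inclusive)]


# Three adjacent annotations that share their boundary time points
annotations = [(0, 1000, "hello"), (1000, 2000, ""), (2000, 3000, "bye")]

print(len(matches(annotations, 1000, inclusive=True)))   # 2: both neighbours match
print(len(matches(annotations, 1000, inclusive=False)))  # 0: a boundary point matches nothing
print(len(matches(annotations, 1500)))                   # 1: an interior point matches one
```

With a time point strictly inside an annotation, both comparisons agree, so the loop in the report (which passes start + 1) would only remove extra annotations if some annotations on the tier actually overlap in time. Also, if I read the pympi source correctly, remove_annotation takes (id_tier, time, clean=True), in which case the third positional argument in that loop is taken as the clean flag rather than as an end time; treat that as an assumption to verify.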
