NLP pipeline validation with CI tests #22

fmigneault · 2023-12-06T05:32:28Z

changes

add JSON typings
align JSON formatting across files
fix nlp imports
add GitHub CI tests workflow


fmigneault · 2023-12-06T23:09:37Z

@TimeaBagosiCrim
Imports are now validated and working.
However, one test seems to fail:
tests/test_MetricsClasses.py::MetricsClassesTests::test_val - AssertionError: 1.0 != 0
https://github.com/crim-ca/pavics-jupyter-images/actions/runs/7121350381/job/19390424450?pr=22

Do you know what could be the cause?

fmigneault · 2023-12-06T23:35:29Z

@TimeaBagosiCrim

The error seems to be introduced by this step:

pavics-jupyter-images/nlp/notebooks/nl2q_eval/MetricsClasses.py

Lines 839 to 843 in 3301b62

    
    for value_type in VALUE_TYPES:
        value_measures.get_value_metrics(value_type).perfect_value_match = \
            value_measures.get_value_metrics(value_type).perfect_value_match \
            / value_measures.get_value_metrics(value_type).total_matching_attributes \
            if value_measures.get_value_metrics(value_type).total_matching_attributes > 0 else 0

Before running this loop, value_measures.get_value_metrics("numeric").perfect_value_match returns 1.0 as expected for the first annotation with "value":

pavics-jupyter-images/nlp/tests/gold_queries.json

Lines 72 to 80 in 3301b62

    
    {
      "text": "cloud cover lower than 10%",
      "position": [57, 83],
      "type": "property",
      "name": "cloud cover",
      "value": 10,
      "value_type": "percentage",
      "operation": "lt"
    }
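As a quick check, decoding this annotation with Python's standard json module shows that "value" arrives as an int rather than a str, so any isinstance(..., str) test on it is False:

```python
import json

# Annotation copied from nlp/tests/gold_queries.json (lines 72 to 80 in 3301b62).
annotation_text = """
{
  "text": "cloud cover lower than 10%",
  "position": [57, 83],
  "type": "property",
  "name": "cloud cover",
  "value": 10,
  "value_type": "percentage",
  "operation": "lt"
}
"""

ann = json.loads(annotation_text)

# JSON numbers are decoded as int (or float), never as str.
print(type(ann["value"]).__name__)    # int
print(isinstance(ann["value"], str))  # False
```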

total_matching_attributes is always zero because the following check never succeeds (the annotation's "value" is an actual int, not a numeric str):

pavics-jupyter-images/nlp/notebooks/nl2q_eval/MetricsClasses.py

Lines 797 to 799 in 3301b62

    
    if 'value' in ann.keys() and isinstance(ann['value'], str):
        # print(ann['value'])
        if isnumeric(ann['value']):

This leads the loop to reset perfect_value_match to zero for "numeric" every time, because the conditional expression falls back to 0 when total_matching_attributes is 0.
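A minimal reproduction of that reset, using a hypothetical ValueMetricsSketch dataclass in place of the real metrics object (only the two fields used by the normalization step are modelled; the actual class in MetricsClasses.py may differ):

```python
from dataclasses import dataclass


# Hypothetical stand-in for the per-type metrics object; not the real class.
@dataclass
class ValueMetricsSketch:
    perfect_value_match: float = 0.0
    total_matching_attributes: int = 0


def normalize(metrics: ValueMetricsSketch) -> None:
    # Same conditional expression as the excerpt above: divide when there is
    # a non-zero denominator, otherwise fall back to 0.
    metrics.perfect_value_match = (
        metrics.perfect_value_match / metrics.total_matching_attributes
        if metrics.total_matching_attributes > 0
        else 0
    )


# A perfect match was recorded, but the int value was never counted,
# so the denominator is still 0 and the match is wiped out.
numeric = ValueMetricsSketch(perfect_value_match=1.0, total_matching_attributes=0)
normalize(numeric)
print(numeric.perfect_value_match)  # 0

# With the attribute counted, the same step keeps the expected ratio.
counted = ValueMetricsSketch(perfect_value_match=1.0, total_matching_attributes=1)
normalize(counted)
print(counted.perfect_value_match)  # 1.0
```

The first case matches the failing assertion (1.0 != 0): the expected 1.0 is lost only because the denominator was never incremented.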

Is it a problem to adjust the logic to also allow int/float values, or would that break other code elsewhere that expects a str only?
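One possible way to relax the check, sketched under the assumption that downstream code only needs to know whether a value is numeric (not that it is a str), is to accept real numbers directly and keep parsing strings. `is_numeric_value` is a hypothetical name; it is not the `isnumeric` helper from MetricsClasses.py, nor necessarily what the follow-up commit does.

```python
from numbers import Real
from typing import Any


def is_numeric_value(value: Any) -> bool:
    """Hypothetical helper: True for int/float values or numeric strings."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    # int and float (as produced by json.loads) are accepted directly.
    if isinstance(value, Real):
        return True
    # Strings are accepted only if they parse as a float, as before.
    if isinstance(value, str):
        try:
            float(value)
        except ValueError:
            return False
        return True
    return False


print(is_numeric_value(10))     # True  (int decoded from JSON)
print(is_numeric_value(0.5))    # True
print(is_numeric_value("10"))   # True  (numeric str still accepted)
print(is_numeric_value("ten"))  # False
print(is_numeric_value(True))   # False
```

With such a helper, the outer `isinstance(ann['value'], str)` condition would no longer filter out int/float values before the numeric branch. Any later code that calls str methods on `ann['value']` would still need to handle non-str values, which is the compatibility risk raised in the question.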

fmigneault added 10 commits December 5, 2023 21:30

typings + formatting + import fixes

4706a47

add GitHub CI tests workflow

63641a0

fix ci cache

c907360

fix typo in nlp conda env file

760a6e9

loosen nlp packages deps to support various python versions

31c3ea7

fix ci logic

202fd80

help EO resolve sat-search version

01d0ff6

fix conda envs to allow over-limiting python-dateutils<2.8 in older sat-stac

67f8997

second pass resolve sat-stac/pandas conflict over python-dateutil

e36d4dd

attempt 3 to resolve deps

e9fe1cb

fmigneault requested a review from TimeaBagosiCrim December 6, 2023 05:32

fmigneault self-assigned this Dec 6, 2023

fmigneault added 5 commits December 6, 2023 11:36

more formatting

3ca7aa2

remove sat-search/sat-stac deprecated (https://github.com/sat-utils/sat-stac/pull/69#issuecomment-1158937770) in favor of pystac-client

7cab8ff

update intake-stac to avoid older/deprecated sat-stac dependency

a834672

fix module resolution for pytest

4134a5e

remove used typing

3301b62

fmigneault marked this pull request as ready for review December 6, 2023 23:08

fix eval metric to allow numeric value to consider float/int directly

704df69

fmigneault merged commit b190ca2 into DAC-524-baseline-V2 Dec 7, 2023
9 checks passed

fmigneault deleted the baseline-v2-tests branch December 7, 2023 17:48
