The current implementation provides two measures:
- Demographic statistics: counts how many times each demographic group appears in a given text (e.g., `{'female': 7, 'male': 6}`).
- Co-occurrence matrix: counts how many times a term from a demographic group (e.g., `"she"`) co-occurs with a target term (e.g., `"caring"`) in the input text (e.g., `{('female', 'caring'): 2, ('male', 'caring'): 0, ...}`).
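The sketch below shows one way the two measures could be computed. It is not the code in `bias_measure.py`. It assumes that co-occurrence is counted per sentence (one hit per sentence that contains both a group term and the target term), and it uses a naive regex tokenizer in place of nltk. The term lists are abbreviated and partly hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Abbreviated, partly hypothetical term lists in the style of bias_terms.py.
gender_dictionary = {
    "female": ["she", "her", "wife"],
    "male": ["he", "him", "husband"],
}
target_terms = ["caring", "reactive"]


def tokenize_sentences(text):
    """Split text into sentences, each a list of lowercase word tokens.

    Assumption: a naive regex splitter stands in for nltk tokenization.
    """
    sentences = re.split(r"[.!?]+", text)
    return [re.findall(r"[a-z']+", s.lower()) for s in sentences if s.strip()]


def demographic_statistics(sentences, groups):
    """Count how many tokens in the text belong to each demographic group."""
    counts = Counter({group: 0 for group in groups})
    for tokens in sentences:
        for group, terms in groups.items():
            counts[group] += sum(1 for token in tokens if token in terms)
    return dict(counts)


def co_occurrence(sentences, groups, targets):
    """Count sentences in which a group term and a target term both appear.

    Assumption: the co-occurrence window is one sentence, and each sentence
    contributes at most one count per (group, target) pair.
    """
    matrix = {pair: 0 for pair in product(groups, targets)}
    for tokens in sentences:
        token_set = set(tokens)
        for group, target in product(groups, targets):
            if target in token_set and token_set & set(groups[group]):
                matrix[(group, target)] += 1
    return matrix
```

For `sents = tokenize_sentences("She is caring. He is reactive. Her wife is caring and sweet.")`, `demographic_statistics(sents, gender_dictionary)` returns `{'female': 3, 'male': 1}`, and `co_occurrence(sents, gender_dictionary, target_terms)` maps `('female', 'caring')` to 2 and `('male', 'caring')` to 0.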
python3 bias_measure.py \
--text example_text.txt \
--target_group_name professions
- Values for `--target_group_name` can currently be chosen from `professions` or `adjectives`.
- `--text` should be set to the path of the .txt file containing the text to be evaluated (e.g., `example_text.txt`).
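The command-line options described here could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction, not the tool's actual parser. The name `build_parser` is an assumption, and so is treating `--text` and `--target_group_name` as required.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options described in this README."""
    parser = argparse.ArgumentParser(description="Measure demographic bias in a text file.")
    # Assumption: both of these options are mandatory.
    parser.add_argument("--text", required=True,
                        help="Path to the .txt file containing the text to evaluate.")
    parser.add_argument("--target_group_name", required=True,
                        choices=["professions", "adjectives"],
                        help="Name of the target-term list to use.")
    # Optional: when omitted, this sketch leaves the value as None.
    parser.add_argument("--folder_to_save_results", default=None,
                        help="Folder in which to save the results as a .csv file.")
    return parser
```

Calling `build_parser().parse_args(["--text", "example_text.txt", "--target_group_name", "professions"])` yields a namespace with `folder_to_save_results=None`. Passing a value outside `choices` makes `argparse` print an error and exit.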
python3 bias_measure.py \
--text example_text.txt \
--target_group_name professions \
--folder_to_save_results "."
`--folder_to_save_results` is an optional argument that sets the path to the folder where the results are saved as a .csv file (e.g., `./src`).
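Here is a minimal sketch of writing co-occurrence counts to the folder given by `--folder_to_save_results`. The actual CSV layout and file name are not documented here. This sketch assumes a group-by-target table and uses a hypothetical default file name. It also assumes that a missing folder should be created.

```python
import os

import pandas as pd


def save_co_occurrence(matrix, folder, filename="co_occurrence.csv"):
    """Save a {(group, target): count} dict as a group-by-target CSV table.

    `filename` is a hypothetical default; the real tool's output name may differ.
    """
    groups = sorted({group for group, _ in matrix})
    targets = sorted({target for _, target in matrix})
    # Start from an all-zero table so that missing pairs read as 0.
    table = pd.DataFrame(0, index=groups, columns=targets)
    for (group, target), count in matrix.items():
        table.loc[group, target] = count
    table.index.name = "group"
    # Assumption: create the folder if it does not exist yet.
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    table.to_csv(path)
    return path
```

With this layout, each row is a demographic group and each column is a target term, which matches the "matrix" framing of the measure.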
To add a new set of target terms:
- Update `bias_terms.py` with a list containing the target terms (e.g., `adjectives = ["reactive", "sweet", ...]`).
- Update `target_dictionary` with the new list (e.g., `target_dictionary = {"adjectives": adjectives, ...}`).
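The two steps for target terms might look like this in `bias_terms.py`. The `emotions` category, the `professions` entries, and the `get_target_terms` helper are hypothetical illustrations and are not part of the project.

```python
# bias_terms.py (sketch): target-term lists, abbreviated.
adjectives = ["reactive", "sweet", "caring"]
professions = ["nurse", "engineer", "teacher"]  # hypothetical entries

# Step 1: a hypothetical new list of target terms.
emotions = ["angry", "calm", "anxious"]

# Step 2: register each list under the name used for --target_group_name.
target_dictionary = {
    "adjectives": adjectives,
    "professions": professions,
    "emotions": emotions,
}


def get_target_terms(name):
    """Return the target-term list registered under `name` (hypothetical helper)."""
    try:
        return target_dictionary[name]
    except KeyError:
        known = ", ".join(sorted(target_dictionary))
        raise ValueError(f"unknown target group {name!r}; expected one of: {known}") from None
```

This README lists only `professions` and `adjectives` as valid values for `--target_group_name`. Before relying on a new key, check whether `bias_measure.py` restricts that option.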
To add a new set of demographic terms:
- Update `bias_terms.py` with a list containing the demographic terms (e.g., `female_terms = ["her", "she", "wife", ...]`).
- Update (or create) `gender_dictionary` with the new list (e.g., `gender_dictionary = {"female": female_terms, "male": male_terms}`).
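When you edit the demographic lists, it can help to check that no term is assigned to two groups, since such a term would be counted for both. The `term_to_group` helper below is a hypothetical sanity check and not part of the project. The `male_terms` entries are illustrative.

```python
female_terms = ["her", "she", "wife"]
male_terms = ["him", "he", "husband"]  # illustrative entries
gender_dictionary = {"female": female_terms, "male": male_terms}


def term_to_group(dictionary):
    """Build a term -> group lookup, rejecting terms listed under two groups.

    Hypothetical helper: terms are compared case-insensitively.
    """
    lookup = {}
    for group, terms in dictionary.items():
        for term in terms:
            key = term.lower()
            if key in lookup and lookup[key] != group:
                raise ValueError(
                    f"{term!r} is listed under both {lookup[key]!r} and {group!r}"
                )
            lookup[key] = group
    return lookup
```

`term_to_group(gender_dictionary)["wife"]` returns `'female'`. If the same term is placed in both lists, the check raises `ValueError`.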
Dependencies:
- itertools (Python standard library)
- pandas
- collections (Python standard library)
- nltk