information bottleneck #50

Merged
merged 51 commits into from
Dec 23, 2024
Changes from 44 commits
4504143
change dist to tuple
Feb 6, 2024
bd5b281
mypy-induced cleaning
shanest Feb 23, 2024
d5baad5
rename language.to_dict(), remove other to_dict() methods
shanest Feb 23, 2024
4b10e2c
roll back some to_dict to avoid yaml reading errors
shanest Feb 23, 2024
db19983
Automated black formatting
github-actions[bot] Feb 26, 2024
5d2a5c3
Merge pull request #36 from CLMBRs/to_dict-pretty_print
shanest Feb 26, 2024
d07b166
Trying the sampling to generate hypothetical languages, removed 15 la…
mickeyshi Mar 14, 2024
b43021f
generated per-language color maps for debug
mickeyshi Mar 20, 2024
a9a0a58
Added IB curve back + more customization
mickeyshi Mar 20, 2024
0afabe4
Added color term expansion to artificial languages, some cleanup
mickeyshi Apr 8, 2024
5264598
Added centroid color calculation, caching IB bound, settings for arti…
mickeyshi Apr 9, 2024
212582c
Filter out color chips that don't meet the threshold
mickeyshi Apr 25, 2024
60f86d5
Split main script up by functionality
mickeyshi May 21, 2024
125633c
Merge branch 'main' into color-categories
mickeyshi May 30, 2024
bec0f54
Mostly code cleanup for merge
mickeyshi Jun 1, 2024
1399d7e
Automated black formatting
github-actions[bot] Jun 1, 2024
3830973
Merge branch 'main' into color-categories
shanest Sep 12, 2024
8e33f88
beginning color refactor
shanest Sep 12, 2024
ac73f71
reorder universe by chip number
shanest Sep 12, 2024
b68da30
scripts for color_universe, begin reading natural languages
shanest Sep 12, 2024
7dfacf5
natural languages generated
shanest Sep 12, 2024
a4b4393
nat langs -> yaml, begin encoder
shanest Sep 13, 2024
6b971eb
nat langs -> yaml, begin encoder
shanest Sep 13, 2024
d1bb4a1
single language for testing
shanest Sep 13, 2024
2c3f568
rm color universe pkl
shanest Sep 13, 2024
9ae7307
color universe pkl -> yaml
shanest Sep 13, 2024
608539d
rename cols in universe, compute meaning dists
shanest Sep 13, 2024
b0395b8
remove unneeded variable
shanest Sep 13, 2024
640c475
starting probability in ULTK
shanest Sep 13, 2024
3ad8c8c
compute complexity and accuracy of nat langs
shanest Sep 13, 2024
d3ac8ed
some shape fixing stuff for lang -> info plane measurement
shanest Sep 20, 2024
37e3f15
evaluation of bound takes 2 hours with just 30 points
Nov 15, 2024
8117dba
set semantics _dist to False and add zkrt prior
Nov 15, 2024
85ef05c
fix shape error
Nov 20, 2024
38289e4
ultk informativity metrics seem to lose important info in Speaker Lis…
Nov 20, 2024
3d98afe
three days of BA is too much for color example
Nov 23, 2024
a1145f9
interesting
Nov 23, 2024
79036bc
remove signaling
Nov 23, 2024
9bef938
add some notes
Nov 23, 2024
9e00f3f
update colors model binary
Nov 23, 2024
eef58f3
Automated black formatting
github-actions[bot] Nov 23, 2024
194dbba
delete rate_distortion and clean up docs
Nov 24, 2024
934bbf1
Merge branch 'ib-naming-model' of github.com:CLMBRS/altk into ib-nami…
Nov 25, 2024
e158cdd
Automated black formatting
github-actions[bot] Nov 25, 2024
cfa573b
implement the minor requested PR changes
Dec 15, 2024
0867618
Merge branch 'ib-naming-model' of github.com:CLMBRS/altk into ib-nami…
Dec 15, 2024
6876c7d
Merge branch 'main' into ib-naming-model
nathimel Dec 15, 2024
4988cd7
Automated black formatting
github-actions[bot] Dec 15, 2024
92e3da4
small updates after mtg
Dec 17, 2024
c2d5698
Merge branch 'ib-naming-model' of github.com:CLMBRS/altk into ib-nami…
Dec 17, 2024
1613b1a
Automated black formatting
github-actions[bot] Dec 17, 2024
2 changes: 1 addition & 1 deletion .gitignore
@@ -2,7 +2,7 @@
.vscode/
.DS_Store
src/altk.egg-info

**/*.pkl

# Distribution/build
dist/
3 changes: 1 addition & 2 deletions README.md
@@ -29,7 +29,7 @@ First, set up a virtual environment (e.g. via [miniconda](https://docs.conda.io/

## Getting started

-- Check out the [examples](https://github.com/CLMBRs/ultk/tree/main/src/examples), starting with a basic signaling game. The examples folder also contains a simiple efficient communication analysis of [indefinites](https://github.com/CLMBRs/ultk/tree/main/src/examples/indefinites).
+- Check out the [examples](https://github.com/CLMBRs/ultk/tree/main/src/examples), starting with a simiple efficient communication analysis of [indefinites](https://github.com/CLMBRs/ultk/tree/main/src/examples/indefinites) and a comparison of two approaches to efficient communication, with modals as a test case.
- To see more scaled up usage examples, visit the codebase for an efficient communication analysis of [modals](https://github.com/nathimel/modals-effcomm) or [sim-max games](https://github.com/nathimel/rdsg).
- For an introduction to efficient communication research, here is a [survey paper](https://www.annualreviews.org/doi/abs/10.1146/annurev-linguistics-011817-045406) of the field.
- For an introduction to the RSA framework, see [this online textbook](http://www.problang.org/).
@@ -64,7 +64,6 @@ Unit tests are written in [pytest](https://docs.pytest.org/en/7.3.x/) and execut
<details>
<summary>Links:</summary>

> Imel, N. (2023). The evolution of efficient compression in signaling games. PsyArXiv. https://doi.org/10.31234/osf.io/b62de

> Imel, N., & Steinert-Threlkeld, S. (2022). Modal semantic universals optimize the simplicity/informativeness trade-off. Semantics and Linguistic Theory, 1(0), Article 0. https://doi.org/10.3765/salt.v1i0.5346

2 changes: 1 addition & 1 deletion docs/search.js

Large diffs are not rendered by default.

17 changes: 7 additions & 10 deletions docs/ultk.html

Large diffs are not rendered by default.

44 changes: 20 additions & 24 deletions docs/ultk/effcomm.html

Large diffs are not rendered by default.

1,702 changes: 816 additions & 886 deletions docs/ultk/effcomm/agent.html

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions docs/ultk/effcomm/analysis.html

Large diffs are not rendered by default.

286 changes: 286 additions & 0 deletions docs/ultk/effcomm/information_bottleneck.html

Large diffs are not rendered by default.

1,305 changes: 1,305 additions & 0 deletions docs/ultk/effcomm/information_bottleneck/ba.html

Large diffs are not rendered by default.

1,174 changes: 1,174 additions & 0 deletions docs/ultk/effcomm/information_bottleneck/ib.html

Large diffs are not rendered by default.

1,928 changes: 1,928 additions & 0 deletions docs/ultk/effcomm/information_bottleneck/modeling.html

Large diffs are not rendered by default.

729 changes: 729 additions & 0 deletions docs/ultk/effcomm/information_bottleneck/tools.html

Large diffs are not rendered by default.

399 changes: 221 additions & 178 deletions docs/ultk/effcomm/informativity.html

Large diffs are not rendered by default.

102 changes: 54 additions & 48 deletions docs/ultk/effcomm/optimization.html

Large diffs are not rendered by default.

532 changes: 532 additions & 0 deletions docs/ultk/effcomm/probability.html

Large diffs are not rendered by default.

255 changes: 133 additions & 122 deletions docs/ultk/effcomm/sampling.html

Large diffs are not rendered by default.

36 changes: 18 additions & 18 deletions docs/ultk/effcomm/tradeoff.html

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions docs/ultk/language.html

Large diffs are not rendered by default.

2,970 changes: 1,677 additions & 1,293 deletions docs/ultk/language/grammar.html

Large diffs are not rendered by default.

735 changes: 340 additions & 395 deletions docs/ultk/language/language.html

Large diffs are not rendered by default.

1,498 changes: 753 additions & 745 deletions docs/ultk/language/sampling.html

Large diffs are not rendered by default.

920 changes: 431 additions & 489 deletions docs/ultk/language/semantics.html

Large diffs are not rendered by default.

280 changes: 280 additions & 0 deletions docs/ultk/util.html

Large diffs are not rendered by default.

500 changes: 500 additions & 0 deletions docs/ultk/util/frozendict.html

Large diffs are not rendered by default.

519 changes: 519 additions & 0 deletions docs/ultk/util/io.html

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion pyproject.toml
@@ -28,7 +28,6 @@ dependencies = [
"plotnine",
"pathos",
"pytest",
"rdot",
]

[project.urls]
1 change: 0 additions & 1 deletion src/examples/__init__.py
@@ -1,4 +1,3 @@
"""Minimal examples demonstrating how to use ULTK.

See `examples.signaling_game`.
"""
56 changes: 56 additions & 0 deletions src/examples/colors/README.md
@@ -0,0 +1,56 @@
# Analyzing the Relationship between Complexity and Informativity across the World's Languages

Based on [Zaslavsky, Kemp et al.'s paper on color complexity](https://www.pnas.org/doi/full/10.1073/pnas.1800521115) and [the corresponding original repo](https://github.com/nogazs/ib-color-naming).

This example creates a "conceptual" / miniature replication of the above paper using the tools provided by the ULTK library. Right now, the final analysis produces the following plot:
![a plot showing communicative cost and complexity of natural, explored, and dominant languages](https://github.com/CLMBRs/altk/blob/main/src/examples/colors/outputs/plot.png?raw=true)

This README first describes the contents of this example directory, focusing on what the user must provide for the color case study, and then walks through the concrete steps that produce the above plot. It closes with a discussion of what is still missing relative to the above paper and other next steps for this example.

## Contents
`data` consists of language and color data provided by the [World Color Survey](https://linguistics.berkeley.edu/wcs/data.html). Certain files have been lightly edited to simplify parsing, for example by adding a header row.

`outputs` contains outputs of various scripts, as outlined below.


`lang_colors` consists of per-language color distributions. Major color terms are graphed per language.

`analyze_data.py` contains functions for graphing the distribution of color terms across language expressions and languages themselves.

`color_grammar.py` contains class definitions for the ColorLanguage and other utility structures.

`generate_wcs_languages.py` contains the function for reading and converting the WCS data to ULTK language structures. It also generates

`complexity.py` calculates the complexity and informativity of the WCS color languages, passed in as a pandas DataFrame.

`graph_colors.py` contains functions for graphing the distribution of color terms across language expressions and languages themselves.

`util.py` contains utility functions, including the argument parser for running this tool from shell.

## Usage

From the `ultk/examples` base directory:
1. Run `python -m colors.scripts.read_color_universe`: this generates the color universe (the 330 Munsell chips) to be re-used throughout. It does very light processing of the WCS data to generate a CSV file that can be easily read by ULTK.
- Consumes: `data/cnum-vhcm-lab-new.txt`
- Produces: `outputs/color_universe.csv`
2. Run `python -m colors.scripts.read_natural_languages`: this reads the natural language WCS data and produces ULTK `Language` objects. (NOTE: still a work-in-progress)
- Consumes: `data/data/term.txt`, `outputs/color_universe.csv`
- Produces: `outputs/natural_languages.yaml`
3. Run `python -m colors.scripts.measure_natural_languages`: this reads the ULTK natural languages and calculates the complexity and informativity of each language.
- Consumes: `outputs/natural_languages.yaml`
- Produces: `outputs/natural_language_information_plane.csv`
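
The three steps above depend on each other's outputs, so they need to run in order. A minimal sketch of a driver that does this is shown below. The module names are copied from the steps above; the helper names `PIPELINE_MODULES`, `pipeline_commands`, and `run_pipeline` are hypothetical and are not part of ULTK.

```python
import subprocess
import sys

# Module names come from the usage steps; everything else here is illustrative.
PIPELINE_MODULES = [
    "colors.scripts.read_color_universe",
    "colors.scripts.read_natural_languages",
    "colors.scripts.measure_natural_languages",
]


def pipeline_commands(python=sys.executable):
    """Build one `python -m <module>` command per pipeline step, in order."""
    return [[python, "-m", module] for module in PIPELINE_MODULES]


def run_pipeline(cwd=".", runner=subprocess.run):
    """Run each step in order from `cwd`, stopping at the first failure.

    `check=True` makes `subprocess.run` raise CalledProcessError on a
    non-zero exit code, so a later step never runs on stale outputs.
    """
    for command in pipeline_commands():
        runner(command, cwd=cwd, check=True)
```

Calling `run_pipeline(cwd="path/to/ultk/examples")` would then execute all three steps; the path is a placeholder for wherever the examples directory lives locally.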


NOTE: below this is
Run `python analyze_data.py` from the `colors` folder. This calls `generate_wcs_languages` to generate the language data, then `complexity.py` to compute the complexity. Several options are available as command-line settings.


## Remaining Tasks

At the moment, the density of the probability function per major color term is not factored into the final graphs generated.

Additionally, when probabilities are taken into account by assigning them to the weight matrix, the computed mutual information comes out as a large negative value. This should be impossible: mutual information is never negative, and the prior here is entirely uniform.
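
One way to sanity-check the negative value described above is to recompute mutual information from explicitly normalized distributions. The sketch below is plain Python and does not use ULTK. The names `mutual_information`, `uniform`, `one_word`, and `two_words` are hypothetical, and the four-chip universe is a toy stand-in for the 330 chips. It assumes the quantity of interest is the mutual information between meanings and words under a prior p(m) and an encoder q(w|m).

```python
import math


def mutual_information(prior, encoder):
    """Return I(M;W) in bits for a prior p(m) and encoder rows q(w|m)."""
    if not math.isclose(sum(prior), 1.0):
        raise ValueError("prior must sum to 1")
    for row in encoder:
        if not math.isclose(sum(row), 1.0):
            raise ValueError("each encoder row must sum to 1")

    n_words = len(encoder[0])
    # Marginal over words: q(w) = sum_m p(m) q(w|m)
    q_w = [
        sum(p * row[w] for p, row in zip(prior, encoder))
        for w in range(n_words)
    ]

    total = 0.0
    for p, row in zip(prior, encoder):
        for w, q in enumerate(row):
            if p > 0 and q > 0:  # terms with zero probability contribute 0
                total += p * q * math.log2(q / q_w[w])
    return total


uniform = [0.25] * 4
one_word = [[1.0, 0.0]] * 4  # every chip gets the same term
two_words = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

print(mutual_information(uniform, one_word))
print(mutual_information(uniform, two_words))
```

With properly normalized inputs the result cannot be negative. If a check like this raises `ValueError` on the real weight matrix, unnormalized weights are one plausible source of the negative value; that is a guess, not a confirmed diagnosis.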

At the moment


File renamed without changes.
14 changes: 7 additions & 7 deletions src/examples/colors/data/lang.txt
@@ -19,15 +19,15 @@ LNUM LNAME LGEO LFW
18 Ucayali Campa Peru Allene Heitzman Jason D. Patent * Campa_DAT_new.txt new
19 Camsa * * * * Camsa_DAT_new.txt new
20 Candoshi * * * * Candoshi_DAT_new.txt new
-21 Cavine{\x96}a * * * * Cavinena_DAT_new.txt new
+21 Cavinena * * * * Cavinena_DAT_new.txt new
22 Cayapa Ecuador Neil Wiebe Scott Merrifield William R. Merrifield Cayapa_DAT_new.txt new
-23 Ch{\x87}cobo * * * * Chacobo_DAT_new.txt new
+23 Chacobo * * * * Chacobo_DAT_new.txt new
24 Chavacano * * * * Chavacano_DAT_new.txt new
25 Chayahuita * * * * Chayahuita_DAT_new.txt new
26 Chinanteco Mexico Al & Jeff Anderson Jason D. Patent * Chinantec_DAT_new.txt new
27 Chiquitano Bolivia M. Kr{\x9F}si, L. Rodriguez, E. Lyn (?) Jason Patent * Chiquitano_DAT_new.txt new
28 Chumburu * Hansford Scott Merrifield William R. Merrifield Chumburu_DAT_new.txt new
-29 Cof{\x87}n * * * * Cofan_DAT_new.txt new
+29 Cofan * * * * Cofan_DAT_new.txt new
30 Colorado * * * * Colorado_DAT_new.txt new
31 Eastern Cree Canada Lieselotte Bartlett Scott Merrifield William R. Merrifield Cree_DAT_new.txt new
32 Culina Peru, Brazil P. Adams and T. Fern{\x87}ndez Jason Patent * Culina_DAT_new.txt new
@@ -40,7 +40,7 @@ LNUM LNAME LGEO LFW
39 Guahibo Colombia Riena Kondo Kenneth J. Merrifield William R. Merrifield Guahibo_DAT_new.txt new
40 Guambiano * * * * Guambiano_DAT_new.txt new
41 Guarijio Mexico Ron and Sharon Stoltzfus Kenneth J. Merrifield William R. Merrifield Guarijio_DAT_new.txt new
-42 Ng{\x8A}bere Panama Arosemena Patent, Jason * Guaymi_DAT_new.txt new
+42 Ngbere Panama Arosemena Patent, Jason * Guaymi_DAT_new.txt new
43 Gunu Cameroon D. Heath Ken Merrifield Ken Merrifield Gunu_DAT_new.txt new
44 Halbi India F. Woods and P. Hopple Jason Patent * Halbi_DAT_new.txt new
45 Huasteco * * * * Huastec_DAT_new.txt new
@@ -72,11 +72,11 @@ LNUM LNAME LGEO LFW
71 Mikasuki U S A David West Scott Merrifield Scott Merrifield Mikasuki_DAT_new.txt new
72 Mixteco * * * * Mixtec_DAT_new.txt new
73 Mundu * * * * Mundu_DAT_new.txt new
-74 M{\x9C}ra Pirah{\x8B} * * * * Mura-Piraha_DAT_new.txt new
+74 Mura Piraha * * * * Mura-Piraha_DAT_new.txt new
75 Murle * * * * Murle_DAT_new.txt new
76 Murinbata * * * * Murrinh-Patha_DAT_new.txt new
77 Nafaanra * * * * Nafaanra_DAT_new.txt new
-78 N{\x87}huatl * * * * Nahuatl_DAT_new.txt new
+78 Nahuatl * * * * Nahuatl_DAT_new.txt new
79 Ocaina * * * * Ocaina_DAT_new.txt new
80 Papago * * * * Oodham_DAT_new.txt new
81 Patep * * * * Patep_DAT_new.txt new
@@ -85,7 +85,7 @@ LNUM LNAME LGEO LFW
84 Saramaccan * * * * Saramaccan_DAT_new.txt new
85 Seri * * * * Seri_DAT_new.txt new
86 Shipibo Peru Guillermo Ramirez Ken Merrifield Ken Merrifield Shipibo_DAT_new.txt new
-87 Sirion{\x97} * * * * Siriono_DAT_new.txt new
+87 Siriono * * * * Siriono_DAT_new.txt new
88 Slave Canada Monus Jason D. Patent * Slave_DAT_new.txt new
89 Sursurunga * * * * Sursurunga_DAT_new.txt new
90 Tabla * * * * Tabla_DAT_new.txt new
Binary file added src/examples/colors/data/zkrt18_prior.npy
Binary file not shown.
355 changes: 355 additions & 0 deletions src/examples/colors/demo.ipynb

Large diffs are not rendered by default.
