Add KoLMafia ASH language #5401

Open: wants to merge 7 commits into main

midgleyc

Adds the new language "KoLMafia ASH".

Closes #5022.

Description

ASH is a language, currently unrecognised by Linguist, used for scripting KoLMafia, a program for playing the online game "Kingdom of Loathing". Documentation is available on the KoLMafia wiki.

All ASH scripts use the ".ash" extension, which is currently assigned to AGS Script.

As this is a conflicting extension, I should run the "Bayesian classifier" and add a heuristic if it doesn't classify the files well enough. How do I do this? Building the Docker image and running it against the files in the samples directory classified them as "AGS Script", which makes me think that's not the right approach.
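For illustration only, a content heuristic of the kind mentioned above could look like the Python sketch below. Linguist's real heuristics are regular-expression rules declared in its heuristics.yml file, not Python code; the marker patterns here (ASH typed constants such as $item[...] and calls to get_property) and the function name guess_ash_or_ags are assumptions, not rules taken from Linguist or from this PR.

```python
import re

# Hypothetical ASH markers (assumptions, not Linguist rules):
# typed constants such as $item[...] and calls to get_property(...).
ASH_PATTERNS = [
    re.compile(r"\$(?:item|skill|effect|familiar|location|monster)\["),
    re.compile(r"\bget_property\s*\("),
]


def guess_ash_or_ags(source: str) -> str:
    """Return 'KoLMafia ASH' if any ASH marker matches, else fall back to 'AGS Script'."""
    if any(pattern.search(source) for pattern in ASH_PATTERNS):
        return "KoLMafia ASH"
    return "AGS Script"
```

A real rule would need patterns checked against a broad sample of both languages, since a file without any of these markers is simply assumed to be AGS Script here.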

Checklist:

midgleyc requested a review from a team as a code owner on May 31, 2021 10:18
lildude (Member) commented Jul 12, 2021

> As this is a conflicting extension, I should run the "Bayesian classifier" and add a heuristic if it doesn't classify well enough. How do I do this?

You can do this in a couple of ways:

  1. Check out this branch locally and run bundle exec bin/github-linguist --breakdown against this repo.
  2. Check out this branch locally and run the classifier test directly: bundle exec script/cross-validation --test. This will produce a lot of output, so you'll need to look through it closely to confirm it correctly classifies your files.

That said, if get_property is expected to appear in every file as per your search query, then the current usage is a long way from the 200 unique :user/:repo requirement we have:

Total files found: 604
Unique public user/repos: 49
Unique owners: 33
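The figures above come down to counting distinct repositories and owners among the matching files. A minimal sketch, assuming each search hit has already been reduced to an "owner/repo" string (for example, from the repository.full_name field of items returned by GitHub's code search API); the function name popularity_summary and the sample names are hypothetical, and this is not how Harvester or lildude's script is implemented.

```python
from typing import Iterable

# Popularity threshold quoted in the comment above: 200 unique :user/:repo.
MIN_UNIQUE_REPOS = 200


def popularity_summary(full_names: Iterable[str]) -> dict:
    """Summarise code-search hits, one 'owner/repo' string per matching file."""
    # GitHub owner and repository names are case-insensitive, so normalise them.
    names = [name.lower() for name in full_names]
    repos = set(names)
    owners = {name.split("/", 1)[0] for name in repos}
    return {
        "total_files": len(names),
        "unique_repos": len(repos),
        "unique_owners": len(owners),
        "meets_requirement": len(repos) >= MIN_UNIQUE_REPOS,
    }


if __name__ == "__main__":
    # Hypothetical hits: two files from one repo, one differently-cased owner.
    hits = ["alice/ash-scripts", "alice/ash-scripts", "Alice/relay", "bob/kol"]
    print(popularity_summary(hits))
```

With the 49 unique repos reported above, meets_requirement would be False.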

As an aside, two of the samples in this PR were waaaay too big for inclusion. relay_Guide.ash and synthesis.ash will need to be replaced with smaller samples before we merge this PR, once the popularity requirement has been met.

midgleyc (Author) commented

> That said, if get_property is expected to appear in every file as per your search query

It's not, but I was hoping it would be enough on its own. How do I check the unique public user / repos given a search term?

This one is problematic because the extension is shared with "AGS Script" (and some other, less-used languages such as Acorn script). get_property was chosen because it is almost certain not to appear in another language's files. If I search for something generic like "string":
https://github.com/search?p=1&q=extension%3A.ash+string&type=Code

then of the results on the first two pages, 16 are KoLMafia ASH and 4 are AGS Script.

For something more specific like "item":
https://github.com/search?p=1&q=extension%3A.ash+item&type=Code

I have to page further before reaching an AGS Script file, but some AGS Script matches appear by the final page.

> As an aside, two of the samples in this PR were waaaay too big for inclusion

Sorry, I had assumed that larger was better. What size should I aim for?

lildude (Member) commented Jul 12, 2021

> How do I check the unique public user / repos given a search term?

You can use Harvester or the script I've been testing with for a while at https://github.com/lildude/linguist/blob/lildude/download-corpus/script/github-ext-pop - warning: this last one will poke GitHub's API abuse limits.

Regarding the search: you can add multiple qualifiers and keywords, which might help get a better result. I don't know if void is valid in "AGS Script", but from what I can see it appears in all of this language's files. A quick search for this returns only 1092 files, which I suspect is still too few for inclusion.
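Combining qualifiers and keywords amounts to joining them into the q parameter of the search URL. The sketch below uses a hypothetical helper, code_search_url, that rebuilds the search links used in this thread: code_search_url("string") reproduces the first URL quoted earlier, and code_search_url("void", "main", "item") gives the "void main item" query. It only builds web URLs; it does not fetch results or count repositories.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://github.com/search"


def code_search_url(*terms: str, extension: str = ".ash", page: int = 1) -> str:
    """Build a GitHub code-search URL: an extension qualifier plus keywords."""
    query = " ".join([f"extension:{extension}", *terms])
    # urlencode uses quote_plus by default: spaces become '+', ':' becomes '%3A'.
    params = urlencode({"p": page, "q": query, "type": "Code"})
    return f"{SEARCH_URL}?{params}"
```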

> Sorry, I had assumed that larger was better. What size should I aim for?

We don't have a specific size limit (though 2.48MB is massive!!); rather, we aim for an overall general representation of the language as it is used. Repeated syntax only bloats the file and doesn't add much value to training the classifier.
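Since there is no fixed limit, a quick signal for oversized or repetitive samples can still help when trimming files like relay_Guide.ash and synthesis.ash. A minimal sketch, assuming an arbitrary 100 kB threshold and treating repeated non-blank lines as a rough proxy for "repeated syntax"; both the threshold and the helper name sample_report are hypothetical, not Linguist policy.

```python
from collections import Counter

# Hypothetical review threshold; Linguist has no fixed sample size limit.
MAX_SAMPLE_BYTES = 100_000


def sample_report(text: str, max_bytes: int = MAX_SAMPLE_BYTES) -> dict:
    """Report a sample's UTF-8 size and the share of non-blank lines that are repeats."""
    size = len(text.encode("utf-8"))
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    counts = Counter(lines)
    # Every occurrence of a line beyond its first counts as a repeat.
    repeated = sum(n - 1 for n in counts.values())
    return {
        "bytes": size,
        "too_big": size > max_bytes,
        "repeated_line_ratio": repeated / len(lines) if lines else 0.0,
    }
```

For example, sample_report(open("synthesis.ash", encoding="utf-8").read()) would flag the file as too big. Short lines such as a lone closing brace inflate the ratio, so it is only a rough guide.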

midgleyc (Author) commented Jul 12, 2021

void is valid in AGS Script. All runnable ASH files should have a void main function (except those that just run everything at the top level), but that's also valid in AGS.

There are non-runnable ASH files (libraries), but I'd expect every repo to have at least one runnable file.
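The "at least one runnable file per repo" expectation can be written as a simple check. A rough sketch, assuming a runnable script is one with a line beginning with void main(; scripts that run everything at top level would be missed, and because void main is also valid AGS Script, this check cannot tell the two languages apart. The names has_main and repo_has_runnable are hypothetical.

```python
import re
from typing import Iterable

# Assumption: a runnable ASH script declares "void main(" at the start of a line.
# Scripts that run their code at top level without main() are not detected.
MAIN_RE = re.compile(r"^\s*void\s+main\s*\(", re.MULTILINE)


def has_main(source: str) -> bool:
    """True if the source appears to declare a void main function."""
    return MAIN_RE.search(source) is not None


def repo_has_runnable(sources: Iterable[str]) -> bool:
    """True if at least one file in the repository looks runnable."""
    return any(has_main(s) for s in sources)
```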

midgleyc (Author) commented

Yeah, "void main item" gives 64 unique repos (including some that only show up in smola's language-dataset), so nowhere near enough.

lildude changed the title from "Add KoLMafia ASH language (fixes #5022)" to "Add KoLMafia ASH language" on Jul 1, 2022
Successfully merging this pull request may close these issues: ASH scripts misidentified as AGS