Add KoLMafia ASH language #5401
You can do this in a couple of ways:
That said, if Total files found: 604 As an aside, two of the samples in this PR were waaaay too big for inclusion.
It's not, but I was hoping that would be enough alone. How do I check the number of unique public users/repos for a given search term? This one is problematic because the extension is shared with "AGS Script" (and some other, less-used languages, like Acorn script). then on the first two pages, 16 are KoLMafia ASH and 4 are AGS Script. For something more specific like "item", I have to go further before I get to an AGS Script file, but I can see some matches on the final page.
Sorry, I had assumed that larger was better. What size should I aim for?
You can use Harvester or the script I've been testing with for a while at https://github.com/lildude/linguist/blob/lildude/download-corpus/script/github-ext-pop - warning: this last one will poke GitHub's API abuse limits. Regarding the search: you can add multiple qualifiers and keywords which might help get a better result. I don't know if
We don't have a specific size limit (though 2.48MB is massive!!); rather, we aim for an overall general representation of the language as it is used. Repeated syntax only bloats the file and doesn't add much value to training the classifier.
There are non-runnable ASH files (libraries), but I'd expect every repo to have at least one runnable file.
Yeah, "void main item" gives 64 uniques (including some that only show up in smola's language-dataset), so nowhere near enough.
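For anyone wanting to reproduce a "uniques" count like the one above, here is a minimal sketch in Python. It assumes the search hits have already been exported as `owner/repo/path` strings; the `count_unique` helper and the sample hits are hypothetical and are not part of Linguist or of the scripts mentioned in this thread.

```python
def count_unique(hits):
    """Count distinct users and distinct repos in a list of search hits.

    Each hit is assumed (for illustration only) to be a string of the
    form 'owner/repo/path/to/file.ash'.
    """
    users = set()
    repos = set()
    for hit in hits:
        parts = hit.split("/")
        if len(parts) < 3:
            continue  # skip entries that are not owner/repo/path
        owner, repo = parts[0].lower(), parts[1].lower()
        users.add(owner)          # GitHub logins are case-insensitive
        repos.add((owner, repo))  # a repo is identified by owner + name
    return len(users), len(repos)


# Hypothetical sample data, not real search results.
sample_hits = [
    "alice/kol-scripts/farm.ash",
    "alice/kol-scripts/lib/util.ash",
    "Alice/other-repo/main.ash",
    "bob/ags-game/room1.ash",
]

users, repos = count_unique(sample_hits)
print(users, repos)  # 2 3
```

Counting both users and repos matters here because one prolific author with many repositories could otherwise make a language look more widely used than it is.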
Adds the new language "KoLMafia ASH".
Closes #5022.
Description
ASH is a currently unrecognised language used for scripting KoLMafia, a program for playing the online game "Kingdom of Loathing". There is documentation available at the KoLMafia wiki.
All ASH scripts have the ".ash" extension, which is currently owned by AGS Script.
As this is a conflicting extension, I should run the "Bayesian classifier" and add a heuristic if it doesn't classify well enough. How do I do this? Compiling into docker and running that against the files in the samples directory found them as "AGS Script", which makes me think that's not the right thing to do.
Checklist:
I am adding a new language.
I am adding new or changing current functionality