improve Prolog vs. IDL detection (.pro files) #5053

Open
slayoo opened this issue Oct 17, 2020 · 3 comments
Labels
Good First Issue, Misidentified Language

Comments

@slayoo

slayoo commented Oct 17, 2020

Apparently some .pro files are detected as Prolog, while others are given the IDL label.
The latter refers to the IDL/GDL/PV-WAVE language family.

In the GDL project, all .pro files are IDL source code, yet GitHub sorts them into both categories:

Would be great to improve consistency (i.e., so that all are detected as IDL).

Likely relevant helper info:
https://github.com/blackducksoftware/ohcount/blob/master/src/parsers/idl_pvwave.rl

HTH,
Sylwester (originally reported by @EdwardEisenhauer)

@lildude
Member

lildude commented Oct 17, 2020

The .pro extension is associated with quite a few different languages, so detection relies on the heuristic:

https://github.com/github/linguist/blob/1df78c248cafa6b414651673846e26e388710df9/lib/linguist/heuristics.yml#L378-L391

... and samples to identify the language based on the content. To make things more consistent, we'd need to improve the heuristics and add a few more representative samples.

Please feel free to open a PR to help improve things.

@lildude added the Good First Issue and Misidentified Language labels Oct 17, 2020
@slayoo
Author

slayoo commented Oct 17, 2020

So the current rules are as follows:

- extensions: ['.pro']
  rules:
  - language: Proguard
    pattern: '^-(include\b.*\.pro$|keep\b|keepclassmembers\b|keepattributes\b)'
  - language: Prolog
    pattern: '^[^\[#]+:-'
  - language: INI
    pattern: 'last_client='
  - language: QMake
    and:
    - pattern: HEADERS
    - pattern: SOURCES
  - language: IDL
    pattern: '^\s*function[ \w,]+$'
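
To see how these rules behave on IDL sources, here is a rough Python sketch that applies them in the order listed, with the first matching rule winning. It is only an approximation and not Linguist's actual implementation: Linguist is written in Ruby, the function name `guess_language` and the three sample snippets are made up for illustration, and `re.MULTILINE` is used on the assumption that `^` and `$` should match at line boundaries, as they do in Ruby regexes.

```python
import re

# Rules copied from the list above, in the same order.
# A rule with several patterns (the QMake "and" rule) needs all of them to match.
RULES = [
    ("Proguard", [r'^-(include\b.*\.pro$|keep\b|keepclassmembers\b|keepattributes\b)']),
    ("Prolog", [r'^[^\[#]+:-']),
    ("INI", [r'last_client=']),
    ("QMake", [r'HEADERS', r'SOURCES']),
    ("IDL", [r'^\s*function[ \w,]+$']),
]


def guess_language(content):
    """Return the language of the first rule that matches, or None.

    Hypothetical helper: None means no heuristic rule matched, so the
    decision would be left to whatever detection step comes next.
    """
    for language, patterns in RULES:
        if all(re.search(p, content, re.MULTILINE) for p in patterns):
            return language
    return None


# Made-up sample snippets (assumptions, not taken from the GDL repository).
idl_function = "function add_one, x\n  return, x + 1\nend\n"
idl_procedure = "pro hello\n  print, 'hello'\nend\n"
prolog_clause = "parent(X, Y) :- father(X, Y).\n"

for name, text in [("idl_function", idl_function),
                   ("idl_procedure", idl_procedure),
                   ("prolog_clause", prolog_clause)]:
    print(name, "->", guess_language(text))
```

In this sketch the IDL rule only fires when a line consists of a `function` declaration, so a file containing just `pro` procedures matches none of the rules and its language is left to the content-based step.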

@slayoo
Author

slayoo commented Oct 18, 2020

Some notes:

2 participants