New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for generic file extensions #4936
Conversation
I'd like to know why these tests didn't fail when I ran them locally... |
I think I'm okay with this approach, assuming:
- We don't hardcode the list of "generic" extensions.
- We add a lot more tests in
fixtures/, with files from other languages using.sol, and actual Solidity samples.
I'd also like to better understand where we use the samples/ files, apart from end-to-end tests and training of the Bayesian classifier. Those two are fine because we use fixture files for end-to-end tests and we don't use the Bayesian classifier. If we usually rely on the samples/ files for other things, we might have problems...
We currently don't have any Solidity "samples", only test "fixtures" (our nomenclature really needs improvement…). Adding That being said, we can certainly add more real-world examples to the |
I understand that. My point was that we don't and can't have Solidity files in
Yes, that's my point 2 above. |
|
I've also added
Plus, adding this proves that more than one language can share a generic extension. |
|
Sorry @Alhadis. Things are crazy busy for me at the moment as we're reading the next release of GitHub Enterprise. I'll try take a look when I've got a chance when things quieten down a bit. |
|
I understand. @pchaigno What if I added a # generic.yml
# (Some concise but informative description of what this file does and how to update it)
extensions:
- ".1"
- ".2"
- ".3"
- ".4"
- ".5"
- ".6"
- ".7"
- ".8"
- ".9"
- ".in"
- ".mod"
- ".sol"
filenames:
- unoriginal_but_common_filename |
|
I'm looking at it right now. It might take me a few days to process it and respond, but I've got time this week for Linguist stuff. |
Some initial thoughts from a quick review, but I don't have any major objections from my initial quick review.
Additionally, do we really need quite so many fixtures as they're only used for testing the heuristics and aren't used for generating the samples.json used by the classifier?
I'll ponder this a bit more this week.
|
Thanks for the quick answer, @lildude. This is great news :) |
Co-authored-by: Colin Seymour <colin@github.com>
I'll leave those for a future PR. At the moment, I'm just anxious to get this under the door in time for the next Linguist release. |
|
I've pondered this for the week, and I comfortable with the approach taken here. This approach means we lose any further assistance from the XML and classifier strategies but I think this is OK as this we already expect the heuristics to be pretty specific to reduce the chances of falling through to the classifier. I think this will also be a rarely used "exception" (famous last words My only request for this PR is to remove the unneeded examples from the test fixtures. I think we only need fixtures to confirm each of the heuristic scenarios, eg we only need one for "ignored" and Gerber and a few to match the various combos of the Solidity regex. |
That was my line of thinking as well, but @pchaigno requested the extra samples for some reason, so… Will nuke 'em, hang 5. |
Only cos they weren't really testing the heuristic so I didn't see much need for them.
Only if you say please once more It'll mean I have to do a little more work curating the release notes, but that's fine in this case. |
@pchaigno requested this before I renamed and rewrote the PR description to place more emphasis on the generic-extension feature, rather than the reasons for adding it (Solidity). I feel what I've written nicely encapsulates everything that's been changed here… I'm still waiting on @pchaigno to give his approval. |
|
In general not a big fan of squashing non-trivial PRs, but should be okay here with the new description. |
|
Awesome. Now all that's left to restore balance to the universe is getting rid of that f\*(&#g colour proximity requirement. |
|
Oh. You merged it already. I just allowed merge commits and was going to then merge. Oh well. |
|
Uh, sorry, I mistook the sudden silence as "go ahead, you merge it". Sorry. |
|
Great news!
Does this mean that in about a week it will be live on github? |
You bet your sweet arse it does. |
|
Awesome! |
|
Will this have any other impact apart from automatically associating the .sol extension to Solidity? I wonder if Linguist is used for other things at GitHub |
|
No, it won't. I was careful to ensure the existing filename logic wasn't affected. As for what else Linguist is used for… I'm not staff, so I can't tell you, but I do know that Linguist is used by GitLab, and possibly other VCS-hosting services too. |
|
Thanks for your reply, @Alhadis |
|
Yep, GitLab is the only other consumer of Linguist I'm aware of, but they tend to lack a bit behind in terms of releases. So it may take some time before the change is visible on gitlab.com and related software. |

Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.

Description
This pull-request adds non-trivial enhancements to Linguist's
HeuristicandExtensionstrategies:Generic extensions require files to satisfy a heuristic in order to match a candidate language.
languages.ymlentry.Linguist's tests will now fail if such a rule is found in
heuristics.yml(see #4904 for rationale).Textremoved from.buildsheuristic.It's assumed the author intended for it to return nothing.
INIremoved from.propsheuristic.Only 32 files are affected by this change, so it shouldn't have been added in the first place.
.solhas been added as a generic extension.Affected languages are
Gerber ImagesandSolidity. Support for the latter was the original motivator for this pull-request; see #4904 for prior discussion./cc @alcuadrado, @pchaigno