GitHub is wrong, and maybe it's your fault
Yes, you read that correctly. It could be your fault.
You have probably noticed that GitHub sometimes classifies your repository's language incorrectly. You publish a .NET application and the repository is detected as a JavaScript repository. Have you ever seen that?
In most cases this is not relevant information, and almost everybody ignores it. But this is how GitHub builds its statistics and determines how widely each language is adopted by its users. Now imagine a language that the platform reports as widely adopted when that is not actually true.
Can you imagine how much these statistics affect this report?
https://octoverse.github.com/
But how does GitHub do that?
The platform uses a Ruby library called Linguist to determine the percentage of each language in the repository, based on multiple rules. That said, sometimes the tool reads the documentation or the "modules" folder and counts those files as source code, which changes the statistics, and therefore the "repository language" and the reports about code adoption and usage.
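To make the effect concrete, here is a minimal sketch in Python. It is not Linguist's actual implementation, only a simplified model that assumes languages are ranked by the number of bytes per language. The file names, sizes, and the `language_percentages` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical file listing: (path, language, size in bytes).
FILES = [
    ("src/Api/Program.cs", "C#", 4_000),
    ("src/Api/Startup.cs", "C#", 6_000),
    ("src/Api/wwwroot/swagger-ui/swagger-ui-bundle.js", "JavaScript", 90_000),
]


def language_percentages(files, ignored_prefixes=()):
    """Return {language: percent of bytes}, skipping paths under ignored prefixes."""
    totals = defaultdict(int)
    for path, language, size in files:
        # Skip anything under a folder we decided not to count.
        if any(path.startswith(prefix) for prefix in ignored_prefixes):
            continue
        totals[language] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: round(100 * size / grand_total, 1) for lang, size in totals.items()}


before = language_percentages(FILES)
after = language_percentages(FILES, ignored_prefixes=("src/Api/wwwroot/",))
print(before)  # {'C#': 10.0, 'JavaScript': 90.0}
print(after)   # {'C#': 100.0}
```

In this toy model, a single large bundled library is enough to make a .NET project look like a JavaScript one, and excluding that folder flips the result.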
Very impressive, no?
OK, but how do you fix that?
This is pretty simple. According to the Linguist documentation, overrides for the rules used to calculate the statistics can be stored in the .gitattributes file. There, you can override existing rules or create new ones and configure Linguist to ignore some files or folders.
And this is the most important part for me: ignoring files that live in the project but are not relevant to the statistics.
An example? The Swagger files. Although they are part of the project and live inside the repository, that doesn't make the project or repository JavaScript-based, right?
In my scenario, the application is entirely .NET and has a frontend with Swagger. Because of the number of JavaScript files in this library, which I use but do not maintain, Linguist mistakenly inferred that my repository was JavaScript.
So, what did I do? I simply created the .gitattributes file with the following content, and a few seconds after I pushed the code to the repository, the statistics were updated correctly.
# Example of a `.gitattributes` file that reclassifies these files as C#
*.cs linguist-language=csharp
*.csproj linguist-language=csharp
*.sln linguist-language=csharp
# And exclude these directories from the statistics by marking them as documentation
src/*/wwwroot/* linguist-documentation
wwwroot/** linguist-documentation
*/bin/* linguist-documentation
And this is the report after uploading the .gitattributes file.
Much better, right?
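If you want to check the rules locally before pushing, one option is to ask Git which attributes it resolves for a given path with `git check-attr`. The sketch below is only an illustration: the helper names `parse_check_attr` and `linguist_attributes` and the sample paths are hypothetical. It assumes the usual `path: attribute: value` output format and paths that Git does not need to quote.

```python
import subprocess


def parse_check_attr(output):
    """Parse `git check-attr` output lines of the form 'path: attribute: value'."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split from the right so the path itself stays intact.
        path, attribute, value = line.rsplit(": ", 2)
        result.setdefault(path, {})[attribute] = value
    return result


def linguist_attributes(paths, repo_dir="."):
    """Ask Git which linguist attributes apply to the given paths (needs a Git repo)."""
    completed = subprocess.run(
        ["git", "check-attr", "linguist-language", "linguist-documentation", "--", *paths],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_check_attr(completed.stdout)


# Hypothetical sample of what Git might print for two paths.
sample = (
    "src/Api/Program.cs: linguist-language: csharp\n"
    "wwwroot/lib/swagger-ui.js: linguist-documentation: set\n"
)
parsed = parse_check_attr(sample)
print(parsed)
```

Inside a real repository, calling `linguist_attributes(["src/Api/Program.cs"])` would show whether a rule from .gitattributes matches that file. A value of `unspecified` means no rule applied to the path.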
And you? Have you noticed this issue in your repositories too?
Please comment below and share this article with your colleagues.
Thanks,
Lucas Massena
Fantastic! My repo for my biggest webdev project so far is roughly 50/50 python and JavaScript. Imagine my frustration upon seeing that it's telling me 99.3% Python! The whole point of it is to be a demonstration of not only my Django skills, but my Fullstack skills, which includes JS and React. So thanks for this, I'll try this out when I get home today.
Thank you! I ran into this scenario last week.
This is very important! Thanks!
Thanks for sharing.