Hey,
the LOC statistics for Drupal are definitely flawed somehow.
Ohloh reports 21,103 LOC for Drupal core but there are definitely more.
The reason might be that many PHP files in Drupal have a different extension - namely .module, .engine and .theme.
Without counting these, the statistics for Drupal are worthless, unfortunately, as more than half of Drupal's code lives in .module files.
As of today's Drupal CVS HEAD:
find drupal -type f \( -name '*.php' -o -name '*.inc' \) -exec egrep -vh '^$' {} \; | wc -l
27220
(= number of non-blank lines in .php and .inc files)
find drupal -type f \( -name '*.module' -o -name '*.theme' -o -name '*.engine' \) -exec egrep -vh '^$' {} \; | wc -l
27351
(= number of non-blank lines in .module, .theme and .engine files)
So, Ohloh is basically ignoring half of Drupal's code.
One solution would be to make either the file types used to calculate the LOC or the filetype->language mapping a project-specific setting.
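For anyone who wants to reproduce the two counts above without find and egrep, here is a minimal Python sketch of the same idea. The function name count_nonblank_lines and the choice to group counts by extension are my own assumptions, not anything Ohloh does; it only counts lines that are not empty, the same as the egrep -v '^$' filter.

```python
import os
from collections import Counter


def count_nonblank_lines(root, extensions):
    """Count non-empty lines per extension for files under root.

    Hypothetical helper for illustration; not part of Ohloh's tooling.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext not in extensions:
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                # Like egrep -v '^$', skip empty lines; a bare CRLF line
                # is also treated as empty here.
                counts[ext] += sum(1 for line in handle if line.rstrip(b"\r\n"))
    return counts
```

Calling count_nonblank_lines("drupal", {".php", ".inc", ".module", ".theme", ".engine"}) on a checkout would give one total per extension, which makes it easy to see how much code sits outside .php and .inc files.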
I'm afraid making file extensions project-specific would allow many people to cheat. Since PHP files always contain <?php, could it be a solution to search for the opening tag in non-.php files? Of course, there's also <? and <% but these are disabled by default and are no guarantee the file contains PHP (it could also be XML or ASP).
Greetings all,
Our detector uses file extensions and file contents to try to determine the language contained. As Frando suspects, we do NOT currently recognize .module, .theme and .engine files as PHP.
Dietrich - we have some disambiguation logic to try and tell if a file should be treated as X or Y. So, the rule COULD be something like:
if extension =~ /\.(module|theme|engine)$/ AND file.contents =~ /<\?php/
I'm willing to try it out. These changes are always tricky because we run this stuff against millions of files, and there are always outliers that make life difficult. Frando, Dietrich - what do you think?
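To make the proposed rule concrete, here is a minimal Python sketch of it. This is not Ohloh's detector code: the function name is_drupal_php and the DRUPAL_PHP_EXTENSIONS set are hypothetical, and the set only holds the three extensions named in the rule.

```python
import os

# Extensions taken from the rule above; illustrative only, not Ohloh's list.
DRUPAL_PHP_EXTENSIONS = {".module", ".theme", ".engine"}


def is_drupal_php(filename, contents):
    """Return True when the extension is a candidate AND the text has '<?php'."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in DRUPAL_PHP_EXTENSIONS and "<?php" in contents
```

Both conditions have to hold, so a .module file without the open tag, or a .txt file that happens to contain <?php, is left to whatever other rules the detector has.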
I think this would be a nice solution. :)
Yup, that should work. All PHP files must contain <?php, so checking against that sounds like the best thing to do.
Here's a complete list of file endings that Drupal uses at the moment for PHP files:
.php
.inc
.module
.theme
.engine
.schema
.install
.profile
This applies to both Drupal (core) and Drupal (contributions).
Maybe just checking all text files against <?php would be the easiest and most future-proof?
Thanks for your efforts in fixing this!
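The "check every text file" idea could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the NUL-byte test for "is this text?" and the 8192-byte sniff window are my own choices, and the names looks_like_text and contains_php_open_tag are hypothetical.

```python
def looks_like_text(data):
    """Heuristic (an assumption): data is text if it contains no NUL byte."""
    return b"\x00" not in data


def contains_php_open_tag(path, sniff_bytes=8192):
    """Check the first sniff_bytes of a text-looking file for '<?php'."""
    with open(path, "rb") as handle:
        head = handle.read(sniff_bytes)
    return looks_like_text(head) and b"<?php" in head
```

The trade-off is that no extension list has to be maintained, but any text file that merely mentions <?php near the top (documentation, for example) would be counted as PHP as well.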
Wouldn't it be easier to use some mime magic on the non-binary files to figure out what they are?
The unix 'file' utility does a good job in figuring out the file type:
file index.php includes/common.inc modules/system/system.module
Gives
index.php: PHP script text
includes/common.inc: PHP script text
modules/system/system.module: PHP script text
The file utility does exactly what the fix introduced in Ohloh does: it reads the file looking for a PHP open tag. I tried this out myself.
$ file tagadelic.module
tagadelic.module: PHP script text
After removing the PHP open tag:
$ file tagadelic.module
tagadelic.module: ASCII C++ program text, with very long lines
So, I think the original solution is the best (no need to use third-party, *NIX-only binaries).
Any news here?
bump