Is it possible to ignore parts of a repository? For instance, we (http://www.ohloh.net/projects/3946) have our javadoc tree checked into svn above the root, and this gives the project an unhealthy weighting towards HTML.
Is there a way to ignore the /docs/ folder, or does this need to go into the wishlist forum?
Another question.. should I want to ignore the javadocs folder? It's mostly meaningless, as it's all generated, but it does add significant value to the project (but maybe not as much as writing proper docs of that length by hand), etc.
Also, there's some stuff generated by JavaCC/JJTree (http://javacc.dev.java.net/) and friends (i.e. http://trac.uwcs.co.uk/choob/cgi-bin/trac.cgi/browser/trunk/src/uk/co/uwcs/choob/support/ObjectDBClauseParser.jj). These files all have /*... Generated.. */ on the first line and don't contribute to the project at all. Is there a way to ignore them?
(For anyone who's curious, both of those are in SVN such that a user checking stuff out won't have to do random code/docs generation themselves.)
Hi Faux,
The ability to ignore folders is a common request, and it's one we've been thinking seriously about implementing. It's pretty common for a project to include a lot of 3rd party libraries or build tools in their source control, and it's not correct to attribute these things to the project. It's really a question of developer resources at this point.
Personally, I feel that you shouldn't be so eager to ignore your docs folder. A lot of Ohloh users seem to be concerned about having a lot of XML or HTML in their projects, but I'm not sure where this concern comes from. Enlighten me?
Another feature we've tossed around is the ability to label directories as containing documentation or test code, although our ideas for this are a little more vague. This would help identify developers who don't write documentation or tests, and we could generate independent reports for the separate sections of code.
Ignoring source code that was generated by a tool is another feature we've been thinking about. It's not as high on our priority list right now, but we would like to filter out this type of code. I think it's doable by simply looking for some common phrases in the first comment block of a file.
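To make the "common phrases in the first comment block" idea concrete, here is a minimal sketch in Python. It is not how Ohloh does (or will do) it; the marker phrases and the function names are assumptions made up for illustration, and it only understands C-style /* */ and // comments.

```python
# Hypothetical marker phrases; a real filter would need a curated list.
GENERATED_MARKERS = ("generated by", "do not edit", "auto-generated", "autogenerated")


def first_comment_block(source):
    """Return the leading /* ... */ block or run of // lines, or "" if
    the file does not start with a comment."""
    text = source.lstrip()
    if text.startswith("/*"):
        end = text.find("*/")
        # An unterminated comment is treated as running to end of file.
        return text if end == -1 else text[: end + 2]
    lines = []
    for line in text.splitlines():
        if line.lstrip().startswith("//"):
            lines.append(line)
        else:
            break
    return "\n".join(lines)


def looks_generated(source):
    """True if the first comment block contains any marker phrase."""
    header = first_comment_block(source).lower()
    return any(marker in header for marker in GENERATED_MARKERS)
```

Only the leading comment is inspected, so a phrase like "generated by" appearing later in the file (in a string or a trailing comment) does not trigger the filter.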
You're not alone with these requests, and as we have resources available we'll be addressing them.
Thanks,
Robin
Enlighten me?
The documentation generated by javadoc is just a transform of the sourcecode (and associated comments) into another form.
To pick a silly example (I don't feel this strongly about ignoring the javadocs, but..), say we had a subversion branch where we replaced all the tabs with eight-spaces to keep some people who dislike tabs happy? Should that branch be included?
This would help identify developers who don't write documentation..
Comments (ignoring ones that are just removed code, and giving extra credit to appropriately formatted comments, i.e. javadoc (/**) and doxygen (/*!) etc.) are probably a better indication of documentation than HTML, especially if the HTML is measured by line?
Ignoring folders/files of code simply included from other projects would be a very welcome addition, especially for the scripting language apps, where it is customary to pack all the components / libraries used into the application bundle. (In the forum thread about PHP eats Ruby etc., some people complained about the amazing LOC count of PHP applications. I think removing 'included' code would help a lot in normalizing those cases, i.e. lots of stuff included makes PHP development real fast.)
As a side note, a very useful metric would be (mostly for libraries / components / frameworks, I guess) the number of projects that bundle a given application in their distribution.
I have no idea how this could be gathered, though. Maybe checksumming every file and especially the directory listings (file names + sizes), and comparing them across projects?
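A rough sketch of the checksumming idea in Python, under some loud assumptions: both projects are already checked out locally, a directory counts as "the same" when its sorted (file name, size) listing matches exactly, and all function names here are invented for the example. Matching on names and sizes alone can give false positives, so file_digest is there to confirm candidates by content.

```python
import hashlib
import os


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def directory_fingerprints(root):
    """Map each directory (relative to root) that directly contains files
    to a digest of its sorted (file name, size) listing."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        listing = sorted(
            (name, os.path.getsize(os.path.join(dirpath, name)))
            for name in filenames
        )
        if not listing:
            continue  # directories holding only subdirectories are skipped
        digest = hashlib.sha256(repr(listing).encode("utf-8")).hexdigest()
        result[os.path.relpath(dirpath, root)] = digest
    return result


def shared_directories(root_a, root_b):
    """Pairs (dir in a, dir in b) whose listings have the same fingerprint."""
    by_digest = {}
    for rel, digest in directory_fingerprints(root_b).items():
        by_digest.setdefault(digest, []).append(rel)
    return sorted(
        (rel_a, rel_b)
        for rel_a, digest in directory_fingerprints(root_a).items()
        for rel_b in by_digest.get(digest, [])
    )
```

Counting how many projects share a fingerprint with a known library's directory would then approximate the "number of projects that bundle it" metric, at least for copies that were bundled unmodified.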
I would say that excluding directories manually from the normal statistics is significantly more important than trying to classify directories (e.g. as docs, generated, etc.). It'd still be nice to see the stats for them in individual commits, and such, but not in overall project stats or in overall user stats - one project I work on has attributed 93k lines of JS and 19k lines of CSS to one person because they checked in a JavaScript and HTML toolkit we use (the project's also now claiming to be 77% JS). If you classify things, you then get into the situation of trying to decide what is counted and what isn't, which is likely a minefield.
I like the idea of it being related to the RCS in some way, but (at least, for subversion), how about using properties?
These could be on either the root or specific directories; I'd think that individual directories (or files) would be better.
I personally think that, for legacy RCSes (where properties aren't available), the robots.txt way would be better, but you'll have to be careful about defining where the root is (i.e. it'd need to be the root of the import, as opposed to the root of the repository).
If you want to implement something to ignore certain folders, I wouldn't use a file called ohloh.txt or something like that. Keep it generic. There are other sites out there that provide statistical data for projects.
I think the best way to go would be to define something publicly and then let others use those specs too if needed.
For example, call the file statrobot.txt and use the same specs as the robots.txt file used by web search engines.
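For illustration, a statrobot.txt in robots.txt style might be read like the Python sketch below. Nothing here is an existing spec or an Ohloh feature: the file name is just the one proposed above, the function names are made up, only Disallow lines are honoured (User-agent lines are skipped), and paths are assumed to be relative to the root of the import.

```python
def parse_ignore_rules(text):
    """Collect Disallow path prefixes from robots.txt-style text.
    User-agent and any other fields are ignored in this sketch."""
    prefixes = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "disallow" and value:
            prefixes.append(value)
    return prefixes


def is_ignored(path, prefixes):
    """True if path (relative to the import root) starts with a Disallow prefix."""
    normalized = "/" + path.lstrip("/")
    return any(normalized.startswith(prefix) for prefix in prefixes)


rules = parse_ignore_rules(
    "User-agent: *\n"
    "Disallow: /docs/\n"
    "Disallow: /lib/fckeditor/  # bundled third-party editor\n"
)
```

With these rules, is_ignored("docs/api/index.html", rules) is true and is_ignored("src/Main.java", rules) is false, so a stats tool could skip the first file while still counting the second.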
As robin mentioned, this has to be a high-request item (for various reasons) and would be a great feature to have. Every project I work on that has an ohloh project page could actually use this feature (mostly 3rd party dependency sources that need to be ignored). That said, I'm sure there'd be some dissension on how to go about specifying paths to ignore or classify them.
Personally, I wouldn't want to have a file in my project's SCM system (whether it be CVS or SVN or otherwise) that was specific to ohloh if I didn't have to. A .ohlohignore or something similar to a .cvsignore might be fine, but it would seem better to keep the metadata with the context that needs it -- i.e., as part of the ohloh project page through the web interface. Especially given that project enlistment updates seem to be getting more automatic now, the stats would eventually sort themselves out per any ignore/classification settings.
I hope this hasn't been ignored, as I can't find it anywhere on the project admin page. Our project has about twice as much 3D model data stored in XML files as code, and it grossly distorts all the otherwise useful statistics Ohloh gathers.
I have a project just added that is basically only one file. But there are some upstream files in there, which totally skew the measurements. The project is a rewrite of some PHP code, with the PHP code still being included for reference... but that makes the project a mostly PHP project. Dang. ;-P
I would also like this ignoring folders option. Developing a CMS and just adding fckeditor makes the project look like JS when it's actually PHP.
Agree. My project is full of VS project files, CBP project files and Codewarrior project files and is being classified as an XML project and not C++! I would like to have a way to ignore certain files by masks and certain paths too.
Not much to add other than +1
I also would love to see this feature. Especially for the original poster's Javadoc. It would be nice if project administrators could deselect certain languages from appearing as a part of their statistics.
anse's suggestion for an ohloh.txt would also make a lot of sense for our project.
This would be great for third party libraries.
Christoffer and all,
See: https://www.ohloh.net/blog/LatestUpdatesToIgnoringFilesandDirectories
Many projects now use this to good effect. Just remember, it takes a while for the request to be processed. It usually is in effect on the next update.
Thanks!
@ssnow-blackduck: I actually found this seconds after I posted to this thread. Great feature.