Ignoring files/folders, e.g. javadocs?

Is it possible to ignore parts of a repository? For instance, we (http://www.ohloh.net/projects/3946) have our javadoc tree checked into svn above the root, and this gives the project an unhealthy weighting towards HTML.

Is there a way to ignore the /docs/ folder, or does this need to go into the wishlist forum?

Another question.. should I want to ignore the javadocs folder? It's mostly meaningless, as it's all generated, but it does add significant value to the project (but maybe not as much as writing proper docs of that length by hand), etc.

Also, there's some stuff generated by JavaCC/JJTree (http://javacc.dev.java.net/) and friends (e.g. http://trac.uwcs.co.uk/choob/cgi-bin/trac.cgi/browser/trunk/src/uk/co/uwcs/choob/support/ObjectDBClauseParser.jj). These files all have /*... Generated.. */ on the first line and don't contribute to the project at all. Is there a way to ignore them?

(For anyone who's curious, both of those are in SVN such that a user checking stuff out won't have to do random code/docs generation themselves.)

Faux over 18 years ago

Hi Faux,

The ability to ignore folders is a common request, and it's one we've been thinking seriously about implementing. It's pretty common for a project to include a lot of 3rd party libraries or build tools in their source control, and it's not correct to attribute these things to the project. It's really a question of developer resources at this point.

Personally, I feel that you shouldn't be so eager to ignore your docs folder. A lot of Ohloh users seem to be concerned about having a lot of XML or HTML in their projects, but I'm not sure where this concern comes from. Enlighten me?

Another feature we've tossed around is the ability to label directories as containing documentation or test code, although our ideas for this are a little more vague. This would help identify developers who don't write documentation or tests, and we could generate independent reports for the separate sections of code.

Ignoring source code that was generated by a tool is another feature we've been thinking about. It's not as high on our priority list right now, but we would like to filter out this type of code. I think it's doable by simply looking for some common phrases in the first comment block of a file.
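As a rough illustration of the "common phrases in the first comment block" idea, here is a minimal Python sketch. It is not how Ohloh does or would do it; the marker phrases, the 20-line limit and the function names are all assumptions made up for the example, and it only understands /* */, // and # comments.

```python
# Hypothetical marker phrases; a real list would need tuning per generator tool.
GENERATED_MARKERS = ("generated by", "do not edit", "autogenerated", "auto-generated")


def first_comment_block(text, max_lines=20):
    """Return the leading comment block of a file as one lowercased string."""
    block = []
    in_c_comment = False
    for line in text.splitlines()[:max_lines]:
        stripped = line.strip()
        if not stripped and not block:
            continue  # skip blank lines before the first comment
        if in_c_comment:
            block.append(stripped)
            if "*/" in stripped:
                break  # end of the /* ... */ block
            continue
        if stripped.startswith("/*"):
            block.append(stripped)
            if "*/" in stripped[2:]:
                break  # single-line /* ... */ comment
            in_c_comment = True
        elif stripped.startswith(("//", "#")):
            block.append(stripped)
        else:
            break  # first non-comment line ends the block
    return " ".join(block).lower()


def looks_generated(text):
    """Guess whether a source file was produced by a tool."""
    header = first_comment_block(text)
    return any(marker in header for marker in GENERATED_MARKERS)
```

Only the leading comment is inspected, so a hand-written file that mentions "generated by" further down is not flagged. The trade-off is that generators which put their banner after a licence header would be missed.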

You're not alone with these requests, and as we have resources available we'll be addressing them.

Thanks,
Robin

Robin Luckey over 18 years ago

"Enlighten me?"

The documentation generated by javadoc is just a transform of the sourcecode (and associated comments) into another form.

To pick a silly example (I don't feel this strongly about ignoring the javadocs, but..), say we had a subversion branch where we replaced all the tabs with eight spaces to keep some people who dislike tabs happy. Should that branch be included?

"This would help identify developers who don't write documentation.."

Comments (ignoring ones that are just removed code, and giving extra credit to appropriately formatted comments, e.g. javadoc (/**) and doxygen (/*!) etc.) are probably a better indication of documentation than HTML, especially if the HTML is measured by line?

Faux over 18 years ago

Ignoring folders/files of code simply included from other projects would be a very welcome addition, especially for scripting-language apps, where it is customary to pack all the components/libraries used into the application bundle. In the forum thread about PHP eats Ruby etc., some people complained about the amazing LOC count of PHP applications. I think removing 'included' code would help a lot in normalizing those cases, i.e. lots of stuff included makes PHP development real fast.

As a side note, a very useful metric would be (mostly for libraries / components / frameworks, I guess) the number of projects that bundle a given application in their distribution.
I have no idea how this could be gathered, though. Maybe checksumming every file and especially the directory listings (file names + sizes), and comparing them across projects?
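For what it's worth, the checksumming idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are invented for the example, whole files are read into memory, and only exact byte-for-byte copies are detected (a bundled library with local patches would not match).

```python
import hashlib
import os


def file_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(full, root)] = digest
    return digests


def listing_fingerprint(directory):
    """Hash one directory's listing (file names + sizes), ignoring contents."""
    entries = sorted(
        (entry.name, entry.stat().st_size)
        for entry in os.scandir(directory)
        if entry.is_file()
    )
    return hashlib.sha256(repr(entries).encode("utf-8")).hexdigest()


def shared_content(project_a, project_b):
    """Return the content digests that occur in both project trees."""
    digests_a = set(file_digests(project_a).values())
    digests_b = set(file_digests(project_b).values())
    return digests_a & digests_b
```

Comparing listing fingerprints is much cheaper than hashing every file, so it could serve as a first pass before the per-file comparison.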

Gaetano Giunta over 18 years ago

I would say that excluding directories manually from the normal statistics is significantly more important than trying to classify directories (e.g. as docs, generated, etc.). It'd still be nice to see the stats for them in individual commits, and such, but not in overall project stats or in overall user stats - one project I work on has attributed 93k lines of JS and 19k lines of CSS to one person because they checked in a JavaScript and HTML toolkit we use (the project's also now claiming to be 77% JS). If you classify things, you then get into the situation of trying to decide what is counted and what isn't, which is likely a minefield.

James Ross over 18 years ago

I like the idea of it being related to the RCS in some way, but (at least, for subversion), how about using properties?

These could be on either the root or specific directories; I'd think that individual directories (or files) would be better.

I personally think that, for legacy RCSes (where properties aren't available), the robots.txt way would be better, but you'll have to be careful about defining where the root is (i.e. it'd need to be the root of the import, as opposed to the root of the repository).

Faux about 18 years ago

If you want to implement something to ignore certain folders, I wouldn't use a file called ohloh.txt or something like that. Keep it generic. There are other sites out there who provide statistical data for projects.

I think the best way to go would be to define something publicly and then let others use those specs too if needed.

For example, call the file statrobot.txt and use the same specs as the robots.txt file used by web search engines.
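To show that reusing the robots.txt rules would need very little new tooling, here is a small Python sketch that reads such a file with the standard library's urllib.robotparser. No statrobot.txt specification exists; the file contents, the "statbot" agent name and the helper names are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical statrobot.txt contents, written in robots.txt syntax.
STATROBOT_TXT = """\
User-agent: *
Disallow: /docs/
Disallow: /thirdparty/
"""


def load_rules(text):
    """Parse robots.txt-style rules from a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_counted(parser, path, agent="statbot"):
    """True if the path (relative to the import root) should be included in stats."""
    return parser.can_fetch(agent, path)


rules = load_rules(STATROBOT_TXT)
print(is_counted(rules, "/docs/index.html"))
print(is_counted(rules, "/src/Main.java"))
```

Paths are written with a leading slash, relative to the root of the import, and matching is by prefix as in robots.txt. Different statistics sites could be addressed with their own User-agent sections.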

Stefan Küng about 18 years ago

As Robin mentioned, this has to be a high-request item (for various reasons) and would be a great feature to have. Every project I work on that has an ohloh project page could actually use this feature (mostly 3rd party dependency sources that need to be ignored). That said, I'm sure there'd be some dissension on how to go about specifying paths to ignore or classify them.

Personally, I wouldn't want to have a file in my project's SCM system (whether it be CVS or SVN or otherwise) that was specific to ohloh if I didn't have to. A .ohlohignore or something similar to a .cvsignore might be fine, but it would seem better to keep the metadata with the context that needs it -- i.e., as part of the ohloh project page through the web interface. Especially given that project enlistment updates seem to be becoming more automatic now, the stats would eventually sort themselves out per any ignore/classification settings.
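Whichever place the settings end up living (a file in the repository or a field on the project page), the matching itself could look something like the Python sketch below. The file format is an assumption loosely modeled on ignore files such as .cvsignore: one pattern per line, # for comments, a trailing slash for a whole directory. It is not an actual Ohloh format, and the function names are invented.

```python
from fnmatch import fnmatchcase


def parse_ignore(text):
    """Return the patterns in an ignore file, skipping blank lines and # comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """True if a repository-relative path matches any ignore pattern."""
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore everything beneath that directory.
            if path.startswith(pattern):
                return True
        elif fnmatchcase(path, pattern):
            # Shell-style mask; note that * also matches across "/" here.
            return True
    return False
```

For example, with the patterns `lib/fckeditor/` and `*.cbp`, the paths `lib/fckeditor/editor.js` and `build/game.cbp` would be skipped while `src/main.cpp` would still be counted.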

sean about 18 years ago

I hope this hasn't been ignored, as I can't find it anywhere on the project admin page. Our project has about twice as much 3D model data stored in XML files as code, and it grossly distorts all the otherwise useful statistics Ohloh gathers.

Calder almost 16 years ago

I have a project just added that is basically only one file. But there are some upstream files in there, which totally skew the measurements. The project is a rewrite of some PHP code, with the PHP code still being included for reference... but that makes the project a mostly PHP project. Dang. ;-P

Jürgen A. Erhard over 15 years ago

I would also like this option for ignoring folders. Developing a CMS and just adding fckeditor makes the project look like JS when it's actually PHP.

dogmatic69 over 15 years ago

Agree. My project is full of VS project files, CBP project files and Codewarrior project files and is being classified as an XML project and not C++! I would like to have a way to ignore certain files by masks and certain paths too.

Danny Angelo Ca... over 15 years ago

Not much to add other than +1

Graeme over 15 years ago

I also would love to see this feature. Especially for the original poster's Javadoc. It would be nice if project administrators could deselect certain languages from appearing as a part of their statistics.

anse's suggestion for an ohloh.txt would also make a lot of sense for our project as well.

david_jurgens about 15 years ago

This would be great for third party libraries.

Christoffer Niska over 13 years ago

Christoffer and all,

See: https://www.ohloh.net/blog/LatestUpdatesToIgnoringFilesandDirectories

Many projects now use this to good effect. Just remember, it takes a while for the request to be processed. It usually is in effect on the next update.

Thanks!

ssnow-blackduck over 13 years ago

@ssnow-blackduck: I actually found this seconds after I posted to this thread. Great feature.

Christoffer Niska over 13 years ago