Posted about 15 years ago by Arto Bendiken
The N-Triples format is the lowest common denominator for RDF
serialization formats, and turns out to be a very good fit to the Unix
paradigm of line-oriented, whitespace-separated data processing. In this
tutorial we'll see how to process N-Triples data by pipelining
standard Unix tools such as grep, wc, cut, awk, sort, uniq,
head and tail.
To follow along, you will need access to a Unix box (Mac OS X, Linux, or
BSD) with a Bash-compatible shell. We'll be using curl to
fetch data over HTTP, but you can substitute wget or fetch if necessary.
A couple of the examples require a modern AWK version such as
gawk or mawk; on Linux distributions you should be okay
by default, but on Mac OS X you will need to install gawk or mawk from
MacPorts as follows:
$ sudo port install mawk
$ alias awk=mawk
Grokking N-Triples
Each N-Triples line encodes one RDF statement, also known as a triple.
Each line consists of the subject (a URI or a blank node identifier), one or
more characters of whitespace, the predicate (a URI), some more whitespace,
and finally the object (a URI, blank node identifier, or literal) followed
by a dot and a newline. For example, the following N-Triples statement
asserts the title of my website:
<http://ar.to/> <http://purl.org/dc/terms/title> "Arto Bendiken" .
This is an almost perfect format for Unix tooling; the only possible further
improvement would have been to define the statement component separator to
be a tab character, which would have simplified obtaining the object
component of statements -- as we'll see in a bit.
Getting N-Triples
Many RDF data dumps are made available as compressed N-Triples files.
DBpedia, the RDFization of Wikipedia, is a prominent example. For
purposes of this tutorial I've prepared an N-Triples dataset containing all
Drupal-related RDF statements from DBpedia 3.4, which is the latest
release at the moment and reflects Wikipedia as of late September 2009.
I prepared the sample dataset by downloading all English-language core
datasets (20 N-Triples files totaling 2.1 GB when compressed) and crunching
through them as follows:
$ bzgrep Drupal *.nt.bz2 > drupal.nt
To save you from gigabyte-sized downloads and an hour of data crunching, you
can just grab a copy of the resulting drupal.nt file as
follows:
$ curl http://blog.datagraph.org/2010/03/grepping-ntriples/drupal.nt > drupal.nt
The sample dataset totals 294 RDF statements and weighs in at 70 KB.
Counting N-Triples
The first thing we want to do is count the number of triples in an N-Triples
dataset. This is straightforward to do, since each triple is represented by
one line in an N-Triples input file and there are a number of Unix tools
that can be used to count input lines. For example, we could use either of
the following commands:
$ cat drupal.nt | wc -l
294
$ cat drupal.nt | awk 'END { print NR }'
294
Since we'll be using a lot more of AWK throughout this tutorial, let's
stick with awk and define a handy shell alias for this operation:
$ alias rdf-count="awk 'END { print NR }'"
$ cat drupal.nt | rdf-count
294
Note that, for reasons of comprehensibility, the previous examples as well
as most of the subsequent ones assume that we're dealing with "clean"
N-Triples datasets that don't contain comment lines or other miscellanea.
The DBpedia data dumps fit this bill very well. However, further onwards I
will give "fortified" versions of these commands that can correctly deal
with arbitrary N-Triples files.
Measuring N-Triples
We at Datagraph frequently use the N-Triples representation as the canonical
lexical form of an RDF statement, and work with content-addressable storage
systems for RDF data that in fact store statements using their
N-Triples representation. In such cases, it is often useful to know some
statistical characteristics of the data to be loaded in a mass import, so as
to be able to, for example, fine-tune the underlying storage for optimum
space efficiency.
A first useful statistic is the typical size of a datum, i.e. the line
length of an N-Triples statement, in the dataset we're dealing with.
AWK yields us N-Triples line lengths without much trouble:
$ alias rdf-lengths="awk '{ print length }'"
$ cat drupal.nt | rdf-lengths | head -n5
162
150
155
137
150
Note that N-Triples is an ASCII format, so the numbers above reflect both
the byte sizes of input lines as well as the ASCII character count of input
lines. All non-ASCII characters are escaped in N-Triples, and for present
purposes we'll be talking in terms of ASCII characters only.
The above list of line lengths in and of itself won't do us much good; we
want to obtain aggregate information for the whole dataset at hand, not for
individual statements. It's too bad that Unix doesn't provide commands for
simple numeric aggregate operations such as the minimum, maximum and average
of a list of numbers, so let's see if we can remedy that.
One way to define such operations would be to pipe the above output to an
RPN shell calculator such as dc and have it perform the needed
calculations. The complexity of this would go somewhat beyond mere shell
aliases, however. Thankfully, it turns out that AWK is well-suited to
writing these aggregate operations as well. Here's how we can extend
our earlier pipeline to boil the list of line lengths down to an average:
$ alias avg="awk '{ s += \$1 } END { print s / NR }'"
$ cat drupal.nt | rdf-lengths | avg
242.517
The above, incidentally, is an example of a simple map/reduce
operation: a sequence of input values is mapped through a function, in
this case length(line), to give a sequence of output values (the line
lengths) that is then reduced to a single aggregate value (the average
line length). Though I won't go further into this just now, it is worth
mentioning in passing that N-Triples is an ideal format for massively
parallel processing of RDF data using Hadoop and the like.
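As for the dc route mentioned above, the summing half does in fact fit on
one line. Here's a sketch for the curious: sed rewrites each line length
into an RPN addition, and dc prints the grand total, which works out to
71300 characters for our sample (294 lines at an average of 242.517):
$ { echo 0; cat drupal.nt | rdf-lengths | sed 's/$/ +/'; echo p; } | dc
71300
Dividing by the line count to get an average is where dc starts to feel
clumsy, which is why AWK is the better tool here.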
Now, we can still streamline the two-step rdf-lengths | avg pipeline by
combining both steps into a single alias that outputs an average line
length for the given input stream, like so:
$ alias rdf-length-avg="awk '\
  { s += length } \
  END { print s / NR }'"
Likewise, it doesn't take much more to define an alias for obtaining the
maximum line length in the input dataset:
$ alias rdf-length-max="awk '\
BEGIN { n = 0 } \
{ if (length > n) n = length } \
END { print n }'"
Getting the minimum line length is only slightly more complicated. Instead
of comparing against a zero baseline like above, we need to instead define a
"roof" value to compare against. In the following, I've picked an
arbitrarily large number, making the (at present) reasonable assumption that
no N-Triples line will be longer than a billion ASCII characters, which
would amount to somewhat less than a binary gigabyte:
$ alias rdf-length-min="awk '\
BEGIN { n = 1e9 } \
{ if (length > 0 && length < n) n = length } \
END { print (n < 1e9 ? n : 0) }'"
Now that we have some aggregate operations to crunch N-Triples data with,
let's analyze our sample DBpedia dataset using the three aliases defined
above:
$ cat drupal.nt | rdf-length-avg
242.517
$ cat drupal.nt | rdf-length-max
2179
$ cat drupal.nt | rdf-length-min
84
We can see from the output that N-Triples line lengths in this dataset vary
considerably: from less than a hundred bytes to a couple of kilobytes, with
the average in the range of two hundred bytes. This variability is to be
expected for DBpedia data, given that many RDF statements in such a dataset
contain a long textual description as their object literal whereas others
contain merely a simple integer literal.
Many other statistics, such as the median line length or the standard
deviation of the line lengths, could conceivably be obtained in a manner
similar to what I've shown above. I'll leave those as exercises for the
reader, however, as further stats regarding the raw N-Triples lines are
unlikely to be all that generally interesting.
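If you do want a taste, though, here's a sketch of one such statistic: a
population standard deviation in the same aggregate-alias style,
accumulating the sum and the sum of squares and combining them at the end:
$ alias rdf-length-stddev="awk '\
  { n += 1; s += length; ss += length * length } \
  END { if (n > 0) print sqrt(ss / n - (s / n) ^ 2) }'"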
Parsing N-Triples
It's time to move on to getting at the three components -- the subject, the
predicate and the object -- that constitute RDF statements.
We have two straightforward choices for obtaining the subject and predicate:
the cut command and good old awk. I'll show both aliases:
$ alias rdf-subjects="cut -d' ' -f 1 | uniq"
$ alias rdf-subjects="awk '{ print \$1 }' | uniq"
While cut might shave off some microseconds compared to awk here, AWK is
still the better choice for the general case, as it allows us to expand the
alias definition to ignore empty lines and comments, as we'll see later. On
our sample data, though, either form works fine.
You may have noticed and wondered about the pipelined uniq after cut and
awk. This is simply a low-cost, low-grade deduplication filter: it drops
consecutive duplicate values. For an ordered dataset (where the input
N-Triples lines are already sorted in lexical order), it will get rid of all
duplicate subjects. In an unordered dataset, it won't do much good, but it
won't do much harm either (what's a microsecond here or there?).
To fully deduplicate the list of subjects for a (potentially) unordered
dataset, apply another uniq filter after a sort operation as follows:
$ cat drupal.nt | rdf-subjects | sort | uniq | head -n5
<http://dbpedia.org/resource/Acquia_Drupal>
<http://dbpedia.org/resource/Adland>
<http://dbpedia.org/resource/Advomatic>
<http://dbpedia.org/resource/Apadravya>
<http://dbpedia.org/resource/Application_programming_interface>
I've not made sort an integral part of the rdf-subjects alias because
sorting the subjects is an expensive operation with resource usage
proportional to the number of statements processed; when processing a
billion-triple N-Triples stream, it is usually better not to care too much
about ordering.
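Incidentally, sort -u collapses the sort | uniq pair into a single command
with the very same output, saving one process (though it does nothing about
the inherent cost of the sort itself):
$ cat drupal.nt | rdf-subjects | sort -u | head -n5
<http://dbpedia.org/resource/Acquia_Drupal>
<http://dbpedia.org/resource/Adland>
<http://dbpedia.org/resource/Advomatic>
<http://dbpedia.org/resource/Apadravya>
<http://dbpedia.org/resource/Application_programming_interface>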
Getting the predicates from N-Triples data works exactly the same way as
getting the subjects:
$ alias rdf-predicates="cut -d' ' -f 2 | uniq"
$ alias rdf-predicates="awk '{ print \$2 }' | uniq"
Again, you can apply sort in conjunction with uniq to get the list of
unique predicate URIs in the dataset:
$ cat drupal.nt | rdf-predicates | sort | uniq | tail -n5
<http://www.w3.org/2000/01/rdf-schema#label>
<http://www.w3.org/2004/02/skos/core#subject>
<http://xmlns.com/foaf/0.1/depiction>
<http://xmlns.com/foaf/0.1/homepage>
<http://xmlns.com/foaf/0.1/page>
Obtaining the object component of N-Triples statements, however, is somewhat more
complicated than getting the subject or the predicate. This is due to the
fact that object literals can contain whitespace that will throw off the
whitespace-separated field handling of cut and awk that we've relied on
so far. Not to worry, AWK can still get us the results we want, but I won't
attempt to explain how the following alias works; just be happy that it
does:
$ alias rdf-objects="awk '{ ORS=\"\"; for (i=3;i<=NF-1;i++) print \$i \" \"; print \"\n\" }' | uniq"
The output of rdf-objects is the N-Triples encoded object URI, blank node
identifier or object literal. URIs are output in the same format as
subjects and predicates, with enclosing angle brackets; language-tagged
literals include the language tag, and datatyped literals include the
datatype URI:
$ cat drupal.nt | rdf-objects | sort | uniq | head -n5
"09"^^<http://www.w3.org/2001/XMLSchema#integer>
"16"^^<http://www.w3.org/2001/XMLSchema#integer>
"2001-01"^^<http://www.w3.org/2001/XMLSchema#gYearMonth>
"2009"^^<http://www.w3.org/2001/XMLSchema#integer>
"6.14"^^<http://www.w3.org/2001/XMLSchema#decimal>
Another very useful operation to have is getting the list of object literal
datatypes used in an N-Triples dataset. This is also a somewhat involved
alias definition, and requires a modern AWK version such as gawk
or mawk:
$ alias rdf-datatypes="awk -F'\x5E' '/\"\^\^</ { print substr(\$3, 1, length(\$3)-2) }' | uniq"
(In case you're wondering, \x5E is just an escaped caret: the alias splits each line on ^, so for a datatyped literal the third field holds the datatype URI followed by a space and the trailing dot, which the substr call strips off.)
$ cat drupal.nt | rdf-datatypes | sort | uniq
<http://www.w3.org/2001/XMLSchema#decimal>
<http://www.w3.org/2001/XMLSchema#gYearMonth>
<http://www.w3.org/2001/XMLSchema#integer>
As we can see, most object literals in this dataset are untyped strings, but
there are some decimal and integer values as well as a few year-month literals.
Aliasing N-Triples
As promised, here follow more robust versions of all the aforementioned Bash
aliases. Just copy and paste the following code snippet into your
~/.bash_aliases or ~/.bash_profile file, and you will always have these
aliases available when working with N-Triples data on the command line.
# N-Triples aliases from http://blog.datagraph.org/2010/03/grepping-ntriples
alias rdf-count="awk '/^\s*[^#]/ { n += 1 } END { print n }'"
alias rdf-lengths="awk '/^\s*[^#]/ { print length }'"
alias rdf-length-avg="awk '/^\s*[^#]/ { n += 1; s += length } END { print s/n }'"
alias rdf-length-max="awk 'BEGIN { n=0 } /^\s*[^#]/ { if (length>n) n=length } END { print n }'"
alias rdf-length-min="awk 'BEGIN { n=1e9 } /^\s*[^#]/ { if (length>0 && length<n) n=length } END { print (n<1e9 ? n : 0) }'"
alias rdf-subjects="awk '/^\s*[^#]/ { print \$1 }' | uniq"
alias rdf-predicates="awk '/^\s*[^#]/ { print \$2 }' | uniq"
alias rdf-objects="awk '/^\s*[^#]/ { ORS=\"\"; for (i=3;i<=NF-1;i++) print \$i \" \"; print \"\n\" }' | uniq"
alias rdf-datatypes="awk -F'\x5E' '/\"\^\^</ { print substr(\$3, 1, length(\$3)-2) }' | uniq"
I should also note that though I've spoken throughout only in terms of
N-Triples, most of the above aliases will also work fine for input in the
N-Quads format.
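For reference, an N-Quads line simply adds a fourth component, the context
or graph, before the terminating dot; the graph URI below is made up purely
for illustration:
<http://ar.to/> <http://purl.org/dc/terms/title> "Arto Bendiken" <http://example.org/graph> .
Since the subject and predicate remain in the first two fields, aliases like
rdf-subjects and rdf-predicates work unchanged; rdf-objects, on the other
hand, would sweep the context into its output.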
In the next installments of RDF for Intrepid Unix Hackers, we'll attempt
something a little more ambitious: building an rdf-query alias to perform
subject-predicate-object queries on N-Triples input. We'll also see
what to do if your RDF data isn't already in N-Triples format,
learning how to install and use the Raptor RDF Parser Library to
convert RDF data between the various popular RDF serialization formats.
Stay tuned.
Lest there be any doubt, all the code in this tutorial is hereby
released into the public domain using the Unlicense. You are free to
copy, modify, publish, use, sell and distribute it in any way you please,
with or without attribution.
|
Posted about 15 years ago by Ben Lavender
RDF.rb is easily the most fun RDF library I've used. It uses Ruby's dynamic
system of mixins to create a library that's very easy to work with.
If you're new to Ruby, you might know mixins from other languages--Scala
traits, for example, are almost exactly functionally equivalent. They're
distinctly more powerful than Java interfaces or abstract classes. A mixin
is basically an interface and an abstract class rolled into one. Rather
than extending an abstract class, you include a mixin in your own class. A
mixin will usually require that the including class implement a particular
method. Ruby's own Enumerable module, for example, requires that including
classes implement #each. For that tiny bit of trouble, you get a ton of
methods (listed here), including iterators, mapping, partitions, conversion
to arrays, and more. (If you're new to Ruby, it might also help you to know
that #method_name means 'an instance method named method_name'.)
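Here's a minimal sketch of that contract, with a made-up class and data:
implement #each, include Enumerable, and the whole battery of iteration
methods comes along for free:
class Lengths
  include Enumerable              # the mixin: interface and abstract class in one

  def initialize(*numbers)
    @numbers = numbers
  end

  def each(&block)                # the one method Enumerable asks us to provide
    @numbers.each(&block)
  end
end

lengths = Lengths.new(162, 150, 155)
lengths.sort                      #=> [150, 155, 162]
lengths.map { |n| n * 2 }         #=> [324, 300, 310]
lengths.include?(155)             #=> true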
RDF.rb uses this principle extensively. RDF::Repository is, in fact,
little more than an in-memory reference implementation of 4 traits:
RDF::Enumerable, RDF::Mutable, RDF::Queryable, and
RDF::Durable. RDF::Sesame::Repository has the exact same
interface as the in-memory implementation, but is backed entirely by a
Sesame server. In order to work as a repository,
RDF::Sesame::Repository only had to extend the reference
implementation and implement #each, #insert_statement, and
#delete_statement. Nice! Of course, implementing those took some
doing, but it's still exceedingly easy.
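To make that concrete, here's a sketch of what a from-scratch backend might
look like, with a plain Ruby array standing in for a real store. The class
name is hypothetical, and the real RDF::Sesame code naturally does far
more; the point is just how small the required surface is:
require 'rdf'

class ArrayRepository < RDF::Repository
  # Yield every statement; RDF::Enumerable builds everything else on this.
  def each(&block)
    statements.each(&block)
  end

  protected

  def insert_statement(statement)
    statements << statement unless statements.include?(statement)
  end

  def delete_statement(statement)
    statements.delete(statement)
  end

  private

  # Lazily-initialized backing array.
  def statements
    @statements ||= []
  end
end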
RDF::Enumerable is the key abstraction here. By implementing an #each that
yields RDF::Statement objects, one gains a ton of functionality:
#each_subject, #each_predicate, #each_object, #each_context,
#has_subject?, #has_triple?, and more.
But the module system goes the other way, too--not only is it easy to
implement new RDF models, but existing ones are easily extended. I
recently wrote RDF::Isomorphic,
which extends RDF::Enumerable with #bijection_to and
#isomorphic_with? methods. The module-based system provided by
RDF.rb means that my isomorphic methods are now available on
RDF::Sesame::Repositories, and indeed anything which includes
RDF::Enumerable. This is everything from repositories to graphs to
query results! In fact, query results themselves implement
RDF::Enumerable, and thus implement RDF::Queryable and can be
checked for isomorphism, or whatever else you want to add. This is
functionality that Sesame does not have natively, and which I wrote
for a completely different purpose (testing parsers). Every
RDF::Enumerable gets it for free because I wanted to compare two textual
formats. Neat!
For example, here's what it takes to extend any RDF collection, from
RDF::Isomorphic:
require 'rdf'

module RDF
  ##
  # Isomorphism for RDF::Enumerables
  module Isomorphic
    def isomorphic_with(other)
      # code that uses #each, or any other method from RDF::Enumerable, goes here
    end

    def bijection_to(other)
      # code that uses #each, or any other method from RDF::Enumerable, goes here
    end
  end

  # re-open RDF::Enumerable and add the isomorphic methods
  module Enumerable
    include RDF::Isomorphic
  end
end
Of course, this just can't be done without monkey
patching. Mixins and
monkey patching together make for a powerful toolkit. To my
knowledge, this is the first RDF library that takes advantage of these
features.
It's possible to provide powerful features to a wide range of
implementations with this. RDF.rb does not yet have an inference
layer, but any such layer would instantly work for any store which
implements RDF::Enumerable. Want to prototype some custom business
logic that operates over existing RDF data? Copy it into a local
repository and hack away. No need for the production RDF store to be
the same at all, but you can still apply the same code.
As a counter-example, compare this to the Java RDF ecosystem. There
are some excellent implementations (RDF::Isomorphic is heavily in
debt to Jena), but they're all incompatible. Jena's check for
isomorphism is not really translatable to Sesame, or anything else.
RDF.rb, in addition to providing a reference implementation, acts as
an abstraction layer for underlying RDF implementations. The
difference is night and day--with RDF.rb, you only need to implement a
feature once, at the API layer, to have it apply to any
implementation. This is not a knock at the very talented people
behind those Java implementations; making this happen is a lot of work
in a language without monkey patching, and RDF.rb is only as good as
it is because of the significant influence those projects have had on
Arto's design.
The end result of the mixin-based approach is a system that is
incredibly easy to extend, and just downright fun. It would be a
fairly simple task to extend a Ruby class completely unrelated to RDF
with an #each method that yields statements, allowing it to mix in
RDF::Enumerable.
Voila, your existing classes now have an RDF representation. Along
the same lines, if one is bothered by the statement-oriented nature of
RDF.rb, building a system which took a resource-oriented view would
not require one to 'break away' from the RDF.rb ecosystem. Just build
your subject-oriented model objects and implement #each, and away you
go--you can now run RDF queries and test isomorphism on your model.
Build it to accept an RDF::Enumerable in the constructor and you can
use any existing repository or query to initialize your model.
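For instance, here's a sketch of that idea; the class, the predicate, and
all the data are chosen purely for illustration:
require 'rdf'

# A plain model object that earns an RDF representation just by
# yielding RDF::Statement objects from #each.
class Person
  include RDF::Enumerable

  def initialize(uri, name)
    @uri, @name = RDF::URI.new(uri), name
  end

  def each
    yield RDF::Statement.new(@uri, RDF::URI.new('http://xmlns.com/foaf/0.1/name'), @name)
  end
end

person = Person.new('http://ar.to/', 'Arto Bendiken')
person.has_subject?(RDF::URI.new('http://ar.to/'))  #=> true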
RDF.rb is not yet ready for production use, but it's under heavy
development and already quite useful. Give it a shot. You can post
any issues in the GitHub issue queue.
|
Posted about 15 years ago by Josh Huckabee
One of the most talked-about features in Rails 3 is its plug & play architecture with various frameworks, like DataMapper in place of ActiveRecord for the ORM or jQuery for JavaScript. However, I've yet to see much info on how to actually do this with the JavaScript framework.
Fortunately, it looks like a lot of the hard work has already been done. Rails now emits HTML that is compatible with the unobtrusive approach to JavaScript. This means that instead of seeing a delete link like this:
<a href="/users/1" onclick="if (confirm('Are you sure?')) { var f = document.createElement('form'); f.style.display = 'none'; this.parentNode.appendChild(f); f.method = 'POST'; f.action = this.href;var m = document.createElement('input'); m.setAttribute('type', 'hidden'); m.setAttribute('name', '_method'); m.setAttribute('value', 'delete'); f.appendChild(m);f.submit(); };return false;">Delete</a>
you'll now see it written as
<a rel="nofollow" data-method="delete" data-confirm="Are you sure?" class="delete" href="/user/1">Delete</a>
This makes it very easy for a JavaScript driver to come along, pick out the relevant pieces, and attach the appropriate handlers.
So, enough blabbing. How do you get jQuery working with Rails 3? I'll try to make this short and sweet.
1. Grab the jQuery driver at http://github.com/rails/jquery-ujs and put it in your javascripts directory. The file is at src/rails.js.
2. Include jQuery (I just use the Google-hosted version) and the driver in your application layout or view. In HAML it would look something like this:
= javascript_include_tag "http://ajax.googleapis.com/ajax/libs/jquery/1.4.1/jquery.min.js"
= javascript_include_tag 'rails'
3. Rails requires an authenticity token to do form posts back to the server. This helps protect your site against CSRF attacks. In order to handle this requirement the driver looks for two meta tags that must be defined in your page's head. This would look like:
<meta name="csrf-token" content="<%= form_authenticity_token %>" />
<meta name="csrf-param" content="authenticity_token" />
In HAML this would be:
%meta{:name => 'csrf-token', :content => form_authenticity_token}
%meta{:name => 'csrf-param', :content => 'authenticity_token'}
Update: Jeremy Kemper points out that the above meta tags can be written out with a single call to "csrf_meta_tag".
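In ERB that single call would be just:
<%= csrf_meta_tag %>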
That should be all you need. Remember, this is still a work in progress, so don't be surprised if there are a few bugs. Please also note this has been tested with Rails 3.0.0.beta.