* Note that Cognition's built-in document structure parser generates the dcterms:title triple from the <title> element; GRDDL adds a legacy dc:title predicate.
Cognition is a parser for both “upper case Semantic Web” (RDF, RDFa) and “lower case semantic web” (microformats) technologies. It includes modules for exporting parsed data in a variety of formats, including RDF, vCard, iCalendar, Atom and KML.
Cognition is written in Perl 5 and licensed under the GNU GPL (v3).
Cognition internally represents all parsed data in an RDF-like triple format. Microformats don't usually contain as much information as is required by RDF — they usually don't have an explicit subject, and predicates aren't namespaced.
Cognition's microformat parsing process assigns explicit URIs to the subjects and prefixes microformat class names with a relevant URI (e.g. urn:ietf:rfc:2426# for vCard). This allows so-called “lower case semantic web” data to mix with data gleaned from the “upper case Semantic Web” (e.g. RDF). hCards converted to vCards can thus gain information from other sources. This is “gainy” conversion, as opposed to lossy conversion.
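The conversion described above can be sketched in a few lines. This is only an illustration in Python, not Cognition's actual Perl code: the function names, the blank-node naming scheme and the sample data are all assumptions made for the example. Only the urn:ietf:rfc:2426# prefix comes from the text.

```python
import itertools

# Prefix quoted in the text above for vCard-derived predicates.
VCARD_NS = "urn:ietf:rfc:2426#"

_counter = itertools.count(1)


def new_subject():
    """Mint a blank-node label to act as the explicit subject (hypothetical scheme)."""
    return f"_:hcard{next(_counter)}"


def hcard_to_triples(properties, subject=None):
    """Turn {class name: value} pairs from an hCard into (s, p, o) triples."""
    subject = subject or new_subject()
    return [(subject, VCARD_NS + name, value) for name, value in properties.items()]


def merge(*graphs):
    """Union several triple lists; triples about the same subject simply accumulate."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


# Hypothetical data: an hCard plus an RDFa statement about the same subject.
person = "http://example.org/foo#alice"
hcard_triples = hcard_to_triples({"fn": "Alice Example", "tel": "+44 1234 567890"}, person)
rdfa_triples = [(person, "http://xmlns.com/foaf/0.1/homepage", "http://example.org/")]
combined = merge(hcard_triples, rdfa_triples)

for triple in sorted(combined):
    print(triple)
```

Because both sources end up as triples sharing a subject URI, an exporter working from the combined set sees the extra statement alongside the hCard data, which is the sense in which the conversion is "gainy".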
It essentially combines data from three different sources:
Many of these technologies make use of namespaces. Standard XML namespaces are mostly understood, and namespaces may also be declared using the RFC 2731 convention (e.g. <link rel="schema.DC">). (You may run into problems if you define the same prefix differently in different parts of the document.) A number of namespaces are also predefined, so that markup like <meta name="DC.creator"> will "just work" even if the author never explicitly defined the DC prefix.
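As a rough illustration of how predefined prefixes could make <meta name="DC.creator"> "just work", here is a minimal Python sketch. It is not Cognition's implementation: the contents of the predefined table, the case-insensitive prefix matching and the function name are assumptions made for the example.

```python
# Hypothetical table of predefined prefixes; Cognition's real list is not given here.
PREDEFINED = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def resolve_meta_name(name, declared=None):
    """Expand a 'PREFIX.local' meta name to a full URI.

    `declared` holds prefixes the page defined itself (e.g. via
    <link rel="schema.DC" href="...">); they override the predefined ones.
    Returns None when the name has no prefix or the prefix is unknown.
    """
    prefix, sep, local = name.partition(".")
    if not sep or not local:
        return None
    namespaces = dict(PREDEFINED)
    for key, uri in (declared or {}).items():
        namespaces[key.lower()] = uri  # assumption: prefixes compare case-insensitively
    base = namespaces.get(prefix.lower())
    return base + local if base else None


print(resolve_meta_name("DC.creator"))
print(resolve_meta_name("DC.creator", {"DC": "http://purl.org/dc/terms/"}))
print(resolve_meta_name("keywords"))
```

Keeping a single prefix table per document also shows why redefining the same prefix in different parts of a page is risky: a later declaration would silently replace the earlier one.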
Note that both HTML and XHTML are supported equally. The stuff that strictly speaking should not work in HTML (e.g. XML namespaces, RDFa) does work: HTML is treated as if it were funny-looking XHTML.
rel=transformation links and use them to glean extra data from the page. When strict GRDDL is on, it will ignore these links unless the GRDDL profile is found; this is the correct GRDDL behaviour.
<head profile> for GRDDL
rel=profileTransformation links to use for GRDDL. For most pages, this will be very slow. Note that Cognition includes built-in parsing for most microformats, generally better than GRDDL is able to provide, so for most pages you will not need to use GRDDL anyway.
The command-line version of Cognition includes many more options, but these have not (yet) been exposed in the web interface. Run Cognition with a parameter of --help for more information.
Cognition is currently available in three forms. The first is the Cognition library for Perl. This library can parse an HTML file as RDFa, eRDF, microformats and so on, and makes the data available to the calling application as a rudimentary RDF triple store. Export functions are also available to retrieve the data in RDF/XML, RDF/JSON, iCalendar, vCard, etc.
Another form is the cognitiond.pl daemon based on the Perl library. This daemon listens on a TCP port (by default, 24646) waiting to be passed URIs, and outputs data in a chosen format (e.g. RDF/XML, RDF/JSON, iCalendar, vCard). A small command-line client is included which is able to connect to the daemon and output the data to STDOUT. The command-line client is also linked to the Cognition Perl library, and so is able to output data even when it is unable to connect to the daemon.
Lastly, Cognition can be used through a web service at srv.buzzword.org.uk. This web service is still in an experimental stage.
The web service only accepts HTTP URIs (and a special URI of http://referer which indicates that Cognition should process the referring URI). The command-line client supports a wider variety of URIs including file:// URIs.
Cognition supports a special syntax for fragment identifiers. When asked to process the URI:
http://example.org/foo#subject(http://example.org/bar)
Cognition will process http://example.org/foo and return all the information it can find about the subject http://example.org/bar. Also, given the input URI:
http://example.org/foo#bar
Cognition will process http://example.org/foo and return all the information it can find about the subject http://example.org/foo#bar.
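The two fragment forms above amount to a small parsing rule. The following Python sketch shows one way to express it; it is an illustration only, and the function name and return shape are assumptions rather than anything taken from Cognition's Perl code.

```python
from urllib.parse import urldefrag


def split_request(uri):
    """Return (document URI to fetch, subject URI to report on, or None)."""
    document, fragment = urldefrag(uri)
    if not fragment:
        # No fragment: process the whole document, no particular subject.
        return document, None
    if fragment.startswith("subject(") and fragment.endswith(")"):
        # Special syntax: the subject is the URI inside the parentheses.
        return document, fragment[len("subject("):-1]
    # Ordinary fragment: the subject is the full URI including the fragment.
    return document, uri


print(split_request("http://example.org/foo#subject(http://example.org/bar)"))
print(split_request("http://example.org/foo#bar"))
print(split_request("http://example.org/foo"))
```

In every case the document that gets fetched is the URI with its fragment removed; only the choice of subject changes.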
Tonnes and tonnes of bugfixes, little improvements, and refactoring, particularly in RDFa parsing and handling nested microformats. Turtle output; M3U output; intelligent parsing and output of durations and intervals.
Improved microformats parsing across the board. Add support for hAudio, hResume, hMeasure, species and XEN. Datetime parsing improvements.
Document structure parsing overhaul; improvements to rel=tag; better support for some RDF nuances like rdf:value and rdfs:subPropertyOf.
Switch to client/server model. Add support for hReview.
xFolk support; ICBM; OpenURL COinS.
hCard extensions using vCard 4.0; XFN support; jCard export; RDF/XML output is refactored; RDF/JSON export; improved @lang handling; BNodes.
Profile URIs; Support for hAtom; Improved GRDDL; Atom and iCalendar output; Improved stringification.
vCard export; KML export; improved command-line client; support commented-out RDF in (X)HTML.
Rudimentary GRDDL; better charset handling; better support for tag soup.
Use GNOME XML library; support for CURIEs; use RDF triples to internally represent data; RDFa support!
Bugfixes.
Stop using XML::XPath; support for @xmlns; support hCalendar, rel=tag, rel=license, figure, XOXO; parse document structure from headings.
Initial release.
To run Cognition, you will need Perl 5.8 or above, plus a number of Perl modules installed. (All available from CPAN.) The modules marked with an alternative bullet point are used not by Cognition's parsing library, but by "infrastructure" code such as the daemon. The modules in italics are core Perl modules, included with the base Perl distribution (so you shouldn't need to download them).
Cognition has been tested on Mac OS X 10.4 and Mandriva Linux 2008. (There are some bugs in some recent versions of LibXSLT which cause crashes on the Mac. You can work around this by disabling GRDDL support using the -o p_grddl=0 option.) It will probably work on Windows too.

