Cognition is a parser for both “upper case Semantic Web” (RDF, RDFa) and “lower case semantic web” (microformats) technologies. It includes modules for exporting parsed data in a variety of formats, including RDF, vCard, iCalendar, Atom and KML.
Cognition is written in Perl 5 and licensed under the GNU GPL (v3).
Cognition internally represents all parsed data in an RDF-like triple format. Microformats don't usually contain as much information as is required by RDF — they usually don't have an explicit subject, and predicates aren't namespaced.
The microformat parsing process assigns explicit URIs to the subjects and prefixes microformat class names with a relevant URI (e.g. urn:ietf:rfc:2426# for vCard). This allows so-called “lower case semantic web” data to mix with data gleaned from the “upper case Semantic Web” (e.g. RDF). hCards converted to vCards can thus gain information from other sources. This is “gainy” conversion, as opposed to lossy conversion.
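As a rough illustration of the idea, the sketch below turns hCard-style class names into triples and then merges them with a statement from another source. It is not Cognition's own code (Cognition is written in Perl). The function names, the `#hcard-N` subject scheme and the example data are all assumptions made for the example; only the urn:ietf:rfc:2426# prefix comes from the text above.

```python
VCARD_NS = "urn:ietf:rfc:2426#"  # vCard prefix quoted in the text above


def microformat_to_triples(page_uri, index, properties, ns=VCARD_NS):
    """Return (subject, predicate, object) triples for one parsed microformat.

    `properties` maps bare class names (e.g. "fn") to lists of string values.
    """
    # Assumption: the subject URI is the page URI plus a generated fragment.
    subject = f"{page_uri}#hcard-{index}"
    triples = []
    for class_name, values in properties.items():
        predicate = ns + class_name  # namespace the bare class name
        for value in values:
            triples.append((subject, predicate, value))
    return triples


def merge_triples(*sources):
    """Union triples from several sources; triples sharing a subject combine."""
    merged = set()
    for triples in sources:
        merged.update(triples)
    return merged


if __name__ == "__main__":
    hcard = {"fn": ["Alice Example"], "tel": ["+1 555 0100"]}
    from_hcard = microformat_to_triples("http://example.org/about", 0, hcard)
    # Hypothetical extra statement about the same subject, as if found in RDF.
    from_rdf = [(
        "http://example.org/about#hcard-0",
        "http://xmlns.com/foaf/0.1/homepage",
        "http://example.org/",
    )]
    for triple in sorted(merge_triples(from_hcard, from_rdf)):
        print(triple)
```

Because both sources describe the same subject URI, the merged set holds more about that subject than the hCard alone did, which is the “gainy” effect described above.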
It supports metadata embedded using the following methods:
- <title>, <meta> and <link> tags
- <hX> tags
- @role module
- <link rel="meta">, hidden in HTML <!--comments--> or embedded in XHTML using namespaces
Many of these technologies make use of namespaces. Standard XML namespaces are mostly understood, and namespaces may also be linked to using RFC 2731. (You may run into problems if you define the same prefix differently in different parts of the document.) A number of namespaces are also predefined, so that stuff like <meta name="DC.creator"> will "just work" even if the author never explicitly defined the DC prefix.
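The prefix handling described above can be sketched roughly as follows. This is an illustration, not Cognition's implementation: the function names and the contents of the predefined table are assumptions. The `schema.PREFIX` link relation is the convention from RFC 2731, and the Dublin Core element-set URI is the standard one, though whether Cognition maps DC to exactly this URI is assumed here.

```python
# Assumed predefined table; the real list of predefined namespaces is not
# given in this document.
PREDEFINED = {"DC": "http://purl.org/dc/elements/1.1/"}


def declared_from_links(links):
    """Collect prefixes from (rel, href) pairs such as <link rel="schema.DC">."""
    declared = {}
    for rel, href in links:
        if rel.startswith("schema."):
            declared[rel[len("schema."):]] = href
    return declared


def resolve_meta_name(name, declared=None):
    """Expand a prefixed name like "DC.creator" to a full URI, or return None."""
    prefixes = dict(PREDEFINED)
    prefixes.update(declared or {})  # explicit declarations override defaults
    prefix, sep, local = name.partition(".")
    if not sep or prefix not in prefixes:
        return None  # no prefix, or a prefix nobody defined
    return prefixes[prefix] + local


if __name__ == "__main__":
    # Works without any declaration because DC is in the predefined table.
    print(resolve_meta_name("DC.creator"))
    # A prefix declared in the page itself via a schema.* link.
    links = [("schema.FOAF", "http://xmlns.com/foaf/0.1/")]
    print(resolve_meta_name("FOAF.name", declared_from_links(links)))
```

A single document-wide table like this also shows why redefining a prefix partway through a document is risky: a later declaration simply replaces the earlier one.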
Note that both HTML and XHTML are supported equally. The stuff that strictly speaking should not work in HTML (e.g. XML namespaces, RDFa) does work: HTML is treated as if it were funny-looking XHTML.
Cognition supports various extensions to these microformats.
To a certain extent, Cognition can understand or make use of:
- <meta scheme> attribute
- rel and rev.
Cognition is currently available in two forms. One is cognition.cgi, a web-based interface, which you can also try online. It can spit out the parsed data as a Perl structure, beautifully syntax-highlighted RDF, KML, vCard, Atom or iCalendar.
The other is cognition.pl, a command line client. It is able to read various options from the command line (see --help for details) to control parsing behaviour and specify the output format.
To run Cognition, you will need Perl 5.8 or above, plus a number of Perl modules installed. (Modules marked with a hollow list marker are used by cognition.cgi, but not cognition.pl.) These are all available from CPAN:
Cognition has been tested on Mac OS X 10.4 and Mandriva Linux 2008. (Bugs in some recent versions of LibXSLT cause crashes on the Mac. You can work around this by disabling GRDDL support with the -o p_grddl=0 option.)

