Extensions to Microformats

Cognition implements a number of proposed extensions to existing Microformats.

  1. hCard
    1. Additional Organisation Properties
    2. Kind Optimisation
  2. hCalendar
  3. adr
  4. geo
  5. Design Patterns
    1. ABBR design pattern
    2. Datetime design pattern
    3. Include pattern
    4. Microformat Opacity
  6. rel-license
  7. rel-tag
  8. hAtom
  9. XFN
  10. xFolk & hReview
  11. species
    1. Nesting

hCard

The following additional properties are supported, taken from the vCard 4.0 draft:

kind
Type of contact. Usually "individual", "org" or "group". See kind optimisation.
gender
Gender of a contact. Usually "male" or "female".
birth
Place of birth. May be a nested hCard, adr or geo; or plain text.
dday
Date of death.
death
Place of death. May be a nested hCard, adr or geo; or plain text.
impp
Extension to vCard for instant messaging and presence protocols, defined in RFC 4770. Similar syntax to "email" and "tel", with "type" and "value" subproperties.
lang
Language(s) spoken by this contact.
member
Where the hCard represents a group or organisation, the "member" property may be used to indicate someone who is a member of the group. The member should be either a URL or a nested hCard.

See also: adr, geo.

Additional Organisation Properties

The following additional organisation sub-properties are supported:

x-vat-number
VAT registration number for an organisation. (Alias "vat-number".)
x-company-number
Registration number for a company registered with an appropriate regulatory body. (Alias "company-number".)
x-charity-number
Registration number for a charity or other non-profit organisation registered with an appropriate body. (Alias "charity-number".)

If there is only one organisation listed as part of an hCard, then organisation-name and other organisation sub-properties may be used without an org wrapper element.

Kind Optimisation

The hCard specification offers a method for determining whether an hCard refers to an individual or an organisation. Cognition extends this to allow hCards to also refer to organisation units (e.g. departments, working groups) and any address properties (buildings, cities, regions, countries, etc). This is done by giving the "fn" property the same value as the other property, most simply by placing both class names on the same element. For example:

<div class="vcard">
  <a class="org url" href="http://la.ctu.gov.invalid">
    <span class="organization-name">Counter-Terrorist Unit</span>:
    <span class="organization-unit fn">Los Angeles Division</span>
  </a>
</div>

The "kind" property is automatically set (unless a "kind" has been specified explicitly); in the case above it is set to "group".

Explanation of kinds inferred by Cognition

Property equal to FN    Inferred KIND
--------------------    ------------------
organization-name       org
organization-unit       group
post-office-box         x-post-office-box
extended-address        x-extended-address
street-address          x-street-address
locality                x-locality
region                  x-region
postal-code             x-postal-code
country-name            x-country-name

Otherwise, "kind" is assumed to be "individual".
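The inference rules in the table can be sketched in a few lines of Python. This is only an illustration, not Cognition's own code: the `props` dictionary (property name to parsed value), the `infer_kind` name and the order in which properties are checked are all assumptions.

```python
# Mapping copied from the table: property whose value equals "fn" -> inferred kind.
INFERRED_KIND = {
    "organization-name": "org",
    "organization-unit": "group",
    "post-office-box": "x-post-office-box",
    "extended-address": "x-extended-address",
    "street-address": "x-street-address",
    "locality": "x-locality",
    "region": "x-region",
    "postal-code": "x-postal-code",
    "country-name": "x-country-name",
}

def infer_kind(props, explicit_kind=None):
    """Return the kind for an hCard given its parsed properties (hypothetical model)."""
    if explicit_kind:
        # An explicitly specified kind always wins.
        return explicit_kind
    fn = props.get("fn")
    if fn is not None:
        # Assumption: properties are checked in table order.
        for prop, kind in INFERRED_KIND.items():
            if props.get(prop) == fn:
                return kind
    return "individual"

card = {
    "fn": "Los Angeles Division",
    "organization-name": "Counter-Terrorist Unit",
    "organization-unit": "Los Angeles Division",
}
print(infer_kind(card))
```

For the Counter-Terrorist Unit example above, "fn" shares its value with "organization-unit", so the sketch yields "group".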

hCalendar

The hCalendar specification is hopelessly incomplete. As a result, I have drafted hCalendar 1.1, and Cognition more-or-less supports that.

adr

The type property is parsed, even when an address is given outside an hCard.

An address may contain embedded geo microformats.

See also: geo.

geo

If a geo is missing its latitude or longitude, then the raw XML string for the entire element is searched for a match of the following regular expression, which matches two decimal numbers separated by a semicolon or comma:

/ \s* (\-?[0-9\.]+) \s* [\;\,] \s* (\-?[0-9\.]+) \s* /x

The first number is taken to be the latitude; the second, the longitude. This will allow the parsing of constructs like:

<a class="geo" href="http://maps.google.com/maps?q=50.8730,0.0005">home</a>
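As a rough illustration, the same fallback can be reproduced in Python. The pattern is a direct transcription of the regular expression above (`re.VERBOSE` plays the role of Perl's `/x` flag); the function name and the decision to return `None` on failure are assumptions made for this sketch.

```python
import re

# Transcription of the fallback pattern; whitespace in the pattern is ignored.
GEO_FALLBACK = re.compile(r"""
    \s* (-?[0-9.]+) \s* [;,] \s* (-?[0-9.]+) \s*
""", re.VERBOSE)

def fallback_lat_long(raw_xml):
    """Search the raw markup of a geo element for a 'lat,long' or 'lat;long' pair."""
    match = GEO_FALLBACK.search(raw_xml)
    if match is None:
        return None
    try:
        # First number is the latitude, second the longitude.
        return float(match.group(1)), float(match.group(2))
    except ValueError:
        # The pattern is loose enough to match strings such as "1.2.3".
        return None

markup = '<a class="geo" href="http://maps.google.com/maps?q=50.8730,0.0005">home</a>'
print(fallback_lat_long(markup))
```

Because the whole element is searched as a string, the co-ordinates are found even though they sit inside an attribute value rather than in the element's text.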

The following additional optional properties are supported:

body
The planet or astronomical body to which the co-ordinates apply. If not specified, then "Earth" is assumed. Names should be taken from the International Astronomical Union's Gazetteer of Planetary Nomenclature.
reference-frame
The co-ordinate system used. For Earth, the default co-ordinate system is "WGS84". Other appropriate values include "ETRS89" and "ITRF2005".
altitude
An altitude, above (negative values: below) sea level on Earth, or an agreed zero elevation on other planets. Units should be specified. (Currently there is no microformat for dealing with weights and measures.) When no unit is specified, metres are assumed.

Design Patterns

ABBR Pattern

Due to accessibility problems with the ABBR pattern, an alternative syntax is also supported: the title attribute may be used on non-ABBR elements, but only if the value is prefixed with the string "data:". Human-readable information may be included in the title, before the "data:" prefix. The data prefix and value following it may be wrapped in brackets. For example, the following are considered equivalent:

Brackets, braces and parentheses are considered bracketing characters. Left and right do not have to match.
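One possible reading of these rules, sketched in Python. The regular expression, the function name and the sample title strings are assumptions made for illustration; the exact syntax accepted by Cognition may differ.

```python
import re

# Assumed grammar: optional opening bracket, the literal "data:", the value,
# then an optional closing bracket at the very end of the title.
DATA_TITLE = re.compile(r"""
    [\[\(\{]?          # optional opening bracket, brace or parenthesis
    data:
    (?P<value>.*?)     # the machine-readable value
    [\]\)\}]?          # optional closer; it does not have to match the opener
    \s*$
""", re.VERBOSE)

def machine_value(title):
    """Extract the machine-readable part of a title attribute, or None."""
    match = DATA_TITLE.search(title)
    return match.group("value").strip() if match else None

for title in ("data:2008-07-06",
              "Yesterday [data:2008-07-06]",
              "Yesterday (data:2008-07-06}"):
    print(machine_value(title))
```

Under this reading all three sample titles carry the same value, while a title without the "data:" prefix yields nothing and would be treated as ordinary human-readable text.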

Experimental support is also included for the data-* class pattern, but to take advantage of it, publishers must include the profile URI.

Include Pattern

The proposed non-verbose class-based solution is supported in addition to the standard method.

Datetime Pattern

The datetime design pattern specifies two profiles (subsets) of ISO8601 datetime format for use in microformats. Cognition additionally supports other ISO8601 datetimes, as parsed by the DateTime::Format::ISO8601 Perl module.

As a last-ditch attempt, datetimes that cannot be parsed as above are passed to DateTime::Format::Natural and parsed as natural language dates. Ambiguous dates are assumed to be in the future (e.g. "Sunday" is taken to mean next Sunday rather than last Sunday) and in UTC. Authors should not rely on natural language parsing, as it is not particularly predictable.

Microformat Opacity

The author of Cognition is following the MFO effort with interest. Currently Cognition implements this algorithm to deal with nested and embedded microformats:

rel-license

Licences are found as specified. The term "license" is also identified using namespace-prefixed rel values within the DC (Dublin Core Metadata Initiative Terms) and CC (Creative Commons) namespaces. For example:

<a href="/licence.html" rel="CC:license">Licence</a>

The namespace must have been predeclared, using an xmlns:FOO attribute or RFC 2731 (though Cognition pre-declares the "DC" namespace).
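A minimal sketch of the prefix check in Python. The two namespace URIs are the commonly used ones for DCMI Terms and Creative Commons, but whether Cognition compares against exactly these URIs, and whether prefixes are matched case-sensitively, are assumptions; the function name and the `declared` mapping are hypothetical.

```python
DC_TERMS = "http://purl.org/dc/terms/"        # assumed DCMI Terms namespace URI
CC_NS = "http://creativecommons.org/ns#"      # assumed Creative Commons namespace URI

def is_license_rel(rel, declared):
    """declared: prefix -> namespace URI, as collected from xmlns:* attributes."""
    namespaces = {"DC": DC_TERMS}             # the "DC" prefix is pre-declared
    namespaces.update(declared)
    for token in rel.split():
        if token.lower() == "license":        # plain rel="license"
            return True
        prefix, sep, term = token.partition(":")
        if sep and term.lower() == "license" \
                and namespaces.get(prefix) in (DC_TERMS, CC_NS):
            return True
    return False

print(is_license_rel("CC:license", {"CC": CC_NS}))
print(is_license_rel("CC:license", {}))
```

The second call shows why the declaration matters: without a namespace bound to "CC", the prefixed value is not recognised as a licence link.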

rel-tag

The rel-tag specification should be fully supported as specified.

As an alternative to rel="tag", Cognition also supports class="tag". While rel values are defined as case-insensitive by the HTML 4.01 spec, classes are not, so lower-case must be used. class="tag" tags are parsed differently from rel="tag" tags, in that the link text (subject to the abbr pattern and value excerpting) is used instead of the final URL component, in order to accommodate alternative URL formats. When both rel="tag" and class="tag" are found on the same element, then the element is parsed using standard rel-tag rules. As with rel="tag", class="tag" must only be used on <a> and <area> elements.
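The difference between the two forms can be sketched as follows. This is a simplified illustration: the function name and signature are hypothetical, the abbr pattern and value excerpting are left out, and ignoring a trailing slash on the URL is an assumption.

```python
from urllib.parse import urlsplit, unquote

def tag_name(href, link_text, rel="", css_class=""):
    """Return the tag for an <a> or <area> element, or None if it is not a tag."""
    rels = rel.lower().split()       # rel values are case-insensitive
    classes = css_class.split()      # class names are case-sensitive
    if "tag" in rels:
        # Standard rel-tag rule: the final path component of the URL is the tag.
        path = urlsplit(href).path.rstrip("/")
        return unquote(path.rsplit("/", 1)[-1])
    if "tag" in classes:
        # class="tag": the link text is used, whatever the URL looks like.
        return link_text.strip()
    return None

print(tag_name("http://example.com/tags/Example", "anything", rel="tag"))
print(tag_name("http://example.com/search?q=foo", "Example", css_class="tag"))
```

Since the rel check comes first, an element carrying both rel="tag" and class="tag" is handled by the standard rel-tag rule.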

The following examples are all parsed as the tag "Example":

hAtom

hAtom is mostly implemented as per the hAtom 0.1 spec. In addition, there is support for zero or more rel-enclosure links within each hEntry, which is predicted to appear in the hAtom 0.2 spec. The nearest-in-parent algorithm for discovering authors for authorless entries is not fully implemented as it is predicted that this algorithm will be simplified for hAtom 0.2. Instead, the following algorithm is implemented:

  1. Look for the author in an element with class "author".
  2. If not found, look for an hCard within an <address> element found within the entry.
  3. If not found, look for an hCard within an <address> element found within the feed.
  4. Otherwise, no author has been specified.
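The four steps above might look like this in Python, using the standard library's ElementTree on well-formed XHTML. It is a sketch under assumptions, not the actual implementation: the helper names are invented, an <address> element that is itself the hCard root is accepted, and the feed-level search does not exclude <address> elements belonging to other entries.

```python
import xml.etree.ElementTree as ET

def has_class(el, name):
    return name in el.get("class", "").split()

def first_with_class(root, name):
    # iter() visits root itself first, then its descendants in document order.
    return next((el for el in root.iter() if has_class(el, name)), None)

def hcard_in_address(scope):
    for address in scope.iter("address"):
        card = first_with_class(address, "vcard")
        if card is not None:
            return card
    return None

def find_author(entry, feed):
    author = first_with_class(entry, "author")      # step 1
    if author is not None:
        return author
    for scope in (entry, feed):                     # steps 2 and 3
        card = hcard_in_address(scope)
        if card is not None:
            return card
    return None                                     # step 4

feed = ET.fromstring(
    '<div class="hfeed">'
    '<address class="vcard"><span class="fn">Feed Owner</span></address>'
    '<div class="hentry"><span class="author vcard">'
    '<span class="fn">Alice</span></span></div>'
    '<div class="hentry"><p>No author here</p></div>'
    '</div>')
entries = [el for el in feed.iter() if has_class(el, "hentry")]
for entry in entries:
    print("".join(find_author(entry, feed).itertext()))
```

The first entry resolves to its own author; the second falls through to the hCard in the feed's <address> element.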

hSlice is supported as a synonym for hEntry.

XFN

XFN is supported as per the spec and parsed into RDF using the guidelines that I published on the microformats wiki. This includes a procedure for working out the “representative hCard” for the page being parsed. The following rules are followed to determine the hCard:

  1. If a representative hCard has been explicitly declared using RDFa through a triple of <pageURI> <http://purl.org/uF/hCard/terms/representative> _:foo then that is taken to be the hCard;
  2. Otherwise if a foaf:primaryTopic exists for the page and the object represents a person, then that is the hCard;
  3. Otherwise, the first hCard with rel="me" specified on a link with class="url";
  4. Otherwise, the first hCard with a class="url" link back to the page being parsed;
  5. Otherwise, the first hCard on the page.
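The selection order can be expressed compactly in Python. The data model here (`HCard`, `UrlLink`) is hypothetical, and rules 1 and 2 are reduced to optional arguments that the caller is assumed to have resolved from the page's RDF beforehand.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UrlLink:
    href: str
    rel: List[str] = field(default_factory=list)

@dataclass
class HCard:
    name: str
    urls: List[UrlLink] = field(default_factory=list)   # links with class="url"

def representative_hcard(cards: List[HCard], page_uri: str,
                         declared: Optional[HCard] = None,
                         primary_topic: Optional[HCard] = None) -> Optional[HCard]:
    if declared is not None:            # rule 1: explicit RDFa declaration
        return declared
    if primary_topic is not None:       # rule 2: foaf:primaryTopic that is a person
        return primary_topic
    for card in cards:                  # rule 3: rel="me" on a class="url" link
        if any("me" in link.rel for link in card.urls):
            return card
    for card in cards:                  # rule 4: class="url" link back to the page
        if any(link.href == page_uri for link in card.urls):
            return card
    return cards[0] if cards else None  # rule 5: first hCard on the page

a = HCard("A", [UrlLink("http://other.example/")])
b = HCard("B", [UrlLink("http://page.example/")])
c = HCard("C", [UrlLink("http://c.example/", ["me"])])
print(representative_hcard([a, b, c], "http://page.example/").name)
```

Each rule is only consulted when all earlier rules have failed, so a rel="me" link outranks a link back to the page even if the latter appears first in the document.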

The rev attribute is properly supported, and inverse and symmetric relationships are fully understood. For example, if using rev="child", Cognition knows that this is the same as rel="parent".
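A small sketch of how rev values could be translated into equivalent rel values. The inverse and symmetric sets below follow the relationship descriptions in the XFN profile as far as I am aware; treating every other value as having no rel equivalent is an assumption, and the function name is hypothetical.

```python
# child/parent are inverses of each other.
INVERSE = {"child": "parent", "parent": "child"}

# Relationships described as symmetric in the XFN profile (assumed list).
SYMMETRIC = {"met", "co-worker", "colleague", "co-resident", "neighbor",
             "sibling", "spouse", "kin", "date", "sweetheart", "me"}

def rev_to_rel(rev):
    """Translate a rev attribute value into the equivalent list of rel values."""
    rels = []
    for token in rev.lower().split():
        if token in INVERSE:
            rels.append(INVERSE[token])
        elif token in SYMMETRIC:
            rels.append(token)
        # Other values (e.g. "muse", "crush") have no rel equivalent and are dropped.
    return rels

print(rev_to_rel("child"))
print(rev_to_rel("spouse muse"))
```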

Cognition has specific support for XFN 1.0 (i.e. it will ignore the new properties defined in XFN 1.1), but only if you explicitly include the XFN 1.0 profile URI in your document head. Cognition includes support for the XHTML Enemies Network 1.0 (XEN), but again, only if you include the profile URI.

xFolk & hReview

Support for xFolk entries was introduced in Cognition 0.1-α8; hReview followed in 0.1-α9. As of 0.1-α10, the parsers for both have been united: xFolk is treated as funny-looking hReview. As a consequence, xFolk entries may include additional classes from hReview, such as dtreviewed and reviewer.

species

Cognition 0.1-α11 includes experimental support for the species microformat using the root class name biota. As some taxonomic ranks are used differently by botanists and zoologists, you may use the additional class names botany and zoology to resolve any ambiguities. For example, <i class="biota zoology">...</i>.

Within the root element, the following singular properties are allowed for marking up the various taxonomic ranks. (If you're using a CSS-capable browser, you should see that core terms are in bold, zoology-only terms in red, and botany-only terms in green.) If the rank you want is not on the list, then use the generic (plural) class="rank" instead.

Further plural classes binomial and trinomial are supported for marking up the binomial or trinomial name, and common-name (a.k.a. vernacular, cname) for the common name of a species. The plural class authority is supported for marking up the classification authority.

For convenience, as many of these properties use such generic names (class, form, section, etc) you may prefix any of these classes with taxo or taxo-. For example, instead of class="tribe" you could equivalently use class="taxotribe" or class="taxo-tribe". This is not a namespacing mechanism, but a simple method for you to avoid clashes with class names.
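The prefix handling can be illustrated with a short Python sketch. The `KNOWN_RANKS` set is a partial, illustrative list built only from rank names mentioned in this document, and the function name is hypothetical.

```python
# Partial list of rank class names, for illustration only.
KNOWN_RANKS = {"class", "family", "form", "genus", "order", "section",
               "species", "subspecies", "tribe"}

def normalise_rank_class(name):
    """Map "tribe", "taxotribe" or "taxo-tribe" to "tribe"; return None otherwise."""
    for prefix in ("taxo-", "taxo"):
        stripped = name[len(prefix):]
        # Only strip the prefix when what remains is a recognised rank,
        # so that unrelated classes such as "taxonomy" are left alone.
        if name.startswith(prefix) and stripped in KNOWN_RANKS:
            return stripped
    return name if name in KNOWN_RANKS else None

for css_class in ("tribe", "taxotribe", "taxo-tribe", "taxonomy"):
    print(css_class, "->", normalise_rank_class(css_class))
```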

Lastly, as an optimisation, if none of the above properties are found within the root element, then the entire string contents of the root element are taken to be a binomial/trinomial name.

Two examples of species parsed by Cognition follow:

<span class="biota" lang="zxx">Homo sapiens</span>
	
<p class="biota zoology" lang="en">
  He is a <span class="common-name">human</span>, or as they say
  in Basque, a <span lang="eu" class="common-name">Gizakia</span>.
  What scientists would classify as a
  <i class="trinomial" lang="zxx">
    <span class="binomial">
      <span class="genus">Homo</span>
      <span class="species">sapiens</span>
    </span>
    <span class="subspecies">sapiens</span>
  </i>,
  a member of the <span class="family" lang="zxx">hominidae</span> family
  of <span lang="zxx" class="taxo-order">primates</span>.
</p>

[Note the use of lang="zxx" (language code for "no linguistic content") rather than lang="la" (language code for Latin). Despite the fact that these scientific terms are often called "Latin names", in reality they are often derived from a mixture of Latin, Greek, English and other sources — they are not usually even close to the Latin terms for the forms of life described. lang="zxx" is a better way of marking up these terms and also indicates that a translation of the terms should not be attempted.]

Nesting

Cognition applies special meaning to instances of the species microformat found nested within hCard and hCalendar events. It is strongly suggested that for either of these purposes, you should supply at least one of these properties (which are normally optional):

hCard

When class="biota" is found nested inside an hCard, then it is implied that the person/thing described by the hCard is a member of the species.

hCalendar Events

When class="biota attendee" is found within an hCalendar event, at least one member of the species described is taken to have been present at the event. Combined with location/geo and dtstart this is roughly equivalent to a "sighting" of the species.

The above combination with attendee may cause problems with some naive parsers, especially ones with no support for the species microformat. Because of this, an alternative syntax is supported to avoid triggering bugs: class="biota x-sighting-of"

Additionally, to specify a sighting of a species, class="vcard attendee" may be used in conjunction with the hCard nesting described above to record additional information such as the name or date of birth of the creature sighted. For example:

<p class="vevent">
  <abbr class="dtstart" title="20080706">Yesterday</abbr> I saw a
  <span class="attendee vcard">
    <span class="biota"><span class="common-name">goat</span></span>
    called <span class="fn">Steve</span>
  </span>
</p>

Example iCalendar output:

BEGIN:VEVENT
DTSTART:20080706T000000Z
X-SIGHTING-OF:goat
ATTENDEE;CN=Steve;CUTYPE=INDIVIDUAL;VALUE=TEXT:Steve
END:VEVENT
Toby Inkster
http://tobyinkster.co.uk
Last modified: 2008-07-24
