Version History
===============

cognition/0.1-alpha1 (2008-02-15) :-

  * initial release
  * metadata: , , , @role, eRDF
  * eRDF does not support rdf:type syntax
  * RFC 2731 is supported for namespaces
  * microformats: hcard, hcalendar, adr, geo
    - hcalendar support assumes page is one giant calendar
    - no support for rel-tag, so no support for categories in hcard or
      hcalendar
    - geo support includes body, altitude and reference-frame extensions
    - microformats patterns: include-pattern, abbr-pattern, extensions
      + include-pattern supports my alternative syntax
      + abbr-pattern supports Andy Mabbett's alternative
  * RDF output of namespaced metadata

cognition/0.1-alpha2 (2008-02-20) :-

  * drop usage of XML::XPath module, using XML::DOM instead
    - might use XML::DOM::XPath in future if XPath support is needed
  * support XML namespaces used as metadata namespaces.
  * microformats: hcalendar (complete), rel-tag, rel-license, figure, xoxo
    - rel-license extended to support searches for 'license' in CC or
      DCTERMS namespaces; or 'rights.license' in DC or DCTERMS namespaces
    - experimental figure microformat based on current brainstorming
  * parse document structure (headings + semantic tables + semantic
    images/figures microformat? + xoxo lists)

cognition/0.1-alpha2.1 (2008-02-21) :-

  * Fix handling for entities.
  * Fix delay on LWP::RobotUA.

cognition/0.1-alpha3 (2008-03-01) :-

  * Switch from XML::DOM to XML::LibXML. Should be my last big parser
    change!
  * Restructure object to be more tuple-like.
  * URLs:
    - Support for CURIEs.
    - support for geo: and tag: URIs
    - use XPointer to provide URLs for document fragments without
      identifiers
  * RDF:
    - use <rdf:Bag> to wrap multiple tuples with the same subject and
      property
    - Remove duplicate values within bags
    - add support for microformats to RDF output
    - RDF subjects may have multiple URIs defined to help match up
      properties that actually belong to the same subject (e.g.
      some properties might be attached to a fragment identifier, and
      others to an hcard, but if we know that the hcard root element has
      an id attribute which matches the fragment identifier, then we can
      equate the subjects)
    - support "vocabularies" for RDF
    - convert document structure to RDF
      <http://purl.org/dc/terms/hasPart>,
      <http://purl.org/dc/terms/isPartOf>.
  * Improve STRINGIFY to prevent all these leading and trailing spaces
  * Recognise (X)HTML predefined link types and put them in XHTML
    namespace.
  * More reliable support for namespaces.
  * Microformats:
    - Properly parse DateTimes found in microformats.
    - support table cell header pattern
    - support hcalendar 1.1 draft
  * Complete support for RDFa
  * Much improved support for eRDF, support rdf:type. Any bugs?
  * Improved support for XHTML role attribute

cognition/0.1-alpha4 (2008-03-07) :-

  * Support rel=meta: retrieve additional document metadata, parse as RDF
  * GRDDL:
    - Beginnings of GRDDL support.
    - Support for rel=transformation linking to XSLT to transform doc to
      RDF
    - Support for grddl:transformation="" style transformations.
    - No support for <head profile> yet.
  * Microformats:
    - Table cell header pattern has been changed on wiki. Implement
      changes.
    - Better microformat nesting handling.
  * Improvements in charset handling and support for tag-soup HTML.
  * Comment out pre-RDFa <link rel>, <a rel> support. It's not really
    useful.
  * Disable eRDF by default as it seems to generate too many false
    positives.

cognition/0.1-alpha5 (2008-03-16) :-

  * Various minor improvements to hCard and hCalendar parsing.
  * Export framework
    - Add vCard export option.
      + Parses data: URIs and outputs as base64 embedded data.
      + Pulls in data from full gamut of supported semantics, so that,
        say, RDFa FOAF data may end up as part of the vCard output.
      + Test input: <http://examples.tobyinkster.co.uk/hcard>.
    - Add KML export option.
      + Data can come from hCard, (e)RDF(a) vCard, (e)RDF(a) GeoRSS, etc.
  * Re-enabled eRDF by default, but eRDF parsing is now stricter. It
    *requires* a profile of <http://purl.org/NET/erdf/profile> to be found
    on the <head> element.
  * Improved command-line client. Use Getopt::Long, Pod::Usage.
  * Support RDF embedded in HTML <!-- comments -->. (Trackback uses this.)

cognition/0.1-alpha6 (2008-03-29) :-

  * Microformats:
    - Add option (disabled by default) to require <head profile> for
      microformat support. Microformat profiles are treated as OPAQUE
      STRINGS! Supports the following profiles:
      + http://purl.org/uF/2008/03/
      + http://www.w3.org/2006/03/hcard or http://purl.org/uF/hCard/1.0/
      + http://dannyayers.com/microformats/hcalendar-profile or
        http://purl.org/uF/hCalendar/1.0/
      + http://purl.org/uF/hAtom/0.1/
      + http://purl.org/uF/rel-tag/1.0/
      + http://purl.org/uF/rel-license/1.0/
      + No profiles required for rel-enclosure, adr or geo (yet).
    - Support for hAtom, WebSlices.
      + In addition to hAtom 0.1, rel-enclosure is supported within
        hEntries.
    - Improve include-pattern support to prevent some infinite loops.
  * GRDDL:
    - Add option (disabled by default) to require <head profile> for
      GRDDL.
    - Add option to check profile URLs for profileTransformation links.
  * Export:
    - Atom output. (Supports RDF/RSS and hAtom as input.)
    - iCalendar export option.
      + hCalendar 1.1 events.
      + hCalendar 1.1 todo items
      + hCalendar 1.1 freebusy info.
      + hCalendar 1.1 alarms.
      + hAtom entries (as VJOURNAL).
      + W3C's iCal RDF vocab (but see note in
        Cognition/Export/Calendar.pm)
      + RSS Event Module <http://web.resource.org/rss/1.0/modules/event/>
  * Added a "--nofollow" option to prevent secondary fetching from
    particular hosts. (Secondary fetching = requesting <head profile>,
    <link rel="meta">, <link rel="transformation">.)
  * Support <rdf:RDF> elements found directly in (X)HTML.
  * Much improved HTML->Text conversion.
    Namely: word wrapping, line breaks added after block elements, quote
    marks around <q> elements, bullet points and numbers before <li>
    elements in unordered and ordered lists, brackets around superscript
    text, parentheses around subscripts, tab characters between table
    cells, usenet-style quoting for <blockquote>, alt text from <img> and
    <input type="img">, values from other <input> tags. Should be able to
    handle nested elements like //ul/li/ol/li/dl/dd/blockquote/img[@alt].
    Won't be completely foolproof, but should be an improvement over what
    was there before!
  * Fix so that the entire page is not given a rdf:type of ical:vcalendar
    unless it contains some bona fide vevent/vtodo/valarm/vfreebusy nodes.

cognition/0.1-alpha7 (2008-04-21) :-

  * Set '_xmllang' attribute on all elements, a la '_xpath'.
  * Microformats:
    - hCard:
      + Rename date-of-death "dday", and implement other properties from
        vCard 4.0 draft
        <http://www.ietf.org/internet-drafts/draft-resnick-vcarddav-vcardrev-01.txt>.
      + Empty TEL, EMAIL and IMPP no longer parsed. (e.g. telephone
        numbers with usages but no actual number.)
      + Automatically detect the representative hCard and contact hCard.
        <http://microformats.org/wiki/representative-hcard>
    - hCalendar:
      + support rel="vcalendar-(parent|sibling|child)" and
        class="related-to".
      + support implicit relationships gleaned from nesting.
      + Explicitly set RDF datatype for integers.
      + Better support for vfreebusys.
      + @title on root element parsed as dc:title.
      + Support x-wr-calname/x-wr-caldesc/calscale/prodid/method.
    - XFN: <http://microformats.org/wiki/xfn-to-foaf>.
  * Exports:
    - Cognition::Export::findSubject - I won't go into an explanation of
      why this is important, but it is.
    - jCard export.
    - vCard improvements:
      + Set TYPE parameter when ENCODING=b.
      + Output vCard 4.0 properties. Detect instant messaging protocols
        which have been forced into the URLs and output them as IMPP
        properties.
    - iCalendar improvements:
      + Set TYPE parameter when ENCODING=b.
      + Add RELATED-TO properties.
      + Support X-WR-CALDESC/CALSCALE/PRODID/METHOD/VERSION.
      + Big improvements for ATTENDEE/CONTACT/ORGANIZER.
    - RDF output no longer handled by HTMLParser -- it is in an Export
      module:
      + Output RDF datatypes (e.g.
        <http://www.w3.org/2001/XMLSchema#date>).
      + Output xml:lang where we can.
      + s/rdf:Description/FOO/ where FOO is the rdf:type.
      + Improved output for rdf:XMLLiterals.
      + Instead of <foo:bar rdf:nodeID="X">, nest the RDF description for
        X.
    - RDF JSON <http://n2.talis.com/wiki/RDF_JSON_Specification> export.
  * RDFa:
    - RDFa DTD has s/instanceof/typeof/. Cognition supports both (for
      now), but prefers @typeof. Fixed this attribute to allow
      whitespace-delimited list of (CURIE|URI)s.
    - In accordance with RDFa rules, drop resolution of absolute URIs from
      relative URIs specified in @xmlns. This actually makes parsing
      dumber, but it's in the recommended algorithm.
    - Improved parsing of rdf:XMLLiterals.
    - Extension to RDFa: @title parsed as rdfs:label.
  * When parsing and outputting dates, retain "resolution".
  * Create a data type Cognition::MagicString used in place of strings in
    many places which retains the language and XML representation of a
    string. MagicString-aware code can then pick up this data and use it
    if required. non-MagicString-aware code should usually be able to
    treat the MagicString as if it were a string, and not notice any
    difference, as MagicString overloads the stringify function.
  * More improvements to STRINGIFY:
    - Better algorithm for inserting whitespace between CDATA and inline
      element nodes. Should prevent words from accidentally running
      together.
    - Implement @start and @type for lists. For unordered lists, disc
      markers are implemented as asterisks, circle markers as hyphens, and
      square markers as plus signs. (Much like the markers used in this
      ChangeLog.) For ordered lists, roman numeral markers work up to
      3999, and alphabetical markers up to 26 -- after that, the list will
      revert to numeric markers.
    - Better support for microformats "value excerpting".
    - Stringify now takes care of value excerpting and the ABBR pattern.
  * Better HTML->XHTML conversion routine.
  * Better framework for namespaces. Old system didn't handle scoped
    namespaces (e.g. xmlns attribute on a non-root element).
  * Introduce a BNode concept into the Cognition RDF model. Stored in the
    RDF triple store with dummy URIs like <bnode:///string>. This pretty
    much eliminates those ugly XPointers which littered the RDF output
    previously. As a deliberate change, <div class="vcard vcalendar"> will
    now result in two different RDF subjects, however they can be united
    into one subject by giving that node an ID attribute (because then
    they have proper URIs, not node IDs).
    - Adjust "->uri" methods for microformats.
    - Adjust RDFa parser to create BNodes instead of #fakeid URIs.
    - Adjust RDF export to use rdf:nodeID instead of
      rdf:resource/rdf:about.
  * Document structure parsing was disabled in alpha4 as it made the RDF
    output ugly. Because of improvements in RDF output, and ability to use
    BNodes, it is now re-enabled by default without uglying everything up.
    It can still be disabled via options.

cognition/0.1-alpha8 (2008-05-04) :-

  * Microformats:
    - XFN:
      + Fix XFN rel values to match case-insensitively.
      + Smarter support for "mailto:", "urn:sha1:" and pictorial link
        targets.
    - hCalendar:
      + Fix Cognition::uF::hFreebusy::fb::uri to issue BNodes instead of
        XPointers.
      + Modify rdf:type URIs s/^([a-z])/uc($1)/ which is more
        best-practicey.
      + Fix bug with documents being given rdf:type of ical:Vcalendar,
        even if they do not use hCalendar.
    - hCard:
      + Modify rdf:type URIs s/^([a-z])/uc($1)/ which is more
        best-practicey.
      + Implement Andy Mabbett's suggestion allowing the "fn" class to be
        attached to address sub-properties, thus allowing hCards to easily
        represent places rather than organisations or people.
    - xFolk: introduce support for this microformat.
      Using a similar internal representation to the model used by Digg's
      new RDFa -- i.e. dc:source, dc:title and dc:abstract. Perhaps should
      extend xFolk to allow for dc:date and dc:creator?
    - Rel-Tag: restructured RDF output to mostly use Dublin Core.
    - figure:
      + Improvements to title/legend minimisation.
      + Restructured RDF output to use Dublin Core and FOAF.
    - geo: parse <meta name="ICBM"> as if it were an instance of geo.
  * Exports:
    - Corrections to support for both of the W3C RDF vocabs, and also the
      W3C iCalendar vocab.
  * Fix white space trimming bug in STRINGIFY.
  * Fix contact exporters to use foaf:name when no better name is
    available.
  * Support for COinS <http://ocoins.info/>, including obsolete
    rel="Z3988".

cognition/0.1-alpha9 (2008-06-01) :-

  * Introduce (optional) client/server model for Cognition. cognitiond.pl
    runs in the background; cognition.pl attempts to connect to it, asks
    the daemon to parse the URL, consumes the result and returns it. In
    many cases this significantly speeds up results. By default
    cognition.pl looks for a server using TCP on localhost:26464, but
    --host, --port and --proto parameters may be used to configure a
    different daemon to connect to. cognitiond.pl will look at
    /etc/cognition/cognitiond.conf to read its options. See sample config
    file.
  * Parsing improvements:
    - Improvements to white space handling.
    - Improvements to oddball ISO date formats such as 2 digit years,
      missing years, dates specified by week number or by ordinal day
      number.
  * Exports:
    - vCard:
      + Multiple vCard output now returns hCard contacts in same order as
        encountered on the page.
      + Cope better with more structured names.
    - jCard:
      + Multiple jCard output now returns hCard contacts in same order as
        encountered on the page.
      + Cope better with more structured names.
    - iCalendar:
      + Add VCARDURL parameter support for CONTACT, ORGANIZER and ATTENDEE
        properties, as described in this draft spec:
        <http://xml.coverpages.org/draft-royer-ical-vcard-01.txt>
      + Datetime fixes: convert to UTC and format correctly.
  * Microformats:
    - Implement support for hReview.
    - Rewrote support for N (structured names) in hCard parser to create
      vcard:N objects to wrap vcard:given-name, etc.
    - Allow explicit plus signs in geo microformat.

cognition/0.1-alpha10 (2008-06-27) :-

  * Completely rewritten document structure parsing, using HTML 5 outlines
    algorithm
    <http://www.whatwg.org/specs/web-apps/current-work/#outlines> as a
    guide. Thanks to Ryan King and Geoffrey Sneddon for pointing me
    towards this algorithm. I also used Geoffrey's python implementation
    as a crib sheet to help me figure out what was supposed to happen when
    the HTML 5 spec was ambiguous.
    <http://hg.gsnedders.com/spec-gen/file/tip/specGen/processes/outliner.py>
  * Microformats:
    - rel-tag:
      + Support for class="tag".
      + Internal representation now uses Richard Newman's RDF Tag
        ontology. <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
    - XFN:
      + Explicit XFN 1.0 support. If you give an explicit profile URI
        pointing to the XFN 1.0 profile, but not to the XFN 1.1 profile,
        then newer XFN terms such as 'me', 'kin' and 'contact' are
        ignored. (But rel="me" is still used for determining the
        representative hCard of a page.)
    - hCard:
      + Support for fax: and modem: URIs.
      + Support "type"/"value" subproperties for "label" properties.
    - hCalendar:
      + Support for XOXO vtodo-list optimisation. Very nifty.
    - Experimental support for data-X classes.
      <http://purl.org/uF/pattern-data-class/1>
    - xFolk:
      + Merged support for xFolk into hReview. xFolk.pm is gone now.
        <http://buzzword.org.uk/cognition/uf-plus.html#xfolk-hreview>
    - hReview:
      + Support "xfolkentry" as an alias for "hreview".
      + Support "taggedlink" as an alias for "item".
      + Allow multiple instances of class "description".
  * Exports:
    - Special support for rdf:value, such that if an export module is
      looking for a literal value, but finds a resource which itself has
      an rdf:value literal, it will use that literal. Indeed, it is
      capable of drilling down through rdf:value properties several layers
      deep. e.g. the following RDFa can be successfully exported as vCard:

        <div typeof="foaf:Person">
          <div rel="foaf:name">
            <p rel="rdf:value">
              <b property="rdf:value">Toby Inkster</b>
            </p>
          </div>
        </div>

    - vCard: add support for vCard 4.0 "RELATED" property. XFN, foaf:knows
      and the RDF relationship vocab <http://vocab.org/relationship/> can
      all be used to supply the data.
  * Cognition understands rdfs:subPropertyOf, and will make use of a list
    of any rdfs:subPropertyOf relationships found in
    "~/.cognition/subPropertyOf.rdf". (It will also take heed of any such
    relationships found parsing the page, but won't go looking for them
    specially.) That is, if Cognition is outputting a vCard, so is looking
    for a foaf:name for a person, and you have stated that custom:moniker
    is an rdfs:subPropertyOf of foaf:name, and this person has a
    custom:moniker property defined, then the custom:moniker property is
    used. (Note: this was a lot more work than it should be. I'm on the
    lookout for a third-party triple store that can take the headache out
    of this sort of thing for me.)

cognition/0.1-alpha11 (2008-07-24) :-

  * Microformats:
    - Improved and more consistent parsing. A lot of parsing code that was
      repeated between the different microformat modules has been moved to
      Cognition::uF::simple_parse(). It includes better support for
      embedded microformats like:

        <div class="vcard">
          <div class="agent">
            <p class="vcard"></p>
          </div>
        </div>

      and proper support for ISO 8601 durations (not just treated as
      strings).
    - hResume
      + Add support for this draft <http://microformats.org/wiki/hResume>.
      + Mostly uses DOAC <http://ramonantonio.net/doac/0.1/doac.rdfs> to
        map to RDF.
      + LanguageSkills can be specified as
        ".hresume .contact.vcard .lang".
      + "affiliation" translated to vCard 4.0 draft "MEMBER" property.
    - hAudio:
      + Add support for this draft <http://microformats.org/wiki/hAudio>.
    - hMeasure / hMoney:
      + Add support for this draft <http://microformats.org/wiki/measure>.
      + Units currently treated as an opaque string, though I do have some
        experimental unit-conversion code that I may include in a future
        release of Cognition.
      + Nest within an hCard or hCalendar event to associate the
        measurement with that contact/event.
    - species:
      + Add experimental support for this proposed microformat.
      + Use the "biota" class to mark up a binomial/trinomial, plus
        (optionally) other taxonomic data.
      + Nest within an hCard to mark up the species of the hCard's owner.
      + Include class="attendee biota" within an hCalendar event to mark
        up a sighting of a member of the species.
    - XFN:
      + Refinements to implied foaf:knows. e.g. if Alice is Bob's parent,
        it is not necessarily implied that Alice and Bob know each other.
        For just a handful of relationships (e.g. friend, spouse, etc),
        foaf:knows is still implied.
      + Implements the XHTML Enemies Network (XEN). It's a spoof, but some
        people may find it useful. XEN relationships are only processed on
        pages that include the profile URI <http://xen.adactio.com/>.
    - figure:
      + Support rel-tag and rel-license nested inside figures.
    - hCard:
      + Make "lang" plural.
      + Support vCard 4.0 "member" property - either contains a nested
        hCard or a URI.
  * Exports:
    - vCard: keep up with improvements to hCard.
    - jCard: keep up with improvements to hCard.
  * DateTime parsing:
    - General datetime parsing improvements - I've bundled the Perl
      DateTime::Format::ISO8601 module within the Cognition distribution,
      renaming it to Cognition::DTParse. It includes several modifications
      to make it more tolerant, especially in the case of timezone
      handling and dealing with whitespace.
    - Support HTML 5 <time> element.
    - In conjunction with the smarter microformat parsing mentioned above,
      the STRINGIFY function now knows when the property it's reading is
      supposed to be a datetime and can tailor its behaviour accordingly.
      In particular it will attempt to read values from the "datetime"
      attribute if it exists. This allows, in hCalendar:

        <time class="dtstart" datetime="2008-07-24">Thursday</time>

      and also:

        <span class="dtstart">
          <time class="value" datetime="2008-07-24">Thursday</time>
          at <time class="value" datetime="21:00:00">9pm</time>
          <time class="value" datetime="+0100">(UK)</time>
        </span>

      Note that <time> is not the only HTML element that supports a
      "datetime" attribute. The following might be useful in hCard:

        <ins class="tel rev" datetime="2008-07-24T21:00:00">
          My new <span class="type">home</span> phone number is
          <span class="value">01632 960 123</span>
        </ins>
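
As a rough illustration of the "datetime" attribute handling described in
the alpha11 entry, here is a minimal Python sketch. It is not Cognition's
code (Cognition is written in Perl), and the names DatetimeCollector and
assemble_datetime are hypothetical. The joining rule (date, then "T", then
time and zone) is an assumption: the ChangeLog does not say how STRINGIFY
combines the class="value" parts.

```python
import re
from html.parser import HTMLParser


class DatetimeCollector(HTMLParser):
    """Collect "datetime" attributes from a property's markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []    # datetime attributes on class="value" elements
        self.whole = None  # first datetime attribute on any other element

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        value = (attr.get("datetime") or "").strip()
        if not value:
            return
        classes = (attr.get("class") or "").split()
        if "value" in classes:
            self.parts.append(value)
        elif self.whole is None:
            self.whole = value


def assemble_datetime(fragment):
    """Return one ISO 8601 string for the fragment, or None if none is found.

    Assumption: value parts are sorted into date, time and zone by shape
    and joined as DATE + "T" + TIME + ZONE.
    """
    collector = DatetimeCollector()
    collector.feed(fragment)
    collector.close()
    if not collector.parts:
        # No value excerpting in use: fall back to a plain datetime attribute.
        return collector.whole
    date = time = zone = ""
    for part in collector.parts:
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", part):
            date = part
        elif re.fullmatch(r"\d{2}:\d{2}(:\d{2})?", part):
            time = part
        elif re.fullmatch(r"Z|[+-]\d{2}:?\d{2}", part):
            zone = part
    if not date:
        return None
    return date + ("T" + time + zone if time else "")
```

Under these assumptions, feeding the two hCalendar examples above to
assemble_datetime() gives "2008-07-24" for the first and
"2008-07-24T21:00:00+0100" for the second.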