Swignition Input

Although the primary purpose of Swignition is to parse HTML, it accepts input in various formats. Its behaviour depends on the Content-Type HTTP header.

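Some media types in the table below carry a marker. A marked type is only handled in that mode under the following condition:
  * with the string xmlns:rdf in the first kilobyte of the file
  † where the file starts with one of the characters _, < or #, optionally preceded by white space
  ‡ with the string <rss or http://www.w3.org/2005/Atom or http://purl.org/atom/ns in the first kilobyte of the file
  ¶ where the root element is <TriX>
  § if it looks like RDF/JSON

Each entry gives the parsing mode and the Perl module responsible, followed by the Content-Type values that select it.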
HTML Swignition::HtmlParser
  • text/html
  • application/xhtml+xml
  • application/vnd.wap.xhtml+xml
RDF/XML Swignition::RdfXmlParser
(wrapper for Redland)
  • application/rdf+xml
  • text/rdf
  • application/xml *
  • text/xml *
Turtle / N-Triples Swignition::TurtleParser
(wrapper for Redland)
  • application/turtle
  • application/x-turtle
  • application/rdf+turtle
  • text/turtle
  • text/plain †
Notation 3 Swignition::Notation3Parser
(wrapper for Redland)
  • text/n3
  • text/x-n3
  • text/rdf+n3
TriG Swignition::TrigParser
(wrapper for Redland)
  • application/x-trig
Feeds Swignition::FeedParser
(wrapper for Redland)
  • application/rss+xml
  • application/atom+xml
  • application/xml ‡
  • text/xml ‡
TriX Swignition::PoxParser
  • application/xml ¶
  • text/xml ¶
  • Anything ending +xml ¶
XML (generic)
  • application/xml
  • text/xml
  • Anything ending +xml
RDF/JSON Swignition::JsonParser
  • application/json §
JSON (generic)
  • application/json
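
The table can be read as a dispatch on the media type, with content sniffing for the marked entries. The following Python sketch is illustrative only and is not Swignition's Perl code: the function names, the precedence between the sniffing checks (taken from the table's row order), and the "unknown" fallback are all assumptions. It follows the table's "+xml" suffix rule for generic XML.

```python
import json
import re
import xml.etree.ElementTree as ET

HTML = {"text/html", "application/xhtml+xml", "application/vnd.wap.xhtml+xml"}
RDFXML = {"application/rdf+xml", "text/rdf"}
TURTLE = {"application/turtle", "application/x-turtle",
          "application/rdf+turtle", "text/turtle"}
N3 = {"text/n3", "text/x-n3", "text/rdf+n3"}
FEEDS = {"application/rss+xml", "application/atom+xml"}
FEED_HINTS = ("<rss", "http://www.w3.org/2005/Atom", "http://purl.org/atom/ns")


def root_is_trix(body):
    """Marker ¶: the root element is <TriX> (namespace ignored here)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag.rsplit("}", 1)[-1] == "TriX"


def looks_like_rdf_json(body):
    """Marker §: every key of the JSON root object contains a colon."""
    try:
        root = json.loads(body)
    except ValueError:
        return False
    return isinstance(root, dict) and bool(root) and all(":" in k for k in root)


def parsing_mode(content_type, body):
    """Return the parsing mode name from the table (hypothetical helper)."""
    mime = content_type.split(";", 1)[0].strip().lower()
    head = body[:1024]  # "first kilobyte of the file"
    if mime in HTML:
        return "HTML"
    if mime in RDFXML:
        return "RDF/XML"
    if mime in TURTLE:
        return "Turtle / N-Triples"
    if mime in N3:
        return "Notation 3"
    if mime == "application/x-trig":
        return "TriG"
    if mime in FEEDS:
        return "Feeds"
    generic_xml = mime in ("application/xml", "text/xml")
    if generic_xml and "xmlns:rdf" in head:               # marker *
        return "RDF/XML"
    if generic_xml and any(h in head for h in FEED_HINTS):  # marker ‡
        return "Feeds"
    if generic_xml or mime.endswith("+xml"):
        return "TriX" if root_is_trix(body) else "XML (generic)"
    if mime == "text/plain" and re.match(r"\s*[_<#]", body):  # marker †
        return "Turtle / N-Triples"
    if mime == "application/json":
        return "RDF/JSON" if looks_like_rdf_json(body) else "JSON (generic)"
    return "unknown"  # assumption: the source does not describe a fallback
```

For instance, `parsing_mode("text/xml", '<rss version="2.0"><channel/></rss>')` selects the Feeds mode because of the `<rss` string near the start of the body.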


HTML

Swignition is able to parse XHTML and HTML; indeed, this is its primary purpose. It can understand a wide variety of Microformats and other POSH formats (built-in, non-GRDDL support). It also recognises various ways of embedding RDF within HTML, and of course it understands HTML's built-in features for metadata and document structure.

RDF Serialisations (RDF/XML, Notation3, Turtle, N-Triples & TriG)

RDF/XML, Turtle and N-Triples are parsed, as is TriG if you've got a recent enough version of Redland. Notation3 parsing is attempted, but doesn't always work.

Feeds (RSS & Atom)

Swignition uses Redland to convert "tag soup" RSS or Atom into strict RSS 1.0, and is thus able to read them as RDF.

Item descriptions in RSS feeds are treated as HTML (if they look like they're a bit more than plain text) and inspected for Microformats and for RDFa. Entry summaries in Atom feeds are treated similarly if they are explicitly marked as type="html".


XML

Any media type containing the string xml, except those noted above, is recognised as XML.

If the root element's tag name is <TriX>, Swignition will parse the file as TriX. Before parsing, Swignition takes notice of <?xml-stylesheet ?> processing instructions and runs the file through any XSLT 1.0 transformations, in the order they're encountered. Swignition is able to handle multiple TriX graphs: it merges them into one graph, and does so properly.
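
As a rough illustration of the two checks described above, the Python sketch below reports whether the root element is <TriX> and lists the stylesheet references in document order. It is not Swignition's own code: the function name `inspect_xml` and the regular expression for the href pseudo-attribute are assumptions. Actually applying the XSLT transformations would need a third-party XSLT processor and is not shown.

```python
import re
from xml.dom import Node, minidom

# Matches href="..." or href='...' inside the processing instruction's data.
HREF = re.compile(r"""href\s*=\s*(["'])(.*?)\1""")


def inspect_xml(text):
    """Return (root_is_trix, stylesheet_hrefs) for an XML document string."""
    doc = minidom.parseString(text)
    hrefs = []
    # xml-stylesheet processing instructions sit in the prolog, i.e. they
    # are direct children of the document node, kept in document order.
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            match = HREF.search(node.data)
            if match:
                hrefs.append(match.group(2))
    root = doc.documentElement
    return (root.localName or root.tagName) == "TriX", hrefs
```

A caller could apply the stylesheets in the returned order and parse the result as TriX only when the first value is true.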

For non-TriX XML files, Swignition attempts to parse them using GRDDL.


JSON

Swignition understands RDF/JSON, but will only parse JSON this way if it "seems to be RDF/JSON", which it checks by looking for colons in the key strings of the JSON root object: every key needs to contain a colon. Alternatively, you may explicitly indicate that the file is RDF/JSON rather than generic JSON by including a link to the RDF/JSON schema. For example:

  "$schema" : { "$ref" : "http://SOAPjr.org/schemas/RDF_JSON" } ,
  "http://example.org/about" : 
      { "type" : "literal" , "value" : "Anna's Homepage." }
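
The detection rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Swignition's implementation: the function name `is_rdf_json` is hypothetical, the schema link is matched by exact string comparison, and an empty root object is treated as generic JSON.

```python
import json

RDF_JSON_SCHEMA = "http://SOAPjr.org/schemas/RDF_JSON"


def is_rdf_json(text):
    """Decide whether a JSON document should be parsed as RDF/JSON."""
    root = json.loads(text)
    if not isinstance(root, dict):
        return False
    # Explicit indication: a "$schema" member pointing at the RDF/JSON schema.
    schema = root.get("$schema")
    if isinstance(schema, dict) and schema.get("$ref") == RDF_JSON_SCHEMA:
        return True
    # Heuristic: every key of the root object must contain a colon.
    # Assumption: an empty object does not count as RDF/JSON.
    return bool(root) and all(":" in key for key in root)
```

Note that a "$schema" key contains no colon itself, so a document that carries it would fail the colon heuristic; in this sketch the explicit link is therefore checked first.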

Swignition also understands jsonGRDDL to extract data from generic JSON.