signposting

Finding signposting in FAIR resources.

This library helps clients discover links that follow the FAIR signposting conventions.

This can then be used to navigate between:

  • Persistent identifiers

  • HTML landing pages

  • File downloads/items

  • Structured metadata

The library works by inspecting the HTTP messages for Link: headers from a given URI with find_signposting_http(), which categorizes them by their rel link relation into a Signposting object with absolute URIs.
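
The categorization step described above can be illustrated with a minimal sketch that uses only the Python standard library, not this library's own parser. The function name group_links_by_rel, the sample header and the example.org URLs are assumptions made for illustration, and the regular expressions only cover simple, well-formed Link header values.

```python
import re
from collections import defaultdict
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ;name=value parameters
LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
# One parameter, with either a quoted or a bare value
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def group_links_by_rel(header_value, base):
    """Group link targets by rel, resolving each target against base."""
    grouped = defaultdict(list)
    for target, params in LINK.findall(header_value):
        attrs = {name.lower(): quoted or bare
                 for name, quoted, bare in PARAM.findall(params)}
        # A single link may carry several space-separated relations
        for rel in attrs.get("rel", "").split():
            grouped[rel.lower()].append(urljoin(base, target))
    return dict(grouped)


# Hypothetical Link header value (without the "Link:" prefix)
header = ('<https://doi.org/10.1234/example>; rel="cite-as", '
          '</meta.ttl>; rel="describedby"; type="text/turtle", '
          '</data.zip>; rel=item')
links = group_links_by_rel(header, "https://example.org/page/1")
```

Here links maps each relation to a list of absolute URIs, which mirrors the idea of a Signposting object holding absolute targets per relation.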

It is up to the clients of this library to decide how to further navigate or retrieve the associated resources, e.g. using an RDF library like rdflib.

This library also provides ways to discover FAIR signposting in HTML <link> annotations and in Link set documents. Future versions may provide ways to discover/merge these concurrently.

class signposting.AbsoluteURI(value: str, base: Optional[str] = None)[source]

An absolute URI, e.g. “http://example.com/”

Create and validate an absolute URI reference.

Note that IRIs are not supported unless %-encoded.

Parameters:
  • value – URI string to validate as Absolute URI. May be a relative URI reference if base is provided.

  • base – (Optional) URI used to resolve the potentially relative URI reference, otherwise value must be an absolute URI.

Raises:

ValueError – if the final URI reference is invalid or not absolute.
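
The validation rules above (resolve against an optional base, reject anything that is not absolute, reject unencoded IRIs) can be sketched with the standard library. This is not the AbsoluteURI implementation; the function name resolve_absolute is hypothetical and the checks are deliberately simpler than full RFC 3986 validation.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_absolute(value: str, base: Optional[str] = None) -> str:
    """Return value as an absolute URI string, or raise ValueError."""
    # A relative reference is only acceptable when a base is given
    uri = urljoin(base, value) if base is not None else value
    if not uri.isascii():
        # IRIs must be %-encoded before they are accepted
        raise ValueError(f"Non-ASCII characters in URI: {uri!r}")
    if not urlsplit(uri).scheme:
        raise ValueError(f"Not an absolute URI: {uri!r}")
    return uri


resolved = resolve_absolute("../data.csv", base="http://example.com/dir/page.html")
```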

class signposting.LinkRel(value)[source]

A link relation as used in Signposting.

Link relations are defined by RFC8288, but only link relations listed in FAIR and signposting conventions are included in this enumeration.

A link relation enum can be looked up from its RFC8288 value by calling LinkRel("cite-as") – note that this particular example has a different Python-compatible spelling in its enum name (LinkRel.cite_as).
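
Lookup by value is standard behaviour of Python's enum module, so it can be shown with a small stand-in enumeration. The class Rel below is hypothetical and lists only a few relations; it is not the library's LinkRel.

```python
from enum import Enum


class Rel(Enum):
    """Hypothetical stand-in for a link relation enumeration."""
    author = "author"
    cite_as = "cite-as"   # hyphen is not valid in a Python name
    describedby = "describedby"
    item = "item"


# Calling the enum with a value returns the matching member
looked_up = Rel("cite-as")

# An unknown value raises ValueError
try:
    Rel("stylesheet")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```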

class signposting.MediaType(value=<class 'str'>)[source]

An IANA media type, e.g. text/plain.

This class ensures the type string is valid according to RFC6838 and for convenience converts it to lowercase.

While the constructor does check that the main type is an official IANA subtree (see MAIN), it does not enforce the individual subtype to be registered. In particular, RFC6838 permits unregistered subtypes starting with vnd., prs. and x.

Extra content type parameters such as ;profile=http://example.com/ are not supported by this class, as they do not form part of the media type registration.

Construct a MediaType.

Throws ValueError

MAIN = ['application', 'audio', 'example', 'font', 'image', 'message', 'model', 'multipart', 'text', 'video']

Top level type trees as of 2022-05-17 in IANA registry

main: str

The main type, e.g. image

sub: str

The sub-type, e.g. jpeg
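
The checks described for MediaType can be approximated as follows. This is a sketch under stated assumptions, not the class's actual code: the function name split_media_type is hypothetical, and the pattern is a reading of the restricted-name rule in RFC6838.

```python
import re
from typing import Tuple

# Top-level types as listed for MAIN above
MAIN = ["application", "audio", "example", "font", "image",
        "message", "model", "multipart", "text", "video"]

# Assumed restricted-name pattern (RFC 6838 section 4.2)
NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}")


def split_media_type(value: str) -> Tuple[str, str]:
    """Lowercase value and split it into (main, sub), or raise ValueError."""
    main, sep, sub = value.lower().partition("/")
    if not sep or not NAME.fullmatch(main) or not NAME.fullmatch(sub):
        # Also rejects parameters such as ;charset=utf-8
        raise ValueError(f"Not a valid media type: {value!r}")
    if main not in MAIN:
        raise ValueError(f"Unrecognized top-level type: {main!r}")
    return main, sub


main, sub = split_media_type("Image/JPEG")
```

The subtype is checked only for syntax, so unregistered subtypes such as those starting with vnd. pass, in line with the description above.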

class signposting.Signpost(rel: Union[LinkRel, str], target: Union[AbsoluteURI, str], media_type: Optional[Union[MediaType, str]] = None, profiles: Optional[Union[AbstractSet[AbsoluteURI], str]] = None, context: Optional[Union[AbsoluteURI, str]] = None, link: Optional[Link] = None)[source]

An individual link of Signposting, e.g. for rel=cite-as.

This is a convenience class that may be wrapping a link or otherwise constructed.

In some cases the link relation may have additional attributes, e.g. signpost.link["title"] – the purpose of this class is however to lift only the navigational attributes for FAIR Signposting.

Construct a Signpost from a link relation.

Parameters:
  • rel – Link relation, e.g. "cite-as"

  • target – URI (e.g. "http://example.com/pid-01")

  • media_type – Optional expected media type of the target (e.g. "text/html")

  • context – Optional URI this is a signposting from (e.g. "http://example.com/page-01.html") (called anchor in Link header)

  • link – Optional origin Link header (not parsed further) for further attributes

Raises:

ValueError – If a plain string value is invalid for the corresponding type-checked classes LinkRel, AbsoluteURI or MediaType.

context: Optional[AbsoluteURI]

Resource URL this is the signposting for, e.g. an HTML landing page.

Note that following HTTP redirections means this URI may be different from the one originally requested.

This attribute is optional, with None indicating unknown context. The context may be implied from the resource, e.g. as indicated by Signposting.context.

link: Optional[Link]

The Link object this signpost was created from.

May contain additional attributes such as link["title"]. Note that a single Link may have multiple rel relations, therefore it is possible that multiple Signpost instances refer to the same link.

profiles: FrozenSet[AbsoluteURI]

Profile URIs for the target with the given type.

Profiles are mainly identifiers, indicating that a particular convention or subtype should be expected in the target.

For instance, a rel=describedby signpost to a JSON-LD document can have type=application/ld+json and profile=http://www.w3.org/ns/json-ld#compacted

There may be multiple profiles, or (more commonly) none.

rel: LinkRel

The link relation of this signposting

target: AbsoluteURI

The URI that is the target of this link, e.g. http://example.com/

Note that URIs with Unicode characters will be represented as %-escaped URIs rather than as IRIs.

type: Optional[MediaType]

The media type of the target.

It is recommended to use this type in content-negotiation for retrieving the target URI.

This property is optional, and should only be expected if rel is LinkRel.describedby or LinkRel.item
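
Using the media type for content negotiation amounts to sending it in the Accept header when retrieving the target. A minimal sketch with urllib.request follows; the function name negotiated_request and the example.org URL are assumptions, and no network request is made here.

```python
from typing import Optional
from urllib.request import Request


def negotiated_request(target: str, media_type: Optional[str] = None) -> Request:
    """Build a GET request that asks for media_type, if one is known."""
    # Without a known type, no Accept header is set
    headers = {"Accept": media_type} if media_type else {}
    return Request(target, headers=headers, method="GET")


# Values as they might come from a signpost's target and type attributes
request = negotiated_request("https://example.org/meta", "text/turtle")
```

The request could then be passed to urllib.request.urlopen(), or the same header used with any other HTTP client.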

with_context(context: Optional[Union[AbsoluteURI, str]]) Signpost[source]

Create a copy of this signpost, but with the specified context.

If the context is None, it means the copy will not have a context.

class signposting.Signposting(context: Optional[Union[AbsoluteURI, str]] = None, signposts: Optional[Iterable[Signpost]] = None, include_no_context: bool = True, warn_duplicate=True)[source]

Signposting links for a given resource.

Links are categorized according to FAIR signposting conventions and split into different attributes like citeAs or describedBy.

It is possible to iterate over this class or use the signposts property to find all recognized signposts.

Note that in the case of a resource not having any signposts, instances of this class are considered false.

The constructor takes an iterable of Signpost.

Signposts are filtered by the matching context (if provided), then assigned to attributes like citeAs or describedBy depending on their Signpost.rel link relation.

Multiple signposts discovered for singular relations like citeAs are ignored in this attribute assignment, however these are included in the Iterable interface of this class and thus also in its length.

A Signposting object is equivalent to boolean False in a conditional expression if it is empty, that is len(signposting)==0, indicating no signposts were discovered for the given context. However, the remaining signposts will still be available from signposts, as indicated by other_contexts and retrievable with for_context().

Parameters:
  • context – the resource to select signposting for, or any signposts if None.

  • signposts – An iterable of Signpost that should be considered for selecting signposting.

  • include_no_context – If True (default), consider signposts without explicit context, assuming they are about context. If False, such signposts are ignored for assignment, but remain available from signposts.

  • warn_duplicate – If True (default), warn of duplicate signposts that can’t be assigned.

Raises:

ValueError – If include_no_context is false, but context was not provided or None.
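
The filtering and assignment rules above can be modelled in a few lines. This is only an illustrative sketch: Post and PostsFor are hypothetical stand-ins for Signpost and Signposting, only two relations are handled, and keeping the first of several cite-as links is an assumption about which duplicate wins.

```python
from typing import Iterable, List, NamedTuple, Optional


class Post(NamedTuple):
    """Hypothetical stand-in for a signpost."""
    rel: str
    target: str
    context: Optional[str] = None


class PostsFor:
    """Hypothetical model of selecting signposts for one context."""

    def __init__(self, context: Optional[str], posts: Iterable[Post],
                 include_no_context: bool = True):
        if context is None and not include_no_context:
            raise ValueError("include_no_context=False requires a context")
        self.all_posts: List[Post] = list(posts)
        # Keep posts for this context, plus context-less ones if allowed
        self.selected = [
            p for p in self.all_posts
            if context is None or p.context == context
            or (p.context is None and include_no_context)
        ]
        self.cite_as: Optional[Post] = None
        self.items: List[Post] = []
        for p in self.selected:
            if p.rel == "cite-as":
                if self.cite_as is None:  # assumption: first one wins
                    self.cite_as = p
            elif p.rel == "item":
                self.items.append(p)

    def __len__(self) -> int:
        # Duplicates of singular relations still count towards the length
        return len(self.selected)

    def __iter__(self):
        return iter(self.selected)


posts = [
    Post("cite-as", "https://doi.org/10.1234/a", "https://example.org/a"),
    Post("cite-as", "https://doi.org/10.1234/dup", "https://example.org/a"),
    Post("item", "https://example.org/a.csv"),
    Post("item", "https://example.org/b.csv", "https://example.org/b"),
]
view = PostsFor("https://example.org/a", posts)
empty = PostsFor("https://example.org/none", posts, include_no_context=False)
```

Because PostsFor defines __len__, an instance with no selected signposts is falsy, while all_posts still holds the signposts for other contexts.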

authors: Set[Signpost]

Author(s) of this resource (and possibly its items)

citeAs: Optional[Signpost]

Persistent Identifier (PID) for this resource, preferred for citation and permalinks

collection: Optional[Signpost]

Optional collection resource that the selected resource is part of

context: Optional[AbsoluteURI]

Resource URI this is the signposting for, e.g. an HTML landing page.

Documentation on other signposting attributes refers to the context as “this resource”.

This attribute is optional. None indicates that no context filtering applies and that individual signposts can have any context.

property context_url: Optional[AbsoluteURI]

DEPRECATED: Use context instead.

describedBy: Set[Signpost]

Metadata resources about this resource and its items, typically in a Linked Data format.

Resources may require content negotiation, check Signpost.type attribute (if present) for content type, e.g. text/turtle.

for_context(context: Optional[Union[AbsoluteURI, str]]) Signposting[source]

Return signposting for given context URI.

This will select an alternative view of the signposts filtered by the given context.

The remaining signposts and their contexts will be included under signposts – any signposts with implicit context will be given an explicit context from context.

Tip: To ensure all signposts have explicit context, use s.for_context(s.context)

Parameters:

context

The context to select signposts from. The URI should be a member of other_contexts or equal to context, otherwise the returned Signposting will be empty.

If this parameter is None, then the individual Signpost.context values are ignored and any signposts will be considered.

items: Set[Signpost]

Items contained by this resource, e.g. downloads.

The content type of the download may be available as Signpost.type attribute.

license: Optional[Signpost]

Optional license of this resource (and presumably its items)

linksets: Set[Signpost]

Linkset resources with further signposting for this resource (and potentially others).

A Linkset is a JSON or text serialization of Link headers available as a separate resource, and may be used to externalize large collections of links, e.g. thousands of item relations.

Resources may require content negotiation, check Link["type"] attribute (if present) for content types application/linkset or application/linkset+json.
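
To show what a JSON linkset looks like, the sketch below flattens an application/linkset+json document into plain tuples using the json module. It assumes the document layout defined by RFC 9264 (a top-level "linkset" array of objects with an "anchor" and one array of targets per relation); the function name and the sample document are hypothetical, and this is not how the library itself parses linksets.

```python
import json
from typing import List, Optional, Tuple

Link = Tuple[Optional[str], str, str, Optional[str]]


def links_from_linkset_json(document: str) -> List[Link]:
    """Return (context, rel, target, type) tuples from a JSON linkset."""
    links: List[Link] = []
    for context_object in json.loads(document).get("linkset", []):
        anchor = context_object.get("anchor")
        for rel, targets in context_object.items():
            if rel == "anchor":
                continue  # the anchor is the context, not a relation
            for target in targets:
                links.append((anchor, rel, target["href"], target.get("type")))
    return links


# Hypothetical linkset document
sample = """
{"linkset": [
  {"anchor": "https://example.org/page/1",
   "cite-as": [{"href": "https://doi.org/10.1234/example"}],
   "item": [{"href": "https://example.org/a.csv", "type": "text/csv"},
            {"href": "https://example.org/b.csv", "type": "text/csv"}]}
]}
"""
flattened = links_from_linkset_json(sample)
```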

other_contexts: Set[AbsoluteURI]

Other resource URLs for which signposting has been provided.

Use for_context() to retrieve their signpostings, or filter the full list of signposts from signposts according to Signpost.context

property signposts: AbstractSet[Signpost]

All FAIR Signposts with recognized relation types.

This may include any additional signposts for link relations that only expect a single link, like citeAs, as well as any signposts for other contexts as listed in other_contexts.

types: Set[Signpost]

Semantic types of this resource, e.g. from schema.org

signposting.find_signposting_html(uri: Union[AbsoluteURI, str]) Signposting[source]

Parse HTML to find <link> elements for signposting.

HTTP redirects will be followed and any relative paths in links made absolute correspondingly.

Parameters:

uri – An absolute http/https URI, whose HTML will be inspected.

Throws ValueError:

If the uri is invalid

Throws IOError:

If the network request failed, e.g. connection timeout

Throws requests.HTTPError:

If the HTTP request failed, e.g. 404 Not Found

Throws UnrecognizedContentType:

If the HTTP resource was not a recognized HTML/XHTML content type

Throws HTMLParser.HTMLParseError:

If the HTML could not be parsed.

Returns:

A parsed Signposting object (which may be empty)
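
The idea of collecting <link> elements and making their targets absolute can be sketched with the standard library's html.parser. The class name LinkCollector, the set of relations and the sample page are assumptions for illustration; the library's own parsing may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Relations assumed to be relevant for signposting in this sketch
SIGNPOSTING_RELS = {"author", "cite-as", "collection", "describedby",
                    "item", "license", "linkset", "type"}


class LinkCollector(HTMLParser):
    """Collect (rel, absolute href, type) from <link> elements."""

    def __init__(self, base: str):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated relations
        for rel in (attributes.get("rel") or "").lower().split():
            if rel in SIGNPOSTING_RELS:
                target = urljoin(self.base, href)
                self.links.append((rel, target, attributes.get("type")))


# Hypothetical landing page
page = """
<html><head>
<link rel="cite-as" href="https://doi.org/10.1234/example">
<link rel="describedby" href="meta.jsonld" type="application/ld+json" />
<link rel="stylesheet" href="style.css">
</head></html>
"""
collector = LinkCollector("https://example.org/page/")
collector.feed(page)
```

Here the base would be the final URL after any redirects, so that relative paths resolve against the page that was actually served.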

signposting.find_signposting_http(url: str) Signposting[source]

Find signposting from HTTP headers.

Parameters:

url – The URL to request HTTP Link headers from using HTTP HEAD

Returns:

A parsed Signposting object of the discovered signposting

Find signposting among HTTP Link headers.

Links are discovered according to defined FAIR signposting relations.

Parameters:
  • headers – A list of individual HTTP Link headers. The headers should be valid according to RFC8288, excluding the "Link:" prefix.

  • baseurl – Optional base URL to make relative link targets absolute from

Returns:

A Signposting of the collected signposts.

signposting.find_signposting_linkset(uri: Union[AbsoluteURI, str], acceptType: Optional[Union[MediaType, str]] = None) Signposting[source]

Parse a linkset document to find links for signposting.

HTTP redirects will be followed.

Parameters:
  • uri – An absolute http/https URI, whose linkset document will be inspected.

  • acceptType – A MediaType to content-negotiate access for. The default is to content-negotiate including application/linkset and application/linkset+json with JSON having preference.

Throws ValueError:

If the uri is invalid

Throws IOError:

If the network request failed, e.g. connection timeout

Throws requests.HTTPError:

If the HTTP request failed, e.g. 404 Not Found

Throws UnrecognizedContentType:

If the HTTP resource was not a recognized linkset content type. This exception is also raised if acceptType was provided, but didn’t match returned Content-Type.

Throws HTMLParser.HTMLParseError:

If the HTML could not be parsed.

Returns:

A parsed Signposting object (which may be empty)
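
The content negotiation and content type check described for acceptType can be sketched as two small helpers. The function names are hypothetical, the q-value is an assumption that merely expresses a preference for JSON, and ValueError stands in for UnrecognizedContentType.

```python
from typing import Optional

LINKSET_TYPES = ("application/linkset+json", "application/linkset")


def accept_header(accept_type: Optional[str] = None) -> str:
    """Return the Accept header value to send when requesting a linkset."""
    if accept_type:
        return accept_type
    # Assumed q-value: prefer the JSON serialization over the text one
    return "application/linkset+json, application/linkset;q=0.9"


def check_content_type(content_type: str,
                       accept_type: Optional[str] = None) -> str:
    """Return the bare media type of a response, or raise ValueError."""
    # Drop parameters such as ;charset=utf-8 before comparing
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in LINKSET_TYPES:
        raise ValueError(f"Not a linkset content type: {content_type!r}")
    if accept_type and media_type != accept_type.lower():
        raise ValueError(f"Expected {accept_type!r}, got {content_type!r}")
    return media_type


default_accept = accept_header()
returned = check_content_type("application/linkset+json; charset=utf-8")
```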