signposting¶
Finding signposting in FAIR resources.
This library helps clients discover links that follow the FAIR signposting conventions.
This can then be used to navigate between:
Persistent identifiers
HTML landing pages
File downloads/items
Structured metadata
The library works by inspecting the HTTP response for
Link: headers from a given URI with find_signposting_http(),
which categorizes them by their rel link relation into a
Signposting object with absolute URIs.
It is up to the clients of this library to decide how to further
navigate or retrieve the associated resources, e.g. using an
RDF library like rdflib.
This library also provides ways to discover
FAIR signposting in HTML <link> annotations and in
Link set documents. Future versions may provide ways to
discover/merge these concurrently.
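The categorization step described above can be illustrated without the library itself. The following is a minimal sketch using only the Python standard library. The function name group_links_by_rel, the regular expressions and the example URIs are assumptions made for illustration, and the simplified parser does not cover every case permitted by RFC8288.

```python
import re
from collections import defaultdict
from urllib.parse import urljoin

# One link-value: <target> followed by its ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+(?:\s*=\s*(?:"[^"]*"|[^\s;,"]+))?)*)'
)
# One parameter: ;name, optionally followed by ="quoted" or =bare.
PARAM_RE = re.compile(r';\s*([\w*-]+)(?:\s*=\s*(?:"([^"]*)"|([^\s;,"]+)))?')


def group_links_by_rel(header_value, base):
    """Group link targets by rel, resolving relative targets against base."""
    grouped = defaultdict(list)
    for target, params in LINK_RE.findall(header_value):
        attrs = {}
        for name, quoted, bare in PARAM_RE.findall(params):
            # Keep only the first occurrence of each parameter.
            attrs.setdefault(name.lower(), quoted or bare)
        # A single link may carry several space-separated relations.
        for rel in attrs.get("rel", "").split():
            grouped[rel.lower()].append(urljoin(base, target))
    return dict(grouped)


# Hypothetical header value, as it might follow the "Link:" prefix.
header = (
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '</meta.ttl>; rel="describedby"; type="text/turtle"'
)
links = group_links_by_rel(header, "https://example.org/page")
```

Here links maps each relation to a list of absolute URIs, which mirrors the idea of a Signposting object without reproducing its interface.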
- class signposting.AbsoluteURI(value: str, base: Optional[str] = None)[source]¶
An absolute URI, e.g. “http://example.com/”
Create and validate an absolute URI reference.
Note that IRIs are not supported unless %-encoded.
- Parameters:
value – URI string to validate as Absolute URI. May be a relative URI reference if base is provided.
base – (Optional) URI used to resolve the potentially relative URI reference, otherwise value must be an absolute URI.
- Raises:
ValueError – if the final URI reference is invalid or not absolute.
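The resolve-then-validate behaviour described for value and base can be sketched with the standard library. This is not the class's implementation: the helper name to_absolute_uri is hypothetical, and the check is deliberately simplified (it only requires a scheme and ASCII characters).

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def to_absolute_uri(value: str, base: Optional[str] = None) -> str:
    """Resolve value against base (if given) and require an absolute URI."""
    resolved = urljoin(base, value) if base is not None else value
    if not urlsplit(resolved).scheme:
        # Simplified test: an absolute URI must at least have a scheme.
        raise ValueError(f"Not an absolute URI: {resolved!r}")
    if not resolved.isascii():
        # IRIs would need to be %-encoded before being passed in.
        raise ValueError(f"Non-ASCII characters in URI: {resolved!r}")
    return resolved


absolute = to_absolute_uri("../data.csv", base="http://example.com/dir/page.html")
```

Without a base, a relative reference such as "page.html" raises ValueError in this sketch, matching the documented contract.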
- class signposting.LinkRel(value)[source]¶
A link relation as used in Signposting.
Link relations are defined by RFC8288, but only link relations listed in FAIR and signposting conventions are included in this enumeration.
A link relation enum can be looked up from its RFC8288 value by calling LinkRel("cite-as"). Note that this particular example has a different Python-compatible spelling in its enum name (LinkRel.cite_as).
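The lookup by value relies on standard Python enum behaviour. The stand-in below shows it in isolation: DemoLinkRel is a hypothetical enum with only three members, not the library's LinkRel.

```python
from enum import Enum


class DemoLinkRel(str, Enum):
    """Hypothetical stand-in listing a few signposting relations."""
    cite_as = "cite-as"      # name differs from value: "-" is not valid in a Python name
    describedby = "describedby"
    item = "item"


# Calling the enum with a value returns the matching member.
rel = DemoLinkRel("cite-as")
```

Calling the enum with a value that is not listed raises ValueError, which is how an unsupported relation would surface in this sketch.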
- class signposting.MediaType(value=<class 'str'>)[source]¶
An IANA media type, e.g. text/plain.
This class ensures the type string is valid according to RFC6838 and for convenience converts it to lowercase.
While the constructor does check that the main type is an official IANA subtree (see MAIN), it does not enforce the individual subtype to be registered. In particular RFC6838 permits unregistered subtypes starting with vnd., prs. and x.
Extra content type parameters such as ;profile=http://example.com/ are not supported by this class, as they do not form part of the media type registration.
Construct a MediaType.
Raises ValueError if the media type string is invalid.
- MAIN = ['application', 'audio', 'example', 'font', 'image', 'message', 'model', 'multipart', 'text', 'video']¶
Top level type trees as of 2022-05-17 in IANA registry
- main: str¶
The main type, e.g.
image
- sub: str¶
The sub-type, e.g.
jpeg
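The validation rules described for this class (RFC6838 syntax, a known main type, lowercasing, no parameters) can be approximated as follows. The function parse_media_type and the regular expression are assumptions for illustration only; the class itself may validate differently.

```python
import re
from typing import Tuple

# Top-level types copied from the MAIN list documented above.
MAIN = ["application", "audio", "example", "font", "image",
        "message", "model", "multipart", "text", "video"]

# RFC6838 restricted-name: one alphanumeric character followed by up to
# 126 characters from a limited set.
_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
_MEDIA_TYPE = re.compile("(" + _NAME + ")/(" + _NAME + ")")


def parse_media_type(value: str) -> Tuple[str, str]:
    """Return (main, sub) in lowercase, or raise ValueError."""
    match = _MEDIA_TYPE.fullmatch(value)
    if match is None:
        # Also rejects parameters such as ";profile=...".
        raise ValueError(f"Not a valid media type: {value!r}")
    main, sub = (part.lower() for part in match.groups())
    if main not in MAIN:
        raise ValueError(f"Unknown main type: {main!r}")
    return main, sub


main, sub = parse_media_type("Image/JPEG")
```

Unregistered subtypes such as application/vnd.example+json pass this check, since only the main type is compared against a fixed list.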
- class signposting.Signpost(rel: Union[LinkRel, str], target: Union[AbsoluteURI, str], media_type: Optional[Union[MediaType, str]] = None, profiles: Optional[Union[AbstractSet[AbsoluteURI], str]] = None, context: Optional[Union[AbsoluteURI, str]] = None, link: Optional[Link] = None)[source]¶
An individual link of Signposting, e.g. for rel=cite-as.
This is a convenience class that may be wrapping a link or otherwise constructed.
In some cases the link relation may have additional attributes, e.g. signpost.link["title"]. The purpose of this class is however to lift only the navigational attributes for FAIR Signposting.
Construct a Signpost from a link relation.
- Parameters:
rel – Link relation, e.g. "cite-as"
target – URI (e.g. "http://example.com/pid-01")
media_type – Optional expected media type of the target (e.g. "text/html")
context – Optional URI this is a signposting from (e.g. "http://example.com/page-01.html") (called anchor in Link header)
link – Optional origin Link header (not parsed further) for further attributes
- Raises:
ValueError – If a plain string value is invalid for the corresponding type-checked classes LinkRel, AbsoluteURI or MediaType.
- context: Optional[AbsoluteURI]¶
Resource URL this is the signposting for, e.g. an HTML landing page.
Note that following HTTP redirections means this URI may be different from the one originally requested.
This attribute is optional (with None indicating unknown context); the context may be implied from the resource, e.g. as indicated by Signposting.context
- link: Optional[Link]¶
The Link object this signpost was created from.
May contain additional attributes such as link["title"]. Note that a single Link may have multiple rel relations, therefore it is possible that multiple Signpost instances refer to the same link.
- profiles: FrozenSet[AbsoluteURI]¶
Profile URIs for the target with the given type.
Profiles are mainly identifiers, indicating that a particular convention or subtype should be expected in the target.
For instance, a rel=describedby signpost to a JSON-LD document can have type=application/ld+json and profile=http://www.w3.org/ns/json-ld#compacted
There may be multiple profiles, or (more commonly) none.
- target: AbsoluteURI¶
The URI that is the target of this link, e.g. http://example.com/
Note that URIs with Unicode characters will be represented as %-escaped URIs rather than as IRIs.
- type: Optional[MediaType]¶
The media type of the target.
It is recommended to use this type in content-negotiation for retrieving the target URI.
This property is optional, and should only be expected if rel is LinkRel.describedby or LinkRel.item
- with_context(context: Optional[Union[AbsoluteURI, str]]) Signpost[source]¶
Create a copy of this signpost, but with the specified context.
If the context is
None, it means the copy will not have a context.
- class signposting.Signposting(context: Optional[Union[AbsoluteURI, str]] = None, signposts: Optional[Iterable[Signpost]] = None, include_no_context: bool = True, warn_duplicate=True)[source]¶
Signposting links for a given resource.
Links are categorized according to FAIR signposting conventions and split into different attributes like citeAs or describedBy.
It is possible to iterate over this class or use the signposts property to find all recognized signposts.
Note that in the case of a resource not having any signposts, instances of this class are considered false.
The constructor takes an iterable of Signpost. Signposts are filtered by the matching context (if provided), then assigned to attributes like citeAs or describedBy depending on their Signpost.rel link relation.
Multiple signposts discovered for singular relations like citeAs are ignored in this attribute assignment, however these are included in the Iterable interface of this class and thus also in its length.
A Signposting object is equivalent to boolean False in a conditional expression if it is empty, that is len(signposting)==0, indicating no signposts were discovered for the given context. However the remaining signposts will still be available from signposts, as indicated by other_contexts and retrievable with for_context().
- Parameters:
context – the resource to select signposting for, or any signposts if None.
signposts – An iterable of Signpost that should be considered for selecting signposting.
include_no_context – If True (default), consider signposts without explicit context, assuming they are about context. If False, such signposts are ignored for assignment, but remain available from signposts.
warn_duplicate – If True (default), warn of duplicate signposts that can’t be assigned.
- Raises:
ValueError – If include_no_context is false, but context was not provided or None.
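The interaction between context and include_no_context can be sketched on its own. The DemoLink tuple and the select_for_context function below are hypothetical and cover only the selection step; the real class additionally assigns the selected signposts to attributes and handles duplicates.

```python
from typing import Iterable, List, NamedTuple, Optional


class DemoLink(NamedTuple):
    """Hypothetical minimal record: a relation, a target and an optional context."""
    rel: str
    target: str
    context: Optional[str] = None


def select_for_context(links: Iterable[DemoLink],
                       context: Optional[str] = None,
                       include_no_context: bool = True) -> List[DemoLink]:
    """Pick the links that are about the given context."""
    if context is None:
        if not include_no_context:
            # Mirrors the documented ValueError condition.
            raise ValueError("include_no_context=False requires a context")
        return list(links)  # no filtering: any context is accepted
    selected = []
    for link in links:
        if link.context == context:
            selected.append(link)
        elif link.context is None and include_no_context:
            # A link without explicit context is assumed to be about `context`.
            selected.append(link)
    return selected


page = "http://example.com/page-01.html"
links = [
    DemoLink("cite-as", "http://example.com/pid-01", page),
    DemoLink("item", "http://example.com/data.csv"),  # implicit context
    DemoLink("item", "http://example.com/other.csv", "http://example.com/page-02.html"),
]
selected = select_for_context(links, page)
strict = select_for_context(links, page, include_no_context=False)
```

In this sketch selected holds the first two links, while strict holds only the link whose context was stated explicitly.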
- citeAs: Optional[Signpost]¶
Persistent Identifier (PID) for this resource, preferred for citation and permalinks
- context: Optional[AbsoluteURI]¶
Resource URI this is the signposting for, e.g. an HTML landing page.
Documentation on other signposting attributes refers to the context as “this resource”.
This attribute is optional; None indicates that no context filtering applies and that individual signposts can have any context.
- property context_url: Optional[AbsoluteURI]¶
DEPRECATED: Use context instead.
- describedBy: Set[Signpost]¶
Metadata resources about this resource and its items, typically in a Linked Data format.
Resources may require content negotiation; check the Signpost.type attribute (if present) for content type, e.g. text/turtle.
- for_context(context: Optional[Union[AbsoluteURI, str]]) Signposting[source]¶
Return signposting for given context URI.
This will select an alternative view of the signposts filtered by the given context.
The remaining signposts and their contexts will be included under signposts. Any signposts with implicit context will be given an explicit context from context.
Tip: To ensure all signposts have explicit context, use s.for_context(s.context)
- Parameters:
context – The context to select signposts from. The URI should be a member of other_contexts or equal to context, otherwise the returned Signposting will be empty.
If this parameter is None, then the individual Signpost.context values are ignored and any signposts will be considered.
- items: Set[Signpost]¶
Items contained by this resource, e.g. downloads.
The content type of the download may be available as the Signpost.type attribute.
- linksets: Set[Signpost]¶
Linkset resources with further signposting for this resource (and potentially others).
A Linkset is a JSON or text serialization of Link headers available as a separate resource, and may be used to externalize large collections of links, e.g. thousands of item relations.
Resources may require content negotiation; check the Link["type"] attribute (if present) for content types application/linkset or application/linkset+json.
- other_contexts: Set[AbsoluteURI]¶
Other resource URLs which signposting has been provided for.
Use for_context() to retrieve their signpostings, or filter the full list of signposts from signposts according to Signpost.context
- property signposts: AbstractSet[Signpost]¶
All FAIR Signposts with recognized relation types.
This may include any additional signposts for link relations that only expect a single link, like citeAs, as well as any signposts for other contexts as listed in other_contexts.
- signposting.find_signposting_html(uri: Union[AbsoluteURI, str]) Signposting[source]¶
Parse HTML to find <link> elements for signposting.
HTTP redirects will be followed and any relative paths in links made absolute correspondingly.
- Parameters:
uri – An absolute http/https URI, whose HTML will be inspected.
- Throws ValueError:
If the uri is invalid
- Throws IOError:
If the network request failed, e.g. connection timeout
- Throws requests.HTTPError:
If the HTTP request failed, e.g. 404 Not Found
- Throws UnrecognizedContentType:
If the HTTP resource was not a recognized HTML/XHTML content type
- Throws HTMLParser.HTMLParseError:
If the HTML could not be parsed.
- Returns:
A parsed Signposting object (which may be empty)
- signposting.find_signposting_http(url: str) Signposting[source]¶
Find signposting from HTTP headers.
- Parameters:
url – The URL to request HTTP Link headers from using HTTP HEAD
- Returns:
A parsed Signposting object of the discovered signposting
- signposting.find_signposting_http_link(headers: List[str], baseurl: Optional[str] = None) Signposting[source]¶
Find signposting among HTTP Link headers.
Links are discovered according to defined FAIR signposting relations.
- Parameters:
headers – A list of individual HTTP Link headers. The headers should be valid according to RFC8288, excluding the "Link:" prefix.
baseurl – Optional base URL to make relative link targets absolute from
- Returns:
A Signposting of the collected signposts.
- signposting.find_signposting_linkset(uri: Union[AbsoluteURI, str], acceptType: Optional[Union[MediaType, str]] = None) Signposting[source]¶
Parse a linkset document to find links for signposting.
HTTP redirects will be followed.
- Parameters:
uri – An absolute http/https URI, whose linkset document will be inspected.
acceptType – A MediaType to content-negotiate access for. The default is to content-negotiate including application/linkset and application/linkset+json with JSON having preference.
- Throws ValueError:
If the uri is invalid
- Throws IOError:
If the network request failed, e.g. connection timeout
- Throws requests.HTTPError:
If the HTTP request failed, e.g. 404 Not Found
- Throws UnrecognizedContentType:
If the HTTP resource was not a recognized linkset content type. This exception is also raised if acceptType was provided, but didn’t match returned Content-Type.
- Throws HTMLParser.HTMLParseError:
If the HTML could not be parsed.
- Returns:
A parsed Signposting object (which may be empty)
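For orientation, the JSON form of a link set (application/linkset+json) can be walked with the standard library alone. The sketch below assumes the document structure defined in RFC 9264 (a top-level "linkset" array of context objects, each with an "anchor" and relation-named arrays of targets). The function iter_linkset_links and the example document are hypothetical and are not part of this library.

```python
import json
from typing import Iterator, Optional, Tuple

LinkTuple = Tuple[Optional[str], str, str, Optional[str]]


def iter_linkset_links(document: str) -> Iterator[LinkTuple]:
    """Yield (anchor, rel, href, type) for every link in a JSON link set."""
    data = json.loads(document)
    for context_object in data.get("linkset", []):
        anchor = context_object.get("anchor")
        for rel, targets in context_object.items():
            if rel == "anchor":
                continue  # not a relation, just the link context
            for target in targets:
                yield anchor, rel, target["href"], target.get("type")


# Hypothetical link set describing one landing page.
example = json.dumps({
    "linkset": [{
        "anchor": "https://example.org/page",
        "cite-as": [{"href": "https://doi.org/10.1234/example"}],
        "item": [{"href": "https://example.org/data.csv", "type": "text/csv"}],
    }]
})
links = list(iter_linkset_links(example))
```

Each tuple carries the anchor as its context, which corresponds to the role the context plays for a signpost.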