signposting
Finding signposting in FAIR resources.
This library helps clients discover links that follow the FAIR signposting conventions.
These links can then be used to navigate between:
- Persistent identifiers
- HTML landing pages
- File downloads/items
- Structured metadata
The library works by inspecting the HTTP response for Link headers from a given URI with find_signposting_http(), which categorizes them by their rel link relation into a Signposting object with absolute URIs.
It is up to the clients of this library to decide how to further navigate or retrieve the associated resources, e.g. using an RDF library such as rdflib.
This library also provides ways to discover FAIR signposting in HTML <link> annotations and in Linkset documents. Future versions may provide ways to discover/merge these concurrently.
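As a rough illustration of the categorization described above, the following standard-library sketch groups the targets of a Link header value by their rel and resolves them to absolute URIs. It is not this library's implementation: the helper name group_links_by_rel, the regular expressions and the example.com URIs are assumptions made for illustration, and the parsing only handles simple headers without commas or semicolons inside quoted values.

```python
import re
from collections import defaultdict
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
)
# A single ;name=value parameter, quoted or bare.
PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def group_links_by_rel(header_value, base):
    """Group absolute link targets by link relation (hypothetical helper)."""
    grouped = defaultdict(list)
    for target, params in LINK_VALUE.findall(header_value):
        attrs = {}
        for name, quoted, bare in PARAM.findall(params):
            attrs[name.lower()] = quoted or bare
        absolute = urljoin(base, target)  # make relative targets absolute
        # A rel attribute may list several space-separated relations.
        for rel in attrs.get("rel", "").split():
            grouped[rel.lower()].append(absolute)
    return dict(grouped)


# Made-up header value, as it would appear after the "Link:" prefix.
header = (
    '<http://example.com/pid-01>; rel="cite-as", '
    '</metadata.ttl>; rel="describedby"; type="text/turtle", '
    '</data.zip>; rel="item"; type="application/zip"'
)
links = group_links_by_rel(header, "http://example.com/page-01.html")
```

Here links maps each relation name to a list of absolute target URIs, which is roughly the information a Signposting object exposes through its attributes.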
- class signposting.AbsoluteURI(value: str, base: Optional[str] = None)
An absolute URI, e.g. “http://example.com/”
Create and validate an absolute URI reference.
Note that IRIs are not supported unless %-encoded.
- Parameters:
value – URI string to validate as an absolute URI. May be a relative URI reference if base is provided.
base – (Optional) URI used to resolve the potentially relative URI reference, otherwise value must be an absolute URI.
- Raises:
ValueError – if the final URI reference is invalid or not absolute.
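The resolve-then-validate behaviour described above can be approximated with the standard library. This is a minimal sketch, not the class itself: the function name resolve_absolute is hypothetical, and the checks (a scheme must be present, only ASCII characters are accepted) are a simplification of whatever validation the library actually performs.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_absolute(value: str, base: Optional[str] = None) -> str:
    """Resolve value against base (if given) and require an absolute URI."""
    resolved = urljoin(base, value) if base is not None else value
    if not urlsplit(resolved).scheme:
        raise ValueError(f"Not an absolute URI: {resolved!r}")
    if not resolved.isascii():
        # IRIs would need to be %-encoded first
        raise ValueError(f"Non-ASCII characters in URI: {resolved!r}")
    return resolved


absolute = resolve_absolute("http://example.com/")
resolved = resolve_absolute("../pid-01", base="http://example.com/dir/")
```

Calling resolve_absolute("page-01.html") without a base raises ValueError in this sketch, because the reference stays relative.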
- class signposting.LinkRel(value)
A link relation as used in Signposting.
Link relations are defined by RFC8288, but only link relations listed in the FAIR Signposting conventions are included in this enumeration.
A link relation enum can be looked up from its RFC8288 value by calling LinkRel("cite-as"). Note that this particular example has a different Python-compatible spelling in its enum name (LinkRel.cite_as).
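The difference between an enum member's name and its value can be seen with a plain standard-library Enum. The class Rel below is a hypothetical stand-in with only three members, not the library's LinkRel.

```python
from enum import Enum


class Rel(Enum):
    """Hypothetical subset of link relations, for illustration only."""
    cite_as = "cite-as"        # Python name uses "_", RFC8288 value uses "-"
    describedby = "describedby"
    item = "item"


by_value = Rel("cite-as")      # look up by RFC8288 value
by_name = Rel.cite_as          # access by Python-compatible name
```

Both expressions give the same member. Looking up a value that is not listed, such as Rel("stylesheet"), raises ValueError.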
- class signposting.MediaType(value=<class 'str'>)
An IANA media type, e.g. text/plain.
This class ensures the type string is valid according to RFC6838 and for convenience converts it to lowercase.
While the constructor does check that the main type is an official IANA top-level type (see MAIN), it does not enforce the individual subtype to be registered. In particular RFC6838 permits unregistered subtypes starting with vnd., prs. and x.
Extra content type parameters such as ;profile=http://example.com/ are not supported by this class, as they do not form part of the media type registration.
Construct a MediaType.
Throws ValueError
- MAIN = ['application', 'audio', 'example', 'font', 'image', 'message', 'model', 'multipart', 'text', 'video']
Top-level type trees as of 2022-05-17 in the IANA registry
- main: str
The main type, e.g. image
- sub: str
The sub-type, e.g. jpeg
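A minimal sketch of the kind of check described for MediaType, using only the standard library. The function split_media_type is hypothetical. The top-level list is copied from MAIN above, and the subtype pattern follows the restricted-name grammar of RFC6838. The library's own validation may differ in detail.

```python
import re

MAIN_TYPES = ["application", "audio", "example", "font", "image",
              "message", "model", "multipart", "text", "video"]

# RFC6838 restricted-name: alphanumeric first, then up to 126 more characters
RESTRICTED_NAME = re.compile(r"[a-z0-9][a-z0-9!#$&\-^_.+]{0,126}")


def split_media_type(value):
    """Lowercase and split a media type into (main, sub), or raise ValueError."""
    lowered = value.lower()
    main, slash, sub = lowered.partition("/")
    if not slash or main not in MAIN_TYPES:
        raise ValueError(f"Unrecognized main type in {value!r}")
    if not RESTRICTED_NAME.fullmatch(sub):
        # also rejects parameters such as ";profile=..."
        raise ValueError(f"Invalid sub-type in {value!r}")
    return main, sub


main, sub = split_media_type("Image/JPEG")
vendor = split_media_type("application/vnd.example+json")
```

In this sketch a string carrying parameters, such as "text/plain;profile=http://example.com/", is rejected with ValueError rather than silently truncated.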
- class signposting.Signpost(rel: Union[LinkRel, str], target: Union[AbsoluteURI, str], media_type: Optional[Union[MediaType, str]] = None, profiles: Optional[Union[AbstractSet[AbsoluteURI], str]] = None, context: Optional[Union[AbsoluteURI, str]] = None, link: Optional[Link] = None)
An individual link of Signposting, e.g. for rel=cite-as.
This is a convenience class that may be wrapping a link or otherwise constructed.
In some cases the link relation may have additional attributes, e.g. signpost.link["title"]. The purpose of this class is however to lift only the navigational attributes for FAIR Signposting.
Construct a Signpost from a link relation.
- Parameters:
rel – Link relation, e.g. "cite-as"
target – URI (e.g. "http://example.com/pid-01")
media_type – Optional expected media type of the target (e.g. "text/html")
context – Optional URI this is a signposting from (e.g. "http://example.com/page-01.html") (called anchor in Link header)
link – Optional origin Link header (not parsed further) for further attributes
- Raises:
ValueError – If a plain string value is invalid for the corresponding type-checked classes LinkRel, AbsoluteURI or MediaType.
- context: Optional[AbsoluteURI]
Resource URL this is the signposting for, e.g. an HTML landing page.
Note that following HTTP redirections means this URI may be different from the one originally requested.
This attribute is optional (with None indicating unknown context); the context may be implied from the resource, e.g. as indicated by Signposting.context
- link: Optional[Link]
The Link object this signpost was created from.
May contain additional attributes such as link["title"]. Note that a single Link may have multiple rel relations, therefore it is possible that multiple Signpost instances refer to the same link.
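The point about one link yielding several signposts can be sketched as follows. The LinkSignpost tuple, the dictionary layout of the link and the example values are all assumptions for illustration. The library's Link and Signpost classes are richer than this.

```python
from typing import Dict, List, NamedTuple


class LinkSignpost(NamedTuple):
    """Hypothetical stand-in: one relation of one link."""
    rel: str
    target: str
    link: Dict[str, str]   # the originating link, shared between signposts


def signposts_from_link(link: Dict[str, str]) -> List[LinkSignpost]:
    # One signpost per space-separated relation in the rel attribute
    return [LinkSignpost(rel, link["target"], link)
            for rel in link["rel"].split()]


# Made-up link carrying two relations and an extra "title" attribute
link = {
    "target": "http://example.com/metadata.jsonld",
    "rel": "describedby item",
    "title": "Metadata record",
}
signposts = signposts_from_link(link)
```

Both resulting signposts point back to the same link object, so extra attributes such as the title remain reachable from either of them.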
- profiles: FrozenSet[AbsoluteURI]
Profile URIs for the target with the given type.
Profiles are mainly identifiers, indicating that a particular convention or subtype should be expected in the target.
For instance, a rel=describedby signpost to a JSON-LD document can have type=application/ld+json and profile=http://www.w3.org/ns/json-ld#compacted
There may be multiple profiles, or (more commonly) none.
- target: AbsoluteURI
The URI that is the target of this link, e.g. http://example.com/
Note that URIs with Unicode characters will be represented as %-escaped URIs rather than as IRIs.
- type: Optional[MediaType]
The media type of the target.
It is recommended to use this type in content-negotiation for retrieving the target URI.
This property is optional, and should only be expected if rel is LinkRel.describedby or LinkRel.item
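One way a client might apply this recommendation is to copy the signpost's media type into the Accept header of the request for the target. The sketch below only builds a urllib request object and does not send it. The helper name request_for_target and the example URI are assumptions.

```python
from typing import Optional
from urllib.request import Request


def request_for_target(target: str, media_type: Optional[str] = None) -> Request:
    """Build (but do not send) a request that asks for the signposted type."""
    headers = {}
    if media_type is not None:
        headers["Accept"] = media_type
    return Request(target, headers=headers)


# e.g. for a describedby signpost with type text/turtle
req = request_for_target("http://example.com/metadata", "text/turtle")
plain = request_for_target("http://example.com/metadata")
```

Passing req to urllib.request.urlopen() would perform the actual retrieval. When the signpost has no type, no Accept header is set and the server picks its default representation.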
- with_context(context: Optional[Union[AbsoluteURI, str]]) -> Signpost
Create a copy of this signpost, but with the specified context.
If the context is None, the copy will not have a context.
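The copy-with-a-different-context pattern can be illustrated with a frozen dataclass. FrozenSignpost is a hypothetical stand-in, and whether the library implements with_context() this way is not stated in this documentation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FrozenSignpost:
    """Hypothetical immutable signpost with only three attributes."""
    rel: str
    target: str
    context: Optional[str] = None

    def with_context(self, context: Optional[str]) -> "FrozenSignpost":
        # replace() returns a new instance; the original is left untouched
        return replace(self, context=context)


original = FrozenSignpost("item", "http://example.com/data.zip")
explicit = original.with_context("http://example.com/page-01.html")
cleared = explicit.with_context(None)
```

The original keeps its missing context, while explicit carries the new one and cleared has none again.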
- class signposting.Signposting(context: Optional[Union[AbsoluteURI, str]] = None, signposts: Optional[Iterable[Signpost]] = None, include_no_context: bool = True, warn_duplicate=True)
Signposting links for a given resource.
Links are categorized according to FAIR signposting conventions and split into different attributes like citeAs or describedBy.
It is possible to iterate over this class or use the signposts property to find all recognized signposts.
Note that in the case of a resource not having any signposts, instances of this class are considered false.
The constructor takes an iterable of Signpost.
Signposts are filtered by the matching context (if provided), then assigned to attributes like citeAs or describedBy depending on their Signpost.rel link relation.
Multiple signposts discovered for singular relations like citeAs are ignored in this attribute assignment, however these are included in the Iterable interface of this class and thus also in its length.
A Signposting object is equivalent to boolean False in a conditional expression if it is empty, that is len(signposting)==0, indicating no signposts were discovered for the given context. However, the remaining signposts will still be available from signposts, as indicated by other_contexts and retrievable with for_context().
- Parameters:
context – the resource to select signposting for, or any signposts if None.
signposts – An iterable of Signpost that should be considered for selecting signposting.
include_no_context – If True (default), consider signposts without explicit context, assuming they are about context. If False, such signposts are ignored for assignment, but remain available from signposts.
warn_duplicate – If True (default), warn of duplicate signposts that can’t be assigned.
- Raises:
ValueError – If include_no_context is false, but context was not provided or None.
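The selection rules described for the constructor can be sketched with plain tuples. This is an illustration under assumptions, not the class's code: Post and select are hypothetical names, only three relations are handled, and keeping the first cite-as when several are present is a choice made for this sketch.

```python
from typing import Iterable, NamedTuple, Optional


class Post(NamedTuple):
    """Hypothetical stand-in for a signpost."""
    rel: str
    target: str
    context: Optional[str] = None


def select(posts: Iterable[Post], context: Optional[str] = None,
           include_no_context: bool = True) -> dict:
    if not include_no_context and context is None:
        raise ValueError("include_no_context=False requires a context")
    cite_as, described_by, items, other_contexts = None, set(), set(), set()
    for post in posts:
        if post.context is None:
            if not include_no_context:
                continue              # implicit context not accepted
        elif context is not None and post.context != context:
            other_contexts.add(post.context)
            continue                  # belongs to a different resource
        if post.rel == "cite-as":
            if cite_as is None:       # assumption: keep the first, skip extras
                cite_as = post
        elif post.rel == "describedby":
            described_by.add(post)
        elif post.rel == "item":
            items.add(post)
    return {"citeAs": cite_as, "describedBy": described_by,
            "items": items, "other_contexts": other_contexts}


page1 = "http://example.com/page-01.html"
posts = [
    Post("cite-as", "http://example.com/pid-01", page1),
    Post("cite-as", "http://example.com/pid-02", page1),   # duplicate singular
    Post("item", "http://example.com/data.zip"),            # implicit context
    Post("describedby", "http://example.com/meta.ttl",
         "http://example.com/page-02.html"),
]
selected = select(posts, context=page1)
strict = select(posts, context=page1, include_no_context=False)
```

With the default settings the context-less item is treated as being about page1, whereas strict leaves it unassigned. The signpost for page-02.html is reported only through other_contexts.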
- citeAs: Optional[Signpost]
Persistent Identifier (PID) for this resource, preferred for citation and permalinks
- context: Optional[AbsoluteURI]
Resource URI this is the signposting for, e.g. an HTML landing page.
Documentation on other signposting attributes refers to the context as “this resource”.
This attribute is optional; None indicates that no context filtering applies and that individual signposts can have any context.
- property context_url: Optional[AbsoluteURI]
DEPRECATED: Use context instead
- describedBy: Set[Signpost]
Metadata resources about this resource and its items, typically in a Linked Data format.
Resources may require content negotiation; check the Signpost.type attribute (if present) for the content type, e.g. text/turtle.
- for_context(context: Optional[Union[AbsoluteURI, str]]) -> Signposting
Return signposting for given context URI.
This will select an alternative view of the signposts filtered by the given context.
The remaining signposts and their contexts will be included under signposts. Any signposts with implicit context will be given an explicit context from context.
Tip: To ensure all signposts have explicit context, use s.for_context(s.context)
- Parameters:
context – The context to select signposts from. The URI should be a member of other_contexts or equal to context, otherwise the returned Signposting will be empty.
If this parameter is None, then the individual Signpost.context values are ignored and any signposts will be considered.
- items: Set[Signpost]
Items contained by this resource, e.g. downloads.
The content type of the download may be available as the Signpost.type attribute.
- linksets: Set[Signpost]
Linkset resources with further signposting for this resource (and potentially others).
A Linkset is a JSON or text serialization of Link headers available as a separate resource, and may be used to externalize large collections of links, e.g. thousands of item relations.
Resources may require content negotiation; check the Link["type"] attribute (if present) for content types application/linkset or application/linkset+json.
- other_contexts: Set[AbsoluteURI]
Other resource URLs which signposting has been provided for.
Use for_context() to retrieve their signpostings, or filter the full list of signposts from signposts according to Signpost.context
- property signposts: AbstractSet[Signpost]
All FAIR Signposts with recognized relation types.
This may include any additional signposts for link relations that only expect a single link, like citeAs, as well as any signposts for other contexts as listed in other_contexts.
- signposting.find_signposting_html(uri: Union[AbsoluteURI, str]) -> Signposting
Parse HTML to find <link> elements for signposting.
HTTP redirects will be followed and any relative paths in links made absolute correspondingly.
- Parameters:
uri – An absolute http/https URI whose HTML will be inspected.
- Throws ValueError:
If the uri is invalid
- Throws IOError:
If the network request failed, e.g. connection timeout
- Throws requests.HTTPError:
If the HTTP request failed, e.g. 404 Not Found
- Throws UnrecognizedContentType:
If the HTTP resource was not a recognized HTML/XHTML content type
- Throws HTMLParser.HTMLParseError:
If the HTML could not be parsed.
- Returns:
A parsed Signposting object (which may be empty)
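The general idea of collecting <link> elements and resolving their href values can be sketched with the standard-library HTML parser. This is not the parser used by find_signposting_html(): the LinkCollector class, the reduced set of relations and the sample page are assumptions for illustration, and no network request is made.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Subset of relations mentioned in this documentation, for illustration only
RELS = {"cite-as", "describedby", "item", "linkset"}


class LinkCollector(HTMLParser):
    """Collect (rel, absolute href, type) from <link> elements."""

    def __init__(self, base):
        super().__init__()
        self.base = base      # URI of the page, used to resolve relative hrefs
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        for rel in (attributes.get("rel") or "").split():
            if rel.lower() in RELS:
                self.links.append(
                    (rel.lower(), urljoin(self.base, href), attributes.get("type"))
                )


page = """<html><head>
<link rel="cite-as" href="http://example.com/pid-01">
<link rel="describedby" href="metadata.ttl" type="text/turtle">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""

collector = LinkCollector("http://example.com/dir/page-01.html")
collector.feed(page)
found = collector.links
```

The stylesheet link is skipped because its relation is not in RELS, and the relative metadata.ttl is resolved against the page URI.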
- signposting.find_signposting_http(url: str) -> Signposting
Find signposting from HTTP headers.
- Parameters:
url – The URL to request HTTP Link headers from using HTTP HEAD
- Returns:
A parsed Signposting object of the discovered signposting
- signposting.find_signposting_http_link(headers: List[str], baseurl: Optional[str] = None) -> Signposting
Find signposting among HTTP Link headers.
Links are discovered according to defined FAIR signposting relations.
- Parameters:
headers – A list of individual HTTP Link headers. The headers should be valid according to RFC8288, excluding the "Link:" prefix.
baseurl – Optional base URL to make relative link targets absolute from
- Returns:
A Signposting of the collected signposts.
- signposting.find_signposting_linkset(uri: Union[AbsoluteURI, str], acceptType: Optional[Union[MediaType, str]] = None) -> Signposting
Parse a linkset to find links for signposting.
HTTP redirects will be followed.
- Parameters:
uri – An absolute http/https URI of the linkset to inspect.
acceptType – A MediaType to content-negotiate access for. The default is to content-negotiate including application/linkset and application/linkset+json with JSON having preference.
- Throws ValueError:
If the uri is invalid
- Throws IOError:
If the network request failed, e.g. connection timeout
- Throws requests.HTTPError:
If the HTTP request failed, e.g. 404 Not Found
- Throws UnrecognizedContentType:
If the HTTP resource was not a recognized linkset content type. This exception is also raised if acceptType was provided, but didn’t match the returned Content-Type.
- Throws HTMLParser.HTMLParseError:
If the HTML could not be parsed.
- Returns:
A parsed Signposting object (which may be empty)
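For orientation, an application/linkset+json document (as defined by RFC 9264) is a JSON object with a "linkset" array, where each entry has an optional "anchor" and one member per link relation. The following standard-library sketch flattens such a document into tuples. The function links_from_linkset and the sample document are assumptions for illustration and not how find_signposting_linkset() is implemented.

```python
import json


def links_from_linkset(document: str):
    """Flatten a linkset+json document into (anchor, rel, href, type) tuples."""
    data = json.loads(document)
    found = []
    for context_object in data.get("linkset", []):
        anchor = context_object.get("anchor")   # the context of these links
        for rel, targets in context_object.items():
            if rel == "anchor":
                continue
            for target in targets:               # each relation holds a list
                found.append((anchor, rel, target["href"], target.get("type")))
    return found


# Made-up linkset describing one landing page
sample = """{
  "linkset": [
    {
      "anchor": "http://example.com/page-01.html",
      "cite-as": [{"href": "http://example.com/pid-01"}],
      "item": [{"href": "http://example.com/data.zip", "type": "application/zip"}]
    }
  ]
}"""
flattened = links_from_linkset(sample)
```

Each tuple carries the anchor as its context, which corresponds to the context that signposts from a linkset would be expected to have.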