51 changes: 47 additions & 4 deletions ARCHITECTURE.md
@@ -9,7 +9,8 @@ In short, SHARE/trove holds metadata records that describe things and makes thos


## Parts
a look at the tangles of communication between different parts of the system:
a slightly simplified look at the tangles of communication between different parts of the system,
as currently implemented:

```mermaid
graph LR;
@@ -48,6 +49,50 @@ graph LR;
subscribers-->oaipmh;
```

### /trove/ingest
a slightly simplified look at how metadata records are ingested, as currently implemented:
```mermaid
sequenceDiagram
participant ms as metadata source
box shtrove
participant ss as web server
participant sd as db (postgres)
participant sw as worker (celery)
participant sq as queues (rabbitmq)
participant si as indexer
participant se as elasticsearch
end
ms ->> ss: POST /trove/ingest
ss ->> sd: save ResourceIdentifier(s)
ss ->> sd: save Indexcard
ss ->> sd: save ResourceDescription(s)
ss ->> sq: enqueue derive task
ss ->> ms: 201 CREATED (success!)
sq -->> sw: receive derive task
sd <<-->> sw: load ResourceDescription(s)
sw ->> sd: save DerivedIndexcards
sw ->> sq: enqueue indexer message
sq -->> si: bulk receive messages
sd <<-->> si: bulk load metadata records
si ->> se: bulk index
```
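
a rough sketch of the sequence above in plain python. this is an in-memory model for illustration only, not the shtrove implementation: `FakeStores`, `handle_ingest`, `run_derive_worker`, and `run_indexer` are hypothetical names, and the dicts and deques merely stand in for postgres, rabbitmq, and elasticsearch. the "derivation" is a placeholder string transform.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeStores:
    """Hypothetical in-memory stand-ins for the real services."""
    db: dict = field(default_factory=dict)               # stands in for postgres
    derive_queue: deque = field(default_factory=deque)   # stands in for a queue of derive tasks
    index_queue: deque = field(default_factory=deque)    # stands in for a queue of indexer messages
    search_index: dict = field(default_factory=dict)     # stands in for elasticsearch


def handle_ingest(stores: FakeStores, focus_iri: str, record: str) -> int:
    """Web server step: save synchronously, defer the rest, respond right away."""
    stores.db[focus_iri] = {'description': record, 'derived': None}
    stores.derive_queue.append(focus_iri)
    return 201  # the source gets its response before deriving or indexing happens


def run_derive_worker(stores: FakeStores) -> None:
    """Worker step: load each saved description, save a derived form, notify the indexer."""
    while stores.derive_queue:
        focus_iri = stores.derive_queue.popleft()
        card = stores.db[focus_iri]
        card['derived'] = card['description'].strip().lower()  # placeholder "derivation"
        stores.index_queue.append(focus_iri)


def run_indexer(stores: FakeStores) -> None:
    """Indexer step: receive messages in bulk, load records in bulk, index in bulk."""
    iris = list(stores.index_queue)
    stores.index_queue.clear()
    stores.search_index.update({iri: stores.db[iri]['derived'] for iri in iris})


stores = FakeStores()
status = handle_ingest(stores, 'https://blarg.example/thing', '  A Thing  ')
run_derive_worker(stores)
run_indexer(stores)
```

the point of the split is that the metadata source only waits for the saves and the enqueue; deriving and indexing happen afterward, so a record is not searchable at the moment the `201` is returned.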

### /trove/index-card-search
a slightly simplified look at how search requests are served, as currently implemented:
```mermaid
sequenceDiagram
participant c as client
box shtrove
participant ss as web server
participant sd as db (postgres)
participant se as elasticsearch
end
c ->> ss: GET /trove/index-card-search
ss <<-->> se: query for result ids (and context)
ss <<-->> sd: load metadata records
ss ->> c: respond/stream search results
```
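
a rough sketch of the two-step lookup above, assuming (for illustration only) that the search index and the database are plain dicts. `SEARCH_INDEX`, `DATABASE`, `query_result_ids`, and `stream_search_results` are hypothetical names, and the substring match is a stand-in for a real elasticsearch query.

```python
from typing import Iterator

# hypothetical stand-ins: a search index mapping card id -> indexed text,
# and a database mapping card id -> full metadata record
SEARCH_INDEX = {'card-1': 'open science', 'card-2': 'closed doors', 'card-3': 'open data'}
DATABASE = {
    'card-1': {'title': 'Open Science'},
    'card-2': {'title': 'Closed Doors'},
    'card-3': {'title': 'Open Data'},
}


def query_result_ids(search_text: str) -> list[str]:
    """Step 1: ask the search index only for the ids of matching cards."""
    return [card_id for card_id, text in SEARCH_INDEX.items() if search_text in text]


def stream_search_results(search_text: str) -> Iterator[dict]:
    """Steps 2 and 3: load each full record from the database, yielding as we go."""
    for card_id in query_result_ids(search_text):
        yield {'id': card_id, **DATABASE[card_id]}


results = list(stream_search_results('open'))
```

because `stream_search_results` is a generator, a caller can begin sending results before every record has been loaded, which loosely mirrors the "respond/stream" step in the diagram.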

## Code map

A brief look at important areas of code as they happen to exist now.
@@ -91,9 +136,7 @@ Uses the [resource description framework](https://www.w3.org/TR/rdf11-primer/#se

### Identifiers

Whenever feasible, use full URI strings to identify resources, concepts, types, and properties that may be exposed outwardly.

Prefer using open, standard, well-defined namespaces wherever possible ([DCAT](https://www.w3.org/TR/vocab-dcat-3/) is a good place to start; see `trove.vocab.namespaces` for others already in use). When app-specific concepts must be defined, use the `TROVE` namespace (`https://share.osf.io/vocab/2023/trove/`).
Whenever feasible, use full [IRI](https://www.rfc-editor.org/rfc/rfc3987.html) strings (utf-8) to identify resources, concepts, types, and properties that may be exposed outwardly (without converting them to URIs or using them to send requests). Prefer using open, standard, well-defined namespaces wherever possible ([DCAT](https://www.w3.org/TR/vocab-dcat-3/) is a good place to start; see `trove.vocab.namespaces` for others already in use). When app-specific concepts must be defined, use the `TROVE` namespace (`https://share.osf.io/vocab/2023/trove/`).

A notable exception (non-URI identifier) is the "source-unique identifier" or "suid" -- essentially a two-tuple `(source, identifier)` that uniquely and persistently identifies a metadata record in a source repository. This `identifier` may be any string value, provided by the external source.
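
A minimal sketch of the two kinds of identifier described above, using only the standard library. The `Suid` class and the `trove_iri` helper are hypothetical names chosen for illustration and are not taken from the codebase; only the namespace IRI comes from the text.

```python
from typing import NamedTuple

TROVE = 'https://share.osf.io/vocab/2023/trove/'  # namespace IRI given in the text above


class Suid(NamedTuple):
    """Hypothetical stand-in for a source-unique identifier."""
    source: str
    identifier: str  # any string the source provides; not required to be an IRI


def trove_iri(name: str) -> str:
    """Build a full IRI string for an app-specific concept by plain concatenation."""
    return TROVE + name


first = Suid('some-repository', 'record/42')
again = Suid('some-repository', 'record/42')
other = Suid('another-repository', 'record/42')
param_iri = trove_iri('iriShorthand')
```

Modeling the suid as a tuple means the same `identifier` from two different sources compares as two distinct records, while a repeated `(source, identifier)` pair always refers to the same one.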

2 changes: 1 addition & 1 deletion trove/openapi.py
@@ -168,7 +168,7 @@ def _openapi_path(path_iri: str, api_graph: primitive_rdf.RdfGraph) -> Tuple[str

def _concept_markdown_blocks(concept_iri: str, api_graph: primitive_rdf.RdfGraph) -> Generator[str, None, None]:
    for _label in api_graph.q(concept_iri, RDFS.label):
        yield f'## {_label.unicode_value}'
        yield f'## concept: {_label.unicode_value}'
    for _comment in api_graph.q(concept_iri, RDFS.comment):
        yield f'<aside>{_comment.unicode_value}</aside>'
    for _desc in api_graph.q(concept_iri, DCTERMS.description):
26 changes: 24 additions & 2 deletions trove/vocab/trove.py
@@ -502,6 +502,24 @@ def _literal_markdown(text: str, *, language: str) -> literal:

the response will have the http header `Content-Disposition: attachment`
with a filename based on the query param value, current date, and response content mediatype
''', language='en')},
},
TROVE.iriShorthand: {
RDF.type: {RDF.Property, TROVE.QueryParameter},
JSONAPI_MEMBERNAME: {literal('iriShorthand', language='en')},
RDFS.label: {literal('iriShorthand', language='en')},
RDFS.comment: {literal('define a shorthand namespace or alias for IRIs in this query string', language='en')},
TROVE.jsonSchema: {literal_json({'type': 'string'})},
DCTERMS.description: {_literal_markdown('''**iriShorthand** is
a query parameter to define a shorthand name used for parsing IRIs in other query parameters

for example, a request to `/trove/index-card-search` with these query parameters:
- `iriShorthand[blarg]=https://blarg.example/vocab/`
- `iriShorthand[foo]=https://another.example/vocab/foo`
- `cardSearchFilter[blarg:prop]=foo`

will find cards with the IRI value `<https://another.example/vocab/foo>`
at the property `<https://blarg.example/vocab/prop>`
''', language='en')},
},
TROVE.cardSearchText: {
@@ -709,11 +727,15 @@ def _literal_markdown(text: str, *, language: str) -> literal:
DCTERMS.description: {_literal_markdown(f'''a **property-path** is
a dot-separated path of short-hand IRIs, used in several api parameters

currently the only supported shorthand is defined by [OSFMAP]({osfmap.OSFMAP_LINK})

for example, `creator.name` is parsed as a two-step path that follows
`creator` (aka `dcterms:creator`, `<http://purl.org/dc/terms/creator>`) and then `name` (aka `foaf:name`, `<http://xmlns.com/foaf/0.1/name>`)

currently, the only implied shorthand is that defined by [OSFMAP]({osfmap.OSFMAP_LINK})
-- to search on other properties, use an `iriShorthand` query param to provide an explicit
alias or namespace (e.g. with `iriShorthand[blarg]=https://blarg.example/vocab/`,
`blarg:prop1.blarg:prop2` in another param will be parsed as a two-step property-path
following `<https://blarg.example/vocab/prop1>` then `<https://blarg.example/vocab/prop2>`)

most places that allow one property-path also accept a comma-separated set of paths,
like `title,description` (which is parsed as two paths: `title` and `description`)
or `affiliation,creator.affiliation,funder` (which is parsed as three paths: `affiliation`,
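
a rough sketch of how the `iriShorthand` and property-path rules described in this file could be applied, using only the standard library. this is not the trove parser: `parse_shorthand`, `expand_iri`, and `parse_property_paths` are hypothetical names, the implied OSFMAP shorthand is left out, and splitting on `.` assumes that no step is written as a full IRI containing dots.

```python
from urllib.parse import parse_qsl


def parse_shorthand(query_string: str) -> dict[str, str]:
    """Collect `iriShorthand[name]=iri` query params into a {name: iri} mapping."""
    shorthand = {}
    for key, value in parse_qsl(query_string):
        if key.startswith('iriShorthand[') and key.endswith(']'):
            shorthand[key[len('iriShorthand['):-1]] = value
    return shorthand


def expand_iri(token: str, shorthand: dict[str, str]) -> str:
    """Expand an exact alias (`foo`) or a namespace prefix (`blarg:prop`)."""
    if token in shorthand:
        return shorthand[token]
    prefix, sep, name = token.partition(':')
    if sep and prefix in shorthand:
        return shorthand[prefix] + name
    return token  # left unexpanded; OSFMAP's implied shorthand is out of scope here


def parse_property_paths(value: str, shorthand: dict[str, str]) -> list[tuple[str, ...]]:
    """Split on commas into paths, then on dots into steps, expanding each step."""
    return [
        tuple(expand_iri(step, shorthand) for step in path.split('.'))
        for path in value.split(',')
    ]


shorthand = parse_shorthand(
    'iriShorthand[blarg]=https://blarg.example/vocab/'
    '&iriShorthand[foo]=https://another.example/vocab/foo'
    '&cardSearchFilter[blarg:prop]=foo'
)
filter_property = expand_iri('blarg:prop', shorthand)
filter_value = expand_iri('foo', shorthand)
two_step_path = parse_property_paths('blarg:prop1.blarg:prop2', shorthand)
```

with the example query params from the `iriShorthand` description, `blarg` acts as a namespace prefix and `foo` as an alias for one complete IRI; the sketch treats both through the same mapping.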