Description
When a TEI file is ingested into eXist-DB, a metrics.xml file is created that contains the count-based and network metrics (of the nodes) returned by the metrics service. We currently store only the "nodes" in metrics.xml; the edges, which are based on the co-occurrences of characters in segments, are not included. When the "networkdata" is requested via the endpoint /corpora/.../plays/{playname}/networkdata/gexf (and probably others, e.g. graphml), the edges are constructed on the fly by the function api:networkdata-gexf() in api.xqm. This is the code that creates the edges (https://github.com/dracor-org/dracor-api/blob/main/modules/api.xqm#L1051-L1059):
    let $edges :=
      for $spkr at $pos in $speakers
        for $cooc in $links($spkr)
        where index-of($speakers, $cooc)[1] gt $pos
        let $weight := $segments//sgm[spkr=$spkr][spkr=$cooc] => count()
        return
          <edge xmlns="http://www.gexf.net/1.2draft"
            id="{$spkr}|{$cooc}" source="{$spkr}" target="{$cooc}"
            weight="{$weight}"/>
The code is more or less duplicated in the function api:networkdata-graphml($corpusname, $playname) (https://github.com/dracor-org/dracor-api/blob/main/modules/api.xqm#L1079-L1161), and the edge data is also included in the response of the .../networkdata/csv endpoint (https://github.com/dracor-org/dracor-api/blob/main/modules/api.xqm#L930-L1002).
We could:
- refactor the code and move the generation of the network edges and nodes to the metrics.xqm module, which contains the functions that generate the network metrics with the help of the metrics service, e.g. https://github.com/dracor-org/dracor-api/blob/main/modules/metrics.xqm#L81-L86
- serialize not only the nodes to metrics.xml but also the edge data, so that the file becomes a single source of truth for the API endpoints that return the network data in formats like gexf.
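
To make the first option more concrete, here is a minimal sketch of what a shared edge helper could look like. The function name local:edges and the sample input are made up for illustration and are not part of dracor-api. It also assumes that the segments are available as sgm elements with spkr children, as in the snippet above, and it derives the speaker pairs directly from the segments instead of using the $links map:

```xquery
xquery version "3.1";

(:~
 : Hypothetical helper (not in dracor-api): derive undirected, weighted
 : co-occurrence edges from an element whose <sgm> descendants list the
 : speakers of each segment in <spkr> children.
 : Returns one map per speaker pair so that each serializer
 : (gexf, graphml, csv, rdf) can render the same data in its own format.
 :)
declare function local:edges($segments as element()) as map(*)* {
  let $speakers := distinct-values($segments//sgm/spkr)
  for $source at $pos in $speakers
  (: only pair each speaker with the ones after it, so no pair occurs twice :)
  for $target in subsequence($speakers, $pos + 1)
  (: weight = number of segments in which both speakers occur :)
  let $weight := count($segments//sgm[spkr = $source][spkr = $target])
  where $weight gt 0
  return map { "source": $source, "target": $target, "weight": $weight }
};

(: Sample input, made up for illustration. :)
let $segments :=
  <segments>
    <sgm><spkr>a</spkr><spkr>b</spkr></sgm>
    <sgm><spkr>a</spkr><spkr>b</spkr><spkr>c</spkr></sgm>
  </segments>
for $edge in local:edges($segments)
return
  <edge xmlns="http://www.gexf.net/1.2draft"
    id="{$edge?source}|{$edge?target}"
    source="{$edge?source}" target="{$edge?target}"
    weight="{$edge?weight}"/>
```

Returning maps rather than gexf elements is a design choice of this sketch: the gexf, graphml and csv functions would then differ only in how they serialize the same sequence of edges.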
I just discovered this because I am planning to serialize the edges as well as the nodes in the RDF output in rdf.xqm. While I can simply rely on metrics.xml and the util functions to get the nodes, the edges either need to be retrieved from one of the output serializations, e.g. gexf, or, as I am currently doing, generated by duplicating the edge generation code from the gexf API function. I would rather re-use a util function or retrieve this information from a single source of truth, e.g. the metrics.xml file.
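
For illustration only, this is roughly how a consumer such as rdf.xqm could read the edges if they were stored in metrics.xml. The edges and edge elements and their attributes are assumptions made for this sketch, the inline fragment is simplified, and none of it reflects the current file format:

```xquery
xquery version "3.1";

(: Hypothetical, simplified metrics.xml fragment. The <edges>/<edge>
   elements and their attributes are assumptions for this sketch. :)
let $metrics :=
  <metrics>
    <network>
      <nodes>
        <node id="a"/>
        <node id="b"/>
      </nodes>
      <edges>
        <edge source="a" target="b" weight="2"/>
      </edges>
    </network>
  </metrics>
(: Read the stored edges instead of recomputing them from the segments. :)
for $edge in $metrics/network/edges/edge
return map {
  "source": string($edge/@source),
  "target": string($edge/@target),
  "weight": xs:integer($edge/@weight)
}
```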
Implementing this is not urgent. I am just adding this issue so that we do not forget it, should we ever get to a substantial refactoring of the code.