
Serialize network edges to metrics.xml #341

@ingoboerner


When a TEI file is ingested into eXist-db, a metrics.xml file is created that contains the count-based and network metrics (of the nodes) calculated and returned by the metrics service. Currently we only store the nodes in metrics.xml; the edges, which are based on the co-occurrences of characters in segments, are not included. When the network data is requested via the endpoint /corpora/.../plays/{playname}/networkdata/gexf (and probably others, e.g. graphml), the edges are constructed on the fly by the function api:networkdata-gexf() in api.xqm. This is the code that creates the edges (https://github.com/dracor-org/dracor-api/blob/main/modules/api.xqm#L1051-L1059):

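```xquery
(: One GEXF edge per unordered pair of co-occurring speakers.
   $speakers, $links and $segments are bound earlier in the surrounding function. :)
let $edges :=
  for $spkr at $pos in $speakers
  for $cooc in $links($spkr)
  (: emit each pair only once: keep it when $cooc comes after $spkr in $speakers :)
  where index-of($speakers, $cooc)[1] gt $pos
  (: weight = number of segments in which both speakers appear :)
  let $weight := $segments//sgm[spkr=$spkr][spkr=$cooc] => count()
  return
    <edge xmlns="http://www.gexf.net/1.2draft"
      id="{$spkr}|{$cooc}" source="{$spkr}" target="{$cooc}"
      weight="{$weight}"/>
```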
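For readers less familiar with XQuery, the edge construction above can be sketched in Python. This is a language-neutral illustration, not code from dracor-api: it assumes that $links maps each speaker to the speakers who share at least one segment with them, and it models each segment as the set of speaker IDs occurring in it. Edge and compute_edges are hypothetical names.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    weight: int

    @property
    def id(self) -> str:
        # same pattern as id="{$spkr}|{$cooc}" in the XQuery snippet
        return f"{self.source}|{self.target}"


def compute_edges(speakers: list[str], segments: list[set[str]]) -> list[Edge]:
    """Return one weighted edge per unordered pair of co-occurring speakers.

    speakers: speaker IDs in a fixed order (the earlier speaker becomes source)
    segments: one set of speaker IDs per segment
    """
    position = {spkr: i for i, spkr in enumerate(speakers)}
    edges: list[Edge] = []
    for i, spkr in enumerate(speakers):
        # assumed equivalent of $links($spkr): everyone sharing a segment with spkr
        coocs: set[str] = set()
        for seg in segments:
            if spkr in seg:
                coocs |= seg
        coocs.discard(spkr)
        # keep each pair once, like `where index-of($speakers, $cooc)[1] gt $pos`;
        # IDs missing from speakers are skipped, as index-of() would return ()
        later = [c for c in coocs if c in position and position[c] > i]
        for cooc in sorted(later, key=position.__getitem__):
            # weight = number of segments in which both speakers appear
            weight = sum(1 for seg in segments if spkr in seg and cooc in seg)
            edges.append(Edge(spkr, cooc, weight))
    return edges


if __name__ == "__main__":
    demo = compute_edges(
        ["anna", "boris", "clara"],
        [{"anna", "boris"}, {"anna", "boris", "clara"}, {"clara"}],
    )
    for edge in demo:
        print(edge.id, edge.weight)
```

With these three made-up speakers the demo prints anna|boris 2, anna|clara 1 and boris|clara 1. Sorting the co-occurring speakers by position is a choice made for this sketch; the XQuery version iterates in whatever order $links returns them.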

The code is more or less duplicated in the function api:networkdata-graphml($corpusname, $playname) (https://github.com/dracor-org/dracor-api/blob/main/modules/api.xqm#L1079-L1161), and the edge data is also included in the response of the .../networkdata/csv endpoint (https://github.com/dracor-org/dracor-api/blob/main/modules/api.xqm#L930-L1002).
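A minimal Python sketch of how this duplication could be avoided, under stated assumptions: compute the edge list once and pass it to small format-specific writers. The GEXF attributes follow the XQuery snippet above; the CSV column names (Source, Type, Target, Weight) are an assumption and may not match the actual .../networkdata/csv output. edges_to_gexf and edges_to_csv are hypothetical names.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import NamedTuple

GEXF_NS = "http://www.gexf.net/1.2draft"


class Edge(NamedTuple):
    source: str
    target: str
    weight: int


def edges_to_gexf(edges: list[Edge]) -> ET.Element:
    """Build a GEXF <edges> element with one <edge> per Edge."""
    container = ET.Element(f"{{{GEXF_NS}}}edges")
    for e in edges:
        ET.SubElement(container, f"{{{GEXF_NS}}}edge", {
            "id": f"{e.source}|{e.target}",
            "source": e.source,
            "target": e.target,
            "weight": str(e.weight),
        })
    return container


def edges_to_csv(edges: list[Edge]) -> str:
    """Render the same edges as CSV (assumed column layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Type", "Target", "Weight"])
    for e in edges:
        writer.writerow([e.source, "Undirected", e.target, e.weight])
    return buf.getvalue()


if __name__ == "__main__":
    edges = [Edge("anna", "boris", 2), Edge("anna", "clara", 1)]
    print(ET.tostring(edges_to_gexf(edges), encoding="unicode"))
    print(edges_to_csv(edges))
```

A GraphML writer would be a third function of the same shape, so the edge logic itself lives in exactly one place.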

We could:

  • refactor the code and move the generation of network nodes and edges to the metrics.xqm module, which already contains the functions that generate the network metrics with the help of the metrics service, e.g. https://github.com/dracor-org/dracor-api/blob/main/modules/metrics.xqm#L81-L86
  • serialize not only the nodes but also the edge data to metrics.xml, making it a single source of truth for the API endpoints that return network data in formats like GEXF.
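The second option could look roughly like this, sketched in Python: write the edges into metrics.xml next to the nodes at ingestion time, then read them back wherever they are needed (GEXF, GraphML and CSV endpoints, RDF export). The <edges>/<edge> element names, their attributes and the assumed layout of metrics.xml are all hypothetical; add_edges and read_edges are illustrative names, not existing dracor-api functions.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    weight: int


def add_edges(metrics: ET.Element, edges: list[Edge]) -> None:
    """Append a hypothetical <edges> section to a parsed metrics.xml root."""
    section = ET.SubElement(metrics, "edges")
    for e in edges:
        ET.SubElement(section, "edge", {
            "source": e.source,
            "target": e.target,
            "weight": str(e.weight),
        })


def read_edges(metrics: ET.Element) -> list[Edge]:
    """Read the stored edges back, e.g. for the network data endpoints."""
    return [
        Edge(el.get("source"), el.get("target"), int(el.get("weight")))
        for el in metrics.findall("edges/edge")
    ]


if __name__ == "__main__":
    # hypothetical metrics.xml content; the real file layout may differ
    root = ET.fromstring('<metrics><nodes><node id="anna"/><node id="boris"/></nodes></metrics>')
    add_edges(root, [Edge("anna", "boris", 2)])
    print(ET.tostring(root, encoding="unicode"))
    print(read_edges(root))
```

Storing the weight as an attribute keeps each edge a single element, which makes the reverse lookup a simple path query.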

I just discovered this because I am planning to serialize the edges as well as the nodes in the RDF output in rdf.xqm. While I can simply rely on metrics.xml and the util functions to get the nodes, the edges either need to be retrieved from one of the output serializations (e.g. GEXF) or, as I am currently doing, the edge generation code has to be duplicated from the GEXF API function. I would rather reuse a util function or retrieve this information from a single source of truth, e.g. the metrics.xml file.

Implementing this is not urgent; I am just adding this issue so that we do not forget about it should we ever get to a substantial refactoring of the code.

