Add (xPath) identifier to segment #342

@ingoboerner

The ../corpora/.../plays/{playname} endpoint (and maybe others) includes information on the segments that are taken into account when generating network edges based on co-occurrences of characters.

See https://dracor.org/api/v1/corpora/rus/plays/gogol-revizor for example:

"segments" : [ {
    "type" : "scene",
    "title" : "Действие первое | Явление I",
    "speakers" : [ "gorodnichij", "ammos_fedorovich_ljapkin_tjapkin", "artemij_filippovich_zemljanika", "luka_lukich", "lekar" ],
    "number" : 1
  }, {
    "type" : "scene",
    "title" : "Действие первое | Явление II",
    "speakers" : [ "pochtmejster", "gorodnichij", "ammos_fedorovich_ljapkin_tjapkin" ],
    "number" : 2
  }

Segments are extracted by the utility function dutil:get-segments($play) and then transformed into the API output (provided by dutil:get-play-info($corpusname, $playname)):

let $segments := array {
      for $segment at $pos in dutil:get-segments($tei)
      let $heads :=
        $segment/(ancestor::tei:div/tei:head,tei:head)
          ! functx:remove-elements-deep(., ('*:note'))
          ! normalize-space(.)
      let $speakers := dutil:distinct-speakers($segment)
      return map:merge((
        map {
          "type": $segment/@type/string(),
          "number": $pos
        },
        if(count($heads)) then
          map {"title": string-join($heads, ' | ')}
        else (),
        if(count($speakers)) then map:entry(
          "speakers",
          array { for $sp in $speakers return $sp }
        ) else ()
      ))
    }

Could we add an additional identifier to each segment, namely the XPath of the corresponding TEI <div> element?
I would like to use this identifier to connect the segment to the citable unit returned by the DTS endpoints (the $ref identifiers are based on the elements' XPath addresses).

The proposed segment object could look like this:

{
    "type" : "scene",
    "title" : "Действие первое | Явление II",
    "speakers" : [ "pochtmejster", "gorodnichij", "ammos_fedorovich_ljapkin_tjapkin" ],
    "number" : 2,
    "xpath" : "/TEI[1]/text[1]/body[1]/div[1]/div[2]"
  }
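
To illustrate how a client might use the proposed field, here is a minimal Python sketch that indexes segments by their "xpath" value so they can be joined with other data keyed on the same address. The "xpath" field does not exist in the API yet, the response excerpt is hand-written for illustration, and the helper name index_segments_by_xpath is hypothetical.

```python
import json

# Hand-written excerpt for illustration. The "xpath" field is the one
# proposed in this issue and is NOT part of the current API output.
PLAY_INFO = json.loads("""
{
  "segments": [
    {
      "type": "scene",
      "title": "Действие первое | Явление I",
      "number": 1
    },
    {
      "type": "scene",
      "title": "Действие первое | Явление II",
      "speakers": ["pochtmejster", "gorodnichij", "ammos_fedorovich_ljapkin_tjapkin"],
      "number": 2,
      "xpath": "/TEI[1]/text[1]/body[1]/div[1]/div[2]"
    }
  ]
}
""")


def index_segments_by_xpath(play_info):
    """Map each segment's xpath to the segment, skipping segments without one."""
    return {
        seg["xpath"]: seg
        for seg in play_info.get("segments", [])
        if "xpath" in seg
    }


segments_by_xpath = index_segments_by_xpath(PLAY_INFO)
segment = segments_by_xpath["/TEI[1]/text[1]/body[1]/div[1]/div[2]"]
print(segment["number"], segment["speakers"])
```

Segments lacking the field are skipped rather than raising an error, so the sketch would keep working against older API responses.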

There is an XQuery function, path(), that returns this XPath, so adding it should not take much effort. I tested it in the RDF generation, see
<xpath>{path($seg) => replace("Q\{http://www.tei-c.org/ns/1.0\}", "")}</xpath> (I am removing the TEI namespace). $segments comes from dutil:get-segments():

let $segments-transformed := <segments>
          {
            for $seg in $segments 
            return
              <sgm>
                <xpath>{path($seg) => replace("Q\{http://www.tei-c.org/ns/1.0\}", "")}</xpath>
                {
                  for $id in dutil:distinct-speakers($seg)
                  return <spkr>{$id}</spkr>
                }
              </sgm>
          }
        </segments>
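
To make the effect of the replace() call concrete, the following Python sketch applies an equivalent substitution to a string shaped like the output of path(), which qualifies every element step with a Q{namespace} prefix. The input string is constructed by hand here, and the function name strip_tei_namespace is hypothetical. Unlike the XQuery snippet, the dots in the namespace URI are escaped so that they only match literal dots.

```python
import re

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Matches the Q{...} prefix that path() puts in front of each TEI element name.
TEI_QNAME_PREFIX = re.compile(r"Q\{" + re.escape(TEI_NS) + r"\}")


def strip_tei_namespace(path):
    """Remove the TEI namespace qualifier from every step of the path."""
    return TEI_QNAME_PREFIX.sub("", path)


# Hand-built example of what path() would return for a nested <div>.
steps = ["TEI[1]", "text[1]", "body[1]", "div[1]", "div[2]"]
raw_path = "/" + "/".join("Q{" + TEI_NS + "}" + step for step in steps)

print(raw_path)
print(strip_tei_namespace(raw_path))
```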

Adding this identifier would allow users to retrieve the TEI representation of the segment, even if they are not relying on DTS.
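
As a rough sketch of that use case, the Python code below resolves a namespace-stripped path against a TEI document using only the standard library. It assumes the path consists solely of steps of the form name[position] and that all elements are in the TEI namespace. The sample document and the function name resolve_segment are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
STEP = re.compile(r"(\w[\w.-]*)\[(\d+)\]")

# Tiny made-up TEI document, just enough structure to resolve a path.
SAMPLE_TEI = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <text>
    <body>
      <div type="act">
        <head>Act I</head>
        <div type="scene"><head>Scene I</head></div>
        <div type="scene"><head>Scene II</head></div>
      </div>
    </body>
  </text>
</TEI>"""


def resolve_segment(root, xpath):
    """Walk a path like /TEI[1]/text[1]/body[1]/div[1]/div[2] from the root.

    Returns the matching element, or None if any step does not match.
    """
    parsed = []
    for step in (s for s in xpath.split("/") if s):
        match = STEP.fullmatch(step)
        if match is None:
            raise ValueError(f"unsupported path step: {step!r}")
        parsed.append((match.group(1), int(match.group(2))))
    if not parsed:
        return None

    # The first step must name the root element itself.
    name, position = parsed[0]
    if root.tag != f"{{{TEI_NS}}}{name}" or position != 1:
        return None

    node = root
    for name, position in parsed[1:]:
        # Positions count only child elements with the same name (1-based).
        same_name = [child for child in node if child.tag == f"{{{TEI_NS}}}{name}"]
        if position < 1 or position > len(same_name):
            return None
        node = same_name[position - 1]
    return node


root = ET.fromstring(SAMPLE_TEI)
scene = resolve_segment(root, "/TEI[1]/text[1]/body[1]/div[1]/div[2]")
print(scene.find(f"{{{TEI_NS}}}head").text)
```

A full XPath engine would handle this in a single call with a namespace binding; the manual walk is used here only to keep the sketch dependency-free.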
