Description
The ../corpora/.../plays/{playname} endpoint (and maybe others) includes information on the segments that are taken into account when generating network edges based on co-occurrences of characters.
See https://dracor.org/api/v1/corpora/rus/plays/gogol-revizor, for example:
"segments" : [ {
  "type" : "scene",
  "title" : "Действие первое | Явление I",
  "speakers" : [ "gorodnichij", "ammos_fedorovich_ljapkin_tjapkin", "artemij_filippovich_zemljanika", "luka_lukich", "lekar" ],
  "number" : 1
}, {
  "type" : "scene",
  "title" : "Действие первое | Явление II",
  "speakers" : [ "pochtmejster", "gorodnichij", "ammos_fedorovich_ljapkin_tjapkin" ],
  "number" : 2
}
Segments are extracted by the util function dutil:get-segments($play) and then transformed into the API output (provided by dutil:get-play-info($corpusname, $playname)):
let $segments := array {
  for $segment at $pos in dutil:get-segments($tei)
  let $heads :=
    $segment/(ancestor::tei:div/tei:head,tei:head)
    ! functx:remove-elements-deep(., ('*:note'))
    ! normalize-space(.)
  let $speakers := dutil:distinct-speakers($segment)
  return map:merge((
    map {
      "type": $segment/@type/string(),
      "number": $pos
    },
    if(count($heads)) then
      map {"title": string-join($heads, ' | ')}
    else (),
    if(count($speakers)) then map:entry(
      "speakers",
      array { for $sp in $speakers return $sp }
    ) else ()
  ))
}
Could we add an additional identifier to each segment: the XPath of the corresponding TEI <div> element?
I would like to use this identifier to "connect" the segment to the citable unit returned by the DTS endpoints (the $ref identifiers are based on the element's XPath address).
The proposed segment object could look like this:
{
  "type" : "scene",
  "title" : "Действие первое | Явление II",
  "speakers" : [ "pochtmejster", "gorodnichij", "ammos_fedorovich_ljapkin_tjapkin" ],
  "number" : 2,
  "xpath" : "/TEI[1]/text[1]/body[1]/div[1]/div[2]"
}
There is an XQuery function, path(), that returns this XPath, so adding it should not take much effort. I tested it in the RDF generation:
<xpath>{path($seg) => replace("Q\{http://www.tei-c.org/ns/1.0\}", "")}</xpath>
(The replace() call removes the TEI namespace.) In the following snippet, $segments comes from dutil:get-segments():
let $segments-transformed := <segments>
  {
    for $seg in $segments
    return
      <sgm>
        <xpath>{path($seg) => replace("Q\{http://www.tei-c.org/ns/1.0\}", "")}</xpath>
        {
          for $id in dutil:distinct-speakers($seg)
          return <spkr>{$id}</spkr>
        }
      </sgm>
  }
</segments>
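For readers unfamiliar with path(): it writes each element step as Q{namespace-uri}local-name[position], which is why the namespace qualifier has to be stripped to get the short form proposed above. The following is only an illustrative Python sketch of that string transformation, not part of the DraCor code base; the function name strip_tei_namespace and the sample input are hypothetical.

```python
TEI_NS = "http://www.tei-c.org/ns/1.0"


def strip_tei_namespace(path: str) -> str:
    """Remove the Q{...} TEI namespace qualifier from every step of the path."""
    return path.replace("Q{" + TEI_NS + "}", "")


# Hypothetical sample value in the shape path() produces for a nested TEI <div>.
raw = "/" + "/".join(
    "Q{" + TEI_NS + "}" + step
    for step in ["TEI[1]", "text[1]", "body[1]", "div[1]", "div[2]"]
)

print(strip_tei_namespace(raw))  # prints /TEI[1]/text[1]/body[1]/div[1]/div[2]
```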
Adding this identifier would allow users to retrieve the TEI representation of the segment, even if they are not relying on DTS.
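As a rough sketch of how a client might use such an identifier, the Python snippet below resolves the namespace-free path against a locally parsed TEI document using only the standard library. It assumes the proposed "xpath" field has the form shown above; the helper resolve_segment and the tiny sample document are hypothetical and not real DraCor data. ElementTree supports only a subset of XPath, so the path is rewritten as a prefixed path relative to the root element.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny stand-in for a play's TEI document (hypothetical, not real DraCor data).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <text>
    <body>
      <div type="act">
        <div type="scene"><head>Scene 1</head></div>
        <div type="scene"><head>Scene 2</head></div>
      </div>
    </body>
  </text>
</TEI>"""


def resolve_segment(root, xpath):
    """Return the element addressed by a namespace-free path such as
    /TEI[1]/text[1]/body[1]/div[1]/div[2], or None if nothing matches."""
    steps = xpath.strip("/").split("/")
    # The first step addresses the root element itself; check that it matches.
    root_name = steps[0].split("[")[0]
    if root.tag.split("}")[-1] != root_name:
        raise ValueError(f"path does not start at <{root_name}>")
    if len(steps) == 1:
        return root
    # Re-add the TEI prefix to every remaining step for ElementTree.
    relative = "/".join("tei:" + step for step in steps[1:])
    return root.find(relative, TEI_NS)


root = ET.fromstring(SAMPLE_TEI)
segment = resolve_segment(root, "/TEI[1]/text[1]/body[1]/div[1]/div[2]")
print(segment.find("tei:head", TEI_NS).text)  # prints Scene 2
```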