CN117194434B

CN117194434B - Learning-based trajectory indexing and query method based on homogeneous region partitioning on directed road networks

Info

Publication number: CN117194434B
Application number: CN202311235731.XA
Authority: CN
Inventors: 韩京宇; 刘阳; 郎杭; 武凡
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2023-09-22
Filing date: 2023-09-22
Publication date: 2025-08-08
Anticipated expiration: 2043-09-22
Also published as: CN117194434A

Abstract

The invention provides a learning type track index and query method based on homogeneous region division on a directional road, which is characterized in that a subsequent array is respectively arranged for each directional road section, the traffic ratio is recorded, the directional road section is searched from the center of the road network in a depth-first mode, the unique serial number of the directional road section is determined, the directional road section is firstly divided into a plurality of classes according to the length and then divided into a plurality of first-order homogeneous region classes according to the number of lanes, all the directional road sections in each first-order homogeneous region class are divided into a plurality of second-order homogeneous region classes, the serial number of a track point in each second-order homogeneous region class is determined, a piecewise linear model of a three-dimensional space is obtained for each second-order homogeneous region class in a training mode to serve as a track point storage position predictor, track query is performed, an original query result set is returned, the method can ensure that track data in each second-order homogeneous region class are uniformly distributed, query performance and prediction accuracy can be improved, and index query efficiency can be improved.

Description

Learning track index and query method based on homogeneous region division on directed road network

Technical Field

The invention relates to a learning type track index and query method based on homogeneous region division on a directional road network, and belongs to the technical field of road network track data index.

Background

Under the large data environment that the data is explosive-type and large-scale aggregated, and is high-dimensional and complex, the traditional index structure has the problems of high space cost, low query efficiency, high access cost and the like when processing the large data. The learning type index effectively improves the index performance and reduces the access space overhead by modeling and learning the characteristics of the bottom data distribution, the query load and the like.

The multidimensional learning type index is divided into three types, namely an index based on the Leeberg measure, an index based on a space filling curve and an index based on data division. LI PENGFEI et al, in the paper LISA A Learned Index Structure for SPATIAL DATA, propose LISA. The index firstly divides a multidimensional data space into grids, the data are ordered according to grid units, then a function is constructed by adopting a method based on the Leeberg measure to reduce the dimension of the multidimensional data into one-dimensional values, then a monotonic slicing prediction function is used for dividing the one-dimensional data space into a plurality of slices, and each one-dimensional value is mapped into a slice ID. And finally, training a local model to distribute each fragment to a corresponding disk page, wherein the input of the model is a one-dimensional value contained in the fragment, and the output is a page address corresponding to the one-dimensional value.

Based on the index of the space-filling curve, qi Jianzhong et al in the paper "EFFECTIVELY LEARNING SPATIAL INDICES" set forth an index of RSMI, which proposes a recursive strategy that partitions a large dataset, acquires the curve value for each partition using the space-filling curve, and trains a multi-layer perceptron to learn the index of each partition. Then, a rank space-based ordering technique is introduced into each partition to establish ordering of point data, the points are grouped into blocks, a multi-layer perceptron is trained to learn distribution of two-dimensional numerical values, wherein model input is point coordinates, and output is block IDs corresponding to the coordinates.

Indexing based on data partitioning, davitkova et al in The paper "The ML-Index: A multidimensional, learned Index for point, range, AND NEAREST-neighbor queries" propose ML indexing that uses a method similar to The IDISTANCE method for dimension reduction. The index structure mainly comprises a first part and a second part, wherein the first part is used for calculating Euclidean distance between each reference point and data, dividing the data which is close to the reference point into partitions corresponding to the reference points, then calculating one-dimensional key values of each data according to partition offset and the distance between each data and the reference point of the partition, and the second part is used for learning the distribution of the one-dimensional key values by adopting an RMI model, wherein all models are linear models in a two-dimensional space.

Although all three methods convert multidimensional data into one-dimensional data through a dimension reduction mode and then train a model to learn the distribution of the one-dimensional data, when the methods encounter oblique data, the model prediction accuracy is often inaccurate. In order to better learn the distribution characteristics of multidimensional inclined data, the data space is divided according to the data distribution, the inclined data is divided into a plurality of uniformly distributed areas, and an error-controllable lightweight model is trained to learn the distribution relation of the multidimensional data, so that the query efficiency of the index is improved, and the storage size of the index is reduced.

The above-mentioned problems are those that should be considered and addressed in learning track indexing and querying processes based on homogeneous region partitioning on a directional road network.

Disclosure of Invention

The invention aims to provide a learning track index and query method based on homogeneous region division on a directional road network, which solves the problems of low query efficiency, and to-be-improved query performance and prediction accuracy in the prior art.

The technical scheme of the invention is as follows:

A learning track index and inquiry method based on homogeneous region division on directional road network includes such steps as,

S1, setting a subsequent array for each directed road section, scanning all track data on the road network, and recording traffic ratio values in the subsequent array;

S2, dividing the directed road sections into a plurality of classes according to the length, dividing each class into a plurality of first-order homogeneous region classes according to the number of lanes, dividing all the directed road sections in each first-order homogeneous region class into a plurality of second-order homogeneous region classes according to the length and the number of lanes of the precursor directed road sections, and determining the sequence of the second-order homogeneous region classes;

S3, training to obtain a piecewise linear model of the three-dimensional space as a track point storage position predictor, namely learning type track index, inputting track points during training, and outputting storage position serial numbers in the second-order homogeneous region class of the track points;

s4, acquiring a road network range and a time range of the original query, performing track query, decomposing the query into a plurality of sub-queries according to the second-order homogeneous region class, calling a corresponding piecewise linear model to predict, acquiring a sub-query result set, screening, combining all the sub-query result sets, and returning to the original query result set.

Further, in step S1, the road network is a directed graph formed by directed road segments and intersections, each directed road segment is represented by three components including a directed road segment number, a length and a lane number, the track data includes a track name, a track point sequence and a track passing directed road segment, wherein the track points of the track point sequence are represented by two components including a road network position and a time, the road network position is a floating point number, the integer part is the directed road segment number where the point is located, and the decimal part is the relative position of the point on the directed road segment.

Further, in step S1, the traffic ratio is a ratio of the number of trajectories passing through the current and subsequent directed road segments simultaneously to the total number of trajectories passing through the current directed road segment.

Further, in step S1, the depth-first search is started from the road network center directed road segment, and the unique directed road segment number is determined according to the order of the depth-first search, specifically, the depth-first search is started from the road network center directed road segment, and when each time of depth-down search, the subsequent directed road segment with a large traffic ratio is preferentially selected as the next access target, until all the directed road segments are depth-first searched, the directed road segment number starts from 0, and the accessed order of the directed road segments is the directed road segment number.

Further, the directional road section of the road network center is obtained by adding the coordinates of the midpoints of all the directional road sections of the directional road network and calculating an average coordinate, wherein the average coordinate is regarded as the road network center, and the directional road section with the shortest distance from the center of the road network among the directional road sections is regarded as the directional road section of the road network center.

Further, in step S2, the directional road segments are first divided into a plurality of classes according to the lengths, each class is then divided into a plurality of first-order homogeneous region classes according to the number of lanes, all the directional road segments in each first-order homogeneous region class are divided into a plurality of second-order homogeneous region classes according to the lengths and the number of lanes of the precursor directional road segments, specifically,

S21, counting the total length Len of all the directed road sections, setting N as the classification number of the directed road sections, and averaging the total length of the directed road sections contained in each class to be Len/N;

S22, sorting the directed road sections according to the ascending sequence of the length, accumulating the lengths of the directed road sections, calculating the previously accumulated directed road sections as a class if the accumulated length is greater than or equal to Len/N, then re-accumulating the directed road sections from the next directed road section, and the like, and finally dividing N classes;

s23, in each directed road section class divided in the step S22, the directed road sections are divided into 2 classes according to the number of lanes, the number of lanes is less than or equal to the set number and is divided into one class, and the number of lanes is greater than or equal to the set number and is divided into another class, so that all the directed road sections are divided into 2X N first-order homogeneous region classes;

S24, counting the sum of the flows of the precursor directional road sections of all the directional road sections in each first-order homogeneous region class, recording as flow ^sum, setting M as the number of second-order homogeneous directional road sections, and averaging the total flows of the precursor directional road sections contained in each class as flow ^sum/M;

S25, sorting the directed road sections according to the ascending order of the precursor directed road section flow, accumulating the precursor directed road section flow, counting the previously accumulated directed road sections as a class if the accumulated precursor directed road section flow is more than or equal to flow ^sum/M, and then re-accumulating from the next directed road section;

S26, repeating the steps S24-S25, and dividing each first-order homogeneous region class into M second-order homogeneous region classes, so that all the directed road sections in the road network are divided into 2 XN X M second-order homogeneous region classes.

Further, in step S24, the precursor directional road section flow is the sum of the directional road section flows of all the precursor directional road sections of the current directional road section, the precursor directional road section is a connected directional road section directly driven into the current directional road section, and the directional road section flow is the product of the length of the directional road section and the number of lanes.

Further, in step S3, for each second-order homogeneous region class, a piecewise linear model of a three-dimensional space is trained to be used as a trajectory point storage location predictor, i.e., a learning type trajectory index, specifically,

S31, training a piecewise linear model of a three-dimensional space in each second-order homogeneous region, inputting track points (x, y), wherein x represents road network positions, y represents time, and outputting a sequence number in a region where z represents the track points;

s32, setting a maximum error threshold value theta, connecting the ordered track points, namely the first point and the last point into a three-dimensional straight line, and solving a three-dimensional straight line equation by using a least square method:

Wherein P ₀(x₀,y₀,z₀) is any point on the three-dimensional straight line, Is a direction vector of a three-dimensional straight line. The above equation is converted to a system of equations:

Then traversing all intermediate points, substituting the above equation set to obtain z ₁ and z ₂, wherein z ₁ and z ₂ are the z coordinate values of the projection points of the three-dimensional straight line on the xz plane and the yz plane respectively, and calculating the predicted value Calculating prediction errorRecording the maximum prediction error and the maximum error point;

S33, if the maximum prediction error is larger than theta, segmenting at the maximum error point, adding the error point into the half sections with few points of the left half section and the right half section, repeating the steps above in the left half section and the right half section until the maximum prediction error is smaller than theta, and finally obtaining the piecewise linear model.

Further, in step S4, the road network range and the time range of the original query are obtained, the track query is performed, the query is decomposed into a plurality of sub-queries according to the second-order homogeneous region class, the corresponding piecewise linear model is invoked for prediction, the sub-query result set is obtained, after screening, all the sub-query result sets are combined, the original query result set is returned, specifically,

S41, acquiring a road network range [ pos _S,pos_E ] and a time range [ t _S,t_E ] of an original query, wherein pos _S is a starting road network position, pos _E is a stopping road network position, t _S is a starting time, and t _E is a stopping time;

S42, decomposing a road network range [ pos _S,pos_E ] in an original query into a plurality of sub-road network ranges [ pos _s,pos_e ] according to a second-order homogeneous region class, wherein pos _s is a starting road network position in the second-order homogeneous region class, pos _e is a stopping road network position in the second-order homogeneous region class, each sub-road network range [ pos _s,pos_e ] and a time range [ t _S,t_E ] form a sub-query q (pos _s,t_S,pos_e,t_E), and each sub-query q (pos _s,t_S,pos_e,t_E) corresponds to one second-order homogeneous region class;

S43, for each sub-query q (pos _s,t_S,pos_e,t_E), calling a corresponding piecewise linear model, and respectively inputting (pos _s,t_S) and (pos _e,t_E) to obtain a predicted value AndThen combining the maximum error threshold value theta of the piecewise linear model in the corresponding second-order similar region class, and obtaining the rangeAcquiring a candidate track point set, screening track points in a sub-query range, and acquiring a sub-query result set;

s44, repeating the step S43, obtaining all sub-query result sets, combining all sub-query result sets, and returning to the original query result set.

The beneficial effects of the invention are as follows:

1. compared with the prior art, the learning type track index and query method based on homogeneous region division on the directional road network divides the road network space according to the similarity of road network traffic, can ensure the track data in each second-order homogeneous region class to be uniformly distributed, reduces model segmentation, can improve query performance and prediction precision, and can improve index query efficiency.

2. According to the learning type track index and query method based on homogeneous region division on the directional road network, depth-first search is carried out based on connectivity mining labels, unique serial numbers of the directional road sections are determined, the traveling behaviors of vehicles are considered, and the consistency of the traveling sequence of the vehicles and the road sequencing is ensured.

3. According to the invention, a piecewise linear model of a three-dimensional space is trained for each second-order homogeneous region class, so that the index size is greatly reduced, the storage cost is reduced, the prediction error of the model is controlled, the number of blocks accessed by a disk during track inquiry is further reduced, and the index inquiry performance is improved.

4. The learning track index and query method based on homogeneous region division on the directional road network not only ensures consistency of directional road section sequencing and vehicle traveling sequence, but also divides a plurality of homogeneous regions uniformly distributed, greatly improves query efficiency of indexes on inclined data, ensures prediction errors of piecewise linear models, and improves query performance of indexes.

Drawings

FIG. 1 is a flow diagram of a learning track index and query method based on homogeneous region division on a directional road network according to an embodiment of the present invention;

FIG. 2 is an explanatory diagram of a road network of a specific example of a learning type track index and query method based on homogeneous region division on the directional road network of an embodiment;

Fig. 3 is an explanatory diagram of directional road segment information in a road network based on a specific example of a learning type track index and query method of homogeneous region division on the directional road network according to an embodiment;

fig. 4 is an explanatory diagram of track data in a road network based on a specific example of a learning-type track index and query method of homogeneous region division on a directional road network according to an embodiment.

Detailed Description

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Examples

A learning track index and query method based on homogeneous region division on a directional road network, as shown in figure 1, comprises the following steps,

S1, setting a subsequent array for each directed road section, scanning all track data on the road network, recording traffic ratio in the subsequent array, starting depth-first search from the directed road section in the center of the road network after scanning, and determining a unique directed road section number according to the sequence of the depth-first search.

In step S1, the road network is a directed graph formed by directed road segments and intersections, each directed road segment is represented by three components of a directed road segment number, a length and a lane number, the track points are represented by two components of a road network position and a time, the road network position is a floating point number, the integer part is the directed road segment number where the point is located, and the decimal part is the relative position of the point on the directed road segment.

In step S1, the traffic ratio is a ratio of the number of trajectories passing through the current and subsequent directed road segments simultaneously to the total number of trajectories passing through the current directed road segment.

In step S1, the depth-first search is started from the road network center directed road segment, and the unique directed road segment number is determined according to the sequence of the depth-first search, specifically, the depth-first search is started from the road network center directed road segment, when each time the road network center directed road segment is searched in depth, the subsequent directed road segment with a large traffic ratio is preferentially selected as the next access target, until all the directed road segments are depth-first searched, the directed road segment number is started from 0, and the accessed sequence of the directed road segments is the directed road segment number. The directional road section of the road network center is obtained by adding the midpoint coordinates of all the directional road sections of the directional road network and calculating an average coordinate, wherein the average coordinate is regarded as the road network center, and the directional road section with the shortest distance from the midpoint of the directional road section to the road network center is regarded as the directional road section of the road network center.

S2, dividing the directed road sections into a plurality of classes according to the length, dividing each class into a plurality of first-order homogeneous region classes according to the number of lanes, dividing all the directed road sections in each first-order homogeneous region class into a plurality of second-order homogeneous region classes according to the length and the number of lanes of the precursor directed road sections, determining the sequence of the second-order homogeneous region classes, and then, in each second-order homogeneous region class, sorting the track points according to time, sorting according to the road network position and determining the sequence numbers of the track points in the second-order homogeneous region classes.

In step S2, the directional road segments are first divided into a plurality of classes according to the length, each class is then divided into a plurality of first-order homogeneous region classes according to the number of lanes, all the directional road segments in each first-order homogeneous region class are divided into a plurality of second-order homogeneous region classes according to the length and the number of lanes of the precursor directional road segments, specifically,

S23, in each directed road section class divided in the step S22, the class is divided into 2 classes according to the number of lanes, the number of lanes is less than or equal to 2 and the class is divided into one class, and the number of lanes is more than or equal to 3 and the class is divided into the other class, so that all the directed road sections are divided into 2X N first-order homogeneous region classes.

in step S24, the precursor directional road section flow is the sum of the directional road section flows of all the precursor directional road sections of the current directional road section, the precursor directional road section is a connected directional road section directly driven into the current directional road section, and the directional road section flow is the product of the length of the directional road section and the number of lanes.

S3, training to obtain a piecewise linear model of a three-dimensional space for each second-order homogeneous region class as a track point storage position predictor, namely learning type track index, inputting track points during training, and outputting storage position serial numbers in the second-order homogeneous region class of the track points.

In step S3, for each second-order homogeneous region class, a piecewise linear model of a three-dimensional space is trained to be used as a trajectory point storage position predictor, i.e. a learning type trajectory index, specifically,

S43, for each sub-query q (pos _s,t_S,pos_e,t_E), calling a corresponding piecewise linear model, and respectively inputting (pos _s,t_S) and (pos _e,t_E) to obtain a predicted value AndThen combining the maximum prediction error theta of the piecewise linear model in the corresponding second-order similar region class, and obtaining the rangeAcquiring a candidate track point set, screening track points in a sub-query range, and acquiring a sub-query result set;

The learning type track index and query method based on homogeneous region division on the directional road network divides the road network space according to the similarity of road network traffic, can ensure the track data in each second-order homogeneous region class to be uniformly distributed, reduces model segmentation, can improve query performance and prediction precision, can improve index query efficiency, and solves the problem of query performance deterioration of track indexes on inclined data.

The learning type track index and query method based on homogeneous region division on the directed road network is characterized in that depth-first search is conducted based on excavation connectivity to determine the sequence of directed road sections, first-order homogeneous region classes are divided according to the length of the directed road sections and the number of lanes, second-order homogeneous region classes are further divided according to the precursor directed road sections of the directed road sections, the sequence of track points is determined in each second-order homogeneous region class, a piecewise linear model of a three-dimensional space is trained for each second-order homogeneous region class, mapping relation between the track points and storage sequence numbers is fitted, and learning type track indexes are constructed to support track query.

According to the learning type track index and query method based on homogeneous region division on the directional road network, depth-first search is carried out based on connectivity mining labels, unique serial numbers of the directional road sections are determined, the traveling behaviors of vehicles are considered, and the consistency of the traveling sequence of the vehicles and the road sequencing is ensured.

According to the invention, a piecewise linear model of a three-dimensional space is trained for each second-order homogeneous region class, so that the index size is greatly reduced, the storage cost is reduced, the prediction error of the model is controlled, the number of blocks accessed by a disk during track inquiry is further reduced, and the index inquiry performance is improved.

The learning track index and query method based on homogeneous region division on the directional road network not only ensures consistency of directional road section sequencing and vehicle traveling sequence, but also divides a plurality of homogeneous regions uniformly distributed, greatly improves query efficiency of indexes on inclined data, ensures prediction errors of piecewise linear models, and improves query performance of indexes.

Specific examples of such a learning type track index and query method based on homogeneous region division on the directional road network of the embodiment are as follows:

As shown in fig. 2, in the road network, there are 9 directional road segments, respectively, seg1, seg2, seg3, seg4, seg5, seg6, seg7, seg8, and seg9, the directional road segment information is specifically shown in fig. 3, there are 7 tracks tra1, tra2, tra3, tra4, tra5, tra6, and tra7, and the track data is specifically shown in fig. 4. The track data comprises a track name, a track point sequence and a track passing directional road section, wherein the track point of the track point sequence consists of two components of a road network position and time, and the track data comprises the following specific components:

p1＝(seg1.24,0),p2＝(seg1.43,0),p3＝(seg1.78,3),p4＝(seg1.96,0),p5＝(seg4.12,6),p6＝(seg4.85,0),p7＝(seg2.33,3),p8＝(seg3.32,3),p9＝(seg6.67,3),p10＝(seg5.31,9),p11＝(seg5.90,0),p12＝(seg8.35,0),p13＝(seg8.68,3),p14＝(seg7.43,3),p15＝(seg9.23,12),p16＝(seg9.35,0),p17＝(seg9.65,15).

Step S1, setting a subsequent array for each directed road section, scanning all track data on the road network, and recording traffic ratio in the subsequent array, wherein each element in the subsequent array is (the name of the directed road section, the traffic ratio), and the subsequent arrays of seg1, seg4 and seg5 are shown in Table 1:

table 1 subsequent arrays of directed segments seg1, seg4, and seg5

Directed road segment number	Subsequent arrays
		seg1	[(seg2,0.33),(seg3,0.33),(seg4,0.33)]
seg4	[(seg5,0.5),(seg6,0.5)]
		seg5	[(seg7,0.5),(seg8,0),(seg9,0.5)]

The subsequent arrays of the remaining directed road segments are empty.

After the scanning is finished, starting depth-first search from the road network center directed road section, determining a unique directed road section number according to the sequence of the depth-first search, wherein the depth-first search algorithm comprises the following steps:

And adding the midpoint coordinates of all the directed road sections, and calculating an average coordinate, wherein the average coordinate is regarded as a road network center, the directed road section with the shortest distance from the road network center among the directed road sections is regarded as the road network center directed road section, the directed road section is searched for depth first, the subsequent directed road section with the large traffic ratio is preferentially selected as the next access target when the directed road section is searched for depth first every time, the accessed sequence of the directed road section is the directed road section number, and the directed road section number is started from 0.

Assuming that seg1 is a road network center directed road segment, depth-first searching is started from the road network center directed road segment seg1, when depth-down exploring is performed, for a node seg1, a subsequent directed road segment seg2 with a large traffic ratio is preferably selected as a next access target, then depth-down exploring is started from seg2, no subsequent directed road segment is returned to seg1, then seg3 is the same, finally depth-down exploring is started from seg4, a subsequent directed road segment seg5 with a large traffic ratio of seg4 is preferably selected as a next access target, the accessed sequence of the directed road segments is a directed road segment number until all the directed road segments are depth-first searched, the specific sequence of the directed road segment numbers is [ seg1, seg2, seg3, seg4, seg5, seg7, seg9, seg8, seg6], and the directed road segments are as follows:

seg1.id=0;seg2.id=1;seg3.id=2;seg4.id=3;seg5.id=4;seg7.id=5;

seg9.id=6;seg8.id=7;seg6.id=8。

and renumbering each directed road section according to the directed road section sequence number, changing the integral part of the corresponding track point road network position, and finally obtaining track point data as follows:

p1＝(0.24,0),p2＝(0.43,0),p3＝(0.78,3),p4＝(0.96,0),p5＝(3.12,6),p6＝(3.85,0),p7＝(1.33,3),p8＝(2.32,3),p9＝(8.67,3),p10＝(4.31,9),p11＝(4.90,0),p12＝(7.35,0),p13＝(7.68,3),p14＝(5.43,3),p15＝(6.23,12),p16＝(6.35,0),p17＝(6.65,15);

Step S2, dividing the directed road sections into a plurality of classes according to the length, dividing each class into a plurality of first-order homogeneous region classes according to the number of lanes, dividing all the directed road sections in each first-order homogeneous region class into a plurality of second-order homogeneous region classes according to the length and the number of lanes of the precursor directed road sections, determining the sequence of the second-order homogeneous region classes, and then, in each second-order homogeneous region class, sorting track points according to time, sorting according to the road network positions and determining the sequence numbers of the track points in the second-order homogeneous region classes.

Step 21, counting the total length len=1000 of all the directed road sections, setting n=2 as the classification number of the directed road sections, and averaging the total length Len/n=500 of the directed road sections contained in each class;

step S22, sorting the directed road segments according to ascending order of length, then accumulating the lengths of the directed road segments, if the accumulated length is greater than or equal to Len/n=500, calculating the previously accumulated directed road segments as a class, then re-accumulating from the next directed road segment, and the like, and finally dividing n=2 classes, wherein the specific classification is as follows:

[seg4,seg5,seg2,seg3,seg6,seg7],[seg1,seg8,seg9];

Step S23, in each directed road section class divided in the step S22, the class is divided into 2 classes according to the number of lanes, the number of lanes is less than or equal to 2 and is divided into one class, and the number of lanes is more than or equal to 3 and is divided into the other class. Thus, all the directed road segments can be divided into 2×n=4 first order homogeneous region classes, which are specifically classified as follows:

[seg2,seg3,seg6,seg7],[seg4,seg5],[seg8],[seg1,seg9];

Step S24, in the first-order homogeneous zone class [ seg2, seg3, seg6, seg7], the sum of the flows of the precursor directed road segments of all the directed road segments is counted,

Setting m=2 as the class number of the second-order homogeneous directional road sections, wherein the average total flow of the precursor directional road sections contained in each class is flow ^sum/m=750;

Step S25, sorting the directed road sections according to the ascending order of the flow of the precursor directed road sections, then accumulating the flow of the precursor directed road sections, calculating the previously accumulated directed road sections as a class if the flow of the accumulated precursor directed road sections is more than or equal to flow ^sum/m=750, then re-accumulating the directed road sections from the next directed road section, and dividing m=2 second-order homogeneous region classes [ seg6, seg7, seg2], [ seg3];

step S26, repeating the steps S24-S25, and dividing all the directed road sections in the road network into 2×n×m=8 second order homogeneous region classes, where the second order homogeneous region classes are as follows:

C7=[seg1,seg9],C8=[]。C1=[seg6,seg7,seg2],C2=[seg3],C3=[seg5],C4=[seg4],C5=[seg8],C6=[],

for example, the second-order homogeneous region class C7 has the track points p1, p2, p3, p4, p15, p16 and p17 falling into C7, which are ordered according to time, then ordered according to road network position, and the track points of C7 are ordered as follows (p1,p2,p4,p16,p3,p15,p17)：p1＝(0.24,0,0);p2＝(0.43,0,1);p4＝(0.96,0,2);p16＝(6.35,0,3);p3＝(0.78,3,4);p15＝(6.23,12,5);p17＝(6.65,15,6).

And S3, each second-order homogeneous region class is used for learning a piecewise linear model of a three-dimensional space as a track point storage position predictor, namely learning a track index, inputting track points during training, outputting storage position serial numbers in the second-order homogeneous region class of the track points, and calculating the maximum prediction error of the piecewise linear model.

S31, training a piecewise linear model in each second-order homogeneous region, inputting (x, y) representation (road network position, time), and outputting a region sequence number of z representation track points;

Step S32, setting a maximum error threshold value theta=1.6, connecting the ordered track points, the first point and the last point into a three-dimensional straight line, and obtaining a three-dimensional straight line equation by using a least square method

Wherein P ₀ (0.19,3.43,0) is any point on a three-dimensional straight line,Is a direction vector of a three-dimensional straight line. The above equation is converted to a system of equations:

Then traversing all intermediate points, substituting the above equation set to obtain z ₁ and z ₂, wherein z ₁ and z ₂ are the z coordinate values of the projection points of the three-dimensional straight line on the xz plane and the yz plane respectively, and calculating the predicted value Calculating prediction errorAnd recording the maximum prediction error err ^max ≡1.98 and the maximum error point p15= (6.23,12);

Step S33, because the maximum prediction error err ^max is approximately equal to 1.98>1.6, segmentation is carried out at the maximum error point p15, the error point is added into the right half section with few points, the steps are repeated on the left half section (p 1, p2, p4, p16, p 3) until the maximum prediction error is smaller than θ, segmentation is stopped, and the right half section (p 15, p 17) is not needed to be segmented because only two points are needed, and finally a piecewise linear model is obtained:

repeating the steps for each second order homogeneous region class, and constructing a plurality of models.

And S4, acquiring a road network range and a time range of the original query, performing track query, decomposing the query into a plurality of sub-queries according to the second-order homogeneous region class, calling a corresponding piecewise linear model to predict, acquiring a sub-query result set, screening, combining all the sub-query result sets, and returning to the original query result set.

Step S41, obtaining the road network range [ pos _S,pos_E ] = [0,3] and the time range [ t _S,t_E ] = [0,3] of the original query;

Step S42, decomposing a road network range [ pos _S,pos_E ] = [0,3 ] in the original query into a plurality of sub-road network ranges [ pos _s,pos_e ] according to a second-order homogeneous region class, wherein pos _s is a starting road network position in the second-order homogeneous region class, pos _e is a stopping road network position in the second-order homogeneous region class, and each sub-road network range [ pos _s,pos_e ] and a time range [ t _S,t_E ] form sub-query q (pos _s,t_S,pos_e,t_E);

For example, the second order homogeneous region class c7= [ seg1, seg9] with corresponding sub-network ranges of [0, 1) and [6,7 ], then the sub-network range belonging to C7 in the road network range [0, 3) of the original query is [ pos _s,pos_e ] = [0, 1), and similarly, the other two sub-network ranges [1, 2) and [2, 3) are decomposed, and then the original query can be decomposed into three sub-queries q ₁(0,0,1,3)、q₂ (1,0,2,3) and q ₁ (2,0,3,3), each sub-query q (pos _s,t_S,pos_e,t_E) corresponding to one second order homogeneous region class.

Step S43, for each sub-query q (pos _s,t_S,pos_e,t_E), calling the corresponding piecewise linear model, and respectively inputting (pos _s,t_S) and (pos _e,t_E) to obtain the predicted valueAndThen combining the maximum error threshold value theta of the piecewise linear model in the corresponding second-order similar region class, and obtaining the rangeAnd obtaining a candidate track point set, screening track points in the sub-query range, and obtaining a sub-query result set.

For example, sub-query q ₁ (0,0,1,3), call the corresponding piecewise linear model, and input (0, 0) and (1, 3), respectively, to obtain the predicted valueAndThen combining the maximum error threshold value theta=1.6 of the piecewise linear model in the corresponding second-order homogeneous region class, and in the rangeThat is, a candidate set of track points { (0.24,0), (0.43,0), (0.96,0), (6.35,0), (0.78,3), (6.23,12) } is acquired in the position of [0,5], and track points within the sub-query range (0,0,1,3) are screened to acquire a sub-query result set { (0.24,0), (0.43,0), (0.96,0), (0.78,3) };

Step S44, repeating step S43, obtaining all sub-query result sets, combining all sub-query result sets, and returning the original query result sets { (0.24,0), (0.43,0), (0.78,3), (0.96,0), (1.33,3), (2.32,3)) }.

The foregoing description is only illustrative of the invention and is not to be construed as limiting the invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present invention, should be included in the scope of the claims of the present invention.

Claims

1. A learning-based trajectory indexing and query method based on homogeneous region partitioning on a directed road network, characterized by comprising the following steps:

S1. Set a subsequent array for each directed road segment, scan all trajectory data on the road network, and record the traffic volume ratio in the subsequent array; after the scan, start a depth-first search from the directed road segment at the center of the road network, and determine a unique directed road segment number according to the depth-first search order;

S2. Directed road segments are first divided into several classes according to their length. Each class is further divided into several first-order homogeneous regional classes according to the number of lanes. All directed road segments within each first-order homogeneous regional class are divided into several second-order homogeneous regional classes according to the length and number of lanes of their predecessor directed road segments. The order of the second-order homogeneous regional classes is determined. Then, within each second-order homogeneous regional class, the trajectory points are sorted first by time and then by road network location to determine the sequence number of the trajectory points within the second-order homogeneous regional class.

S21. Count the total length of all directional road segments ,set up The number of categories for directional road segments is , and the average total length of directional road segments contained in each category is ;

S22, sort the directed road segments in ascending order by length, and then accumulate the lengths of the directed road segments. If the accumulated length is greater than or equal to , then the previously accumulated directed road segments are counted as one class, and then the next directed road segment is accumulated again, and so on, and finally divided Class;

S23. After the division in step S22, each directional road segment is divided into two categories according to the number of lanes. The number of lanes is less than or equal to the set number and is divided into one category, while the number of lanes is greater than or equal to the set number and is divided into another category. Therefore, all directional road segments are divided into a class of first-order homogeneous regions;

S24. In each first-order homogeneous region class, the sum of the predecessor directional segment flows of all directional segments is counted and recorded as ,set up is the number of second-order homogeneous directed road segment classes, and the average total flow of the predecessor directed road segments contained in each class is ;

S25, sort the directed road sections in ascending order according to the flow of the predecessor directed road sections, and then accumulate the flow of the predecessor directed road sections. If the accumulated flow of the predecessor directed road sections is greater than or equal to , then the previously accumulated directed road segments are counted as one class, and subsequently re-accumulated from the next directed road segment;

S26, repeat the above steps S24-S25, and divide each first-order homogeneous region class into second-order homogeneous region classes, so all directed road segments in the road network are divided into a class of second-order homogeneous regions;

S3. For each second-order homogeneous region class, a piecewise linear model in three-dimensional space is trained as a trajectory point storage location predictor, i.e., a learning trajectory index. The input during training is the trajectory point, and the output is the storage location sequence number of the trajectory point within the second-order homogeneous region class;

S31. In each second-order homogeneous region, train a three-dimensional piecewise linear model and input the trajectory points. ,in, Indicates the road network location, Indicates time, output The sequence number within the region representing the trajectory point;

S32. Set a maximum error threshold , connect the sorted trajectory points, the first point and the last point into a three-dimensional straight line, and use the least squares method to obtain the three-dimensional straight line equation:

,

in, is any point on the three-dimensional line, is the direction vector of the three-dimensional line, and the above equations are transformed into the equation system:

,

Then traverse all the intermediate points and substitute them into the above equations to get and ,in and The three-dimensional straight lines are Plane and Projection point on the plane Coordinate values, calculate predicted values , calculate the prediction error , and record the maximum prediction error and the maximum error point;

S33. If the maximum prediction error is greater than , then segment at the maximum error point, and add the error point to the half with fewer midpoints in the left and right halves, and repeat steps S32 and S33 in the left and right halves until the maximum prediction error is less than Then stop segmentation and finally get the piecewise linear model;

S4. Obtain the road network range and time range of the original query, perform trajectory query, decompose the query into multiple subqueries based on the second-order homogeneous region class, call the corresponding piecewise linear model for prediction, obtain the subquery result set, filter it, combine all the subquery result sets, and return the original query result set.

2. The learning-based trajectory indexing and query method based on homogeneous region partitioning on a directed road network as described in claim 1 is characterized in that: in step S1, the road network is a directed graph consisting of directed road segments and intersections, each directed road segment is characterized by three components: a directed road segment number, a length, and a number of lanes, and the trajectory data includes a trajectory name, a trajectory point sequence, and the trajectory passing through the directed road segment, wherein a trajectory point in the trajectory point sequence is characterized by two components: a road network position and a time, wherein the road network position is a floating point number, the integer part of which is the directed road segment number where the point is located, and the decimal part is the relative position of the point on the directed road segment.

3. The method for learning trajectory indexing and querying based on homogeneous region partitioning on a directed road network as claimed in claim 1, wherein: in step S1, the traffic volume ratio is the ratio of the number of trajectories that pass through the current and subsequent directed road segments simultaneously to the total number of trajectories that pass through the current directed road segment.

4. The learning trajectory indexing and query method based on homogeneous area division on a directed road network as described in claim 1 is characterized in that: in step S1, a depth-first search is started from the directed road section at the center of the road network, and a unique directed road section number is determined according to the order of the depth-first search. Specifically, the depth-first search is started from the directed road section at the center of the road network. Each time the depth is probed downward, the subsequent directed road section with a large traffic volume ratio is preferentially selected as the next access target. After all directed roads have been searched by the depth-first search, the directed road section number starts from 0, and the order in which the directed roads are visited is the directed road section number.

5. The learning-based trajectory indexing and query method based on homogeneous region partitioning on a directed road network as described in claim 4 is characterized in that: the central directed segment of the road network is obtained by adding the midpoint coordinates of all directed segments in the directed road network and calculating the average coordinate, and this average coordinate is regarded as the center of the road network, and the directed segment with the shortest distance from the midpoint of the directed segment to the center of the road network is regarded as the central directed segment of the road network.

6. The learning-based trajectory indexing and query method based on homogeneous region partitioning on a directed road network as claimed in claim 1, characterized in that: in step S24, the predecessor directed road segment flow rate is the sum of the directed road segment flows of all predecessor directed road segments of the current directed road segment, the predecessor directed road segment is a connected directed road segment that directly enters the current directed road segment, and the directed road segment flow rate is the product of the length of the directed road segment and the number of lanes.

7. The learning-based trajectory indexing and query method based on homogeneous region partitioning on a directed road network according to any one of claims 1 to 5, characterized in that: in step S4, the road network range and time range of the original query are obtained, a trajectory query is performed, the query is decomposed into multiple sub-queries based on the second-order homogeneous region class, corresponding piecewise linear models are called for prediction, sub-query result sets are obtained, and after screening, all sub-query result sets are combined to return the original query result set, specifically,

S41. Obtain the road network scope of the original query and time range ,in, is the starting road network location, To terminate the road network location, is the starting time, is the end time;

S42, the road network range in the original query Decompose into multiple sub-network ranges according to the second-order homogeneous area class ,in is the starting road network position within the second-order homogeneous region class, is the end road network location within the second-order homogeneous region class, and the range of each sub-road network is and time range Composing subqueries , each subquery corresponds to a second-order homogeneous region class;

S43. For each subquery , call the corresponding piecewise linear model and input and , get the predicted value and , then in the corresponding second-order homogeneous region class, combined with the maximum error threshold of the piecewise linear model , in the range Obtain the candidate trajectory point set, filter the trajectory points within the sub-query range, and obtain the sub-query result set;

S44. Repeat step S43 to obtain all sub-query result sets, combine all sub-query result sets, and return the original query result set.