+
Skip to content

gt1/prs

Repository files navigation

prs

prs contains code for parallel byte access based radix sorting. The files prs.ic and prs.ih contain generic code which can be configured by defining the preprocessor definitions PARALLEL_INTERLEAVED_RADIX_SORT_VALUE_TYPE and PARALLEL_INTERLEAVED_RADIX_SORT_FUNCTIONNAME. The headers prs_uint.h and prs_uint64_t.h contain two examples for doing so. The file prs_uint.h for instance sets

#define PARALLEL_INTERLEAVED_RADIX_SORT_VALUE_TYPE unsigned int
#define PARALLEL_INTERLEAVED_RADIX_SORT_FUNCTIONNAME radixsort_unsigned_int

This generates a function named radixsort_unsigned_int with the signature

int radixsort_unsigned_int(
	unsigned int * ita,
	unsigned int * tita,
	size_t const n,
	unsigned int const tnumthreads,
	unsigned int const rounds,
	unsigned int const * keybytes,
	int const interleave
);

The function implements radix sort using byte key access. The input sequence of length n is expected in [ita,ita+n). The range [tita,tita+n) is used as temporary space. The sorting will use up to tnumthreads threads (tnumthreads whenever n is at least tnumthreads). The routine performs rounds iterations of bucket sorting where the k'th round extracts byte keybytes[k] from the objects to be sorted. The sorted sequence is stored in [ita,ite) if rounds is even and [tita,tite) if rounds is odd. The function will compute the histogram for the next round inside the current round when interleave is set to one, otherwise the byte histogram for a round is computed in a separate scan. As setting interleave to 1 uses one division operation per element this is often slower than performing another linear scan, although it reduces the amount of memory accesses. Interleave mode (interleave=1) however can be faster than non interleaved mode as soon as the records moved are somewhat larger than the keys scanned. The function returns 0 on success and a non-zero value on failure (failed to allocate memory or to create mutexes/conditions variables/threads).

The benchmark program measures the speed up obtained by using a number of threads for sorting the sequence n-1,n-2,...,1,0 stored in 64 bit (type uint64_t) numbers.

The benchmark_q program compares the speed of the radix sort with the qsort function on a random sequence of unsigned int type numbers.

About

Parallel radix sorting

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载