Hi! First, thank you for this wonderful library! Being able to crunch data in Lisp makes my life much easier :) I have since written some optimizations to improve the performance, which is particularly helpful for handling relatively large (~1GB) data sets. I'm now looking to extract and contribute these patches, after sorting out some questions about portability/extensibility... Sorry for the noise, I hope this is not the wrong place to discuss!
What I have includes:

1. Call `duckdb-api:duckdb-row-count` in `translate-result` and allocate vectors with the right size at the beginning, eliminating `vector-push-extend` to avoid repeated realloc in the loop.
2. Use specialized CL arrays (e.g. `(array double-float)` instead of `(array t)`) when possible, and use the C function `memcpy` to do the copying instead of a Lisp `loop`.
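To make optimization 1 concrete, here is a minimal sketch of the before/after shape of the translation loop. All function names here (`get-value` in particular) are illustrative placeholders, not the actual cl-duckdb internals:

```lisp
;; Sketch of optimization 1. GET-VALUE stands in for whatever
;; accessor reads one element out of a result chunk.

;; Before: grow-on-demand; the adjustable vector is repeatedly
;; reallocated and copied as it fills up.
(defun collect-column-growing (chunk n-rows)
  (let ((result (make-array 0 :adjustable t :fill-pointer 0)))
    (loop :for i :below n-rows
          :do (vector-push-extend (get-value chunk i) result))
    result))

;; After: duckdb-row-count tells us N-ROWS up front, so we can
;; allocate once at the final size and fill by index.
(defun collect-column-preallocated (chunk n-rows)
  (let ((result (make-array n-rows)))
    (loop :for i :below n-rows
          :do (setf (aref result i) (get-value chunk i)))
    result))
```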
Optimization 1 is a free speed boost -- I think it's completely transparent and does not change any observable behavior!
Optimization 2 gives a massive improvement both immediately when loading the arrays and later when operating on them, e.g. for `double-float`s (it avoids consing billions of boxed floats). However, its portability/extensibility requires some discussion: different CL implementations may have different specialized array types, and C `memcpy` only works if the CL array and the C (DuckDB) array share the same element bit patterns. The code I use right now only considers SBCL/x86-64. What I have in mind is: introduce generic functions `allocate-result-array`, `translate-from-chunk` and `translate-to-chunk`, then specialize them on DuckDB types and select the optimized versions with read-time conditionals. I can only fill in specializations for `#+(and sbcl x86-64)` because that is what I use, but users of other platforms can fill in the rest.
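Roughly, the protocol I'm imagining could look like the sketch below. Every name, signature, and type keyword here is tentative and open to discussion, not existing cl-duckdb API; the SBCL fast path uses `sb-sys:with-pinned-objects`/`sb-sys:vector-sap` plus a CFFI `foreign-funcall` to `memcpy`, which is the standard SBCL idiom for bulk copies from foreign memory:

```lisp
;; Tentative sketch of the proposed protocol. Names, signatures, and
;; the :duckdb-double keyword are hypothetical.

(defgeneric allocate-result-array (duckdb-type length)
  (:documentation "Return a fresh CL array suited to DUCKDB-TYPE."))

(defgeneric translate-from-chunk (duckdb-type array chunk-ptr length)
  (:documentation "Fill ARRAY from the foreign data at CHUNK-PTR."))

;; Portable defaults: boxed (array t), element-by-element copy done
;; elsewhere in Lisp. Works on every implementation.
(defmethod allocate-result-array (duckdb-type length)
  (declare (ignore duckdb-type))
  (make-array length))

;; Fast path, only where the CL and C element bit patterns are known
;; to match (IEEE double-float on SBCL/x86-64).
#+(and sbcl x86-64)
(defmethod allocate-result-array ((duckdb-type (eql :duckdb-double)) length)
  (make-array length :element-type 'double-float))

#+(and sbcl x86-64)
(defmethod translate-from-chunk ((duckdb-type (eql :duckdb-double))
                                 (array simple-array) chunk-ptr length)
  ;; Pin the array so GC can't move it while memcpy runs, then copy
  ;; LENGTH doubles (8 bytes each) straight from the chunk.
  (sb-sys:with-pinned-objects (array)
    (cffi:foreign-funcall "memcpy"
                          :pointer (sb-sys:vector-sap array)
                          :pointer chunk-ptr
                          :unsigned-long (* length 8)
                          :pointer))
  array)
```

The nice property of this shape is that the portable default methods keep every platform working, while each `#+(and <impl> <arch>)` specialization is purely additive.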
How does that sound? Are such optimization patches welcome in general?