+
Skip to content

Conversation

kchanqvq
Copy link
Collaborator

Hi! This is a draft of my take at #67. It takes 1.56s for the (progn (setq *test* (time (ddb:q "FROM read_csv('~/Downloads/8ysW.csv')"))) nil) benchmark on SBCL! With the memcpy methods disabled, it runs in 5s.

This is currently just a proof of concept, and I'd like to discuss the interface:

  1. Should there be a runtime option to enable/disable specialized array? Or a static option in *features* instead? It pass all test cases in its current form (i.e. enable specialized array by default) anyways, so I guess not much code really rely on unspecialized array?
  2. For *sql-null-return-value*, should there be some option to prevent falling back to unspecialized array? (e.g. fill in -1 for uint, and fill in NaN for floats)?

Besides, I don't use other CL implementation but I expect the unboxed translators to work on most CL implementation. Feel free to test it and add to the #+sbcl block!

@kchanqvq
Copy link
Collaborator Author

There's also a stylist improvement I'd like to make while working at the code -- I think using a combination of keywords and structs (e.g. (defstruct duckdb-decimal-type ...)) to represent logical types will make the code cleaner, comparing to the current (vector-type internal-type aux) convention. Are you happy with this proposal? If so, I can refactor this in the next version :) (This will also make my generic functions nicer)

@kchanqvq kchanqvq force-pushed the main branch 2 times, most recently from 9da271b to 7bfbec1 Compare May 10, 2025 18:14
@kchanqvq
Copy link
Collaborator Author

kchanqvq commented May 10, 2025

It turns out fully portable CL can achieve similar performance! I pushed a new commit with define-specialized-translate-vector using portable CL, and it runs the benchmark with 1.68s. The key here is to move the type dispatch outside the hot loop (made possible by the generic function interface). Note that cffi:mem-ref has an optimization compiler macro, which only takes effect if the FFI type is a compile time constant.

In fact I think maybe we can rewrite everything (that involves translate-value-case) using this approach, so that translation is speed up for every type? This can bring speed up even if we're writing to an unspecialized array. Because this approach gets so close to memcpy performance with portable CL, I'm tempting to even remove the memcpy methods for now to simplify things.

@kchanqvq kchanqvq marked this pull request as draft May 10, 2025 19:39
@ak-coram
Copy link
Owner

Looks like min-extension is more than just a suggestion for a minimum! :)

I'm inclined to specify it only when we're definitely on the last chunk, i.e. (< chunk-size vector-size). All of this is very promising, but there seem to be some issues with ECL? I'll take a closer look tomorrow.

@kchanqvq
Copy link
Collaborator Author

kchanqvq commented May 10, 2025

I'm inclined to specify it only when we're definitely on the last chunk, i.e. (< chunk-size vector-size).

In this branch, I collect the results of translate-vector into a list, and aggregate-vectors in the end, so we don't need vector-push-extend.

All of this is very promising, but there seem to be some issues with ECL? I'll take a closer look tomorrow.

Thanks! And take your time :) I'll try to figure out the offending part as well.

Update: fixed!

Update 2: I removed memcpy related code, because it seems we can get similar performance with fully portable CL. Now there's no platform-specific code! I'll work on the logical type refactors I mentioned above.

@kchanqvq kchanqvq force-pushed the main branch 2 times, most recently from bb299ab to 02db3ee Compare May 10, 2025 21:00
@ak-coram ak-coram marked this pull request as ready for review May 11, 2025 07:26
@ak-coram
Copy link
Owner

@kchanqvq

Congrats on being the first contributor to this library! It proves at least some people can decipher my CL code ;)

I've now tried your changes and the performance increase is baffling. I'm embarrassed how inefficient the previous version was, even if it was still pretty fast :)

The only thing I've noticed is that (define-specialized-translate-vector :duckdb-boolean boolean) doesn't seem to have an effect (at least on SBCL):

(type-of (ddb:get-result (duckdb:q "SELECT true::boolean AS a") 'a))
(SIMPLE-VECTOR 1)

A bit vector would probably be an efficient representation, but it would break compatibility as it is using 0 and 1 instead of NIL and T.

I'm merging this now since it doesn't break anything, we can sort out booleans later if needed. Great work overall!

@ak-coram ak-coram merged commit d3935bc into ak-coram:main May 11, 2025
61 of 64 checks passed
@kchanqvq
Copy link
Collaborator Author

Congrats on being the first contributor to this library!

Thanks! I'm happy it helps :)

The only thing I've noticed is that (define-specialized-translate-vector :duckdb-boolean boolean) doesn't seem to have an effect (at least on SBCL):

Indeed, and that's intended. The CL standard doesn't define what kind of specialized array-element-type is supported, so whether this family of specialized method return specialized array is entirely implemention-dependent. However, they do still take effect as a performance optimizations -- they are able to move type dispatch out of the hot loop. I plan to do this for all other cases as well, built on #74 branch.

Do you have any thoughts on the following?

  • For *sql-null-return-value*, should there be some option to prevent falling back to unspecialized array? (e.g. fill in -1 for uint, and fill in NaN for floats)?

@ak-coram
Copy link
Owner

Sorry I've missed your question.

For sql-null-return-value, should there be some option to prevent falling back to unspecialized array? (e.g. fill in -1 for uint, and fill in NaN for floats)?

I think users can fix that in their query if they want, pretty easy to do via COALESCE for example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载