-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
While #2806 is a documentation issue, I do believe we should try and solve the problem properly.
Thing is, when using the same Raw expression in both SELECT and GROUP BY clauses, Knex creates separate parameter bindings, causing PostgreSQL to treat them as different expressions:
const field = knex.raw('??#>>?', ['attributes', ['job']]);
knex('users').select(field).groupBy(field);
// Generates:
// SELECT "attributes"#>>$1 FROM users GROUP BY "attributes"#>>$2
// Parameters: [['job'], ['job']]
PostgreSQL requires exact expression matching between SELECT and GROUP BY, so it throws: column must appear in the GROUP BY clause or be used in an aggregate function
.
What can be done?
Possible Solutions
1. Parameter Deduplication System (Most Elegant)
Implement a system that tracks and reuses parameter positions for identical values:
class ParameterManager {
constructor() {
this.params = [];
this.paramMap = new Map(); // Map of value -> position
}
addParameter(value) {
const key = JSON.stringify(value); // or a hash
if (this.paramMap.has(key)) {
return this.paramMap.get(key); // Return existing position
}
this.params.push(value);
const position = this.params.length;
this.paramMap.set(key, position);
return position;
}
}
Result: SELECT "attributes"#>>$1 FROM users GROUP BY "attributes"#>>$1
Pros:
- Maintains security with parameterized queries
- Fixes the issue transparently
- Reduces redundant parameters (performance benefit)
Cons:
- Breaking change if anyone relies on parameter order <--- This is the biggest problem
- Complex handling for objects/arrays
- Needs careful implementation to avoid hash collisions
2. Raw Instance Tracking (Pragmatic)
Track when the same Raw instance is used multiple times and compile it only once:
class QueryCompiler {
constructor() {
this.compiledRaws = new WeakMap();
}
compileRaw(rawInstance) {
if (this.compiledRaws.has(rawInstance)) {
return this.compiledRaws.get(rawInstance);
}
const compiled = /* compile the raw */;
this.compiledRaws.set(rawInstance, compiled);
return compiled;
}
}
Pros:
- Only affects identical Raw instances
- Backwards compatible
- Simpler to implement
Cons:
- Only works if users reuse the same instance
- Doesn't help if users create separate but identical Raw expressions
- Limited scope of fix
3. Expression Registry with Automatic Aliases (Most Robust)
Automatically generate and track aliases for complex expressions:
// Knex internally generates:
// SELECT: "attributes"#>>$1 AS __expr_hash_abc123
// GROUP BY: __expr_hash_abc123
Pros:
- Always works regardless of how expressions are created
- No parameter duplication issues
- PostgreSQL handles it correctly
Cons:
- Changes output column names (unless carefully managed)
- More complex SQL generation
- May affect existing code expecting specific column names
4. Dialect-Specific GROUP BY Optimization (PostgreSQL-specific)
For PostgreSQL, detect when GROUP BY contains expressions from SELECT and use positional references or aliases:
// Option A: Positional
SELECT "attributes"#>>$1 AS job FROM users GROUP BY 1
// Option B: Alias reference
SELECT "attributes"#>>$1 AS job FROM users GROUP BY job
Pros:
- Solves the PostgreSQL issue directly
- Clean SQL output
- No parameter duplication
Cons:
- PostgreSQL-specific solution
- Positional GROUP BY can be fragile with query modifications
- Requires detecting which expressions in GROUP BY match SELECT
5. New API Methods (Explicit Control)
Add new methods that give users explicit control over expression reuse:
// Method 1: groupByAlias - reference a column alias
knex('users')
.select(knex.raw('??#>>?', ['attributes', ['job']]).as('job'))
.groupByAlias('job') // New method!
// Method 2: expression objects that track their usage
const expr = knex.expression('??#>>?', ['attributes', ['job']]);
knex('users')
.select(expr.as('job'))
.groupBy(expr) // Knex knows to reuse the same parameter positions
// Method 3: explicit parameter sharing
knex('users')
.select(knex.raw('??#>>?', ['attributes', ['job']]).as('job'))
.groupBy(knex.raw.ref('job')) // Reference the alias
Pros:
- Backwards compatible (new opt-in features)
- Explicit and predictable
- Can ship in a minor version
- Gives users immediate workarounds
Cons:
- Requires users to change their code
- Doesn't fix existing code automatically
- Multiple ways to do the same thing (API complexity)
Short Term (Immediate Relief)
Implement Solution #5 (New API Methods) first:
- Ship in next minor version (e.g., 3.2.0)
- Provides immediate workarounds for affected users
- Fully backwards compatible
- Clear migration path
Example implementation:
// New methods available immediately
knex('users')
.select(knex.raw('??#>>?', ['attributes', ['job']]).as('job_field'))
.groupByAlias('job_field') // New!
// Or with expression tracking
const jobExpr = knex.expression('??#>>?', ['attributes', ['job']]);
knex('users')
.select(jobExpr.as('job'))
.groupBy(jobExpr) // Reuses same parameters
Long Term (Proper Fix)
Implement Solution #1 (Parameter Deduplication) with careful rollout:
-
Version 3.3.0: Add behind feature flag
knex({ deduplicateParameters: true // Opt-in })
-
Version 3.4.0: Enable by default with opt-out
knex({ deduplicateParameters: false // Opt-out for compatibility })
-
Version 4.0.0: Remove flag, always deduplicate
Why This Combination?
- Immediate Relief: Users suffering from this issue get solutions now
- Proper Fix: The underlying problem gets solved correctly
- Migration Path: Clear upgrade path with feature flags
- Backwards Compatibility: No breaking changes until major version
- Best Practices: Maintains parameterized queries throughout
Implementation Priority
- First: Add
groupByAlias()
method (easiest, most useful) - Second: Add
knex.expression()
for trackable expressions - Third: Implement parameter deduplication behind flag
- Finally: Make deduplication default in next major version
Alternative for Users (Available Now)
Until these fixes are implemented, users can work around the issue:
// Workaround: Use aliases
knex('users')
.select(knex.raw('??#>>? as job_field', ['attributes', ['job']]))
.groupBy('job_field')
// This works because GROUP BY references the alias, not the expression
Conclusion
The parameter duplication issue represents a fundamental tension between:
- PostgreSQL's strict expression matching requirements
- Security best practices (parameterized queries)
- Query builder architecture (independent clause compilation)
The recommended dual approach provides both immediate relief and a long-term solution, maintaining security while fixing the underlying architectural issue.