+
Skip to content

Fixing Knex's GROUP BY Parameter Duplication Issue #6263

@mercmobily

Description

@mercmobily

While #2806 is a documentation issue, I do believe we should try and solve the problem properly.

Thing is, when using the same Raw expression in both SELECT and GROUP BY clauses, Knex creates separate parameter bindings, causing PostgreSQL to treat them as different expressions:

const field = knex.raw('??#>>?', ['attributes', ['job']]);
knex('users').select(field).groupBy(field);

// Generates:
// SELECT "attributes"#>>$1 FROM users GROUP BY "attributes"#>>$2
// Parameters: [['job'], ['job']]

PostgreSQL requires exact expression matching between SELECT and GROUP BY, so it throws: column must appear in the GROUP BY clause or be used in an aggregate function.

What can be done?

Possible Solutions

1. Parameter Deduplication System (Most Elegant)

Implement a system that tracks and reuses parameter positions for identical values:

class ParameterManager {
  constructor() {
    this.params = [];
    this.paramMap = new Map(); // Map of value -> position
  }
  
  addParameter(value) {
    const key = JSON.stringify(value); // or a hash
    if (this.paramMap.has(key)) {
      return this.paramMap.get(key); // Return existing position
    }
    this.params.push(value);
    const position = this.params.length;
    this.paramMap.set(key, position);
    return position;
  }
}

Result: SELECT "attributes"#>>$1 FROM users GROUP BY "attributes"#>>$1

Pros:

  • Maintains security with parameterized queries
  • Fixes the issue transparently
  • Reduces redundant parameters (performance benefit)

Cons:

  • Breaking change if anyone relies on parameter order <--- This is the biggest problem
  • Complex handling for objects/arrays
  • Needs careful implementation to avoid hash collisions

2. Raw Instance Tracking (Pragmatic)

Track when the same Raw instance is used multiple times and compile it only once:

class QueryCompiler {
  constructor() {
    this.compiledRaws = new WeakMap();
  }
  
  compileRaw(rawInstance) {
    if (this.compiledRaws.has(rawInstance)) {
      return this.compiledRaws.get(rawInstance);
    }
    const compiled = /* compile the raw */;
    this.compiledRaws.set(rawInstance, compiled);
    return compiled;
  }
}

Pros:

  • Only affects identical Raw instances
  • Backwards compatible
  • Simpler to implement

Cons:

  • Only works if users reuse the same instance
  • Doesn't help if users create separate but identical Raw expressions
  • Limited scope of fix

3. Expression Registry with Automatic Aliases (Most Robust)

Automatically generate and track aliases for complex expressions:

// Knex internally generates:
// SELECT: "attributes"#>>$1 AS __expr_hash_abc123
// GROUP BY: __expr_hash_abc123

Pros:

  • Always works regardless of how expressions are created
  • No parameter duplication issues
  • PostgreSQL handles it correctly

Cons:

  • Changes output column names (unless carefully managed)
  • More complex SQL generation
  • May affect existing code expecting specific column names

4. Dialect-Specific GROUP BY Optimization (PostgreSQL-specific)

For PostgreSQL, detect when GROUP BY contains expressions from SELECT and use positional references or aliases:

// Option A: Positional
SELECT "attributes"#>>$1 AS job FROM users GROUP BY 1

// Option B: Alias reference  
SELECT "attributes"#>>$1 AS job FROM users GROUP BY job

Pros:

  • Solves the PostgreSQL issue directly
  • Clean SQL output
  • No parameter duplication

Cons:

  • PostgreSQL-specific solution
  • Positional GROUP BY can be fragile with query modifications
  • Requires detecting which expressions in GROUP BY match SELECT

5. New API Methods (Explicit Control)

Add new methods that give users explicit control over expression reuse:

// Method 1: groupByAlias - reference a column alias
knex('users')
  .select(knex.raw('??#>>?', ['attributes', ['job']]).as('job'))
  .groupByAlias('job')  // New method!

// Method 2: expression objects that track their usage
const expr = knex.expression('??#>>?', ['attributes', ['job']]);
knex('users')
  .select(expr.as('job'))
  .groupBy(expr)  // Knex knows to reuse the same parameter positions

// Method 3: explicit parameter sharing
knex('users')
  .select(knex.raw('??#>>?', ['attributes', ['job']]).as('job'))
  .groupBy(knex.raw.ref('job'))  // Reference the alias

Pros:

  • Backwards compatible (new opt-in features)
  • Explicit and predictable
  • Can ship in a minor version
  • Gives users immediate workarounds

Cons:

  • Requires users to change their code
  • Doesn't fix existing code automatically
  • Multiple ways to do the same thing (API complexity)

We could combine #1 and #5

Short Term (Immediate Relief)

Implement Solution #5 (New API Methods) first:

  • Ship in next minor version (e.g., 3.2.0)
  • Provides immediate workarounds for affected users
  • Fully backwards compatible
  • Clear migration path

Example implementation:

// New methods available immediately
knex('users')
  .select(knex.raw('??#>>?', ['attributes', ['job']]).as('job_field'))
  .groupByAlias('job_field')  // New!

// Or with expression tracking
const jobExpr = knex.expression('??#>>?', ['attributes', ['job']]);
knex('users')
  .select(jobExpr.as('job'))
  .groupBy(jobExpr)  // Reuses same parameters

Long Term (Proper Fix)

Implement Solution #1 (Parameter Deduplication) with careful rollout:

  1. Version 3.3.0: Add behind feature flag

    knex({ 
      deduplicateParameters: true  // Opt-in
    })
  2. Version 3.4.0: Enable by default with opt-out

    knex({ 
      deduplicateParameters: false  // Opt-out for compatibility
    })
  3. Version 4.0.0: Remove flag, always deduplicate

Why This Combination?

  1. Immediate Relief: Users suffering from this issue get solutions now
  2. Proper Fix: The underlying problem gets solved correctly
  3. Migration Path: Clear upgrade path with feature flags
  4. Backwards Compatibility: No breaking changes until major version
  5. Best Practices: Maintains parameterized queries throughout

Implementation Priority

  1. First: Add groupByAlias() method (easiest, most useful)
  2. Second: Add knex.expression() for trackable expressions
  3. Third: Implement parameter deduplication behind flag
  4. Finally: Make deduplication default in next major version

Alternative for Users (Available Now)

Until these fixes are implemented, users can work around the issue:

// Workaround: Use aliases
knex('users')
  .select(knex.raw('??#>>? as job_field', ['attributes', ['job']]))
  .groupBy('job_field')

// This works because GROUP BY references the alias, not the expression

Conclusion

The parameter duplication issue represents a fundamental tension between:

  • PostgreSQL's strict expression matching requirements
  • Security best practices (parameterized queries)
  • Query builder architecture (independent clause compilation)

The recommended dual approach provides both immediate relief and a long-term solution, maintaining security while fixing the underlying architectural issue.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载