+
Skip to content

[Bug] Pull Request reviewer duplicates #39

@fivetran-sheringuyen

Description

@fivetran-sheringuyen

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

Currently the reviewers column in github__pull_requests does not account for duplicate/removed reviewers. E.g. it just joins all reviewers based on the the string aggregated users from the stg_github__pull_request_review model.

This aggregation seems to be inherited upstream through the int_github__pull_request_reviewers model and therefore, this model should be updated to be like so instead. I don't actually think in the example branch that we even need to leverage pull_request_review anymore but should just be able to just use pull_request_reviewer_history .

Relevant error log or model output

This is the sample of the data as it currently stands:

 split_reviewers as (

  select 
    reviewers,
    split_reviewers,
    count(distinct issue_id)
  FROM `dbt-package-testing.fivetran_transformations_github.github__pull_requests` ,
  unnest(split(reviewers, ', ')) split_reviewers
  group by 1,2
  having count(distinct issue_id) > 1
)

select * from split_reviewers

Some rows literally look like this:
```fivetran-joemarkiewicz, fivetran-joemarkiewicz, fivetran-joemarkiewicz```

Presumably because the `pull_request_review` seems like an event based table so aggregating all the users there will yield duplicates.

Expected behavior

The aggregated reviewers column should be de-duplicated with only one instance per unique reviewer.

So instead of:
fivetran-joemarkiewicz, fivetran-joemarkiewicz, fivetran-joemarkiewicz

It would be
fivetran-joemarkiewicz

dbt Project configurations

n/a

Package versions

n/a

What database are you using dbt with?

bigquery

dbt Version

1.3

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.

Metadata

Metadata

Assignees

No one assigned

    Labels

    priority:p3Affects many users; can waitstatus:in_reviewCurrently in reviewtype:bugSomething is broken or incorrectupdate_type:modelsPrimary focus requires model updates

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载