
Failure of c_to_p for large duplication #802

@behrj


Describe the bug
VariantMapper.c_to_p raises an IndexError when mapping a large insertion (effectively a duplication) near the end of the PTEN coding sequence in NM_000314.8.

To Reproduce

import hgvs.dataproviders.uta
import hgvs.parser
import hgvs.variantmapper


def reproduce_bug():
    """Reproduce the IndexError with a large insertion in PTEN."""
    # Connect to public UTA database (requires internet)
    print("Connecting to UTA database...")
    hdp = hgvs.dataproviders.uta.connect()
    # Create parser and mapper
    parser = hgvs.parser.Parser()
    mapper = hgvs.variantmapper.VariantMapper(hdp)
    # cDNA variant: large (127 bp) insertion in PTEN transcript NM_000314.8,
    # placed at c.1086_1087, near the end of the coding sequence
    cdna_hgvs = (
        "NM_000314.8:c.1086_1087ins"
        "ACTTCTGTAACACCAGATGTTAGTGACAATGAACCTGATCATTATAGATATTCTGACACC"
        "ACTGACTCTGATCCAGAGAATGAACCTTTTGATGAAGATCAGCATACACAAATTACAAAA"
        "GTCTGAA"
    )
    print(f"\nParsing cDNA variant: {cdna_hgvs[:80]}...")
    var_c = parser.parse_hgvs_variant(cdna_hgvs)
    print(f"Parsed successfully: {var_c}")
    print("\nConverting to protein coordinates (this will fail)...")
    try:
        var_p = mapper.c_to_p(var_c)
        print(f"Protein variant: {var_p}")
    except IndexError as e:
        print("\n✓ Successfully reproduced IndexError!")
        print(f"Error message: {e}")
        raise

reproduce_bug()
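
The inserted sequence can also be inspected without a UTA connection. The sketch below is only a sanity check on the input, not a diagnosis of the IndexError. It assumes the insert is read in frame from its first base, which follows from c.1086 being a multiple of 3 (3 × 362), so the insertion falls between codons 362 and 363. The helper name first_in_frame_stop is made up for this illustration and is not part of hgvs.

```python
# Inserted sequence copied from the reproduction script.
INSERT = (
    "ACTTCTGTAACACCAGATGTTAGTGACAATGAACCTGATCATTATAGATATTCTGACACC"
    "ACTGACTCTGATCCAGAGAATGAACCTTTTGATGAAGATCAGCATACACAAATTACAAAA"
    "GTCTGAA"
)

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq):
    """Return the 0-based codon index of the first in-frame stop, or None.

    Hypothetical helper for this illustration; reads codons from offset 0.
    """
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


insert_len = len(INSERT)
stop_index = first_in_frame_stop(INSERT)
# Bases left over in the insert after the stop codon (None if no stop found).
trailing = None if stop_index is None else insert_len - 3 * (stop_index + 1)

print(f"insert length: {insert_len} bp (length mod 3 = {insert_len % 3})")
print(f"first in-frame stop: codon {stop_index + 1} of the insert")
print(f"bases after that stop: {trailing}")
```

Under that assumption the insert is 127 bp long: 41 sense codons, a TGA stop as the 42nd codon, and a single trailing base. Those 42 codons would occupy positions 363 to 404, which matches the span of the p.(Thr363_Ter404dup) result quoted under Expected behavior.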

Expected behavior
If there is a good reason for this variant to fail, it should fail more gracefully, with a descriptive error rather than a bare IndexError. Ideally, though, it should produce something like this:
"p.(Thr363_Ter404dup)"

That is the result for an insertion one base shorter.
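
Until the mapper handles this case, a caller could get the "fail more gracefully" behaviour on their own side. The following is a minimal sketch of such a workaround, not a fix inside hgvs: ProteinMappingError and c_to_p_or_raise are hypothetical names introduced here, and the only hgvs call assumed is mapper.c_to_p as used in the reproduction script.

```python
class ProteinMappingError(Exception):
    """Hypothetical caller-side error; not part of the hgvs package."""


def c_to_p_or_raise(mapper, var_c):
    """Call mapper.c_to_p and turn a bare IndexError into a descriptive error.

    `mapper` is anything with a c_to_p method, e.g. the VariantMapper
    instance built in the reproduction script.
    """
    try:
        return mapper.c_to_p(var_c)
    except IndexError as exc:
        # Keep the original exception as __cause__ so the traceback survives.
        raise ProteinMappingError(
            f"c_to_p failed for {var_c}: {exc!r}"
        ) from exc
```

With the objects from the reproduction script this would be called as c_to_p_or_raise(mapper, var_c). A batch job can then catch ProteinMappingError, log the offending variant, and continue with the next one.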

Metadata

Labels: bug (Something isn't working)