String columns for labels are not identified as discrete when dtype is object #6

@kayla-jackson

In the following example, using bulk_labels from the .obs attribute works fine because the labels are correctly identified as categorical.

import scanpy as sc
from concordex.utils._labels import Labels

# Categorical labels
ad = sc.datasets.pbmc68k_reduced()
labels = Labels("bulk_labels")
labels.extract(ad)
print(labels.labeltype)

...but if we update the column so that its dtype is object, the labels are incorrectly described as continuous:

# Object labels
ad.obs['bulk_labels'] = ad.obs['bulk_labels'].astype(object)
labels = Labels("bulk_labels")
labels.extract(ad)
print(labels.labeltype)

This will almost certainly be a problem if a pandas reader (e.g. pd.read_csv) is used to read in metadata from a file. I'm wondering whether I should do the conversion internally with a warning, or stop with an error. I'm guessing that continuous columns containing string representations of NULL/NaN will also be read in as object, so internal conversion would be the wrong thing to do in that case. We could implement some of the R logic here and do a proper "guess" of the column type, but I'd like to avoid checking each item of the column to distinguish object from string.
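To make the "guess" option concrete, here is a minimal sketch of what such a check could look like using only pandas. It is not concordex's implementation: the function name guess_labeltype, the NA_TOKENS set, and the "discrete"/"continuous" return strings are all assumptions made for illustration. The idea is to treat known missing-value strings as missing, try a numeric parse, and call the column continuous only if every remaining value parses.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

# Hypothetical set of strings treated as missing values (an assumption,
# not taken from concordex or from the R implementation).
NA_TOKENS = {"", "NA", "NaN", "nan", "NULL", "null", "None"}


def guess_labeltype(col: pd.Series) -> str:
    """Guess whether a column holds discrete or continuous labels.

    Hypothetical helper for illustration only.
    """
    # Categorical and boolean columns are discrete by construction.
    if isinstance(col.dtype, pd.CategoricalDtype) or is_bool_dtype(col):
        return "discrete"
    # Native numeric dtypes are treated as continuous.
    if is_numeric_dtype(col):
        return "continuous"

    # Object/string column: mask known missing-value tokens, then try a
    # vectorized numeric parse. Unparseable values become NaN.
    cleaned = col.where(~col.isin(NA_TOKENS))
    parsed = pd.to_numeric(cleaned, errors="coerce")

    # A value that was present but failed to parse means the column
    # contains genuine text, so treat it as discrete.
    failed = parsed.isna() & cleaned.notna()
    return "discrete" if failed.any() else "continuous"


string_labels = pd.Series(["B cell", "T cell", "B cell"], dtype=object)
numeric_with_null = pd.Series(["1.5", "NULL", "2.0"], dtype=object)

print(guess_labeltype(string_labels))
print(guess_labeltype(numeric_with_null))
```

The parse is vectorized, but it still touches every value once, so it does not fully avoid the per-item check mentioned above. A column that is entirely missing would be reported as continuous under this sketch, which may or may not be the desired behavior.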

Labels: bug
