Description
In the following example, using bulk_labels from the .obs attribute works fine, because the labels are correctly identified as categorical.
import scanpy as sc
from concordex.utils._labels import Labels
# Categorical labels
ad = sc.datasets.pbmc68k_reduced()
labels = Labels("bulk_labels")
labels.extract(ad)
print(labels.labeltype)
...but if we update the column so that its dtype is object, the labels are incorrectly described as continuous.
# Object labels
ad.obs['bulk_labels'] = ad.obs['bulk_labels'].astype(object)
labels = Labels("bulk_labels")
labels.extract(ad)
print(labels.labeltype)
This will almost certainly be a problem if a pandas reader (e.g. pd.read_csv) is used to read in metadata from a file. I'm wondering whether I should do the conversion internally, with a warning, or stop with an error. I'm guessing that continuous columns with string representations of NULL/NaN will also be read in as object, so internal conversion would be the wrong thing to do in that case. We could implement some of the R logic here and do a proper "guess" of the column type, but I'd like to avoid checking each item of the column to confirm whether an object column really contains strings.
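For discussion, here is a rough sketch of what such a "guess" might look like using only vectorized pandas calls, so no Python-level loop over items is needed. The helper name guess_labeltype and the NA_STRINGS list are hypothetical and not part of concordex. The rule itself (an object column counts as continuous only if every non-missing value parses as a number) is an assumption, not the R logic.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

# Hypothetical list of strings to treat as missing; adjust as needed.
NA_STRINGS = ["", "NA", "NaN", "nan", "NULL", "null", "None"]


def guess_labeltype(col: pd.Series) -> str:
    """Guess whether a metadata column is "categorical" or "continuous".

    Hypothetical helper for illustration; not the concordex implementation.
    """
    # Categorical and boolean columns are unambiguous.
    if isinstance(col.dtype, pd.CategoricalDtype) or is_bool_dtype(col):
        return "categorical"
    # Real numeric dtypes (int, float) are treated as continuous.
    if is_numeric_dtype(col):
        return "continuous"
    # Object/string columns: blank out NA-like strings, then try a
    # vectorized numeric parse. Unparseable values become NaN.
    cleaned = col.where(~col.isin(NA_STRINGS))
    present = cleaned.notna()
    parsed = pd.to_numeric(cleaned, errors="coerce")
    # Continuous only if there is data and all of it parsed as numbers.
    if present.any() and parsed[present].notna().all():
        return "continuous"
    return "categorical"


labels = pd.Series(["CD4+", "CD8+", "B cell"], dtype=object)
scores = pd.Series(["1.5", "NA", "2.0", "NULL"], dtype=object)
print(guess_labeltype(labels))
print(guess_labeltype(scores))
```

With a rule like this, an object column of cell-type names would be reported as categorical, while a numeric column polluted with NA-like strings would still be reported as continuous. The caller could then decide whether to convert silently, warn, or raise.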