`cocoa` is a Python library for comparative connectomics analyses.
It implements various dataset-agnostic as well as dataset-specific methods for matching, connectivity, co-clustering and cell typing.
Currently implemented are:
On the TO-DO list:
- female adult nerve cord (FANC)
- brain and nerve cord (BANC)
Feel free to open an Issue or a PR if you want a specific dataset added.
pip3 install git+https://github.com/flyconnectome/cocoa.git -U
All dependencies should be installed automatically. However, to use the pre-defined datasets you will need to set a couple of environment variables and secrets:
- To use the neuPrint datasets (hemibrain, MANC and maleCNS) you need to set your API token as `NEUPRINT_APPLICATION_CREDENTIALS` (see neuprint-python)
- To use the CAVE/chunkedgraph datasets (FlyWire, FANC) you need to have your CAVE token set (see fafbseg)
- For internal use only: if you want to use the live annotations from flytable make sure to set the `SEATABLE_SERVER` and `SEATABLE_TOKEN` environment variables (see sea-serpent)
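Before fetching any data, it can help to check which of these variables are actually set. The snippet below is a minimal sketch using only the standard library. The `missing_variables` helper and the `REQUIRED_VARS` grouping are illustrative names and not part of `cocoa`. The CAVE token is left out because the list above does not name an environment variable for it.

```python
import os

# Environment variables named in the list above. Which ones you need
# depends on the datasets you plan to use. The grouping is illustrative.
REQUIRED_VARS = {
    "neuPrint datasets": ["NEUPRINT_APPLICATION_CREDENTIALS"],
    "flytable annotations (internal)": ["SEATABLE_SERVER", "SEATABLE_TOKEN"],
}


def missing_variables(env=None):
    """Return {purpose: [names of unset or empty variables]}."""
    env = os.environ if env is None else env
    missing = {}
    for purpose, names in REQUIRED_VARS.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            missing[purpose] = unset
    return missing


if __name__ == "__main__":
    for purpose, names in missing_variables().items():
        print(f"{purpose}: please set {', '.join(names)}")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching your real environment.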
The main concept in `cocoa` is that of a `DataSet`. A `DataSet` represents
a collection of neurons from a specific source (e.g. FlyWire or hemibrain),
and provides methods to fetch annotations and connectivity.
While you can use `cocoa` to run clusterings on just a single dataset,
its real power lies in co-clustering neurons from multiple datasets. To do
this, it auto-magically computes mappings between neurons from different
datasets based on available labels. These labels are then used to
generate a joint connectivity vector from which we can compute pairwise
distances.
>>> import cocoa as cc
>>> # Define the sets of neurons to co-cluster
>>> hb = cc.Hemibrain(label='hemibrain',
...                   ).add_neurons(['SLP001', 'SLP003'])
>>> fwl = cc.FlyWire(label='FlyWire_left',
...                  materialization=783,
...                  ).add_neurons(['SLP001', 'SLP003'], sides='left')
>>> fwr = cc.FlyWire(label='FlyWire_right',
...                  materialization=783,
...                  ).add_neurons(['SLP001', 'SLP003'], sides='right')
>>> # Combine into a clustering and co-cluster
>>> cl = cc.Clustering([hb, fwl, fwr]).compile()
>>> # The clustering `cl` contains the results of the clustering.
>>> # The joint connectivity vector:
>>> cl.vect_
                   downstream                         ... upstream
                      LHAV1b1 LHPV4g1 LHAV5e1 LHAV1b3 ...    CL018 CL077 SLP202 LC9
294437347                   0       0       1       0 ...        0     0      0   0
543692985                   0       0       0       4 ...        0     6      0   1
720575940617091414          0       0       1       0 ...        0     0      0   0
720575940623050334          0       0       0       2 ...        1     1      0   0
720575940627960442          0       0       1       0 ...        0     0      1   0
720575940628895750          1       4       0       3 ...        0     5      0   0
>>> # The pairwise (cosine) distances:
>>> cl.dists_
                    SLP001_hemibrain  ...  SLP003_FlyWire_right
294437347                   0.000000  ...              0.990616
543692985                   0.988929  ...              0.092726
720575940617091414          0.141363  ...              0.994823
720575940623050334          0.993146  ...              0.046200
720575940627960442          0.218134  ...              0.992618
720575940628895750          0.990616  ...              0.000000
>>> # It also provides some useful methods to work with the data
>>> table = cl.to_table(clusters=cl.extract_homogeneous_clusters())
>>> table
                   id   label        dataset  cn_frac_used  dend_ix  cluster
0           543692985  SLP003      hemibrain      0.503151        0        0
1  720575940623050334  SLP003   FlyWire_left      0.541004        1        0
2  720575940628895750  SLP003  FlyWire_right      0.545074        2        0
3           294437347  SLP001      hemibrain      0.308048        3        1
4  720575940617091414  SLP001   FlyWire_left      0.375770        4        1
5  720575940627960442  SLP001  FlyWire_right      0.328080        5        1
>>> # See also `cl.plot_clustermap` for a quick visualization
Alternatively, you can also use the `generate_clustering` helper function.
That may be enough in cases where you don't need fine-grained control.
>>> cl = cc.generate_clustering(
...     fw=['SLP001', 'SLP002'],
...     hb=['SLP001', 'SLP002']
... ).compile()
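To make the co-clustering principle described earlier more concrete, here is a toy sketch that needs no connectome data. It is not how `cocoa` implements the computation internally. It only illustrates the idea of collapsing dataset-specific partner IDs to shared labels, building one joint vector per neuron, and computing pairwise cosine distances. All IDs, weights and the helpers `joint_vectors` and `cosine_distance` are made up for illustration.

```python
import math
from collections import Counter

# Toy data, invented for illustration: per neuron, synapse counts onto
# downstream partners, keyed by dataset-specific partner IDs.
connectivity = {
    "hb_1": {101: 4, 102: 1},
    "fw_1": {9001: 3, 9002: 2},
    "fw_2": {9003: 5},
}

# Hypothetical ID -> label mapping shared across datasets.
labels = {101: "LHAV1b1", 102: "CL077", 9001: "LHAV1b1", 9002: "CL077", 9003: "LC9"}


def joint_vectors(connectivity, labels):
    """Collapse partner IDs to shared labels; return (columns, {neuron: row})."""
    collapsed = {}
    for neuron, partners in connectivity.items():
        counts = Counter()
        for partner_id, weight in partners.items():
            label = labels.get(partner_id)
            if label is not None:  # partners without a label are dropped
                counts[label] += weight
        collapsed[neuron] = counts
    columns = sorted({lab for counts in collapsed.values() for lab in counts})
    rows = {n: [c.get(lab, 0) for lab in columns] for n, c in collapsed.items()}
    return columns, rows


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


columns, rows = joint_vectors(connectivity, labels)
dists = {(a, b): cosine_distance(rows[a], rows[b]) for a in rows for b in rows}
```

In this toy case `hb_1` and `fw_1` share both partner labels and end up close together, while `fw_2` shares none and sits at the maximum distance from both.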
`cocoa` does not yet have dedicated documentation, but we provide a number of
examples in `examples/` that show how to use the library for various tasks:
- `0_flywire_hemibrain_FC1-3.ipynb`: demonstrates co-clustering for a small group of neurons, including visualization of the results
- `1_malecns_flywire_mapping.ipynb`: shows how to use `cocoa` to generate mappings between neurons from different datasets
- `2_malecns_flywire_optic_lobes.ipynb`: demonstrates a large-scale (~160k neurons) co-clustering between two datasets
In addition, all functions/classes have extensive docstrings:
>>> help(cc.Clustering.compile)
cc.Clustering.compile(
    self,
    join='outer',
    metric='cosine',
    mapper=<class 'cocoa.mappers.GraphMapper'>,
    force_recompile=False,
    exclude_labels=None,
    include_labels=None,
    ignore_unlabeled=True,
    cn_frac_threshold=None,
    augment=None,
    n_batches='auto',
    verbose=True,
)
Docstring:
Compile combined connectivity vector and calculate distance matrix.

Parameters
----------
join : "inner" | "outer" | "existing"
    How to combine the dataset connectivity vectors:
      - "existing" (default) will check if a label exists in
        theory and use it even if it's not present in the
        connectivity vectors of all datasets
      - "inner" will get the intersection of all labels across
        the connectivity vectors
      - "outer" will use all available labels
    Note: if you are using a GraphMapper, you should use "outer"
    as the mapper will already have filtered out non-matching
    labels.
metric : "cosine" | "Euclidean"
    Metric to use for distance calculations.
mapper : cocoa.Mapper | dict
    The mapper used to match neuron labels across datasets.
    Examples are `cocoa.GraphMapper` and `cocoa.SimpleMapper`.
    See the mapper's documentation for more information.
    Alternatively, you can also provide a dictionary that maps
    IDs to labels.
exclude_labels : str | list of str, optional
    If provided will exclude given labels from the observation
    vector. This uses regex!
[...]
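The difference between the "inner" and "outer" options of `join` can be pictured with plain Python sets. This is a simplified sketch and not the library's code. The `combine_labels` helper and the toy label sets are hypothetical, and the "existing" option is left out because it relies on information beyond the connectivity vectors themselves.

```python
# Toy label sets, invented for illustration: the partner labels that occur
# in each dataset's connectivity vector.
labels_per_dataset = {
    "hemibrain": {"LHAV1b1", "CL077", "SLP202"},
    "FlyWire_left": {"LHAV1b1", "CL077", "LC9"},
}


def combine_labels(labels_per_dataset, join="outer"):
    """Return the sorted labels kept under an "inner" or "outer" join."""
    sets = list(labels_per_dataset.values())
    if join == "inner":
        kept = set.intersection(*sets)  # labels present in every dataset
    elif join == "outer":
        kept = set.union(*sets)  # labels present in at least one dataset
    else:
        raise ValueError(f"unsupported join: {join!r}")
    return sorted(kept)


inner = combine_labels(labels_per_dataset, "inner")
outer = combine_labels(labels_per_dataset, "outer")
```

Here `inner` keeps only the two labels common to both datasets, while `outer` keeps all four, so an outer join yields a wider joint vector with more zero entries.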