8. Using a binary mask¶
8.1. Basic usage¶
Sometimes, you’re not interested in aggregating microarray expression samples
within regions of an atlas—you want the actual, sample-level data instead. In
this case, we provides the abagen.get_samples_in_mask()
function.
To demonstrate how this works, we’ll first make a brainmask for the left parahippocampal gyrus using the region definition from the Desikan-Killiany atlas:
>>> import nibabel as nib
>>> import pandas as pd
>>> atlas = abagen.fetch_desikan_killiany()
>>> dk = nib.load(atlas['image'])
>>> info = pd.read_csv(atlas['info'])
>>> phg = int(info.query('label == "parahippocampal" & hemisphere == "L"')['id'])
>>> img = dk.__class__(dk.dataobj[:] == phg, dk.affine, dk.header)
We can then use this mask to obtain all the microarray samples that fall within its boundaries:
>>> expression, coords = abagen.get_samples_in_mask(mask=img)
abagen.get_samples_in_mask()
returns two objects: (1) the the samples x
gene expression matrix (exp
), and (2) an array of MNI coordinates for those
samples (coords
). Because this is using abagen.get_expression_data()
under the hood, the returned expression data have been preprocessed (i.e.,
filtered, normalized) according to that workflow. As such, you can provide all
the same parameters and keyword arguments to abagen.get_samples_in_mask()
as you can to abagen.get_expression_data()
(with the exception of
atlas
which is superseded by mask
and region_agg
/agg_metric
which will be ignored). Refer to the API documentation) for
more details!
Since the returned expression dataframe is a samples x gene matrix (rather than regions x gene), the index of the dataframe corresponds to the unique well ID of the relevant sample (rather than the atlas region):
>>> print(expression)
gene_symbol A1BG A1BG-AS1 A2M ... ZYX ZZEF1 ZZZ3
well_id ...
2850 0.654914 0.234039 0.283280 ... 0.020379 0.228080 0.000000
998 0.428705 0.375819 0.457741 ... 0.254195 0.315383 0.502122
990 0.400673 0.409852 0.561666 ... 0.270064 0.397740 0.522261
... ... ... ... ... ... ... ...
159226055 0.418706 0.751837 0.087808 ... 0.651541 0.410095 0.462773
159226117 0.533079 0.773214 0.265615 ... 0.441826 0.389615 0.455249
158158343 0.362038 0.553050 0.314730 ... 0.346605 0.261426 0.337738
[43 rows x 15633 columns]
This allows you to match up the samples with additional data provided by the AHBA (e.g., ontological information) as desired.
8.2. Get ALL the samples¶
If you want all of the available processed samples rather than only those
within a given mask you can call the function without providing an explicit
mask (this is the default when no mask
parameter is passed):
>>> expression, coords = abagen.get_samples_in_mask(mask=None)
This will return all samples (after dropping those where the listed MNI coordinates don’t match the listed hemisphere designation, etc.).