2. Defining a parcellation

2.1. Acceptable parcellations

In order to process the microarray expression data from AHBA you’ll need a parcellation (or “atlas”). Here, we define a parcellation (atlas) as either (1) a NIFTI image in MNI space, or (2) a tuple of GIFTI images in fsaverage space (and with fsaverage5 resolution!). In both cases, parcels in the atlas should be denoted by unique integer IDs (distinct across hemispheres). The primary workflows in abagen are designed to readily accept any parcellations / atlases in this format; however, if you want to use a different format please refer to Non-standard parcellations.
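As a minimal sketch of the "unique integer IDs" requirement, the helper below checks that two hemisphere label lists share no parcel IDs. The function name `shared_parcel_ids` is hypothetical (it is not part of abagen), the label values are made up, and treating 0 as background is an assumption.

```python
def shared_parcel_ids(lh_labels, rh_labels, background=0):
    """Return the sorted parcel IDs that appear in both hemispheres."""
    lh = set(lh_labels) - {background}
    rh = set(rh_labels) - {background}
    return sorted(lh & rh)


# Toy per-vertex labels; real ones would be read from your GIFTI files
lh_labels = [0, 1, 1, 2, 3]
rh_labels = [0, 35, 36, 36, 37]

overlap = shared_parcel_ids(lh_labels, rh_labels)
print(overlap)  # an empty list means the IDs are distinct across hemispheres
```

If the list is not empty, one common remedy is to add a constant offset to every non-background label in one hemisphere before saving the atlas.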

For demonstration purposes, abagen has a copy of the Desikan-Killiany atlas that you can use. Here, we load the volumetric atlas by default:

>>> import abagen
>>> atlas = abagen.fetch_desikan_killiany()

The returned object atlas is a dictionary with two keys: image, which is a filepath to a NIFTI image containing the atlas data, and info, which is a filepath to a CSV file containing extra information about the parcellation:

>>> print(atlas['image'])  
/.../data/atlas-desikankilliany.nii.gz
>>> print(atlas['info'])  
/.../data/atlas-desikankilliany.csv

You can load the surface version of the atlas by providing the surface parameter:

>>> atlas = abagen.fetch_desikan_killiany(surface=True)
>>> print(atlas['image'])  
('/.../data/atlas-desikankilliany-lh.label.gii.gz', '/.../data/atlas-desikankilliany-rh.label.gii.gz')

2.2. Providing additional parcellation info

While only the image (i.e., NIFTI or GIFTIs) is required for processing the microarray data, the CSV file with information on the parcellation scheme can also be very useful. In particular, abagen can use the CSV file to constrain the matching of tissue samples to anatomical regions in the atlas image.

Note

If you are using a surface atlas and your GIFTI files have valid label tables then abagen will automatically create a pandas.DataFrame with all the relevant information described below. However, you can always provide a separate CSV file if you are unsure and this will override any label tables present in the GIFTI files.

If you want to supply your own CSV file with information about an atlas you must ensure it has (at least) the following columns:

  1. id: an integer ID corresponding to the labels in the atlas image

  2. hemisphere: a left/right/bilateral hemispheric designation (i.e., ‘L’, ‘R’, or ‘B’)

  3. structure: a broad structural class designation (i.e., one of ‘cortex’, ‘subcortex/brainstem’, ‘cerebellum’, ‘white matter’, or ‘other’)
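The column requirements above can be checked before handing a file to abagen. The following is only a rough sketch using the standard library; `find_problems` is a hypothetical helper, not an abagen function, and the example rows are illustrative.

```python
import csv
import io

REQUIRED = ("id", "hemisphere", "structure")
HEMISPHERES = {"L", "R", "B"}
STRUCTURES = {"cortex", "subcortex/brainstem", "cerebellum",
              "white matter", "other"}


def find_problems(csv_text):
    """Return a list of human-readable problems found in atlas info CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    for n, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["id"].strip().isdigit():
            problems.append(f"line {n}: id {row['id']!r} is not an integer")
        if row["hemisphere"] not in HEMISPHERES:
            problems.append(f"line {n}: bad hemisphere {row['hemisphere']!r}")
        if row["structure"] not in STRUCTURES:
            problems.append(f"line {n}: bad structure {row['structure']!r}")
    return problems


example = """id,label,hemisphere,structure
1,bankssts,L,cortex
83,brainstem,B,subcortex/brainstem
"""
problems = find_problems(example)
print(problems)  # an empty list means no problems were found
```

Extra columns such as label are ignored by this check, matching the requirement that only the three named columns must be present.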

For example, a valid CSV might look like this:

>>> import pandas as pd
>>> atlas_info = pd.read_csv(atlas['info'])
>>> print(atlas_info)
    id                    label hemisphere            structure
0    1                 bankssts          L               cortex
1    2  caudalanteriorcingulate          L               cortex
2    3      caudalmiddlefrontal          L               cortex
..  ..                      ...        ...                  ...
80  81              hippocampus          R  subcortex/brainstem
81  82                 amygdala          R  subcortex/brainstem
82  83                brainstem          B  subcortex/brainstem

[83 rows x 4 columns]

Notice that extra columns (e.g., label) are okay as long as the three required columns are present! If you want to confirm your file is formatted correctly you can use abagen.images.check_atlas():

Note

If you are following this tutorial for a first trial of abagen, do not run abagen.images.check_atlas() below. It converts atlas['image'] and atlas['info'] into a single abagen.AtlasTree object. If you do run it, pass atlas directly in the following steps instead of atlas['image'] and atlas['info']. For more, see the section on Non-standard parcellations.

>>> from abagen import images
>>> atlas = abagen.fetch_desikan_killiany()
>>> # atlas = images.check_atlas(atlas['image'], atlas['info'])

If something is amiss with the file this function will raise an error and try to give some information about what you should check for.

Important

You might be asking: “why should I provide this extra information for my parcellation?” Providing this CSV file will ensure that microarray samples designated as belonging to a given hemisphere/structure by the AHBA ontology are not matched to regions in the atlas image with different hemispheric/structural designations. That is, if the AHBA ontology specifies that a tissue sample comes from the left hemisphere subcortex, it will only ever be matched to regions in atlas belonging to the left hemisphere subcortex.

While this seems trivial, it is very important because numerous tissue samples lie on the boundaries between hemispheres and structural classes (e.g., cortex/subcortex). In many instances, these samples won’t fall directly within a region of the atlas, at which point abagen will attempt to match them to nearby regions. Without the hemisphere/structure information provided by this CSV file there is a high likelihood of misassigning samples, leading to biased or skewed expression data.
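To make the effect of the constraint concrete, here is a toy illustration. It is not abagen's actual matching algorithm: the region centroids, the sample coordinate, and the `nearest_region` helper are all made up, and nearest-centroid matching is assumed purely for simplicity.

```python
import math

# Hypothetical region centroids (x, y, z) with hemisphere/structure labels
regions = [
    {"id": 1, "hemisphere": "L", "structure": "cortex",
     "centroid": (-30.0, 0.0, 10.0)},
    {"id": 2, "hemisphere": "L", "structure": "subcortex/brainstem",
     "centroid": (-20.0, 0.0, 0.0)},
    {"id": 3, "hemisphere": "R", "structure": "subcortex/brainstem",
     "centroid": (5.0, 0.0, 0.0)},
]


def nearest_region(coord, hemisphere=None, structure=None):
    """Return the ID of the closest region, optionally constrained by labels."""
    candidates = [
        r for r in regions
        if (hemisphere is None or r["hemisphere"] == hemisphere)
        and (structure is None or r["structure"] == structure)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: math.dist(coord, r["centroid"]))["id"]


# A left-hemisphere subcortical sample that sits close to the midline
sample = (-2.0, 0.0, 0.0)
unconstrained = nearest_region(sample)
constrained = nearest_region(sample, hemisphere="L",
                             structure="subcortex/brainstem")
print(unconstrained, constrained)  # 3 2
```

Without the constraint the sample is assigned to a right-hemisphere region simply because that region is closer; with it, the sample stays in the left subcortex.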

2.3. Individualized parcellations

Instead of providing a single parcellation image that will be used for all donors, you can provide a parcellation image for each donor in the space of their “raw” (or native) T1w image. abagen ships with versions of the Desikan-Killiany parcellation defined in donor-native space:

>>> atlas = abagen.fetch_desikan_killiany(native=True)
>>> print(atlas['image'].keys())
dict_keys(['9861', '10021', '12876', '14380', '15496', '15697'])
>>> print(atlas['image']['9861'])  
/.../data/native_dk/9861/atlas-desikankilliany.nii.gz

Note here that atlas['image'] is a dictionary, where the keys are donor IDs and the corresponding values are paths to the parcellation for each donor. The primary workflows in abagen that accept a single atlas (i.e., abagen.get_expression_data() and abagen.get_samples_in_mask()) will also accept a dictionary of this format.
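If you build such a dictionary yourself, it can help to confirm that every donor you intend to process has an entry. The sketch below assumes you want all six donors listed above; `missing_donors` is a hypothetical helper and the paths are placeholders.

```python
# Donor IDs as shown in the output above
AHBA_DONORS = ("9861", "10021", "12876", "14380", "15496", "15697")


def missing_donors(atlas_images):
    """Return the AHBA donor IDs that have no entry in `atlas_images`."""
    provided = {str(donor) for donor in atlas_images}
    return [donor for donor in AHBA_DONORS if donor not in provided]


# Placeholder paths; substitute the locations of your own native-space images
atlas_images = {
    donor: f"native_atlases/{donor}/atlas.nii.gz" for donor in AHBA_DONORS[:4]
}
print(missing_donors(atlas_images))  # ['15496', '15697']
```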

We also provide donor-specific surface atlases (derived from the FreeSurfer outputs that can be fetched with abagen.datasets.fetch_freesurfer()). These atlases are also shipped with abagen and can be loaded with:

>>> atlas = abagen.fetch_desikan_killiany(native=True, surface=True)
>>> print(atlas['image'].keys())
dict_keys(['9861', '10021', '12876', '14380', '15496', '15697'])
>>> print(atlas['image']['9861'])  
('/.../9861/atlas-desikankilliany-lh.label.gii.gz', '/.../9861/atlas-desikankilliany-rh.label.gii.gz')

Note that if you are using your own donor-specific surface atlases they must, by default, be based on the geometry of the FreeSurfer surfaces provided with abagen.datasets.fetch_freesurfer(). If you wish to use surface atlases based on different geometry please refer to Non-standard parcellations, below.

Finally, when in doubt we recommend simply using a standard-space, group-level atlas; however, we are actively investigating whether native-space atlases provide any measurable benefits to the abagen workflows.

Note

The donor-native volumetric versions of the DK parcellation shipped with abagen were generated by Arnatkevičiūte et al., 2018, NeuroImage, and are provided under the CC BY 4.0 license. The donor-native surface versions of the DK parcellation were generated by Romero-Garcia et al., 2017, NeuroImage, and are also provided under the CC BY 4.0 license.

2.4. Non-standard parcellations

If you’d like to use a non-standard atlas in the primary abagen workflows, that may be possible, with some caveats. The constraining factor is the coordinates of the tissue samples from the AHBA: they are available in (1) the native space of each donor’s MRI, or (2) MNI152 space, and we strongly encourage you to use one of these options (rather than, e.g., attempting to register the coordinates to a new space). If you provide a group-level atlas the toolbox will default to using the MNI152 coordinates; if you provide donor-specific atlases then the toolbox will use the native coordinates. Thus, by default, abagen prefers you use one of the atlas conformations described above.

However, if you have an atlas in a different space or resolution you can (potentially) use it in the primary abagen workflows. To do this you will need to create an abagen.AtlasTree object. All atlases provided are internally coerced to AtlasTree instances, which are then used to assign microarray tissue samples to parcels in the atlas.

Take, for example, a surface atlas in fsaverage6 resolution (by default, surface atlases are assumed to be fsaverage5 resolution). In this case, you simply need to supply the relevant geometry files for the atlas and specify the space of the atlas:

>>> from abagen import images
>>> atlas = ('/.../fsaverage6-lh.label.gii', '/.../fsaverage6-rh.label.gii')
>>> surf = ('/.../fsaverage6-lh.surf.gii', '/.../fsaverage6-rh.surf.gii')
>>> atlas = images.check_atlas(atlas, geometry=surf, space='fsaverage6')

The same procedure can be used for an atlas using fsLR geometry:

>>> from abagen import images
>>> atlas = ('/.../fslr32k-lh.label.gii', '/.../fslr32k-rh.label.gii')
>>> surf = ('/.../fslr32k-lh.surf.gii', '/.../fslr32k-rh.surf.gii')
>>> atlas = images.check_atlas(atlas, geometry=surf, space='fslr')
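Because atlas and geometry files are passed as (left, right) tuples, it is easy to repeat one hemisphere by accident. The following is a small sanity check you could run first. It assumes your filenames contain an lh or rh token, which is a naming assumption rather than an abagen requirement; both helpers and the example paths are hypothetical.

```python
import re


def hemisphere_of(path):
    """Guess the hemisphere ('lh' or 'rh') from a filename, else None."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    match = re.search(r"(?<![a-z])(lh|rh)(?![a-z])", name)
    return match.group(1) if match else None


def is_left_right_pair(paths):
    """Return True if `paths` has two entries ordered as (left, right)."""
    return len(paths) == 2 and [hemisphere_of(p) for p in paths] == ["lh", "rh"]


# Placeholder paths; only the filenames matter for this check
good = ("surfaces/fsaverage6-lh.surf.gii", "surfaces/fsaverage6-rh.surf.gii")
bad = ("surfaces/fsaverage6-lh.surf.gii", "surfaces/fsaverage6-lh.surf.gii")
print(is_left_right_pair(good), is_left_right_pair(bad))  # True False
```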