9. Generating reporting methods

There are many options and parameters to choose from when processing data with abagen.get_expression_data(). While we’ve attempted to select reasonable defaults, we don’t want to limit your options: you are free to pick and choose any combination of inputs to process the AHBA data! That said, we also wanted to make it easy for you to report exactly what was done to the data based on the parameters you chose. To that end, we have added the return_report parameter to abagen.get_expression_data().

When return_report=True, the workflow returns an extra output: in addition to the default regional microarray expression dataframe, it returns a string that describes, in detail, all the processing applied to the AHBA data while generating the expression matrix. We have tried to write this text so that you can simply copy and paste it into the methods section of a paper, though you are of course free to edit it as you see fit. If you feel edits are necessary, please let us know so that we can update the report generation more permanently!

9.1. Example report

A report can be generated with:

>>> import abagen
>>> atlas = abagen.fetch_desikan_killiany()
>>> expression, report = abagen.get_expression_data(atlas['image'], atlas['info'],
...                                                 return_report=True)

Alternatively, you can use the abagen.reporting module to generate a report directly without having to re-run the entire pipeline. (Note that the Report class accepts (nearly) all the same parameters as the get_expression_data() workflow.)

>>> from abagen import reporting
>>> generator = reporting.Report(atlas['image'], atlas['info'])
>>> report = generator.gen_report()
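
Because the report is returned as an ordinary Python string, it can be written to disk and pasted into a manuscript later. The following is a minimal sketch, not part of abagen itself: the `report` value is a short placeholder standing in for a real report, and the file name `abagen_methods.txt` is hypothetical.

```python
import tempfile
from pathlib import Path

# Placeholder standing in for the string returned when return_report=True
# (assumption: a real report is a plain `str`, as described above).
report = "Regional microarray expression data were obtained from ..."

# Write the text to a file; a temporary directory keeps this sketch
# self-contained, but any writable location would work.
out_path = Path(tempfile.mkdtemp()) / "abagen_methods.txt"
out_path.write_text(report, encoding="utf-8")

print(f"Report saved to {out_path}")
```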

The returned report (with default parameters) will look something like this example:

Regional microarray expression data were obtained from 6 post-mortem brains (1 female, ages 24.0–57.0, 42.50 +/- 13.38) provided by the Allen Human Brain Atlas (AHBA, https://human.brain-map.org; [H2012N]). Data were processed with the abagen toolbox (version X.Y; https://github.com/rmarkello/abagen) using an 83-region volumetric atlas in MNI space.

First, microarray probes were reannotated using data provided by [A2019N]; probes not matched to a valid Entrez ID were discarded. Next, probes were filtered based on their expression intensity relative to background noise [Q2002N], such that probes with intensity less than the background in >=50.00% of samples across donors were discarded. When multiple probes indexed the expression of the same gene, we selected and used the probe with the most consistent pattern of regional variation across donors (i.e., differential stability; [H2015N]), calculated with:

\(\Delta_{S}(p) = \frac{1}{\binom{N}{2}} \, \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \rho[B_{i}(p), B_{j}(p)]\)

where \(\rho\) is Spearman’s rank correlation of the expression of a single probe, p, across regions in two donor brains \(B_{i}\) and \(B_{j}\), and N is the total number of donors. Here, regions correspond to the structural designations provided in the ontology from the AHBA.

The MNI coordinates of tissue samples were updated to those generated via non-linear registration using the Advanced Normalization Tools (ANTs; https://github.com/chrisfilo/alleninf). Samples were assigned to brain regions in the provided atlas if their MNI coordinates were within 2 mm of a given parcel. To reduce the potential for misassignment, sample-to-region matching was constrained by hemisphere and gross structural divisions (i.e., cortex, subcortex/brainstem, and cerebellum, such that e.g., a sample in the left cortex could only be assigned to an atlas parcel in the left cortex; [A2019N]). All tissue samples not assigned to a brain region in the provided atlas were discarded.

Inter-subject variation was addressed by normalizing tissue sample expression values across genes using a robust sigmoid function [F2013J]:

\(x_{norm} = \frac{1}{1 + \exp(-\frac{(x-\langle x \rangle)} {\text{IQR}_{x}})}\)

where \(\langle x \rangle\) is the median and \(\text{IQR}_{x}\) is the normalized interquartile range of the expression of a single tissue sample across genes. Normalized expression values were then rescaled to the unit interval:

\(x_{scaled} = \frac{x_{norm} - \min(x_{norm})} {\max(x_{norm}) - \min(x_{norm})}\)

Gene expression values were then normalized across tissue samples using an identical procedure. Samples assigned to the same brain region were averaged separately for each donor and then across donors.

REFERENCES
----------
[A2019N]: Arnatkevic̆iūtė, A., Fulcher, B. D., & Fornito, A. (2019). A practical guide to linking brain-wide gene expression and neuroimaging data. Neuroimage, 189, 353-367.
[F2013J]: Fulcher, B. D., Little, M. A., & Jones, N. S. (2013). Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 20130048.
[H2012N]: Hawrylycz, M. J., Lein, E. S., Guillozet-Bongaarts, A. L., Shen, E. H., Ng, L., Miller, J. A., … & Jones, A. R. (2012). An anatomically comprehensive atlas of the adult human brain transcriptome. Nature, 489(7416), 391-399.
[H2015N]: Hawrylycz, M., Miller, J. A., Menon, V., Feng, D., Dolbeare, T., Guillozet-Bongaarts, A. L., … & Lein, E. (2015). Canonical genetic signatures of the adult human brain. Nature Neuroscience, 18(12), 1832.
[Q2002N]: Quackenbush, J. (2002). Microarray data normalization and transformation. Nature Genetics, 32(4), 496-501.
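
To make the formulas in the example report concrete, the differential stability, robust sigmoid, and unit-interval rescaling steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than abagen's actual implementation: all helper names are hypothetical, the quartiles use linear interpolation, and dividing the interquartile range by 1.35 to obtain the "normalized" IQR is an assumption about what that term means here.

```python
import math
import statistics
from itertools import combinations


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def differential_stability(donor_profiles):
    """Mean Spearman correlation of one probe's regional profile over all donor pairs."""
    pairs = list(combinations(donor_profiles, 2))
    # Spearman's rho is the Pearson correlation of the ranked values
    return sum(pearson(rank(a), rank(b)) for a, b in pairs) / len(pairs)


def robust_sigmoid(values):
    """Apply x_norm = 1 / (1 + exp(-(x - median) / IQR))."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) / 1.35  # assumption: "normalized" IQR means IQR / 1.35
    return [1 / (1 + math.exp(-(v - median) / iqr)) for v in values]


def rescale(values):
    """Rescale values linearly to the unit interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy data: one probe measured in four regions for each of three donors
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]
print(differential_stability(profiles))

# Toy data: expression of five genes in a single tissue sample
print(rescale(robust_sigmoid([1, 2, 3, 4, 5])))
```

The sigmoid uses the median and interquartile range rather than the mean and standard deviation, so a few extreme expression values have limited influence on how the remaining values are scaled.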

Note that because the report is a plain-text Python string, the equations it contains (e.g., those used for normalizing the expression data) are provided in LaTeX format (i.e., surrounded by $$ characters and written with TeX math commands).
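
If you need to typeset the equations separately, they can be pulled out of the report text by searching for the $$ delimiters. The sketch below is illustrative only: `extract_equations` is a hypothetical helper, the `report` string is a short placeholder rather than real abagen output, and it assumes each equation is enclosed by a `$$` marker on both sides.

```python
import re

# Placeholder text standing in for a real report (assumption: each equation
# is delimited by a pair of `$$` markers, as the note above describes).
report = (
    "Values were rescaled to the unit interval: "
    "$$x_{scaled} = \\frac{x - \\min(x)}{\\max(x) - \\min(x)}$$ "
    "and then averaged."
)


def extract_equations(text):
    """Return the LaTeX source of every $$...$$ span found in `text`."""
    matches = re.findall(r"\$\$(.+?)\$\$", text, flags=re.DOTALL)
    return [eq.strip() for eq in matches]


equations = extract_equations(report)
for eq in equations:
    print(eq)
```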

Important

Please note that we explicitly release all text in the abagen.reporting module (used to generate the above-referenced reports) under a CC0 license such that it can be used in manuscripts without modification.