ABCD_data_management

A manual for accessing and working with ABCD data

Welcome to the Division of Psychiatry ABCD Data User Manual

Compiled by Niamh MacSweeney (niamh.macsweeney@ed.ac.uk). If you spot any errors or have information to add, please get in touch!

Background

The purpose of this manual is to provide you with the information needed to access and work with the ABCD data. You should only access this data if you have been named on the most recent Data User Certificate (DUC) approved by the NDA. If you have any queries about the DUC, please contact niamh.macsweeney@ed.ac.uk or heather.whalley@ed.ac.uk. The most recent DUC is dated November 2021.

All named recipients on the DUC need to create an NDA account with the University of Edinburgh listed as their institution. You can create an NDA account at this link.

Getting Started

Before you start planning your own ABCD study, you should check what research using the ABCD data has been published to date. We recommend checking the following databases/lists to get an idea of what has been done with the data so far (note that these resources are not comprehensive and may overlap, but they are a good starting point):

  1. Existing studies of ABCD data found on the ABCD Study website under the Publications tab
  2. The NDA “Data from Papers” page also has a list of studies that use ABCD data (please see the Acknowledgements section for further information on creating an NDA study ID; you must do this before submitting any ABCD paper for publication).
  3. Preregistered studies that plan to use ABCD data can be found on platforms like the Open Science Framework

Once you have an idea of existing ABCD research, it is recommended that you consult the following resources before working with the ABCD Data:

  1. ABCD Study Protocols: https://abcdstudy.org/scientists/protocols/

    The ABCD Study has excellent documentation of assessment protocols for every wave. At a high level, the data collection protocols outline the following: an annual comprehensive battery of physical health, mental health, substance use, culture and environment, and biospecimens; a biennial (every 24 months) MRI scan; and intermediate mid-year phone assessments of youth behavior, substance use, and affect.

    You are also encouraged to consult existing literature that uses ABCD data to see if there are any published guidelines or protocols for your variables of interest. For example, guidelines have been published to date for working with imaging data (e.g., Hagler et al., 2019; Chaarani et al., 2021), cognition (Luciana et al., 2018), puberty measures (Cheng et al., 2021; Herting & Uban, et al., 2021), assessment of culture and environment (Gonzales et al., 2021) and substance use behavior (Lisdahl et al., 2018).

  2. This paper: “A practical guide for researchers and reviewers using the ABCD Study and other large longitudinal datasets”

    This paper is intended as a guide for researchers and reviewers working with ABCD data, highlighting the features of the data (and the strengths and limitations therein) as well as relevant analytical and methodological considerations. It arose out of discussions from the Modelling Developmental Change ABCD Workshop held in July 2021. You are strongly encouraged to consult this document when designing your ABCD project, as it provides an excellent (and fairly comprehensive) framework to ease you into working with large, multi-modal datasets like ABCD. It also goes into greater detail about the resources listed below (e.g., ABCD workshop, data dictionary, DEAP etc.).

  3. Modelling Developmental Change ABCD Workshop Website: https://abcdworkshop.github.io

    There are SO many amazing resources on this website including lectures, videos, hands-on tutorials and code. Niamh MacSweeney attended the 2021 workshop so if you have any questions, please reach out to her.

  4. ABCD Data Dictionary: Found at this link

    The data dictionary is a good resource for checking variable names within specific measures. See the “Practical guide for researchers” paper for a detailed walkthrough of the data dictionary.

  5. Data Exploration and Analysis Portal (DEAP; deap.nimhda.org)

    Researchers with NDA access are granted access to DEAP, a statistical analysis platform in which they can readily engage with the ABCD data by exploring variables, downloading data, or running analyses (your DEAP login details are the same as your NDA ones). This is really handy for quickly checking what variables exist within ABCD, and is a bit easier to navigate than the data dictionary. You can also create personalised data frames with variables from different measures (e.g., age, sex, BDI) by adding the variables of interest “to your cart” and then downloading your data frame as a .Rdata file. This is explained in more detail in the “Practical guide” paper and is a lifesaver for creating covariate data frames because it saves you from extracting the variables of interest from lots of different individual files (i.e., .Rds or .txt files).

  6. ABCD ReproNim course: https://www.abcd-repronim.org

    This is a fairly lengthy course but is broken up into specific topics and focuses on reproducible analysis methods. It is a good way to familiarise yourself with your research area through an “ABCD lens” and also discusses the design and development of the baseline data collection.

How do I access the data?

ABCD data is managed by Niamh MacSweeney (niamh.macsweeney@ed.ac.uk) and Gladi Thng (j.g.thng@sms.ed.ac.uk). The earlier releases of the ABCD data were managed by X. Shen, and a huge thanks goes to Shen for helping with the data management of the current release!

The ABCD Study team have an annual curated data release around November each year. The most recent data release (4.0) was made available in early November 2021. The data managers downloaded this curated data release and converted the .txt files into .Rds files. All of the ABCD files are stored in .Rds format, but if you would like the files in .txt format, please get in touch with the data managers.

The ABCD data is stored on the GenScotDepression DataStore (not on the Eddie GenScotDepression). DataStore is a University computer storage cluster and GenScotDepression is our designated storage space on it. GenScotDepression is managed by Dr Mark Adams (Research Fellow in the Division of Psychiatry). If you do not already have access to the GenScotDepression DataStore, please get in touch with Mark (mark.adams@ed.ac.uk) and kindly ask him to grant you access to this shared storage. We recommend creating your own personal folder within the users folder, e.g., users/your_name, and keeping all your files associated with ABCD there.

Please note that the copy of the data on DataStore is read-only to avoid any accidental changes being made to the data. You should be able to read the data into your analysis software of choice directly from this location.

If you are having issues with conflicting working directories in RMarkdown, please get in touch with Niamh. Your Rmd file and data need to be in the same working directory, which isn’t possible given how the ABCD data is stored on DataStore, but there is an easy workaround for this!

File path for ABCD Data and associated documents

On DataStore, the ABCD data is stored at the following folder path: /GenScotDepression/data/abcd/release4.0/iii.data

We have categorised the data into different folders such as Mental_Health, Physical_Health, MRI_T_roi, MRI_QC etc.

In order to understand the data contained within each folder, you need to:

  1. Consult the ABCD Master Data dictionary file (Master_abcd4_wcat_data_dictionary.csv), which is located in the folder ../data/abcd/release4.0/iii.data. This data dictionary details every variable contained within the data release. You can filter by category (e.g., Mental_Health) to make navigating the spreadsheet easier.

    As a starting point, you should get the “short name” of your measure of interest from the ABCD Data Dictionary because the .Rds files in the ../iii.data folder correspond to the “short name” measures listed in the data dictionary. For example, the “short name” for the ABCD Parent Child Behavior Checklist Raw Scores Aseba (CBCL) is “abcd_cbcl01” in the ABCD data dictionary and the associated .Rds file is “abcd_cbcl01.rds”, which can be found in the “../iii.data/Mental_Health” folder.

  2. You also NEED to consult the ABCD release 4.0 notes, which can be found at this folder path: /Volumes/GenScotDepression/data/abcd/release4.0/release_notes in order to understand any changes that have been made to your measures of interest since the last data release.
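The lookup described in step 1 (filter the master data dictionary by category, then turn a “short name” into a file location) can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the department's workflow: the column names `category` and `short_name` are assumptions and should be checked against the real header row of Master_abcd4_wcat_data_dictionary.csv, and the second sample row is a made-up placeholder rather than a real ABCD measure.

```python
import csv
import io
from pathlib import PurePosixPath

# Folder path quoted in this manual for the release 4.0 data.
DATA_ROOT = PurePosixPath("/GenScotDepression/data/abcd/release4.0/iii.data")

# Tiny stand-in for Master_abcd4_wcat_data_dictionary.csv.
# ASSUMPTION: the column names "category" and "short_name" are hypothetical.
# The second row is a placeholder, not a real ABCD measure.
SAMPLE_DICTIONARY = """category,short_name
Mental_Health,abcd_cbcl01
Physical_Health,placeholder_measure01
"""


def short_names_in_category(csv_file, category):
    """Return the unique short names listed under one category, in file order."""
    seen = []
    for row in csv.DictReader(csv_file):
        if row["category"] == category and row["short_name"] not in seen:
            seen.append(row["short_name"])
    return seen


def rds_path(category, short_name):
    """Build the expected location of a measure's .rds file."""
    return DATA_ROOT / category / f"{short_name}.rds"


mental_health = short_names_in_category(io.StringIO(SAMPLE_DICTIONARY), "Mental_Health")
cbcl_path = rds_path("Mental_Health", "abcd_cbcl01")
print(mental_health)
print(cbcl_path)
```

To run this against the real dictionary, replace the `io.StringIO(...)` argument with an open file handle, e.g. `open(path_to_dictionary, newline="")`, after adjusting the column names.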

Working with ABCD data

The ABCD Study team strongly encourage transparent and reproducible research methods. Software like RMarkdown, which allows you to weave together narrative text and code, can be very useful in making your workflow reproducible because you can document each decision made in the data processing and quality control steps.

Folder Structure for GitHub Repositories

It is important that we keep a consistent folder organisation structure when working with the ABCD Data. Please adopt the following system and make a detailed ReadMe file for every directory (and maybe even subdirectory depending on the project).

Folder Set Up

There are three main folders:

  1. PREP: Scripts for data cleaning and manipulation
  2. ANALY: Scripts for analysis
  3. FUNCTIONS: Functions for use in scripts
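If you would like to set these folders up programmatically, the sketch below creates the three folders and drops a starter ReadMe into each. It is only one possible approach: the `README.md` file name and the `scaffold_project` helper are illustrative assumptions, and the demonstration runs in a temporary directory so nothing is written to your repository until you point it there.

```python
import tempfile
from pathlib import Path

# The three main folders and their purposes, as listed above.
MAIN_FOLDERS = {
    "PREP": "Scripts for data cleaning and manipulation",
    "ANALY": "Scripts for analysis",
    "FUNCTIONS": "Functions for use in scripts",
}


def scaffold_project(root):
    """Create the three main folders under root, each with a starter ReadMe."""
    root = Path(root)
    for name, purpose in MAIN_FOLDERS.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.md"  # ASSUMPTION: file name is illustrative
        if not readme.exists():  # never overwrite an existing ReadMe
            readme.write_text(f"# {name}\n\n{purpose}\n", encoding="utf-8")
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Demonstrate in a throwaway directory; pass your repository path instead.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_project(tmp)
    prep_readme = (Path(tmp) / "PREP" / "README.md").read_text(encoding="utf-8")
print(created)
```

The starter ReadMe only records the folder's purpose; you still need to fill in the project-specific detail by hand.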

File naming system: TYPE_variable_datarelease.filetype, e.g., PREP_pds_R3.0.csv is the PREP file for the pubertal development scale (pds), in .csv format, from data release 3.0.
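A small helper can keep file names consistent with this convention. The sketch below builds and parses names of the form TYPE_variable_datarelease.filetype. It assumes that TYPE is one of the three folder names and that the release is written as “R” followed by the release number (as in the example above). Neither rule is stated explicitly in this manual, and the function names are hypothetical.

```python
import re

# ASSUMPTION: the TYPE prefix matches one of the three main folder names.
VALID_TYPES = ("PREP", "ANALY", "FUNCTIONS")

# TYPE_variable_datarelease.filetype, e.g. PREP_pds_R3.0.csv
NAME_PATTERN = re.compile(
    r"^(?P<type>[A-Z]+)_(?P<variable>.+)_R(?P<release>\d+\.\d+)"
    r"\.(?P<filetype>[A-Za-z0-9]+)$"
)


def build_name(file_type, variable, release, filetype):
    """Assemble a file name that follows the naming convention."""
    if file_type not in VALID_TYPES:
        raise ValueError(f"TYPE must be one of {VALID_TYPES}, got {file_type!r}")
    return f"{file_type}_{variable}_R{release}.{filetype}"


def parse_name(name):
    """Split a file name into its parts, or raise ValueError if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None or match.group("type") not in VALID_TYPES:
        raise ValueError(f"{name!r} does not follow TYPE_variable_datarelease.filetype")
    return match.groupdict()


example_name = build_name("PREP", "pds", "3.0", "csv")
example_parts = parse_name("PREP_pds_R3.0.csv")
print(example_name)
print(example_parts)
```

Running `parse_name` over the files in a folder is a quick way to spot names that have drifted from the convention.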

Local Data Dictionaries

Local users are expected to consult the available local data dictionaries that we have made for the department to ensure that the quality control workflows are consistent across users. We appreciate that you may need to diverge from the “standard” QC workflows depending on your research question, but if this occurs, please document any changes from the standard protocol and share them with the group (e.g., via a GitHub repo) so that other users can benefit from this knowledge.

At present, the following data dictionaries are available:

  1. Imaging Quality Control Data Dictionary: this script contains the code needed to QC the structural imaging (T1w and DTI) data from release 4.0. It was made in line with the release 4.0 notes and should be adopted by all local users.

  2. Depression Measures Data Dictionary

  3. TBC

While we work towards having a data dictionary for a number of variables commonly used by the department, we encourage you to consult existing GitHub repositories made by department members (see list below).

ABCD Research Topics and Areas of Expertise

Ideally, we would like to create a resource whereby group members can list the kinds of ABCD data they have worked with. This could then serve as a directory that people could consult when starting projects and would aid the sharing of code and expertise within the group.

Acknowledgements

When preparing your ABCD manuscript for publication, please make sure you consult and follow the information outlined on the NDA website, which can be found at this link.

Special thanks to X. Shen, Mark Adams, and Gladi Thng for their assistance with data management.