agcounts R package

R Packages
Author

Brian Helsel

Published

June 14, 2022

Introduction

Neishabouri et al. released a preprint of “Quantification of Acceleration as Activity Counts in ActiGraph Wearables” on February 24, 2022 and the Python code on GitHub. Like many others, I thought this package had the potential to be useful when analyzing accelerometer data. It not only makes public the proprietary algorithm that ActiGraph uses to generate counts, but also allows the easy conversion of raw data to counts from any accelerometer file.

I started using Python again when I began my postdoc at The University of Kansas Medical Center (KUMC). SAS and Stata were the languages I was taught during my PhD at Clemson University, but I lost access to the software and was looking for something that was open-source and free. I had colleagues in Bioinformatics who used Python at Clemson, and I spent about a year learning some Python basics. After advancing my knowledge of Python during my postdoc at KUMC, I decided to start learning R since I knew it was common among academics. The R programming language was very intuitive after using Python and I was able to pick it up quickly. I also discovered the power of creating functions and packages. Most of the functions and packages I’ve created haven’t been shared. However, I thought translating the agcounts package from Python to R would be a fun and useful project for me to learn more about package development and further advance my knowledge in both programming languages.

Dr. Kimberly Clevenger recently wrote a blog post comparing the initial release of the agcounts R package to counts generated by ActiLife. She found that the files were similar, with differences mostly due to rounding, but suggested more testing with free-living data. She also modified the code to work with GT3X, CSV, and binary files and added her modification to the post for users to see. The second part of Dr. Clevenger’s post inspired one of the recent changes to the agcounts R package. We added the calculate_counts function to create a more flexible programming experience for users who may be working with different file types.

Purpose

The purpose of this post is to provide an overview of the calculate_counts and get_counts functions. These are the two exported functions within the agcounts package.

Install agcounts v0.3.0

Before we get started, make sure you have the latest version of the agcounts package installed.

devtools::install_github("bhelsel/agcounts")

We are creating package releases on GitHub, so you can always re-install the first version of the package if it works better for you.

devtools::install_github("bhelsel/agcounts", ref = "v0.1.0")

Load the agcounts package

library(agcounts)

Let’s also store the path to the example data set that comes with the agcounts package. You can access it by typing the following into your R script.

pathname <- system.file("extdata/example.gt3x", package = "agcounts")

This only stores the path in the variable pathname in your Global Environment; it does not read the data. Now we can either pass pathname to the get_counts function, or read the data set into R first and then pass the raw data to the calculate_counts function.

get_counts

data <- get_counts(path = pathname, epoch = 60, write.file = FALSE, return.data = TRUE)
Python module "pygt3x" is not found. Switching parser to GGIR.

Loading chunk: 1

 There is not enough data to perform the GGIR autocalibration method. Returning data as read by read.gt3x.
head(data, 5)
                 time Axis1 Axis2 Axis3 Vector.Magnitude
1 2023-06-13 08:34:00  2606  3116  3542             5389
2 2023-06-13 08:35:00  1738  3943  2840             5161
3 2023-06-13 08:36:00  2172  3363  2646             4799
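
The Vector.Magnitude values in the output above are consistent with the rounded Euclidean norm of the three axis counts. The short check below reproduces them from the printed Axis1, Axis2, and Axis3 values. It is only a sketch under that assumption and is not taken from the package source.

```r
# Counts copied from the head(data, 5) output above
counts <- data.frame(
  Axis1 = c(2606, 1738, 2172),
  Axis2 = c(3116, 3943, 3363),
  Axis3 = c(3542, 2840, 2646),
  Vector.Magnitude = c(5389, 5161, 4799)
)

# Assumption: vector magnitude is the Euclidean norm of the three axes,
# rounded to the nearest whole count
vm <- round(sqrt(counts$Axis1^2 + counts$Axis2^2 + counts$Axis3^2))

# Compare the recomputed values with the printed column
all(vm == counts$Vector.Magnitude)
```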

You will notice that the write.file argument is set to FALSE and return.data is set to TRUE. These are the default values in agcounts v0.3.0, so the program doesn’t write anything to your computer unless you change write.file to TRUE. You will also notice that the frequency argument is missing. We added automated frequency detection, so it is one less argument that the user needs to supply to the get_counts function. This is done through the .get_frequency internal function, which can be viewed by typing View(agcounts:::.get_frequency). The one requirement for .get_frequency to autodetect the sample rate is that the data set has a time variable. This is not a problem with the get_counts function because we use read.gt3x::read.gt3x inside of get_counts and that provides the formatting needed. If you are using calculate_counts, make sure that the variable names are set to time, X, Y, and Z.
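
To give a feel for why a time variable is enough to recover the sample rate, here is a minimal sketch in base R. The estimate_frequency helper and the synthetic raw_example data are hypothetical and are not the package’s .get_frequency implementation; the sketch simply takes the reciprocal of the median gap between consecutive timestamps.

```r
# Synthetic stand-in for raw accelerometer data: 2 seconds sampled at 100 Hz
start <- as.POSIXct("2023-06-13 08:34:00", tz = "UTC")
raw_example <- data.frame(
  time = start + seq(0, by = 1 / 100, length.out = 200),
  X = 0,
  Y = 0,
  Z = 1
)

# Hypothetical helper (not the package's .get_frequency): estimate the
# sample rate as the reciprocal of the median gap between timestamps
estimate_frequency <- function(time) {
  gaps <- diff(as.numeric(time))
  round(1 / median(gaps))
}

estimate_frequency(raw_example$time)
```

This approach assumes the timestamps keep their fractional seconds. If a file stores times truncated to whole seconds, counting the number of rows per second would be a more suitable estimate.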

calculate_counts

calculate_counts is called within the get_counts function. However, if you want to call calculate_counts directly, you first need to import the data. We can import the sample data into R using the read.gt3x::read.gt3x function, but data can also be imported into R from a CSV, binary, or other file as in Dr. Clevenger’s blog post.

library(read.gt3x)
raw <- read.gt3x(path = pathname, asDataFrame = TRUE, imputeZeroes = TRUE)
head(raw, 5)
Sampling Rate: 100Hz
Firmware Version: 1.7.2
Serial Number Prefix: TAS
                 time      X     Y     Z
1 2023-06-13 08:34:00 -0.020 0.008 1.031
2 2023-06-13 08:34:00  0.000 0.004 1.035
3 2023-06-13 08:34:00  0.008 0.004 1.031
4 2023-06-13 08:34:00  0.008 0.004 1.035
5 2023-06-13 08:34:00  0.008 0.004 1.031

As you can see, the read.gt3x function already returns the data in the format that we need. If your variable names differ, change them before using calculate_counts. This can be done within a piping workflow using the stats::setNames function. The columns can be in any order since calculate_counts refers to the columns by name, but you can use dplyr::relocate if you would like to change the order of the columns. Finally, we need start_time and stop_time attributes on our data frame. You can set the attributes on separate lines of code using attr(raw, "start_time") <- raw$time[1] and attr(raw, "stop_time") <- round(raw$time[nrow(raw)]). If you prefer the piping workflow, this can also be done using the code below.

library(dplyr)

raw %>%
  stats::setNames(c("time", "X", "Y", "Z")) %>%
  relocate(c("time", "X", "Y", "Z")) %>%
  `attr<-`(., "start_time", .$time[1]) %>%
  `attr<-`(., "stop_time", round(.$time[nrow(.)])) %>%
  calculate_counts(epoch = 60)
                 time Axis1 Axis2 Axis3 Vector.Magnitude
1 2023-06-13 08:34:00  2606  3116  3542             5389
2 2023-06-13 08:35:00  1738  3943  2840             5161
3 2023-06-13 08:36:00  2172  3363  2646             4799
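
If you would rather not use pipes, the same preparation can be sketched in base R. The csv_data data frame and its original column names (Timestamp, Accel_X, Accel_Y, Accel_Z) are made up here to stand in for a file imported from CSV. The call to calculate_counts is left as a comment because this synthetic data is far shorter than one 60-second epoch.

```r
# Hypothetical CSV-style import: the column names differ from the ones
# calculate_counts expects (these names are invented for illustration)
start <- as.POSIXct("2023-06-13 08:34:00", tz = "UTC")
csv_data <- data.frame(
  Timestamp = start + seq(0, by = 1 / 100, length.out = 200),
  Accel_X = 0,
  Accel_Y = 0,
  Accel_Z = 1
)

# Rename the columns to time, X, Y, and Z
names(csv_data) <- c("time", "X", "Y", "Z")

# Attach the start and stop times as attributes, as described above
attr(csv_data, "start_time") <- csv_data$time[1]
attr(csv_data, "stop_time") <- round(csv_data$time[nrow(csv_data)])

# Inspect the two attributes that were just added
str(attributes(csv_data)[c("start_time", "stop_time")])

# With agcounts installed and a full recording, the next step would be:
# counts <- agcounts::calculate_counts(csv_data, epoch = 60)
```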

Conclusion

It’s a fairly simple package, but I hope that you will find it useful within your workflow. If you have any problems, feel free to comment below or add an issue on the agcounts GitHub page.