I wrote the following code in R
library(fda)
n_curves <- 15951
n_points <- 2537
argvals <- matrix(df_l$Time, nrow = n_points, ncol = n_curves)
y_mat <- matrix(df_l$Curve, nrow = n_points, ncol = n_curves)
W.obj <- Data2fd(argvals = argvals, y = y_mat, basisobj = basis, lambda = 0.5)
But I'm getting an error
Error in if ((a01[1] <= arng[1]) && (arng[2] <= a01[2])) { :
missing value where TRUE/FALSE needed
What does it mean, and how do I prevent it?
I'm using repeated-measures data, and I'm trying to do functional data analysis. My data has a lot of missing values (NA), and I suspect the NAs are causing the error.
data:
> dput(head(df_l, 30))
structure(list(Time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30), Curve = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 5, 10, 10, 10, 10, 8, 8, 8, 8,
8, 8)), row.names = c(NA, 30L), class = "data.frame")
> dput(head(basis, 5))
list(call = basisfd(type = type, rangeval = rangeval, nbasis = nbasis,
params = params, dropind = dropind, quadvals = quadvals,
values = values, basisvalues = basisvalues), type = "bspline",
rangeval = c(0, 2537), nbasis = 53, params = c(50.74, 101.48,
152.22, 202.96, 253.7, 304.44, 355.18, 405.92, 456.66, 507.4,
558.14, 608.88, 659.62, 710.36, 761.1, 811.84, 862.58, 913.32,
964.06, 1014.8, 1065.54, 1116.28, 1167.02, 1217.76, 1268.5,
1319.24, 1369.98, 1420.72, 1471.46, 1522.2, 1572.94, 1623.68,
1674.42, 1725.16, 1775.9, 1826.64, 1877.38, 1928.12, 1978.86,
2029.6, 2080.34, 2131.08, 2181.82, 2232.56, 2283.3, 2334.04,
2384.78, 2435.52, 2486.26))
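On what the message means: it is generic R, raised whenever the condition of an if() evaluates to NA. The sketch below reproduces it in base R with a range check shaped like the one in the error. That NA values reach such a check inside Data2fd is an assumption here, not something confirmed from the fda source, and the variable names are illustrative.

```r
# A vector with a missing value, standing in for data that contains NA
x <- c(1, 2, NA, 4)

# range() propagates NA unless told otherwise
rng <- range(x)                      # both elements are NA

# An if() whose condition is NA fails with the same kind of message
msg <- tryCatch(
  if ((0 <= rng[1]) && (rng[2] <= 5)) "inside" else "outside",
  error = function(e) conditionMessage(e)
)

# Dropping the NAs first gives a usable range
rng_ok <- range(x, na.rm = TRUE)
```

If this is the mechanism, checking anyNA() on the inputs before the call would be a cheap first diagnostic.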
I have longitudinal data for which I would like to reverse score a subset of items using corresponding predefined maximum scores that are stored in a separate data frame.
In the example data below (df) there are three scores, DST, SOS, and VR, at two timepoints (baseline and wave 1). neg_skew.vars contains the scores that are to be reversed across timepoints. I would like to reverse scores based on the maximum possible value for that score, as stored in df.CP1.vars$max.vars. I'd like this to work when multiple scores with different maximum values are included in neg_skew.vars.
For example, in the example below "SOS.Score" is stored in neg_skew.vars. Therefore I want all SOS.Score variables to be reversed (i.e., across timepoints); this would include 'SOS.Score.baseline' and 'SOS.Score.wave1' in the example data below. I want scores to be reversed using the corresponding maximum score for SOS. For each SOS variable, I want each value to be reversed like this: (20 + 1) - value. The 20 corresponds to the maximum value for SOS stored in df.CP1.vars. As DST is also negatively skewed, all DST scores (i.e., 'DST.Score.baseline' and 'DST.Score.wave1') should be reversed, but with 16 as the maximum value, per df.CP1.vars, so: (16 + 1) - value. This results in the desired data frame df_wanted below. VR.Score does not appear in neg_skew.vars and so no VR.Score variables are reversed (i.e., VR.Score.baseline and VR.Score.wave1).
So far I have the code listed below under # reverse scores; however, this produces two undesired outcomes in the resulting data frame (i.e., df2): A) the columns for scores that are not reversed, such as VR, are not retained, and B) the maximum used to reverse each column is the observed maximum of that column (i.e., for that item at that timepoint) rather than the predefined maximum; this is a problem as the data are longitudinal.
The desired data should look like df_wanted. I tried to set up a for-loop but ran into problems with using the dplyr pipeline.
# required packages
library(dplyr)
# create relevant variables and data sets
CP1.vars <- c("DST.Score","SOS.Score", "VR.Score")
max.vars <- c(16,20,80)
df.CP1.vars <- data.frame(CP1.vars, max.vars)
df <- structure(list(
SOS.Score.baseline = c(4, 11, 7, 9, 10, 8, 6, 8, 7, 0, 9, 10),
SOS.Score.wave1 = c(NA, 7.5, 8.5, NA, NA, 6.66, NA, 6, 8, 8, 7, 8),
DST.Score.baseline = c(11, 10, 8, 8, 8, 8, 9, 9, 7, 6, 7, 6),
DST.Score.wave1 = c(NA, 10, 8.5, NA, NA, 8, NA, 9.33, 9, 7, 8, 8),
VR.Score.baseline = c(NA, 60, 38.5, 50, NA, 48, NA, 33, 49, 67, 78, 80),
VR.Score.wave1 = c(NA, 58, 38.5, NA, NA, 40, NA, 35, 49, 67, 78, 78)),
row.names = c(NA, 12L), class = "data.frame")
neg_skew.vars <- c("SOS.Score", "DST.Score")
# reverse scores
df2 <- df %>%
select(contains(neg_skew.vars)) %>%
mutate(across(everything(), ~ max(., na.rm = TRUE) + 1 - . , .names = "{.col}_r"))
# desired outcome (order of variables irrelevant)
df_wanted <- structure(list(
SOS.Score.baseline = c(4, 11, 7, 9, 10, 8, 6, 8, 7, 0, 9, 10),
SOS.Score.wave1 = c(NA, 7.5, 8.5, NA, NA, 6.66, NA, 6, 8, 8, 7, 8),
SOS.Score.baseline_r = c(17, 10, 14, 12, 11, 13, 15, 13, 14, 21, 12, 11),
SOS.Score.wave1_r = c(NA, 13.5, 12.5, NA, NA, 14.34, NA, 15, 13, 13, 14, 13),
DST.Score.baseline = c(11, 10, 8, 8, 8, 8, 9, 9, 7, 6, 7, 6),
DST.Score.wave1 = c(NA, 10, 8.5, NA, NA, 8, NA, 9.33, 9, 7, 8, 8),
DST.Score.baseline_r = c(6, 7, 9, 9, 9, 9, 8, 8, 10, 11, 10, 11),
DST.Score.wave1_r = c(NA, 7, 8.5, NA, NA, 9, NA, 7.67, 8, 10, 9, 9),
VR.Score.baseline = c(NA, 60, 38.5, 50, NA, 48, NA, 33, 49, 67, 78, 80),
VR.Score.wave1 = c(NA, 58, 38.5, NA, NA, 40, NA, 35, 49, 67, 78, 78)),
row.names = c(NA,12L), class = "data.frame")
You can use purrr::map_dfc to loop over the neg_skew.vars and get the value directly from df.CP1.vars, and then bind the resulting dataframe with columns that remained unchanged.
library(tidyverse)
library(purrr)
df2 <- neg_skew.vars %>%
map_dfc(function(a) df %>%
select(matches(a)) %>%
mutate(across(everything(), ~ df.CP1.vars$max.vars[df.CP1.vars$CP1.vars == a] + 1 - .,
.names = "{.col}_r"))) %>%
bind_cols(df %>%
select(!contains(neg_skew.vars)))
This indeed leads to the desired outcome:
identical(df2, df_wanted)
#[1] TRUE
Data:
# create relevant variables and data sets
CP1.vars <- c("DST.Score","SOS.Score", "VR.Score")
max.vars <- c(16,20,80)
df.CP1.vars <- data.frame(CP1.vars, max.vars)
df <- structure(list(
SOS.Score.baseline = c(4, 11, 7, 9, 10, 8, 6, 8, 7, 0, 9, 10),
SOS.Score.wave1 = c(NA, 7.5, 8.5, NA, NA, 6.66, NA, 6, 8, 8, 7, 8),
DST.Score.baseline = c(11, 10, 8, 8, 8, 8, 9, 9, 7, 6, 7, 6),
DST.Score.wave1 = c(NA, 10, 8.5, NA, NA, 8, NA, 9.33, 9, 7, 8, 8),
VR.Score.baseline = c(NA, 60, 38.5, 50, NA, 48, NA, 33, 49, 67, 78, 80),
VR.Score.wave1 = c(NA, 58, 38.5, NA, NA, 40, NA, 35, 49, 67, 78, 78)),
row.names = c(NA, 12L), class = "data.frame")
neg_skew.vars <- c("SOS.Score", "DST.Score")
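As an alternative to the purrr approach, here is a minimal base-R sketch of the same idea: turn df.CP1.vars into a named lookup vector and loop over neg_skew.vars. The data below is a trimmed, illustrative copy of the example (first two rows, four columns), max_lookup is a name introduced here, and matching columns by the prefix "<score>." is an assumption about the naming scheme.

```r
# Trimmed, illustrative copy of the example data
df <- data.frame(
  SOS.Score.baseline = c(4, 11),
  SOS.Score.wave1    = c(NA, 7.5),
  DST.Score.baseline = c(11, 10),
  VR.Score.baseline  = c(NA, 60)
)
df.CP1.vars <- data.frame(CP1.vars = c("DST.Score", "SOS.Score", "VR.Score"),
                          max.vars = c(16, 20, 80))
neg_skew.vars <- c("SOS.Score", "DST.Score")

# Named lookup: score name -> predefined maximum
max_lookup <- setNames(df.CP1.vars$max.vars, df.CP1.vars$CP1.vars)

df2 <- df
for (v in neg_skew.vars) {
  # All timepoint columns of this score, e.g. "SOS.Score.baseline"
  cols <- names(df)[startsWith(names(df), paste0(v, "."))]
  # Reverse with the predefined maximum and store under a "_r" suffix
  df2[paste0(cols, "_r")] <- max_lookup[[v]] + 1 - df[cols]
}
```

All original columns are kept, and the reversed columns are appended at the end, which should be fine given that the order of variables is irrelevant.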
I recently started exploring DT and I am stuck on something. Imagine the following table:
dt <- data.table(group = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
group2 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
interval = c(NA, NA, 100, NA, NA, 150, NA, NA, 100),
value1 = c(1000, 10, 90, 2000, 30, 120, 1500, 25, 150),
value2 = c(1200, 10, 110, 2500, 35, 145, 2200, 40, 90))
Now I want to create a datatable with a style that checks the values in value1 and value2 and compares them with the value in interval. I tried something like this:
datatable(dt) %>% formatStyle(
columns = c("value1", "value2"),
backgroundColor = styleInterval(interval, c("red", "green"))
)
But interval is not recognized as an object. This leads me to believe that I cannot pass a column as the cuts argument. I also tried to pass some kind of function via valueColumns, but this didn't seem to be possible either.
Expected output:
It makes no sense to pass the full column to styleInterval. It requires n values for cuts and n+1 values for values. Try the alternative below instead:
myCut <- sort(unique(dt$interval))
myCol <- rainbow(length(myCut) + 1)
formatStyle(datatable(dt),
columns = c("value1", "value2"),
backgroundColor = styleInterval(myCut, myCol))
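To make the "n cuts, n+1 values" relationship concrete, here is a small base-R sketch of the mapping. It assumes intervals are closed on the right (a value equal to a cut falls in the lower bin), which is my understanding of how styleInterval behaves. colour_for and the colour names are illustrative, not part of DT.

```r
# Two cut points need three colours (names are illustrative)
my_cut <- c(100, 150)
my_col <- c("red", "orange", "green")

# Assumed rule: a value gets colour i when it is above the previous cut
# and less than or equal to cut i; values above the last cut get the last colour
colour_for <- function(x, cuts, values) {
  stopifnot(length(values) == length(cuts) + 1)
  values[findInterval(x, cuts, left.open = TRUE) + 1]
}

cols <- colour_for(c(10, 100, 120, 150, 1000), my_cut, my_col)
```

This only illustrates the binning. It does not compare each row against its own interval value, which would need a different approach.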
To continue on a previous topic:
Finding non-missing values between missing values
I would like to also find whether the value before the missing value is smaller, equal to or larger than the one after the missing.
To use the same example from before:
df = structure(list(FirstYStage = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2,
3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1), SecondYStage = c(NA,
3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA,
2, 3.1, 1), ThirdYStage = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA,
1, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1), FourthYStage = c(NA,
NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4, 2, NA, NA, NA,
2, 1), FifthYStage = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA,
NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -20L))
Rows 13, 14 and 16 have non-missing values in between missing values. The output this time should be: "same", "larger" and "same" for rows 13, 14, and 16, and "N/A" for the other rows.
A straightforward approach would be to split, convert to numeric, take the last 2 values and compare them with an ifelse statement, i.e.
sapply(strsplit(do.call(paste, df)[c(13, 14, 16)], 'NA| '), function(i){
v1 <- as.numeric(tail(i[i != ''], 2));
ifelse(v1[1] > v1[2], 'greater',
ifelse(v1[1] == v1[2], 'same', 'smaller'))
})
#[1] "same" "smaller" "same"
NOTE
I took the previous answer as a given (do.call(paste, df)[c(13, 14, 16)]).
A more generic approach (as noted by Ronak, taking the last 2 values will fail in some cases) would be,
sapply(strsplit(gsub("([[:digit:]])+\\s+[NA]+\\s+([[:digit:]])", '\\1_\\2',
do.call(paste, df)[c(13, 14, 16)]), ' '), function(i) {
v1 <- i[grepl('_', i)];
v2 <- as.numeric(strsplit(v1, '_')[[1]]);  # compare as numbers, not as strings
ifelse(v2[1] > v2[2], 'greater',
ifelse(v2[1] == v2[2], 'same', 'smaller')) })
#[1] "same" "smaller" "same"
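For comparison, here is a sketch that avoids string manipulation and works on the numeric rows directly. compare_across_gap is a hypothetical helper. It looks only at the first gap in each row and labels how the value after the gap compares with the value before it, which is an assumption chosen to match the labels requested in the question (so row 14, going from 3.2 to 4, is "larger"). The data is the question's df written as a plain data.frame.

```r
df <- data.frame(
  FirstYStage  = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2, 3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1),
  SecondYStage = c(NA, 3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA, 2, 3.1, 1),
  ThirdYStage  = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA, 1, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1),
  FourthYStage = c(NA, NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4, 2, NA, NA, NA, 2, 1),
  FifthYStage  = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1)
)

compare_across_gap <- function(x) {
  ok <- which(!is.na(x))            # positions of non-missing values
  gaps <- which(diff(ok) > 1)       # neighbours in `ok` separated by at least one NA
  if (length(gaps) == 0) return("N/A")
  before <- x[ok[gaps[1]]]          # last value before the first gap
  after  <- x[ok[gaps[1] + 1]]      # first value after it
  if (after > before) "larger" else if (after == before) "same" else "smaller"
}

res <- apply(df, 1, compare_across_gap)
```

Because the comparison is done on numbers, values such as 10 and 9 are ordered correctly, and rows with several consecutive NAs in the gap are handled as well.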
I want to "stratify-then-impute" using the packages available in R.
That is, I am hoping to:
1) stratify my dataset using a binary variable called "arm". This variable has no missing data.
2) run an imputation model for the two subsets
3) combine the two imputed data sets
4) run a pooled analysis.
My dataset looks like:
dataSim <- structure(list(pid = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20), arm = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), X1 = c(0.1, NA, 0.51,
0.56, -0.82, NA, NA, NA, -0.32, 0.4, 0.58, NA, 0.22, -0.23, 1.49,
-1.88, -1.77, -0.94, NA, -1.34), X2 = c(NA, -0.13, NA, 1.2, NA,
NA, NA, 0.02, -0.04, NA, NA, 0.25, -0.81, -1.67, 1.01, 1.69,
-0.06, 0.07, NA, -0.11)), .Names = c("pid", "arm", "X1", "X2"
), row.names = c(NA, 20L), class = "data.frame")
To impute the data, I'm currently using the mi() function as follows:
library(mi)
data.1 <- dataSim[dataSim[,"arm"]==1,]
data.0 <- dataSim[dataSim[,"arm"]==0,]
data.miss.1 <- missing_data.frame(data.1)
data.miss.0 <- missing_data.frame(data.0)
imputations.1 <- mi(data.1, n.iter=5, n.chains=5, max.minutes=20, parallel=FALSE)
imputations.0 <- mi(data.0, n.iter=5, n.chains=5, max.minutes=20, parallel=FALSE)
complete(imputations.1) # viewing the imputed datasets
complete(imputations.0)
Then I don't know how to combine the 2 imputations in order to do a pooled analysis. I have unsuccessfully tried:
imputations <- rbind(imputations.0, imputations.1) # This doesn't work
# analysis.X1 <- pool(X1 ~ arm, data = imputations ) # This is what I want to run
I assume this method is a simplified version of including an interaction term when imputing, but I don't know how this is possible either.
Thanks
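One way to think about steps 3 and 4, sketched in base R only: stack the i-th completed data set from each stratum, fit the analysis model to each stacked data set, and combine the fits with Rubin's rules. The completed data sets below are random stand-ins produced by a hypothetical helper (make_completed), not output from mi. How to extract lists of completed data frames from the two mi objects is left as an assumption.

```r
set.seed(1)

# Hypothetical stand-in: m completed data sets for one stratum
make_completed <- function(arm, m = 3, n = 10) {
  lapply(seq_len(m), function(i) {
    data.frame(arm = arm, X1 = rnorm(n), X2 = rnorm(n))
  })
}
completed.0 <- make_completed(0)
completed.1 <- make_completed(1)

# Step 3: stack the i-th completed data set of each stratum
stacked <- Map(rbind, completed.0, completed.1)

# Step 4: fit the analysis model to every stacked data set
fits <- lapply(stacked, function(d) lm(X1 ~ arm, data = d))
est  <- sapply(fits, function(f) coef(f)[["arm"]])
u    <- sapply(fits, function(f) vcov(f)["arm", "arm"])

# Rubin's rules: pooled estimate, within and between variance, total variance
m         <- length(fits)
q_bar     <- mean(est)
w_bar     <- mean(u)
b         <- var(est)
total_var <- w_bar + (1 + 1 / m) * b
pooled    <- c(estimate = q_bar, se = sqrt(total_var))
```

Pairing the i-th imputation of one stratum with the i-th of the other is arbitrary, since the two imputation models are run independently, so any fixed pairing should serve.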