Exclude a Specific Value from a Unique Value Counter - r

I am trying to count how many different responses a person gives during a trial of an experiment, but there is a catch.
There are supposed to be 6 possible responses (1,2,3,4,5,6) BUT sometimes 0 is recorded as a response (it's a glitch / flaw in design).
I need to count the number of different responses they give, BUT ONLY counting unique values within the range 1-6. This helps us calculate their accuracy.
Is there a way to exclude the value 0 from contributing to a unique value counter? Any other work-arounds?
Currently I am using the method below, but it counts 0, NA, and I think any other entry in a cell toward the unique value counter column (which I have named "Span6"), which makes me sad.
# My Span6 calculator:
ASixImageTrials <- data.frame(eSOPT_831$T8.RESP, eSOPT_831$T9.RESP, eSOPT_831$T10.RESP, eSOPT_831$T11.RESP, eSOPT_831$T12.RESP, eSOPT_831$T13.RESP)
ASixImageTrials$Span6 = apply(ASixImageTrials, 1, function(x) length(unique(x)))

Use na.omit() inside unique() to drop the NAs, then sum the logical vector from the "> 0" comparison so that 0 is not counted, as below:
df$res = apply(df, 1, function(x) sum(unique(na.omit(x)) > 0))
df
Output:
X1 X2 X3 X4 X5 res
1 2 1 1 2 1 2
2 3 0 1 1 2 3
3 3 NA 1 1 3 2
4 3 3 3 4 NA 2
5 1 1 0 NA 3 2
6 3 NA NA 1 1 2
7 2 0 2 3 0 2
8 0 2 2 2 1 2
9 3 2 3 0 NA 2
10 0 2 3 2 2 2
11 2 2 1 2 1 2
12 0 2 2 2 NA 1
13 0 1 4 3 2 4
14 2 2 1 1 NA 2
15 3 NA 2 2 NA 2
16 2 2 NA 3 NA 2
17 2 3 2 2 2 2
18 2 NA 3 2 2 2
19 NA 4 5 1 3 4
20 3 1 2 1 NA 3
Data:
set.seed(752)
mat <- matrix(rbinom(100, 10, .2), nrow = 20)
mat[sample(1:100, 15)] = NA
data.frame(mat) -> df
df$res = apply(df, 1, function(x) sum(unique(na.omit(x)) > 0))
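
Since the question asks for values strictly within 1-6, a variant that filters with %in% may be closer to the intent than "> 0", because it also drops anything above 6. This is only a sketch on made-up data: the data frame `trials`, its column names, and the `valid` vector are hypothetical stand-ins for the asker's real columns.

```r
# Toy data (hypothetical): six response columns; 0, NA and 7 are invalid entries
trials <- data.frame(
  r1 = c(1, 0, 3),
  r2 = c(2, 2, NA),
  r3 = c(2, 0, 3),
  r4 = c(6, 5, 7),
  r5 = c(NA, 5, 3),
  r6 = c(1, 2, 0)
)

valid <- 1:6  # the only responses that should count

# Per row: keep values in 1:6 (NA %in% 1:6 is FALSE), then count distinct ones
trials$Span6 <- apply(trials, 1, function(x) length(unique(x[x %in% valid])))
trials$Span6
```

Span6 is computed before the column is added, so the counter itself is not fed back into the count.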

Could you edit your question and clarify why this doesn't solve your problem?
# here is a numeric vector with a bunch of numbers
mtcars$carb
# here is how to limit that vector to only 1-6
mtcars$carb[ mtcars$carb %in% 1:6 ]
# here is how to tabulate that result
table( mtcars$carb[ mtcars$carb %in% 1:6 ] )

Related

Add an occasion flag to data frame

For each individual, I would like to add an occasion flag to my data frame whenever the amount is bigger than zero. I need this flag for further calculations. Here is what I would like to achieve.
dfin <-
ID AMT
1 50
1 NA
1 10
1 NA
2 15
2 NA
2 NA
3 10
3 15
dfout <-
ID AMT FLAG
1 50 1
1 NA 1
1 10 2
1 NA 2
2 15 1
2 NA 1
2 NA 1
3 10 1
3 15 2
How can I achieve this in R?
You can test which values are not NA and compute the cumulative sum.
dfout = dfin
dfout$FLAG = cumsum(!is.na(dfin$AMT))
dfout
ID AMT FLAG
1 1 50 1
2 1 NA 1
3 1 10 2
4 1 NA 2
5 2 15 3
6 2 NA 3
7 2 NA 3
8 3 10 4
Since I changed the output that I want, I am answering my own question here, building on the answer provided by @G5W so that the flag is computed per ID:
library(dplyr)
dfout <- dfin %>%
group_by(ID) %>%
mutate(FLAG = cumsum(!is.na(AMT)))
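
For readers without dplyr, the same per-ID cumulative count can probably be written in base R with ave(). This is a sketch that rebuilds `dfin` from the table in the question; note that a group starting with NA would get a flag of 0 until its first non-NA amount.

```r
# Rebuild the example input from the question
dfin <- data.frame(
  ID  = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  AMT = c(50, NA, 10, NA, 15, NA, NA, 10, 15)
)

dfout <- dfin
# Within each ID, count the non-NA amounts seen so far
dfout$FLAG <- ave(as.integer(!is.na(dfin$AMT)), dfin$ID, FUN = cumsum)
dfout
```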

Missing data per questionnaire for a specific group

I am trying to see how many missing values I have per questionnaire for a specific group of participants.
For example, I have this data frame:
id Result QA1 QA2 QA3 QA4 QA5 QA6 QB1 QB2 QB3 QB4 QB5 QB6
1 1 1 3 2 2 3 3 3 NA 1 1 2 1
2 1 2 NA 2 2 2 1 1 3 2 1 2 3
3 2 3 2 3 1 1 1 2 1 1 NA 3 NA
4 1 2 1 NA 3 2 NA 1 3 3 1 2 1
5 6 1 1 3 2 1 3 2 1 1 1 1 NA
Say I want to know how many missing values there are in questionnaire A for all rows where Result is coded 1. How can I do this? Any suggestions?
You can create a function which takes as arguments the dataframe, the questionnaire and the code, i.e.
fun1 <- function(df, questionnaire, code){
d <- sum(is.na(df[df$Result == code,grepl(questionnaire, names(df))]))
return(d)
}
fun1(df, 'A', 1)
#[1] 3
fun1(df, 'B', 1)
#[1] 1
fun1(df, 'A', 2)
#[1] 0
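
To get the counts for every Result code at once, one option is to count the NAs per row and then sum them by group. The sketch below rebuilds the example data frame from the question; `qa_cols`, `na_per_row` and `na_by_result` are names introduced here for illustration, and startsWith() is used instead of grepl() so that only columns beginning with "QA" are matched.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  id = 1:5,
  Result = c(1, 1, 2, 1, 6),
  QA1 = c(1, 2, 3, 2, 1),  QA2 = c(3, NA, 2, 1, 1), QA3 = c(2, 2, 3, NA, 3),
  QA4 = c(2, 2, 1, 3, 2),  QA5 = c(3, 2, 1, 2, 1),  QA6 = c(3, 1, 1, NA, 3),
  QB1 = c(3, 1, 2, 1, 2),  QB2 = c(NA, 3, 1, 3, 1), QB3 = c(1, 2, 1, 3, 1),
  QB4 = c(1, 1, NA, 1, 1), QB5 = c(2, 2, 3, 2, 1),  QB6 = c(1, 3, NA, 1, NA)
)

qa_cols <- startsWith(names(df), "QA")         # columns of questionnaire A
na_per_row <- rowSums(is.na(df[, qa_cols]))    # missing answers per participant
na_by_result <- tapply(na_per_row, df$Result, sum)  # total per Result code
na_by_result
```

The entry named "1" should agree with fun1(df, 'A', 1) above.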

Subtract multiple columns ignoring NA

I'm fairly new to R and have run into an issue with NA's. This question may have been answered elsewhere but I can't seem to find the answer. I'm trying to do sort of the opposite of rowSums() in that I'm trying to subtract x2 and x3 from x1 in order to generate x4 without NA's. The code I'm currently using is as follows:
> x <- data.frame(x1 = 3, x2 = c(4:1, 2:5), x3=c(1,NA))
> x$x4=x$x1-x$x2-x$x3
> x
x1 x2 x3 x4
1 3 4 1 -2
2 3 3 NA NA
3 3 2 1 0
4 3 1 NA NA
5 3 2 1 0
6 3 3 NA NA
7 3 4 1 -2
8 3 5 NA NA
In other words, I want to ignore the NA's, similar to how rowSums allows the na.rm=TRUE argument, so that I get this result:
x1 x2 x3 x4
1 3 4 1 -2
2 3 3 NA 0
3 3 2 1 0
4 3 1 NA 2
5 3 2 1 0
6 3 3 NA 0
7 3 4 1 -2
8 3 5 NA -2
Any help is greatly appreciated.
You can use something like this if any of the columns may contain NAs:
x$x4 <- ifelse(is.na(x$x1),0,x$x1) -ifelse(is.na(x$x2),0,x$x2)-ifelse(is.na(x$x3),0,x$x3)
This assumes you want to treat NAs as 0. Otherwise, replace the 0s in the formula above with the value you need.
Just use rowSums:
> x$x4 <- x$x1 - rowSums(x[,2:3], na.rm=TRUE)
> x
x1 x2 x3 x4
1 3 4 1 -2
2 3 3 NA 0
3 3 2 1 0
4 3 1 NA 2
5 3 2 1 0
6 3 3 NA 0
7 3 4 1 -2
8 3 5 NA -2

Subsequent row summing in dataframe object

I would like to compute a running (cumulative) sum over the rows of one column and put the result into a new column, without deleting any rows, restarting the sum according to the value of another column.
Below is some R code and an example that does the trick and hopefully illustrates my question. I was wondering if there is a more elegant way to do this, since the for loop will be time-consuming on my actual object.
Thanks for any feedback.
As an example dataframe:
MyDf <- data.frame(ID = c(1,1,1,2,2,2), Y = 1:6)
MyDf$FIRST <- c(1,0,0,1,0,0)
MyDf.2 <- MyDf
MyDf.2$Y2 <- c(1,3,6,4,9,15)
The purpose of this is so that I can write code that calculates Y2 in MyDf.2 above for each ID, separately.
This is what I came up with, and it does the trick (calculating a TEST column in MyDf that has to be equal to Y2 in MyDf.2):
MyDf$TEST <- NA
for(i in 1:length(MyDf$Y)){
MyDf[i,]$TEST <- ifelse(MyDf[i,]$FIRST == 1, MyDf[i,]$Y,MyDf[i,]$Y + MyDf[i-1,]$TEST)
}
MyDf
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
MyDf.2
ID Y FIRST Y2
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
You need ave and cumsum to get the column you want. transform is just to modify your existing data.frame.
> MyDf <- transform(MyDf, TEST=ave(Y, ID, FUN=cumsum))
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
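
If the grouping has to come from the FIRST column rather than from ID (as in the question's loop), the group index can be derived with cumsum(FIRST). This is a sketch that assumes FIRST is 1 exactly on the first row of each block and 0 elsewhere:

```r
MyDf <- data.frame(ID = c(1, 1, 1, 2, 2, 2), Y = 1:6)
MyDf$FIRST <- c(1, 0, 0, 1, 0, 0)

# cumsum(FIRST) increases by one at each block start, giving a block id;
# ave() then restarts the running sum of Y within each block
MyDf$TEST <- ave(MyDf$Y, cumsum(MyDf$FIRST), FUN = cumsum)
MyDf
```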

Replacing zero's and NA with recursive value

I'm trying to replace NA and zero values recursively. I'm working with time series data where an NA or zero is best replaced with the value from the previous week (measurements are taken every 15 minutes, so that is 672 steps back). My data contains about two years of 15-minute values, so it is a large set. Few NAs or zeros are expected, and runs of adjacent zeros or NAs longer than 672 are not expected either.
I found this thread (recursive replacement in R), which shows a recursive approach, and adapted it to my problem.
load[is.na(load)] <- 0
o <- rle(load)
o$values[o$values == 0] <- o$values[which(o$values == 0) - 672]
newload<-inverse.rle(o)
Is this "the best" or an elegant method?
And how can I protect my code from errors when a zero occurs within the first 672 values?
I'm used to MATLAB, where I would do something like:
% Replace NaN with 0
Load(isnan(Load))=0;
% Find zero values
Ind=find(Load==0);
for f=Ind
if f>672
fprintf('Replacing index %d with the load 1 week ago\n', f)
% Replace zero with previous week value
Load(f)=Load(f-672);
end
end
As I'm not familiar with R, how would I set up such an if/else loop?
A reproducible example (I changed the code, as the example from the other thread didn't cope with adjacent zeros):
day<-1:24
load<-rep(day, times=10)
load[50:54]<-0
load[112:115]<-NA
load[is.na(load)] <- 0
load[load==0]<-load[which(load == 0) - 24]
This gives back the original load vector without zeros and NAs.
When a zero exists in the first 24 values, this goes wrong because there is no value to replace it with:
loadtest[c(10,50:54)]<-0 # instead of load[50:54]<-0 gives:
Error in loadtest[which(loadtest == 0) - 24] :
only 0's may be mixed with negative subscripts
To work around this, an if/else statement could be used, but I don't know how to apply it. Something like:
day<-1:24
loadtest<-rep(day, times=10)
loadtest[c(10,50:54)]<-0
loadtest[112:115]<-NA
loadtest[is.na(loadtest)] <- 0
if(INDEX(loadtest[loadtest==0])<24) {
# nothing / mean / standard value
} else {
loadtest[loadtest==0]<-loadtest[which(loadtest == 0) - 24]
}
Of course, INDEX isn't valid code.
You can use this example:
set.seed(42)
x <- sample(c(0,1,2,3,NA), 100, T)
stepback <- 6
x_old <- x
x_new <- x_old
repeat{
filter <- x_new==0 | is.na(x_new)
x_new[filter] <- c(rep(NA, stepback), head(x_new, -stepback))[filter]
if(identical(x_old,x_new)) break
x_old <- x_new
}
x
x_new
Result:
> x
[1] NA NA 1 NA 3 2 3 0 3 3 2 3 NA 1 2 NA NA 0 2 2 NA 0 NA NA 0
[26] 2 1 NA 2 NA 3 NA 1 3 0 NA 0 1 NA 3 1 2 0 NA 2 NA NA 3 NA 3
[51] 1 1 1 3 0 3 3 0 1 2 3 NA 3 2 NA 0 1 NA 3 1 0 0 1 2 0
[76] 3 0 1 2 0 2 0 1 3 3 2 1 0 0 1 3 0 1 NA NA 3 1 2 3 3
> x_new
[1] NA NA 1 NA 3 2 3 NA 3 3 2 3 3 1 2 3 2 3 2 2 2 3 2 3 2
[26] 2 1 3 2 3 3 2 1 3 2 3 3 1 1 3 1 2 3 1 2 3 1 3 3 3
[51] 1 1 1 3 3 3 3 1 1 2 3 3 3 2 1 2 1 3 3 1 1 2 1 2 3
[76] 3 1 1 2 2 2 3 1 3 3 2 1 3 1 1 3 2 1 3 1 3 1 2 3 3
Note that some values are still NA, because there is no prior information to use for them. If your data has sufficient prior information, this will not happen.
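
For the single-pass replacement shown in the question, the "negative subscripts" error can probably be avoided by dropping every index that has no value one period back. A minimal sketch on the question's 24-step toy data; `period`, `bad` and `ok` are names introduced here. Unlike the repeat loop above, this does only one pass, so it does not fix runs of bad values longer than one period.

```r
period <- 24
loadtest <- rep(1:24, times = 10)
loadtest[c(10, 50:54)] <- 0
loadtest[112:115] <- NA

# Positions holding NA or zero
bad <- which(is.na(loadtest) | loadtest == 0)

# Keep only positions that have a value one period earlier
ok <- bad[bad > period]
loadtest[ok] <- loadtest[ok - period]

loadtest[10]  # still 0: no earlier value exists to copy from
```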
One option would be to wrap your vector into a matrix with 672 rows:
load2 <- matrix(load, nrow=672)
Then apply the last observation carried forward (either from zoo, or the method above, or ...) to each row of the matrix:
load3 <- apply( load2, 1, locf.function )
Then take the resulting matrix back to a vector with the correct length:
load4 <- t(load3)[ seq_along(load) ]
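
A concrete version of this matrix approach might look like the sketch below, again with a 24-step period instead of 672. The helper `locf_nonzero` is a hypothetical stand-in for locf.function, written in base R, that treats both NA and 0 as missing. It assumes the vector length is an exact multiple of the period; otherwise matrix() would recycle values.

```r
# Hypothetical helper: carry the last valid (non-NA, non-zero) value forward
locf_nonzero <- function(v) {
  bad <- is.na(v) | v == 0
  idx <- cumsum(!bad)            # index of the most recent valid value (0 = none yet)
  valid <- v[!bad]
  out <- v
  out[idx > 0] <- valid[idx[idx > 0]]
  out[idx == 0] <- NA            # leading gaps have nothing to carry forward
  out
}

period <- 24
load <- rep(1:24, times = 10)
load[c(10, 50:54)] <- 0
load[112:115] <- NA

# Each row of the matrix holds the same time slot across successive periods
load2 <- matrix(load, nrow = period)
load3 <- apply(load2, 1, locf_nonzero)   # result comes back transposed
newload <- as.vector(t(load3))           # back to the original ordering
```

Position 10 ends up as NA here because it has no earlier value in its row.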
