Missing data per questionnaire for a specific group - r

I am trying to find out how many missing values I have per questionnaire for a specific group of participants.
I have a dataframe like this:
id Result QA1 QA2 QA3 QA4 QA5 QA6 QB1 QB2 QB3 QB4 QB5 QB6
1 1 1 3 2 2 3 3 3 NA 1 1 2 1
2 1 2 NA 2 2 2 1 1 3 2 1 2 3
3 2 3 2 3 1 1 1 2 1 1 NA 3 NA
4 1 2 1 NA 3 2 NA 1 3 3 1 2 1
5 6 1 1 3 2 1 3 2 1 1 1 1 NA
Say I want to know how many missing values there are in questionnaire A for all rows where Result is coded 1. How can I do this? Any suggestions?

You can create a function that takes the dataframe, the questionnaire and the code as arguments:
fun1 <- function(df, questionnaire, code){
# rows: Result equals the requested code; columns: names containing the questionnaire letter
d <- sum(is.na(df[df$Result == code,grepl(questionnaire, names(df))]))
return(d)
}
fun1(df, 'A', 1)
#[1] 3
fun1(df, 'B', 1)
#[1] 1
fun1(df, 'A', 2)
#[1] 0
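To see the counts for every Result code at once, one possible variation is to count the NAs per row and sum them by Result. This is only a sketch: the helper name na_by_group is made up for illustration, the data frame is rebuilt from the example in the question, and the column pattern assumes questionnaire columns are named Q followed by the questionnaire letter.

```r
# Rebuild the example data from the question
df <- read.table(text = "
id Result QA1 QA2 QA3 QA4 QA5 QA6 QB1 QB2 QB3 QB4 QB5 QB6
1 1 1 3 2 2 3 3 3 NA 1 1 2 1
2 1 2 NA 2 2 2 1 1 3 2 1 2 3
3 2 3 2 3 1 1 1 2 1 1 NA 3 NA
4 1 2 1 NA 3 2 NA 1 3 3 1 2 1
5 6 1 1 3 2 1 3 2 1 1 1 1 NA
", header = TRUE)

# Hypothetical helper: NA count per Result code for one questionnaire.
# The pattern is anchored at the start of the name, so "A" selects
# QA1..QA6 and cannot accidentally match another column.
na_by_group <- function(df, questionnaire) {
  cols <- grepl(paste0("^Q", questionnaire), names(df))
  tapply(rowSums(is.na(df[, cols, drop = FALSE])), df$Result, sum)
}

res_a <- na_by_group(df, "A")
res_b <- na_by_group(df, "B")
res_a
res_b
```

The entry for Result 1 in res_a should agree with fun1(df, 'A', 1) above.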

Related

How to remove some NA rows with respect to 2 groups

Suppose I have
HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
1 3 NA
1 3 NA
2 1 2
2 1 NA
2 2 NA
2 2 NA
The first column is the household index and the second is the person within each household. I want to remove the rows of every person whose mode values are all NA. For example, in the first household the mode column is all NA for the third person, so I want to remove those rows. The same applies to the second person in the second household.
output:
HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
2 1 2
2 1 NA
library(data.table)
dt[, .SD[ ( !all( is.na( mode ) ) ) ], by= .( HH, PP ) ][]
HH PP mode
1: 1 1 2
2: 1 1 NA
3: 1 1 NA
4: 1 2 2
5: 1 2 2
6: 2 1 2
7: 2 1 NA
sample data
dt <- fread(" HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
1 3 NA
1 3 NA
2 1 2
2 1 NA
2 2 NA
2 2 NA")
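For readers who are not using data.table, roughly the same filter can be sketched in base R with ave. This is an alternative to the answer above rather than the answerer's method, and the sample data is retyped here as a plain data.frame.

```r
# Same sample data as a plain data.frame
df <- data.frame(
  HH   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  PP   = c(1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2),
  mode = c(2, NA, NA, 2, 2, NA, NA, 2, NA, NA, NA)
)

# TRUE for every row whose (HH, PP) group has at least one non-NA mode
keep <- ave(!is.na(df$mode), df$HH, df$PP, FUN = any)

# Drop the groups where mode is NA throughout
result <- df[keep, ]
result
```

Rows with NA in mode are kept as long as the same person has at least one non-NA value, which matches the requested output.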

split dataframe cumulatively by variable level

With a df like this:
x=data.frame(id=c(1,1,1,2,2,2,3,3,3), val=c(1,2,3,2,3,4,1,3,0))
I want to get output like this:
[[1]]
id val
1 1 1
2 1 2
3 1 3
[[2]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
[[3]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
7 3 1
8 3 3
9 3 0
where the df is split into a list of as many dataframes as there are levels of the splitting variable, i.e. id. Each dataframe should start at the first level and include all rows up to each successive level.
I can do this with a loop:
out<-NULL
for(i in 1:3){
out[[i]] <- x[x$id<=i,]
}
out
However, is there a simpler method using e.g. split that I am overlooking? Ideally a one liner.
You can do this in base R with split and Reduce, using the accumulate=TRUE argument. split splits the data.frame into a list of data.frames by id. Reduce applies rbind to the list elements in turn, and accumulate=TRUE keeps each intermediate result, so the data.frames in the list are combined successively.
Reduce(rbind, split(x, x$id), accumulate=TRUE)
[[1]]
id val
1 1 1
2 1 2
3 1 3
[[2]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
[[3]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
7 3 1
8 3 3
9 3 0
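To make the effect of accumulate=TRUE easier to see, here is a small sketch on plain numbers, followed by a variant of the same split that uses lapply instead of Reduce. The lapply version is an assumption-laden alternative: like the loop in the question, it relies on the id values being comparable with <=.

```r
# accumulate = TRUE keeps every intermediate result of the fold,
# so summing 1:4 yields the running totals instead of just the final sum
running <- Reduce(`+`, 1:4, accumulate = TRUE)
running
# [1]  1  3  6 10

# Variant without Reduce: subset up to each successive id level
x <- data.frame(id = c(1,1,1,2,2,2,3,3,3), val = c(1,2,3,2,3,4,1,3,0))
out <- lapply(unique(x$id), function(i) x[x$id <= i, ])

# Each element holds all rows up to and including that id
sapply(out, nrow)
# [1] 3 6 9
```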

Create a counting variable which I can use to group my unemployment data in R

I have data as below, where I created the variable "B" with the following code:
index <- which(Count$unemploymentduration ==1)
Count$B[index]<-1:length(index)
ID unemployment B
1 1 1
1 2 NA
1 3 NA
1 4 NA
2 1 2
2 2 NA
2 0 NA
2 1 3
2 2 NA
2 3 NA
2 4 NA
2 5 NA
I want my data to look like this, but I have no real idea how to get there.
I thought of an if statement, but I have never used one in R.
ID unemployment B
1 1 1
1 2 1
1 3 1
1 4 1
2 1 2
2 2 2
2 0 2
2 1 3
2 2 3
2 3 3
2 4 3
2 5 3
Could someone help me out?
We can use na.locf from library(zoo)
library(zoo)
Count$B <- na.locf(Count$B)
However, B can also be created directly, without the 'index' step:
Count$B <- cumsum(Count$unemployment==1)
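The cumsum line works because the comparison is TRUE exactly where a new spell starts, and cumsum counts TRUE as 1. A self-contained sketch, with the data retyped from the question's table (column name unemployment as shown there):

```r
# Data retyped from the table in the question
Count <- data.frame(
  ID           = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  unemployment = c(1, 2, 3, 4, 1, 2, 0, 1, 2, 3, 4, 5)
)

# TRUE marks the first row of each spell; the running sum of those
# TRUE values gives every row the number of the spell it belongs to
Count$B <- cumsum(Count$unemployment == 1)
Count$B
#  [1] 1 1 1 1 2 2 2 3 3 3 3 3
```

Note that any rows before the first 1 would get B equal to 0, so this assumes the data starts at the beginning of a spell.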

Exclude a Specific Value from a Unique Value Counter

I am trying to count how many different responses a person gives during a trial of an experiment, but there is a catch.
There are supposed to be 6 possible responses (1,2,3,4,5,6) BUT sometimes 0 is recorded as a response (it's a glitch / flaw in design).
I need to count the number of different responses they give, BUT ONLY counting unique values within the range 1-6. This helps us calculate their accuracy.
Is there a way to exclude the value 0 from contributing to a unique value counter? Any other work-arounds?
Currently I am trying this method below, but it includes 0, NA, and I think any other entry in a cell in the Unique Value Counter Column (I have named "Span6"), which makes me sad.
# My Span6 calculator:
ASixImageTrials <- data.frame(eSOPT_831$T8.RESP, eSOPT_831$T9.RESP, eSOPT_831$T10.RESP, eSOPT_831$T11.RESP, eSOPT_831$T12.RESP, eSOPT_831$T13.RESP)
ASixImageTrials$Span6 = apply(ASixImageTrials, 1, function(x) length(unique(x)))
Use na.omit inside unique, then sum the logical vector that flags the values greater than 0:
df$res = apply(df, 1, function(x) sum(unique(na.omit(x)) > 0))
df
Output:
X1 X2 X3 X4 X5 res
1 2 1 1 2 1 2
2 3 0 1 1 2 3
3 3 NA 1 1 3 2
4 3 3 3 4 NA 2
5 1 1 0 NA 3 2
6 3 NA NA 1 1 2
7 2 0 2 3 0 2
8 0 2 2 2 1 2
9 3 2 3 0 NA 2
10 0 2 3 2 2 2
11 2 2 1 2 1 2
12 0 2 2 2 NA 1
13 0 1 4 3 2 4
14 2 2 1 1 NA 2
15 3 NA 2 2 NA 2
16 2 2 NA 3 NA 2
17 2 3 2 2 2 2
18 2 NA 3 2 2 2
19 NA 4 5 1 3 4
20 3 1 2 1 NA 3
Data:
set.seed(752)
mat <- matrix(rbinom(100, 10, .2), nrow = 20)
mat[sample(1:100, 15)] = NA
data.frame(mat) -> df
df$res = apply(df, 1, function(x) sum(unique(na.omit(x)) > 0))
Could you edit your question and clarify why this doesn't solve your problem?
# here is a numeric vector with a bunch of numbers
mtcars$carb
# here is how to limit that vector to only 1-6
mtcars$carb[ mtcars$carb %in% 1:6 ]
# here is how to tabulate that result
table( mtcars$carb[ mtcars$carb %in% 1:6 ] )
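The %in% filter can also be applied row by row, which gives a counter that ignores 0, NA and anything else outside 1-6. This is a sketch on a small made-up data frame (resp and its columns T1 to T3 are invented here, not the asker's eSOPT_831 data):

```r
# Made-up example: one row per participant, one column per trial
resp <- data.frame(
  T1 = c(1, 0, 6),
  T2 = c(1, 3, NA),
  T3 = c(2, 3, 7)
)

# Keep only values in 1:6 before counting distinct responses.
# %in% returns FALSE for NA, so missing values drop out as well.
resp$Span6 <- apply(resp, 1, function(x) length(unique(x[x %in% 1:6])))
resp$Span6
# [1] 2 1 1
```

Unlike a test for values greater than 0, this also excludes out-of-range entries such as the 7 in the third row.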

How to create a count variable by group for specific values in the variable of interest?

At the moment I have to deal with paradata (long format) generated by software during the data collection phase of a cohort study.
How can I create a variable containing the number of occurrences of a certain value by a group variable (like by id: gen _n if VAR1==2 in Stata)?
Basically the data looks like this:
ID: VAR1:
1 2
1 1
1 2
2 2
2 3
2 2
3 1
3 2
3 2
I can create a variable count.1 using
`data$count.1 <- ave(data$VAR1, data$ID, FUN = seq_along)`
ID: VAR1: count.1:
1 2 1
1 1 2
1 2 3
2 2 1
2 3 2
2 2 3
3 1 1
3 2 2
3 2 3
How can I create a variable count.2 that counts, by ID, the occurrences of the value 2 in VAR1?
ID: VAR1: count.1: count.2:
1 2 1 1
1 1 2 NA
1 2 3 2
2 2 1 1
2 3 2 NA
2 2 3 2
3 1 1 NA
3 2 2 1
3 2 3 2
The Data:
ID=c(1,1,1,2,2,2,3,3,3)
VAR1=c(2,1,2,2,3,2,1,2,2)
data <- as.data.frame(cbind(ID, VAR1))
Thanks in advance!!!
Try
data$count.2 <- with(data, ave(VAR1==2, ID,
FUN=function(x) ifelse(x, cumsum(x), NA)) )
data$count.2
#[1] 1 NA 2 1 NA 2 NA 1 2
Or using data.table
library(data.table)
setDT(data)[VAR1==2, count.2:=1:.N, by=ID][]
# ID VAR1 count.2
#1: 1 2 1
#2: 1 1 NA
#3: 1 2 2
#4: 2 2 1
#5: 2 3 NA
#6: 2 2 2
#7: 3 1 NA
#8: 3 2 1
#9: 3 2 2
Or using dplyr
library(dplyr)
data %>%
group_by(ID) %>%
mutate(count.2= ifelse(VAR1==2, cumsum(VAR1==2), NA))
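Another way to think about it, staying close to the ave and seq_along call the question already uses for count.1, is to number only the rows where VAR1 is 2 and leave the rest as NA. This is a sketch of an alternative, not one of the answers above; the helper vector hit is introduced here for illustration.

```r
# Data as given in the question
data <- data.frame(ID   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   VAR1 = c(2, 1, 2, 2, 3, 2, 1, 2, 2))

# Rows where the event of interest occurs
hit <- data$VAR1 == 2

# Start with NA everywhere, then number the matching rows within each ID
data$count.2 <- NA_integer_
data$count.2[hit] <- ave(seq_along(hit)[hit], data$ID[hit], FUN = seq_along)
data$count.2
# [1]  1 NA  2  1 NA  2 NA  1  2
```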
