In my dataframe, the three responses (yes, maybe, no) to a question are printed as three separate variables (a binary outcome of each possible response).
I want to combine the three binary responses into one variable, showing which response was selected.
The following piece of code does this:
data$var1 <- ifelse(data$var1.Yes, 0,
ifelse(data$var1.Maybe, 1,
ifelse(data$var1.No,2, NA)))
However, because I have many variables (e.g., var1, var2, var3, etc.), I want to write a function or loop that runs this code for multiple variables whose column names include ascending numbers.
I thought of the following function:
fun <- function(i){
paste0("data$var", i) <- ifelse(paste0("data$var", i, ".Yes"), 0,
ifelse(paste0("data$var",i,".Maybe"), 1,
ifelse(paste0("data$var",i,".No"),2, NA)))
}
fun(1:3)
Unfortunately, this does not work. How can I apply this function to several variables at once?
dput(test)
structure(list(var1.Yes = c(0, 0, 1, 0, 1, 1, 1, 0, NA, 1),
var1.Maybe = c(1, 0, 0, 1, 0, 0, 0, 0, NA, 0),
var1.No= c(0, 1, 0, 0, 0, 0, 0, 1, NA, 1),
var2.Yes = c(0, 0, 1, NA, 1, 1, 1, 0, 0, 1),
var2.Maybe = c(0, 1, 1, NA, 0, 0, 0, 0, 0, 0),
var2.No= c(1, 0, 0, NA, 0, 0, 0, 1, 1, 0),
var3.Yes = c(0, 1, 0, 0, 0, 0, 0, NA, 0, 1),
var3.Maybe = c(0, 0, 0, 0, 1, 1, 1, NA, 1, 0),
var3.No = c(1, 0, 1, 1, 0, 0, 0, NA, 0, 0)),
row.names = c(NA, -10L), class = "data.frame")
You can loop over the columns in groups of three:
lapply_results <- lapply(1:(ncol(test)/3), function(col) ifelse(test[,col*3-2], 0,
ifelse(test[,col*3-1], 1,
ifelse(test[,col*3], 2, NA))))
lapply_results
# [[1]]
# [1] 1 2 0 1 0 0 0 2 NA 0
#
# [[2]]
# [1] 2 1 0 NA 0 0 0 2 2 0
#
# [[3]]
# [1] 2 0 2 2 1 1 1 NA 1 0
This can be merged with your data:
cbind(test, matrix(unlist(lapply_results), nrow = nrow(test)))
Data:
data.frame(
var1.Yes = c(0, 0, 1, 0, 1, 1, 1, 0, NA, 1),
var1.Maybe= c(1, 0, 0, 1, 0, 0, 0, 0, NA, 0),
var1.No = c(0, 1, 0, 0, 0, 0, 0, 1, NA, 1),
var2.Yes = c(0, 0, 1, NA, 1, 1, 1, 0, 0, 1),
var2.Maybe= c(0, 1, 1, NA, 0, 0, 0, 0, 0, 0),
var2.No = c(1, 0, 0, NA, 0, 0, 0, 1, 1, 0),
var3.Yes = c(0, 1, 0, 0, 0, 0, 0, NA, 0, 1),
var3.Maybe= c(0, 0, 0, 0, 1, 1, 1, NA, 1, 0),
var3.No = c(1, 0, 1, 1, 0, 0, 0, NA, 0, 0)) -> test
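If the columns are not guaranteed to sit in Yes/Maybe/No order, a name-based variant may be safer. The sketch below builds each column name with paste0() and looks it up with [[ ]], which is roughly what the function in the question was attempting with strings such as "data$var1". Note that combine_responses(), its prefix argument, and the toy data frame are illustrative assumptions, not part of the answer above.

```r
# Hypothetical helper: combine <prefix><i>.Yes/.Maybe/.No indicators by name.
# Rows where no indicator equals 1 (or where the indicators are NA) become NA.
combine_responses <- function(data, ids, prefix = "var") {
  for (i in ids) {
    yes   <- data[[paste0(prefix, i, ".Yes")]]
    maybe <- data[[paste0(prefix, i, ".Maybe")]]
    no    <- data[[paste0(prefix, i, ".No")]]
    data[[paste0(prefix, i)]] <- ifelse(yes == 1, 0,
                                 ifelse(maybe == 1, 1,
                                 ifelse(no == 1, 2, NA)))
  }
  data
}

# Toy data (assumed for illustration only)
toy <- data.frame(
  var1.Yes = c(1, 0, 0, NA), var1.Maybe = c(0, 1, 0, NA), var1.No = c(0, 0, 1, NA),
  var2.Yes = c(0, 0, 1, 0),  var2.Maybe = c(0, 1, 0, 0),  var2.No = c(1, 0, 0, 0))

toy <- combine_responses(toy, 1:2)
toy[, c("var1", "var2")]
```

The function returns a modified copy, so the result has to be assigned back, as in the last lines.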
I created a binary matrix and I want to plot the 1s as black squares.
How can I do this without using any package?
For example, my matrix is:
m <- matrix(c(0,1,1,0,0,1,0,1,1),nrow=3, ncol=3)
Do you want this?
m <- matrix(c(0,1,1,0,0,1,0,1,1), nrow=3, ncol=3)
image(m, main = "My binary matrix plot", col = c("white", "black"))
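One thing to watch for, as far as I understand image(): it places the rows of the matrix along the x axis and lets y increase upwards, so the picture can look rotated compared with the printed matrix. A possible sketch that reorients the matrix first (as_printed() is a hypothetical helper name, not a base R function):

```r
# Hypothetical helper: reverse the rows, then transpose, so that image()
# draws m[1, 1] in the top-left corner, matching the printed matrix.
as_printed <- function(m) t(m[nrow(m):1, , drop = FALSE])

m <- matrix(c(0,1,1,0,0,1,0,1,1), nrow = 3, ncol = 3)
z <- as_printed(m)
image(z, col = c("white", "black"), axes = FALSE,
      main = "Binary matrix, printed orientation")
```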
If image doesn't suffice, we could write a generalized function using mapply like this one.
chessplot <- function(m, col=1, border=NA) {
stopifnot(dim(m)[1] == dim(m)[2]) ## allows only square matrices
n <- nrow(m)
plot(n, n, type='n', xlim=c(0, n), ylim=c(0, n))
mapply(\(i, j, m) {
rect(-1 + i, n - j, 0 + i, n - j + 1, col=m, border=border)
}, seq(n), rep(seq(n), each=n), t(m)) |> invisible()
}
Gives:
chessplot(m3)
chessplot(m4)
chessplot(m8)
Data:
m3 <- structure(c(0, 1, 1, 0, 0, 1, 0, 1, 1), .Dim = c(3L, 3L))
m4 <- structure(c(0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0), .Dim = c(4L,
4L))
m8 <- structure(c(0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,
1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1,
0, 1, 0, 1, 0), .Dim = c(8L, 8L))
I have a 3185x90 dataset of binary values and want to do a chi-squared test of independence, comparing all column variables against each other.
I've tried different variations of code from Google searches using chisq.test() and some for loops, but none of them has worked so far.
How do I do this?
This is the frame I've tinkered with. My dataset is oak.
chi_trial <- data.frame(a = c(0,1), b = c(0,1))
for(row in 1:nrow(oak)){
print(row)
print(chisq.test(c(oak[row,1],d[row,2])))
}
I also tried this:
apply(d, 1, chisq.test)
which gives me the error: Error in FUN(newX[, i], ...) :
all entries of 'x' must be nonnegative and finite
dput(oak[1:2],)
structure(list(post_flu = structure(c(1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
label = "Receipt of Flu Vaccine - Encounter Survey", format.stata = "%10.0g")), row.names = c(NA,
-3185L), class = c("tbl_df", "tbl", "data.frame"), label = "Main Oakland Clinic Analysis Dataset")
I added a sample of my data with the final lines of the output. The portion of the dataset is small, but it all looks like this.
You could use something like the code below, which is similar to R's cor function. I don't have your data, so I'm simulating some. Note that I get one significant p-value, using the traditional cut-off of 0.05.
set.seed(3)
nr=3185; nc=3
oak <- as.data.frame(matrix(sample(0:1, size=nr*nc, replace=TRUE), ncol=nc))
oak
mult.chi <- function(data){
nc <- ncol(data)
res <- matrix(0, nrow=nc, ncol=nc) # or NA
for(i in 1:(nc-1))
for(j in (i+1):nc)
res[i,j] <- suppressWarnings(chisq.test(data[,i], data[,j])$p.value) # use the function argument, not the global oak
rownames(res) <- colnames(data)
colnames(res) <- colnames(data)
res
}
mult.chi(oak)
# V1 V2 V3
# V1 0 0.7847063 0.32012466
# V2 0 0.0000000 0.01410326
# V3 0 0.0000000 0.00000000
So consider applying a multiple testing adjustment as mentioned in the comments.
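A minimal sketch of such an adjustment, assuming the p-values sit in the upper triangle of the result matrix as in the output above. It uses p.adjust() from base R with the Holm method; the choice of method is an assumption here, and "BH" is another common option.

```r
# p-values copied from the upper triangle of the matrix printed above
vars <- c("V1", "V2", "V3")
p_mat <- matrix(0, 3, 3, dimnames = list(vars, vars))
p_mat["V1", "V2"] <- 0.7847063
p_mat["V1", "V3"] <- 0.32012466
p_mat["V2", "V3"] <- 0.01410326

ut    <- upper.tri(p_mat)                   # logical mask of the tested pairs
p_raw <- p_mat[ut]
p_adj <- p.adjust(p_raw, method = "holm")   # adjust only the pairs actually tested

adj_mat <- p_mat
adj_mat[ut] <- p_adj                        # same mask, so positions are preserved
adj_mat
```

With three tests, the smallest p-value is multiplied by 3 under Holm, so 0.014 becomes roughly 0.042. With 90 columns there are 4005 pairs, and the adjustment will be far more severe.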
Here is a solution with combn to get all combinations of column numbers 2 by 2. Tested with the data in @Edward's answer.
chisq2cols <- function(X){
y <- matrix(0, ncol(X), ncol(X))
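cmb <- combn(ncol(X), 2)
# index by (row, col) pairs; upper.tri() fills column-wise, which differs from combn's order beyond 3 columns
y[t(cmb)] <- apply(cmb, 2, function(k){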
tbl <- table(X[k])
chisq.test(tbl)$p.value
})
y
}
chisq2cols(oak)
# [,1] [,2] [,3]
#[1,] 0 0.7847063 0.32012466
#[2,] 0 0.0000000 0.01410326
#[3,] 0 0.0000000 0.00000000
I have two logical vectors which look like this:
x = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
I would like to count the intersections between ranges of consecutive values. Meaning that consecutive values (of 1s) are handled as one range. So in the above example, each vector contains one range of 1s and these ranges intersect only once.
Is there any R package for range intersections which could help here?
I think this should work (calling your logical vectors x and y):
sum(rle(x & y)$values)
A few examples:
x = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
sum(rle(x & y)$values)
# [1] 1
x = c(1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y = c(0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
sum(rle(x & y)$values)
# [1] 2
x = c(1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0)
y = c(0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
sum(rle(x & y)$values)
# [1] 3
By way of explanation, x & y gives the intersections on a per-element level, rle collapses runs of adjacent intersections, and sum counts.
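To make the three steps visible, here is a small sketch that keeps the intermediate results, using the second example above. The variable names overlap and runs are just for illustration.

```r
x <- c(1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y <- c(0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)

overlap <- x & y        # TRUE where both vectors are 1 at the same position
runs    <- rle(overlap) # collapse consecutive equal values into runs

runs$values             # one entry per run: TRUE for an overlap, FALSE otherwise
runs$lengths            # how long each run is
n_intersections <- sum(runs$values)  # number of TRUE runs
n_intersections
```

The run lengths also give the size of each overlap, via runs$lengths[runs$values].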
I have this data frame
d1 <- c(1, 0, 0, 1, 0, 0, 0, 1)
d2 <- c(0, 1, 0, 1, 1, 0, 0, 0)
d3 <- c(0, 0, 1, 0, 0, 0, 1, 0)
d4 <- c(0, 0, 0, 1, 0, 0, 0, 0)
d5 <- c(0, 0, 0, 0, 0, 0, 1, 0)
d6 <- c(0, 0, 0, 1, 0, 1, 0, 1)
d7 <- c(0, 0, 1, 0, 0, 1, 0, 1)
d8 <- c(1, 0, 0, 0, 0, 0, 0, 1)
d9 <- c(0, 0, 0, 0, 0, 1, 0, 1)
d10 <- c(1, 1, 0, 0, 0, 1, 0, 1)
df <- as.data.frame(rbind(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10))
str(df)
I get all lines where V8 == 1, and find the relative frequencies for each column like this (for example column 2, V2):
table(df[which(df$V8==1),][2])/sum(as.numeric(df[which(df$V8==1),]$V8))
0 1
0.8333333 0.1666667
My question is how can I get each relative frequency individually, let's say set it into a new variable. I found this
How to extract value from table function in R
but it does not work in my case, since 0 and 1 are numericals.
table(df[which(df$V8==1),][2])/sum(as.numeric(df[which(df$V8==1),]$V8))["1"]
Use as.numeric, then convert the counts to ratios.
The values 0 and 1 are extracted with
as.numeric(names(table(data)))
and the counts 64 and 17 are extracted with
counts <- as.numeric(table(data))
Then
ratios <- counts/sum(counts)
Not completely sure about what you're trying to do but...
sapply(subset(df, V8==1), function(x) sum(x==1)/length(x))
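If the goal is to store each relative frequency in its own variable, indexing the table by the character name of the level should work. The sketch below rebuilds the data frame from the question; sub, freq, p0, p1 and p1_all are illustrative names. As far as I can tell, the attempt in the question fails because the ["1"] binds to the result of sum() rather than to the table, so wrapping the whole division in parentheses before indexing would also help.

```r
d1 <- c(1, 0, 0, 1, 0, 0, 0, 1)
d2 <- c(0, 1, 0, 1, 1, 0, 0, 0)
d3 <- c(0, 0, 1, 0, 0, 0, 1, 0)
d4 <- c(0, 0, 0, 1, 0, 0, 0, 0)
d5 <- c(0, 0, 0, 0, 0, 0, 1, 0)
d6 <- c(0, 0, 0, 1, 0, 1, 0, 1)
d7 <- c(0, 0, 1, 0, 0, 1, 0, 1)
d8 <- c(1, 0, 0, 0, 0, 0, 0, 1)
d9 <- c(0, 0, 0, 0, 0, 1, 0, 1)
d10 <- c(1, 1, 0, 0, 0, 1, 0, 1)
df <- as.data.frame(rbind(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10))

sub  <- subset(df, V8 == 1)          # rows where V8 is 1
freq <- prop.table(table(sub$V2))    # relative frequencies of V2 in those rows

p0 <- freq[["0"]]                    # index by the level's name, as a string
p1 <- freq[["1"]]

# Share of 1s in every column at once
p1_all <- colMeans(sub == 1)
```

Note that indexing by "1" fails if a column contains no 1s in the subset, because that level is then missing from the table; the colMeans() variant does not have this problem.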