How do you create categorical variables using dummy variables in r? - r

case1: 0, 0, 0, 0, 1, 1, 1, 1, 0, 0
case2: 1, 1, 0, 0, 1, 0, 0, 0, 1, 0
case3: 0, 1, 0, 0, 0, 1, 1, 0, 0, 1
I would like to get the following vectors from the table above.
answer: 2, mix, 0, 0, mix, mix, mix, 1, 2, 3
How do I solve the above problem in r?

I made my own function to solve the above problem.
I hope it helps people who have the same problems as me.
dummy_to_cate <- function(mydata,column_area){
result_vec <- NA
vec_q=NA
colname <- colnames(mydata)
result_vec[is.na(mydata[,column_area[1]])==FALSE]<-paste(colname[1])
for (i in column_area[-1]) {
vec_q[is.na(mydata[,column_area[i]])==FALSE] <-1
vec_q[is.na(mydata[,column_area[i]])==TRUE] <-0
result_vec[vec_q==1 & is.na(mydata[,column_area[1]])==TRUE]<- paste(colname[i])
}
df<-is.na(mydata[,column_area])==FALSE
result_vec[rowSums(df)>= 2]<-'mix'
return(result_vec)
}

Maybe you can try the code below
sapply(asplit(rbind(c1, c2, c3), 2), function(x) {
u <- which(x == 1)
ifelse(length(u) == 0, 0, ifelse(length(u) == 1, u, "mix"))
})
which gives
[1] "2" "mix" "0" "0" "mix" "mix" "mix" "1" "2" "3"
Data
c1 <- c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0)
c2 <- c(1, 1, 0, 0, 1, 0, 0, 0, 1, 0)
c3 <- c(0, 1, 0, 0, 0, 1, 1, 0, 0, 1)

Related

Running a function on each three sets of columns

In my dataframe, the three responses (yes, maybe, no) to a question are printed as three separate variables (a binary outcome of each possible response).
I want to combine the three binary responses into one variable, showing which response was selected.
The following piece of code does this:
data$var1 <- ifelse(data$var1.Yes, 0,
ifelse(data$var1.Maybe, 1,
ifelse(data$var1.No,2, NA)))
However, because I have many variables (e.g., var1, var2, var3, etc..), I want to pass a function or loop where the code runs for multiple variables whose column names include ascending numbers.
I thought of the following function:
fun <- function(i){
paste0("data$var", i) <- ifelse(paste0("data$var", i, ".Yes"), 0,
ifelse(paste0("data$var",i,".Maybe"), 1,
ifelse(paste0("data$var",i,".No"),2, NA)))
}
fun(1:3)
Unfortunately, this does not work. How can I apply this function to several variables at once?
dput(test)
structure(list(var1.Yes = c(0, 0, 1, 0, 1, 1, 1, 0, NA, 1),
var1.Maybe = c(1, 0, 0, 1, 0, 0, 0, 0, NA, 0),
var1.No= c(0, 1, 0, 0, 0, 0, 0, 1, NA, 1),
var2.Yes = c(0, 0, 1, NA, 1, 1, 1, 0, 0, 1),
var2.Maybe = c(0, 1, 1, NA, 0, 0, 0, 0, 0, 0),
var2.No= c(1, 0, 0, NA, 0, 0, 0, 1, 1, 0),
var3.Yes = c(0, 1, 0, 0, 0, 0, 0, NA, 0, 1),
var3.Maybe = c(0, 0, 0, 0, 1, 1, 1, NA, 1, 0),
class = "data.frame"))
You can loop through each three columns;
lapply(1:(ncol(test)/3), function(col) ifelse(test[,col*3-2], 0,
ifelse(test[,col*3-1], 1,
ifelse(col*3, 2, NA))))
# [[1]]
# [1] 1 2 0 1 0 0 0 2 NA 0
#
# [[2]]
# [1] 2 1 0 NA 0 0 0 2 2 0
#
# [[3]]
# [1] 2 0 2 2 1 1 1 NA 1 0
This can be merged with your data:
cbind(test, matrix(unlist(lapply_results), nrow = nrow(test)))
Data:
data.frame(
var1.Yes = c(0, 0, 1, 0, 1, 1, 1, 0, NA, 1),
var1.Maybe= c(1, 0, 0, 1, 0, 0, 0, 0, NA, 0),
var1.No = c(0, 1, 0, 0, 0, 0, 0, 1, NA, 1),
var2.Yes = c(0, 0, 1, NA, 1, 1, 1, 0, 0, 1),
var2.Maybe= c(0, 1, 1, NA, 0, 0, 0, 0, 0, 0),
var2.No = c(1, 0, 0, NA, 0, 0, 0, 1, 1, 0),
var3.Yes = c(0, 1, 0, 0, 0, 0, 0, NA, 0, 1),
var3.Maybe= c(0, 0, 0, 0, 1, 1, 1, NA, 1, 0),
var3.No = c(1, 0, 1, 1, 0, 0, 0, NA, 0, 0)) -> test

How to plot a binary matrix without using additional packages?

I created a binary matrix and I wanna plot 1's as black square.
How can I write it without using any package?
For example, my matrix is:
m <- matrix(c(0,1,1,0,0,1,0,1,1),nrow=3, ncol=3)
Do you want this?
m <- matrix(c(0,1,1,0,0,1,0,1,1), nrow=3, ncol=3)
image(m, main = "My binary matrix plot", col = c("white", "black"))
If image doesn't suffice, we could write a generalized function using mapply like this one.
chessplot <- function(m, col=1, border=NA) {
stopifnot(dim(m)[1] == dim(m)[2]) ## allows only square matrices
n <- nrow(m)
plot(n, n, type='n', xlim=c(0, n), ylim=c(0, n))
mapply(\(i, j, m) {
rect(-1 + i, n - j, 0 + i, n - j + 1, col=m, border=border)
}, seq(n), rep(seq(n), each=n), t(m)) |> invisible()
}
Gives:
chessplot(m3)
chessplot(m4)
chessplot(m8)
Data:
m3 <- structure(c(0, 1, 1, 0, 0, 1, 0, 1, 1), .Dim = c(3L, 3L))
m4 <- structure(c(0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0), .Dim = c(4L,
4L))
m8 <- structure(c(0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,
1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1,
0, 1, 0, 1, 0), .Dim = c(8L, 8L))

Chi Square Test of Independence of Whole Dataset

I have a 3185x90 dataset of binary values and want to do a chi-squared test of independence, comparing all column variables against each other.
I've been tried using different variations of code from google searches with chisq.test() and some for loops, but none of them have worked so far.
How do I do this?
This is the frame I've tinkered with. My dataset is oak.
chi_trial <- data.frame(a = c(0,1), b = c(0,1))
for(row in 1:nrow(oak)){
print(row)
print(chisq.test(c(oak[row,1],d[row,2])))
}
I also tried this:
apply(d, 1, chisq.test)
which gives me the error: Error in FUN(newX[, i], ...) :
all entries of 'x' must be nonnegative and finite
dput(oak[1:2],)
structure(list(post_flu = structure(c(1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
label = "Receipt of Flu Vaccine - Encounter Survey", format.stata = "%10.0g")), row.names = c(NA,
-3185L), class = c("tbl_df", "tbl", "data.frame"), label = "Main Oakland Clinic Analysis Dataset")
I added a sample of my data with the final lines of the output. The portion of the dataset is small, but it all looks like this.
You could use something like the code below, which is similar to R's cor function. I don't have your data, so I'm simulating some. Note that I get one significant p-value, using the traditional cut-off of 0.05.
set.seed(3)
nr=3185; nc=3
oak <- as.data.frame(matrix(sample(0:1, size=nr*nc, replace=TRUE), ncol=nc))
oak
mult.chi <- function(data){
nc <- ncol(data)
res <- matrix(0, nrow=nc, ncol=nc) # or NA
for(i in 1:(nc-1))
for(j in (i+1):nc)
res[i,j] <- suppressWarnings(chisq.test(oak[,i], oak[,j])$p.value)
rownames(res) <- colnames(data)
colnames(res) <- colnames(data)
res
}
mult.chi(oak)
# V1 V2 V3
# V1 0 0.7847063 0.32012466
# V2 0 0.0000000 0.01410326
# V3 0 0.0000000 0.00000000
So consider applying a multiple testing adjustment as mentioned in the comments.
Here is a solution with combn to get all combinations of column numbers 2 by 2. Tested with the data in #Edward's answer.
chisq2cols <- function(X){
y <- matrix(0, ncol(X), ncol(X))
cmb <- combn(ncol(X), 2)
y[upper.tri(y)] <- apply(cmb, 2, function(k){
tbl <- table(X[k])
chisq.test(tbl)$p.value
})
y
}
chisq2cols(oak)
# [,1] [,2] [,3]
#[1,] 0 0.7847063 0.32012466
#[2,] 0 0.0000000 0.01410326
#[3,] 0 0.0000000 0.00000000

Intersecting ranges of consecutive values in logical vectors in R

I have two logical vectors which look like this:
x = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
I would like to count the intersections between ranges of consecutive values. Meaning that consecutive values (of 1s) are handled as one range. So in the above example, each vector contains one range of 1s and these ranges intersect only once.
Is there any R package for range intersections which could help here?
I think this should work (calling your logical vectors x and y):
sum(rle(x & y)$values)
A few examples:
x = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
sum(rle(x & y)$values)
# [1] 1
x = c(1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y = c(0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
sum(rle(x & y)$values)
# [1] 2
x = c(1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0)
y = c(0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0)
sum(rle(x & y)$values)
# [1] 3
By way of explanation, x & y gives the intersections on a per-element level, rle collapses runs of adjacent intersections, and sum counts.

Extract value from table function in R (no factors)

I have this data frame
d1 <- c(1, 0, 0, 1, 0, 0, 0, 1)
d2 <- c(0, 1, 0, 1, 1, 0, 0, 0)
d3 <- c(0, 0, 1, 0, 0, 0, 1, 0)
d4 <- c(0, 0, 0, 1, 0, 0, 0, 0)
d5 <- c(0, 0, 0, 0, 0, 0, 1, 0)
d6 <- c(0, 0, 0, 1, 0, 1, 0, 1)
d7 <- c(0, 0, 1, 0, 0, 1, 0, 1)
d8 <- c(1, 0, 0, 0, 0, 0, 0, 1)
d9 <- c(0, 0, 0, 0, 0, 1, 0, 1)
d10 <- c(1, 1, 0, 0, 0, 1, 0, 1)
df <- as.data.frame(rbind(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10))
str(df)
I get all lines where V8 == 1, and find the relative frequencies for each column like this (for example column 2, V2):
table(df[which(df$V8==1),][2])/sum(as.numeric(df[which(df$V8==1),]$V8))
0 1
0.8333333 0.1666667
My question is how can I get each relative frequency individually, let's say set it into a new variable. I found this
How to extract value from table function in R
but it does not work in my case, since 0 and 1 are numericals.
table(df[which(df$V8==1),][2])/sum(as.numeric(df[which(df$V8==1),]$V8))["1"]
use as.numeric, and then, after that, change them to ratios
the numbers 0 and 1 are extracted with
as.numeric(names(table(data)))
and the numbers 64 and 17 are extracted with
counts<-as.numeric(table(data))
then
ratios<-counts/sum(counts)
Not completely sure about what you're trying to do but...
sapply(subset(df, V8==1), function(x) sum(x==1)/length(x))

Resources