Replace matrix values based on vector with matching names in R

I have a data frame in R with binary entries for three variables (a, b and c):
set.seed(12)
library(dplyr)
df <- data.frame(a = rbinom(10, 1, 0.5), b = rbinom(10, 2, 0.3), c = rbinom(10, 1, 0.8))
df
a b c
1 0 1 1
2 1 1 0
3 0 0 1
4 1 1 1
5 0 0 1
6 1 1 1
7 1 0 1
8 0 0 1
9 0 1 1
10 1 0 1
I also have a numeric vector whose names are the same as the columns in df:
vec <- rnorm(3, 1, 0.5)
names(vec) <- colnames(df)
vec
a b c
1.3369906 2.0360179 0.7294857
Here's the thing: for each column in df and for each observation, if the variable has a value of 1, I want to replace the 1 with the corresponding value in vec. If the original value is 0, I want to keep it. I tried a loop, but it didn't work:
for(i in 1:ncol(df)){
df[,i][df==1] <- vec[i]
}
I suspect the problem is that I need to specify how the columns of the data frame match the elements of the vector. Is there an alternative way to do this?

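If you want to use a loop, the following should do the trick. It looks up the replacement value by column name and only touches entries equal to 1 (b can also contain 2):
for(i in 1:ncol(df)){
  for(j in 1:nrow(df)){
    # entries other than 1 are left as they are
    if(df[j,i] == 1){df[j,i] <- vec[[colnames(df)[i]]]}
  }
}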
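As a loop-free alternative (a sketch, not the answerer's code), Map() can pair each column with the element of vec that has the same name and let ifelse() do the replacement. It assumes, as in the question, that names(vec) matches colnames(df); out is just an illustrative name.

```r
set.seed(12)
df <- data.frame(a = rbinom(10, 1, 0.5), b = rbinom(10, 2, 0.3), c = rbinom(10, 1, 0.8))
vec <- rnorm(3, 1, 0.5)
names(vec) <- colnames(df)

# Pair every column with the vec element of the same name, replace 1s with
# that value and leave every other entry (0, or 2 in b) unchanged
out <- df
out[] <- Map(function(col, v) ifelse(col == 1, v, col), df, vec[names(df)])
out
```

Assigning into out[] keeps the data frame structure (row names, column names) while replacing the column contents.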

Related

Random value in column in R

Does anyone know how to generate a column where exactly one randomly chosen row is marked with 1 and all other rows are 0?
I need a function for this in R.
Here is an example of my data:
df <- data.frame(subject = 1, choice = 0, price75 = c(0,0,0,1,1,1,0,1))
This command updates the choice column so that a single random row is 1 each time it is called; all other rows in the choice column are set to 0. The unary + converts the logical comparison to integer.
df$choice <- +(seq_along(df$choice) == sample(nrow(df), 1))
integer(nrow(DF)) creates a vector of zeros, and `[<-` replaces the element at the position drawn by sample(nrow(DF), 1L) with 1.
DF <- data.frame(subject=1, choice="", price75=c(0,0,0,1,1,1,0,1))
DF$choice <- `[<-`(integer(nrow(DF)), sample(nrow(DF), 1L), 1L)
DF
# subject choice price75
#1 1 0 0
#2 1 0 0
#3 1 0 0
#4 1 1 1
#5 1 0 1
#6 1 0 1
#7 1 0 0
#8 1 0 1
A shorter version on a plain vector:
x <- rep(0, 10)
x[sample(1:10, 1)] <- 1
x
# [1] 0 0 0 0 0 0 0 1 0 0
There are many ways to set a random value in a row or column in R, for example:
df<-data.frame(x=rep(0,10)) #make dataframe df, with column x, filled with 10 zeros.
set.seed(2022) #set a random seed - this is for repeatability
#two base methods for sampling:
#sample.int(n=10, size=1) # sample an integer from 1 to 10, sample size of 1
#sample(x=1:10, size=1) # sample from 1 to 10, sample size of 1
df$x[sample.int(n=10, size=1)] <- 1 # randomly selecting one of the ten rows, and replacing the value with 1
df
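Since the question asks for a function, here is a minimal sketch that wraps the same base-R idea; one_hot_random() is a hypothetical name, not an existing function.

```r
# Hypothetical helper: integer vector of length n with a single 1 at a random position
one_hot_random <- function(n) {
  out <- integer(n)              # n zeros
  out[sample.int(n, 1L)] <- 1L   # pick one position uniformly at random
  out
}

df <- data.frame(subject = 1, choice = 0, price75 = c(0, 0, 0, 1, 1, 1, 0, 1))
df$choice <- one_hot_random(nrow(df))
df
```

Call set.seed() beforehand if the chosen row must be reproducible.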

Add X number of columns to a data.frame

I would like to add a varying number (X) of columns with 0 to an existing data.frame within a function.
Here is an example data.frame:
dt <- data.frame(x=1:3, y=4:6)
I would like to get this result if X = 1:
a x y
1 0 1 4
2 0 2 5
3 0 3 6
And this if X = 3:
a b c x y
1 0 0 0 1 4
2 0 0 0 2 5
3 0 0 0 3 6
What would be an efficient way to do this?
You can assign several columns of 0 at once, based on the value of X, and then move the new columns to the front:
X <- 3
nm1 <- names(dt)
dt[letters[seq_len(X)]] <- 0
dt <- dt[c(setdiff(names(dt), nm1), nm1)]
You can also use add_column() from tibble, which creates the columns at a specific location:
library(tibble)
add_column(dt, .before = 1, !!!setNames(as.list(rep(0, X)),
letters[seq_len(X)]))
Another option is cbind:
f <- function(x, n = 3) {
cbind.data.frame(matrix(
0,
ncol = n,
nrow = nrow(x),
dimnames = list(NULL, letters[1:n])
), x)
}
f(dt, 5)
# a b c d e x y
#1 0 0 0 0 0 1 4
#2 0 0 0 0 0 2 5
#3 0 0 0 0 0 3 6
Note: because letters has only 26 elements, the naming scheme would need adjusting if n > 26.
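To lift that 26-column limit, one option is to generate the names with paste0(). This is a sketch under the assumption that names such as new1, new2 and so on are acceptable; add_zero_cols() and its prefix argument are hypothetical.

```r
# Hypothetical variant of f(): names come from paste0(), so n is not capped at 26
# (assumes n >= 1)
add_zero_cols <- function(x, n = 3, prefix = "new") {
  zeros <- matrix(0, nrow = nrow(x), ncol = n,
                  dimnames = list(NULL, paste0(prefix, seq_len(n))))
  cbind(as.data.frame(zeros), x)   # zero columns first, original columns after
}

dt <- data.frame(x = 1:3, y = 4:6)
res <- add_zero_cols(dt, 30)
names(res)[c(1, 30, 31, 32)]
```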
You can also try the code below:
dt <- cbind(`colnames<-`(t(rep(0,X)),letters[seq(X)]),dt)
If you don't care about the column names of the added columns, the much shorter
dt <- cbind(t(rep(0,X)),dt)
works as well.

R with/without dplyr: create new columns as combinations of prev. columns per row

Suppose I have a data frame like this:
A B C
1 0 1
0 1 1
1 0 0
I want to produce the following using dplyr (or another library):
A B C AB AC BC
1 0 1 0 1 0
0 1 1 0 0 1
1 0 0 0 0 0
In other words, I want to automatically create new columns in the data frame whose values are the products of pairs of the original columns (in this case three products per row: A*B, A*C and B*C). It has to be automatic, since I have 6 columns and can't code all combinations by hand. The new columns should follow a naming scheme, because I will need to filter them later.
We can use combn to get the column combinations and then a for loop to create the new columns.
# Create example data frame
dat <- read.table(text = "A B C
1 0 1
0 1 1
1 0 0",
header = TRUE)
# Create the column name combination
m <- combn(names(dat), m = 2)
# Create new columns
for (i in 1:ncol(m)){
dat[paste(m[, i], collapse = "")] <- dat[m[1, i]] * dat[m[2, i]]
}
dat
# A B C AB AC BC
# 1 1 0 1 0 1 0
# 2 0 1 1 0 0 1
# 3 1 0 0 0 0 0
Sometimes it's best to code without thinking too hard:
df <- data.frame(A = c(1, 0, 1),
B = c(0, 1, 0),
C = c(1, 1, 0))
J <- K <- seq_along(df)
J_n <- K_n <- names(df)
for (j in J) {
for (k in K) {
if (j < k) {
j_name <- J_n[j]
k_name <- K_n[k]
df[[paste0(j_name, k_name)]] <- df[[j]] * df[[k]]
}
}
}
This assumes that the new names are not already present in the original data frame. If your original data frame contained columns A, B and AB, for example, the existing AB column would be overwritten.
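Without an explicit loop, combn() can also apply a function to every pair directly. This is a sketch assuming the original column names do not contain "_": joining the pair names with "_" avoids the AB-style name clash described above and makes the product columns easy to filter later.

```r
dat <- data.frame(A = c(1, 0, 1), B = c(0, 1, 0), C = c(1, 1, 0))

# combn() calls FUN on every pair of column names; simplify = FALSE returns a list
prods <- combn(names(dat), 2, FUN = function(p) dat[[p[1]]] * dat[[p[2]]],
               simplify = FALSE)
# Same pairs, pasted with "_" to form the new column names (A_B, A_C, B_C)
names(prods) <- combn(names(dat), 2, FUN = paste, collapse = "_")

dat <- cbind(dat, as.data.frame(prods))
dat
# The product columns can be selected again with grepl("_", names(dat))
```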

Ordering columns in a data frame based on an incomplete vector

I have a vector of column names which looks like this:
x <- c("C", "A", "T")
My data frame looks like this, with row names and column names defined:
names A B C D T
Dan 1 0 1 0 1
Joe 0 1 0 1 0
I want to order the data frame so that the columns in the vector appear first, followed by the columns not in the vector:
names C A T B D
Dan 1 1 1 0 0
Joe 0 0 0 1 1
The following will rearrange your data to set the columns specified in the vector x at the beginning, and the remaining columns in their original order afterwards.
x <- c("C", "A", "T")
mydata <- mydata[, c(x, setdiff(names(mydata), x))]
If the names column should stay in the first position and is not specified in x, use the following (thanks to @StevenBeaupré for pointing this out and providing the code):
mydata <- mydata[, c(names(mydata)[1], x, setdiff(names(mydata)[-1], x))]
Small data example:
mydata <- data.frame(names = c("Dan", "Joe"), A = c(1, 0), B = c(0,1),
C = c(1, 0), D = c(0,1), T = c(1, 0))
> mydata
names A B C D T
1 Dan 1 0 1 0 1
2 Joe 0 1 0 1 0
mydata <- mydata[, c(names(mydata)[1], x, setdiff(names(mydata)[-1], x))]
> mydata
names C A T B D
1 Dan 1 1 1 0 0
2 Joe 0 0 0 1 1
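If x may also contain names that are not in the data, setdiff() alone does not drop them and the subset fails with "undefined columns selected". A sketch of a hypothetical helper, reorder_cols(), that filters x with intersect() first ("Z" below is a made-up missing name):

```r
mydata <- data.frame(names = c("Dan", "Joe"), A = c(1, 0), B = c(0, 1),
                     C = c(1, 0), D = c(0, 1), T = c(1, 0))
x <- c("C", "A", "T", "Z")   # "Z" is a hypothetical name absent from mydata

# Hypothetical helper: `first` columns stay in front, then the names in x that
# exist (in x's order), then all remaining columns in their original order
reorder_cols <- function(df, x, first = character(0)) {
  front <- intersect(x, setdiff(names(df), first))   # silently drops unknown names
  df[, c(first, front, setdiff(names(df), c(first, front))), drop = FALSE]
}

res <- reorder_cols(mydata, x, first = "names")
res
```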

Index based assignment with apply in R

I'm cleaning up some survey data in R, assigning 1/0 variables based on the responses to a question. Say I have a question with 3 options (a, b and c) and a data frame with the responses and the indicator variables:
df <- data.frame(a = rep(0,3), b = rep(0,3), c = rep(0,3), response = I(list(c(1),c(1,2),c(2,3))))
I want to change the 0s to 1s where the response matches the column index (i.e. 1 = a, 2 = b, 3 = c).
This is fairly easy to do with a loop:
for (i in 1:nrow(df)) df[i, df[i, "response"][[1]]] <- 1
Is there any way to do this with an apply/lapply/sapply/etc? Something like:
df <- sapply(df,function(x) x[x["response"][[1]]] <- 1)
Or should I stick with a loop?
You can use matrix indexing. From the help page ?`[`:
A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector. Negative indices are not allowed in the index matrix. NA and zero values are allowed: rows of an index matrix containing a zero are ignored, whereas rows containing an NA produce an NA in the result.
# construct a matrix representing the index where the value should be one
idx <- with(df, cbind(rep(seq_along(response), lengths(response)), unlist(response)))
idx
# [,1] [,2]
#[1,] 1 1
#[2,] 2 1
#[3,] 2 2
#[4,] 3 2
#[5,] 3 3
# do the assignment
df[idx] <- 1
df
# a b c response
#1 1 0 0 1
#2 1 1 0 1, 2
#3 0 1 1 2, 3
Or you can try this with tidyr and dplyr:
library(tidyr)
library(dplyr)
df1 <- df %>% mutate(Id = row_number()) %>% unnest(response)
df[, 1:3] <- table(df1$Id, df1$response)
a b c response
1 1 0 0 1
2 1 1 0 1, 2
3 0 1 1 2, 3
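The same table() idea also works in base R without tidyr or dplyr. A sketch, assuming each response lists an option at most once (a repeated option would produce a count above 1); ids and vals are illustrative names.

```r
df <- data.frame(a = rep(0, 3), b = rep(0, 3), c = rep(0, 3),
                 response = I(list(c(1), c(1, 2), c(2, 3))))

# One (row, option) pair per selected option; fixed factor levels keep rows
# or options with no selections in the table
ids  <- factor(rep(seq_len(nrow(df)), lengths(df$response)), levels = seq_len(nrow(df)))
vals <- factor(unlist(df$response), levels = 1:3)

# Cross-tabulate and write the counts back into columns a, b and c
df[1:3] <- as.data.frame.matrix(table(ids, vals))
df
```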
Perhaps this helps
df[1:3] <- t(sapply(df$response, function(x) as.integer(names(df)[1:3] %in% names(df)[x])))
df
# a b c response
#1 1 0 0 1
#2 1 1 0 1, 2
#3 0 1 1 2, 3
A more compact option is mtabulate() from qdapTools:
library(qdapTools)
df[1:3] <- mtabulate(df$response)
