Factor values change when converted to numeric (avoid 0 1 to 1 2) - r

I want to convert a factor to numeric so that I can take its mean. as.numeric changes the values, and numeric doesn't work.
mtcars$vec <- factor(c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
num.cols <- c("vec" )
mtcars[num.cols] <- lapply(mtcars[num.cols], as.numeric)
str(mtcars)
mtcars$vec
The expected result should be numeric and consist only of 0 and 1:
mtcars$vec
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
many thanks in advance

Applying as.numeric directly to a factor returns the underlying integer codes, which start from 1, rather than the values shown as level labels. So we need to convert to character first and then to numeric. The mix-up is easy to miss here because the values are binary: the labels 0 and 1 come back as the codes 1 and 2.
mtcars[num.cols] <- lapply(mtcars[num.cols],
function(x) as.numeric(as.character(x)))
mtcars$vec
#[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
A slightly faster option is to convert the levels once and then index them by the factor:
mtcars[num.cols] <- lapply(mtcars[num.cols], function(x) as.numeric(levels(x))[x])
If it is a single column, we can do this without lapply:
mtcars[[num.cols]] <- as.numeric(levels(mtcars[[num.cols]]))[mtcars[[num.cols]]]
As an example
v1 <- factor(c(15, 15, 3, 3))
as.numeric(v1)
#[1] 2 2 1 1
as.numeric(as.character(v1))
#[1] 15 15 3 3
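Since the goal in the question is to take the mean, the difference between the two conversions can be sketched on a small factor. The vector f below is a made-up stand-in for the vec column, not data from the question.

```r
# Hypothetical 0/1 factor standing in for mtcars$vec
f <- factor(c(0, 0, 1, 1, 1))

codes      <- as.numeric(f)                # internal integer codes: 1 1 2 2 2
values     <- as.numeric(as.character(f))  # level labels as numbers: 0 0 1 1 1
via_levels <- as.numeric(levels(f))[f]     # same result, converting each level only once

mean(codes)   # mean of the codes, not of the data
mean(values)  # mean of the actual 0/1 values
```

Only values and via_levels reproduce the original numbers, so only their mean is the proportion of 1s.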

Related

How to create multiple new columns based off of groups of columns that start with a certain prefix and also contain a certain string?

I have data that look like this
df <- data.frame(ID = c(1,2,3,4,5,6),
var1_unmod = c (1,0,0,1,0,1),
var1_me1 = c(0,1,0,0,0,0),
var1_me2 = c(1,1,1,0,1,0),
var1_me3 = c(0,0,1,0,0,0),
var1_ac1 = c(1,0,1,1,0,1),
var2_unmod = c(1,0,1,1,0,0),
var2_me1 = c(0,0,0,0,1,0),
var2_me2 = c(1,1,0,1,1,1),
var2_ac1 = c(1,1,0,1,0,0),
var2_me1ac1 = c(1,0,0,0,0,0),
var2_me2ac1 = c(1,0,0,1,1,1))
ID var1_unmod var1_me1 var1_me2 var1_me3 var1_ac1 var2_unmod var2_me1 var2_me2 var2_ac1 var2_me1ac1 var2_me2ac1
1 1 1 0 1 0 1 1 0 1 1 1 1
2 2 0 1 1 0 0 0 0 1 1 0 0
3 3 0 0 1 1 1 1 0 0 0 0 0
4 4 1 0 0 0 1 1 0 1 1 0 1
5 5 0 0 1 0 0 0 1 1 0 0 1
6 6 1 0 0 0 1 0 0 1 0 0 1
except that in the actual dataset, the prefixes aren't sequential like var1 and var2, they are basically random combinations of letters and numbers, and there are about 30 different ones.
For each of these prefixes (var1, var2, ...), I need to create a single variable that indicates whether any of the columns with that prefix that also contain me1, me2, or me3 (so for var2 this would be var2_me1, var2_me2, var2_me1ac1, var2_me2ac1) are nonzero. The output dataset would have additional columns like this:
ID var1_unmod var1_me1 var1_me2 var1_me3 var1_ac1 var1_meX var2_unmod var2_me1 var2_me2 var2_ac1 var2_me1ac1 var2_me2ac1 var2_meX
1 1 1 0 1 0 1 1 1 0 1 1 1 1 1
2 2 0 1 1 0 0 1 0 0 1 1 0 0 1
3 3 0 0 1 1 1 1 1 0 0 0 0 0 0
4 4 1 0 0 0 1 0 1 0 1 1 0 1 1
5 5 0 0 1 0 0 1 0 1 1 0 0 1 1
6 6 1 0 0 0 1 0 0 0 1 0 0 1 1
First I need to identify the applicable columns for each prefix (because there is no pattern to the prefixes, I'm thinking I will have to hard code at least this part), and then maybe somehow write a loop that iterates through the columns (stored in a vector?) for each prefix. I tend to have trouble referencing varying column names within loops. Any help is appreciated!
Here is a basic approach:
cols <- colnames(df)
varnames <- c("var1", "var2")
df2 <- df
for (i in varnames) {
  newname <- paste(i, "meX", sep="_")
  df2[, newname] <- apply(df2[, grepl(i, cols) & grepl("me", cols)], 1, sum)
  df2[, newname] <- ifelse(df2[, newname] >= 1, 1, 0)
}
This will probably need to be modified based on the specific details of your data.
Define the unique column prefixes in cols, then use lapply to iterate over each prefix and return 1 if there is at least one 1 in that row among the '_me' columns.
all_cols <- names(df)
cols <- c('var1', 'var2')
df[paste0(cols, '_meX')] <- lapply(cols, function(x)
as.integer(rowSums(df[grep(paste0(x, '_me'), all_cols, value = TRUE)]) > 0))
The new columns look like this:
df[13:14]
# var1_meX var2_meX
#1 1 1
#2 1 1
#3 1 0
#4 0 1
#5 1 1
#6 0 1
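Both answers hard-code the prefixes, while the question mentions about 30 irregular ones. A minimal sketch of deriving them from the column names instead, assuming every prefix is alphanumeric, contains no underscore, and is separated from the rest of the name by the first underscore. The prefixes ab1 and zq7 and the data are invented for illustration.

```r
# Hypothetical data with irregular prefixes (not from the question)
df <- data.frame(ID = 1:3,
                 ab1_unmod  = c(1, 0, 0),
                 ab1_me1    = c(0, 1, 0),
                 ab1_ac1    = c(1, 0, 1),
                 zq7_me2ac1 = c(0, 0, 1),
                 zq7_ac1    = c(1, 1, 0))

# Everything before the first underscore is treated as the prefix (assumption)
value_cols <- setdiff(names(df), "ID")
prefixes   <- unique(sub("_.*$", "", value_cols))

for (p in prefixes) {
  # columns of this prefix whose suffix starts with me1, me2 or me3
  me_cols <- grep(paste0("^", p, "_me[123]"), names(df), value = TRUE)
  df[[paste0(p, "_meX")]] <- as.integer(rowSums(df[me_cols]) > 0)
}

df[c("ab1_meX", "zq7_meX")]
```

The pattern me[123] does not match the newly created _meX columns, so they are never counted when later prefixes are processed.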

Create a new column based on several conditions

I want to create a new column based on some conditions imposed on several columns. For example, here is an example dataset:
a <- data.frame(x=c(1,0,1,0,0), y=c(0,0,0,0,0), z=c(1,1,0,0,0))
a
  x y z
1 1 0 1
2 0 0 1
3 1 0 0
4 0 0 0
5 0 0 0
Specifically, if for any particular row 1 is present, then the new column returns 1. If all are 0, then the new column returns 0. So the dataset with the new column will be
  x y z w
1 1 0 1 1
2 0 0 1 1
3 1 0 0 1
4 0 0 0 0
5 0 0 0 0
My initial thought was to use %in% but couldn't get the result I want. Thank you for your help!
If your data frame consists of binary values only, i.e., 0 and 1, you can try the code below with rowSums
a$w <- +(rowSums(a)>0)
such that
> a
  x y z w
1 1 0 1 1
2 0 0 1 1
3 1 0 0 1
4 0 0 0 0
5 0 0 0 0
We can use rowMaxs from matrixStats
library(matrixStats)
a$w <- rowMaxs(as.matrix(a))
a$w
#[1] 1 1 1 0 0
You can take the maximum of each row:
a$w <- do.call(pmax, a)
a
# x y z w
#1 1 0 1 1
#2 0 0 1 1
#3 1 0 0 1
#4 0 0 0 0
#5 0 0 0 0
which can also be done with apply :
a$w <- apply(a, 1, max)
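The answers above rely on the data being strictly 0/1. If the columns might also hold other values or NA, one hedged variant is to test for the value 1 explicitly. The data frame below is a modified copy of the question's example with a 2 and an NA added for illustration.

```r
# Modified example: row 3 holds a 2 and row 5 an NA (both are assumptions)
a <- data.frame(x = c(1, 0, 2, 0, 0),
                y = c(0, 0, 0, 0, 0),
                z = c(1, 1, 0, 0, NA))

# a == 1 is a logical matrix; count the TRUEs per row, ignoring NA
a$w <- as.integer(rowSums(a == 1, na.rm = TRUE) > 0)
a$w
```

With this variant a row containing only 0, 2 or NA gets w = 0, whereas the max-based versions would return 2 or NA for such rows.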

Change values of a vector to 0 and 1

From a vector I would like to make some values 0 and some values 1. It doesn't work. Why?
a <- c(1,34,5,3,6,67,3,2)
a[c(1,3,5)] <- 0 # works
a[!c(1,3,5)] <- 1 # doesn't work
Should look like
a
[1] 0 1 0 1 0 1 1 1
! negates logical values, so !c(1,3,5) evaluates to FALSE FALSE FALSE and selects nothing. Use negative indices instead:
a[-c(1,3,5)] <- 1
a
#[1] 0 1 0 1 0 1 1 1
You can try
> +!!replace(a,c(1,3,5),0)
[1] 0 1 0 1 0 1 1 1
We can create the logical index with %in%
a[!seq_along(a) %in% c(1, 3, 5)] <- 1
a
#[1] 0 1 0 1 0 1 1 1
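To see why the original attempt changes nothing, and to build the 0/1 vector in one step, here is a small sketch. The names zero_pos, neg, b, keep and result are introduced only for illustration.

```r
a <- c(1, 34, 5, 3, 6, 67, 3, 2)
zero_pos <- c(1, 3, 5)

# !zero_pos negates the numbers themselves: nonzero values become FALSE
neg <- !zero_pos

# Indexing with an all-FALSE logical vector selects nothing, so b stays unchanged
b <- a
b[neg] <- 1

# Logical mask over positions: TRUE where the position is not in zero_pos
keep <- !(seq_along(a) %in% zero_pos)
result <- as.integer(keep)
result
```

Here result is built directly from the positions, so it does not depend on the values in a at all.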

How to set a loop to assign lots of variables

I just started using R for a psych class, so please go easy on me. I watched a bunch of youtube videos on For loops, but none have answered my question. I have 4 data frames (A, B, C, D), each with 25 columns. I want to combine the nth column from each data frame together, and save them as an object, like so:
Q1 <- cbind(A[1], B[1], C[1], D[1])
Q2 <- cbind(A[2], B[2], C[2], D[2])
How can I set a loop to do this for all 25 so I don’t have to do it manually?
Thanks in advance
Each of my data frames looks like this (with column headings reflecting the letter of the data frame, i.e. B has QB1, QB2, etc.):
QA1 QA2 QA3 QA4 QA5 QA6 QA7 QA8 QA9 QA10 QA11 QA12 QA13 QA14 QA15
1 1 2 2 0 0 2 0 1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
3 1 0 0 0 0 0 1 0 0 2 1 1 0 0 0
4 1 0 0 0 0 0 1 1 0 1 0 2 0 0 0
To do it in a for loop, you can use assign() from base R together with eval_tidy() and sym() from the rlang package (base R's get() would also work for the lookup). Basically, you need to evaluate strings as variable names.
Create simulation data
library(rlang)
nrows = 10
ncols = 25
df_names <- c("A","B","C","D")
for(df_name in df_names){
  # assign value to a string as variable
  assign(
    df_name,
    as.data.frame(
      matrix(
        data = sample(
          c(0,1),
          size = nrows * ncols,
          replace = TRUE
        ),
        ncol = 25
      )
    )
  )
  # rename columns
  assign(
    df_name,
    setNames(eval_tidy(sym(df_name)),paste0("Q",df_name,1:ncols))
  )
}
Show A
> head(A)
QA1 QA2 QA3 QA4 QA5 QA6 QA7 QA8 QA9 QA10 QA11 QA12 QA13 QA14 QA15 QA16 QA17 QA18 QA19 QA20 QA21 QA22 QA23 QA24 QA25
1 1 1 0 0 1 0 1 0 1 1 0 0 1 1 1 0 0 1 0 0 1 1 0 1 1
2 0 1 0 1 1 1 1 0 1 1 1 1 0 0 0 1 0 1 1 0 1 0 1 1 0
3 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 1 1 1 1 0 0 0 1 1 1
4 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 0 0 1 0 1 1 1 1
5 1 1 0 1 1 1 1 1 1 0 1 0 0 0 0 0 1 0 1 0 1 1 0 1 1
6 1 1 0 0 1 1 0 1 1 1 0 1 0 1 1 0 1 0 0 1 1 0 1 1 0
To answer your question:
This should create 25 variables from Q1 to Q25:
# assign dataframes from Q1 to Q25
for(i in 1:25){
  new_df_name <- paste0("Q",i)
  # initialize Qi with the same number of rows as A,B,C,D ...
  assign(
    new_df_name,
    data.frame(tmp = matrix(NA,nrow = nrows))
  )
  # loop A,B,C,D ... and bind them
  for(df_name in df_names){
    assign(
      new_df_name,
      cbind(
        eval_tidy(sym(new_df_name)),
        eval_tidy(sym(df_name))[,i,drop = FALSE]
      )
    )
  }
  # drop tmp to clean up
  assign(
    new_df_name,
    eval_tidy(sym(new_df_name))[,-1]
  )
}
Show result:
> Q25
QA25 QB25 QC25 QD25
1 1 0 1 1
2 0 1 0 0
3 1 1 0 0
4 1 0 1 1
5 1 1 0 0
6 0 1 1 1
7 1 0 0 0
8 0 0 0 1
9 1 1 1 0
10 0 0 1 1
The code would be much simpler if you saved the results in a list, for example with map() from purrr or lapply(). Most of the complexity comes from assigning values to separate variables.
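The list-based alternative mentioned above could look roughly like this in base R, using lapply() rather than map(). The helper make_df and the simulated values are assumptions made only to keep the sketch self-contained.

```r
set.seed(1)

# Hypothetical helper: one simulated data frame with columns QA1..QA25 etc.
make_df <- function(letter, nrows = 4, ncols = 25) {
  m <- matrix(sample(0:2, nrows * ncols, replace = TRUE), ncol = ncols)
  setNames(as.data.frame(m), paste0("Q", letter, seq_len(ncols)))
}

# Keep the four data frames in a list instead of four separate objects
dfs <- lapply(c("A", "B", "C", "D"), make_df)

# For each column position i, bind the i-th column of every data frame
Q <- lapply(seq_len(25), function(i) {
  do.call(cbind, lapply(dfs, function(d) d[i]))
})
names(Q) <- paste0("Q", seq_len(25))

names(Q[["Q2"]])
```

Each combined data frame is then available as Q[["Q1"]], Q$Q2 and so on, without assign() or evaluating strings as variable names.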
You can combine some dplyr verbs in a for loop to combine the columns from each data set and assign them to 25 new objects.
# merge data, gather, split by var numbers, assign each df to environment
library(dplyr)  # mutate(), filter(), select(), %>%
library(tidyr)  # gather(), spread()
for (i in 1:25) {
  df <- cbind(q1,q2,q3,q4) %>% mutate(id=row_number()) %>%
    gather(k,v,-id) %>%
    mutate(num=sub('A|B|C|D','',k)) %>%
    filter(num==i) %>% select(-num) %>% spread(k,v)
  assign(paste0('df',i),df)
}
ls(pattern = 'df')
[1] "df1" "df10" "df11" "df12" "df13" "df14" "df15" "df16" "df17" "df18" "df19" "df2"
[13] "df20" "df21" "df22" "df23" "df24" "df25" "df3" "df4" "df5" "df6" "df7" "df8"
[25] "df9"
Code to create initial 4 toy data frames.
# create four toy data frames
q1 <- data.frame(matrix(runif(100),ncol=25))
q2 <- data.frame(matrix(runif(100),ncol=25))
q3 <- data.frame(matrix(runif(100),ncol=25))
q4 <- data.frame(matrix(runif(100),ncol=25))
# set var names for each toy data
names(q1) <- sub('X','A',names(q1))
names(q2) <- sub('X','B',names(q2))
names(q3) <- sub('X','C',names(q3))
names(q4) <- sub('X','D',names(q4))

Using loop to make column selections using different vectors

Let's say I have 3 vectors (each of length 10):
X <- c(1,1,0,1,0, 1,1, 0, NA,NA)
H <- c(0,0,1,0,NA,1,NA,1, 1, 1 )
I <- c(0,0,0,0,0, 1,NA,NA,NA,1 )
Data.frame Y contains 10 columns and 6 rows:
1 2 3 4 5 6 7 8 9 10
0 1 0 0 1 1 1 0 1 0
1 1 1 0 1 0 1 0 0 0
0 0 0 0 1 0 0 1 0 1
1 0 1 1 0 1 1 1 0 0
0 0 0 0 0 0 1 0 0 0
1 1 0 1 0 0 0 0 1 1
I'd like to use the vectors X, H and I to make column selections in data.frame Y, using the 1s and 0s in each vector as the selection criterion.
So the result for vector X using '1' as the selection criterion should be:
X <- c(1,1,0,1,0, 1,1, 0, NA,NA)
1 2 4 6 7
0 1 0 1 1
1 1 0 0 1
0 0 0 0 0
1 0 1 1 1
0 0 0 0 1
1 1 1 0 0
For vector H using '1' as the selection criterion:
H <- c(0,0,1,0,NA,1,NA,1, 1, 1 )
3 6 8 9 10
0 1 0 1 0
1 0 0 0 0
0 0 1 0 1
1 1 1 0 0
0 0 0 0 0
0 0 0 1 1
For vector I using '1' as the selection criterion:
I <- c(0,0,0,0,0, 1,NA,NA,NA,1 )
6 10
1 0
0 0
0 1
1 0
0 0
0 1
For convenience and speed I'd like to use a loop. It might be something like this:
all.ones <- lapply[,function(x) x %in% 1]
In the outcome (all.ones), the result for each vector should stay separate. For example:
X 1,2,4,6,7
H 3,6,8,9,10
I 6,10
The standard way of doing this is using the %in% operator:
Y[, X %in% 1]
To do this for multiple vectors (assuming you want an AND operation):
mylist = list(X, H, I, D, E, K)
Y[, Reduce(`&`, lapply(mylist, function(x) x %in% 1))]
The problem is the NA values: comparing NA with == gives NA, and indexing with NA returns NA. Use which to get around it. Consider the following:
x <- c(1,0,1,NA)
x[x==1]
[1] 1 1 NA
x[which(x==1)]
[1] 1 1
How about this?
idx <- which(X==1)
Y[,idx]
EDIT: For six vectors, do
idx <- which(X==1 & H==1 & I==1 & D==1 & E==1 & K==1)
Y[,idx]
Replace & with | if you want all columns of Y where at least one of the vectors has a 1.
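The question also asks that the result for each vector stay separate, which the combined & and | selections do not give. A minimal sketch of that, using the X, H, I and Y values from the question. The names selectors and subsets are introduced here for illustration.

```r
X <- c(1, 1, 0, 1, 0, 1, 1, 0, NA, NA)
H <- c(0, 0, 1, 0, NA, 1, NA, 1, 1, 1)
I <- c(0, 0, 0, 0, 0, 1, NA, NA, NA, 1)

# Data frame Y from the question, entered row by row
Y <- as.data.frame(matrix(c(
  0, 1, 0, 0, 1, 1, 1, 0, 1, 0,
  1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
  1, 0, 1, 1, 0, 1, 1, 1, 0, 0,
  0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
  1, 1, 0, 1, 0, 0, 0, 0, 1, 1), nrow = 6, byrow = TRUE))
names(Y) <- 1:10

# A named list keeps one result per vector
selectors <- list(X = X, H = H, I = I)

# Column positions where each vector equals 1; %in% treats NA as "no match"
all.ones <- lapply(selectors, function(v) which(v %in% 1))

# One sub data frame of Y per vector; drop = FALSE keeps single columns as data frames
subsets <- lapply(all.ones, function(idx) Y[, idx, drop = FALSE])

all.ones
```

Replacing 1 with 0 inside the anonymous function would give the selections based on the 0s instead.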
