Add dynamic subset conditions as variables into the data.frame - r

I have a data frame like
> x = data.frame(A=c(1,2,3),B=c(2,3,4))
> x
A B
1 1 2
2 2 3
3 3 4
and subsetting conditions in a data frame like
> cond = data.frame(condition=c('A>1','B>2 & B<4'))
> cond
condition
1 A>1
2 B>2 & B<4
which I then apply dynamically
> eval(parse(text=paste0("subset(x,",cond[1,'condition'],")")))
A B
2 2 3
3 3 4
> eval(parse(text=paste0("subset(x,",cond[2,'condition'],")")))
A B
2 2 3
Now, instead of subsetting, I would like to add the subsetting conditions as variables into the data. The end result would look like
A B condition1 condition2
1 1 2 0 0
2 2 3 1 1
3 3 4 1 0
How could I derive the above table using the dynamic conditions?

Before using eval(parse()), I hope you have read up on its risks, for example
What specifically are the dangers of eval(parse(…))?
and the many other discussions that are available.
However, to answer your question, we can continue your approach and use eval(parse()) inside sapply:
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
To add it to the dataframe,
x[paste0("condition", 1:nrow(cond))] <-
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
x
# A B condition1 condition2
#1 1 2 0 0
#2 2 3 1 1
#3 3 4 1 0
Simplifying it a bit (using @jogo's comment):
+(sapply(cond$condition, function(i) with(x, eval(parse(text=as.character(i))))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
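As a variation on the same idea, here is a minimal sketch that avoids pasting a call together as a string: each condition is turned into an expression with str2lang() (this assumes R >= 3.6.0) and evaluated with the data frame as its environment. The names flags and res are my own. Note that this still runs whatever code is in the condition strings, so it does not remove the risks mentioned above.

```r
x <- data.frame(A = c(1, 2, 3), B = c(2, 3, 4))
cond <- data.frame(condition = c("A>1", "B>2 & B<4"), stringsAsFactors = FALSE)

# Evaluate each condition string with the columns of x in scope,
# and store the logical result as 0/1 integers (one column per condition)
flags <- vapply(cond$condition,
                function(s) as.integer(eval(str2lang(s), envir = x)),
                integer(nrow(x)))
colnames(flags) <- paste0("condition", seq_len(ncol(flags)))

# Bind the flag columns onto the original data
res <- cbind(x, flags)
res
```

Using vapply() instead of sapply() makes the expected shape explicit: it fails loudly if a condition does not return exactly one value per row.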

Here is an option using tidyverse
library(tidyverse)
x %>%
mutate(!!! rlang::parse_exprs(str_c(cond$condition, collapse=";"))) %>%
rename_at(3:4, ~ paste0("condition", 1:2))
# A B condition1 condition2
#1 1 2 FALSE FALSE
#2 2 3 TRUE TRUE
#3 3 4 TRUE FALSE
If needed, the logical columns can easily be converted to 0/1 with as.integer.
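A minimal base R sketch of that conversion, assuming the result has the logical columns shown above (the object name res and the helper is_flag are hypothetical):

```r
# Result in the shape shown above, with logical condition columns
res <- data.frame(A = c(1, 2, 3), B = c(2, 3, 4),
                  condition1 = c(FALSE, TRUE, TRUE),
                  condition2 = c(FALSE, TRUE, FALSE))

# Find the logical columns and convert only those to 0/1 integers
is_flag <- vapply(res, is.logical, logical(1))
res[is_flag] <- lapply(res[is_flag], as.integer)
res
```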

Related

tapply with categorical variable

I am trying to use tapply() for some descriptive analysis, with the mtcars dataset in R.
So the problem is:
> table(mtcars$carb)
1 2 3 4 6 8
7 10 3 10 1 1
> tapply(mtcars$carb,list(mtcars$vs,mtcars$am),function(x){length(x)})
0 1
0 12 6
1 7 7
The above line worked, but the line below didn't:
> tapply(mtcars$carb,list(mtcars$vs,mtcars$am),function(x){table(x)})
0 1
0 Integer,3 Integer,4
1 Integer,3 Integer,2
By using tapply on mtcars$carb, I expect to get the table for each of the four combinations from vs and am. Any idea what went wrong? Thank you very much.
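tapply has already done the calculation. Because the function returns a table rather than a single value, the result is a matrix of lists, which prints in the compact Integer,3 form. To see the tables, wrap the output of table in list.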
tapply(mtcars$carb,list(mtcars$vs,mtcars$am),function(x) list(table(x)))
#[[1]]
#x
#2 3 4
#4 3 5
#[[2]]
#x
#1 2 4
#3 2 2
#[[3]]
#x
#2 4 6 8
#1 3 1 1
#[[4]]
#x
#1 2
#4 3
Or using lapply :
temp <- tapply(mtcars$carb,list(mtcars$vs,mtcars$am),table)
lapply(temp, I)
We can also do this with ftable:
ftable(mtcars[c('carb', 'vs', 'am')])
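To pull out one table from the tapply result rather than printing all of them, the list matrix can be indexed directly. This is only a sketch; the names res and cell, and the vs/am labels on the grouping list, are my own additions.

```r
# Each cell of the 2 x 2 result holds a full table, so the result is a list with dimensions
res <- tapply(mtcars$carb, list(vs = mtcars$vs, am = mtcars$am), table)
is.list(res)
dim(res)

# Use [[ with the dimnames to extract the table for vs = 0, am = 1
cell <- res[["0", "1"]]
cell
```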

R How to reshape matrix using value in a column

I have a matrix like this:
ID Count
1 2
2 3
3 2
I want to create a matrix in which the number of rows for an ID equals the value of Count while adding a new column containing the index for each row within the ID value. For the matrix above, the result should be:
ID Index
1 1
1 2
2 1
2 2
2 3
3 1
3 2
For a simple case you can just use rep and sequence.
ID=c(1,2,3)
Count=c(2,3,2)
cbind(ID=rep(ID, Count), Index=sequence(Count))
# ID Index
#[1,] 1 1
#[2,] 1 2
#[3,] 2 1
#[4,] 2 2
#[5,] 2 3
#[6,] 3 1
#[7,] 3 2
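If the data is already in a matrix, as in the question, the same idea can be sketched like this (m and expanded are names I made up for illustration):

```r
# Input matrix with one row per ID and the number of repeats in Count
m <- cbind(ID = c(1, 2, 3), Count = c(2, 3, 2))

# Repeat each ID Count times, and number the rows within each ID
expanded <- cbind(ID = rep(m[, "ID"], times = m[, "Count"]),
                  Index = sequence(m[, "Count"]))
expanded
```

rep() with times repeats each ID according to its Count, while sequence() builds 1..Count for each ID and concatenates the pieces.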
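Using tidyverse (assuming the data is in a data frame df with columns ID and Count)
library(tidyverse)
# example input matching the question
df <- data.frame(ID=c(1,2,3), Count=c(2,3,2))
df1 <- df %>%
group_by(ID) %>%
nest() %>%
# build the index 1..Count for each ID
mutate(data=map(data,~seq_len(.x$Count))) %>%
unnest(data)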
Output
ID data
1 1 1
2 1 2
3 2 1
4 2 2
5 2 3
6 3 1
7 3 2

recode values into one column

I have a dataframe with one value per row, potentially in one of several columns. How can I create a single column that contains the column number the 1 is in? I would like to do this using dplyr, but the only methods I can think of involve for loops, which seems very not R like.
df<-data.frame(
a=c(1,0,0,0),
b=c(0,1,1,0),
c=c(0,0,0,1)
)
a b c
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
GOAL:
1 1
2 2
3 2
4 3
There is no need for dplyr here. This is what max.col() is for. Since all the other values in the row will be zero, then max.col() will give us the column number where the 1 appears.
max.col(df)
# [1] 1 2 2 3
If you need a column, then
data.frame(x = max.col(df))
# x
# 1 1
# 2 2
# 3 2
# 4 3
Or cbind() or matrix() for a matrix.
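One caveat: max.col() breaks ties at random by default, so the result is only well defined if every row really has exactly one 1. A small sketch that checks this first and makes the tie rule explicit (the column name which_col is hypothetical):

```r
df <- data.frame(a = c(1, 0, 0, 0),
                 b = c(0, 1, 1, 0),
                 c = c(0, 0, 0, 1))

# Stop early if any row has zero or several 1s
stopifnot(all(rowSums(df == 1) == 1))

# ties.method = "first" makes the result deterministic even if the check is dropped
df$which_col <- max.col(df[c("a", "b", "c")], ties.method = "first")
df
```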
We could also do
as.matrix(df) %*%seq_along(df)
# [,1]
#[1,] 1
#[2,] 2
#[3,] 2
#[4,] 3
or, to get both the row and column index of each 1:
which(df==1, arr.ind=TRUE)
# row col
# [1,] 1 1
# [2,] 2 2
# [3,] 3 2
# [4,] 4 3

Add column in R based on comparison with another column

I have a beginner R question.
I want to add a column "d" that has a value of 1 if the corresponding row in "c" is >4, and 0 otherwise. I think that if I can do this basic thing I can extend the logic to my other questions. Basically, I can't figure out how to do basic comparisons between entries in a given row.
Here is a sample set of code:
# initial data
a=c(0,1,1)
b=c(1,2,3)
c=c(4,5,6)
data=data.frame(a,b,c)
Any help would be appreciated. Thanks!
One way:
> data
a b c
1 0 1 4
2 1 2 5
3 1 3 6
> data$d=ifelse(data$c>4,1,0)
> data
a b c d
1 0 1 4 0
2 1 2 5 1
3 1 3 6 1
Another common way is to rely on the fact that TRUE/FALSE convert to 1/0 when converted to numeric:
> data$d2=as.numeric(data$c>4)
> data
a b c d d2
1 0 1 4 0 0
2 1 2 5 1 1
3 1 3 6 1 1
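The same pattern extends to comparisons between columns within a row, since comparison operators work element by element. A small sketch (the column names e and f and the particular conditions are made up for illustration):

```r
data <- data.frame(a = c(0, 1, 1), b = c(1, 2, 3), c = c(4, 5, 6))

# 1 where c is more than twice b in the same row, 0 otherwise
data$e <- as.integer(data$c > 2 * data$b)

# Conditions can be combined with & (and) or | (or)
data$f <- as.integer(data$a == 1 & data$c > 4)
data
```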

Transforming a table of Nearest Neighbor Distances into a Matrix

I have a data frame generated from computing nearest neighbor (K=2) using the RANN package. I would like to transform this data into a matrix with values of 0,1,2 for each cell with 0 = not neighbor, 1=nearest neighbor, 2=2nd nearest neighbor.
The data frame has two columns, the first column is the ID of the 1st NN, the second column is the ID of the 2nd NN. The rows correspond to the ID of the point from which the NN were calculated.
Is there an existing routine to easily do this sort of transformation?
Thanks
Based on the limited information you have given, here is a rather inelegant solution:
NNdf <- data.frame(NN1=c(1,2,4),NN2=c(2,3,1)) # make up some data
NNdf$origin <- rownames(NNdf)
NNdf
# NN1 NN2 origin
#1 1 2 1
#2 2 3 2
#3 4 1 3
library(reshape2)
hold <- melt(NNdf, id = "origin")
hold
# origin variable value
#1 1 NN1 1
#2 2 NN1 2
#3 3 NN1 4
#4 1 NN2 2
#5 2 NN2 3
#6 3 NN2 1
hold2 <- dcast(hold, origin~value, value.var="variable")
hold2[hold2 == "NN1"] <- 1
hold2[hold2 == "NN2"] <- 2
hold2[is.na(hold2) ] <- 0
hold2
# origin 1 2 3 4
#1 1 1 2 0 0
#2 2 0 1 2 0
#3 3 2 0 0 1
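(The columns end up as character, so you may need to convert them afterwards, e.g. with hold2[] <- lapply(hold2, as.numeric). Note that apply(hold2,1,as.numeric) would return the result transposed.)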
Another possibility, though not much prettier. Thanks to @user1317221_G for the sample data!
NNdf <- data.frame(NN1=c(1,2,4,3),NN2=c(2,3,1,2))
NNdf$origin <- as.numeric(rownames(NNdf))
NNdf
NN1 NN2 origin
1 1 2 1
2 2 3 2
3 4 1 3
4 3 2 4
res <- matrix(0,nrow(NNdf),nrow(NNdf))
res[as.matrix(NNdf[,c("origin","NN1")])] <- 1
res[as.matrix(NNdf[,c("origin","NN2")])] <- 2
res
[,1] [,2] [,3] [,4]
[1,] 1 2 0 0
[2,] 0 1 2 0
[3,] 2 0 0 1
[4,] 0 2 1 0
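The matrix-indexing approach extends to any number of neighbor columns with a short loop. This is a sketch under the assumption that the columns are ordered from nearest to farthest and that row i describes point i; n and k are names I introduced.

```r
NNdf <- data.frame(NN1 = c(1, 2, 4, 3), NN2 = c(2, 3, 1, 2))
n <- nrow(NNdf)

# Start with "not a neighbor" everywhere
res <- matrix(0, n, n)

# For the k-th neighbor column, write k at (origin, neighbor);
# indexing with a two-column matrix addresses one cell per row
for (k in seq_along(NNdf)) {
  res[cbind(seq_len(n), NNdf[[k]])] <- k
}
res
```

If the same neighbor ID appeared in more than one column for a point, the later column would overwrite the earlier one.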
