Transforming a table of Nearest Neighbor Distances into a Matrix in R

I have a data frame generated from computing nearest neighbors (k = 2) using the RANN package. I would like to transform this data into a matrix with a value of 0, 1, or 2 in each cell, where 0 = not a neighbor, 1 = nearest neighbor, and 2 = 2nd nearest neighbor.
The data frame has two columns: the first column is the ID of the 1st NN, the second column is the ID of the 2nd NN. The rows correspond to the ID of the point from which the NNs were calculated.
Is there an existing routine to easily do this sort of transformation?
Thanks

Based on the limited information you have given, here is, I think, a not-very-pretty solution:
NNdf <- data.frame(NN1=c(1,2,4),NN2=c(2,3,1)) # make up some data
NNdf$origin <- rownames(NNdf)
NNdf
# NN1 NN2 origin
#1 1 2 1
#2 2 3 2
#3 4 1 3
library(reshape2)
hold <- melt(NNdf, id = "origin")
hold
# origin variable value
#1 1 NN1 1
#2 2 NN1 2
#3 3 NN1 4
#4 1 NN2 2
#5 2 NN2 3
#6 3 NN2 1
hold2 <- dcast(hold, origin~value, value.var="variable")
hold2[hold2 == "NN1"] <- 1
hold2[hold2 == "NN2"] <- 2
hold2[is.na(hold2) ] <- 0
hold2
# origin 1 2 3 4
#1 1 1 2 0 0
#2 2 0 1 2 0
#3 3 2 0 0 1
(this might require something like apply(hold2, 1, as.numeric) afterwards to get a numeric matrix)
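As a small follow-up (my own suggestion, not part of the original answer): apply(hold2, 1, as.numeric) returns a transposed result, so converting column by column may be safer. A minimal sketch, assuming the replacements above left the neighbour columns holding the character values "0", "1" and "2":
# convert the neighbour columns to a numeric 0/1/2 matrix;
# as.character() first guards against the columns being factors
mat <- sapply(hold2[-1], function(col) as.numeric(as.character(col)))
rownames(mat) <- hold2$origin
mat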

Another possibility, though not especially prettier. Thanks to @user1317221_G for the sample data!
NNdf <- data.frame(NN1=c(1,2,4,3),NN2=c(2,3,1,2))
NNdf$origin <- as.numeric(rownames(NNdf))
NNdf
NN1 NN2 origin
1 1 2 1
2 2 3 2
3 4 1 3
4 3 2 4
res <- matrix(0,nrow(NNdf),nrow(NNdf))
res[as.matrix(NNdf[,c("origin","NN1")])] <- 1
res[as.matrix(NNdf[,c("origin","NN2")])] <- 2
res
[,1] [,2] [,3] [,4]
[1,] 1 2 0 0
[2,] 0 1 2 0
[3,] 2 0 0 1
[4,] 0 2 1 0
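For completeness, here is a hedged sketch of how the same matrix-indexing idea could be wired directly to the output of RANN::nn2. The point data are made up, and I am assuming the points are queried against themselves so that the first column of nn.idx is each point itself and needs dropping; adjust to your own call.
library(RANN)
set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)   # 10 made-up points
nn  <- nn2(pts, k = 3)               # self + 2 nearest neighbours
idx <- nn$nn.idx[, -1]               # drop the self-match column (assumption)

res <- matrix(0, nrow(pts), nrow(pts))
res[cbind(seq_len(nrow(pts)), idx[, 1])] <- 1   # 1st nearest neighbour
res[cbind(seq_len(nrow(pts)), idx[, 2])] <- 2   # 2nd nearest neighbour
res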

Related

Add group to data frame using monotonically increasing numbers

I've got a data frame that looks like this (the real data is much larger and more complicated):
df.test = data.frame(
  sample = c("a","a","a","a","a","a","b","b"),
  day = c(0,1,2,0,1,3,0,2),
  value = rnorm(8)
)
sample day value
1 a 0 -1.11182146
2 a 1 0.65679637
3 a 2 0.03652325
4 a 0 -0.95351736
5 a 1 0.16094840
6 a 3 0.06829702
7 b 0 0.33705141
8 b 2 0.24579603
The data frame is organized by experiments, but the experiment ids are missing. The same sample can be used in different experiments, but I know that within a single experiment the days start from 0 and are monotonically increasing.
How can I add experiment ids as numbers {1, 2, ...}?
So the resulting data frame would be
sample day value exp
1 a 0 -1.11182146 1
2 a 1 0.65679637 1
3 a 2 0.03652325 1
4 a 0 -0.95351736 2
5 a 1 0.16094840 2
6 a 3 0.06829702 2
7 b 0 0.33705141 3
8 b 2 0.24579603 3
I would appreciate any help, especially with a tidy/dplyr solution.
As indicated in the comments, you can do this with cumsum:
library(dplyr)
df.test %>% mutate(exp = cumsum(day == 0))
## sample day value exp
## 1 a 0 0.09300394 1
## 2 a 1 0.85322925 1
## 3 a 2 -0.25167313 1
## 4 a 0 -0.14811243 2
## 5 a 1 -1.86789014 2
## 6 a 3 0.45983987 2
## 7 b 0 2.81199150 3
## 8 b 2 0.31951634 3
You can also use diff():
library(dplyr)
df.test %>% mutate(exp = cumsum(c(TRUE, diff(day) < 0)))
# sample day value exp
#1 a 0 -0.3382010 1
#2 a 1 2.2241041 1
#3 a 2 2.2202612 1
#4 a 0 1.0359635 2
#5 a 1 0.4134727 2
#6 a 3 1.0144439 2
#7 b 0 -0.1292119 3
#8 b 2 -0.1191505 3
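For completeness, the same idea works in base R without dplyr (a minimal sketch, relying on the same assumption that every experiment starts at day 0):
df.test$exp <- cumsum(df.test$day == 0)   # new experiment whenever day resets to 0
df.test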

Add dynamic subset conditions as variables into the data.frame

I have a data frame like
> x = data.frame(A=c(1,2,3),B=c(2,3,4))
> x
A B
1 1 2
2 2 3
3 3 4
and subsetting conditions in a data frame like
> cond = data.frame(condition=c('A>1','B>2 & B<4'))
> cond
condition
1 A>1
2 B>2 & B<4
which I then apply dynamically
> eval(parse(text=paste0("subset(x,",cond[1,'condition'],")")))
A B
2 2 3
3 3 4
> eval(parse(text=paste0("subset(x,",cond[2,'condition'],")")))
A B
2 2 3
Now, instead of subsetting, I would like to add the subsetting conditions as variables into the data. The end result would look like
A B condition1 condition2
1 1 2 0 0
2 2 3 1 1
3 3 4 1 0
How could I derive the above table using the dynamic conditions?
Before using eval(parse()), I hope you have gone through some reading, such as
What specifically are the dangers of eval(parse(…))?
and the many other resources available.
However, to answer your question, we can continue with your approach and use eval(parse()) inside sapply:
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
To add it to the data frame:
x[paste0("condition", 1:nrow(cond))] <-
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
x
# A B condition1 condition2
#1 1 2 0 0
#2 2 3 1 1
#3 3 4 1 0
Simplifying it a bit (using @jogo's comment):
+(sapply(cond$condition, function(i) with(x, eval(parse(text=as.character(i))))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
Here is an option using the tidyverse:
library(tidyverse)
x %>%
  mutate(!!! rlang::parse_exprs(str_c(cond$condition, collapse=";"))) %>%
  rename_at(3:4, ~ paste0("condition", 1:2))
# A B condition1 condition2
#1 1 2 FALSE FALSE
#2 2 3 TRUE TRUE
#3 3 4 TRUE FALSE
If needed, the logical columns can easily be converted to binary with as.integer().
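A minimal sketch of that last step (the column positions 3:4 are taken from the example, and across() assumes dplyr >= 1.0):
x %>%
  mutate(!!! rlang::parse_exprs(str_c(cond$condition, collapse = ";"))) %>%
  rename_at(3:4, ~ paste0("condition", 1:2)) %>%
  mutate(across(starts_with("condition"), as.integer))   # TRUE/FALSE -> 1/0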

detect missings (NA or 0) in data frame

I want to create a new variable in a data frame that contains information about the other variables.
I have a large data frame. To keep it short, let's say:
a <- c(1,0,2,3)
b <- c(3,0,1,1)
c <- c(2,0,2,2)
d <- c(4,1,1,1)
(df <- data.frame(a,b,c,d) )
a b c d
1 1 3 2 4
2 0 0 0 1
3 2 1 2 1
4 3 1 2 1
Aim: create a new variable that informs me if a person (row) has zero reports (or missings / NA) either in the variables a+b or in the variables c+d.
a b c d x
1 1 3 2 4 1
2 0 0 0 1 NA
3 2 1 2 1 1
4 3 1 2 1 1
As I have a large data frame, I was thinking about using df[1:2] and df[3:4] so that I do not need to type every variable name. But I am not sure what the best way to implement it is. Maybe dplyr has a nice option?
df$x <- ifelse(rowSums(df), 1, NA)
EDIT: Answer to the updated question:
df$x <- ifelse(rowSums(df[1:2])&rowSums(df[3:4]), 1, NA)
gives,
a b c d x
1 1 3 2 4 1
2 0 0 0 1 NA
3 2 1 2 1 1
4 3 1 2 1 1
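If the data really can contain NA values and they should count as "no report" (my reading of the question, not part of the answer above), a hedged variant:
ab <- rowSums(df[1:2], na.rm = TRUE)   # sum of a and b, NAs treated as 0
cd <- rowSums(df[3:4], na.rm = TRUE)   # sum of c and d, NAs treated as 0
df$x <- ifelse(ab > 0 & cd > 0, 1, NA)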

recode values into one column

I have a data frame with one value (a 1) per row, potentially in any one of several columns. How can I create a single column that contains the column number the 1 is in? I would like to do this using dplyr, but the only methods I can think of involve for loops, which seem very un-R-like.
df <- data.frame(
  a = c(1,0,0,0),
  b = c(0,1,1,0),
  c = c(0,0,0,1)
)
a b c
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
GOAL:
1 1
2 2
3 2
4 3
There is no need for dplyr here. This is what max.col() is for. Since all the other values in the row will be zero, max.col() will give us the column number where the 1 appears.
max.col(df)
# [1] 1 2 2 3
If you need a column, then
data.frame(x = max.col(df))
# x
# 1 1
# 2 2
# 3 2
# 4 3
Or cbind() or matrix() for a matrix.
We could also do
as.matrix(df) %*% seq_along(df)
# [,1]
#[1,] 1
#[2,] 2
#[3,] 2
#[4,] 3
which(df==1, arr.ind=T)
# row col
# [1,] 1 1
# [2,] 2 2
# [3,] 3 2
# [4,] 4 3
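If the two-column layout from the GOAL block is wanted literally, a small sketch combining the ideas above:
data.frame(row = seq_len(nrow(df)), col = max.col(df))
#   row col
# 1   1   1
# 2   2   2
# 3   3   2
# 4   4   3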

Subsequent row summing in dataframe object

I would like to compute a running (subsequent) row sum of one column's values, grouped by another column's value, and put the result into a new column variable without deleting any rows.
Below is some R code and an example that does the trick and hopefully illustrates my question. I was wondering if there is a more elegant way to do this, since the for loop will be time consuming on my actual object.
Thanks for any feedback.
As an example dataframe:
MyDf <- data.frame(ID = c(1,1,1,2,2,2), Y = 1:6)
MyDf$FIRST <- c(1,0,0,1,0,0)
MyDf.2 <- MyDf
MyDf.2$Y2 <- c(1,3,6,4,9,15)
The purpose of this is so that I can write code that calculates Y2 in MyDf.2 above for each ID, separately.
This is what I came up with, and it does the trick (calculating a TEST column in MyDf that should be equal to Y2 in MyDf.2):
MyDf$TEST <- NA
for (i in 1:length(MyDf$Y)) {
  MyDf[i, ]$TEST <- ifelse(MyDf[i, ]$FIRST == 1, MyDf[i, ]$Y, MyDf[i, ]$Y + MyDf[i - 1, ]$TEST)
}
MyDf
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
MyDf.2
ID Y FIRST Y2
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
You need ave and cumsum to get the column you want. transform is just to modify your existing data.frame.
> (MyDf <- transform(MyDf, TEST=ave(Y, ID, FUN=cumsum)))
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
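The same grouped cumulative sum can also be written with dplyr, in case that fits better into an existing pipeline (a sketch, assuming the dplyr package is acceptable here):
library(dplyr)
MyDf %>%
  group_by(ID) %>%
  mutate(TEST = cumsum(Y)) %>%
  ungroup()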
