I have a vector of objects (object) along with a corresponding vector of time frames (tframe) in which the objects were observed. For each unique pair of objects, I want to calculate the number of time frames in which both objects were observed.
I can write the code using for() loops, but it takes a long time to run as the number of unique objects increases. How might I change the code to speed up the run time?
Below is an example with 4 unique objects (in reality I have about 300). For example, objects a and c were both observed in time frames 1 and 2, so they get a count of 2. Objects b and d were never observed in the same time frame, so they get a count of 0.
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)
uo <- unique(object)
n <- length(uo)
mpairs <- matrix(NA, nrow=n*(n-1)/2, ncol=3, dimnames=list(NULL,
c("obj1", "obj2", "sametf")))
row <- 0
for(i in 1:(n-1)) {
for(j in (i+1):n) {
row <- row+1
mpairs[row, "obj1"] <- uo[i]
mpairs[row, "obj2"] <- uo[j]
# no. of time frames in which both objects in a pair were observed
intwin <- intersect(tframe[object==uo[i]], tframe[object==uo[j]])
mpairs[row, "sametf"] <- length(intwin)
}}
data.frame(object, tframe)
object tframe
1 a 1
2 a 1
3 a 2
4 b 2
5 b 3
6 c 1
7 c 2
8 c 2
9 c 3
10 d 1
mpairs
obj1 obj2 sametf
[1,] "a" "b" "1"
[2,] "a" "c" "2"
[3,] "a" "d" "1"
[4,] "b" "c" "2"
[5,] "b" "d" "0"
[6,] "c" "d" "1"
You can use tcrossprod() on a presence/absence table to get, for every pair of objects, the number of time frames they share. You can then reshape the
result, if required.
Example
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)
# This will give you the counts
# Use code from Jean's comment
tab <- tcrossprod(table(object, tframe)>0)
# Reshape the data
tab[lower.tri(tab, TRUE)] <- NA
reshape2::melt(tab, na.rm=TRUE)
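If you would rather not depend on reshape2, the same reshaping can be sketched in base R. This is only an illustration: the names present, shared, idx and pairs are made up here, and the presence table is multiplied by 1L so that tcrossprod() receives a numeric matrix.

```r
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

# Presence/absence matrix: one row per object, one column per time frame
present <- (unclass(table(object, tframe)) > 0) * 1L
objs <- rownames(present)

# Entry [i, j] is the number of time frames in which objects i and j both appear
shared <- tcrossprod(present)

# Keep each unordered pair once by taking the strict upper triangle
idx <- which(upper.tri(shared), arr.ind = TRUE)
pairs <- data.frame(
  obj1   = objs[idx[, "row"]],
  obj2   = objs[idx[, "col"]],
  sametf = shared[idx],
  stringsAsFactors = FALSE
)

# Sort so the rows come out in the same order as the loop version
pairs <- pairs[order(pairs$obj1, pairs$obj2), ]
pairs
```

The matrix product replaces the double loop, so the work grows with the size of the object-by-time-frame table rather than with repeated calls to intersect().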
I have data in a form like this (reproducible code below):
#> y x char
#> 1 1 1 a
#> 2 1 2 b
#> 3 1 3 c
#> 4 2 1 d
#> 5 2 2 e
#> 6 2 3 f
#> 7 3 1 g
#> 8 3 2 h
#> 9 3 3 i
df <- data.frame(stringsAsFactors=FALSE,
y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
char = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)
df
Is there an easy way to print to the screen using y values as the y axis, and x values as the x axis and the third column (char) as the values? A solution with map() would be great.
So my desired output would look like
abc
def
ghi
I started trying to loop through y and x, with a view to using purrr::map(), but I haven't gotten very far.
if (df$y==1 & df$x==1){
print(df$char)
}
That's what tidyr::spread() is for:
spread(df, x, char)
You can also convert your data.frame into a matrix:
a <- matrix(
data = df$char,
nrow = length(unique(df$y)), # one row per y value
ncol = length(unique(df$x)), # one column per x value
dimnames = list(unique(df$y), unique(df$x)),
byrow = TRUE # fill row by row; assumes df is sorted by y, then x
)
The result is:
1 2 3
1 "a" "b" "c"
2 "d" "e" "f"
3 "g" "h" "i"
To concatenate the strings into a column as you wish:
for (r in 1:nrow(a)) {
print(paste(a[r, ], collapse = ''))
}
[1] "abc"
[1] "def"
[1] "ghi"
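If the matrix itself is not needed, the same lines can probably be produced directly from the data frame. A minimal base R sketch, assuming every (y, x) cell appears exactly once; ord and rows are names invented for this example:

```r
df <- data.frame(
  y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  char = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  stringsAsFactors = FALSE
)

# Sort by y, then x, so characters are pasted in left-to-right order
ord <- df[order(df$y, df$x), ]

# Paste the characters of each y group into one string
rows <- tapply(ord$char, ord$y, paste, collapse = "")

# Print one string per line, without quotes or index prefixes
cat(rows, sep = "\n")
```

Unlike print(), cat() writes the bare strings, which matches the desired output in the question more closely.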
How do I pull out the unique values from each column of a data frame (both numeric and string columns) and combine them into one column?
a = c("a", "b", "c", "d", "a")
b = c(1, 2, 3, 4, 3)
df <- cbind(a, b)
The preferred output would be:
variable Level
a a
a b
a c
a d
b 1
b 2
b 3
b 4
The sample data above is simple, but the intent is to be able to use the answer for multiple data frames with different column names and data in them. Thank you.
Quick + scalable
Tidyr's gather and dplyr's distinct give you a quick way to get that structure. (I left the package calls in the functions so you can remember which one is from which package, which I always forget.)
library(tidyverse)
a = c("a", "b", "c", "d", "a")
b = c(1, 2, 3, 4, 3)
data.frame(a,b) %>% tidyr::gather() %>% dplyr::distinct()
key value
1 a a
2 a b
3 a c
4 a d
5 b 1
6 b 2
7 b 3
8 b 4
We place it in a list, get the unique elements, set the names with letters and then stack to data.frame
d1 <- stack(setNames(lapply(list(a, b), unique), letters[1:2]))[2:1]
colnames(d1) <- c('variable', 'Level')
Creating df (note that cbind() returns a character matrix here, not a data.frame):
a = c("a", "b", "c", "d", "a")
b = c(1, 2, 3, 4, 3)
df <- cbind(a, b)
Column name extraction
names<-colnames(df)
Data extraction
variable<-NULL
Level<-NULL
for(i in 1:length(names))
{
variable<-c(variable,rep(names[i],length(unique(df[,i]))))
Level<-c(Level,unique(df[,i]))
}
Your generic output
db<-cbind(variable,Level)
db
variable Level
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "a" "d"
[5,] "b" "1"
[6,] "b" "2"
[7,] "b" "3"
[8,] "b" "4"
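Since the question asks for something reusable across data frames with different column names, the idea could be wrapped in a small function. This is a sketch only: unique_levels is a hypothetical helper name, and it assumes that converting every value to character is acceptable.

```r
# Hypothetical helper: one row per (column, unique value) pair
unique_levels <- function(dat) {
  # Unique values of each column, as character so the types can be combined
  lv <- lapply(dat, function(col) as.character(unique(col)))
  data.frame(
    variable = rep(names(lv), lengths(lv)),
    Level = unlist(lv, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

a <- c("a", "b", "c", "d", "a")
b <- c(1, 2, 3, 4, 3)
res <- unique_levels(data.frame(a, b))
res
```

Because the result is built in one step from lapply(), the vectors are not grown inside a loop, which tends to matter once the inputs are large.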
I have a matrix whose last column contains characters:
A
B
B
A
...
I would like to replace A with 1 and B with 2 in R. The expected result should be:
1
2
2
1
...
If you are 100% confident that only "A" and "B" appear:
sample_data = c("A", "B", "B", "A")
sample_data
# [1] "A" "B" "B" "A"
as.numeric(gsub("A", 1, gsub("B", 2, sample_data)))
# [1] 1 2 2 1
Using factor or a simple lookup table would be much more flexible:
sample_data = c("A", "B", "B", "A")
Recommended:
as.numeric(factor(sample_data))
# [1] 1 2 2 1
Possible alternative:
as.numeric(c("A" = "1", "B" = "2")[sample_data])
# [1] 1 2 2 1
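One way to see why the lookup table is more flexible is to add a value that has no mapping. In the sketch below the extra "C" and the names lookup, codes and codes_f are made up for illustration; an unmapped value comes back as NA instead of being silently assigned a number.

```r
sample_data <- c("A", "B", "B", "A", "C")

# Named numeric vector used as a lookup table
lookup <- c(A = 1, B = 2)

# Index by name; values without an entry become NA
codes <- unname(lookup[sample_data])
codes

# The factor approach behaves the same way if the levels are given explicitly
codes_f <- as.integer(factor(sample_data, levels = c("A", "B")))
codes_f
```

Without explicit levels, factor() would sort all observed values and give "C" the code 3, which may or may not be what you want.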
How can I merge two lists in parallel (element by element) rather than appending one to the other?
For example, given
A <- list(c(1,2,3), c(3,4,5), c(6,7,8))
B <- list(c("a", "b", "c"), c("d", "e", "f"), c("g", "h", "i"))
the desired result is:
[[1]]
[[1]][[1]]
[1] 1 2 3
[[1]][[2]]
[1] "a" "b" "c"
[[2]]
[[2]][[1]]
[1] 3 4 5
[[2]][[2]]
[1] "d" "e" "f"
[[3]]
[[3]][[1]]
[1] 6 7 8
[[3]][[2]]
[1] "g" "h" "i"
Using Map simply:
Map(list,A,B)
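As a quick check of what Map() does here, the sketch below builds the merged list and inspects a couple of elements. The variable name merged is only for illustration; Map() is a thin wrapper around mapply() with SIMPLIFY = FALSE.

```r
A <- list(c(1, 2, 3), c(3, 4, 5), c(6, 7, 8))
B <- list(c("a", "b", "c"), c("d", "e", "f"), c("g", "h", "i"))

# Pair up the i-th element of A with the i-th element of B
merged <- Map(list, A, B)

# Second pair: the numeric vector from A and the character vector from B
merged[[2]][[1]]
merged[[2]][[2]]

# Same result spelled out with mapply()
same <- identical(merged, mapply(list, A, B, SIMPLIFY = FALSE))
same
```

Note that Map() recycles shorter arguments, so lists of unequal length are paired by repetition rather than padded.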
A longer approach (not recursive yet; it merges up to the second level):
A <- list(c(1,2,3), c(3,4,5), c(6,7,8))
B <- list(c("a", "b", "c"), c("d", "e", "f"), c("g", "h", "i"))
mergepar <- function(x = A, y = B) { # merge two lists in parallel
  ln <- max(length(x), length(y)) # length of the longer list
  newlist <- vector("list", ln) # empty list of that length
  for (i in seq_len(ln)) { # loop over positions
    # subset with [ first and then [[, so a position past the end of the
    # shorter list yields NULL instead of a "subscript out of bounds" error
    newlist[[i]] <- lapply(list(x, y), function(l) l[i][[1]])
  }
  return(newlist)
}
I have the data.frame
df<-data.frame("Site.1" = c("A", "B", "C"),
"Site.2" = c("D", "B", "B"),
"Tsim" = c(2, 4, 7),
"Jaccard" = c(5, 7, 1))
# Site.1 Site.2 Tsim Jaccard
# 1 A D 2 5
# 2 B B 4 7
# 3 C B 7 1
I can get the unique levels for each column using
top.x<-unique(df[1:2,c("Site.1")])
top.x
# [1] A B
# Levels: A B C
top.y<-unique(df[1:2,c("Site.2")])
top.y
# [1] D B
# Levels: B D
How do I get the unique levels for both columns and turn them into a vector i.e:
v <- c("A", "B", "D")
v
# [1] "A" "B" "D"
top.xy <- unique(unlist(df[1:2, c("Site.1", "Site.2")]))
top.xy
[1] A B D
Levels: A B C D
Try union:
union(top.x, top.y)
# [1] "A" "B" "D"
union(unique(df[1:2, c("Site.1")]),
unique(df[1:2, c("Site.2")]))
# [1] "A" "B" "D"
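If the goal is a plain character vector regardless of whether the site columns are factors, converting each column with as.character() before combining is one option. A sketch under the assumption that the columns are factors, as in the printed output above; site_cols and v are illustrative names.

```r
df <- data.frame(
  Site.1 = c("A", "B", "C"),
  Site.2 = c("D", "B", "B"),
  Tsim = c(2, 4, 7),
  Jaccard = c(5, 7, 1),
  stringsAsFactors = TRUE  # reproduce the factor columns shown in the question
)

site_cols <- c("Site.1", "Site.2")

# Convert each site column (first two rows only) to character, then pool
v <- unique(unlist(lapply(df[1:2, site_cols], as.character), use.names = FALSE))
v
```

This keeps the numeric columns out of the result and drops the factor levels, so v prints as an ordinary character vector.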
You can get the unique levels for the first two columns:
de<- apply(df[,1:2],2,unique)
de
# $Site.1
# [1] "A" "B" "C"
# $Site.2
# [1] "D" "B"
Then you can take the symmetric difference of the two sets:
union(setdiff(de$Site.1,de$Site.2), setdiff(de$Site.2,de$Site.1))
# [1] "A" "C" "D"
If you're interested in just the first two rows (as in your example):
de<- apply(df[1:2,1:2],2,unique)
de
# Site.1 Site.2
# [1,] "A" "D"
# [2,] "B" "B"
union(de[,1],de[,2])
# [1] "A" "B" "D"