I have a list of vectors and another vector. I would like to arrange the list of vectors according to the values of the other vector.
a <- c(1, 2)
b <- c(1, 4)
c <- c(1, 1)
x <- list(a, b, c) # list of vector
v <- c(3, 2, 5) # other vector
Here I want to arrange x according to v. So the desired output will be:
2 b
3 a
5 c
Here is an option with stack and arrange
library(dplyr)
v %>%
setNames(letters[1:3]) %>%
stack %>%
arrange(values)
# values ind
#1 2 b
#2 3 a
#3 5 c
Bind vector v to the names of list x as a second column, then order the rows by v.
It will look something like this:
cbind(as.data.frame(v), col = names(x))[order(v),]
# v col
#2 2 b
#1 3 a
#3 5 c
Data:
a <- c(1, 2)
b <- c(1, 4)
c <- c(1, 1)
x <- list(a=a, b=b, c=c) # list of vector
v <- c(3, 2, 5) # other vector
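If the goal is the reordered list itself rather than only a table of names, a minimal base R sketch is to index the list with order(v). The names ord and x_sorted are illustrative, not from the answers above, and the list is assumed to be named as in the Data block.

```r
# Same data as above; the list is named so the names travel with the elements.
a <- c(1, 2)
b <- c(1, 4)
c <- c(1, 1)
x <- list(a = a, b = b, c = c)
v <- c(3, 2, 5)

ord <- order(v)          # permutation that sorts v in ascending order
x_sorted <- x[ord]       # list elements rearranged in that order
names(x_sorted)
# [1] "b" "a" "c"

# Lookup table of the sorted values and the matching list names
data.frame(v = v[ord], name = names(x)[ord])
```

Because the same permutation is applied to both v and x, the two stay aligned without any joins.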
I am trying to create a new variable for each observation using the following formula:
Index = ∑(BAj / DISTANCEij)
where:
j = focal observation; i = other observation
Basically, I take the focal individual (j), find the Euclidean distance between it and another point (i), and divide the focal BA by that distance. I do that for all the other points, sum the results, and then repeat all of this for each point.
Here is some sample data:
ID <- 1:4
BA <- c(3, 5, 6, 9)
x <- c(0, 2, 3, 7)
y <- c(1, 3, 4, 9)
df <- data.frame(ID, BA, x, y)
print(df)
ID BA x y
1 1 3 0 1
2 2 5 2 3
3 3 6 3 4
4 4 9 7 9
Currently, I've extracted vectors and written a function to calculate part of the formula, shown here:
vec1 <- df[1, ]
vec2 <- df[2, ]
dist <- function(vec1, vec2) vec1$BA/sqrt((vec2$x - vec1$x)^2 +
(vec2$y - vec1$y)^2)
My question is how do I repeat this with the x and y values for vec2 changing for each new other point with vec1 remaining the same and then sum them all together?
We may loop over the row sequence, extract the data and apply the dist function
library(dplyr)
library(purrr)
df %>%
mutate(dist_out = map_dbl(row_number(), ~ {
othr <- cur_data()[-.x,]
cur <- cur_data()[.x, ]
sum(dist(cur, othr))
}))
Output:
ID BA x y dist_out
1 1 3 0 1 2.049983
2 2 5 2 3 5.943485
3 3 6 3 4 6.593897
4 4 9 7 9 3.404545
Here are two base R ways.
1. for loop
ID <- 1:4
BA <- c(3, 5, 6, 9)
x <- c(0, 2, 3, 7)
y <- c(1, 3, 4, 9)
df <- data.frame(ID, BA, x, y)
n <- nrow(df)
d <- dist(df[c("x", "y")], upper = TRUE)
d <- as.matrix(d)
Index <- numeric(n)
for(j in seq_len(n)) {
d_j <- d[-j, j, drop = TRUE]
Index[j] <- sum(df$BA[j]/d_j)
}
Index
#> [1] 2.049983 5.943485 6.593897 3.404545
Created on 2022-08-18 by the reprex package (v2.0.1)
2. sapply loop
Index <- sapply(seq_len(n), \(j) sum(df$BA[j]/d[-j, j, drop = TRUE]))
Index
#> [1] 2.049983 5.943485 6.593897 3.404545
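The same index can also be sketched without an explicit loop by dividing BA by the whole distance matrix at once. This is only one possible vectorised variant, assuming the sample data above. stats::dist is called with its namespace because a user-defined dist() (as in the question) would otherwise mask it.

```r
df <- data.frame(ID = 1:4, BA = c(3, 5, 6, 9),
                 x = c(0, 2, 3, 7), y = c(1, 3, 4, 9))

# Pairwise Euclidean distances between all points, as a full square matrix
d <- as.matrix(stats::dist(df[c("x", "y")]))
diag(d) <- Inf                 # BA / Inf is 0, so each point ignores itself

# df$BA is recycled down the columns, so row j holds BA[j] / d[j, i]
Index <- rowSums(df$BA / d)
round(Index, 6)
```

Setting the diagonal to Inf avoids dividing by zero and removes the need to drop the focal row inside a loop.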
I want to match vector 1 to vector 2 to see whether items in vector 1 are found in vector 2. Then I want to create 2 new vectors: a subset of vector 1 with the values contained in both vectors, and a subset of vector 1 with the values not found in both vectors. The match() function followed by which(is.na()) works great for small data sets, but I have a data set with 1000 elements.
Data1 <- c(1, 2, 3, 4, 5)
Data2 <- c(1, 3, 5, 6, 7)
#Match vector1 to vector2
A <- match(Data1, Data2)
[1] 1 NA 2 NA 3
#to obtain positions of non matching elements
x <- which(is.na(A), arr.ind = TRUE)
[1] 2 4
Data1[c(2,4)]
#to obtain positions of matching elements
y <- which(A >= 1)
[1] 1 3 5
Data1[c(1,3,5)]
Try this so you do not have to deal with the NAs from match():
Data1 <- c(1, 2, 3, 4, 5)
Data2 <- c(1, 3, 5, 6, 7)
# Values of Data1 in Data2
A <- Data1[Data1 %in% Data2]
A
# output:
# > A
# [1] 1 3 5
# create not in function
'%ni%' <- Negate('%in%')
# Values of Data1 not in Data2
B <- Data1[Data1 %ni% Data2]
B
# output:
# > B
# [1] 2 4
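A variant of the same idea, sketched here as an assumption rather than as part of the answer above, computes the %in% mask once and reuses it for both subsets, so no helper operator is needed. The names in_both, matched and unmatched are illustrative.

```r
Data1 <- c(1, 2, 3, 4, 5)
Data2 <- c(1, 3, 5, 6, 7)

in_both <- Data1 %in% Data2    # one logical flag per element of Data1

matched   <- Data1[in_both]    # values of Data1 found in Data2
unmatched <- Data1[!in_both]   # values of Data1 not found in Data2

which(in_both)                 # positions of the matches, if needed
# [1] 1 3 5
which(!in_both)                # positions of the non-matches
# [1] 2 4
```

The mask has the same length as Data1, so it scales to long vectors without inspecting NA values from match().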
I want to combine multiple sets of two data frames (a & a_1, b & b_1, etc.). Basically, I want to do what this question is asking. I created a list of my two data sets:
# create data
a <- c(1, 2, 3)
b <- c(2, 3, 4)
at0H0 <- data.frame(a, b)
c <- c(1, 2, 3)
d <- c(2, 3, 4)
at0H0_1 <- data.frame(c, d)
e <- c(1, 2, 3)
f <- c(2, 3, 4)
at0H1 <- data.frame(a, b)
g <- c(1, 2, 3)
h <- c(2, 3, 4)
at0H1_1 <- data.frame(c, d)
# create lists of names
names <- list("at0H0", "at0H1")
namesLPC <- list("at0H0_1", "at0H1_1")
# column bind the data frames?
dfList <- list(cbind(names, namesLPC))
do.call(cbind, dfList)
But now I need it to create data frames for each. This do.call function just creates a list of the names of the data frames. Thanks!
(Edited to make reproducible code)
It's not super straightforward, but with a little editing to a joining function you can get there:
joinfun <- function(x) do.call(cbind, unname(mget(x,inherits=TRUE)))
lapply(Map(c, names, namesLPC), joinfun)
#[[1]]
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
#
#[[2]]
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
The Map function pairs up the dataset names as required:
Map(c, names, namesLPC)
#[[1]]
#[1] "at0H0" "at0H0_1"
#
#[[2]]
#[1] "at0H1" "at0H1_1"
The lapply then loops over each part of the above list to mget (multiple-get) each object into a combined list. Like so, for the first part:
unname(mget(c("at0H0","at0H0_1"),inherits=TRUE))
#[[1]]
# a b
#1 1 2
#2 2 3
#3 3 4
#
#[[2]]
# c d
#1 1 2
#2 2 3
#3 3 4
Finally, do.call(cbind, ...) puts this combined list back into a single data.frame:
do.call(cbind, unname(mget(c("at0H0","at0H0_1"),inherits=TRUE)))
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
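If the data frames can be kept in two named lists instead of as separate objects in the workspace, the lookup by name is no longer needed. The following is a sketch under that assumption. base_list, lpc_list and combined are hypothetical names, and the data mirror the example above.

```r
# Hypothetical restructuring: the data frames live in two parallel named lists
base_list <- list(
  at0H0 = data.frame(a = c(1, 2, 3), b = c(2, 3, 4)),
  at0H1 = data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
)
lpc_list <- list(
  at0H0_1 = data.frame(c = c(1, 2, 3), d = c(2, 3, 4)),
  at0H1_1 = data.frame(c = c(1, 2, 3), d = c(2, 3, 4))
)

# Map() walks both lists in parallel and column-binds each pair
combined <- Map(cbind, base_list, lpc_list)
names(combined)
# [1] "at0H0" "at0H1"
combined$at0H0
```

The result takes its names from the first list, so each combined data frame can be retrieved by the name of its base data set.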
I've figured out a way to do it. A few notes: I have 360 data sets that I need to combine, which is why it is i in 1:360. This also names the data sets from an array of the names of the data sets (which is dataNames)
for (i in 1:360){
assign(dataNames[i], cbind(get(names[[i]]), get(namesLPC[[i]])))
}
I have an integer based dataframe with positional coordinates in one column and a variable in the second. The coordinates range from 1-10 million, the variables from 0-950 - I'm interested in returning the sum of the variables from ranges defined within a separate frame containing the start and end points of the desired range.
To make things a bit easier to compute I've shortened the example:
Data:
a = seq(1,5)
b = c(0,0,1,0,2)
df1 <- data.frame(a, b)
c = c(1,1,2,2,3)
d = c(3,4,3,5,4)
df2 <- data.frame(c,d)
df1:
1, 0
2, 0
3, 1
4, 0
5, 2
df2:
1, 3
1, 4
2, 3
2, 5
3, 4
magic
output:
1,
1,
1,
3,
1,
Where magic is pulling the start and end positions in df2 columns 1 and 2 to pass to rowSums for df1 extraction.
Edit: @Frank's data.table solution: short and fast (df1 and df2 must first be converted with setDT(), as in the code further below).
df2[, s := df1[df2, on=.(a >= c, a <= d), sum(b), by=.EACHI]$V1]
# output
c d s
1: 1 3 1
2: 1 4 1
3: 2 3 1
4: 2 5 3
5: 3 4 1
Another way (may be slower but works):
library(data.table)
setDT(df1)
setDT(df2)
## magic function
get_magic <- function(x)
{
spell <- c()
one <- unlist(x[1])
two <- unlist(x[2])
a <- df1[between(a, one, two), sum(b)]
spell <- append(spell, a)
return(spell)
}
# applies to row
d <- apply(df2, 1, get_magic)
print(d)
# output
[1] 1 1 1 3 1
One possible solution is to use mapply. I have used a custom function, but one could write an inline function as part of the mapply statement.
# define the helper before calling it
row_sum <- function(x, y){
sum(df1[x:y,2])
}
mapply(row_sum, df2$c, df2$d)
#Result
#[1] 1 1 1 3 1
Data
a = seq(1,5)
b = c(0,0,1,0,2)
df1 <- data.frame(a, b)
c = c(1,1,2,2,3)
d = c(3,4,3,5,4)
df2 <- data.frame(c,d)
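For millions of positions, re-summing every range can become slow. One common alternative is a prefix-sum (cumulative sum) lookup, sketched below. It assumes, like the mapply answer, that df1$a is exactly 1..n in order, so that a position equals its row index. The names cs and range_sum are illustrative.

```r
df1 <- data.frame(a = 1:5, b = c(0, 0, 1, 0, 2))
df2 <- data.frame(c = c(1, 1, 2, 2, 3), d = c(3, 4, 3, 5, 4))

# cs[k + 1] holds the sum of b over positions 1..k; cs[1] is the empty sum
cs <- c(0, cumsum(df1$b))

# Sum over [c, d] = (sum over 1..d) minus (sum over 1..(c - 1))
range_sum <- cs[df2$d + 1] - cs[df2$c]
range_sum
# [1] 1 1 1 3 1
```

Each range then costs two vector lookups, regardless of how wide the range is.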
I have a list of lists and I want to convert it into a data frame. The challenge is that some variable names are missing from some lists (not NAs; the variable is missing completely).
To illustrate with an example, from
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
I would like to get
a b c
[1,] 1 2 3
[2,] 4 NA 6
Another option is
library(reshape2)
as.data.frame(acast(melt(my_list), L1~L2, value.var='value'))
# a b c
#1 1 2 3
#2 4 NA 6
Or, as @David Arenburg suggested, a wrapper for melt/dcast would be recast:
recast(my_list, L1 ~ L2, value.var = 'value')[, -1]
# a b c
#1 1 2 3
#2 4 NA 6
You can use the bind_rows function from the dplyr package:
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
dplyr::bind_rows(lapply(my_list, as.data.frame))
This outputs:
Source: local data frame [2 x 3]
a b c
1 1 2 3
2 4 NA 6
Another answer; this one requires changing the class of the arguments to data.frame:
library(plyr)
lista <- list(a=1, b=2, c =3)
listb <- list(a=4, c=6)
lista <- as.data.frame(lista)
listb <- as.data.frame(listb)
my_list <- list(lista, listb)
my_list <- do.call(rbind.fill, my_list)
my_list
a b c
1 1 2 3
2 4 NA 6
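For completeness, a base R sketch without extra packages: collect every name that occurs, index each element by that full set of names so that absent ones become NA, and row-bind the results. It assumes every value is a single number. all_names, rows and result are illustrative names.

```r
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)

# Every variable name that appears in at least one element
all_names <- unique(unlist(lapply(my_list, names)))

# Indexing a named vector by a name it lacks yields NA for that slot
rows <- lapply(my_list, function(el) {
  setNames(unlist(el)[all_names], all_names)
})

result <- as.data.frame(do.call(rbind, rows))
result
```

Because every row is padded to the same set of names before binding, rbind receives vectors of equal length and the missing variable shows up as NA.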