Finding nearest number between two lists - r

I have a list of dataframes (df1) and another list of dataframes (df2) which hold values required to find the 'nearest value' in the first list.
df1<-list(d1=data.frame(y=1:10), d2=data.frame(y=3:20))
df2<-list(d3=data.frame(y=2),d4=data.frame(y=4))
Say I have this expression:
df1[[1]]$y[which(abs(df1[[1]]$y - df2[[1]]$y) == min(abs(df1[[1]]$y - df2[[1]]$y)))]
This works for finding the value in df1[[1]] that is closest to the value in df2[[1]]. What I can't achieve is getting it to work with lapply, as in something like:
lapply(df1, function(x){
f<-x$y[which(abs(x$y-df2) == min(abs(x$y - df2)))]
})
I would like to return a dataframe with all f values which show the nearest number for each item in df1.
Thanks,
M

I assume you're trying to compare the first data.frames in df1 and df2 to each other, and the second data.frames in df1 and df2 to each other. It would also be useful to use the which.min function (check out help(which.min)).
Edit: in response to your comment, you could use mapply instead, which steps through df1 and df2 in parallel:
> mapply(function(x,z) x$y[which.min(abs(x$y - z$y))], df1, df2)
d1 d2
2 4
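The question asked for a data frame rather than a named vector. The following is a minimal sketch. It assumes the i-th element of df1 is paired with the i-th element of df2, as in the mapply answer. The column names source, target and nearest are made up for illustration:

```r
df1 <- list(d1 = data.frame(y = 1:10), d2 = data.frame(y = 3:20))
df2 <- list(d3 = data.frame(y = 2), d4 = data.frame(y = 4))

# For each pair (df1[[i]], df2[[i]]), pick the y in df1[[i]] closest to df2[[i]]$y.
# which.min() returns the first minimum, so ties go to the earlier value.
nearest <- mapply(function(x, z) x$y[which.min(abs(x$y - z$y))], df1, df2)

# Collect everything in one data frame (hypothetical column names)
result <- data.frame(
  source  = names(df1),
  target  = unname(vapply(df2, function(z) z$y, numeric(1))),
  nearest = unname(nearest)
)
result
```

Unlike the which(... == min(...)) version, which.min() never returns more than one value per pair, so each pair always gives exactly one row.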

The OP's real problem is unclear, but I would probably do...
library(data.table)
DT1 = rbindlist(unname(df1), idcol=TRUE)
DT2 = rbindlist(unname(df2), idcol=TRUE)
DT1[DT2, on=c(".id","y"), roll="nearest"]
# .id y
# 1: 1 2
# 2: 2 4
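In a data.table join, the on columns of the result are, to my understanding, taken from i (DT2). With a non-exact match, y would therefore show the lookup value rather than the matched value from DT1.

If data.table is not available, a rough base R stand-in for a per-group nearest lookup is to sort each y vector and use findInterval(). This is a swapped-in technique, not what data.table does internally. The name nearest_sorted is a hypothetical helper. The sketch assumes each data frame in df1 has at least two rows and resolves ties toward the smaller value:

```r
df1 <- list(d1 = data.frame(y = 1:10), d2 = data.frame(y = 3:20))
df2 <- list(d3 = data.frame(y = 2), d4 = data.frame(y = 4))

# Hypothetical helper: for each target, return the closest element of x.
# Assumes length(x) >= 2; ties go to the lower neighbour.
nearest_sorted <- function(x, targets) {
  x <- sort(x)
  # i such that x[i] <= target < x[i + 1]; all.inside clamps i to 1..length(x) - 1
  i <- findInterval(targets, x, all.inside = TRUE)
  lower <- x[i]
  upper <- x[i + 1]
  ifelse(abs(targets - lower) <= abs(upper - targets), lower, upper)
}

# Pair the data frames by position, as in the answers above
mapply(function(x, z) nearest_sorted(x$y, z$y), df1, df2)

# Many lookups against one sorted vector at once
nearest_sorted(1:10, c(2, 4.6, 0, 99))
```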

Related

How to replace several variables with several variables from another dataframe in R using a loop?

I would like to replace multiple variables with variables from a second dataframe in R.
df1$var1 <- df2$var1
df1$var2 <- df2$var2
# and so on ...
As you can see, the variable names are the same in both data frames, but the numeric values differ slightly. The correct version is in df2, and it needs to be copied into df1. I need to do this for many, many variables in a complex data set and wonder whether someone could help with a more efficient way to code this (possibly without using column references).
Here is some example data:
# dataframe 1
var1 <- c(1:10)
var2 <- c(1:10)
df1 <- data.frame(var1,var2)
# dataframe 2
var1 <- c(11:20)
var2 <- c(11:20)
df2 <- data.frame(var1,var2)
# assigning correct values
df1$var1 <- df2$var1
df1$var2 <- df2$var2
As Parfait has said, the current post seems a bit too simplified to give any immediate help, but I will try to summarize what you may need for something like this to work.
If df1 and df2 have the same number of rows AND their rows are already in matching order, you can achieve this easily with subset assignment:
df1[, c({column names df1})] <- df2[, c({column names df2}), drop = FALSE]
Let's say that df1 has columns a, b, and c, and you want to replace b and c with two of the columns of df2, which has columns x, y, and z.
df1[, c("b", "c")] <- df2[, c("y", "z"), drop = FALSE]
Here we are replacing b with y and c with z. The drop = FALSE on the right-hand side is just added protection: it ensures that a single-column subset stays a data.frame instead of being simplified to a vector. It is left off the assignment target because the data.frame replacement method `[<-` does not take a drop argument.
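In the question, the columns to copy have the same names in both data frames. A sketch that avoids listing columns by hand takes the shared names with intersect() and assigns them in one step. Like the approach above, it assumes the rows are already in the same order. The extra column keep is made up, only to show that non-shared columns are left alone:

```r
# Example data from the question, plus one column that exists only in df1
df1 <- data.frame(var1 = 1:10, var2 = 1:10, keep = letters[1:10])
df2 <- data.frame(var1 = 11:20, var2 = 11:20)

vars <- intersect(names(df1), names(df2))  # columns present in both data frames
stopifnot(nrow(df1) == nrow(df2))          # a positional copy needs equal row counts
df1[vars] <- df2[vars]                     # list-style assignment replaces all shared columns
head(df1, 3)
```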
If you do NOT know whether the order is correct, or one data frame may have a different number of rows than the other, BUT there is a unique identifier shared by the two data.frames, then I would personally use a function that is designed for merging two data frames. Depending on your preference, you can use merge from base R or the *_join functions from the dplyr package (my preference).
library(dplyr)
#assuming a and x are unique identifiers that can be matched.
new_df <- left_join(df1, df2, by = c("a"="x"))
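left_join() adds y and z alongside b and c rather than overwriting them. To overwrite in place with base R, one option is match(): for each a in df1, it finds the row of df2 with the same x. This is a sketch with made-up values. Rows of df1 with no partner in df2 would receive NA:

```r
# Made-up data: df2 holds the correct values, in a different row order
df1 <- data.frame(a = c(1, 2, 3), b = c(0, 0, 0), c = c(0, 0, 0))
df2 <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20), z = c(300, 100, 200))

idx <- match(df1$a, df2$x)                 # row of df2 for each row of df1 (NA if absent)
df1[c("b", "c")] <- df2[idx, c("y", "z")]  # copy y -> b and z -> c, aligned by identifier
df1
```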

Problems when trying to join multiple dataframes in R

I have three data frames, df1, df2 and df3, with the same number of columns and rows, in the same order. Their column names are exactly the same except for the last three columns (41:43), which are specific to each df (e.g. col41df1, col42df1, col43df1...col41df2, col42df2, col43df2...col41df3, col42df3, col43df3...).
I wanted to join the three data frames so that the columns specific to each one would be appended at the end, leaving me with a data frame of 49 columns rather than 43. I managed that with:
df_merged <- df1 %>%
left_join(df2)%>%
left_join(df3)
However, something goes wrong during the join because df_merged appears to have 6 NA values while none of the original data frames I joined had any.
Help please?
Thanks!
Since the rows are in the same order across all 3 dataframes, there's no need to use a join. Instead, simply grab the 3 columns you want from the second and third dataframes and attach them to the first, as such:
df_merged <- cbind(df1, df2[, 41:43], df3[, 41:43])
Here is an example:
df1 <- data.frame(id=c(1,2,3), value=c(5,10,25))
df2 <- data.frame(id=c(1,2,3), value=c(3,6,9), morevalues=c(4,5,9))
# no packages needed; data.frame() renames the duplicated "value" column to "value.1"
merged_df <- data.frame(df1, df2[, 2:3])
merged_df
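Because cbind() pairs rows by position, it silently gives wrong results if the rows are not actually aligned. The NAs from left_join() may come from the same kind of mismatch. Without a by argument it joins on every common column, so a row whose shared values differ anywhere finds no partner.

The following sketch generalizes the cbind idea to a list of data frames. It first checks that the shared columns agree row by row. The tiny data frames and their column names are made up:

```r
# Made-up stand-ins: shared columns id and g, plus one column specific to each data frame
df1 <- data.frame(id = 1:3, g = c("a", "b", "c"), v_df1 = c(5, 10, 25))
df2 <- data.frame(id = 1:3, g = c("a", "b", "c"), v_df2 = c(3, 6, 9))
df3 <- data.frame(id = 1:3, g = c("a", "b", "c"), v_df3 = c(1, 2, 3))

dfs <- list(df1, df2, df3)
shared <- Reduce(intersect, lapply(dfs, names))  # columns common to all data frames

# Binding by position is only safe if the shared columns agree row by row
same <- vapply(dfs[-1], function(d) identical(d[shared], dfs[[1]][shared]), logical(1))
stopifnot(all(same))

# Append only the non-shared columns of the 2nd, 3rd, ... data frames
extra <- lapply(dfs[-1], function(d) d[setdiff(names(d), shared)])
df_merged <- do.call(cbind, c(list(dfs[[1]]), extra))
df_merged
```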

merge list of lists in R

I have a list of lists, where some elements are NULL (contain nothing) and others contain 12 columns and 1 row. Let's say this list of lists is named pages.
I would like to merge the elements that contain 12 columns and 1 row into a data frame, so that I end up with a final data frame of 12 columns and x rows.
I first tried:
final_df <- Reduce(function(x,y) merge(x, y, all=TRUE), pages)
which yielded a dataframe with the right 12 columns, but no rows, so it was empty.
I then tried:
listofvectors <- list()
for (i in 1:length(pages)) {listofvectors <- c(listofvectors, pages[[i]])}
which just pasted every list below each other.
I finally tried playing with:
final<-do.call(c, unlist(pages, recursive=FALSE))
which only resulted in a very long value.
What am I missing? Who can help me out? Thanks a lot for your input.
The merge function is for combining data on common column values (commonly called a join). You need rbind instead (r for row; use cbind to stick columns together).
do.call(rbind, pages) # equivalent to rbind(pages[[1]], pages[[2]], ...)
do.call(rbind, pages[lengths(pages) > 0]) # removing the 0-length elements
If you have additional issues, please provide a reproducible example in your question. This code works on this example:
x = list(data.frame(x = 1), NULL, data.frame(x = 2))
do.call(rbind, x)
# x
# 1 1
# 2 2
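If the elements of pages are plain named lists rather than one-row data frames, do.call(rbind, ...) returns a matrix of list cells instead of a data frame. The following sketch assumes each non-NULL element is a named list of length-1 values with the same names. It drops the NULL entries with Filter() and converts each element with as.data.frame() before binding. It uses two fields instead of twelve to keep it short:

```r
# Made-up stand-in for pages: named lists (2 fields instead of 12) and a NULL
pages <- list(list(a = 1, b = "x"), NULL, list(a = 2, b = "y"))

non_null <- Filter(Negate(is.null), pages)  # drop the NULL elements
rows <- lapply(non_null, as.data.frame)     # each named list -> 1-row data frame
final_df <- do.call(rbind, rows)            # stack the rows
final_df
```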

Sum a variable across dataframes by an ID variable

There are 3 data frames. The ID variable is in the 12th column of each data frame. I created a vector list_cc_q1 that contains all the unique IDs across all data frames (hence each entry in this vector appears in the 12th column of at least one data frame).
I wish to create a vector v1 that adds, for each ID, the values in the 7th column from each data frame which contains that ID (hence v1 would be of the same length as list_cc_q1). Here's the code I'm using:
f1 <- function(x,y){
ifelse(length(get(y)[which(get(y)[x,12]),7])>0, get(y)[which(get(y)[x,12]),7], 0)}
g1 <- function(x){sum(sapply(ls()[1:3], function(y){ f1(x,y)}))}
v1 <- sapply(list_cc_q1, function(z){ g1(z) })
This returns the following error:
Error in get(y)[x, 12] : incorrect number of dimensions
Called from: which(get(y)[x, 12])
I think I've overcomplicated the code; a simpler method would be immensely helpful.
But why doesn't this work?
Not sure I understand correctly, but how about:
library(data.table)
dt <- data.table(value = c(df1[[7]],df2[[7]],df3[[7]]), id = c(df1[[12]],df2[[12]],df3[[12]]))
dt[, .(sum = sum(value)), by = id]
This concatenates the 7th column of each of the three data.frames (df1, df2, df3) to a value column and the 12th column of each of the data.frames (df1, df2, df3) to an id column to form a data.table with two columns (value and id). It then sums the value column by the id column.
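Without data.table, the same "stack, then sum by ID" idea can be written in base R with split() and vapply(). This is a swapped-in technique. The sketch uses made-up data frames where only columns 7 and 12 matter, and make_df is a hypothetical helper for building them. It also returns the sums in the order of list_cc_q1, with 0 for an ID that never appears:

```r
# Hypothetical helper: a 12-column data frame with values in column 7 and IDs in column 12
make_df <- function(ids, vals) {
  d <- as.data.frame(matrix(0, nrow = length(ids), ncol = 12))
  d[[7]] <- vals
  d[[12]] <- ids
  d
}
df1 <- make_df(c("a", "b"), c(1, 2))
df2 <- make_df(c("b", "c"), c(10, 20))
df3 <- make_df(c("a", "c"), c(100, 200))
list_cc_q1 <- c("a", "b", "c")

dfs <- list(df1, df2, df3)
values <- unlist(lapply(dfs, `[[`, 7))   # column 7 of every data frame, stacked
ids    <- unlist(lapply(dfs, `[[`, 12))  # column 12 of every data frame, stacked
sums   <- vapply(split(values, ids), sum, numeric(1))  # named vector: one sum per ID

v1 <- unname(sums[list_cc_q1])           # reorder to match list_cc_q1
v1[is.na(v1)] <- 0                       # IDs that never occur get 0
v1
```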
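EDIT: Your code probably fails because of
ls()[1:3]
Inside g1, ls() lists the objects in the function's own environment (just its argument x), not your three data.frames. So ls()[1:3] is c("x", NA, NA), and get("x") inside f1 returns the ID itself rather than a data frame. Indexing that plain value with [x, 12] then produces the "incorrect number of dimensions" error. You can see the difference by comparing the following: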
ls()[1:3]
# [1] "df1" "df2" "df3"
function_ls <- function(){cat(ls()[1:3])}
function_ls()
# NA NA NA
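A minimal way to keep the structure of the original code while avoiding ls() is to pass the data frames in explicitly as a list. There is a second problem: which(get(y)[x, 12]) uses the ID x as a row number rather than finding the rows whose ID equals x. The sketch therefore compares the whole ID column with x. The data frames, make_df and sum_for_id are all hypothetical:

```r
# Hypothetical helper: a 12-column data frame with values in column 7 and IDs in column 12
make_df <- function(ids, vals) {
  d <- as.data.frame(matrix(0, nrow = length(ids), ncol = 12))
  d[[7]] <- vals
  d[[12]] <- ids
  d
}
df1 <- make_df(c("a", "b"), c(1, 2))
df2 <- make_df(c("b", "c"), c(10, 20))
df3 <- make_df(c("a", "c"), c(100, 200))
list_cc_q1 <- c("a", "b", "c")

# Pass the data frames explicitly instead of looking them up with ls()/get()
dfs <- list(df1, df2, df3)

# Hypothetical replacement for f1/g1: sum column 7 over rows whose column 12 equals id.
# An ID missing from a data frame contributes sum(numeric(0)), i.e. 0.
sum_for_id <- function(id, frames) {
  sum(vapply(frames, function(d) sum(d[d[[12]] == id, 7]), numeric(1)))
}
v1 <- vapply(list_cc_q1, sum_for_id, numeric(1), frames = dfs, USE.NAMES = FALSE)
v1
```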

All vs all set intersection for list values

I have a list L containing data frames, L = list(A, B, C, D). Each data frame has a column z. For each pairwise comparison of the data frames in the list, I would like to intersect the values in column z and count the shared values, so that I get a final matrix:
A B C D
A
B
C
D
where each cell contains the number of values shared by that pair. I am not sure what the most idiomatic way to implement this in R is. I could write a for loop that starts with the first member of the list, extracts the values of column z, performs a set intersection and populates an empty matrix, but there could be a better, more efficient approach.
Any ideas and implementations?
Example:
df1 <- data.frame(z=c(1,2,3),s=c(4,5,6))
df2 <- data.frame(z=c(3,2,4),s=c(6,5,4))
my.list <- list(df1, df2)
Expected output:
df1 df2
df1 3 2
df2 2 3
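You can possibly try the outer function. Naming the list supplies the row and column labels, and mapply (rather than Map) returns plain numbers, so the result is a regular integer matrix rather than a list matrix:
my.list <- list(df1 = df1, df2 = df2)
outer(my.list, my.list, function(x, y) mapply(function(i, j) length(intersect(i$z, j$z)), x, y))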
df1 df2
df1 3 2
df2 2 3
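With many data frames, a vectorized alternative is to build a 0/1 membership matrix and take its crossprod(). This is a swapped-in technique, not the outer() answer. The matrix has one row per distinct z value and one column per data frame. Cell [i, j] of the result then counts the distinct values shared by data frames i and j, the same quantity as length(intersect()):

```r
df1 <- data.frame(z = c(1, 2, 3), s = c(4, 5, 6))
df2 <- data.frame(z = c(3, 2, 4), s = c(6, 5, 4))
my.list <- list(df1 = df1, df2 = df2)  # names become the row/column labels

zs <- lapply(my.list, function(d) unique(d$z))  # distinct z per data frame
all_vals <- unique(unlist(zs))                  # every z value seen anywhere
# rows: values, columns: data frames; 1 if the value occurs in that data frame
membership <- sapply(zs, function(v) as.numeric(all_vals %in% v))
shared <- crossprod(membership)                 # t(membership) %*% membership
shared
```

The membership matrix is built once, so this avoids calling intersect() for every pair of data frames.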
