how to search for column names in a dataframe - r

I have the following data frame df2 and a vector n. How can I create a new data frame containing only the columns of df2 whose names appear in n?
df2 <- data.frame(x1=c(1,6,3),x2=c(4,3,1),x3=c(5,4,6),x4=c(7,6,7))
n<-c("x1","x4")

Any of these would work:
df2[n]
df2[, n] # see note below for caveat
subset(df2, select = n)
Note that in the second form, if n has length one (a single column), the result is a vector rather than a data frame. If you want it to always return a data frame, use this instead:
df2[, n, drop = FALSE]
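A minimal sketch of the caveat above, using the df2 from the question. The single-name selection `one` and the extra name "x9" are hypothetical, added only to illustrate the behaviour and a simple guard against names that are not columns.

```r
df2 <- data.frame(x1 = c(1, 6, 3), x2 = c(4, 3, 1), x3 = c(5, 4, 6), x4 = c(7, 6, 7))
one <- "x1"  # hypothetical single-column selection

list_style   <- df2[one]                  # list-style indexing: stays a data frame
matrix_style <- df2[, one]                # matrix-style indexing: drops to a vector
kept         <- df2[, one, drop = FALSE]  # drop = FALSE keeps the data frame

is.data.frame(list_style)
is.data.frame(matrix_style)
is.data.frame(kept)

# Keep only the requested names that actually exist in df2
# ("x9" is a hypothetical name that is not a column).
wanted  <- c("x1", "x4", "x9")
present <- intersect(wanted, names(df2))
df2[present]
```

Selecting with `intersect()` first avoids the "undefined columns selected" error that a missing name would otherwise cause.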

df3 <- subset(df2, select=c("x1", "x4"))
df3
hope it helps

Related

How to replace several variables with several variables from another dataframe in R using a loop?

I would like to replace multiple variables with variables from a second dataframe in R.
df1$var1 <- df2$var1
df1$var2 <- df2$var2
# and so on ...
As you can see, the variable names are the same in both data frames, but the numeric values differ slightly: the correct version is in df2 and needs to be in df1. I need to do this for many variables in a complex data set and wonder whether there is a more efficient way to code this (possibly without using column references).
Here is some example data:
# dataframe 1
var1 <- c(1:10)
var2 <- c(1:10)
df1 <- data.frame(var1,var2)
# dataframe 2
var1 <- c(11:20)
var2 <- c(11:20)
df2 <- data.frame(var1,var2)
# assigning correct values
df1$var1 <- df2$var1
df1$var2 <- df2$var2
As Parfait has said, the current post seems a bit too simplified to give any immediate help but I will try and summarize what you may need for something like this to work.
If the assumption is that df1 and df2 have the same number of rows AND that their orders are already matching, then you can achieve this really easily by the following subset notation:
df1[, c({column names df1})] <- df2[, c({column names df2}), drop = FALSE]
Let's say that df1 has columns a, b, and c, and you want to replace b and c with two columns of df2, whose columns are x, y, and z.
df1[, c("b", "c")] <- df2[, c("y", "z"), drop = FALSE]
Here we are replacing b with y and c with z. The drop = FALSE argument on the right-hand side guards against the subset collapsing to a vector when only one column is selected. It belongs only on the extraction: the assignment form of [ for data frames does not accept a drop argument, so including it on the left-hand side raises an "unused argument" error.
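When the columns to copy have the same names in both data frames, as in the question, the names need not be typed out at all. The following is a sketch under the same assumption as above (same number of rows, already in matching order); the `id` and `extra` columns are hypothetical, added to show that non-shared columns are left alone.

```r
df1 <- data.frame(id = 1:3, var1 = c(1, 2, 3), var2 = c(4, 5, 6))
df2 <- data.frame(var1 = c(11, 12, 13), var2 = c(14, 15, 16),
                  extra = c("a", "b", "c"))

# Columns present in both data frames
shared <- intersect(names(df1), names(df2))

# Only safe if the rows already line up one to one
stopifnot(nrow(df1) == nrow(df2))

# Overwrite the shared columns of df1 with the values from df2
df1[shared] <- df2[shared]
df1
```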
If you do NOT know the order is correct or one data frame may have a differing size than the other BUT there is a unique identifier between the two data.frames - then I would personally use a function that is designed for merging two data frames. Depending on your preference you can use merge from base or use *_join functions from the dplyr package (my preference).
library(dplyr)
#assuming a and x are unique identifiers that can be matched.
new_df <- left_join(df1, df2, by = c("a"="x"))
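For readers who prefer base R, here is a rough sketch of the same idea with merge. The column names a, b, x, and y follow the example above; the values are made up for illustration, and the rows of df1 are deliberately out of order.

```r
df1 <- data.frame(a = c(3, 1, 2), b = c(30, 10, 20))
df2 <- data.frame(x = c(1, 2, 3), y = c(100, 200, 300))

# Left join: keep every row of df1, matching df1$a against df2$x
merged <- merge(df1, df2, by.x = "a", by.y = "x", all.x = TRUE)

# Replace b with the matched values from df2, then drop the helper column
merged$b <- merged$y
merged$y <- NULL
merged
```

Note that merge may reorder rows, so the result should be looked up by the identifier rather than by row position.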

Assign value from one data.frame to a specific column of another data.frame in R?

I would like to replace the first value of column Z of DF2 with the last value of column B of DF1. I want to make it general, that means, instead of specifying the last row (row number 10) of DF1 column B, is there a way to use end or anything else that would grab the last value of a particular column (in this case column B of DF1).
library(tidyverse)
set.seed(1500)
DF1 <- data.frame(A = runif(10,1,5), B = runif(10,5,10))
DF2 <- data.frame(X = runif(10,1,5), Z = runif(10,5,10))
DF2[1,2] <- DF1$B[10, 2]
I believe this can help you:
DF2$Z[1]<-DF1$B[dim(DF1)[1]]
We can use nrow(DF1). Extract the column with [[ using either its index or its name, then use a numeric index for the first (1) and last (nrow(DF1)) elements in the assignment:
DF2[[2]][1] <- DF1[[2]][nrow(DF1)]
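Another option, sketched here with the data from the question, is tail(), which returns the last element of a vector without referring to the number of rows. This is an alternative to the answers above, not a correction of them.

```r
set.seed(1500)
DF1 <- data.frame(A = runif(10, 1, 5), B = runif(10, 5, 10))
DF2 <- data.frame(X = runif(10, 1, 5), Z = runif(10, 5, 10))

# tail(v, 1) gives the last element of v, whatever its length
DF2$Z[1] <- tail(DF1$B, 1)
```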

How to use lapply match to lookup value and append in each row?

I have two data tables as below:
library(data.table)
x <- data.table(id = c(1,1,1,2,2,2,3,3,3,4,4,4), date = as.Date(c("2015-5-26","2015-6-15","2015-4-03","2015-5-26","2015-6-15","2015-4-03","2015-5-26","2015-6-15","2015-4-03","2015-5-26","2015-6-15","2015-4-03")))
y <- data.table(id=c(1,2,3,4),new_id=c(10,20,30,40))
I want to append the new_id column to the data table x and then drop the id column.
I can do this by
merge(x,y,by="id")
But I wanted to try lapply.
So I tried
x[,new_id:=0]
nm <- c("new_id")
x[nm] <- lapply(nm, function(z) y[[z]][match(y$id, x$id)])
It does not seem to match the column.
Also, which method would be more efficient if I have many columns and more rows?
Any help is appreciated.
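One way the match-based lookup attempted above could be written is sketched here in base R, using plain data frames and a shortened version of the example data. The key point is the argument order: match(x$id, y$id) returns, for each row of x, the position of its id in y, whereas match(y$id, x$id) does the reverse.

```r
x <- data.frame(
  id = c(1, 1, 2, 2, 3, 4),
  date = as.Date(c("2015-05-26", "2015-06-15", "2015-05-26",
                   "2015-06-15", "2015-04-03", "2015-04-03"))
)
y <- data.frame(id = c(1, 2, 3, 4), new_id = c(10, 20, 30, 40))

# For each row of x, find the row of y with the same id and take its new_id
x$new_id <- y$new_id[match(x$id, y$id)]

# Drop the old id column
x$id <- NULL
x

# data.table equivalent, assuming x and y are data.tables:
#   x[, new_id := y$new_id[match(id, y$id)]]
```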

Sum a variable across dataframes by an ID variable

There are 3 data frames. The ID variable is in the 12th column of each data frame. I created a vector list_cc_q1 that contains all the unique IDs across all data frames (hence each entry in this vector appears in the 12th column of at least one data frame).
I wish to create a vector v1 that adds, for each ID, the values in the 7th column from each data frame which contains that ID (hence v1 would be of the same length as list_cc_q1). Here's the code I'm using:
f1 <- function(x,y){
ifelse(length(get(y)[which(get(y)[x,12]),7])>0, get(y)[which(get(y)[x,12]),7], 0)}
g1 <- function(x){sum(sapply(ls()[1:3], function(y){ f1(x,y)}))}
v1 <- sapply(list_cc_q1, function(z){ g1(z) })
This returns the following error:
Error in get(y)[x, 12] : incorrect number of dimensions
Called from: which(get(y)[x, 12])
I think I've overcomplicated the code, a simpler method will be immensely helpful.
But why doesn't this work?
Not sure I understand correctly, but how about:
library(data.table)
dt <- data.table(value = c(df1[[7]],df2[[7]],df3[[7]]), id = c(df1[[12]],df2[[12]],df3[[12]]))
dt[, .(sum = sum(value)), by = id]
This concatenates the 7th column of each of the three data.frames (df1, df2, df3) to a value column and the 12th column of each of the data.frames (df1, df2, df3) to an id column to form a data.table with two columns (value and id). It then sums the value column by the id column.
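The same stack-then-sum idea can be sketched in base R. The small data frames below are hypothetical and use named columns (value, id) in place of the 7th and 12th columns from the question; all_ids stands in for list_cc_q1.

```r
dfs <- list(
  data.frame(value = c(1, 2),   id = c("a", "b")),
  data.frame(value = c(10, 20), id = c("b", "c")),
  data.frame(value = 100,       id = "a")
)
all_ids <- c("a", "b", "c")  # stand-in for list_cc_q1

# Stack the data frames, then sum value within each id
stacked <- do.call(rbind, dfs)
sums <- tapply(stacked$value, stacked$id, sum)

# Reorder to match all_ids and strip the names
v1 <- as.vector(sums[all_ids])
v1
```

Keeping the data frames in a list, rather than looking them up by name with get(), also avoids the environment problem discussed in the answer.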
EDIT: Your code might not work because of the
ls()[1:3]
The ls() command is executed in the function-environment which does not contain your three data.frames if I see this correctly. You can see this by comparing the following:
ls()[1:3]
# [1] "df1" "df2" "df3"
function_ls <- function(){cat(ls()[1:3])}
function_ls()
# NA NA NA
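A small sketch of the point above: inside a function, ls() lists that function's own environment, so passing the data frames in explicitly is more reliable than looking them up by name. The data frames and the helper names local_names and total are hypothetical.

```r
df1 <- data.frame(v = c(1, 2))
df2 <- data.frame(v = c(3, 4))

# ls() here sees only the function's own (empty) environment
local_names <- function() ls()
local_names()

# Passing the data frames as a list sidesteps the lookup entirely
total <- function(dfs) sum(vapply(dfs, function(d) sum(d$v), numeric(1)))
total(list(df1, df2))
```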

Finding closest value by ID with unequal lengths

I have a data frame and a vector of unequal lengths. They do not share an id.
df <- data.frame(
id = factor(rep(1:24, each = 10)),
x = runif(20)*100
)
a <- sort(runif(100*100))
Now, I would really like to run over each row of the data frame and find the location in the vector (a) of the closest corresponding value for each id.
For a single value, this is just:
which.min(abs(df[1, 2] - a))
So, if I did it "manually" it would be:
a.location <- c(
which.min(abs(df[1, 2] - a)),
which.min(abs(df[2, 2] - a)),
....,
which.min(abs(df[24, 2] - a))
)
But I simply can't wrap my head around how I can do this in a function, when I can't merge the data frame and the vector. I've looked at mapply, but that doesn't go well with unequal lengths and also rowwise from dplyr, but haven't had much luck with that either.
You can use a rolling join from the data.table package:
library(data.table)
setkey(setDT(df), x)
df1 <- data.table(x = a, id1 = seq_along(a))
setkey(df1, x)
df1[df, roll="nearest"]
The id1 column will give you the desired result.
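For comparison, the "manual" list from the question can also be written directly in base R by applying which.min to each value. This is a sketch with small made-up vectors (a and x below are hypothetical) so the result is easy to check by hand; it computes every distance for every value, so it will likely be slower than a rolling join on large inputs.

```r
a <- c(0, 10, 20, 30)   # sorted lookup vector
x <- c(1, 14, 16, 29)   # values to locate, e.g. df$x

# For each value, the index in a of the closest element
a.location <- vapply(x, function(v) which.min(abs(v - a)), integer(1))
a.location
```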
