I need to create a new column in df1 named col_2, and assign it values from another data frame (df2). When the value in col_1 from df1 equals a value in col_a from df2, I want the corresponding value of col_b of df2 assigned to col_2.
The data frames are different sizes.
The data:
col_1 <- c(23,31,98,76,47,65,23,76,3,47)
col_2 <- NA
df1 <- data.frame(col_1, col_2)
col_a <- c(1:100)
col_b <- c(runif(100,0,1))
df2 <- data.frame(col_a, col_b)
I tried the following but none seemed to work... I keep running into the same problem, that the data frames are not of the same length.
for (i in 1:10){
if(df1$col_1[i] == df2$col_a[]){
df1$col_2[i] == df2$col_b[]
}
}
df1$col_2 <- ifelse(df2$col_a %in% df1$col_1, df2$col_b, NA)
df1$col_1[df1$col_1 %in% df2$col_a] <- df2$col_b[df1$col_1 %in% df2$col_a]
We can use left_join
library(dplyr)
left_join(df1, df2, by = c('col_1' = 'col_a'))
I'm coding in R. I have a big data frame (df1) and a little data frame (df2). df2 is a subset of df1, but in a random order. I need to know the row indices of df1 which occur in df2. All of the specific cell values have lots of duplicates. Tapirus terrestris shows up more than once, as does each ModType value. I tried experimenting with which() and grpl() but couldn't get my code to work.
df1 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Panthera onca', 'Leopardus tigrinus' , 'Leopardus tigrinus'),
ModType = c('ANN', 'GAM', 'GAM','RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1019_s3_sd','CHELSAbio1015_s4_sd','CHELSAbio1015_s4_sd'))
df2 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Leopardus tigrinus'),
ModType = c('ANN', 'RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1015_s4_sd'))
Should output an array: 1,4 because df1 rows 1 and 4 occur in df2.
You can create an index column in df1 and merge the datasets.
df1$index <- 1:nrow(df1)
df3 <- merge(df1, df2)
df3$index
#[1] 4 1
You can use match.
df1[match(df2$SpeciesName, df1$SpeciesName), ]
Another option is tidyverse
library(dplyr)
df1 %>%
mutate(index = row_number()) %>%
inner_join(df2)
I have two data frames:
DF <- data.frame(A=letters[1:5],B=1:5)
DF_2 <- data.frame(match_col = c("a","a","c"))
Here we have to get only matching columns of DF_2$match_col
final_df <- data.frame(A=c("a","a","c","d","e"),B=1:5)
Your question here is not very clear. For youR DF_2, I am not sure if there is a column of B in it. I assume you forgot to include it, as I assume you need that column to perform matching.
Please see below:
DF <- data.frame(A=letters[1:5],B=1:5)
DF_2 <- data.frame(match_col = c("a","a","c"))
DF_2$B=c(1:3)
DF$A= as.character(DF$A)
DF_2$match_col= as.character(DF_2$match_col)
for(id in 1:nrow(DF_2)){
DF$A[DF$B %in% DF_2$B[id]] <- DF_2$match_col[id]
}
DF
Here my DF matches with your final_df, therefore I presume my assumption is right.
I'd like to use a data frame (Df2) to recode the variables of another data frame (Df1), so that the end result is a data frame that contains text like local/international rather than 1s/2s (Df3). Missingness is present in the Df1 data frame, and I'd like to make sure it's represented as NA.
This is a minimal working example, the actual data set contains more than a hundred variables (all of which are of the character class) with between one and fifteen levels. Any help would be much appreciated.
Starting point (dfs)
Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),"seller_Q2"=c(2,1,3,2),"price_Q1_2"=c(2,5,7,5))
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),"VariableLevel"=c(1,2,1,2,3,2,5,7),"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"))
Desired outcome (df)
Df3 <- data.frame("buyer_Q1"=c("local","internat","local","local"),"seller_Q2"=c("internat","local","NA","internat"),"price_Q1_2"=c("50-100K","100-200K","200+K","100-200K"))
Thoughts, not really code, so far: (If there's a match between a row of the df2 NameOfVariable and a df1 variable name, as well as a match between a row of df2 VariableLevel and a df1 observation, then paste the corresponding row of df2 VariableDef into df1. Wondering if you can use if statements for it.)
if (Df2["NameOfVariable"]==names(Df1))
{
if (Df2["VariableLevel"]==Df1[ ])
{
Df1[ ] <- paste0("VariableDef")
}
}
Here is on method in base R using match and Map. Map applies a function to corresponding list elements. Here, there are two list elements: Df1 and a list that is composed of the second and third columns of Df2, split by column 1. The second list is reordered to match the order of the names in Df1.
The applied function matches elements in a column Df1 to the corresponding column in the second argument and uses it as an index to return the corresponding name of the Df2 argument. Map returns a list, which is converted to a data.frame with the function of the same name.
data.frame(Map(function(x, y) y[[2]][match(x, y[[1]])],
Df1,
split(Df2[2:3], Df2[1])[names(Df1)]))
this returns
buyer_Q1 seller_Q2 price_Q1_2
1 local internat 50-100K
2 internat local 100-200K
3 local NA 200+K
4 local internat 100-200K
Solution using loop and factors. Be careful. Results seem equivalent but they are not. The function fun return data frame with factors. If needed you can convert them to characters.
Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),"seller_Q2"=c(2,1,3,2),"price_Q1_2"=c(2,5,7,5))
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),"VariableLevel"=c(1,2,1,2,3,2,5,7),"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"))
Df3 <- data.frame("buyer_Q1"=c("local","internat","local","local"),"seller_Q2"=c("internat","local","NA","internat"),"price_Q1_2"=c("50-100K","100-200K","200+K","100-200K"))
fun <- function(df, mdf) {
for (varn in names(df)) {
dat <- mdf[mdf$NameOfVariable == varn & !is.na(mdf$VariableDef),]
df[[varn]] <- factor(df[[varn]], dat$VariableLevel, dat$VariableDef)
}
return(df)
}
fun(Df1, Df2)
Df3
A solution from dplyr and tidyr. The code will work fine even with warning messages because the columns are in factor. If you don't want to see any warning messages, set stringsAsFactors = FALSE when creating the data frame like the example I provided.
library(dplyr)
library(tidyr)
Df3 <- Df1 %>%
mutate(ID = 1:n()) %>%
gather(NameOfVariable, VariableLevel, -ID) %>%
left_join(Df2, by = c("NameOfVariable", "VariableLevel")) %>%
select(-VariableLevel) %>%
spread(NameOfVariable, VariableDef) %>%
select(-ID)
Df3
buyer_Q1 price_Q1_2 seller_Q2
1 local 50-100K internat
2 internat 100-200K local
3 local 200+K NA
4 local 100-200K internat
DATA
Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),
"seller_Q2"=c(2,1,3,2),
"price_Q1_2"=c(2,5,7,5),
stringsAsFactors = FALSE)
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),
"VariableLevel"=c(1,2,1,2,3,2,5,7),
"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"),
stringsAsFactors = FALSE)
I have a larger dataset following the same order, a unique date column, data, unique date column, date, etc. I am trying to subset not just the data column by name but the unique date column also. The code below selects columns based on a list of names, which is part of what I want but any ideas of how I can grab the column immediately before the subsetted column also?
Looking to end up with a DF containing Date1, Fire, Date3, Earth columns (using just the NameList).
Here is my reproducible code:
Cnames <- c("Date1","Fire","Date2","Water","Date3","Earth")
MAINDF <- data.frame(replicate(6,runif(120,-0.03,0.03)))
colnames(MAINDF) <- Cnames
NameList <- c("Fire","Earth")
NewDF <- MAINDF[,colnames(MAINDF) %in% NameList]
How about
NameList <- c("Fire","Earth")
idx <- match(NameList, names(MAINDF))
idx <- sort(c(idx-1, idx))
NewDF <- MAINDF[,idx]
Here we use match() to find the index of the desired column, and then we can use index subtraction to grab the column before it
Use which to get the column numbers from the names, and then it's just simple arithmetic:
col.num <- which(colnames(MAINDF) %in% NameList)
NewDF <- MAINDF[,sort(c(col.num, col.num - 1))]
Produces
Date1 Fire Date3 Earth
1 -0.010908003 0.007700453 -0.022778726 -0.016413307
2 0.022300509 0.021341360 0.014204445 -0.004492150
3 -0.021544992 0.014187158 -0.015174048 -0.000495121
4 -0.010600955 -0.006960160 -0.024535954 -0.024210771
5 -0.004694499 0.007198620 0.005543146 -0.021676692
6 -0.010623787 0.015977135 -0.027741109 -0.021102651
...