Retain previous data frame based on condition in R - r

I'm trying to update or retain a data frame df2 based on a condition on another data frame, df1.
For example, assume df1 is updated every 30 seconds. If df1 has any rows, i.e. nrow(df1) != 0, then df2 <- df1; otherwise df2 should retain its previous values.
NOTE: On the first iteration, df2 can be initialized to an empty data frame.
Here is my code:
#Initializing df2 as empty dataframe
df2 <- data.frame(weight = integer(), stringsAsFactors = FALSE)
#Condition to check if number of rows in df1 != 0
if(nrow(df1) != 0){
df2 <- df1
temp <- df1 #Another copy of df1
} else {
df2 <- temp
}
Here I created another data frame called temp to keep a copy of df1, so that it can be used when nrow(df1) == 0. I'm not sure whether this use of temp is correct.

This code creates an empty data frame named df2. If nrow(df1) > 0, it assigns the contents of df1 to df2. If nrow(df1) == 0, df2 remains empty.
df2 <- data.frame()
if(nrow(df1)>0) df2 <- df1
I have a hard time imagining why this is useful. If, perhaps, you intended to "grow" df2 by appending on whatever is in df1 - which might be more common - then do something like this:
df2 <- data.frame()
if(nrow(df1)>0) df2 <- rbind(df2, df1)
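If the goal is instead to keep the last non-empty df1 across updates, as the question describes, one possible sketch is to initialize df2 once and overwrite it only when df1 has rows. The helper name update_retained and the simulated snapshots are assumptions made for illustration; they are not part of the original code.

```r
# Hypothetical helper: return the current data frame when it has rows,
# otherwise keep the previously retained one.
update_retained <- function(previous, current) {
  if (nrow(current) > 0) current else previous
}

# Initialize df2 once, before the periodic updates start.
df2 <- data.frame(weight = integer(), stringsAsFactors = FALSE)

# Simulated sequence of df1 snapshots (illustrative data only).
snapshots <- list(
  data.frame(weight = integer()),        # empty: df2 stays empty
  data.frame(weight = c(10L, 20L)),      # non-empty: df2 is replaced
  data.frame(weight = integer())         # empty: df2 keeps the previous rows
)

for (df1 in snapshots) {
  df2 <- update_retained(df2, df1)
}
```

Because df2 itself carries the previous value, no separate temp copy is needed in this version.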

Related

R: Assign values to column, from a column from another data frame, based on a condition (different sized data frames)

I need to create a new column in df1 named col_2, and assign it values from another data frame (df2). When the value in col_1 from df1 equals a value in col_a from df2, I want the corresponding value of col_b of df2 assigned to col_2.
The data frames are different sizes.
The data:
col_1 <- c(23,31,98,76,47,65,23,76,3,47)
col_2 <- NA
df1 <- data.frame(col_1, col_2)
col_a <- c(1:100)
col_b <- c(runif(100,0,1))
df2 <- data.frame(col_a, col_b)
I tried the following but none seemed to work... I keep running into the same problem, that the data frames are not of the same length.
for (i in 1:10){
if(df1$col_1[i] == df2$col_a[]){
df1$col_2[i] == df2$col_b[]
}
}
df1$col_2 <- ifelse(df2$col_a %in% df1$col_1, df2$col_b, NA)
df1$col_1[df1$col_1 %in% df2$col_a] <- df2$col_b[df1$col_1 %in% df2$col_a]
We can use left_join
library(dplyr)
left_join(df1, df2, by = c('col_1' = 'col_a'))
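Since df1 already contains an empty col_2, the join adds col_b as a separate column rather than filling col_2. A minimal base R sketch that fills col_2 directly uses match to look up the position of each col_1 value in col_a. The set.seed call is an assumption added only to make the example reproducible.

```r
set.seed(1)  # assumption: only here so the random values are reproducible

col_1 <- c(23, 31, 98, 76, 47, 65, 23, 76, 3, 47)
df1 <- data.frame(col_1, col_2 = NA)
df2 <- data.frame(col_a = 1:100, col_b = runif(100, 0, 1))

# match() returns, for each col_1 value, the row of df2 where col_a equals it
# (NA when there is no match), so the data frames may differ in length.
df1$col_2 <- df2$col_b[match(df1$col_1, df2$col_a)]
```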

Return the row indices of df1 when those row values occur in df2 in R

I'm coding in R. I have a big data frame (df1) and a little data frame (df2). df2 is a subset of df1, but in a random order. I need to know the row indices of df1 which occur in df2. All of the specific cell values have lots of duplicates. Tapirus terrestris shows up more than once, as does each ModType value. I tried experimenting with which() and grepl() but couldn't get my code to work.
df1 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Panthera onca', 'Leopardus tigrinus' , 'Leopardus tigrinus'),
ModType = c('ANN', 'GAM', 'GAM','RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1019_s3_sd','CHELSAbio1015_s4_sd','CHELSAbio1015_s4_sd'))
df2 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Leopardus tigrinus'),
ModType = c('ANN', 'RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1015_s4_sd'))
Should output an array: 1,4 because df1 rows 1 and 4 occur in df2.
You can create an index column in df1 and merge the datasets.
df1$index <- 1:nrow(df1)
df3 <- merge(df1, df2)
df3$index
#[1] 4 1
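You can use match. Note that this matches on SpeciesName alone, returns the rows rather than their indices, and picks only the first match for each species (row 3 rather than row 4 for Leopardus tigrinus here).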
df1[match(df2$SpeciesName, df1$SpeciesName), ]
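To match on whole rows rather than a single column, one possible sketch is to build a key per row by pasting all columns together and then compare the keys. The helper name row_key and the choice of separator are assumptions for illustration; the separator only needs to be a string that does not occur in the data.

```r
df1 <- data.frame(
  SpeciesName = c('Tapirus terrestris', 'Panthera onca', 'Leopardus tigrinus', 'Leopardus tigrinus'),
  ModType = c('ANN', 'GAM', 'GAM', 'RF'),
  Variable_scale = c('aspect_s2_sd', 'CHELSAbio1019_s3_sd', 'CHELSAbio1015_s4_sd', 'CHELSAbio1015_s4_sd'))
df2 <- data.frame(
  SpeciesName = c('Tapirus terrestris', 'Leopardus tigrinus'),
  ModType = c('ANN', 'RF'),
  Variable_scale = c('aspect_s2_sd', 'CHELSAbio1015_s4_sd'))

# Hypothetical helper: one string per row, built from all columns.
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Indices of df1 rows whose full combination of values appears in df2.
idx <- which(row_key(df1) %in% row_key(df2))
idx
```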
Another option is tidyverse
library(dplyr)
df1 %>%
mutate(index = row_number()) %>%
inner_join(df2)

Replace values in column with matching column in different DF

I have two data frames:
DF <- data.frame(A=letters[1:5],B=1:5)
DF_2 <- data.frame(match_col = c("a","a","c"))
I want to replace the values in column A of DF with the corresponding values of DF_2$match_col, so that the result is:
final_df <- data.frame(A=c("a","a","c","d","e"),B=1:5)
Your question is not very clear. I am not sure whether your DF_2 has a column B. I assume you forgot to include it, since that column is needed to perform the matching.
Please see below:
DF <- data.frame(A=letters[1:5],B=1:5)
DF_2 <- data.frame(match_col = c("a","a","c"))
DF_2$B=c(1:3)
DF$A= as.character(DF$A)
DF_2$match_col= as.character(DF_2$match_col)
for(id in 1:nrow(DF_2)){
DF$A[DF$B %in% DF_2$B[id]] <- DF_2$match_col[id]
}
DF
Here my DF matches your final_df, so I presume my assumption is right.
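Under the same assumption (that DF_2 has a column B to match on), the loop can be sketched without iteration by using match to find the target rows in one step. This sketch also assumes that every value of DF_2$B occurs in DF$B; otherwise match returns NA and the assignment fails.

```r
DF <- data.frame(A = letters[1:5], B = 1:5, stringsAsFactors = FALSE)
# Assumption carried over from the answer above: DF_2 has a matching column B.
DF_2 <- data.frame(match_col = c("a", "a", "c"), B = 1:3, stringsAsFactors = FALSE)

# Position in DF of each row of DF_2, looked up through column B.
pos <- match(DF_2$B, DF$B)

# Overwrite only the matched rows; the remaining rows of DF$A are untouched.
DF$A[pos] <- DF_2$match_col
DF
```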

Recoding a large number of variables using another data frame in R

I'd like to use a data frame (Df2) to recode the variables of another data frame (Df1), so that the end result is a data frame that contains text like local/international rather than 1s/2s (Df3). Missingness is present in the Df1 data frame, and I'd like to make sure it's represented as NA.
This is a minimal working example, the actual data set contains more than a hundred variables (all of which are of the character class) with between one and fifteen levels. Any help would be much appreciated.
Starting point (dfs)
Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),"seller_Q2"=c(2,1,3,2),"price_Q1_2"=c(2,5,7,5))
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),"VariableLevel"=c(1,2,1,2,3,2,5,7),"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"))
Desired outcome (df)
Df3 <- data.frame("buyer_Q1"=c("local","internat","local","local"),"seller_Q2"=c("internat","local","NA","internat"),"price_Q1_2"=c("50-100K","100-200K","200+K","100-200K"))
Thoughts, not really code, so far: (If there's a match between a row of the df2 NameOfVariable and a df1 variable name, as well as a match between a row of df2 VariableLevel and a df1 observation, then paste the corresponding row of df2 VariableDef into df1. Wondering if you can use if statements for it.)
if (Df2["NameOfVariable"]==names(Df1))
{
if (Df2["VariableLevel"]==Df1[ ])
{
Df1[ ] <- paste0("VariableDef")
}
}
Here is one method in base R using match and Map. Map applies a function to corresponding list elements. Here, there are two lists: Df1 and a list composed of the second and third columns of Df2, split by column 1. The second list is reordered to match the order of the names in Df1.
The applied function matches the elements of a Df1 column against the VariableLevel values in the corresponding piece of Df2 and uses the result as an index to return the corresponding VariableDef values. Map returns a list, which is converted to a data.frame with the function of the same name.
data.frame(Map(function(x, y) y[[2]][match(x, y[[1]])],
Df1,
split(Df2[2:3], Df2[1])[names(Df1)]))
This returns
  buyer_Q1 seller_Q2 price_Q1_2
1    local  internat    50-100K
2 internat     local   100-200K
3    local        NA      200+K
4    local  internat   100-200K
A solution using a loop and factors. Be careful: the results look equivalent, but they are not. The function fun returns a data frame of factors. If needed, you can convert them to character.
Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),"seller_Q2"=c(2,1,3,2),"price_Q1_2"=c(2,5,7,5))
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),"VariableLevel"=c(1,2,1,2,3,2,5,7),"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"))
Df3 <- data.frame("buyer_Q1"=c("local","internat","local","local"),"seller_Q2"=c("internat","local","NA","internat"),"price_Q1_2"=c("50-100K","100-200K","200+K","100-200K"))
fun <- function(df, mdf) {
for (varn in names(df)) {
dat <- mdf[mdf$NameOfVariable == varn & !is.na(mdf$VariableDef),]
df[[varn]] <- factor(df[[varn]], dat$VariableLevel, dat$VariableDef)
}
return(df)
}
fun(Df1, Df2)
Df3
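The conversion to character mentioned above can be sketched as follows. The variable name recoded is an assumption for illustration. Note also that VariableDef holds the literal string "NA" rather than a missing value, so this sketch turns it into a real NA after the conversion; whether that is wanted depends on how missingness should be represented.

```r
Df1 <- data.frame(buyer_Q1 = c(1, 2, 1, 1), seller_Q2 = c(2, 1, 3, 2), price_Q1_2 = c(2, 5, 7, 5))
Df2 <- data.frame(
  NameOfVariable = c("buyer_Q1", "buyer_Q1", "seller_Q2", "seller_Q2", "seller_Q2",
                     "price_Q1_2", "price_Q1_2", "price_Q1_2"),
  VariableLevel = c(1, 2, 1, 2, 3, 2, 5, 7),
  VariableDef = c("local", "internat", "local", "internat", "NA",
                  "50-100K", "100-200K", "200+K"),
  stringsAsFactors = FALSE)

# Same function as in the answer above: recode each column as a factor.
fun <- function(df, mdf) {
  for (varn in names(df)) {
    dat <- mdf[mdf$NameOfVariable == varn & !is.na(mdf$VariableDef), ]
    df[[varn]] <- factor(df[[varn]], dat$VariableLevel, dat$VariableDef)
  }
  return(df)
}

recoded <- fun(Df1, Df2)

# Convert every factor column to character and map the literal "NA" label
# to a real missing value.
recoded[] <- lapply(recoded, function(x) {
  x <- as.character(x)
  x[x == "NA"] <- NA_character_
  x
})
```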
A solution using dplyr and tidyr. The code works even though it emits warning messages when the columns are factors. If you don't want to see the warnings, set stringsAsFactors = FALSE when creating the data frames, as in the example data I provided.
library(dplyr)
library(tidyr)
Df3 <- Df1 %>%
mutate(ID = 1:n()) %>%
gather(NameOfVariable, VariableLevel, -ID) %>%
left_join(Df2, by = c("NameOfVariable", "VariableLevel")) %>%
select(-VariableLevel) %>%
spread(NameOfVariable, VariableDef) %>%
select(-ID)
Df3
  buyer_Q1 price_Q1_2 seller_Q2
1    local    50-100K  internat
2 internat   100-200K     local
3    local      200+K        NA
4    local   100-200K  internat
DATA
Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),
"seller_Q2"=c(2,1,3,2),
"price_Q1_2"=c(2,5,7,5),
stringsAsFactors = FALSE)
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),
"VariableLevel"=c(1,2,1,2,3,2,5,7),
"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"),
stringsAsFactors = FALSE)

Subset columns based on list of column names and bring the column before it

I have a larger dataset following the same order: a unique date column, data, a unique date column, data, etc. I am trying to subset not just the data column by name but the unique date column as well. The code below selects columns based on a list of names, which is part of what I want, but any ideas on how I can also grab the column immediately before each subsetted column?
Looking to end up with a DF containing Date1, Fire, Date3, Earth columns (using just the NameList).
Here is my reproducible code:
Cnames <- c("Date1","Fire","Date2","Water","Date3","Earth")
MAINDF <- data.frame(replicate(6,runif(120,-0.03,0.03)))
colnames(MAINDF) <- Cnames
NameList <- c("Fire","Earth")
NewDF <- MAINDF[,colnames(MAINDF) %in% NameList]
How about
NameList <- c("Fire","Earth")
idx <- match(NameList, names(MAINDF))
idx <- sort(c(idx-1, idx))
NewDF <- MAINDF[,idx]
Here we use match() to find the index of each desired column, and then subtract 1 from each index to grab the column before it.
Use which to get the column numbers from the names, and then it's just simple arithmetic:
col.num <- which(colnames(MAINDF) %in% NameList)
NewDF <- MAINDF[,sort(c(col.num, col.num - 1))]
Produces
Date1 Fire Date3 Earth
1 -0.010908003 0.007700453 -0.022778726 -0.016413307
2 0.022300509 0.021341360 0.014204445 -0.004492150
3 -0.021544992 0.014187158 -0.015174048 -0.000495121
4 -0.010600955 -0.006960160 -0.024535954 -0.024210771
5 -0.004694499 0.007198620 0.005543146 -0.021676692
6 -0.010623787 0.015977135 -0.027741109 -0.021102651
...
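Both approaches assume that every name in NameList exists in MAINDF and that none of them is the first column. A hedged sketch that checks these assumptions before subsetting could look like this; the set.seed call and the stopifnot checks are additions for illustration, not part of the original answers.

```r
set.seed(42)  # assumption: only here so the random values are reproducible

Cnames <- c("Date1", "Fire", "Date2", "Water", "Date3", "Earth")
MAINDF <- data.frame(replicate(6, runif(120, -0.03, 0.03)))
colnames(MAINDF) <- Cnames
NameList <- c("Fire", "Earth")

idx <- match(NameList, names(MAINDF))

# Fail early if a name is missing or has no column before it.
stopifnot(!anyNA(idx), all(idx > 1))

# Keep each requested column together with its left-hand neighbour.
keep <- sort(unique(c(idx - 1, idx)))
NewDF <- MAINDF[, keep]
names(NewDF)
```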
