Quickly look up values in another data frame using two column values in R

I have a data frame (call it 'ModelOutput') with three columns (Trial, DurationRet, DiscountRate) and another (call it 'drdata') with three columns (Scenario, variable, value).
I want to quickly filter drdata$Scenario == ModelOutput$Trial & drdata$variable == ModelOutput$DurationRet to return drdata$value into the ModelOutput$DiscountRate column. Is there a way to do this efficiently?
Here are my two attempts, the first of which fails and the second of which is entirely too slow.
ModelOutput$Trial <- drdata[drdata$Scenario == ModelOutput$Trial & drdata$variable == ModelOutput$DurationRet,"value"]
foreach(row = 1:nrow(ModelOutput)) %do% {
  ModelOutput[row, "DiscountRate"] <- drdata[drdata$Scenario == ModelOutput[row, "Trial"] & drdata$variable == as.factor(ModelOutput[row, "DurationRet"] + 1), "value"]
}

It took me a minute, but I realized joins could do the job I was looking for.
Here is my final code:
ModelOutput <- ModelOutput %>% full_join(drdata, by = c(Trial = "Scenario", DurationRet = "variable"))
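Note that full_join() also keeps drdata rows that have no counterpart in ModelOutput, so it can add rows. If only a lookup is wanted, a left join is the closer fit. Below is a minimal base R sketch of that idea using merge(); the toy values are made up for illustration, and the key columns are assumed to have the same type in both data frames.

```r
# Toy data; the numbers are invented for illustration only.
ModelOutput <- data.frame(Trial = c(1, 1, 2), DurationRet = c(1, 2, 1))
drdata <- data.frame(Scenario = c(1, 1, 2, 2),
                     variable = c(1, 2, 1, 2),
                     value = c(0.03, 0.04, 0.05, 0.06))

# all.x = TRUE keeps every ModelOutput row (a left join). The drdata row
# with no partner (Scenario 2, variable 2) is not appended to the result.
# Note that merge() may reorder rows.
joined <- merge(ModelOutput, drdata,
                by.x = c("Trial", "DurationRet"),
                by.y = c("Scenario", "variable"),
                all.x = TRUE)

# The looked-up column arrives as "value"; rename it to the target name.
names(joined)[names(joined) == "value"] <- "DiscountRate"
```

If ModelOutput already contains an empty DiscountRate column, it may be simpler to drop it before joining so the result has a single rate column.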

Related

How to replace certain values in a dataset in R

I'm creating a new column in my data set that is a copy of a preexisting column but I need to change 2 of the 7 values to something else.
I have tried doing this
Dataset$category_alt = Dataset$category_title
Dataset[Dataset$category_alt == "Shows"] <- "Other"
Dataset[Dataset$category_alt == "Nonprofits & Activism"] <- "Other"
But I receive this error
Error in [<-.data.frame(*tmp*, USvideos$category_alt == "Shows", value = "Other") :
duplicate subscripts for columns
You're missing the new column name in the assignment.
Dataset$category_alt = Dataset$category_title
Dataset$category_alt[Dataset$category_alt == "Shows"] <- "Other"
Dataset$category_alt[Dataset$category_alt == "Nonprofits & Activism"] <- "Other"
We can also use %in% to make the code clearer and shorter
Dataset$category_alt[Dataset$category_alt %in% c('Shows', 'Nonprofits & Activism')] <-'Other'
I usually prefer to use dplyr in such situations:
library(dplyr)
Dataset %>% mutate(category_alt = replace(category_alt, category_alt %in% c('Shows', 'Nonprofits & Activism'), 'Other'))
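One caveat worth checking: if category_title is a factor, assigning a value that is not already one of its levels produces NA with a warning. A minimal sketch, using a made-up four-row Dataset, that converts the copy to character first:

```r
# Made-up data; stringsAsFactors = TRUE reproduces the factor case on purpose.
Dataset <- data.frame(
  category_title = c("Music", "Shows", "Gaming", "Nonprofits & Activism"),
  stringsAsFactors = TRUE
)

# Copy as character so that "Other" does not need to be an existing level.
Dataset$category_alt <- as.character(Dataset$category_title)

# Replace both unwanted categories in one step.
to_merge <- c("Shows", "Nonprofits & Activism")
Dataset$category_alt[Dataset$category_alt %in% to_merge] <- "Other"
```

The original category_title column is left untouched, so the two can be compared afterwards.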

Nested for-loop with if statement

This is the code that I am trying to run and it's taking a while.
districts is a data frame of 39299 rows and 16 columns, and lm_data is a data frame of 59804 rows and 16 columns. I want to create a new variable in lm_data called tentativeStartDate that takes the value of districts$firstDay[j] when a couple of conditions are met. Is there a more efficient way to do this?
for (i in 1:nrow(lm_data)) {
  for (j in 1:nrow(districts)) {
    if (lm_data$DISTORGID[i] == districts$DISTORGID[j] & lm_data$gradeCode[i] == districts$gradeCode[j]) {
      lm_data$tentativeStartDate[i] = districts$firstDay[j]
    }
  }
}
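I can't test this on your data, but a vectorised lookup should be much faster. Comparing the columns of the two data frames directly with == will not work, because they have different numbers of rows. match() on a combined key finds, for each row of lm_data, the first matching row of districts.
# build one combined key per row, then look up each lm_data key in districts
idx <- match(paste(lm_data$DISTORGID, lm_data$gradeCode), paste(districts$DISTORGID, districts$gradeCode))
# rows with no matching district get NA
lm_data$tentativeStartDate <- districts$firstDay[idx]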
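To see how a key-based lookup behaves, here is a minimal sketch on made-up data. It assumes each (DISTORGID, gradeCode) pair appears at most once in districts; the IDs and dates are invented, and the "_" separator is assumed not to occur in the IDs.

```r
# Invented example data.
districts <- data.frame(DISTORGID = c("A", "A", "B"),
                        gradeCode = c(1, 2, 1),
                        firstDay = c("2020-08-10", "2020-08-12", "2020-08-17"),
                        stringsAsFactors = FALSE)
lm_data <- data.frame(DISTORGID = c("B", "A", "A", "C"),
                      gradeCode = c(1, 2, 1, 1),
                      stringsAsFactors = FALSE)

# One combined key per row in each data frame.
key_lm <- paste(lm_data$DISTORGID, lm_data$gradeCode, sep = "_")
key_d  <- paste(districts$DISTORGID, districts$gradeCode, sep = "_")

# match() returns, for each lm_data key, the row of districts holding it
# (NA when there is none), so the whole lookup is a single indexing step.
lm_data$tentativeStartDate <- districts$firstDay[match(key_lm, key_d)]
```

The last row (district "C") has no counterpart and ends up as NA, which makes unmatched rows easy to spot with is.na().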

IF statement in R. Add new elements in new column if conditions met

I'm trying to add a column in the data frame where the new element in the new column has the value of "1" if the conditions are met for that particular row.
To check the condition I am iterating through another reference data frame.
county_list = (df$county_name[df$wolves_present_in_county==1 & df$year==2015])
for (i in df$county_name) {
  for (j in county_list) {
    if (df$county_name[i]==county_list[j])
    {
      df$wolvein2015 = 1
      break
    }
  }
}
I think you can do what you want in base R. Here is an example using mtcars:
cars <- mtcars
cars$new <- ifelse(cars$cyl == 4 & cars$mpg > 30, 1, 0)
The new column is added with 0/1's based on conditions of 2 other variables.
BTW, because R is vectorized, explicit for loops are rarely needed for column-wise operations like this; use them only as a last resort.
Here is an option with dplyr
library(dplyr)
df %>%
  mutate(wolvein2015 = +((year == 2015 & !is.na(year)) &
                         as.logical(wolves_present_in_county) &
                         !is.na(wolves_present_in_county)))
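If the goal is instead to flag every row of a county that had wolves in 2015, whatever the row's own year (which is what the loop in the question appears to attempt), %in% against county_list does it in one step. A minimal sketch on invented data, reusing the column names from the question:

```r
# Invented example data; county names are placeholders.
df <- data.frame(county_name = c("Ada", "Ada", "Bear", "Bear", "Cook"),
                 year = c(2014, 2015, 2014, 2015, 2015),
                 wolves_present_in_county = c(0, 1, 0, 0, 1),
                 stringsAsFactors = FALSE)

# Counties with wolves present in 2015, built as in the question.
county_list <- df$county_name[df$wolves_present_in_county == 1 & df$year == 2015]

# TRUE/FALSE for every row at once, converted to 1/0.
df$wolvein2015 <- as.integer(df$county_name %in% county_list)
```

Here both "Ada" rows are flagged, including the 2014 one, because the county itself appears in county_list.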

Find matching individuals in two datasets in R

There are a lot of matching "X" and "Y" questions in R on this site, but I think I have a new one. I have two datasets: one is shorter (500 rows) and has one entry per individual. The second is bigger (~20,000 rows), and an individual can have multiple entries. Both have columns for date of birth and gender. My goal is to find people represented in both datasets, starting with date of birth and gender matches. My Python-influenced brain came up with this horrifically slow solution:
dob_big <- c('1975-05-04','1968-02-16','1985-02-28','1980-12-12','1976-06-06','1979-06-24','1981-01-28',
'1985-01-16','1984-03-04','1979-06-26','1988-12-22','1975-10-02','1968-02-04','1972-02-01',
'1981-08-06','1989-01-21','1956-06-25','1986-01-19','1980-03-24','1965-08-16')
gender_big <- c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)
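# data_frame() is deprecated; data.frame() does the same job here
big_df <- data.frame(date_birth = dob_big, gender = gender_big, stringsAsFactors = FALSE)
dob_small <- c('1985-01-16','1984-03-04','1979-06-26')
gender_small <- c(1,0,1)
small_df <- data.frame(date_birth = dob_small, gender = gender_small, stringsAsFactors = FALSE)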
matches <- c()  # indices of big_df rows that have a match in small_df
for (i in 1:length(big_df$date_birth)) {
  save_row <- FALSE
  for (j in 1:length(small_df$date_birth)) {
    if (big_df$date_birth[i] == small_df$date_birth[j] &&
        big_df$gender[i] == small_df$gender[j]) {
      print(paste("Match found at ",i,",",j))
      save_row <- TRUE
    }
  }
  if (save_row == TRUE) {
    matches <- c(matches,i)
  }
}
Is there a more functional solution that would run faster in R?
which() could be an alternative.
paste0("Match found at ",
which(paste(big_df$date_birth, big_df$gender) %in%
paste(small_df$date_birth, small_df$gender)),
", ",
which(paste(small_df$date_birth, small_df$gender) %in%
paste(big_df$date_birth, big_df$gender)),
collapse = "; ")
If you only want to find those that are represented in both, you could do a merge
merge(big_df,small_df, by = c("date_birth","gender"))
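The two which() calls above are pasted together position by position, which lines up only when the matches occur in the same order and number in both data frames. If the row pairing itself matters, match() gives it directly. A minimal sketch on a shortened, made-up version of the data; it assumes each date of birth and gender combination appears at most once in small_df, since match() returns only the first hit.

```r
# Shortened, invented data with the matches deliberately out of order.
big_df <- data.frame(
  date_birth = c("1985-01-16", "1984-03-04", "1979-06-26", "1985-01-16"),
  gender = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)
small_df <- data.frame(
  date_birth = c("1984-03-04", "1985-01-16", "1979-06-26"),
  gender = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# Combine both columns into one key per row.
key_big   <- paste(big_df$date_birth, big_df$gender)
key_small <- paste(small_df$date_birth, small_df$gender)

# For each big_df row: the matching small_df row, or NA if there is none.
j <- match(key_big, key_small)

# Keep only the rows that matched, as (big_row, small_row) pairs.
matches <- data.frame(big_row = which(!is.na(j)), small_row = j[!is.na(j)])
```

Rows 1 and 4 of big_df both pair with row 2 of small_df, which is the "multiple entries per individual" case described in the question.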

Looping through dataframes in R

I am trying to loop through all the rows of a column in a DataFrame. I read in the csv using data.table. I am new to R and was wondering what way I would go about doing something like this:
for i in row_2_of_dataframe:
if i == 0:
#Do something to that value
else:
#Leave it the way it is
Any help would be great.
I would recommend using the ifelse() function. For example:
mydf$column_name <- ifelse(mydf$column_name == 0, "do something", mydf$column_name)
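A minimal sketch of two vectorised forms on a made-up data frame (the values and the replacement -1 are placeholders): ifelse() builds a whole new vector, while logical indexing overwrites only the elements that satisfy the condition.

```r
# Invented example data.
mydf <- data.frame(column_name = c(0, 3, 0, 7))

# ifelse() evaluates the test for every row and picks a value for each one.
with_ifelse <- ifelse(mydf$column_name == 0, -1, mydf$column_name)

# Logical indexing changes only the matching elements; no else branch needed.
with_index <- mydf$column_name
with_index[with_index == 0] <- -1
```

Either result can be assigned back to mydf$column_name. Keep in mind that mixing types in ifelse(), for example a string for one branch and numbers for the other, coerces the whole result to character.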
frame <- data.frame(x = as.character(rep("bye", 11)),
y = as.character(0:10),
stringsAsFactors = FALSE)
for (i in 1:length(frame[, 2])) {
  if (frame[, 2][i] == 0) {
    frame[, 2][i] <- "hi"
  }
}
You don't even really need an else statement.
Furthermore,
frame[, 2]
selects the second column and turns it into a vector.
frame[, 1]
would select the first column.
frame[1, ]
would select the first row.
And so on.
Cheers.
