I'm creating a new column in my data set that is a copy of a preexisting column but I need to change 2 of the 7 values to something else.
I have tried doing this
Dataset$category_alt = Dataset$category_title
Dataset[Dataset$category_alt == "Shows"] <- "Other"
Dataset[Dataset$category_alt == "Nonprofits & Activism"] <- "Other"
But I receive this error
Error in `[<-.data.frame`(`*tmp*`, USvideos$category_alt == "Shows", value = "Other") :
duplicate subscripts for columns
You're missing the new column name in the assignment.
Dataset$category_alt = Dataset$category_title
Dataset$category_alt[Dataset$category_alt == "Shows"] <- "Other"
Dataset$category_alt[Dataset$category_alt == "Nonprofits & Activism"] <- "Other"
We can also use %in% to make the code clearer and shorter
Dataset$category_alt[Dataset$category_alt %in% c('Shows', 'Nonprofits & Activism')] <-'Other'
I usually prefer to use dplyr in such situations:
library(dplyr)
Dataset <- Dataset %>% mutate(category_alt = replace(category_alt, category_alt %in% c('Shows', 'Nonprofits & Activism'), 'Other'))
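For anyone who wants to try this without the original data, here is a minimal sketch on a made-up toy data frame (the category values are assumptions, not the asker's real data). It uses the base R %in% approach from above.

```r
# Toy data standing in for the real Dataset (values are made up)
Dataset <- data.frame(
  category_title = c("Music", "Shows", "Comedy", "Nonprofits & Activism"),
  stringsAsFactors = FALSE
)

# Copy the column, then recode the two unwanted values in the copy only
Dataset$category_alt <- Dataset$category_title
to_other <- Dataset$category_alt %in% c("Shows", "Nonprofits & Activism")
Dataset$category_alt[to_other] <- "Other"

print(Dataset)
```

stringsAsFactors = FALSE keeps the column as character. If the real column is a factor, "Other" would have to be added as a level first (or the column converted with as.character()), otherwise the assignment produces NA.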
This is the code that I am trying to run and it's taking a while.
districts is a data frame of 39299 rows and 16 columns and lm_data is a data frame of 59804 rows and 16 variables. I want to set up a new variable in lm_data called tentativeStartDate which takes on the value of districts$firstDay[j] if a couple of conditions are met. Is there a more efficient way to do this?
for (i in 1: nrow(lm_data)){
for (j in 1: nrow(districts)){
if (lm_data$DISTORGID[i] == districts$DISTORGID[j] & lm_data$gradeCode[i] == districts$gradeCode[j]){
lm_data$tentativeStartDate[i] = districts$firstDay[j]
}
}
}
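Not sure if this will work since I can't test it, but if it does work it should be much faster. Because the two data frames have different numbers of rows, the columns can't be compared directly with ==. Instead, match() on a combined key finds, for each row of lm_data, the first matching row of districts.
# build one key per row from the two matching columns
key_lm <- paste(lm_data$DISTORGID, lm_data$gradeCode)
key_d <- paste(districts$DISTORGID, districts$gradeCode)
# row of districts matching each row of lm_data (NA where there is no match)
idx <- match(key_lm, key_d)
lm_data$tentativeStartDate <- districts$firstDay[idx]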
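Another way to express the same lookup is a left join with merge(). The sketch below runs on small made-up data. The id column and all values are assumptions for illustration, and only the column names DISTORGID, gradeCode and firstDay come from the question.

```r
# Toy stand-ins for the real data frames (values are made up)
districts <- data.frame(
  DISTORGID = c(1, 1, 2),
  gradeCode = c("K", "1", "K"),
  firstDay  = c("2015-08-20", "2015-08-24", "2015-09-01"),
  stringsAsFactors = FALSE
)
lm_data <- data.frame(
  id        = 1:4,                 # hypothetical row id, to track row order
  DISTORGID = c(2, 1, 3, 1),
  gradeCode = c("K", "1", "K", "K"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of lm_data, attach firstDay where both keys match
merged <- merge(lm_data, districts,
                by = c("DISTORGID", "gradeCode"), all.x = TRUE)
names(merged)[names(merged) == "firstDay"] <- "tentativeStartDate"

# merge() sorts by the key columns, so restore the original order if needed
merged <- merged[order(merged$id), ]
print(merged)
```

Rows of lm_data with no matching district get NA. If districts contains the same DISTORGID/gradeCode pair more than once, merge() returns one row per match, so it may be worth de-duplicating districts first.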
I'm trying to add a column in the data frame where the new element in the new column has the value of "1" if the conditions are met for that particular row.
To check the condition I am iterating through another reference data frame.
county_list = (df$county_name[df$wolves_present_in_county==1 & df$year==2015])
for (i in df$county_name) {
for (j in county_list) {
if (df$county_name[i]==county_list[j])
{
df$wolvein2015 = 1
break
}
}
}
I think you can do what you want in base R. Here is an example using mtcars:
cars <- mtcars
cars$new <- ifelse(cars$cyl == 4 & cars$mpg > 30, 1, 0)
The new column is added with 0/1's based on conditions of 2 other variables.
BTW, because R is a vectorized language, you should only use for loops as a last resort.
Here is an option with dplyr
library(dplyr)
df %>%
mutate(wolvein2015 = +((year == 2015 & !is.na(year)) &
as.logical(wolves_present_in_county) & !is.na(wolves_present_in_county)))
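If the intent of the nested loop in the question is to flag every row whose county appears in county_list (that reading is an assumption), %in% does the membership check in one step. A minimal sketch on made-up data:

```r
# Toy data with the column names from the question (values are made up)
df <- data.frame(
  county_name = c("Adams", "Adams", "Baker", "Baker", "Clark"),
  year = c(2014, 2015, 2014, 2015, 2015),
  wolves_present_in_county = c(0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Counties that had wolves in 2015
county_list <- unique(df$county_name[df$wolves_present_in_county == 1 &
                                       df$year == 2015])

# 1 for every row whose county is in that list, 0 otherwise
df$wolvein2015 <- as.integer(df$county_name %in% county_list)

print(df)
```

Unlike the row-wise condition, this marks all years of a county that had wolves in 2015, not just the 2015 row.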
There are a lot of matching "X" and "Y" questions in R on this site but I think I have a new one. I have two datasets: one is shorter (500 rows) and has one entry per individual. The second is bigger (~20,000 rows) and an individual can have multiple entries. Both have columns for date of birth and gender. My goal is to find people represented in both datasets, starting with date of birth and gender matches. My Python-influenced brain came up with this horrifically slow solution:
dob_big <- c('1975-05-04','1968-02-16','1985-02-28','1980-12-12','1976-06-06','1979-06-24','1981-01-28',
'1985-01-16','1984-03-04','1979-06-26','1988-12-22','1975-10-02','1968-02-04','1972-02-01',
'1981-08-06','1989-01-21','1956-06-25','1986-01-19','1980-03-24','1965-08-16')
gender_big <- c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)
big_df <- data_frame(date_birth = dob_big, gender = gender_big)
dob_small <- c('1985-01-16','1984-03-04','1979-06-26')
gender_small <- c(1,0,1)
small_df <- data_frame(date_birth = dob_small, gender = gender_small)
matches <- c()  # row indices of big_df that have a match in small_df
for (i in 1:length(big_df$date_birth)) {
save_row <- FALSE
for (j in 1:length(small_df$date_birth)) {
if (big_df$date_birth[i] == small_df$date_birth[j]
& big_df$gender[i] == small_df$gender[j]) {
print(paste("Match found at ",i,",",j))
save_row <- TRUE
}
}
if (save_row == TRUE) {
matches <- c(matches,i)
}
}
Is there a more functional solution that would run faster in R?
which() could be an alternative:
paste0("Match found at ",
which(paste(big_df$date_birth, big_df$gender) %in%
paste(small_df$date_birth, small_df$gender)),
", ",
which(paste(small_df$date_birth, small_df$gender) %in%
paste(big_df$date_birth, big_df$gender)),
collapse = "; ")
If you only want to find those that are represented in both, you could do a merge
merge(big_df,small_df, by = c("date_birth","gender"))
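If you also need to know which row of small_df each row of big_df matched, match() on a pasted key gives that directly. This is a sketch on a cut-down, made-up version of the data, and the names key_big, key_small and pairs are just for illustration.

```r
# Small made-up versions of the two data frames
big_df <- data.frame(
  date_birth = c("1985-01-16", "1984-03-04", "1979-06-26", "1975-05-04"),
  gender = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)
small_df <- data.frame(
  date_birth = c("1985-01-16", "1984-03-04", "1979-06-26"),
  gender = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# One key per row, combining both matching columns
key_big   <- paste(big_df$date_birth, big_df$gender)
key_small <- paste(small_df$date_birth, small_df$gender)

# For each row of big_df: the matching row of small_df, or NA if there is none
j <- match(key_big, key_small)

# Row indices of big_df that have a match, paired with the small_df row
matches <- which(!is.na(j))
pairs <- data.frame(big_row = matches, small_row = j[matches])
print(pairs)
```

match() only reports the first matching row in small_df, so this assumes each date of birth and gender combination appears at most once there.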
I am trying to loop through all the rows of a column in a DataFrame. I read in the csv using data.table. I am new to R and was wondering what way I would go about doing something like this:
for i in row_2_of_dataframe:
if i == 0:
#Do something to that value
else:
#Leave it the way it is
Any help would be great.
I would recommend using the ifelse() function. For example:
mydf$column_name <- ifelse(mydf$column_name == 0, "do something",mydf$column_name)
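One thing to watch with ifelse(): if the replacement is a string and the column is numeric, the whole result becomes character. A small sketch with a made-up mydf:

```r
# Toy numeric column (values are made up)
mydf <- data.frame(column_name = c(0, 5, 0, 7))

# String replacement: the result is a character vector
as_text <- ifelse(mydf$column_name == 0, "do something", mydf$column_name)
print(class(as_text))

# Numeric replacement: the result stays numeric
as_num <- ifelse(mydf$column_name == 0, -1, mydf$column_name)
print(class(as_num))
```

So if the column needs to stay numeric, replace the zeros with a number (or NA) rather than a string.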
frame <- data.frame(x = as.character(rep("bye", 11)),
y = as.character(0:10),
stringsAsFactors = FALSE)
for (i in 1:length(frame[, 2])) {
if (frame[, 2][i] == 0) {
frame[, 2][i] <- "hi"
}
}
You don't even really need an else statement.
Furthermore,
frame[, 2]
selects the second column and turns it into a vector.
frame[, 1]
would select the first column.
frame[1, ]
would select the first row.
And so on.
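To make the indexing forms above concrete, here is a short sketch on a smaller made-up frame. It also shows a loop-free way to do the same replacement as the for loop.

```r
# Smaller version of the example frame
frame <- data.frame(x = rep("bye", 3),
                    y = as.character(0:2),
                    stringsAsFactors = FALSE)

v <- frame[, 2]                # second column as a plain character vector
d <- frame[, 2, drop = FALSE]  # second column kept as a one-column data frame
r <- frame[1, ]                # first row, still a data frame

# Vectorized equivalent of the loop: replace "0" with "hi" in one step
frame$y[frame$y == "0"] <- "hi"
print(frame)
```

The logical index frame$y == "0" picks out the elements to change, so no explicit loop or else branch is needed.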
Cheers.