Combining For and If loop in R - r

I want to merge tables in R only if the table actually exists. To do this, I made a character vector with the names of the tables that may or may not exist, then combined a "for" loop with an "if" statement to merge them. All the tables, if they exist, share a common "names" column. My code is as follows:
Designation.Attrition1<- data.frame(names)
x<- c("despivot2020new", "despivot2019new", "despivot2018new", "despivot2017new")
for( i in 1: length(x)){if (exists(x[i])){Designation.Attrition1<- merge(Designation.Attrition1, x[i] , by = "names")}}
However, I get the error "Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column".
One possible reason for the error is that merge() does not treat the element of x as a variable name.

x[i] is still a string, not a data frame, so merge() finds no "names" column to join on. Use get() to fetch the data frame first, then merge.
for( i in seq_along(x)) {
if (exists(x[i])) {
Designation.Attrition1 <- merge(Designation.Attrition1,get(x[i]),by = 'names')
}
}
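A minimal, self-contained sketch of the same loop on toy data. The table contents and the y2020/y2018 columns are made up for illustration; only two of the four tables are defined, so the exists() check skips the others.

```r
# Toy stand-ins for the real tables (values are assumptions).
Designation.Attrition1 <- data.frame(names = c("A", "B", "C"))
despivot2020new <- data.frame(names = c("A", "B", "C"), y2020 = 1:3)
despivot2018new <- data.frame(names = c("A", "B", "C"), y2018 = 4:6)

x <- c("despivot2020new", "despivot2019new", "despivot2018new", "despivot2017new")

for (i in seq_along(x)) {
  # exists() takes the name as a string; get() returns the object itself.
  if (exists(x[i])) {
    Designation.Attrition1 <- merge(Designation.Attrition1, get(x[i]), by = "names")
  }
}

colnames(Designation.Attrition1)
```

Only the tables that exist contribute columns; the missing ones are silently skipped.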

Related

Why does using paste in for loop return error?

I have a few problems concerning the same topic.
(1) I am trying to loop over:
premium1999 <- as.data.frame(coef(summary(data1999_mod))[c(19:44), 1])
for 10 years, in which I wrote:
for (year in seq(1999,2008)) {
paste0('premium',year) <- as.data.frame(coef(summary(paste0('data',year,'_mod')))[c(19:44), 1])
}
Note:
data1999_mod holds regression results, and I want to extract some of its estimates as a data frame.
The coef(summary(data1999_mod)) looks like this:
#A matrix: ... of type dbl
Estimate Std. Error t value Pr(>|t|)
age 0.0388573570 2.196772e-03 17.6883885 3.362887e-6
age_sqr -0.0003065876 2.790296e-05 -10.9876373 5.826926e-28
relation 0.0724525759 9.168118e-03 7.9026659 2.950318e-15
sex -0.1348453659 8.970138e-03 -15.0326966 1.201003e-50
marital 0.0782049161 8.928773e-03 8.7587533 2.217825e-18
reg 0.1691004469 1.132230e-02 14.9351735 5.082589e-50
...
However, it returns Error: $ operator is invalid for atomic vectors, even though I did not use the $ operator here.
(2) Also,
I want to create a column 'year' containing repeated values of the associated year and am trying to loop over this:
premium1999$year <- 1999
In which I wrote:
for (i in seq(1999,2008)) {
assign(paste0('premium',i)[['year']], i)
}
In this case, it returns Error in paste0("premium", i)[["year"]]: subscript out of bounds
(3) Moreover, I'd like to repeat some rows and loop over:
premium1999 <- rbind(premium1999, premium1999[rep(1, 2),])
for 10 years again and I wrote:
for (year in seq(1999,2008)) {
paste0('premium',year) <- rbind(paste0('premium',year), paste0('premium',year)[rep(1, 2),])
}
This time it returns Error in paste0("premium", year)[rep(1, 2), ]: incorrect number of dimensions
I also tried to loop over a few other similar things but I always get Error.
Each code works fine individually.
I could not find what I did wrong. Any help or suggestions would be very highly appreciated.
The problem is that paste0() returns a character string; it does not retrieve the object whose name matches that string. For example, paste0('data',year,'_mod') returns a character vector of length 1, i.e., "data1999_mod", not the object data1999_mod.
To see the difference, compare "data1999_mod"["Estimate"] with data1999_mod["Estimate"]. Subsetting the result of paste0() gives you the former, whereas the output you expect comes only from the latter. That is why you are getting Error: $ operator is invalid for atomic vectors.
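The same mistake appears in all three snippets. To retrieve an object from the string that paste0() returns, wrap the call in get(). Likewise, a paste0() call cannot be the target of <-; to create or overwrite an object whose name is built from a string, use assign().
Since you have not supplied a reproducible sample, I couldn't test this. However, you can try running these.
#(1) get() fetches the model; assign() creates premium<year>
for (year in seq(1999, 2008)) {
  mod <- get(paste0('data', year, '_mod'))
  assign(paste0('premium', year), as.data.frame(coef(summary(mod))[c(19:44), 1]))
}
#(2) fetch the data frame, add the column, then write it back
for (i in seq(1999, 2008)) {
  tmp <- get(paste0('premium', i))
  tmp[['year']] <- i
  assign(paste0('premium', i), tmp)
}
#(3) same pattern: fetch, modify, write back
for (year in seq(1999, 2008)) {
  tmp <- get(paste0('premium', year))
  assign(paste0('premium', year), rbind(tmp, tmp[rep(1, 2), ]))
}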
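Here is a small runnable sketch of the get()/assign() pattern, using two toy models fitted on the built-in mtcars data as stand-ins for data1999_mod and data2000_mod. The formulas, the row selection 2:3, and the estimate column name are assumptions made only for this example.

```r
# Toy stand-ins for the yearly regression objects (assumed, not the real models).
data1999_mod <- lm(mpg ~ wt + hp, data = mtcars)
data2000_mod <- lm(mpg ~ wt + qsec, data = mtcars)

for (year in 1999:2000) {
  # paste0() only builds the name; get() returns the fitted model.
  mod <- get(paste0("data", year, "_mod"))
  # Keep the estimates of the two slope coefficients (rows 2 and 3).
  est <- data.frame(estimate = coef(summary(mod))[2:3, 1])
  est$year <- year
  # assign() creates premium1999, premium2000 from the constructed name.
  assign(paste0("premium", year), est)
}

premium1999
```

If the yearly objects can be kept in a named list instead of separate variables, lapply() over that list avoids get() and assign() altogether.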

Parameterize name of output dataframe in global environment, assigned to from a function

I am trying to pass a function the name I want it to give the data frame it creates, and have it save that data frame to the global environment.
I am trying to automate creating dataframes that are subsets of other dataframes by filtering for a value; since I'm creating 43 of these, I'm writing a function that can automatically:
a) subset rows containing a certain string into its own data.frame, then
b) name that data.frame after the string and save it to my global environment. (The string in a) is also the suffix I want the data.frame to be named after in b).)
I can do a) fine but am having trouble with b).
Say I have a dataset which includes a column named "Team" (detailing whose team that member belongs to):
original.df <- read_csv("../original_data_set")
I create a function to split that dataset according to values in one of its columns...
split.function <- function(string){
x <- original.df
as.name(string) <<- filter(x, str_detect(`Team`, string))
}
... then call it to save the dataframe under that name:
split.function('Team.Curt')
I keep getting:
> Error in as.name(x) <<- filter(y, str_detect(`Receiving Committee`, x)) :
object 'x' not found
But I just want Team.Curt saved as a data.frame in my global environment, containing the rows that include the term Team.Curt, when I run this.
You can use assign to create objects based on a string:
split.function <- function(string){
x <- original.df
assign(string, filter(x, str_detect(`Team`, string)), envir = .GlobalEnv)
}
Here, envir = .GlobalEnv is used to assign the value to the global environment.
Both <- and <<- assignments require that the statement hardcodes the object name. Since you want to parameterize the name, as in your case, you must use assign().
<<- is merely a variant of <- that can be used inside a function, and does a bottom-up search of environments until it either reaches the top (.GlobalEnv) or finds an existing object of that name. In your case that's unnecessary and slightly dangerous, since if an object of that name existed in some environment halfway up the hierarchy, you'd pick it up and assign to it instead.
So just use assign(..., envir = .GlobalEnv) instead.
But both <<- and assigning directly into .GlobalEnv within functions are strongly discouraged as being disasters in waiting, or "life by a volcano" (burns-stat.com/pages/Tutor/R_inferno.pdf). See the caveats at Assign multiple objects to .GlobalEnv from within a function. tidyverse is probably a better approach for managing multiple dataframes.
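As a sketch of an alternative that avoids writing into .GlobalEnv, the function can return the subset and the caller can keep all subsets in one named list. This uses base R only (grepl() in place of str_detect()); the toy data, the member column, and the helper name subset_by_team are assumptions made for illustration.

```r
# Toy stand-in for original.df (values are assumptions).
original.df <- data.frame(
  Team = c("Team.Curt", "Team.Ann", "Team.Curt"),
  member = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the rows whose Team contains the string.
subset_by_team <- function(df, string) {
  df[grepl(string, df$Team, fixed = TRUE), , drop = FALSE]
}

teams <- c("Team.Curt", "Team.Ann")
# setNames() makes lapply() return a list named after each team.
subsets <- lapply(setNames(teams, teams), function(s) subset_by_team(original.df, s))

subsets[["Team.Curt"]]
```

Each subset is then available as subsets[["Team.Curt"]] and so on, with no side effects on the global environment.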

User-defined function producing 'Could not find function' error

My dataset test[[1]] can be found here.
I'm defining a function and using it in a for loop in the following code. The function is supposed to concatenate strings such as (test[[1]], '$', names(test[[1]])[1]) before converting them into an R variable. So in this example, these strings go in and out comes test[[1]]$V1.
I then iterate the function over the variables in test[[1]].
Unfortunately, I keep getting this error: Error in stvar(test[[1]], j) <- NULL : could not find function "stvar<-".
stvar <- function(df,num) {
eval(parse(text=paste(deparse(substitute(df)),'$',names(df)[num],sep='')))
}
for (j in 1:length(names(test[[1]]))){
if (trimws(as.character(stvar(test[[1]],j)[1]))=="Div" &
grepl("^M",stvar(test[[1]],j)[3])==0) {
stvar(test[[1]],j) <- NULL
}
}
Also, not sure if this is important, but the for-loop finds columns containing certain characteristics (first observation == "Div", third observation doesn't start with 'M') and removes matching columns.
Is there a way I can make the loop recognize my function?
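The error arises because stvar(test[[1]], j) <- NULL makes R look for a replacement function named `stvar<-`, which was never defined. One possible way around it, sketched here on a made-up data frame standing in for test[[1]] (the column names and values are assumptions), is to compute a logical vector marking the columns to drop and subset once, with no eval(parse()) at all.

```r
# Toy stand-in for test[[1]] (assumed values).
df <- data.frame(
  V1 = c("Div", "x", "A1"),   # first is "Div", third does not start with M: drop
  V2 = c("Div", "y", "M2"),   # first is "Div", third starts with M: keep
  V3 = c("Date", "z", "B3"),  # first is not "Div": keep
  stringsAsFactors = FALSE
)

# TRUE for every column that meets both removal conditions.
to_drop <- vapply(df, function(col) {
  trimws(as.character(col[1])) == "Div" && !grepl("^M", col[3])
}, logical(1))

df <- df[, !to_drop, drop = FALSE]
names(df)
```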

R Functions with variables

I have a function where I'm trying to compare a dataframe column to a reference table of type character. I have downloaded some data from the Norwegian central statistics office with popular first names. I want to add a column to my data frame which is basically a 1 or a 0 depending on whether the name appears in the list (1 being a boy, 0 being a girl). I'm getting the following error with the code:
*Error in match(x, table, nomatch = 0L) : object 'x' not found*
Data frame is train.
Reference data is male_names
male_names <- read.csv("~/R/Functions_Practice/NO/BoysNames_Data.csv", sep=";",as.is = TRUE)[ ,1]
get.sex <- function(x, ref)
for (i in ref)
{
if(x %in% ref)
{return (1)}
}
# set default for column
train$sex <- 2
# Update column if it appears in the names list
train$sex <- sapply(train$sex, FUN=get.sex(x,male_names))
I would then use the function to run the second Girls Name file against the table and set the flag for each record to zero where that occurs.
Can anyone help?
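When using sapply, pass the function itself to FUN rather than a call to it. Writing FUN=get.sex(x,male_names) evaluates that call immediately, before sapply runs, which is why x is not found.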
train$sex <- sapply(train$sex, FUN=get.sex,ref = male_names)
It is implied that train$sex is the x argument, and all other parameters are passed after that (in this case, it's just ref) and are explicitly defined.
Edit:
As joran noted, in this case sapply isn't particularly useful, and you can get the result in one line:
train$sex = (train$sex %in% male_names)*1
%in% can be used when the argument on the left is a vector, so you don't have to loop over it. Multiplying the result by one converts logical (boolean) values into numbers: 1*TRUE yields 1, and 1*FALSE yields 0.
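A short sketch of both forms on toy data. It assumes the names to look up live in a column called first_name (a hypothetical name; the question does not say which column holds them) and uses as.integer() to get true integer flags.

```r
# Toy data; first_name is a hypothetical column name.
train <- data.frame(first_name = c("Ola", "Kari", "Per"), stringsAsFactors = FALSE)
male_names <- c("Ola", "Per", "Jan")

# Vectorised: %in% checks every name at once.
train$sex <- as.integer(train$first_name %in% male_names)

# Equivalent sapply() form: the function is passed by name and
# extra arguments (ref) follow it.
get_sex <- function(x, ref) if (x %in% ref) 1L else 0L
via_sapply <- sapply(train$first_name, get_sex, ref = male_names, USE.NAMES = FALSE)

train$sex
```

Both approaches give the same flags here; the vectorised one is shorter and avoids the per-element function call.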

changing first colname in a list of dataframes

I've got a list of dataframes and am trying to change the first colname using the lapply method
frames<-lapply(frames,function(x){ colnames(frames[[x]])[1]<-"date"})
is returning the error
Error in `*tmp*`[[x]] : invalid subscript type 'list'
I am not sure why it would produce this error as my understanding is that this should apply
colname[1]<-"date"
to every data frame in the list
If anyone can tell me the root of this error I would be very grateful!
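You do not need to index into frames inside lapply. lapply passes each element of the list (a data frame) to your function as x, so frames[[x]] tries to subset the list with a data frame, hence the "invalid subscript type 'list'" error. The function also has to return the modified data frame. Try this: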
frames <- lapply(frames, function(x) { colnames(x)[1] <- "date"; return(x) })
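A quick check of this pattern on a made-up list of two data frames (the contents are assumptions for illustration):

```r
# Toy list of data frames.
frames <- list(
  a = data.frame(x = 1:2, y = 3:4),
  b = data.frame(t = 5:6, v = 7:8)
)

# Each element arrives as df; rename its first column and return it.
frames <- lapply(frames, function(df) {
  colnames(df)[1] <- "date"
  df
})

# First column name of every data frame in the list.
first_cols <- vapply(frames, function(df) colnames(df)[1], character(1))
first_cols
```

Without the final df, the function would return only the value of the assignment, the string "date", and the data frames would be lost.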
