R: using gsub in a for loop

Low level R user here.
I have 3 population data frames (low.proj, med.proj, high.proj) with the exact same number of rows and columns I'm trying to clean and reshape.
I want to eliminate some extra commas in the Country column of all three frames, so I'm trying this loop with gsub:
for(i in c("low.proj", "med.proj", "high.proj")){
i$Country <- gsub(",","",i[,"Country"])
}
When I run this I get the error "Error in i[, "Country"] : incorrect number of dimensions"
When I run the code without the loop:
low.proj$Country <- gsub(",","",low.proj[,"Country"])
It works. What causes this error and how do I fix it?

Inside the loop, i holds only the name of the object as a string. To retrieve the contents of the object named by that string, use get(); to put new data back into that object, use assign():
for(i in c("low.proj", "med.proj", "high.proj")){
tmp <- get(i)
tmp$Country <- gsub(",","",tmp[,"Country"])
assign(i, tmp)
}
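As a hedged alternative to get() and assign(), the three frames can be kept in a named list and cleaned in one pass. The sketch below uses made-up stand-in data (the Country and Pop values are assumptions, not the asker's data) and first shows why the original loop failed: i is only a character string.

```r
# Made-up stand-ins for the three projection data frames (assumed values)
low.proj  <- data.frame(Country = c("Korea, Rep.", "Chile"), Pop = c(51, 19))
med.proj  <- data.frame(Country = c("Korea, Rep.", "Chile"), Pop = c(52, 20))
high.proj <- data.frame(Country = c("Korea, Rep.", "Chile"), Pop = c(53, 21))

# Inside the original loop, i is just a character string, not a data frame
i <- "low.proj"
is.character(i)        # TRUE
is.data.frame(get(i))  # TRUE: get() looks the object up by its name

# Alternative: keep the frames in a named list and clean them all at once
projections <- list(low.proj = low.proj, med.proj = med.proj, high.proj = high.proj)
projections <- lapply(projections, function(d) {
  d$Country <- gsub(",", "", d$Country, fixed = TRUE)  # drop literal commas
  d
})

projections$low.proj$Country
```

With the list approach the cleaned frames live in projections rather than in the original variables, which usually makes later reshaping steps easier to apply to all three at once.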

You're indexing the wrong variable:
i$Country <- gsub(",","",i[,"Country"])
i is a string, so i$Country doesn't have any meaning.

Related

R - Dataframe - Replace characters in a string using two columns containing lists of positions and characters

First time poster, so please let me know how to improve my question if needed. Certainly eager to improve.
I have a data frame that has a column of strings that I need to replace characters in using a column containing a list of multiple positions and another column with a list of characters for those positions. Using example data:
#create the values to build the data frame for use case
food <-"pasta"
string <-"bacorogi"
pos <- c(1,4,7)
chars <- c("m","a","n")
#convert vectors to lists
poslist <-list(pos)
charlist <-list(chars)
#create data frame
df <-data.frame(cbind(food,
poslist,
charlist,
string))
I figured out how to do this when the string, positions, and characters exist as separate vectors using:
for(i in seq_along(pos)) substring(string, pos[i], pos[i]) <- chars[i]
string
[1] "macaroni"
When I try to apply this to the data frame I run into an error:
for(i in seq_along(df$pos)) substring(df$string, df$pos[i], df$pos[i]) <- df$chars[i]
Error in `substring<-`(`*tmp*`, df$pos[i], df$pos[i], value = df$chars[i]) :
(list) object cannot be coerced to type 'integer'
To try to properly apply this to a data frame, I tried below and got an error:
for(i in seq_len(nrow(df))) substring(df$string, df$poslist[i], df$poslist[i]) <- df$charlist[i]
Error in `substring<-`(`*tmp*`, df$poslist[i], df$poslist[i], value = df$charlist[i]) :
(list) object cannot be coerced to type 'integer'
I am not really sure how to get around this problem or how to adapt this to a data frame.
I do have more rows in my data frame, but I figured if someone could help me figure out how to do this to one row, I could take it from there. Thanks for any input you can provide!
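The nesting is one level deep, so extract that element and loop over it. (The column is named poslist; df$pos only works through partial matching of the name.)
for(i in seq_along(df$poslist[[1]])) {
  substring(df$string[[1]], df$poslist[[1]][i],
            df$poslist[[1]][i]) <- df$charlist[[1]][i]
}
Output: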
df$string
#$string
#[1] "macaroni"
If there are more rows, use a nested loop:
for(i in seq_along(df$poslist)) {
  for(j in seq_along(df$poslist[[i]])) {
    substring(df$string[[i]], df$poslist[[i]][j],
              df$poslist[[i]][j]) <- df$charlist[[i]][j]
  }
}
df$string
#$string
#[1] "macaroni"
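The nested loop can also be wrapped in a small helper and applied row by row with mapply. This is only a sketch: replace_at is a hypothetical helper name, the second row ("bxnxnx") is invented for illustration, and the data frame is built with plain list columns rather than the cbind() construction from the question.

```r
# Hypothetical helper: overwrite the characters of s at positions pos with chars
replace_at <- function(s, pos, chars) {
  for (k in seq_along(pos)) {
    substring(s, pos[k], pos[k]) <- chars[k]
  }
  s
}

# Example data; the second row is made up to show the multi-row case
df <- data.frame(food = c("pasta", "fruit"),
                 string = c("bacorogi", "bxnxnx"),
                 stringsAsFactors = FALSE)
df$poslist  <- list(c(1, 4, 7), c(2, 4, 6))
df$charlist <- list(c("m", "a", "n"), c("a", "a", "a"))

# Apply the helper to each row: one string, one position vector, one char vector
df$fixed <- mapply(replace_at, df$string, df$poslist, df$charlist,
                   USE.NAMES = FALSE)
df$fixed
# [1] "macaroni" "banana"
```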

I keep getting the error message "Error in eff_weights[i, ] <- eff.port$pw : number of items to replace is not a multiple of replacement length"

I am new to R and really need some help. I keep getting the error message
"Error in eff_weights[i, ] <- eff.port$pw : number of items to
replace is not a multiple of replacement length"
when I run the loop. Can someone help me figure out what I am doing wrong? Thank you so much in advance!
# Create for loop to find efficient frontier
for (i in 1 : length(grid)) {
eff.port <- portfolio.optim(returns, pm = grid[i], shorts =TRUE)
vector_pm[i] <- eff.port$pm
vector_psd[i] <- eff.port$ps
eff_weights[i, ] <- eff.port$pw
}
Without a sample of your data or dummy data to reproduce the problem, it is hard to provide a definite solution. However, in your loop you assign a vector of values, eff.port$pw, to the ith row of a data frame or matrix, eff_weights[i, ]. The error message is saying that they have different lengths. Use length() or dim() to compare the two: the vector eff.port$pw must have as many elements as eff_weights has columns.
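A minimal sketch of that check, under stated assumptions: the number of assets, the grid values, and the weight vector pw below are invented stand-ins (pw plays the role of eff.port$pw), and portfolio.optim itself is not called.

```r
n_assets <- 3                               # assumed number of assets
grid <- seq(0.01, 0.05, length.out = 5)     # assumed target returns

# Stand-in for eff.port$pw: one weight per asset
pw <- rep(1 / n_assets, n_assets)

# Preallocate with one column per asset so each row can hold a weight vector
eff_weights <- matrix(NA_real_, nrow = length(grid), ncol = n_assets)
length(pw) == ncol(eff_weights)             # TRUE, so the assignment is safe
eff_weights[1, ] <- pw

# A matrix with the wrong number of columns reproduces the error
bad_weights <- matrix(NA_real_, nrow = length(grid), ncol = 2)
msg <- tryCatch({
  bad_weights[1, ] <- pw
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

If the lengths differ in the real code, the usual suspect is how eff_weights was created before the loop, for example with ncol set to something other than the number of columns in returns.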

How to transfer multiple columns into numeric & find correlation coefficients

I have a dataset "res.sav" that I read in via haven. It contains 20 columns, called "Genes1_Acc4", "Genes2_Acc4" etc. I am trying to find a correlation coefficient between those and another column called "Condition". I want to separately list all coefficients.
I created two functions, cor.condition.cols and cor.func to do that. The first iterates through the filenames and works just fine. The second was supposed to give me my correlations which didn't work at all. I also created a new "cor.condition.Genes" which I would like to fill with the correlations, ideally as a matrix or dataframe.
I have tried to iterate through the columns with two functions. However, when I try to pass it, I get the warning "NAs introduced by coercion". This wouldn't be the end of the world (I also tried suppressWarnings()). But the bigger problem is that my function does not seem to convert the columns into the numeric type I need for my cor() function. I receive the "y must be numeric" error when trying to run the cor() function. I tried to put several arguments within and without '' or "" without success.
When I run str(cor.condition.cols) I only get character strings, which makes me think that my function somehow messes up the as.numeric function. Any suggestions for how else I could iterate through these columns and convert them?
Thanks guys :)
cor.condition.cols <- lapply(1:20, function(x){paste0("res$Genes", x, "_Acc4")})
#save acc_4 columns as numeric columns and calculate correlations
res <- (as.numeric("cor.condition.cols"))
cor.func <- function(x){
cor(res$Condition, x, use="complete.obs", method="pearson")
}
cor.condition.Genes <- cor.func(cor.condition.cols)
You can do:
cor.condition.cols <- paste0("Genes", 1:20, "_Acc4")
# convert column by column; as.numeric() on the whole matrix would flatten it into one vector
res2 <- sapply(res[cor.condition.cols], as.numeric)
cor.condition.Genes <- cor(res2, res$Condition, use="complete.obs", method="pearson")
Or, as a shorter variant if the columns are already numeric:
cor.condition.cols <- paste0("Genes", 1:20, "_Acc4")
cor.condition.Genes <- cor(res[cor.condition.cols], res$Condition, use="complete.obs")
Here is an example with other data:
cor(iris[-(4:5)], iris[[4]])
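To illustrate the conversion step end to end, here is a hedged sketch on invented data: two columns from the built-in mtcars data set are stored as character strings under the question's naming scheme, converted back to numeric, and correlated with a Condition column. The data frame dat and its contents are assumptions, not the asker's res.sav.

```r
# Made-up data: gene columns stored as character strings, as in the question
dat <- data.frame(Condition   = mtcars$mpg,
                  Genes1_Acc4 = as.character(mtcars$wt),
                  Genes2_Acc4 = as.character(mtcars$hp),
                  stringsAsFactors = FALSE)

# Build the column names only; no "res$" prefix, since they are used for indexing
cols <- paste0("Genes", 1:2, "_Acc4")

# Convert each selected column to numeric in place
dat[cols] <- lapply(dat[cols], as.numeric)

# One Pearson correlation per gene column, returned as a named vector
cor.by.gene <- sapply(dat[cols], function(x) {
  cor(x, dat$Condition, use = "complete.obs", method = "pearson")
})
cor.by.gene
```

The key difference from the original attempt is that the column names are used to index the data frame, rather than being passed to as.numeric() or cor() as strings.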

How to remove the first row from multiple dataframes?

I have multiple dataframes and would like to remove the first row in all of them.
I have tried using a for loop but cannot understand what I am doing wrong
for (i in cities){
i <- i[-1, ]
}
I get the following error code:
Error in i[-1, ] : incorrect number of dimensions
If we assume that the only objects in your workspace are dataframes, then this might succeed:
cities <- objects()
for (i in cities) { assign(i, get(i)[-1,]) }
Explanation:
Two things were wrong with the original code:
One was already mentioned in the comments: "df" is not the same as df. You need to use get to convert a character value to a "true" R name that is used to retrieve the object having that name. The result of objects() is only a character vector. In R the term "name" means a "language object"; see the help page ?mode. (There is potential confusion with rownames and column names, which are always of class "character".) This is unlike SAS, which is a macro language that has no such distinction.
The second error was expecting substitution for the i on the left-hand side of <-. That would have failed even if you were working with actual R names. The assign function is designed to handle character values that are then converted to R names.
Say you get a list of all the tables in your environment and call that list cities. You can't just iterate over each value of cities and change things, because the elements are just character strings, not the tables themselves.
Here is what you need:
for (i in cities){
tmp <- get(i) # load the actual table
tmp <- tmp[-1, ] # remove first row
assign(i, tmp) # re-assign table to original table name
}
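If the data frames can be held in a named list instead of as separate workspace objects, the same operation needs neither get() nor assign(). This is a sketch with invented data; note that cities here is a list of data frames, not the character vector of names used above.

```r
# Made-up data frames whose first row is a junk row to be dropped
cities <- list(
  paris = data.frame(year = c(NA, 2020, 2021), pop = c(NA, 2.1, 2.2)),
  rome  = data.frame(year = c(NA, 2020),       pop = c(NA, 2.8))
)

# Drop the first row of every data frame; drop = FALSE keeps the result a
# data frame even if only one column is present
trimmed <- lapply(cities, function(d) d[-1, , drop = FALSE])

sapply(trimmed, nrow)
# paris  rome
#     2     1
```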

Is there a more efficient/clean approach to an eval(parse(paste0( set up?

Sometimes I have code which references a specific dataset based on some variable ID. I have then been creating lines of code using paste0, and then eval(parse(...)) that line to execute the code. This seems to be getting sloppy as the length of the code increases. Are there any cleaner ways to have dynamic data reference?
Example:
dataset <- "dataRef"
execute <- paste0("data.frame(", dataset, "$column1, ", dataset, "$column2)")
eval(parse(text = execute))
But now imagine a scenario where dataRef would be called for 1000 lines of code, and sometimes needs to be changed to dataRef2 or dataRefX.
Combining the comments of Jack Maney and G.Grothendieck:
It is better to store your data frames that you want to access by a variable in a list. The list can be created from a vector of names using get:
mynames <- c('dataRef','dataRef2','dataRefX')
# or mynames <- paste0( 'dataRef', 1:10 )
mydfs <- setNames( lapply( mynames, get ), mynames ) # name the elements so they can be looked up by string
Then your example becomes:
dataset <- 'dataRef'
mydfs[[dataset]][,c('column1','column2')]
Or you can process them all at once using lapply, sapply, or a loop:
mydfs2 <- lapply( mydfs, function(x) x[,c('column1','column2')] )
@G.Grothendieck has shown you how to use get and [ to elevate a character value, return the value of the named object, and then reference named elements within that object. I don't know what your code was intended to accomplish, since the result of executing that code would be to deliver values to the console; they would not have been assigned to a name and would have been garbage collected. Suppose you wanted to use three character values (the object name, held in dataset here, plus colname1 and colname2) and assign those columns to an object named after a fourth character value:
newname <- "newdf"
assign( newname, get(dataset)[ c(colname1, colname2) ] )
The lesson to learn is that assign and get are capable of taking character values and accessing or creating named objects, which can be either data objects or functions. Carl_Witthoft mentions do.call, which can construct function calls from character values.
do.call("data.frame", setNames(list( dfrm$x, dfrm$y), c('x2','y2') ) )
do.call("mean", dfrm[1])
# second argument must be a list of arguments to `mean`
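Putting the pieces together, the sketch below compares the eval(parse(...)) pattern with get() plus character indexing, and shows do.call with a function name given as a string. The contents of dataRef and the names via_parse and via_get are assumptions made up for the demonstration.

```r
# Made-up data frame standing in for dataRef
dataRef <- data.frame(column1 = 1:3, column2 = c(10, 20, 30), column3 = 4:6)

dataset <- "dataRef"
cols <- c("column1", "column2")

# Original pattern: build the code as text, then parse and evaluate it
execute <- paste0("data.frame(", dataset, "$column1, ", dataset, "$column2)")
via_parse <- eval(parse(text = execute))

# Same columns without code generation: look the object up, index by name
via_get <- get(dataset)[cols]

# Both approaches return the same values
identical(via_parse[[1]], via_get$column1)  # TRUE
identical(via_parse[[2]], via_get$column2)  # TRUE

# do.call builds a call from a function name and a list of arguments
do.call("mean", list(dataRef$column1))      # 2
```

Switching to dataRef2 or dataRefX then only requires changing the string in dataset, and the get() version keeps the original column names, which the pasted data.frame() call does not.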
