Dynamically assign variable names for vectors in R?

I'm new to R and I am trying to create variables referencing vectors within a for loop, where the index of the loop is appended to the variable name. However, the code below, in which I try to insert the new vectors into the appropriate place in the larger data frame, is not working. I've tried many variations of get(), as.vector(), eval() etc. in the data frame construction function.
I want num_incorrect.8 and num_incorrect.9 to be vectors with a value of 0 and then be inserted into mytable.
cols_to_update <- c(8,9)
for (i in cols_to_update)
{
#column name of insertion point
insertion_point <- paste("num_correct",".",i,sep="")
#create the num_incorrect col -- as a vector of 0s
assign(paste("num_incorrect",".",i,sep=""), c(0))
#index of insertion point
thespot <- which(names(mytable)==insertion_point)
#insert the num_incorrect vector and rebuild mytable
mytable <- data.frame(mytable[1:thespot], as.vector(paste("num_incorrect",".",i,sep="")), mytable[(thespot+1):ncol(mytable)])
#update values
mytable[paste("num_incorrect",".",i,sep="")] <- mytable[paste("num_tries",".",i,sep="")] - mytable[paste("num_correct",".",i,sep="")]
}
When I look at how the column insertion went, it looks like this:
[626] "num_correct.8"
[627] "as.vector.paste..num_incorrect........i..sep........2"
...
[734] "num_correct.9"
[735] "as.vector.paste..num_incorrect........i..sep........3"
Basically, it looks like it's taking my commands as literal text. The last line of code works as expected and creates new columns at the end of the data frame (since the line before it didn't insert the column into the proper place):
[1224] "num_incorrect.8"
[1225] "num_incorrect.9"
I am kind of out of ideas, so if someone could please give me an explanation of what's wrong and why, and how to fix it, I would appreciate it. Thanks!

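The mistake is in the line that rebuilds mytable (the second-to-last line of code in your loop, not counting comments), where you create the vector and add it to your data frame. as.vector(paste(...)) evaluates to the character string "num_incorrect.8", not to the object with that name, so data.frame() inserts that string as the column's contents and builds the column name from the text of the expression.
You just need to add a vector of the right length and then update the column name. You can remove the assign() call: it only creates a length-one variable holding 0 in your workspace, which the data.frame() call never uses.
Replace that line with the code below and it should work.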
#insert the vector at the desired location
mytable <- data.frame(mytable[1:thespot], newCol = vector(mode='numeric',length = nrow(mytable)), mytable[(thespot+1):ncol(mytable)])
#update the name of new location
names(mytable)[thespot + 1] = paste("num_incorrect",".",i,sep="")
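As a self-contained check of the idea, here is a minimal sketch on a made-up four-column mytable (the data values are assumptions; only the column naming follows the question). It reorders the column names with append() instead of slicing the data frame, which is a swapped-in variation and not the code above; one side effect is that it also copes with an insertion point at the last column.

```r
# Hypothetical toy data; column names mirror the question's naming scheme
mytable <- data.frame(
  num_tries.8 = c(5, 4), num_correct.8 = c(3, 4),
  num_tries.9 = c(6, 2), num_correct.9 = c(1, 2)
)

for (i in c(8, 9)) {
  new_name <- paste0("num_incorrect.", i)
  thespot  <- which(names(mytable) == paste0("num_correct.", i))

  # Compute the new column directly; [[ ]] accepts a name built at run time
  mytable[[new_name]] <- mytable[[paste0("num_tries.", i)]] -
    mytable[[paste0("num_correct.", i)]]

  # The new column was added at the end; move its name to just after thespot
  ord <- append(setdiff(names(mytable), new_name), new_name, after = thespot)
  mytable <- mytable[ord]
}

names(mytable)
# [1] "num_tries.8"     "num_correct.8"   "num_incorrect.8"
# [4] "num_tries.9"     "num_correct.9"   "num_incorrect.9"
```

No assign() or get() is needed here, because the column is addressed by a character string rather than by a variable name.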

Related

How to create a new variable in R that returns 1 if a case has a missing value while another variable has an observed value?

I have two variables containing missing data loon and profstat. For a better overview of the data that are missing and are needed to impute, I wanted to create an additional variable problem in the data frame, that would return for each case 1 if loon is missing and profstat is observed, and 0 if otherwise. I have generated the following code, which only gives me as output x[] = 1. Any solution to this problem?
{
problem <- dim(length(t))
for (i in 1:nrow(dflapopofficial))
{
if (is.na(dflapopofficial$loon[i])==TRUE & is.na(dflapopofficial$profstat[i])==FALSE) {
dflapopofficial$problem[i]=1
} else {
dflapopofficial$problem[i]=0
}
return(problem)
}
There are a few things that could be improved here:
Remember, many operations in R are vectorized. You don't need to loop through each element in a vector when doing logical checks etc.
is.na(some_condition) == TRUE is just the same as is.na(some_condition), and is.na(some_condition) == FALSE is the same as !is.na(some_condition).
If you want to write a new column inside a dataframe, and you are referring to several variables in that dataframe, using within can save you a lot of typing, particularly if your dataframe has a long name.
You are returning problem, yet in your loop you are writing to dflapopofficial$problem, which is a different variable.
If you want to write 1s and 0s, you can implicitly convert logical to numeric using +(logical_vector).
Putting all this together, you can replace your whole loop with a single line:
within(dflapopofficial, problem <- +(is.na(loon) & !is.na(profstat)))
Remember to store the result, either back to the dataframe or to a copy of it, like
df <- within(dflapopofficial, problem <- +(is.na(loon) & !is.na(profstat)))
So that df is just a copy of dflapopofficial with your extra column.
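To see the one-liner at work, here is a small sketch on a made-up stand-in for dflapopofficial (the four rows and their values are assumptions chosen to cover every combination of missing and observed).

```r
# Hypothetical stand-in for dflapopofficial
dflapopofficial <- data.frame(
  loon     = c(NA, 2500, NA, 3100),
  profstat = c("employed", "employed", NA, NA)
)

# 1 where loon is missing AND profstat is observed, 0 otherwise
df <- within(dflapopofficial, problem <- +(is.na(loon) & !is.na(profstat)))

df$problem
# [1] 1 0 0 0
```

Only the first row is flagged: it is the only one where loon is NA while profstat holds a value.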

How to remove the first row from multiple dataframes?

I have multiple dataframes and would like to remove the first row in all of them.
I have tried using a for loop but cannot understand what I am doing wrong
for (i in cities){
i <- i[-1, ]
}
I get the following error code:
Error in i[-1, ] : incorrect number of dimensions
If we assume that the only objects in your workspace are dataframes then this might succeed:
cities <- objects()
for (i in cities) { assign(i, get(i)[-1,])}
Explanation:
Two things were wrong with the original code:
One was already mentioned in the comments: "df" is not the same as df. You need to use get() to convert a character value to a "true" R name that is used to retrieve the object having that name. The result of objects() is only a character vector. In R the term "name" means a language object; see the help page ?mode. (There is potential confusion with row names and column names, which are always of class "character".) R is not like SAS, which is a macro language that has no such distinction.
The second error was trying to get substitution for the i on the left-hand side of <-. That would have failed even if you were working with actual R names. The assign function is designed to handle character values that are then converted to R names.
Say you get a list of all the tables in your environment, and you call that list cities. You can't just iterate over each value of cities and change things, because the values in it are just character strings.
Here is what you need:
for (i in cities){
tmp <- get(i) # load the actual table
tmp <- tmp[-1, ] # remove first row
assign(i, tmp) # re-assign table to original table name
}
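A runnable version of the same loop, on two made-up data frames (paris and london are hypothetical names, not from the question). The drop = FALSE argument is an addition of mine: without it, a data frame with a single column would be turned into a plain vector by [-1, ].

```r
# Hypothetical data frames standing in for the asker's objects
paris  <- data.frame(x = 1:3, y = c("a", "b", "c"))
london <- data.frame(x = 4:6, y = c("d", "e", "f"))

# Character vector of object NAMES, not the objects themselves
cities <- c("paris", "london")

for (i in cities) {
  # get(i) fetches the object called i; assign() writes the result back
  assign(i, get(i)[-1, , drop = FALSE])
}

nrow(paris)
# [1] 2
london$x
# [1] 5 6
```

If the data frames can be kept in a named list from the start, lapply(list_of_df, function(d) d[-1, , drop = FALSE]) does the same job without get() or assign().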

Cannot assign column name by indexing within function argument

I am learning R, and I am trying to understand the indexing properties. I cannot seem to understand why the following code to change a column name does not work:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# why the following row does not work?
names(state.all["States"]) <- "Test"
colnames(state.all)
While this works:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# This works
names(state.all)[which(colnames(state.all)=="States")] <- "Test"
colnames(state.all)
Shouldn't the function be able to overwrite the name of the column also in the first example? Is it something to do with the local vs. global environment?
Thanks in advance!
What you're trying to do is replace the name of column number 9.
The expression which(colnames(state.all)=="States") returns the index of the column named "States" (if there is one); that index is then used to replace the corresponding value in the names vector.
The expression state.all["States"] just extracts that column as a new one-column data frame, so renaming the extracted piece does not change the names of state.all itself.
I suggest something like colnames(state.all)[which(colnames(state.all)=="States")] <- "Test".
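The suggestion can be checked on the built-in state.x77 data set used in the question; the helper variable idx is only there for illustration.

```r
state.all <- as.data.frame(state.x77)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL

# Position of the column to rename within the names vector
idx <- which(colnames(state.all) == "States")
idx
# [1] 9

# Index the names vector itself, then assign into that position
colnames(state.all)[idx] <- "Test"

"Test" %in% colnames(state.all)
# [1] TRUE
```

The key point is that the subsetting happens on the result of colnames(), so the replacement lands in the names of state.all.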

Using Loop variable to access and write specific data.frames

I wrote a script that reads CSV data with the help of user input. For example, when the user enters "20 40 160", the CSV files 1, 2 and 3 are read and saved as the data.frames d20, d40 and d160 in my global environment/workspace. The variable vel holds the values from the user input.
Now for the actual question:
I'm trying to manipulate the read data in a loop with the vel variable. For example:
for (i in vel)
{
newVariable"i" <- d"i"[6]
}
I know that's not correct syntax, but what I'm trying to do is to write a newVariable with a specific row from a specific data frame d.
The result should be:
newVariable20 = d20[20]
newVariable40 = d40[20]
newVariable160 = d160[20]
So I think the actual question is, how do I use the Loop Variable for calling out the names of the created data frames and for writing new variables.
There are a couple of ways to do this. One is to store all of your data frames in a list from the start: begin with an empty list and put each data frame into the next position in it. Note that you have to wrap each one in list(df), because a data frame is itself a list, and single-bracket assignment would otherwise treat its columns as the items to assign.
list_of_df <- list()
list_of_df[1] <- list(df1)
list_of_df["df20"] <- list(df2)
This makes it easy to loop through the dataframes. If you want column 4 of dataframe 2 you just put in
list_of_df[[2]][,4]
# Same thing different code
list_of_df[["df20"]][,4]
The double brackets [[2]] give you the value that is stored in the list at position 2 (whereas [2] gives you a list of length one that still wraps that value). The next [,4] says that, from the data frame we just extracted, we now want every row of the 4th column. Note that this will output a vector and not a data frame.
Or in a loop:
for(df in list_of_df) {
print(df)
}
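Applied to the question's naming scheme, the list approach might look like the sketch below. The three data frames, their column names and the choice of the speed column are all assumptions made for the example; in the real script they would come from the CSV files.

```r
# Hypothetical stand-ins for the data frames read from the CSV files
d20  <- data.frame(time = 1:3, speed = c(19.8, 20.1, 20.0))
d40  <- data.frame(time = 1:3, speed = c(39.5, 40.2, 40.1))
d160 <- data.frame(time = 1:3, speed = c(158.9, 160.4, 160.0))

vel <- c(20, 40, 160)

# Keep the data frames in a named list so they can be looked up by name
frames <- list(d20 = d20, d40 = d40, d160 = d160)

picked <- list()
for (i in vel) {
  key <- paste0("d", i)                       # e.g. "d20"
  # Store one column of the matching data frame under a generated name
  picked[[paste0("newVariable", i)]] <- frames[[key]][["speed"]]
}

names(picked)
# [1] "newVariable20"  "newVariable40"  "newVariable160"
```

Keeping the results in a list (picked) rather than in separate variables means later code can loop over them in the same way.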

R: add column to dataframe, named based on formula

More 'feels like it should be' simple stuff which seems to be eluding me today. Thanks in advance for assistance.
Within a loop, that's within a function, I'm trying to add a column, and name it based on a formula.
I can bind a column & its name is taken from the bound object: data<-cbind(data,bothdata)
I can bind a column & manually name the bound object: data<-cbind(data,newname=bothdata)
I can bind a column which is the product of an equation & manually name the bound object: data<-cbind(data,newname2=bothdata-1)
Or another way: data <- transform(data, newColumn = bothdata-1)
What I can't do is have the name be the product of a formula. My actual formula-derived example name is paste("E_wgt",rev(which(rev(Esteps) == q))-1,"%") & equation for column: baddata - q.
A simpler one: data<-cbind(data,paste("magic",100,"beans")=bothdata-1). This fails because cbind isn't expecting the = even though it's fine in previous examples. Same fail for transform.
My first thought was assign but while I've used this successfully for creating formula-named objects, I can't see how to get it to work for formula-named columns.
If I use an intermediary step to put the naming formula in an object container then use that, e.g.:
name <- paste("magic",100,"beans")
data<-cbind(data,name=bothdata-1)
the column name is "name" not "magic100beans". If I assign the equation result to a formula-named object:
assign(paste("magic",100,"beans"),bothdata-1)
Then try to cbind that via get:
data<-cbind(data,get(paste("magic",100,"beans")))
The column is called "get(paste("magic",100,"beans"))". Boo! Any thoughts anyone? It occurs to me that I can do cbind then separately colnames(data)[ncol(data)] <- paste("magic",100,"beans") which I guess I'll settle for for now, but would still be interested to find if there was a direct way.
Thanks.
Chances are that cbind is overkill for your use case. In almost every instance, you can simply mutate the underlying data frame using data$newname2 <- data$bothdata - 1.
In the case where the name of the column is dynamic, you can just refer to it using the [[ operator -- data[["newcol"]] <- data$newname + 1. See ?'[' and ?'[.data.frame' for other tips and usages.
EDIT: Incorporated @Marek's suggestion for [["newcol"]] instead of [, "newcol"]
It may help you to know that data$col1 is the same as data[,"col1"], which is the same as data[,x] if x is "col1". This is how I usually access/set columns programmatically.
So this should work (using paste0, because paste with its default separator would give "magic 100 beans"):
name <- paste0("magic",100,"beans")
data[,name] <- bothdata-1
Note that you don't have to use the temporary variable name. This is equivalent to:
data$magic100beans <- bothdata-1
Itself equivalent, for a data.frame, to:
data<-cbind(data, magic100beans=bothdata-1)
Just so you know, you could also set the names afterwards:
old_names <- names(data)
name <- paste("magic",100,"beans")
data <- cbind(data, bothdata-1)
data <- setNames(data, c(old_names, name))
# or
names(data) <- c(old_names, name)
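Putting the [[ approach together, here is a minimal sketch with made-up inputs: data and bothdata are hypothetical stand-ins for the asker's objects, and paste0() is used so that the generated name contains no spaces.

```r
# Hypothetical stand-ins for the asker's objects
data     <- data.frame(id = 1:3)
bothdata <- c(10, 20, 30)

# Build the column name at run time
name <- paste0("magic", 100, "beans")

# [[ ]] takes the name as a character string, so no assign()/get() is needed
data[[name]] <- bothdata - 1

names(data)
# [1] "id"            "magic100beans"
data$magic100beans
# [1]  9 19 29
```

Because [[ accepts any string, the same pattern should also work for names with spaces or symbols, though such columns then have to be accessed with [[ ]] or backticks rather than a plain $name.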
