R looping through categorical variables

I have the variables "data$LINE" and "data$STATE" and I want to use a loop to rename them without the "data$". Then I want the same loop to pass "LINE" and "STATE" into my code, but I also want "data$LINE" and "data$STATE" to run in the loop as well. The reason is that I have two graphing functions: one depends on the name (i.e. "LINE"), the other depends on "data$LINE". I want both graphs to run for "LINE" and "STATE", so that after the loop 4 graphs are made.
The code is:
for(variable in c(data$LINE, data$STATE)) {
variable_name <- sub("data$", replacement = "", "data$variable", fixed = TRUE)
whatevergraphingfunction(variable)
differentgraphingfunction(variable_name)
}
This is not working right (it doesn't seem to create the 2 variable_names), any help would be appreciated.

I think your issue is in for(variable in c(data$LINE, data$STATE)), because c() there concatenates the contents of the two columns rather than the two variable names. Start with strings in the for loop, and then convert them to names after sub() has run.
#start with variables as strings
for(variable in c("data$LINE", "data$STATE")) {
#Remove the 'data$'
variable_name <- sub("data$", replacement = "", variable, fixed = TRUE)
# convert strings back to names (as.name() returns a symbol; assign the result if you need to use it)
as.name(variable)
as.name(variable_name)
# Graph
whatevergraphingfunction(variable)
differentgraphingfunction(variable_name)
}
Without knowing how LINE and STATE will be used, I am not sure if you need to call them as names. If not, just delete the as.name(variable_name) line.
Let me know if this worked for you.
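An alternative that avoids string surgery on "data$" altogether is to loop over the bare column names and pull each column out with [[. The sketch below is only an illustration: the toy data frame and the two stand-in functions (plot_values and plot_by_name) are hypothetical placeholders for the asker's real data and graphing functions.

```r
# Toy data standing in for the asker's `data` (hypothetical values)
data <- data.frame(
  LINE  = c("a", "b", "a"),
  STATE = c("NY", "CA", "NY"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-ins for the two graphing functions
plot_values  <- function(values) table(values)         # needs the column itself
plot_by_name <- function(name) paste("Plot of", name)  # needs only the name

results <- list()
for (variable_name in c("LINE", "STATE")) {
  # data[[variable_name]] returns the same values as data$LINE / data$STATE
  variable <- data[[variable_name]]
  results[[variable_name]] <- list(
    by_values = plot_values(variable),
    by_name   = plot_by_name(variable_name)
  )
}
```

Each pass through the loop has both the column values and the column name available, so both kinds of function can be called for LINE and for STATE.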

Related

R- Mutate multiple iterations with incremental variable name within for-loop

I am attempting to have multiple mutates within a for loop, where each mutate should create a new variable with an incrementing number in its name. Am I able to do this within one block of code without repeating the same line with the new variable name? In this example, I am attempting to find the date difference between 2 dates within each mutate call.
ex:
for (i in c(1:nrow(pd)))
{
result <- pd %>%
mutate(n1.DateDiff = abs(difftime(pd[i,]$`Date(US)`, n1.Status_dt, units = c("days"))),
n2.DateDiff = abs(difftime(pd[i,]$`Date(US)`, n2.Status_dt, units = c("days"))),
n3.DateDiff = abs(difftime(pd[i,]$`Date(US)`, n3.Status_dt, units = c("days"))))
}
Ideally, I'd like one line where I'm able to loop and create n1-n3 without writing this in 3 lines.

The across pattern recommended by @akrun is the way such problems are intended to be solved. But if you want to use a for loop, perhaps you are looking for something like the following:
in_cols = c("n1.Status_dt", "n2.Status_dt", "n3.Status_dt")
out_cols = c("n1.DateDiff", "n2.DateDiff", "n3.DateDiff")
for(ii in 1:3){
this_in = in_cols[ii]
this_out = out_cols[ii]
df = df %>%
mutate(!!sym(this_out) := difftime(pd, !!sym(this_in)))
}
Notes:
!!sym(.) is used to turn a text string into a variable
:= is equivalent to = but allows us to use !!sym(.) on the left-hand side
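For reference, here is a self-contained sketch of the same loop on toy data. It assumes dplyr is installed; the column ref_date is a hypothetical stand-in for the asker's `Date(US)` column, and only two status columns are used to keep it short.

```r
library(dplyr)

# Toy data; `ref_date` is a hypothetical stand-in for the `Date(US)` column
pd <- data.frame(
  ref_date     = as.Date(c("2020-01-10", "2020-02-01")),
  n1.Status_dt = as.Date(c("2020-01-01", "2020-02-11")),
  n2.Status_dt = as.Date(c("2020-01-05", "2020-01-22"))
)

in_cols  <- c("n1.Status_dt", "n2.Status_dt")
# Derive the output names from the input names: "n1.DateDiff", "n2.DateDiff"
out_cols <- sub("Status_dt", "DateDiff", in_cols, fixed = TRUE)

for (ii in seq_along(in_cols)) {
  pd <- pd %>%
    mutate(!!sym(out_cols[ii]) := abs(as.numeric(
      difftime(ref_date, !!sym(in_cols[ii]), units = "days")
    )))
}
```

After the loop, pd has one new numeric column per input column, each holding the absolute difference in days.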

Problems with renaming columns via variables in R

I'm having issues with a specific problem. I have a dataset of a ton of matrices that all have V1 as their column names, essentially NULL. I'm trying to write a loop to replace all of these with column names from a list, but I'm running into some issues.
To break this down to the most simple form, this code isn't functioning as I'd expect it to.
nameofmatrix <- paste('column_', i, sep = "")
colnames(eval(as.name(nameofmatrix))) <- c("test")
I would expect this to take the value of column_1 for example, and replace (in the 2nd line) with "test" as the column name.
I tried to break this down further. For example, if I run print(eval(as.name(nameofmatrix))) I get the object's columns/rows printed as expected, and if I run print(colnames(eval(as.name(nameofmatrix)))) I get NULL as expected for the column header (since it was set as V1).
I've even tried to manually type in the column name, such as colnames(column_1) <- c("test"), and this successfully renames the column. But once the variable is put in the text's place as shown above, it does not work the same. I'm having difficulties finding a solution for renaming several matrix columns after they have been created with this method. Does anyone have any advice or suggestions?
Note, the error I'm receiving on trying to run this is
Error in eval(as.name(nameofmatrix)) <- `*vtmp*` : could not find function "eval<-"

We could return the values of the objects in a list with get (if there are multiple objects, use mget), then rename the columns of the objects in the list and update those objects in the global env with list2env
list2env(lapply(mget(nameofmatrix), function(x) {colnames(x) <- newnames
x}), .GlobalEnv)
It can also be done with assign
data(mtcars)
nameofobject <- 'mtcars'
assign(nameofobject, `colnames<-`(get(nameofobject),
c('mpg1', names(mtcars)[-1])))
Now, check the names of 'mtcars'
names(mtcars)[1]
#[1] "mpg1"
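Applied to the loop in the question, the get/assign pattern might look like the sketch below. The two matrices and the new_names vector are made up for illustration; the object names follow the column_<i> scheme from the question.

```r
# Toy matrices named like the ones in the question (hypothetical contents)
column_1 <- matrix(1:3, ncol = 1)
column_2 <- matrix(4:6, ncol = 1)
new_names <- c("first", "second")  # hypothetical replacement column names

for (i in 1:2) {
  nameofmatrix <- paste0("column_", i)
  m <- get(nameofmatrix)        # fetch the object by its name
  colnames(m) <- new_names[i]   # rename on the local copy
  assign(nameofmatrix, m)       # write the modified copy back under the same name
}
```

The original attempt fails because colnames(x) <- value needs something assignable on the left; eval(...) is a function call, so R looks for a replacement function "eval<-" that does not exist. If the matrices can be kept in a named list instead of separate objects, the get/assign round trip is not needed at all.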

How to give a formula values in a loop

I have a program that is supposed to create a pdf file of actograms given a csv of activity and time. I need to loop through multiple activity columns, one for each subject. The first activity column is column 3. Here is the relevant code:
pdf("All Actograms.pdf")
for(i in 3:(length(dat) - 1)) {
activity <- colnames(dat)[i]
# Plot the actogram
print(actogram(activity~datetime, dat=dat, col="black", main=colnames(dat)[i], strip.left.format="%m/%d", doublePlot = TRUE, scale=0.75))
}
dev.off()
When I call my actogram function, I get the error "non-numeric argument to binary operator." The problem is the formula "activity~datetime," because datetime is a column name and activity should be too. If I try it out of the loop, with the name of an activity column rather than a variable containing the name, it works fine. Upon debugging, I found the actogram function is receiving the string "activity," rather than the variable activity. I don't really understand formulas, but I want to know if there's any way to accomplish what I'm trying to do, which is loop through many columns, changing the column before the "~" each time I call the actogram function. I'm very new to R.
Thanks!

We do not have the data you are working on but I think the simplest thing you can do is the following:
pdf("All Actograms.pdf")
for(i in 3:(length(dat) - 1)) {
activity <- colnames(dat)[i] # save the original name of column i
colnames(dat)[i] <- "activity" # temporarily rename column i to "activity"
# Plot the actogram
print(actogram(activity~datetime, dat=dat, col="black", main=activity, strip.left.format="%m/%d", doublePlot = TRUE, scale=0.75))
colnames(dat)[i] <- activity # restore the original name of column i
}
dev.off()
Hopefully it works.
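Another option, instead of renaming the column back and forth, is to build the formula from the column name with base R's reformulate(). The sketch below uses lm() on toy data only because actogram() is not available here; whether actogram() accepts a formula object built this way is an assumption that would need checking.

```r
# Toy data: one time column and two activity columns (hypothetical values)
dat <- data.frame(
  datetime = 1:5,
  subj_a   = c(2, 4, 6, 8, 10),
  subj_b   = c(1, 3, 2, 5, 4)
)

activity_cols <- setdiff(colnames(dat), "datetime")
slopes <- numeric(0)

for (activity in activity_cols) {
  # Build e.g. subj_a ~ datetime from the string held in `activity`
  f <- reformulate("datetime", response = activity)
  fit <- lm(f, data = dat)  # lm() stands in for actogram() in this sketch
  slopes[activity] <- coef(fit)[["datetime"]]
}
```

Writing activity~datetime literally makes R look for a column called "activity", which is why the original loop only works after the column has been renamed.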

Concatenating string names of variables matlabfile in R

I have MATLAB files in which every variable name except the first contains an integer. I want to loop over the integers and build each variable name by concatenation.
Here is my code:
library('R.matlab')
mat <- readMat('SeriesContPJM.mat')
#str(mat)
#typeof(mat)
#mat[[1]]
write.csv(mat$vol.PJM$data[[4]][[1]], "PJM.csv")
i = 2
while (i < 7)
{
write.csv(get(paste("mat$vol.PJM", as.character(i), "$data[[4]][[1]]", sep = "")), paste(paste("PJM", as.character(i), sep="_"), "csv", sep ="."))
i = i + 1
}
The call write.csv(mat$vol.PJM$data[[4]][[1]], "PJM.csv") gives me the correct output. I would like the same for the other variable names in the loop, but I get the following output:
+ Error in get(paste("mat$vol.PJM", as.character(i), "$data[[4]][[1]]", (from importpjm.R#10) :
objet 'mat$vol.PJM2$data[[4]][[1]]' introuvable
"introuvable" means "not found" in French.

Here you're mixing where you need to use get and where you need to use eval(parse()).
You can use get with a string variable name, e.g., get("mtcars"), but [ is a function that needs to be evaluated.
get("mtcars[2, 2]") won't work because you don't have a variable named "mtcars[2, 2]", you have a variable named "mtcars" and a function named "[" that can take arguments 2, 2.
eval(parse(text = "mtcars[2, 2]")) will work because it doesn't just look for a variable, it actually evaluates the text string as if you typed it into the command line.
So, you could rewrite your loop, replacing the get(...) with eval(parse(text = ...)) and it would probably work, assuming the strings you've pasted together have the right syntax. But this can be difficult to read and debug. In 6 months, if you look back at this code and need to understand it or modify it, it will be confusing.
Another way to do it would be to use [[ with strings to extract sublists. Rather than mess with eval(parse()) I would do this:
vols = paste0("vol.PJM", 2:7)
for (vol in vols) {
write.csv(mat[[vol]][["data"]][[4]][[1]],
paste0(vol, ".csv"))
}
I think it's more readable, and it's easy to debug the vols vector beforehand to make sure all your names are correct. Similarly, if you want to loop over all elements, you could initialize the vols as something like names(mat), or use some sort of regex criteria to extract the appropriate sublists from names(mat).
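To see the difference without the .mat file, here is a small sketch on a nested list shaped roughly like the output of readMat() in the question. The list contents are made up, and the results are collected in a list rather than written to CSV.

```r
# Hypothetical nested list mimicking the structure used in the question
mat <- list(
  vol.PJM2 = list(data = list("a", "b", "c", list(c(10, 20)))),
  vol.PJM3 = list(data = list("a", "b", "c", list(c(30, 40))))
)

# get() looks for one object whose name is the whole string, so this fails
lookup <- tryCatch(get("mat$vol.PJM2"), error = function(e) "not found")

# [[ with a string selects the element by name, one level at a time
extracted <- list()
for (vol in paste0("vol.PJM", 2:3)) {
  extracted[[vol]] <- mat[[vol]][["data"]][[4]][[1]]
}
```

In the real script, the assignment inside the loop would be replaced by the write.csv() call.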

The way R handles subseting

I'm having some trouble understanding how R handles subsetting internally and this is causing me some issues while trying to build some functions. Take the following code:
f <- function(directory, variable, number_seq) {
## Create an empty data frame
new_frame <- data.frame()
## Add every data frame in the directory whose name is in number_seq to new_frame
## the file variable specifies the path to the file
for (i in number_seq){
file <- paste("~/", directory, "/",sprintf("%03d", i), ".csv", sep = "")
x <- read.csv(file)
new_frame <- rbind.data.frame(new_frame, x)
}
## calculate and return the mean
mean(new_frame[, variable], na.rm = TRUE) # see the note below
}
Note: while calculating the mean, I first tried to subset using the $ sign (new_frame$variable) and the subset function (subset(new_frame, select = variable)), but it would only return a None value. It only worked when I used new_frame[, variable].
Can anyone explain why the other subsetting approaches didn't work? It took me a really long time to figure it out, and even though I managed to make it work I still don't know why it didn't work the other ways. I really want to look inside the black box so I won't have the same issues in the future.
Thanks for the help.

This behavior has to do with the fact that you are subsetting inside a function.
Both new_frame$variable and subset(new_frame, select = variable) look for a column in the data frame with the name variable.
On the other hand, new_frame[, variable] uses the value of the variable argument of f(directory, variable, number_seq) to select the column.

The dollar sign ($) can only be used with literal column names. That avoids confusion with
dd<-data.frame(
id=1:4,
var=rnorm(4),
value=runif(4)
)
var <- "value"
dd$var
In this case if $ took variables or column names, which do you expect? The dd$var column or the dd$value column (because var == "value"). That's why the dd[, var] way is different because it only takes character vectors, not expressions referring to column names. You will get dd$value with dd[, var]
I'm not quite sure why you got None with subset(); I was unable to replicate that problem.
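The same point can be checked inside a function, which is closer to the situation in the question. This is a minimal sketch with fixed values instead of random ones; pick_dollar and pick_bracket are hypothetical helper names.

```r
dd <- data.frame(id = 1:3, var = c(0.1, 0.2, 0.3), value = c(10, 20, 30))
var <- "value"

by_dollar  <- dd$var      # literal column named "var"
by_bracket <- dd[, var]   # column named by the string stored in var, i.e. "value"

# Inside a function, $ still looks for a column literally called "variable"
pick_dollar  <- function(df, variable) df$variable
pick_bracket <- function(df, variable) df[[variable]]

dollar_result  <- pick_dollar(dd, "value")   # NULL: there is no column "variable"
bracket_result <- pick_bracket(dd, "value")  # the "value" column
```

The NULL returned by the $ version is probably what the question describes as a None value.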
