Rename columns for a group of data frames using names() - r

I would like to rename the columns of a bunch of data frames with the names function, but I have not been able to do it with lapply or a loop.
I have a group of data frames named qcew.2007, qcew.2014, etc. I have a vector with the names I would like every data frame to have; they are all the same. The vector is called colnm:
colnm = c("area_fips" , "own_code", "industry_code", "agglvl_code") # example shortened
# group holds the names of all the data frames and goes up to 2013
group =c("qcew.2007", "qcew.2008", "qcew.2009")
# using lapply
names <- lapply(group, function(d){
n = paste0(d)
names(n) = colnm
})
# using loop does not work either
for (i in seq(group)) {
names(group[[i]]) = colnm
}
Neither option works, as it is saying I am comparing vectors with uneven lengths. I must be missing something obvious. Thanks

Here you go. You need to use get; otherwise you're assigning names to the character strings in group rather than to the data frames they name:
# sample data
qcew.2007 <- data.frame(a=1, b=2, c=3, d=4)
qcew.2008 <- data.frame(a=3, b=4, c=5, d=6)
qcew.2009 <- data.frame(a=5, b=6, c=7, d=8)
for(i in 1:3)
assign(group[i], `names<-`(get(group[i]), colnm))
names(qcew.2007)
# [1] "area_fips" "own_code" "industry_code" "agglvl_code"
names(qcew.2008)
# [1] "area_fips" "own_code" "industry_code" "agglvl_code"
names(qcew.2009)
# [1] "area_fips" "own_code" "industry_code" "agglvl_code"
Here you use get to get the object named in each position in group and then use assign to reassign the modified object (modified by changing column names) back into that named object.
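The same get/assign round trip can be spelled out one step at a time. This is only a sketch: the two data frames are stand-ins for the real qcew tables, and the loop variable nm is a name chosen here for illustration.

```r
# Stand-in data (assumed for illustration, not the real qcew tables)
qcew.2007 <- data.frame(a = 1, b = 2, c = 3, d = 4)
qcew.2008 <- data.frame(a = 3, b = 4, c = 5, d = 6)
group <- c("qcew.2007", "qcew.2008")
colnm <- c("area_fips", "own_code", "industry_code", "agglvl_code")

for (nm in group) {
  df <- get(nm)        # fetch the object whose name is stored in nm
  names(df) <- colnm   # rename the columns of that local copy
  assign(nm, df)       # store the modified copy back under the same name
}

names(qcew.2008)
```

Writing names(group[[i]]) <- colnm instead tries to attach four names to a single character string, which is why the original loop complained about lengths.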

Also:
list2env(lapply(mget(group), setNames, colnm),envir=.GlobalEnv)
names(qcew.2007)
#[1] "area_fips" "own_code" "industry_code" "agglvl_code"
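If the data frames do not have to stay as separate objects, a possible variation is to stop after the lapply step and keep the renamed data frames in one named list. The sketch below assumes the same kind of sample data as above; frames is a name introduced here.

```r
# Stand-in data (assumed for illustration)
qcew.2007 <- data.frame(a = 1, b = 2, c = 3, d = 4)
qcew.2008 <- data.frame(a = 3, b = 4, c = 5, d = 6)
group <- c("qcew.2007", "qcew.2008")
colnm <- c("area_fips", "own_code", "industry_code", "agglvl_code")

frames <- mget(group)                      # named list: one data frame per name
frames <- lapply(frames, setNames, colnm)  # rename the columns of every element

names(frames)                 # the list keeps the original object names
names(frames[["qcew.2008"]])  # the new column names
```

list2env is then only needed if the renamed copies must overwrite the original objects in the global environment.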

Related

R - Why does my function appear to work but not update tibbles within a list?

I have a list of tibbles and use the code below
I expect that each tibble in the list will have a column added with a factor (one of 16 levels). I can see exactly what I want being printed to the console, but in the global environment, my list of tibbles remains the same. What am I doing wrong?
fn <- function(df){
df$col1 = cut(df$col1, 16)
return(df)
}
for (df in listoftibbles){
df <- fn(df)
print (df)
}
The for loop does not update the elements of listoftibbles; it only modifies the loop variable df. To update the list, loop over the indices of the object and assign each result back:
for(ind in seq_along(listoftibbles)) {
listoftibbles[[ind]] <- fn(listoftibbles[[ind]])
}
The for loop in R is a for-each loop: on each iteration the loop variable receives the value of the current list element. The 'df' object is a temporary object created on the fly. This can be checked via ls (suppose there are 2 list elements):
> ls(pattern = '^df$')
character(0) # no object
> for(df in listoftibbles) print(ls(pattern = '^df$'))
[1] "df"
[1] "df"
> ls(pattern = '^df$')
[1] "df" # now there is an object
The value of the object 'df' will be the last tibble of listoftibbles, and the address can be checked as well:
> for(df in listoftibbles) print(tracemem(df))
[1] "<0x7fe59eb4f6c0>"
[1] "<0x7fe598361ac8>"
> tracemem(df) # last object created
[1] "<0x7fe598361ac8>"
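The difference between the two loop styles can be seen on a small example. This sketch uses plain data frames rather than tibbles so it needs no packages; frames and add_one are names made up for the illustration.

```r
# Two small data frames in a list (assumed stand-ins for the tibbles)
frames <- list(a = data.frame(col1 = 1:4), b = data.frame(col1 = 5:8))

# A simple stand-in for fn: returns a modified copy of its argument
add_one <- function(df) {
  df$col1 <- df$col1 + 1L
  df
}

# For-each style: only the loop variable df is changed
for (df in frames) {
  df <- add_one(df)
}
after_foreach <- frames$a$col1   # still 1 2 3 4

# Index style: each result is assigned back into the list
for (i in seq_along(frames)) {
  frames[[i]] <- add_one(frames[[i]])
}
frames$a$col1                    # now 2 3 4 5
```

After the first loop, df still exists and holds the modified copy of the last element, which matches the ls output shown above.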
We can use lapply (First posted here)
listoftibbles <- lapply(listoftibbles, fn)
Or, without defining a separate function:
listoftibbles <- lapply(listoftibbles, transform, col1 = cut(col1, 16))
or with map
library(dplyr)
library(purrr)
listoftibbles <- map(listoftibbles, mutate, col1 = cut(col1, 16))
Assign the output back to listoftibbles.
for (i in seq_along(listoftibbles)) {
listoftibbles[[i]] <- fn(listoftibbles[[i]])
print (listoftibbles[[i]])
}

Standardizing and renaming variables in a data.frame in R?

I'm trying to standardize all variables in a given data.frame, and add these standardized variables to the original data.frame with some prefix name like "s." (e.g., if original variable's name was wt, the standard one is s.wt).
My function below does that but I'm wondering why I can NOT access the new standardized variables? (see example below)
standard <- function(dataframe = mtcars){
var.names <- names(dataframe)
dataframe$s <- as.data.frame(lapply(dataframe[, var.names], scale))
dataframe
}
# Example:
(d <- standard()) # HERE I see the new standardized variables with prefix "s."
d$s.wt # NULL # but HERE I can't access the standard variables!!!
Assuming that we need new columns with the prefix 's.': in the OP's function, the assignment is to a single column, i.e. dataframe$s, while the list returned by lapply has as many scaled columns as the dataset has columns. So, we can use paste0 to create the new column names with the prefix 's.':
standard <- function(dataframe = mtcars){
var.names <- names(dataframe)
dataframe[paste0("s.", var.names)] <- lapply(dataframe[var.names], function(x) c(scale(x)))
dataframe
}
standard()$s.wt
#[1] -0.610399567 -0.349785269 -0.917004624 -0.002299538 0.227654255 0.248094592 0.360516446 -0.027849959 -0.068730634 0.227654255 0.227654255
#[12] 0.871524874 0.524039143 0.575139986 2.077504765 2.255335698 2.174596366 -1.039646647 -1.637526508 -1.412682800 -0.768812180 0.309415603
#[23] 0.222544170 0.636460997 0.641571082 -1.310481114 -1.100967659 -1.741772228 -0.048290296 -0.457097039 0.360516446 -0.446876870
NOTE: The output of scale is a matrix with a single column. By using c, it is converted to a vector.
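A quick way to see this is to scale a short vector and inspect the result. The values below are made up for the illustration.

```r
x <- c(2, 4, 6)   # toy input with mean 4 and standard deviation 2

s <- scale(x)
is.matrix(s)      # TRUE: a 3 x 1 matrix
dim(s)
attributes(s)     # also carries scaled:center and scaled:scale

v <- c(s)         # c() drops the dim and the scaling attributes
is.null(dim(v))   # TRUE: a plain numeric vector
v
```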
Also, we can apply the function on the entire dataset
mtcars[paste0("s.", names(mtcars))] <- scale(mtcars)
identical(mtcars$s.wt, standard()$s.wt)
#[1] TRUE
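To see why d$s.wt was NULL with the original function, it may help to look at what the assignment to dataframe$s actually stores. The sketch below repeats that assignment outside the function; d is just a working copy of mtcars.

```r
d <- mtcars
d$s <- as.data.frame(lapply(mtcars, scale))  # same assignment as in the question

ncol(d)              # 12: the 11 original columns plus one column named "s"
is.data.frame(d$s)   # TRUE: "s" is a whole data frame nested in one column
is.null(d$s.wt)      # TRUE: there is no top-level column called "s.wt"
head(d$s$wt)         # the scaled values are reachable one level down
```

The s.mpg, s.cyl, ... labels that appear when printing are how print displays a nested data frame column, not real column names of d.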

How to batch process some data frames with different dimensions but the same name pattern in R

In my R environment, I already have some variables with these names:
id_01_r
id_02_l
id_05_l
id_06_r
id_07_l
id_09_1
id_11_l
So the pattern is id_ followed by two digits, then _ and either r or l.
Each of them is a data frame, but with a different dim() output.
There are also other variables in the environment, so first I need to extract these data frames. For this, I tried:
> a <- list(ls()[grep("id*",ls())]) # a small sample, just matching id*
But this puts them all into one element, so I don't think it's a good way:
> length(a)
[1] 1
I know how to read them in, as shown below, but I'm confused about how to extract them and run the same processing on each.
i_set <- Sys.glob(paths='mypath/////id*.txt')
for (i in i_set) {
assign(substring(i, startx, endx),read.table(file=i,header=F))
}
The key point is that I want to run the same series of data processing steps on each of these data frames. Given the above, how can I do this without handling them one by one?
Thanks for your kind consideration.
Here is an example:
id_01_r <- iris
id_02_l <- mtcars
foo <- 42
vars <- grep("^id_\\d{2}_[rl]$", ls(), value = TRUE)
# [1] "id_01_r" "id_02_l"
process_data <- function(df) {
dim(df)
}
processed_data <- lapply(
mget(vars),
process_data
)
# $id_01_r
# [1] 150 5
#
# $id_02_l
# [1] 32 11
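Two details of the attempt in the question explain its result, and a small sketch may make them concrete. The candidates vector below is invented for the illustration and stands in for the output of ls().

```r
# Pretend these are the object names returned by ls()
candidates <- c("id_01_r", "id_02_l", "width", "foo")

# "id*" means "i followed by zero or more d", so it matches any name containing an i
loose  <- grep("id*", candidates, value = TRUE)

# An anchored pattern only keeps names of the form id_<two digits>_<r or l>
strict <- grep("^id_\\d{2}_[rl]$", candidates, value = TRUE)

# list() wraps the whole character vector in a single element ...
a <- list(strict)
length(a)   # 1

# ... whereas as.list() makes one element per name
b <- as.list(strict)
length(b)   # 2
```

mget goes one step further than as.list: it returns the objects themselves, named by the strings, which is why it is used in the answer above.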

Repeat n times a subset and plot in R

First of all, thanks to this forum, because I've found a lot of answers here!
Now it's my turn to ask for help.
I can't solve this. A function? A loop? I didn't find a good example.
# from a data.frame = data
# A:F are name of columns
x<-unique(data$A) # in the example c('var1','var2','var3','var4')
y<-unique(data$B) # in the example c('varA','varB')
z<-unique(data$C) # in the example c('var1a','var2a')
# I NEED TO REPEAT THIS (based on x_y_z combinations)#
x_y_z<-subset(data,data$A==x & data$B==y & data$C==z)
plot_x_y<-qic(y=D
,n=E
,x=F
,data=x_y_z
,chart = 'p')
The idea is to repeat the subsetting and make a plot for each x_y_z combination. Each subset should be named after its combination of values, separated by '_'.
I guess it should work like this:
var1_varA_var1a<-subset(data,data$A==var1 & data$B==varA & data$C==var1a)
plot_var1_varA<-qic(y=D
,n=E
,x=F
,data=var1_varA_var1a
,chart = 'p')
And obtain all these plots:
plot_var1_varA_var1a
plot_var1_varB_var1a
plot_var1_varA_var2a
plot_var1_varB_var2a
plot_var2_varA_var1a
plot_var2_varB_var1a
plot_var2_varA_var2a
plot_var2_varB_var2a
plot_var3_varA_var1a
plot_var3_varB_var1a
plot_var3_varA_var2a
plot_var3_varB_var2a
plot_var4_varA_var1a
plot_var4_varB_var1a
plot_var4_varA_var2a
plot_var4_varB_var2a
Sorry for the basic question, but I'm stuck on this.
Cristobal
Consider by, which slices a data frame like subset does but allows data frame operations in its FUN argument. And use a list of plots instead of many separately named plot objects.
plot_list <- by(data, data[, c("A","B","C")], FUN = function(df) {
qic(y = D,n = E,x = F, data = df, chart = 'p')
})
Should you want to rename this list:
dfnames <- expand.grid(x,y,z)
listnames <- vapply(1:nrow(dfnames), function(i)
paste(dfnames$Var1[[i]], dfnames$Var2[[i]], dfnames$Var3[[i]], sep="_"), character(1))
# [1] "var1_varA_var1a" "var2_varA_var1a" "var3_varA_var1a" "var4_varA_var1a" "var1_varB_var1a"
# [6] "var2_varB_var1a" "var3_varB_var1a" "var4_varB_var1a" "var1_varA_var2a" "var2_varA_var2a"
# [11] "var3_varA_var2a" "var4_varA_var2a" "var1_varB_var2a" "var2_varB_var2a" "var3_varB_var2a"
# [16] "var4_varB_var2a"
# RENAME LIST ELEMENTS
plot_list <- setNames(plot_list, paste0("plot_", listnames))
plot_list$plot_var1_varA_var1a # FIRST PLOT
plot_list$plot_var2_varA_var1a # SECOND PLOT
...
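As a different route to the same kind of named list, split can build the combined names directly through its sep argument. This is a sketch of that alternative, not the by approach shown above: the toy data frame is invented, only two grouping columns are used, and sum() stands in for the qic call so the example runs without extra packages.

```r
# Toy data (assumed): two grouping columns and one value column
toy <- data.frame(
  A = rep(c("var1", "var2"), each = 4),
  B = rep(c("varA", "varB"), times = 4),
  D = 1:8
)

# One data frame per A/B combination, named "<A>_<B>";
# drop = TRUE leaves out combinations that do not occur in the data
pieces <- split(toy, list(toy$A, toy$B), sep = "_", drop = TRUE)

# Apply the same step to every piece (sum() is a placeholder for the plotting call)
results <- lapply(pieces, function(df) sum(df$D))
names(results) <- paste0("plot_", names(pieces))

names(results)
results$plot_var1_varA
```

Because the names come from the data itself, there is no need to line up a separate expand.grid result with the order of the list.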

how to add value to existing variable from inside a loop?

I want to add a computed value to an existing vector from within a loop, where the vector is looked up by name inside the loop. That is, I'm looking for a function similar to assign() that lets me add values to an existing variable instead of creating a new one.
Example:
Say I have 3 variables:
sp=3
for(i in 1:sp){
name<-paste("sp",i,sep="")
assign(name,rnorm(5))
}
Now I want to access the last value in each of the variables, double it, and append the result to the vector:
for(i in 1:sp){
name<-paste("sp",i,sep="")
name[6]<-name[5]*2
}
The problem here is that "name" is a string. How can R identify it as a variable name and access it?
What you are asking for is something like this:
get(name)
In your code it would look like this:
v <- 1:10
var <- "v"
for (i in v){
tmp <- get(var)
tmp[6] <- tmp[5]*2
assign(var, tmp)
}
# [1] 1 2 3 4 5 10 7 8 9 10
Does that help you in any way?
However, I agree with the other answer, that lists and the lapply/sapply-functions are better suited!
This is how you can do this with a list:
sp=3
mylist <- vector(mode = "list", length = sp) #initialize a list
names(mylist) <- paste0("sp",seq_len(sp)) #set the names
for(i in 1:sp){
mylist[[i]] <- rnorm(5)
}
for(i in 1:sp){
mylist[[i]] <- c(mylist[[i]], mylist[[i]][5] * 2)
}
mylist
#$sp1
#[1] 0.6974563 0.7714190 1.1980534 0.6011610 -1.5884306 -3.1768611
#
#$sp2
#[1] -0.2276942 0.2982770 0.5504381 -0.2096708 -1.9199551 -3.8399102
#
#$sp3
#[1] 0.235280995 0.276813498 0.002567075 -0.774551774 0.766898045 1.533796089
You can then access the list elements as described in help("["), i.e., mylist$sp1, mylist[["sp1"]], etc.
Of course, this is still very inefficient code and it could be improved a lot. E.g., since all three variables are of same type and length, they really should be combined into a matrix, which could be filled with one call to rnorm and which would also allow doing the second operation with vectorized operations.
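A minimal sketch of that matrix idea, assuming one column per variable and the same sp1, sp2, sp3 names; m is a name introduced here and set.seed is only there to make the run repeatable.

```r
set.seed(1)   # for reproducibility only
sp <- 3

# One call to rnorm fills a 5 x sp matrix; each column plays the role of one vector
m <- matrix(rnorm(5 * sp), nrow = 5,
            dimnames = list(NULL, paste0("sp", seq_len(sp))))

# Double the fifth row and append it as a sixth row, for all columns at once
m <- rbind(m, 2 * m[5, ])

dim(m)        # 6 rows, 3 columns
m[, "sp1"]    # the six values for the first variable
```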
@Roland is absolutely right and you absolutely should use a list for this type of problem. It's cleaner and easier to work with. Here's another way of working with what you have (it can be easily generalised):
sp <- replicate(3, rnorm(5), simplify=FALSE)
names(sp) <- paste0("sp", 1:3)
sp
#$sp1
#[1] -0.3723205 1.2199743 0.1226524 0.7287469 -0.8670466
#
#$sp2
#[1] -0.5458811 -0.3276503 -1.3031100 1.3064743 -0.7533023
#
#$sp3
#[1] 1.2683564 0.9419726 -0.5925012 -1.2034788 -0.6613149
newsp <- lapply(sp, function(x){x[6] <- x[5]*2; x})
newsp
#$sp1
#[1] -0.3723205 1.2199743 0.1226524 0.7287469 -0.8670466 -1.7340933
#
#$sp2
#[1] -0.5458811 -0.3276503 -1.3031100 1.3064743 -0.7533023 -1.5066046
#
#$sp3
#[1] 1.2683564 0.9419726 -0.5925012 -1.2034788 -0.6613149 -1.3226297
EDIT: If you are truly, sincerely dedicated to doing this despite being recommended otherwise, you can do it this way (with sp = 3 as in the question, not the list defined above):
for(i in 1:sp){
name<-paste("sp",i,sep="")
assign(name, `[<-`(get(name), 6, `[`(get(name), 5) * 2))
}
