Parsing colnames text string as expression in R - r

I am trying to create a large number of data frames in a for loop using the "assign" function in R. I want to use the colnames function to set the column names in the data frame. The code I am trying to emulate is the following:
county_tmax_min_df <- data.frame(array(NA,c(length(days),67)))
colnames(county_tmax_min_df) <- c('Date',sd_counties$NAME)
county_tmax_min_df$Date <- days
The code I have so far in the loop looks like this:
file_vars = c('file1','file2')
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), "days")
f = 1
for (f in 1:2){
assign(paste0('county_',file_vars[f]),data.frame(array(NA,c(length(days),67))))
}
I need to be able to set the column names similar to how I did in the above statement. How do I do this? I think it needs to be something like this, but I am unsure what goes in the text portion. The end result I need is just a bunch of data frames. Any help would be wonderful. Thank you.
expression(parse(text = ))

You can set the names within assign, like that:
file_vars = c('file1', 'file2')
days <- seq.Date(from = as.Date("1979-01-01"), to = as.Date("1979-01-02"), by = "days")
for (f in seq_along(file_vars)) {
assign(x = paste0('county_', file_vars[f]),
value = {
df <- data.frame(array(NA, c(length(days), 67)))
colnames(df) <- paste0("fancy_column_",
sample(LETTERS, size = ncol(df), replace = TRUE))
df
})
}
When in {} you can use colnames(df) or setNames to assign column names in any manner desired. In your first piece of code you are referring to sd_counties object that is not available but the generic idea should work for you.

Related

Using loop to create modicate dataframe

I have been struggling with finding a way to create a new data frame using a loop, where the main goal is to filter the data when is >= 0.5.
I´m using Rstudio; however, python is an option too.
Here is how looks like my data frame (csv file) and some lines of the script (incomplete):
df <- read.table(choose.files(), header = T, sep = ",", comment.char = "")
Site,Partition,alpha,beta,omega,alpha=beta,LRT,p-value,Total branch length
1,1,"0.000","0.000","NaN","0.000","0.000","1.000","0.000"
2,1,"0.060","0.046","0.774","0.048","0.049","0.825","0.000"
Then I use select function to take only two columns that interest me:
sdf <- subset(df, select = c("ï..Site", "alpha.beta"))
ï..Site alpha.beta
1 1 0.000
2 2 0.048
...
Then I thought in use a loop to create a new csv file, when the second column has a value >= 0.5 print this value, it doesn´t have a value that satisfies this requisite pass and print a 0.
Here I try differents ways; obviously neither works for me. Here are the last lines that I tried.
for (i in names(sdf1)) {
f_sdf1 <- sdf1[sdf1[, i] >= 0.5]
write.csv(f_sdf1, paste0(i, ".csv"))
}
So in this post I´m looking for some ideas to generate this script. Maybe it´s simple, but in this case, I need to ask how?
You could use subset to filter your data as in
# first get some example data
expl <- data.frame(site = 1:10, alpha.beta = runif(10))
print(expl)
# now do the filtering
expl.filtered = subset(expl, alpha.beta >= .5)
print(expl.filtered)
# Now write.table or write.csv...

R function used to rename columns of a data frames

I have a data frame, say acs10. I need to relabel the columns. To do so, I created another data frame, named as labelName with two columns: The first column contains the old column names, and the second column contains names I want to use, like the table below:
column_1
column_2
oldLabel1
newLabel1
oldLabel2
newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))){
names(acs10)[names(acs10) == labelName[i,1]] <- labelName[i,2]}
, and it works.
However, when I tried to put the for loop into a function, because I need to rename column names for other data frames as well, the function failed. The function I wrote looks like below:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame where old variable names and new variable names are paired. I used print(names(dataF)) to debug, and the print out suggests that the function works. However, the calling the function does not actually change the column names. I suspect it has something to do with the scope, but I want to know how to make it works.
In your function you need to return the changed dataframe.
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
}
return(dataF)
}
You can also simplify this and avoid for loop by using match :
renameDF <- function(dataF,varName){
names(dataF) <- varName[[2]][match(names(dataF), varName[[1]])]
return(dataF)
}
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This will work if the column name isn't in the data dictionary, but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
You can create a function containing just on line of code.
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}

New data frame after function is empty

I prepare a function to have a temporary dataframe, but whent i apply this function on my old dataframe , the temporary dataframe is empty. How can i solve this ?
I tried this code :
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat["vname"]
locci_1 <- sample(dat["loc1"], replace = F)
locci_2 <- sample(dat["loc2"], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= "data_a",vname="pop",loc1="PA1",loc2="PA2")
I've tried to convert the data_a with
data_a <- as.matrix(data_a)
and
popu <- sample(dat[,1], replace = F)
but they didn't work too
Thank's :)
There are maybe multiple issues. First, when you have created your data frame, be aware that data.frame function family treat string as a factor by default. It may be not what you want.
Then #NURAIMIAZIMAH is right, your function needs a data frame to work properly, so :
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
is a good start.
Moreover, you give value to vector like vname, loc1 and loc2. But you only use the name of these objects in your function, because you forgot to remove quotation mark.
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[loc1], replace = F)
locci_2 <- sample(dat[loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
Now your function should work, but maybe not in the way you would like to. Because there won't be any permutations in your data_3 table. If you look carefully, the type of return of this part of the code dat[loc1] is a data frame. You certainly want a vector to permute your data, so you have to subset your data frame like this : dat[,loc1].
This code below should do what you expect.
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[,loc1], replace = F)
locci_2 <- sample(dat[,loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
See you.

How can I write this loop?

I have five dataframes (a-f), each of which has a column 'nq'. I want to find the max, min and average of the nq columns
classes <- c("a","b","c","d","e","f")
for (i in classes){
format(max(i$nq), scientific = TRUE)
format(min(i$nq), scientific = TRUE)
format(mean(i$nq), scientific = TRUE)
}
But the code is not working. Can you please help?
You can't use a character value as a data.frame name. The value "a" is not the same as the data.frame a.
You probably shouldn't have a bunch of data.frames lying around. You probably want to have them all in a list. Then you can lapply over them to get results.
mydata <- list(
a = data.frame(nq=runif(10)),
b = data.frame(nq=runif(10)),
c = data.frame(nq=runif(10)),
d = data.frame(nq=runif(10))
)
then you can do
lapply(mydata, function(x)
format(c(max(x$nq), min(x$nq), mean(x$nq)), scientific = TRUE)
)
to get all the values at once.
The reason it is not working is because 'i' is a character/string. As already mentioned by Mr.Flick you have to make it into a list.
Alternatively, you instead of writing i$nq in your loop you can write get(i)$nq. The get() function will search the workspace for an object by name and it will return the object itself. However, this is not as clean as making it into a list and using lapply.

R + match values at scale (using apply?)

Is there a way to make matching values at scale more programmatic? Basically what I want to do is add a bunch of columns for value lookups onto a dataframe, but I don't want to write the match[] argument every time. It seems like this would be a use case for mapply but I can't quite figure out how to use it here. Any suggestions?
Here's the data:
data <- data.frame(
region = sample(c("northeast","midwest","west"), 50, replace = T),
climate = sample(c("dry","cold","arid"), 50, replace = T),
industry = sample(c("tech","energy","manuf"), 50, replace = T))
And the corresponding lookup tables:
lookups <- data.frame(
orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3")
)
So now what I want to do is: First add a column to "data" that's called "reg_lookups" and it will match the region to its appropriate value in "lookups". Do the same for "climate_lookups" and so on.
Right now, I've got this mess:
data$reg_lookup <- lookups$look_val[match(data$region, lookups$orig_val)]
data$clim_lookup <- lookups$look_val[match(data$climate, lookups$orig_val)]
data$indus_lookup <- lookups$look_val[match(data$industry, lookups$orig_val)]
I've tried using a function to do this, but the function doesn't seem to work, so then applying that to mapply is a no-go (plus I'm confused about how the mapply syntax would work here):
match_fun <- function(df, newval, df_look, lookup_val, var, ref_val) {
df$newval <- df_look$lookup_val[match(df$var, df_look$ref_val)]
return(df)
}
data2 <- match_fun(data, reg_2, lookups, look_val, region, orig_val)
I think you're just trying to do this:
data <- merge(data,lookups[1:3,],by.x = "region",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[4:6,],by.x = "climate",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[7:9,],by.x = "industry",by.y = "orig_val",all.x = TRUE)
But it would be much better to store the lookups either in separate data frames. That way you can control the names of the new columns more easily. It would also allow you to do something like this:
lookups1 <- split(lookups,rep(1:3,each = 3))
colnames(lookups1[[1]]) <- c('region','reg_lookup')
colnames(lookups1[[2]]) <- c('climate','clim_lookup')
colnames(lookups1[[3]]) <- c('industry','indus_lookup')
do.call(cbind,mapply(merge,
x = list(data[,1,drop = FALSE],data[,2,drop =FALSE],data[,3,drop = FALSE]),
y = lookups1,
moreArgs = list(all.x = TRUE),
SIMPLIFY = FALSE))
and you should be able to wrap that do.call bit in a function.
I used data[,1,drop = FALSE] in order to preserve them as one column data frames.
The way you structure mapply calls is to pass named arguments as lists (the x = and y = parts). I wanted to be sure to preserve all the rows from data, so I passed all.x = TRUE via moreArgs, so that gets passed each time merge is called. Finally, I need to stitch them all together myself, so I turned off SIMPLIFY.

Resources