Paste input name between words to save it using write.table - r

I'm a complete newbie in R and have been teaching myself for a few weeks for my degree work.
I'm almost done with the statistical analysis I need, but the code is ugly and messy: a lot of code is repeated for several data frames to apply different statistical tests, save results, etc.
Now, out of personal interest, I want to write this better, but I'm stuck and really need a push to get the idea, please.
For example, I want to create a function that computes the correlation for all the data tables I'm using and saves those results as tables, using the input name as part of the output name.
I mean, if we had the iris data measured in different seasons, e.g. iris_fall, iris_winter, iris_spring and iris_summer, then after applying cor(X) to each one I want to save the results as tables called "mCoriris_fall.txt", "mCoriris_winter.txt", "mCoriris_spring.txt" and "mCoriris_summer.txt" respectively.
My useless code for now is:
cor_PQ<-function(X) {
cor_PQ<-cor(X, use="pairwise.complete.obs")
return(cor_PQ)
}
savecor<-function(t) {
outputname<-(paste0("mCor",t)) #HOW DO I CALL THE NAME OF THE INPUT? t is cor_PQ result matrix.
savecor<-write.table(t, file=paste0(outputname,".txt"))
return(savecor)
}
cor_PQ(Iris_fall)
I expect to get the cor result and save it as a table in my working directory, using the input name as part of the output name.
I'm aware these are two separate functions and that the write.table call should go inside the cor(x) function, but I can't work out how.
I have been reading a lot but I just can't fit it all in my head.
Thanks to anyone who can help me.
Regards.
UP TO HERE IT HAS BEEN SOLVED...
But after making a list of my 14 data frames to apply cor and other methods, write.table overwrites the 14 cor results in one single file. This is my code:
PQ_files<-list.files(path="C:/Users/Sol/Documents/ProyectoTítulo/CalidadAgua/Matrices/Regs",pattern="\\_PQ.txt")
PQ_data<-lapply(PQ_files, read.table)
names(PQ_data)<-gsub("\\_PQ.txt","", PQ_files)
PQ_data
cor_PQ<-function(X) {
cor_PQ<-cor(X, use="pairwise.complete.obs")
outputname.txt<-paste0("mCor",deparse(substitute(X)),".txt")
write.table(cor_PQ, file=outputname.txt)
outputname.pdf<-paste0("Cor",deparse(substitute(X)),".pdf")
pdf(outputname.pdf)
plot(X)
dev.off()
return(cor_PQ)
}
for (i in seq_along(PQ_data)){
Correlaciones<-lapply(PQ_data,cor_PQ)
}
Correlaciones
IN SUM: it seems to work almost fine, except that write.table and plot(X) overwrite the outputs from the 14 data frames in PQ_data, using the names mCorX[[i]] and CorX[[i]] respectively.
Should I define [i] somehow so that each result gets the right name?
Also, when I run Correlaciones at the end, I can see the cor results for the 14 data frames in one single object, but I don't know how to split them correctly.
I guess I'm almost there.
THANKS AGAIN!

You can combine the two functions and use deparse(substitute()) to get the input name as a string:
cor_PQ <- function(X) {
cor_PQ<-cor(X, use="pairwise.complete.obs")
outputname<- paste0("mCor",deparse(substitute(X)), ".txt")
write.table(cor_PQ, file=outputname)
return(cor_PQ)
}
and then call
cor_PQ(Iris_fall)
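As a minimal sketch of what deparse(substitute()) does here, the block below uses a hypothetical helper save_cor (not from the original post), the built-in iris data as a stand-in for the seasonal tables, and tempdir() as the output folder so nothing is written to the working directory.

```r
# Hypothetical helper: writes the correlation matrix of X to a file whose
# name is built from the expression the caller passed in.
save_cor <- function(X, dir = tempdir()) {
  input_name <- deparse(substitute(X))            # caller's expression as text
  cor_mat <- cor(X, use = "pairwise.complete.obs")
  out_file <- file.path(dir, paste0("mCor", input_name, ".txt"))
  write.table(cor_mat, file = out_file)
  invisible(out_file)                             # return the path quietly
}

iris_fall <- iris[, 1:4]      # stand-in data: numeric columns of iris
out <- save_cor(iris_fall)
basename(out)                 # "mCoriris_fall.txt"
```

Note that this only works when the function is called directly with a named object; the name is taken from the call, not from the data.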

Related

write.table inside a function applied to a list of data frames overwrites outputs

I have almost finished a messy script that applies several statistical methods/tests to 11 data frames from different watersheds, with physico-chemical parameters as variables. I reached the goal, but I need to make this functional.
So to start, I made a function to compute correlations and save the results as .txt tables and .pdf images.
It works great when I run the function on one data frame at a time (for that you need to import each data frame separately using read.table, which is not shown in the code below).
As I want it functional, I made a list of the 11 data frames and used lapply to run the function on each one. It works in the sense that it gives me one list (corr) containing the correlation results for each data frame.
Here come the issues:
The list corr with the correlation results for each data frame seems to hold values instead of data frames, so I don't know how to access or save them (see the corr list in the Environment/Data window). Well, up to here, at least it looks like the correlation results exist somewhere.
The second problem is that when I run corr<-lapply(PQ_data, cor_PQ), which has a line to save the outputs as tables (.txt) and images (.pdf) using part of the name of the original data frame (e.g. the first element of PQ_data is "AgIX_E_PQ", so the table and plot of cor_PQ(PQ_data[["AgIX_E_PQ"]]) should get the names "mCorAgIX_E_PQ.txt" and "CorAgIX_E_PQ.pdf" respectively), I'm getting just one output (mCorX[[i]].txt and CorX[[i]].pdf) with the correlation result of the last data frame. That is, the tables and images for each data frame are overwritten into these generic mCorX[[i]].txt and CorX[[i]].pdf files.
Now I guess I have to define 'i' or something to avoid this. Should I define the cor_PQ function for PQ_data instead of X?
If anyone can see where I'm going wrong, I would appreciate any help to solve this, please.
My data: PQ_data (save it in your working directory and point setwd to it).
My code:
rm(list=ls(all=TRUE))
cat("\014")
setwd("C:/Users/Sol/Documents/ProyectoTítulo/CalidadAgua/Matrices/Regs") #my workspace
PQ_files<-list.files(path="C:/Users/Sol/Documents/ProyectoTítulo/CalidadAgua/Matrices/Regs",
pattern="\\_PQ.txt") #my list of 14 dataframes in my workspace.
PQ_data<-lapply(PQ_files, read.table) #read tables of the 14 dataframes in the list.
names(PQ_data)<-gsub("\\_PQ.txt","", PQ_files) #name the 14 dataframes with their original names.
#FUNCTION TO COMPUTE CORRELATIONS, SAVE TABLES AND PLOTS.
cor_PQ<-function(X) {
corPQ<-cor(X, use="pairwise.complete.obs")
outputname.txt<-paste0("mCor",deparse(substitute(X)),".txt")
write.table(corPQ, file=outputname.txt)
outputname.pdf<-paste0("Cor",deparse(substitute(X)),".pdf")
pdf(outputname.pdf)
plot(X)
dev.off()
return(corPQ)
}
corr<-lapply(PQ_data, cor_PQ)
After this, as I said, I get a list called "corr" with 11 elements containing the correlation results for each data frame in my list (PQ_data), but I can't access them as tables when I open the "corr" list in my Environment/Data window (they don't show the blue R arrow to expand the element).
And I get only 2 output files, called mCorX[[i]].txt and CorX[[i]].pdf, showing only the correlation result of the last data frame, because write.table and pdf overwrite the results of the 10 previous calculations.
Again, I will appreciate any help. I really need a push to catch the idea.
Thanks!!!
lapply doesn't pass the names of the list elements to the function; it calls the function as FUN(X[[i]]), so deparse(substitute(X)) returns "X[[i]]" for every element. That is why the function works for an individual data frame but not for a list of them: every file gets the same name, each new file overwrites the previous one, and in the end you are left with a single file holding the result for the last element of your list. You can use the function below, which takes the name as a separate parameter and uses it to name the files.
cor_PQ<-function(X, Y) {
corPQ<-cor(X, use="pairwise.complete.obs")
outputname.txt<-paste0("mCor",Y,".txt")
write.table(corPQ, file= outputname.txt)
outputname.pdf<-paste0("Cor",Y,".pdf")
pdf(outputname.pdf)
plot(X)
dev.off()
return(corPQ)
}
Now use Map to apply the same function.
Map(cor_PQ, PQ_data, names(PQ_data))
We can also use imap from purrr to apply this function.
purrr::imap(PQ_data, cor_PQ)
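To make the difference concrete, here is a small sketch under stated assumptions: show_name and cor_and_save are hypothetical helpers, two slices of the built-in iris data stand in for PQ_data, and files go to tempdir(). It first shows what the function sees under lapply, then passes the names explicitly with Map and reads one result back out of the returned list.

```r
show_name <- function(X) deparse(substitute(X))
dfs <- list(fall = iris[1:50, 1:4], winter = iris[51:100, 1:4])  # stand-in data

# Under lapply() every call is FUN(X[[i]]), so the "name" is always the same.
seen <- unlist(lapply(dfs, show_name))
seen

# Hypothetical helper that takes the name as an explicit second argument.
cor_and_save <- function(X, name, dir = tempdir()) {
  cor_mat <- cor(X, use = "pairwise.complete.obs")
  write.table(cor_mat, file = file.path(dir, paste0("mCor", name, ".txt")))
  cor_mat
}

corr <- Map(cor_and_save, dfs, names(dfs))
corr[["fall"]]                 # each element is an ordinary 4 x 4 matrix
written <- list.files(tempdir(), pattern = "^mCor(fall|winter)\\.txt$")
written
```

The elements of corr are matrices rather than data frames, which is likely why the Environment pane does not offer an expand arrow; they can still be accessed with corr[["fall"]] or corr$fall.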

Using For-Loop With Strings

I'm learning R and trying to use it for a statistical analysis at the same time.
Here, I am in the first part of the work: I am writing matrices and doing some simple things with them, in order to work later with these.
punti<-c(0,1,2,4)
t1<-matrix(c(-8,36,-8,-20,51,-17,-17,-17,57,-19,-19,-19,35,-8,-19,-8,0,0,0,0,-20,-20,-20,60,
-8,-8,-28,44,-8,-8,39,-23,-8,-19,35,-8,57,-8,-41,-8,-8,55,-8,-39,-8,-8,41,-25,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),ncol=4,byrow=T)
colnames(t1) <- c("20","1","28","19")
r1<-matrix(c(12,1,19,9,20,20,11,20,20,11,20,28,0,0,0,12,19,19,20,19,28,15,28,19,11,28,1,
33,20,28,31,1,19,17,28,19,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),ncol=3,byrow=T)
pt1<-rbind(sort(colSums(t1)),sort(punti))
colnames(r1)<-c("Valore","Vincitore","Perdente")
r1<-as.data.frame(r1)
But I have more matrices t_ and r_ so I would like to run a for-loop like:
for (i in 1:150)
{
pt[i]<-rbind(sort(colSums(t[i])),sort(punti))
colnames(r[i])<-c("Valore","Vincitore","Perdente")
r[i]<-as.data.frame(r[i])
}
This one just won't work because r_, t_ and pt_ are strings, but you get both the idea and that I would not like to copy-paste these three lines and manually edit the [i] 150 times. Is there a way to do it?
Personally, I don't advise dynamically and automatically creating lots of variables in the global environment, and would advise you to think about how you can accomplish your goals without such an approach. With that said, if you feel you really need to dynamically create all these variables, you may benefit from the assign function.
It could work like so:
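for (i in 1:150)
{
# get() looks up the matrix named "t1", "t2", ...; assign() creates "pt1", "pt2", ...
assign(paste0('pt',i),rbind(sort(colSums(get(paste0('t',i)))),sort(punti)))
}
The first argument to assign is the name of the variable to create, given as a string (built here with paste0); the second argument is the value you wish to assign to it. get does the reverse: it returns the object whose name is given as a string.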
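A list-based alternative, in the spirit of the advice above about not creating many global variables, might look like the sketch below. The two small matrices are built from the first rows of the question's t1 purely for illustration, and the names t_list and pt_list are assumptions.

```r
punti <- c(0, 1, 2, 4)

# Keep the matrices in one named list instead of t1, t2, ..., t150.
t_list <- list(
  t1 = matrix(c(-8, 36, -8, -20, 51, -17, -17, -17), ncol = 4, byrow = TRUE),
  t2 = matrix(c(57, -19, -19, -19, 35, -8, -19, -8), ncol = 4, byrow = TRUE)
)

# Apply the same operation to every matrix; the result is again a list.
pt_list <- lapply(t_list, function(m) rbind(sort(colSums(m)), sort(punti)))
names(pt_list) <- sub("^t", "pt", names(t_list))   # t1 -> pt1, t2 -> pt2

pt_list[["pt1"]]
```

Each result is then reached by name (pt_list[["pt1"]]) or by position (pt_list[[1]]), so no assign or get is needed.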

How to make loops in R that operate on and return multiple objects

This is my first post, and I think I have looked thoroughly for my answer with no luck, but I might not be typing in the right search terms, since I am relatively new to R. I apologize if this has been answered before and if it has a link would be greatly appreciated.
In essence, I am trying to make a loop that will operate on a set of data frames that I have read into R from .txt files using read.table. I am working with simulated vegetation data organized into many species by site matrices, so it would be best for me if I could create loops that will just operate on the objects I have read in using some functions I have made and then put out new objects into my workspace with a specific naming pattern (e.g. put "_av" on the end of the name of the object operated on when creating a new object).
For convenience's sake, let's say I have only four matrices I want to work with, all of which contain the phrase "mod" for model. I have read that I can put these data frames into a list of data frames with the following code:
list.mods=lapply(ls(pattern="mod"),get)
This does create a list which I have been having trouble on getting my functions to actually operate on. From what I read this is the best way to make a list of objects you want to operate on.
So let's say that list.mods is now my list of operable matrices - mod1, mod2, mod3, and mod4. Also, let's say I have a function that simply calculates Bray-Curtis dissimilarity as follows:
bc=function(x){
vegdist(x,method="bray")
}
I can use this by typing in:
mod1.bc=bc(mod1)
That works. But it seems like I should be able to apply my list of models to the function bc and have it output the models with a pattern mod1.bc, mod2.bc, mod3.bc, and mod4.bc. I cannot get my list of files to work in the function much less save each operation as a new object with a patterned name.
What am I doing wrong? In the end I might have as many as a hundred models or more and would really appreciate being able to create a list of items that I can run through loops.
Thanks in advance.
You can use lapply again:
new.list.mods <- lapply(list.mods, bc)
This will return a new list in which each element is the result of applying bc to the corresponding element of list.mods.
The 'apply' family of functions in R basically allows you to save typing. If that's easier for you to understand, you can use a 'for loop' instead. Of course you will need to know how to access elements in a list for that. There is a question about that.
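For comparison, the for-loop version could be sketched as below. Since vegdist comes from the vegan package, this sketch substitutes base R's dist() with the Manhattan metric as a stand-in for bc, and the two tiny matrices are made up.

```r
bc <- function(x) dist(x, method = "manhattan")   # stand-in for vegdist()

list.mods <- list(mod1 = matrix(1:6, nrow = 3),   # made-up example data
                  mod2 = matrix(6:1, nrow = 3))

# Pre-allocate the output list and give it the patterned names.
new.list.mods <- vector("list", length(list.mods))
names(new.list.mods) <- paste0(names(list.mods), ".bc")

for (i in seq_along(list.mods)) {
  # [[i]] extracts the element itself; [i] would return a one-element list
  new.list.mods[[i]] <- bc(list.mods[[i]])
}

new.list.mods[["mod1.bc"]]
```

The loop fills the same list that lapply(list.mods, bc) would return; only the naming step is extra.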
How about collecting the names of the models/objects you want into a list of symbols (the anchored pattern keeps helper objects such as list.mods or mod_list itself from being picked up):
mod_list <- sapply(ls(pattern = "^mod[0-9]+$"), as.name)
and then evaluating each one as you loop over them with your function:
output_list <- lapply(mod_list, function(m) bc(eval(m)))
With this approach you avoid creating the potentially large and redundant list.mods object in your example. Also, I think this will result in conveniently named lists.
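Another way to get a named list, sketched here under assumptions, is mget(), which fetches several objects by name at once and keeps those names. As before, dist() stands in for vegdist and the matrices are made up; the pattern "^mod[0-9]+$" is an assumption about how the models are named.

```r
mod1 <- matrix(1:6, nrow = 3)      # made-up example data
mod2 <- matrix(6:1, nrow = 3)
bc <- function(x) dist(x)          # stand-in for vegdist()

# mget() returns a list named after the objects it fetched.
mod_list <- mget(ls(pattern = "^mod[0-9]+$"))

output_list <- lapply(mod_list, bc)
names(output_list) <- paste0(names(mod_list), ".bc")
names(output_list)                 # "mod1.bc" "mod2.bc"
```

Unlike the symbol-based approach, this does copy the objects into a list, which is usually fine for moderately sized data.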

Looping in R to create transformed variables

I have a dataset of 80 variables, and I want to loop though a subset of 50 of them and construct returns. I have a list of the names of the variables for which I want to construct returns, and am attempting to use the dplyr command mutate to construct the variables in a loop. Specifically my code is:
for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") = (i - lag(i,1))/lag(i,1))}
where returnvars is my list, and alldta is my dataset. When I run this code outside the loop with just one of the `i' values, it works fine. The code for that looks like this:
alldta <- mutate(alldta,rVar = (Var- lag(Var,1))/lag(Var,1))
However, when I run it in the loop (e.g., attempting to do the previous line of code 50 times for 50 different variables), I get the following error:
Error: unexpected '=' in:
"for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") ="
I am unsure why this issue is coming up. I have looked into a number of ways to try and do this, and have attempted solutions that use lapply as well, without success.
Any help would be much appreciated! If there is an easy way to do this with one of the apply commands as well, that would be great. I did not provide a dataset because my question is not data specific, I'm simply trying to understand, as a relative R beginner, how to construct many transformed variables at once and add them to my data frame.
EDIT: As per Frank's comment, I updated the code to the following:
for (i in returnvars) {
varname <- paste("r",i,sep="")
alldta <- mutate(alldta,varname = (i - lag(i,1))/lag(i,1))}
This fixes the previous error, but I am still not referencing the variable correctly, so I get the error
Error in "Var" - lag("Var", 1) :
non-numeric argument to binary operator
Which I assume is because R sees my variable name Var as a string, rather than as a variable. How would I correctly reference the variable in my dataset alldta? I tried get(i) and alldta$get(i), both without success.
I'm also still open to (and actively curious about), more R-style ways to do this entire process, as opposed to using a loop.
Using mutate inside a loop might not be a good idea either. I am not sure if mutate makes a copy of the data frame, but it's generally not good practice to grow a data frame inside a loop. Instead, build the output separately, one column per variable, and then name the columns based on your logic.
result = do.call(cbind,lapply(returnvars,function(i) {...}))
colnames(result) = paste("r",returnvars,sep="")
After playing around with this more, I discovered (thanks to Frank's suggestion), that the following works:
extended <- alldta # Make a copy of my dataset
for (i in returnvars) {
varname <- paste("r",i,sep="")
extended[[varname]] = (extended[[i]] - lag(extended[[i]],1))/lag(extended[[i]],1)}
This is still not very R-styled in that I am using a loop, but for a task that is only repeating about 50 times, this shouldn't be a large issue.
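A loop-free variant could be sketched as follows. The data frame is made up, simple_return is a hypothetical helper, and c(NA, head(x, -1)) is used as a base R stand-in for dplyr's lag(x, 1) so the example runs without extra packages.

```r
alldta <- data.frame(a = c(100, 110, 121),      # made-up example data
                     b = c(50, 55, 44))
returnvars <- c("a", "b")

# Hypothetical helper: one-period simple return of a numeric vector.
simple_return <- function(x) {
  prev <- c(NA, head(x, -1))                    # previous value, NA for the first row
  (x - prev) / prev
}

# Compute all return columns at once, then name and attach them.
returns <- lapply(alldta[returnvars], simple_return)
names(returns) <- paste0("r", returnvars)
extended <- cbind(alldta, as.data.frame(returns))

extended
```

Selecting columns with alldta[returnvars] sidesteps the string-versus-variable problem, because the names are used for indexing rather than evaluated as expressions.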

returning different data frames in a function - R

Is it possible to return 4 different data frames from one function?
Scenario:
I am trying to read a file, parse it, and return some parts of the file.
My function looks something like this:
parseFile <- function(file){
carFile <- read.table(file, header=TRUE, sep="\t")
carNames <- carFile[1,]
carYear <- colnames(carFile)
return(list(carFile,carNames,carYear))
}
I don't want to have to use list(carFile,carNames,carYear). Is there a way to return the 3 data frames without returning them in a list first?
R does not support multiple return values. You want to do something like:
foo = function(x,y){return(x+y,x-y)}
plus,minus = foo(10,4)
yeah? Well, you can't. Calling foo fails with the error "multi-argument returns are not permitted", and the second line is not valid R syntax at all.
You've already found the solution - put them in a list and then get the data frames from the list. This is efficient - there is no conversion or copying of the data frames from one block of memory to another.
This is also logical, the return from a function should conceptually be a single entity with some meaning that is transferred to whatever function is calling it. This meaning is also better conveyed if you name the returned values of the list.
You could use a technique to create multiple objects in the calling environment, but when you do that, kittens die.
Note in your example carYear isn't a data frame - it's a character vector of column names.
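The point about naming the elements of the returned list can be sketched like this. The function parse_cars and the element names are hypothetical, and the built-in mtcars data stands in for the file being parsed.

```r
# Hypothetical parser returning several results as one named list.
parse_cars <- function() {
  list(
    data  = mtcars,              # the full data frame
    names = rownames(mtcars),    # character vector of car names
    cols  = colnames(mtcars)     # character vector of column names
  )
}

parsed <- parse_cars()
parsed$cols[1:3]                 # pull out one piece by name
head(parsed$data, 2)
```

With names in place, the caller writes parsed$data instead of parsed[[1]], which keeps the code readable if the order of the elements ever changes.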
There are other ways you could do that, if you really really want, in R.
assign('carFile',carFile,envir=parent.frame())
If you use that, then carFile will be created in the calling environment. As Spacedman indicated you can only return one thing from your function and the clean solution is to go for the list.
In addition, my personal opinion is that if you find yourself in a situation where you feel you need to return multiple data frames from one function, or do something that no one has ever done before, you should really revisit your approach. In most cases you can find a cleaner solution, perhaps with an additional function, or with the recommended approach (i.e. a list).
In other words the
envir=parent.frame()
will do the job, but as SpacedMan mentioned
when you do that, kittens die
The zeallot package does what you need, in a way similar to how Python unpacks the values returned from a function. Reproducible example below.
parseFile <- function(){
carMPG <- mtcars$mpg
carName <- rownames(mtcars)
carCYL <- mtcars$cyl
return(list(carMPG,carName,carCYL))
}
library(zeallot)
c(myFile, myName, myYear) %<-% parseFile()

Resources