Saving R loop output into e.g. csv file - r

I am aware that this is probably trivial, but I cannot solve it. I have also edited the original question a bit, as I realized it was not very logical. See the code below:
u1 <- rnorm(30)
usq <- 0
for(i in 1:5)
{
usq[i] <- u1[i]*u1[i]
print(usq[i])
}
The output is (your numbers will differ, since the input is random):
[1] 0.3501974
[1] 0.01937814
[1] 0.4053783
[1] 0.0005323552
[1] 1.459631
All I want to do is save this output as, e.g., a CSV file with one or two columns. I am happy to be pointed to any place where this question has already been answered; I could not for the life of me find it. I have tried:
write.csv(matrix(1:5, ncol=1), "Results.csv")

I'm assuming the numbers are stand-ins for something a lot more complex. For loops in R are generally avoided, and it's rare that there isn't a better option somewhere.
To answer your question though, you need (as Gregor pointed out) write.csv. The one addition I would make is that the first time you call it, you need to make a new file.
write.csv(output_as_dataframe_or_matrix, file = "path_to_file.csv")
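After that, to add rows without overwriting what you had before, note that write.csv ignores append (with a warning), so use write.table instead:
write.table(output_as_dataframe_or_matrix, file = "path_to_file.csv", sep = ",", append = TRUE, col.names = FALSE)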
Here's how you can achieve this (note that I'm removing the for loop; R will perform many operations on each element of a vector, and it's much more efficient when you have it work this way):
u1 <- rnorm(30)
usq <- u1^2 # or u1 * u1
print(usq)
write.csv(usq, "Results.csv")
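As a minimal sketch of the create-then-append pattern described above: the first batch is written with write.csv, and a later batch is appended with write.table, since write.csv does not honour append. The file location (a temporary file), the column name usq, and the split into two batches of five are assumptions made for illustration only.

```r
set.seed(1)                      # fixed seed so the run is repeatable
u1 <- rnorm(30)

out_file <- tempfile(fileext = ".csv")   # hypothetical output location

# First batch: creates the file and writes the header row
first <- data.frame(usq = u1[1:5]^2)
write.csv(first, out_file, row.names = FALSE)

# Later batch: appended below the existing rows, without repeating the header
later <- data.frame(usq = u1[6:10]^2)
write.table(later, out_file, sep = ",", append = TRUE,
            col.names = FALSE, row.names = FALSE)

# Read everything back to confirm both batches are present
saved <- read.csv(out_file)
print(nrow(saved))
```

When all values are available at once, a single write.csv call on the full vector is simpler; appending is mainly useful when results arrive in batches.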

This was probably not the most eloquently written question (my apologies for that) but the answer to it should be that the "empty container" has to be created first and then filled with numbers from the loop calculations. To solve my issue from above the following code did the job:
output<-matrix(0,5,1)
output<-as.data.frame(output)
u1 <- rnorm(30)
usq <- 0
for(i in 1:5)
{
usq[i] <- u1[i]*u1[i]
print(usq[i])
output$V1[i]<-usq[i]
}
write.csv(output, "Results.csv")

Related

How to list results for several calculations in R

I have loaded two source files, performed some iterative calculations, and now I need to display/export the results. There are hundreds of iterative calculations, hence hundreds of results. However, only the result of the final calculation is displayed.
In this example, I have shortened the list of calculations to only 3. Please refer to line 7 (k in 1:3). How do I get R to display the results of all calculations?
Many thanks in advance to those who can offer help. If this question has already been asked before, a link would be great. I could not find one, probably because I do not know the right terms to search for.
# Load files
d1<-read.csv('testhourly.csv',sep=",",header=F)
names(d1)<-c("elapsedtime","units")
d2<-read.csv('testevent.csv',sep=",",header=F)
names(d2)<-c("eventno","starttime","endtime","starttemp","endtemp")
# Perform for calculations 1 to 3
for(k in 1:3){
a<-d2[k,2]
b<-d2[k,3]
x<-d1[a:b,]$q
a2<-d2[k,2]-1
b2<-d2[k,3]-1
y<-d1[a2:b2,]$q
z <- (x-y)}
results <- sum(z)
# Export results
write.csv(results, file = "results.csv")
You are not saving your output inside the loop for every iteration, so your loop only returns the final value of the last iteration.
temp=vector("list",3)
for(k in 1:3) {
a<-d2[k,2]
b<-d2[k,3]
x<-d1[a:b,]$q
a2<-d2[k,2]-1
b2<-d2[k,3]-1
y<-d1[a2:b2,]$q
temp[[k]] <- (x-y)
}
results <- sum(unlist(temp))
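If one result per event is wanted rather than a single grand total, the same idea can be sketched with a preallocated numeric vector. The data below are hypothetical stand-ins for the two CSV files, and the column units is used in place of q, which is an assumption since the question's files are not available.

```r
# Hypothetical stand-in for testhourly.csv
d1 <- data.frame(elapsedtime = 1:10,
                 units = c(2, 4, 7, 11, 16, 22, 29, 37, 46, 56))
# Hypothetical stand-in for testevent.csv (only the columns used here)
d2 <- data.frame(eventno = 1:3,
                 starttime = c(2, 4, 7),
                 endtime = c(3, 6, 9))

# One slot per event, filled inside the loop
results <- numeric(nrow(d2))
for (k in seq_len(nrow(d2))) {
  rows <- d2$starttime[k]:d2$endtime[k]
  x <- d1$units[rows]        # values over the event window
  y <- d1$units[rows - 1]    # values one row earlier
  results[k] <- sum(x - y)   # store this iteration's result
}

# Export one row per event
out_file <- tempfile(fileext = ".csv")
write.csv(data.frame(eventno = d2$eventno, result = results),
          out_file, row.names = FALSE)
print(results)
```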

How to efficiently iterate through a complicated function that outputs a dataframe?

I essentially need to iterate through a set of values for parameters A,B,C to generate a table of results that will help me analyze the importance of such parameters. This is for a program in R.
Let's say that:
A goes from rangeA = 1:10
B goes from rangeB = 11:20
C goes from rangeC = 21:30
The simplest (not most efficient) solution that I currently use goes something like this:
### here I create this empty dataframe because I add on each tmp calc later
res <- data.frame()
### here i just create a random dataframe for replicative purposes
dataset <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
ParameterAdjustment <- function(){
for(a in rangeA){
for(b in rangeB){
for(c in rangeC){
### this is a complicated calculation that is much more
### difficult than the replicable example below
tmp <- CalculateSomething(dataset,a,b,c)
### an example calculation
### EDIT NEW EXAMPLE CALCULATION
tmp <- colMeans(dataset+a*b*c)
tmp <- data.frame(t(tmp), sd(tmp))
res <- rbind(res,tmp)
}
}
}
return(res)
}
My problem is that this works fine with my original dataset that runs calculations on a 7000x500 dataframe. However, my new datasets are much larger and performance has become a significant issue. Can anyone suggest or help with a more efficient solution? Thank you.
Not sure what language the above is, so not sure how relevant this is, but here goes: are you outputting/sending the data as you go, or collecting all the results in memory and then outputting them in one go at the end? When I've encountered similar problems with large datasets, outputting as I go has helped me out a few times. For example, when sending tens of thousands of data points back to the client for a graph, rather than generating an array of all those points and sending that, I output each point as it is computed and then free the memory. It still takes a while, but that's unavoidable. The important bit is that it doesn't crash.
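Independently of when the results are output, growing a data frame with rbind inside nested loops copies the accumulated result on every iteration. A common alternative, sketched here under assumptions, is to collect each piece in a preallocated list and bind once at the end. The parameter ranges and dataset size are shrunk for illustration, and the colMeans calculation stands in for the real CalculateSomething.

```r
set.seed(42)
# Small random dataset standing in for the real one
dataset <- data.frame(replicate(10, sample(0:1, 100, replace = TRUE)))

# Every combination of the three parameters (reduced ranges for illustration)
grid <- expand.grid(a = 1:3, b = 11:13, c = 21:23)

# Preallocate one list slot per combination
pieces <- vector("list", nrow(grid))
for (i in seq_len(nrow(grid))) {
  shift <- grid$a[i] * grid$b[i] * grid$c[i]
  means <- colMeans(dataset + shift)          # example calculation only
  pieces[[i]] <- data.frame(grid[i, ], t(means), sd = sd(means))
}

# Bind all rows in a single call instead of once per iteration
res <- do.call(rbind, pieces)
print(dim(res))
```

Keeping the parameter values in the output rows also makes it easier to see which combination produced which result.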

Faster alternative methods to for-loop in R for pattern matching

I am working on a problem in which I have two data frames, data and abbreviations, and I would like to replace all the abbreviations present in data with their respective full forms. Until now I have been using for loops in the following manner:
abb <- c()
for(i in 1:length(data$text)){
for(j in 1:length(Abbreviation$abb)){
abb <- paste("(\\b", Abbreviation$abb[j], "\\b)", sep="")
data$text[i] <- gsub(abb, Abbreviation$Fullform[j], tolower(data$text[i]))
}
}
The abbreviation data frame can be generated using the following code:
Abbreviation <- c(c("hru", "how are you"),
c("asap", "as soon as possible"),
c("bf", "boyfriend"),
c("ur", "your"),
c("u", "you"),
c("afk", "away from keyboard"))
Abbreviation <- data.frame(matrix(Abbreviation, ncol=2, byrow=T), row.names=NULL)
names(Abbreviation) <- c("abb","Fullform")
The data is simply a data frame with one column holding a text string in each row, which can also be generated using the following code:
data <- data.frame(unlist(c("its good to see you, hru doing?",
"I am near bridge come ASAP",
"Can u tell me the method u used for",
"afk so couldn't respond to ur mails",
"asmof I dont know who is your bf?")))
names(data) <- "text"
Initially, I had a data frame with around 1,000 observations and around 100 abbreviations, so I was able to run the analysis. But now the data has grown to almost 50,000 rows, and processing is very slow because of the two nested for loops. Can you suggest better alternatives to the for loops and explain with an example how to use them in this situation? If this problem can be solved faster through vectorization, please suggest how to do that as well.
Thanks for the help!
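This should be faster, and without side effects.
mapply(function(x,y){
abb <- paste0("(\\b", x, "\\b)")
gsub(abb, y, tolower(data$text))
},Abbreviation$abb,Abbreviation$Fullform)
gsub is vectorized, so you can give it the whole character vector in which matches are sought. Here I give it data$text. Note that each call starts from the original data$text, so the result is a matrix with one column per abbreviation rather than a single vector with every replacement applied.
I use mapply to avoid the side effects of for.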
First of all, clearly there is no need to compile the regular expressions with each iteration of the loop. Also, there is no need to actually loop over data$text: in R, very often you can use a vector where a value could do -- and R will go through all the elements of the vector and return a vector of the same length.
Abbreviation$regex <- sprintf( "(\\b%s\\b)", Abbreviation$abb )
for( j in 1:length( Abbreviation$abb ) ) {
data$text <- gsub( Abbreviation$regex[j],
Abbreviation$Fullform[j], data$text,
ignore.case= T )
}
The above code works with the example data.
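The single-loop approach can be wrapped in a small function so that the caller's data frame is not modified in place. This is a sketch under assumptions: the function name expand_abbreviations and the shortened sample data are hypothetical, and the patterns are applied in the order given.

```r
# Hypothetical helper: applies each abbreviation to the whole text vector
expand_abbreviations <- function(text, abb, fullform) {
  patterns <- sprintf("\\b%s\\b", abb)   # build all patterns once
  for (j in seq_along(patterns)) {
    # gsub works on the full vector, so only the abbreviations are looped over
    text <- gsub(patterns[j], fullform[j], text, ignore.case = TRUE)
  }
  text
}

abbreviations <- data.frame(
  abb = c("hru", "asap", "ur", "u"),
  Fullform = c("how are you", "as soon as possible", "your", "you"),
  stringsAsFactors = FALSE
)
texts <- c("its good to see you, hru doing?",
           "I am near bridge come ASAP",
           "Can u tell me the method u used for")

result <- expand_abbreviations(texts, abbreviations$abb, abbreviations$Fullform)
print(result)
```

The word-boundary markers keep short abbreviations such as u from matching inside longer words like used or you.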

Cannot create an empty vector and append new elements in R

I am just beginning to learn R and am having an issue that is leaving me fairly confused. My goal is to create an empty vector and append elements to it. Seems easy enough, but solutions that I have seen on stackoverflow don't seem to be working.
To wit,
> a <- numeric()
> append(a,1)
[1] 1
> a
numeric(0)
I can't quite figure out what I'm doing wrong. Anyone want to help a newbie?
append does something that is somewhat different from what you are thinking. See ?append.
In particular, note that append does not modify its argument. It returns the result.
You want the function c:
> a <- numeric()
> a <- c(a, 1)
> a
[1] 1
Your a vector is not being passed by reference, so when it is modified you have to store it back into a. You cannot access a and expect it to be updated.
You just need to assign the return value to your vector, just as Matt did:
> a <- numeric()
> a <- append(a, 1)
> a
[1] 1
Matt is right that c() is preferable (fewer keystrokes and more versatile) though your use of append() is fine.
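A short sketch of the point made in the answers above: append returns a new vector and leaves its argument untouched, so the result has to be assigned back. The use of the after argument at the end is an extra illustration, not something the question asked about.

```r
a <- numeric()

append(a, 1)        # returns a new vector; the result is discarded here
print(length(a))    # a is still empty

a <- c(a, 1)        # assign the result back to keep it
a <- append(a, 5, after = 0)   # append can also insert at a position
print(a)
```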

Assigning output of a function to two variables in R [duplicate]

Possible Duplicate:
function with multiple outputs
This seems like an easy question, but I can't figure it out and I haven't had luck in the R manuals I've looked at. I want to find dim(x), but I want to assign dim(x)[1] to a and dim(x)[2] to b in a single line.
I've tried [a b] <- dim(x) and c(a, b) <- dim(x), but neither has worked. Is there a one-line way to do this? It seems like a very basic thing that should be easy to handle.
This may not be as simple a solution as you wanted, but it gets the job done. It's also a handy tool for the future, should you need to assign multiple variables at once (when you don't know how many values you have).
Output <- SomeFunction(x)
VariablesList <- letters[1:length(Output)]
for (i in seq(1, length(Output), by = 1)) {
assign(VariablesList[i], Output[i])
}
Loops aren't the most efficient things in R, but I've used this multiple times. I personally find it especially useful when gathering information from a folder with an unknown number of entries.
EDIT: And in this case, Output could be any length (as long as VariablesList is longer).
EDIT #2: Changed up the VariablesList vector to allow for more values, as Liz suggested.
You can also write your own function that will always make a global a and b. But this isn't advisable:
mydim <- function(x) {
out <- dim(x)
a <<- out[1]
b <<- out[2]
}
The "R" way to do this is to output the results as a list or vector just like the built in function does and access them as needed:
out <- dim(x)
out[1]
out[2]
R has excellent list and vector comprehension that many other languages lack and thus doesn't have this multiple assignment feature. Instead it has a rich set of functions to reach into complex data structures without looping constructs.
Doesn't look like there is a way to do this. Really the only way to deal with it is to add a couple of extra lines:
temp <- dim(x)
a <- temp[1]
b <- temp[2]
It depends what is in a and b. If they are just numbers try to return a vector like this:
dim <- function(x,y)
return(c(x,y))
dim(1,2)[1]
# [1] 1
dim(1,2)[2]
# [1] 2
If a and b are something else, you might want to return a list
dim <- function(x,y)
return(list(item1=x:y,item2=(2*x):(2*y)))
dim(1,2)[[1]]
# [1] 1 2
dim(1,2)[[2]]
# [1] 2 3 4
EDIT:
try this: x <- c(1,2); names(x) <- c("a","b")
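Building on the idea of naming the elements, one way to get close to a one-line assignment is to turn the named values into variables with list2env. This is only a sketch: the matrix x is a hypothetical example, and creating variables this way is a matter of taste rather than a recommended idiom.

```r
x <- matrix(0, nrow = 3, ncol = 4)   # hypothetical example object

# Name the two dimensions, then create variables a and b from the named list
dims <- setNames(as.list(dim(x)), c("a", "b"))
list2env(dims, envir = environment())

print(a)
print(b)
```

Keeping the named list (dims$a, dims$b) and skipping list2env is usually clearer when the values are only needed in a few places.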
