Loop over string variables in R

When programming in Stata I often find myself using the loop index inside the loop body. For example, I'll loop over a list of the variables nominalprice and realprice:
local list = "nominalprice realprice"
foreach i of local list {
summarize `i'
twoway (scatter `i' time)
graph export "C:\TimePlot-`i'.png"
}
This will plot the time series of nominal and real prices and export one graph called TimePlot-nominalprice.png and another called TimePlot-realprice.png.
In R the method I've come up with to do the same thing would be:
clist <- c("nominalprice", "realprice")
for (i in clist) {
e <- paste("png(\"c:/TimePlot-",i,".png\")", sep="")
eval(parse(text=e))
plot(time, eval(parse(text=i)))
dev.off()
}
This R code looks unintuitive and messy to me and I haven't found a good way to do this sort of thing in R yet. Maybe I'm just not thinking about the problem the right way? Can you suggest a better way to loop using strings?

As other people have intimated, this would be easier if you had a dataframe with columns named nominalprice and realprice. If you do not, you could always use get. You shouldn't need parse at all here.
clist <- c("nominalprice", "realprice")
for (i in clist) {
png(paste("c:/TimePlot-", i, ".png", sep=""))
plot(time, get(i))
dev.off()
}
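If the two series do live in a data frame, the loop can index columns by name with [[ ]], and neither get() nor parse() is needed. The following is only a sketch: the data frame prices and its values are made up for illustration, and it writes to tempdir() with pdf(), which does not depend on optional graphics capabilities, rather than to c:/ with png(), which takes a file name in the same way.

```r
# Made-up example data; in practice this would be your own data frame
prices <- data.frame(
  time = 1:10,
  nominalprice = c(100, 102, 105, 107, 110, 114, 117, 121, 126, 130),
  realprice = c(100, 101, 102, 102, 103, 105, 106, 107, 109, 110)
)

out_files <- character(0)
for (i in c("nominalprice", "realprice")) {
  # prices[[i]] extracts the column whose name is stored in i
  print(summary(prices[[i]]))
  out_file <- file.path(tempdir(), paste0("TimePlot-", i, ".pdf"))
  pdf(out_file)
  plot(prices$time, prices[[i]], xlab = "time", ylab = i)
  dev.off()
  out_files <- c(out_files, out_file)
}
```

Keeping the series in one data frame also means the loop variable doubles as an axis label and a file name suffix, much like the Stata macro.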

If your main issue is the need to type eval(parse(text=i)) instead of `i', you could create a simpler-to-use function for evaluating expressions from strings:
e = function(expr) eval(parse(text=expr))
Then the R example could be simplified to:
clist <- c("nominalprice", "realprice")
for (i in clist) {
png(paste("c:/TimePlot-", i, ".png", sep=""))
plot(time, e(i))
dev.off()
}

Using ggplot2 and reshape:
library(ggplot2)
library(reshape)
df <- data.frame(nominalprice=rexp(10), time=1:10)
df <- transform(df, realprice=nominalprice*runif(10,.9,1.1))
dfm <- melt(df, id.vars=c("time"))
qplot(time, value, facets=~variable, data=dfm)

I don't see what's especially wrong with your original solution, except that I don't know why you're using the eval() function. That doesn't seem necessary to me.
You can also use an apply function, such as lapply. Here's a working example. I created dummy data as a zoo() time series (this isn't necessary, but you're working with time series data anyway):
library(zoo)
# x <- some time series data
time <- as.Date("2003-02-01") + c(1, 3, 7, 9, 14) - 1
x <- zoo(data.frame(nominalprice=rnorm(5),realprice=rnorm(5)), time)
lapply(c("nominalprice", "realprice"), function(c.name, x) {
png(paste("c:/TimePlot-", c.name, ".png", sep=""))
plot(x[,c.name], main=c.name)
dev.off()
}, x=x)

Related

I want to create a data.frame with the values that I print of this loop in R

When I run this loop I can print the results, and I want to create a data frame with this data, but I can't. Until now I have this:
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
for (i in 1:numfiles) {
file <- read.table(filenames[i],header = TRUE)
ts = subset(file, file$name == "plantNutrientUptake")
tss = subset (ts, ts$path == "//plants/nitrate")
tssc = tss[,2:3]
d40 = tssc[41,2]
print(d40)
print(filenames[i])
}
This is not the most efficient way to do this, but it takes advantage of what code you've already written. First, you'll create an empty data frame with the columns you want, but filled with NA. Then, in each iteration of the loop, you'll fill one row of the data frame.
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
# Create an empty data.frame
df <- data.frame(filename = rep(NA, numfiles), d40 = rep(NA, numfiles))
for (i in 1:numfiles){
file <- read.table(filenames[i],header = TRUE)
ts = subset(file, file$name == "plantNutrientUptake")
tss = subset (ts, ts$path == "//plants/nitrate")
tssc = tss[,2:3]
d40 = tssc[41,2]
# Fill row i of the data frame
df[i,"filename"] = filenames[i]
df[i,"d40"] = d40
}
Hope that does it! Good luck :)
There are a lot of ways to do what you are asking. Also, without a reproducible example it is difficult to validate that the code will run. I couldn't tell what type of data each of your variables holds, so I guessed that the file name is a character string and that d40 is numeric. You'll need to change the code if that's not true.
The following method uses base R (no other packages). It builds on what you have done. There are other ways to do this using map, do.call, or apply. But it's important to be able to run through a loop.
As someone commented, your code just overwrites its results on every iteration. Luckily you have the variable i that you can use to specify where things go.
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
# Declare the full data frame up front, with an explicit type for each column
df <- data.frame(
filename = rep(NA_character_,numfiles),
d40 = rep(NA_real_,numfiles),
stringsAsFactors = FALSE
)
# Loop through the files and fill in the data
for (i in 1:numfiles){
file <- read.table(filenames[i],header = TRUE)
# The intermediate subsets are data frames, so keep them as ordinary variables
ts <- subset(file, file$name == "plantNutrientUptake")
tss <- subset(ts, ts$path == "//plants/nitrate")
tssc <- tss[,2:3]
# Store only the scalar results in row i
df$filename[i] <- filenames[i]
df$d40[i] <- tssc[41,2]
print(df$d40[i])
print(filenames[i])
}
You'll notice a few things about this code that are extra.
First, I'm declaring the variable type for each column explicitly. You can use rep(NA,numfiles), but that creates a logical column and leaves R to guess what the column should be. This may not be a problem for you if all of your values are obviously of the same type. But imagine a variable a = c("1","A","B") whose values are all characters, where the first one happens to look like a number. If the first iteration of the loop stores it as a number, the column becomes numeric, and when a later iteration assigns a character R silently converts the whole column to character instead of raising an error, so the type mismatch goes unnoticed.
Next, I'm declaring the entire dataframe before entering the loop. When people tell you that loops in [modern] R are slow it is often because you are re-allocating memory every loop. By declaring the entire dataframe up front you speed up the loop significantly. This also allows you to reference any cell in the dataframe...which is exactly what you want to do in the loop.
Finally, I'm using the $ syntax to make things clear. Writing df[i,"d40"] <- d40 is the same as writing df$d40[i] <- d40. I just think the second form is clearer. This is a matter of personal preference.
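For comparison, here is a sketch of the lapply and do.call route mentioned above. The helper extract_d40 and the two demo files are hypothetical, written to a temporary directory only so that the example runs on its own; the column layout (value of interest in the third column, row 41) is assumed from the question's code.

```r
# Hypothetical demo files, written to a temporary directory
demo_dir <- file.path(tempdir(), "uptake_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:2) {
  demo <- data.frame(
    name = "plantNutrientUptake",
    time = 0:40,
    value = (0:40) * k,
    path = "//plants/nitrate"
  )
  write.table(demo, file.path(demo_dir, paste0("run", k, ".txt")), row.names = FALSE)
}

# Read one file and return a one-row data frame with the value of interest
extract_d40 <- function(filename) {
  dat <- read.table(filename, header = TRUE)
  tss <- dat[dat$name == "plantNutrientUptake" & dat$path == "//plants/nitrate", ]
  data.frame(filename = basename(filename), d40 = tss[41, 3])
}

filenames <- list.files(demo_dir, full.names = TRUE)
# lapply returns a list of one-row data frames; rbind stacks them
result <- do.call(rbind, lapply(filenames, extract_d40))
print(result)
```

This version needs no preallocation because each file produces its own small data frame and the rows are combined once at the end.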

R, script/function for retrieving more stocks

I'm a newbie in R and I've seen several posts about downloading multiple stocks, but for one reason or another they don't work as suggested.
My purpose is to download a vector of stocks and create a single xts matrix containing only Close prices for every stock (so n observations x 3 columns).
Anyway, I'd like to start from a basic script that doesn't work properly:
library(quantmod)
ticker=c("KO","AAPL","^GSPC")
for (i in 1:length(ticker)) {
simbol=as.xts(na.omit(getSymbols(ticker[i],from="2016-01-01",auto.assign=F)))
new=Cl(simbol)
merge(new[i])
}
It would be even better to write a function(symbols) that I can call whenever I need to, changing only the names of the stocks to download.
Thanks to everyone
This is how I would do what you want with a function wrapper (which is a pretty common kind of manipulation with xts):
ticker=c("KO","AAPL","^GSPC")
collect_close_series <- function(ticker) {
# Preallocate a list to store the result from each loop iteration (Note: lapply is another alternative to a direct loop)
lst <- vector("list", length(ticker))
for (i in 1:length(ticker)) {
symbol <- na.omit(getSymbols(ticker[i],from="2016-01-01",auto.assign = FALSE))
lst[[i]] <- Cl(symbol)
}
# You have a list of close prices. You can combine the objects in the list compactly using do.call; this is a common "data manipulation pattern" with xts objects.
rr <- do.call(what = merge, lst)
rr
}
out <- collect_close_series(ticker)
More advanced (better code design): You could write cleaner code by writing a function that handles each symbol (rather than a function that wraps and passes in all the symbols together) and then run lapply on it:
per_sym_close <- function(tick) {
symbol <- na.omit(getSymbols(tick,from="2016-01-01",auto.assign = FALSE))
Cl(symbol)
}
out2 <- do.call(merge, lapply(X = ticker, FUN = per_sym_close))
This gives the same result.
Hope this helps getting you started toward writing good R code!
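In case the do.call step is unfamiliar: do.call(f, lst) calls f with the elements of lst as its arguments. A small base R illustration, using cbind on made-up vectors instead of merge on xts objects so that it runs without a network connection:

```r
# Made-up stand-ins for the per-symbol close series
lst <- list(KO = c(41.2, 41.5, 41.1), AAPL = c(101.3, 102.0, 100.8))

# These two calls are equivalent; do.call works for a list of any length
combined <- do.call(cbind, lst)
by_hand <- cbind(KO = lst$KO, AAPL = lst$AAPL)
print(combined)
```

The benefit over writing the call by hand is that the number of tickers does not have to be known in advance.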

R, from a list create plots and save it with his name

I have a list containing 75 named matrices, and I want to make a plot for each matrix and save each plot under the name of its matrix.
My code makes the plots with a loop and it works: I get 75 correct plots. The problem is that the name of each plot file looks like a vector, "c(99,86,94....)", which is too long, and I can't tell which plot is which.
I'm using the code below, which probably isn't the best. I'm a beginner, and I have been looking for a solution for a week without success.
for (i in ssamblist) {
svg(paste("Corr",i,".svg", sep=""),width = 45, height = 45)
pairs(~CDWA+CDWM+HI+NGM2+TKW+YIELD10+GDD_EA,
data=i,lower.panel=panel.smooth, upper.panel=panel.cor,
pch=0, main=i)
dev.off()}
How can I give each plot its own name?
I tried changing "i" to names(i), but then the name was the name of the first column and only one plot was created. I tried to do it with lapply but I couldn't.
PS: the plots are huge, and I have to expand the margins. I'm using RStudio.
Thank you!
Using a for loop or sapply, iterating over the names of the list rather than over its elements:
# dummy data
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4], c = mtcars[21:30, 1:4])
# using for loop
for(i in names(ssamblist)) {
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()}
# using sapply
sapply(names(ssamblist), function(i){
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()})
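The likely reason the original loop produced file names like "c(99,86,94....)" is that for (i in ssamblist) puts each list element itself into i, and paste() then turns its contents into text. Looping over names(ssamblist) keeps i a plain string. A small sketch of the naming step alone, using the same dummy list (no graphics device is opened here):

```r
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4], c = mtcars[21:30, 1:4])

# Iterating over the names gives one short string per element
file_names <- paste0("Corr_", names(ssamblist), ".svg")
print(file_names)

# Iterating over the list itself gives the data, not the name
for (i in ssamblist) {
  print(class(i))
}
```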

R overlay multiple plots in a loop

So I've created a loop that makes 10 individual plots:
for (k in 1:nrow(sites)) {
temp_title <- paste("site",k, "county", sites[k,2],"site",sites[k,3])
l <- which(hourly_nj_table$County.Code==sites[k,2]&hourly_nj_table$Site.Num==sites[k,3])#grab data for each site individually
temp_filename <- paste("/filepath",temp_title,".pdf")
PM_site <- hourly_nj_table[l,]
PM_site$realTime <- as.numeric(PM_site$Time.Local)
PM_mean_site <- aggregate(PM_site, by=list(PM_site$Time.Local),FUN="mean",na.rm=TRUE)
plot(PM_mean_site$realTime,PM_mean_site$Sample.Measurement, type="l",lwd=10,main=paste(temp_title),xlab="LocalTime",ylab="Ozone (ppm)")#,ylim=c(0,0.05))
}
But I would like to see how they compare on the same axis. Normally (if I'm just hardcoding it) I would add a new parameter and then create the next plot, but I'm unsure how to incorporate that into a loop.
The data all comes from one csv file if that helps..
Thanks!
You're really very close. plot() gets the ball rolling, and lines() will allow you to draw inside the existing plot:
for (k in 1:nrow(sites)) {
temp_title <- paste("site",k, "county", sites[k,2],"site",sites[k,3])
l <- which(hourly_nj_table$County.Code==sites[k,2]&hourly_nj_table$Site.Num==sites[k,3])#grab data for each site individually
temp_filename <- paste("/Users/bob111higgins/Documents/School/College/Rutgers/Atmospheric Research",temp_title,".pdf")
PM_site <- hourly_nj_table[l,]
PM_site$realTime <- as.numeric(PM_site$Time.Local)
PM_mean_site <- aggregate(PM_site, by=list(PM_site$Time.Local),FUN="mean",na.rm=TRUE) #Make it average by time of day so can make time series plots.
if (k == 1) {
plot(PM_mean_site$realTime,PM_mean_site$Sample.Measurement, type="l",lwd=10,main=paste(temp_title),xlab="LocalTime",ylab="Ozone (ppm)") # optionally add ylim=c(0,0.05)
} else {
lines(PM_mean_site$realTime,PM_mean_site$Sample.Measurement, lwd=10)
}
}
I'm sure there are better ways to go about this, but this is how I've done it in the past.
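One thing to watch with this approach is that the first plot() call fixes the axis limits, so later series can be clipped. A sketch of one way around that, computing the limits from all series first; the list site_means and its values are made up, standing in for the per-site hourly means, and pdf(NULL) is used only so that nothing is written to disk:

```r
hours <- 0:23
# Made-up hourly means for three sites
site_means <- list(
  siteA = 0.030 + 0.010 * sin(hours / 24 * 2 * pi),
  siteB = 0.025 + 0.015 * sin(hours / 24 * 2 * pi),
  siteC = 0.040 + 0.005 * sin(hours / 24 * 2 * pi)
)

# Common y-axis limits covering every series
y_range <- range(unlist(site_means))

pdf(NULL)  # null device: no file is created
for (k in seq_along(site_means)) {
  if (k == 1) {
    # The first series opens the plot and sets the axes
    plot(hours, site_means[[k]], type = "l", col = k, ylim = y_range,
         xlab = "LocalTime", ylab = "Ozone (ppm)")
  } else {
    # Later series are added to the existing plot
    lines(hours, site_means[[k]], col = k)
  }
}
legend("topright", legend = names(site_means), col = seq_along(site_means), lty = 1)
dev.off()
```

Giving each line its own colour and a legend also makes the overlaid sites distinguishable.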

read, manipulate and export multiple .dta Files using a for Loop in R

I have multiple time series (each in a separate file), which I need to seasonally adjust using the seasonal package in R, storing each adjusted series in a separate file in a different directory.
The code works for a single county.
So I tried to use a for loop, but R is unable to use read.dta with a wildcard.
I'm new to R and usually use Stata, so the question may be quite stupid and my code quite messy.
Sorry and Thanks in advance
Nathan
for(i in 1:402)
{
alo[i] <- read.dta("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/SINGLE_SERIES/County[i]")
alo_ts[i] <-ts(alo[i], freq = 12, start = 2007)
m[i] <- seas(alo_ts[i])
original[i]<-as.data.frame(original(m[i]))
adjusted[i]<-as.data.frame(final(m[i]))
trend[i]<-as.data.frame(trend(m[i]))
irregular[i]<-as.data.frame(irregular(m[i]))
County[i] <- data.frame(cbind(adjusted[i],original[i],trend[i],irregular[i], deparse.level =1))
write.dta(County[i], "/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/ADJUSTED_SERIES/County[i].dta")
}
This is a good place to use a function and the *apply family. As noted in a comment, your main problem is likely to be that you're using Stata-like character string construction that will not work in R. You need to use paste (or paste0, as here) rather than just passing the indexing variable directly in the string like in Stata. Here's some code:
library(foreign) # read.dta, write.dta
library(seasonal) # seas, final, original, trend, irregular
f <- function(i) {
d <- read.dta(paste0("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/SINGLE_SERIES/County",i,".dta"))
alo_ts <- ts(d, freq = 12, start = 2007)
m <- seas(alo_ts)
original <- as.data.frame(original(m))
adjusted <- as.data.frame(final(m))
trend <- as.data.frame(trend(m))
irregular <- as.data.frame(irregular(m))
County <- cbind(adjusted,original,trend,irregular, deparse.level = 1)
write.dta(County, paste0("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/ADJUSTED_SERIES/County",i,".dta"))
invisible(County)
}
# return a list of all of the resulting datasets
lapply(1:402, f)
It would probably also be a good idea to take advantage of relative directories by first setting your working directory:
setwd("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/")
Then you can simplify the above paths to:
d <- read.dta(paste0("./SINGLE_SERIES/County",i,".dta"))
and
write.dta(County, paste0("./ADJUSTED_SERIES/County",i,".dta"))
which will make your code more readable and reproducible should, for example, someone ever run it on another computer.
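To make the difference from Stata concrete: R does not substitute variables inside a quoted string, so "County[i]" is taken literally. A short sketch of building the names explicitly (the directory and file names follow the question; nothing is read from disk):

```r
i <- 3L

# The string is used as-is; [i] is not replaced by the value of i
literal <- "./SINGLE_SERIES/County[i]"

# paste0 and sprintf both build the intended name
with_paste0 <- paste0("./SINGLE_SERIES/County", i, ".dta")
with_sprintf <- sprintf("./SINGLE_SERIES/County%d.dta", i)

# Both functions are vectorised, so all paths can be built at once
all_paths <- paste0("./SINGLE_SERIES/County", 1:402, ".dta")

print(c(literal, with_paste0, with_sprintf))
print(head(all_paths, 2))
```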

Resources