How to draw a loess line in a ts plot - r

I spent hours trying to get my loess line to work. The problem is that I know next to nothing about R; I only have to use it for one university course.
I created a fake table; the real table is available for download here.
I have to make a time series plot, which worked surprisingly well. Now I have to add two loess lines with different spans. My problem is that I don't know how the command really works. I know it should be something like loess(..~.., data=..). The step where I'm stuck is marked with "WHAT BELONGS HERE" in the code below.
table <- structure(list(
Months = c("1980-06", "1980-07", "1980-08", "1980-09",
"1980-10", "1980-11", "1980-12", "1981-01"),
Total = c(75000, 70000, 60000, 73000, 72000, 71000, 76000, 71000)),
.Names = c("Months", "Total of Killed Pigs"),
row.names = c(NA, 8L), class = "data.frame")
ts.obj <- ts(table$`Total of Killed Pigs`, start = c(1980, 1), frequency = 2)
plot(ts.obj)
trend1 <- loess(# **WHAT BELONGS HERE?**, data = table, span =1)
predict1 <- predict(trend1)
lines(predict1, col ="blue")
That is my original code:
obj <- read.csv(file="PATH/monthly-total-number-of-pigs-sla.csv", header=TRUE, sep=",")
ts.obj <- ts(obj$Monthly.total.number.of.pigs.slaughtered.in.Victoria..Jan.1980...August.1995, start = c(1980, 1), frequency = 12)
plot(ts.obj)
trend1 <- loess (WHAT BELONGS HERE?, data = obj, span =1)
predict1 <- predict (trend1)
lines(predict1, col="blue")

We can do away with the data argument because the time series is univariate (just one variable).
The formula ts.obj ~ index(ts.obj) can be read as "value as a function of time": ts.obj gives you the values, index(ts.obj) gives you the time index for those values, and the tilde ~ specifies that the first is a function of, or dependent on, the second.
library(zoo) # for index()
plot(ts.obj)
trend1 <- loess(ts.obj ~ index(ts.obj), span=1)
trend2 <- loess(ts.obj ~ index(ts.obj), span=2)
trend3 <- loess(ts.obj ~ index(ts.obj), span=3)
pred <- sapply(list(trend1, trend2, trend3), predict)
matlines(index(ts.obj), pred, lty=1, col=c("blue", "red", "orange"))
zoo isn't strictly required. If you replace index(ts.obj) with as.numeric(time(ts.obj)) you should be fine, I think.
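As a minimal sketch of that base-R route: the series below reuses the eight values from the fake table, while the monthly start of June 1980 and the helper data frame d are my own assumptions for illustration.

```r
# Values from the fake table; start and frequency are assumed (monthly, from 1980-06)
total <- c(75000, 70000, 60000, 73000, 72000, 71000, 76000, 71000)
ts.obj <- ts(total, start = c(1980, 6), frequency = 12)

# Plain numeric time index and values, so no extra package is needed
d <- data.frame(tt = as.numeric(time(ts.obj)), vals = as.numeric(ts.obj))

# Two loess fits that differ only in span
trend1 <- loess(vals ~ tt, data = d, span = 1)
trend2 <- loess(vals ~ tt, data = d, span = 2)

plot(ts.obj)
lines(d$tt, predict(trend1), col = "blue")
lines(d$tt, predict(trend2), col = "red")
```

The larger span gives each local fit a wider neighbourhood, so the red line should come out smoother than the blue one.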

In case you were wanting to go with ggplot2:
library(ggplot2)
library(dplyr)
table <- structure(list(
Months = c("1980-06", "1980-07", "1980-08", "1980-09",
"1980-10", "1980-11", "1980-12", "1981-01"),
Total = c(75000, 70000, 60000, 73000, 72000, 71000, 76000, 71000)),
.Names = c("Months", "Total"),
row.names = c(NA, 8L), class = "data.frame")
Change to proper dates:
table <- table %>% mutate(Months = as.Date(paste0(Months,"-01")))
Plot:
ggplot(table, aes(x=Months, y=Total)) +
geom_line() +
geom_smooth(span=1, se= FALSE, color ="red") +
geom_smooth(span=2, se= FALSE, color ="green") +
geom_smooth(span=3, se= FALSE) +
theme_minimal()

Related

Plotting every three rows from data frame

I would like to make some plots from my data. Unfortunately, it is hard to predict how many plots I will generate, because it depends on the data and may differ each time. That is why I would like the code to be easily adjustable. Most often, however, each plot will come from a group of 3 rows.
So, I would like to plot rows 1-3, 4-6, 7-9, etc.
This is data:
> dput(DF_final)
structure(list(AC = c(0.0031682160632777, 0.00228591145206846,
0.00142094444568728, 0.000661218113472149, 0.0010078157353918,
0.000400289437089513, 40.4634784175177, 40.5055070858594, 0.0183737773741582
), SD = c(0.00250647379467532, 0.0013244185401148, 0.000469332241199189,
0.000294558308707343, 0.000385553400676202, 0.000104447914881357,
11.0693842400794, 8.78768774254084, 0.00696532251341454), ln_AC = c(-5.75458660556339,
-6.08099044923792, -6.556433525855, -7.32142679754668, -6.89996992823399,
-7.8233226797995, 3.70039979980691, 3.70143794229703, -3.99683077355773
), ln_SD = c(-5.98887837626238, -6.62678175351058, -7.66419963690747,
-8.13003358225542, -7.86083085139947, -9.16682203300101, 2.40418312097106,
2.17335162163583, -4.96681136795312), Percent_AC = c(126.401324043689,
172.597361244303, 302.758754023937, 224.477834753288, 261.394591157605,
383.243109777925, 365.544076706723, 460.934756361151, 263.789326894369
), Percent_SD = c(100, 100, 100, 100, 100, 100, 100, 100, 100
), TP = c(0, 40, 80, 0, 40, 80, 0, 40, 80)), row.names = c("Tim_0",
"Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80", "Tom_0",
"Tom_40", "Tom_80"), class = "data.frame")
Column ln_AC should be on the y axis and column TP on the x axis. First of all, I would like to have each group on a separate graph next to each other (bearing in mind that the number of plots may be high at some point) and, if possible, everything on the same graph as well. It should be a point plot with a trend line.
Is it also possible to show the slope, the SD of the slope, and R^2 from the linear regression on the plot?
I managed to do it for a single plot, but the regression line looks strange...
The code below was used to generate this plot and regression line.
fit <- lm(DF_final$ln_AC~DF_final$TP, data=DF_final)
plot(DF_final[1:3,7], DF_final[1:3,3], type = "p", ylim = c(-10,0), xlim=c(0,100), col = "red")
lines(DF_final$TP, fitted(fit), col="blue")
In base R (without so many packages), you can do:
# split by name prefix (Tim, Jack, Tom), which here means every 3 rows
DF = split(DF_final,gsub("_[^ ]*","",rownames(DF_final) ))
# you can also split purely by row position
# DF = split(DF_final,(1:nrow(DF_final) - 1) %/% 3)
To store your values:
slopes = vector("numeric",length(DF))
names(slopes) = names(DF)
rsq = vector("numeric",length(DF))
names(rsq) = names(DF)
To plot:
par(mfrow=c(1,length(DF)))
for(i in names(DF)){
fit <- lm(ln_AC~TP, data=DF[[i]])
plot(DF[[i]]$TP, DF[[i]]$ln_AC, type = "p", col = "red",main=i)
abline(fit, col="blue")
slopes[i]=round(fit$coefficients[2],digits=2)
rsq[i]=round(summary(fit)$r.squared,digits=2)
mtext(side=1,paste("slope=",slopes[i],"\nrsq=",rsq[i]),
padj=-2,cex=0.7)
}
And your values:
slopes
Jack Tim Tom
-0.01 -0.01 -0.10
rsq
Jack Tim Tom
0.29 0.99 0.75
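Since the question also asks for the SD of the slope, here is a minimal sketch of pulling the slope, its standard error and R^2 out of summary(lm(...)) for each group. The small DF_small data frame (rounded values for Tim and Jack only) and the name stats are just for illustration.

```r
# Rounded ln_AC values for two of the groups; illustration only
DF_small <- data.frame(
  TP = rep(c(0, 40, 80), 2),
  ln_AC = c(-5.75, -6.08, -6.56, -7.32, -6.90, -7.82),
  row.names = c("Tim_0", "Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80")
)

# Group by the name prefix, so the number of groups is not hard-coded
groups <- split(DF_small, sub("_.*", "", rownames(DF_small)))

# One row per group: slope, standard error of the slope, R^2
stats <- t(sapply(groups, function(d) {
  s <- summary(lm(ln_AC ~ TP, data = d))
  c(slope = s$coefficients["TP", "Estimate"],
    slope_se = s$coefficients["TP", "Std. Error"],
    r2 = s$r.squared)
}))
print(round(stats, 4))
```

Any of these columns can then be pasted into the mtext() label inside the plotting loop, in the same way as slopes and rsq.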
If I understand correctly, the reason you want 3 observations per graph is that you have different individuals (Jack, Tim, Tom). Is that so?
If you don't want to worry about that number, you can do this:
library(ggplot2)
library(reshape2) # provides the melt() used below
data <- DF_final # the data frame from the question
# move rownames to column
data$person <- rownames(data)
data$person <- gsub("\\_.*","",data$person) # remove TP from names
# melt() here is reshape2's; the original suggestion was to use data.table for this step
data <- melt(data,id.vars=c("person","TP","ln_AC"))
ggplot(data,aes(x=TP, y=ln_AC)) + geom_point() +
geom_smooth(method = "lm") + facet_grid(~person)
This results in a plot like @giocomai's, but it will also work if you have 4, 5, 6 or any other number of persons in your data.
---- Edit
If you want to add R2 values, you can do something like this. Note that it may not be the best or most elegant solution, but it works.
data <- data.frame(...)
data$person <- rownames(data)
data$person <- gsub("\\_.*","",data$person)
# run lm for all persons and save them in a data.frame
nomi <- unique(data$person)
#lmStats <- data.frame()
lmStats <- sapply(nomi,
function(ita){
model <- lm(ln_AC~TP,data= data[which(data$person == ita),])
lmStat <- summary(model)
# I only save r2, but you can get all the statistics you need
lmRow <- data.frame("r2" = lmStat$r.squared )
#lmStats <- rbind(lmStats,lmRow)
}
)
lmStats <- do.call(rbind,lmStats)
# format the output,and create a dataframe we will use to annotate facet_grid
lmStats <- as.data.frame(lmStats)
rownames(lmStats) <- gsub("\\..*","",rownames(lmStats))
lmStats$person <- rownames(lmStats)
colnames(lmStats)[1] <- "r2"
lmStats$r2 <- round(lmStats$r2,2)
lmStats$TP <- 40
lmStats$ln_AC <- 0
lmStats$lab <- paste0("r2= ",lmStats$r2)
# melt and add r2 column to the data (not necessary, but I like to have everything I plot in the data)
data <- melt(data,id.vars=c("person","TP","ln_AC"))
data$r2 <- lmStats[match(data$person,rownames(lmStats)),1]
ggplot(data,aes(x=TP, y=ln_AC)) + geom_point() +
geom_smooth(method = "lm") + facet_grid(~person) +
geom_text(data=lmStats,label=lmStats$lab)
An easier way (fewer steps) would be to use facet_grid(~r2), so that you have the R-squared value in the panel title.
If I understand correctly what you mean, and assuming you will always have three observations per graph, your main issue is creating a categorical variable to separate them. Here's one way to accomplish it. Depending on the layout you prefer, you may want to check facet_wrap instead of facet_grid.
library("dplyr")
library("ggplot2")
DF_final <- structure(list(AC = c(0.0031682160632777, 0.00228591145206846,
0.00142094444568728, 0.000661218113472149, 0.0010078157353918,
0.000400289437089513, 40.4634784175177, 40.5055070858594, 0.0183737773741582
), SD = c(0.00250647379467532, 0.0013244185401148, 0.000469332241199189,
0.000294558308707343, 0.000385553400676202, 0.000104447914881357,
11.0693842400794, 8.78768774254084, 0.00696532251341454), ln_AC = c(-5.75458660556339,
-6.08099044923792, -6.556433525855, -7.32142679754668, -6.89996992823399,
-7.8233226797995, 3.70039979980691, 3.70143794229703, -3.99683077355773
), ln_SD = c(-5.98887837626238, -6.62678175351058, -7.66419963690747,
-8.13003358225542, -7.86083085139947, -9.16682203300101, 2.40418312097106,
2.17335162163583, -4.96681136795312), Percent_AC = c(126.401324043689,
172.597361244303, 302.758754023937, 224.477834753288, 261.394591157605,
383.243109777925, 365.544076706723, 460.934756361151, 263.789326894369
), Percent_SD = c(100, 100, 100, 100, 100, 100, 100, 100, 100
), TP = c(0, 40, 80, 0, 40, 80, 0, 40, 80)), row.names = c("Tim_0",
"Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80", "Tom_0",
"Tom_40", "Tom_80"), class = "data.frame")
DF_final %>%
mutate(id = as.character(sapply(1:(nrow(DF_final)/3), rep, 3))) %>%
ggplot(aes(x=TP, y=ln_AC)) +
geom_point() +
geom_smooth(method = "lm") +
facet_grid(~id)
Created on 2020-02-06 by the reprex package (v0.3.0)

how to print some regression info on a figure

I have data like this
df<- structure(list(How = c(3.1e-05, 0.000114, 0.000417, 0.00153,
0.00561, 0.0206, 0.0754, 0.277, 1.01, 3.72), Where = c(1, 0.948118156866697,
0.920303987764611, 1.03610743904536, 1.08332987533419, 0.960086785898477,
0.765642506120658, 0.572520170014998, 0.375835106792894, 0.254180720963181
)), class = "data.frame", row.names = c(NA, -10L))
library(drc)
I make my model like this
fit <- drm(formula = Where ~ How, data = df,
fct = LL.4(names=c("Slope","Lower Limit","Upper Limit", "EC50")))
Then I plot it like this
plot(NULL, xlim = c(0.000001, 4), ylim = c(0.01, 1.2),log = "x")
points(df$How, df$Where, pch = 20)
x1 = seq(0.000001, 4, by=0.0001)
y1 = coef(fit)[3] + (coef(fit)[2] - coef(fit)[3])/(1+(x1/coef(fit)[4])^((-1)*coef(fit)[1]))
lines(x1,y1)
Now I want to be able to print the following information inside the figure
max(df$How)
min(df$How)
coef(fit)[2]
coef(fit)[3]
(-1)*coef(fit)[1]
coef(fit)[4]
I tried to do it like this
text(labels = bquote(FirstT~"="~.(round(max(df$How)))))
text(labels = bquote(SecondT~"="~.(round(min(df$How))))
text(labels = bquote(A[min]~"="~.(round(coef(fit)[2]))))
text(labels = bquote(A[max]~"="~.(coef(fit)[3]))))
text(labels = paste0("Slope = ", round((-1)*coef(fit)[1])))
which of course does not work. I would prefer an automatic way to find a spot in the right or left corner of the figure to print this information.
In the code below, we get the plot area coordinate ranges with par("usr") and then use those and the data point locations to automatically place the labels in the desired locations.
# Reduce margins
par(mar=c(5,4,0.5,0.5))
# Get extreme coordinates of plot area
p = par("usr")
p[1:2] = 10^p[1:2] # Because xscale is logged
text(max(df$How), df$Where[which.max(df$How)],
labels = bquote(FirstT~"="~.(round(max(df$How)))), pos=1)
text(min(df$How), df$Where[which.min(df$How)],
labels = bquote(SecondT~"="~.(round(min(df$How)))), pos=1)
text(1.1*p[1], p[3] + 0.02*diff(p[3:4]),
labels = bquote(A[min]~"="~.(round(coef(fit)[2]))), adj=c(0,0))
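If exact coordinates are not important, legend() with a position keyword is another way to get automatic placement, because it works out the corner position from the plot region itself. A minimal sketch, with made-up data and made-up estimates standing in for coef(fit):

```r
# Made-up data on a log-scaled x axis, standing in for df$How and df$Where
x <- 10^seq(-4, 0, length.out = 10)
y <- 1 / (1 + x)
plot(x, y, log = "x", pch = 20)

# par("usr") is in log10 units on a logged axis, so back-transform the x limits
p <- par("usr")
xlim_data <- 10^p[1:2]

# Hypothetical estimates; in the question these would come from coef(fit)
est <- c(A_min = 0.21, A_max = 1.02, Slope = 1.30)
labs <- sprintf("%s = %.2f", names(est), est)

# A borderless legend in the lower-left corner, one line per value
legend("bottomleft", legend = labs, bty = "n")
```

The keyword can be swapped for "bottomright", "topleft" and so on, which avoids computing offsets from par("usr") by hand.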

plot(var()) displays two different plots, how do I merge them into one? Also, how do I get two y axes?

> dput(head(inputData))
structure(list(Date = c("2018:07:00", "2018:06:00", "2018:05:00",
"2018:04:00", "2018:03:00", "2018:02:00"), IIP = c(125.8, 127.5,
129.7, 122.6, 140.3, 127.4), CPI = c(139.8, 138.5, 137.8, 137.1,
136.5, 136.4), `Term Spread` = c(1.580025, 1.89438, 2.020112,
1.899074, 1.470544, 1.776862), RealMoney = c(142713.9916, 140728.6495,
140032.2762, 139845.5215, 139816.4682, 139625.865), NSE50 = c(10991.15682,
10742.97381, 10664.44773, 10472.93333, 10232.61842, 10533.10526
), CallMoneyRate = c(6.161175, 6.10112, 5.912088, 5.902226, 5.949956,
5.925538), STCreditSpread = c(-0.4977, -0.3619, 0.4923, 0.1592,
0.3819, -0.1363)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I want to make my autoregressive plot like this plot:
#------> importing all libraries
library(readr)
install.packages("lubridate")
library("lubridate")
install.packages("forecast")
library('ggplot2')
library('fpp')
library('forecast')
library('tseries')
#--------->reading data
inputData <- read_csv("C:/Users/sanat/Downloads/exercise_1.csv")
#--------->calculating the lag=1 for NSE50
diff_NSE50<-(diff(inputData$NSE50, lag = 1, differences = 1)/lag(inputData$NSE50))
diff_RealM2<-(diff(inputData$RealMoney, lag = 1, differences = 1)/lag(inputData$RealMoney))
plot.ts(diff_NSE50)
#--------->
lm_fit = dynlm(IIP ~ CallMoneyRate + STCreditSpread + diff_NSE50 + diff_RealM2, data = inputData)
summary(lm_fit)
#--------->
inputData_ts = ts(inputData, frequency = 12, start = 2012)
#--------->area of my doubt is here
VAR_data <- window(ts.union(ts(inputData$IIP), ts(inputData$CallMoneyRate)))
VAR_est <- VAR(y = VAR_data, p = 12)
plot(VAR_est)
I want my plots to be plotted together in the same plot. How do I separate the VAR() plots into two separate ones?
Current plot: (image not included)
My dataset: (link not included)
Okay, so this still needs some work, but it should set the right framework for you. I would look more into working with ggplot2 in the future.
A few extra packages are needed, namely library(vars) and library(dynlm).
Starting from,
VAR_est <- VAR(y = VAR_data, p = 12)
Now we extract the values we want from the VAR_est object.
y <- as.numeric(VAR_est$y[,1])
z <- as.numeric(VAR_est$y[,2])
x <- 1:length(y)
## second data set on a very different scale
par(mar = c(5, 4, 4, 4) + 0.3) # Leave space for z axis
plot(x, y, type = "l") # first plot
par(new = TRUE)
plot(x, z, type = "l", axes = FALSE, bty = "n", xlab = "", ylab = "")
axis(side=4, at = pretty(range(z)))
mtext("z", side=4, line=3)
I will leave you to add the dotted lines on etc...
Hint: Decompose the VAR_est object, for example, VAR_est$datamat, then see which bit of data corresponds to the part of the plot you want.

Plotting data in a list in R

I have a bunch of .csv files that I want to read into a list, then create plots.
I've tried the code below, but I get an error when trying to cbind. Below is the dput from 2 example files. Each file represents weather data from a separate station. Ideally I would plot the prcp column from each file in one plot window. I don't have much experience working with data in a list.
file1 <- structure(list(mxtmp = c(18.974, 20.767, 21.326, 19.669, 18.609,
21.322), mntmp = c(4.026, 5.935, 8.671, 6.785, 3.493, 6.647),
prcp = c(0.009, 0.046, 0.193, 0.345, 0.113, 0.187)), .Names = c("mxtmp",
"mntmp", "prcp"), row.names = c(NA, 6L), class = "data.frame")
file2 <- structure(list(mxtmp = c(18.974, 20.767, 21.326, 19.669, 18.609,
21.322), mntmp = c(4.026, 5.935, 8.671, 6.785, 3.493, 6.647),
prcp = c(0.009, 0.046, 0.193, 0.345, 0.113, 0.187)), .Names = c("mxtmp",
"mntmp", "prcp"), row.names = c(NA, 6L), class = "data.frame")
I read these files from a directory into a list:
myFiles <- list.files(full.names = F, pattern = ".csv")
my.data <- lapply(myFiles, read_csv)
my.data
names(my.data) <- gsub("\\.csv", " ", myFiles)
I get an error on the line below:
my.data <- lapply(my.data, function(x) cbind(x = seq_along(x), y = x))
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 3, 34333
list.names <- names(my.data)
lns <- sapply(my.data, nrow)
my.data <- as.data.frame(do.call("cbind", my.data))
my.data$group <- rep(list.names, lns)
My plot code:
library(ggplot2)
ggplot(my.data, aes(x = x, y = y, colour = group)) +
theme_bw() +
geom_line(linetype = "dotted")
If you don't need to keep the data frames around for anything else, then you can just read and plot all at once. The column names in your plot code don't match the column names in your data frames. So here's a general approach that you'll need to tailor to your actual data. The code below reads each data frame and creates a plot from it and then returns a list containing the plots:
library(readr) # read_csv()
library(ggplot2)
plot.list = lapply(myFiles, function(file) {
df = read_csv(file)
ggplot(df, aes(x = x, y = y, colour = group)) +
theme_bw() +
geom_line(linetype = "dotted")
})
# Lay out all the plots together
library(gridExtra)
do.call(grid.arrange, plot.list)
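If the goal is one plot window with the prcp series of every station, another option is to stack the list into a single long data frame first. This is a minimal sketch in base R; the two small data frames stand in for the files, the second station's values are invented, and the names long, station and obs are my own.

```r
# Stand-ins for the data frames read from the csv files;
# the second station's values are invented for illustration
my.data <- list(
  station1 = data.frame(prcp = c(0.009, 0.046, 0.193, 0.345, 0.113, 0.187)),
  station2 = data.frame(prcp = c(0.020, 0.010, 0.150, 0.300, 0.200, 0.100))
)

# Stack the list into one long data frame with a station label and a row index
long <- do.call(rbind, lapply(names(my.data), function(nm) {
  d <- my.data[[nm]]
  data.frame(station = nm, obs = seq_len(nrow(d)), prcp = d$prcp,
             stringsAsFactors = FALSE)
}))

# One window, one dotted line per station
stations <- unique(long$station)
plot(range(long$obs), range(long$prcp), type = "n", xlab = "obs", ylab = "prcp")
for (k in seq_along(stations)) {
  s <- long[long$station == stations[k], ]
  lines(s$obs, s$prcp, col = k, lty = "dotted")
}
legend("topleft", legend = stations, col = seq_along(stations),
       lty = "dotted", bty = "n")
```

The same long data frame can also be passed to ggplot(long, aes(x = obs, y = prcp, colour = station)) + geom_line(linetype = "dotted"), which is closer to the plot code in the question.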

time series plot in R

My data looks something like this:
There are 10,000 rows, each representing a city, with one column for every month from 1998-01 to 2013-09:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to plot all months since 1998 for any one city, or for more than one city.
I tried this but I get an error. I am not sure whether I am even attempting this the right way. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plots(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"
You might try
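require(reshape)
require(ggplot2)
# id.vars must list every non-month column; the names below are placeholders for your own
forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# assumes month labels like "1998-01"; they have no day, so append one before converting
forecl$month <- as.Date(paste0(forecl$month, "-01"))
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()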
To add to @JLLagrange's answer, you might want to pass city through facet_grid() if there are too many cities and the colors become hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)
Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case, the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
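To pick cities by name rather than by column position, one possible approach is to build the ts object only from the selected rows. The sketch below uses the four rows and three months shown in the question; the names wide, cities and series are mine, and it assumes the month columns keep names like 1998-01.

```r
# The sample rows from the question, with month columns kept as "1998-01" etc.
wide <- data.frame(
  RegionName = c("New York", "Los Angeles", "Philadelphia", "Phoenix"),
  `1998-01` = c(1.3414, 12.8841, 1.626, 2.7046),
  `1998-02` = c(1.344, 12.5466, 0.5639, 2.5525),
  `1998-03` = c(1.3514, 12.2737, 0.2414, 2.3472),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Month columns are the ones named like YYYY-MM
month_cols <- grep("^[0-9]{4}-[0-9]{2}$", names(wide), value = TRUE)

# Choose the cities to plot, then transpose so that each city is one column
cities <- c("New York", "Phoenix")
m <- t(as.matrix(wide[match(cities, wide$RegionName), month_cols]))
colnames(m) <- cities

# Monthly series starting in January 1998, all drawn in a single panel
series <- ts(m, start = c(1998, 1), frequency = 12)
plot(series, plot.type = "single", col = seq_along(cities), ylab = "value")
legend("topright", legend = cities, col = seq_along(cities), lty = 1, bty = "n")
```

With plot.type = "single" all selected series share one panel, so the limit of 10 series for the "multiple" layout does not come into play.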
