I want to follow up on this thread, since it is over two years old and I cannot comment on it yet because I do not have enough posts:
plot acf of several timeseries in one plot
I would like to understand why all additional lines start at lag=1 and not 0. How can I have them start at lag=0 like the first one?
Also, is there a way to extend the x-axis to negative values? When I do pairwise ccf, I get values from -10 to +10, which nicely shows the pattern I'm looking for, but with acf I only get lags of either -6 or +6.
Also, please excuse my ignorance, but what does the dashed blue line at 0.4 represent? Significance? I've seen the line at various values in different examples.
Thank you in advance.
Here is the code; I basically used the same as in the link above.
> data3
Maui8 Maui7 Maui6 Olowalu Maalaea
1 1.01532397 0.7583463 -1.45102480 0.37355214 0.093384619
2 0.84997103 0.7802248 -1.47906584 0.57370139 0.000741584
3 0.65297103 0.9325412 -1.31256709 0.29211557 0.077706758
4 0.42029456 0.8041302 -1.36599992 0.15763796 0.018583624
5 0.15063769 0.5932333 -1.00933326 -0.03478742 0.073490340
6 0.14522593 0.4739607 -0.82896012 0.22469641 0.226357256
7 0.03779456 0.4774847 -0.09524122 0.42900612 0.194261484
8 -0.39651917 -0.2433839 0.07535580 -0.03204488 0.384578649
9 -0.99220544 -1.3080379 0.07143167 -0.57821403 0.012594818
10 -1.58116623 -1.3739277 -0.28876112 -1.34129239 -0.543698715
11 -1.68365642 -1.5527201 0.35511326 -0.99125508 -0.574656426
12 -1.67555838 -1.6044574 0.21679237 -1.05519787 -0.731770854
13 -1.64012701 -1.6975577 0.68442918 -1.20809587 -0.888636526
14 -1.22618583 -1.3975012 0.94365182 -0.84284090 -0.611341749
15 -1.12916623 -0.8248387 1.05953344 -0.86989314 -0.242448715
16 -1.11394684 -0.3294150 1.41744881 -0.45954904 -0.331766245
17 -0.41821140 -0.4312582 1.19811924 -0.45322699 -0.384893352
18 0.22428860 -0.2696410 1.14340119 -0.28008162 -0.323007387
19 0.69397114 -0.1249800 1.12954154 0.48571412 0.074298377
20 1.55118345 1.1953590 0.91711047 1.47251236 0.802606648
21 1.76527075 1.6837135 0.50540620 1.30325798 0.951992613
22 1.34356440 1.6247940 -0.09836573 1.21764394 0.794730708
23 1.59601480 0.9492149 -0.69564643 0.87988078 0.490006397
24 1.41023107 0.8847163 -1.09236948 0.73676048 0.436886096
> ACF<-acf(data3)
> plot(ACF, type="l", max.mfrow=1)
> lines(ACF$acf[-1, 2,3], lty=1, col="red", lwd=1)
> lines(ACF$acf[-1, 2,4], lty=1, col="green", lwd=1)
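A minimal sketch of one way to address the three points above, using simulated data in place of data3 (the data, the object names dat and cc12, and the helper two_sided are assumptions for illustration). The extra lines in the question start at lag 1 because `[-1, 2, 3]` drops the lag-0 row, and `lines()` without explicit x values plots against the index 1, 2, ...; passing the lags explicitly avoids both problems. Note also that `ACF$acf[, 2, 3]` is a cross-correlation between series 2 and 3, whereas `[, 2, 2]` is the autocorrelation of series 2. The dashed blue lines drawn by `plot()` for an acf object are approximate 95% bounds, qnorm(0.975)/sqrt(n), which is about 0.4 for n = 24. An autocorrelation is symmetric in the lag, so only non-negative lags are stored; for a pair of series, the negative lags can be assembled from the `[, j, i]` slice, which is what `ccf()` does.

```r
# Simulated stand-in for data3 (assumption): 24 observations of 3 series
set.seed(1)
dat <- matrix(rnorm(24 * 3), ncol = 3,
              dimnames = list(NULL, c("A", "B", "C")))

ACF  <- acf(dat, lag.max = 10, plot = FALSE)
lags <- ACF$lag[, 1, 1]                 # 0, 1, ..., 10

# Autocorrelation of each series, all starting at lag 0
plot(lags, ACF$acf[, 1, 1], type = "l", ylim = c(-1, 1),
     xlab = "Lag", ylab = "ACF")
lines(lags, ACF$acf[, 2, 2], col = "red")
lines(lags, ACF$acf[, 3, 3], col = "green")

# Approximate 95% bounds (the dashed blue lines in the default plot)
ci <- qnorm(0.975) / sqrt(ACF$n.used)
abline(h = c(-ci, ci), lty = 2, col = "blue")

# Two-sided cross-correlation of series i and j (i < j),
# assembled from the one-sided slices
two_sided <- function(A, i, j) {
  data.frame(lag = c(rev(A$lag[-1, j, i]), A$lag[, i, j]),
             acf = c(rev(A$acf[-1, j, i]), A$acf[, i, j]))
}
cc12 <- two_sided(ACF, 1, 2)            # lags -10 ... 10
```

Plotting `cc12$lag` against `cc12$acf` with `type = "l"` then gives a curve over negative and positive lags, and further pairs can be added with `lines()`.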
I am attempting to produce a stacked barplot (something like the plot below, except for multiple years and with stacked bars). Ideally the x-axis would show months J, F, M... repeating (I realise row names cannot be duplicated, but I wondered whether there is a way to label the axis and then group by year), and there would be 2 y-axes, the same as in the example. I'm trying to plot the two 'RainAvg' columns as stacked bars against the right-hand axis, and the two 'GRACEAnom' columns as two lines against the left axis. I'm not sure where to begin, so any help is appreciated as always; hopefully this is clear. I've added the first few rows of my data below the image:
> head(Figures, 34)
DecimDate GRACEAnomLVB RainAvgLVB GRACEAnomVNB RainAvgVNB
1 2003.000 13.46956583 5.749109 6.15705017 3.478762
2 2003.083 6.31473051 5.331211 0.97906465 2.873399
3 2003.167 3.63883171 10.363173 0.77220028 8.090037
4 2003.250 6.49458212 17.210327 1.24673188 17.405001
5 2003.333 11.33909662 14.840302 5.56158736 15.673977
6 2003.417 9.38271799 7.536387 6.00824271 9.961779
7 2003.500 7.42633936 7.322593 6.45489806 9.617705
8 2003.583 3.60612356 11.447746 5.60098976 15.430943
9 2003.667 3.44546767 7.968092 6.63687748 8.056800
10 2003.750 2.75612873 8.769927 5.22673658 8.333266
11 2003.833 5.30475366 9.782655 6.91241363 9.305419
12 2003.917 8.68239955 7.474251 7.37673817 5.731811
13 2004.000 5.48150209 9.109684 4.04360382 5.772269
14 2004.083 2.62570392 6.976879 -0.71817402 3.780555
15 2004.167 1.45723630 10.559618 -2.23807975 6.471265
16 2004.250 5.98037042 17.895779 0.04639658 17.677118
17 2004.333 7.35279067 7.203534 3.23732162 8.284600
18 2004.417 1.41878133 4.536058 0.41008077 6.321057
19 2004.500 -0.89443672 5.439750 0.09167621 7.704055
20 2004.583 -3.98526800 9.248759 -0.22851368 12.973643
21 2004.667 -4.91880694 12.214854 -0.30143818 12.626995
22 2004.750 -4.13842871 10.903502 1.08566462 11.491835
23 2004.833 1.04833693 15.731056 4.50875694 12.300916
24 2004.917 2.93758790 8.431368 3.10471313 3.997466
...and so on until December 2012.
I'm not quite clear on a couple of items in the description of your chart, such as whether you're looking for one chart for all years or one chart per year, but the following code might help get you started. The basic idea is to draw the bar chart first and then rescale the plot window for the line plots. Chart titles and labels are added as required.
org_mar <- par()$mar
par(mar=c(5,4,4,5)+.1)
Figures <- as.matrix(Figures)
nrow_F <- nrow(Figures)
x_labs <- rep_len(c("J","F","M","A","M","J","J","A","S","O","N","D"), nrow_F)  # month initials, recycled over all rows
# make bar chart
barplot(t(Figures[,c("RainAvgLVB","RainAvgVNB")]), yaxt="n", names.arg=x_labs,
xlab = "Monthly", font.lab=2, xlim= 1.2*c(1,nrow_F)-.5)
axis(side=4)
mtext("Mean Monthly Rainfall (mm)", side=4, line=2.5, font=2)
abline(h=0)
# rescale the plot window and draw the line plots
plot.window(xlim=c(1,nrow_F), ylim=range(Figures[,c("GRACEAnomLVB","GRACEAnomVNB")]))
axis(side=2)
mtext("Water Storage Anomaly (cm)", side=2, line=2.5, font=2)
abline(v=par()$usr[1])
lines( Figures[,2], col="black", lty=1, lwd=2)
lines( Figures[,4], col="blue", lty=2, lwd=2)
par(mar=org_mar)
This should make a chart like the following:
I have a monthly weather dataset and I want to plot a line graph.
My dataset is here:
weather.data2:
date mtemp mrh ah1 ah2 vaporpressure
1 31/01/2008 15.95161 74.96774 10.463958 10.376739 12.60586
2 29/02/2008 13.32759 71.96552 8.506296 8.457573 10.32157
3 31/03/2008 19.98065 76.00000 13.461108 13.301972 16.07004
4 30/04/2008 23.06667 85.06667 17.884817 17.612111 21.20251
5 31/05/2008 25.34194 82.96774 19.904886 19.548480 23.47831
6 30/06/2008 26.67000 88.13333 22.655861 22.217597 26.65403
7 31/07/2008 28.37097 82.16129 23.216533 22.715155 27.21262
8 31/08/2008 28.38387 79.45161 22.520920 22.034029 26.39536
9 30/09/2008 28.96667 74.56667 21.834234 21.345684 25.55925
10 31/10/2008 26.50000 77.03226 19.685226 19.308482 23.16607
11 30/11/2008 21.94667 65.33333 13.473522 13.271739 15.98306
12 31/12/2008 18.43548 63.38710 10.184461 10.081156 12.20581
13 31/01/2009 15.32258 63.87097 8.663397 8.597653 10.45324
14 28/02/2009 20.51071 81.28571 14.778456 14.596660 17.62418
15 31/03/2009 19.69032 83.09677 14.448571 14.280276 17.25859
16 30/04/2009 22.02333 77.13333 15.350085 15.134001 18.23880
17 31/05/2009 25.53548 78.29032 19.013323 18.669040 22.41749
18 30/06/2009 28.14333 81.36667 22.795169 22.309445 26.72967
19 31/07/2009 29.04839 80.77419 23.784844 23.249975 27.83724
20 31/08/2009 29.43226 79.96774 24.035433 23.482366 28.10789
21 30/09/2009 28.82667 78.46667 22.788483 22.282172 26.68366
22 31/10/2009 26.16774 73.06452 18.258184 17.917379 21.50479
23 30/11/2009 20.48000 72.20000 13.498049 13.315853 16.06684
24 31/12/2009 17.31290 78.06452 11.815604 11.705578 14.19231
Here is my plot:
weather.data2$date=as.Date(as.character(weather.data2$date),format="%d/%m/%Y")
windows(width=7*1.5,height=12/2)
par(mar=c(4,4,2,5))
plot(weather.data2$date,weather.data2$ah1,ylim=c(-2,30),type='l',col="blue", xlab="month", ylab=NA)
par(new=TRUE)
plot(weather.data2$date,weather.data2$ah2,ylim=c(-2,30),type='l',col="green", xlab="", ylab=NA)
par(new=TRUE)
plot(weather.data2$date,weather.data2$mtemp,ylim=c(-2,30),type='l',col="red", xlab="", ylab=NA)
par(new=TRUE)
plot(weather.data2$date,weather.data2$mrh,ylim=c(-2,100),type='l',col="orange", axes=F, xlab=NA, ylab=NA)
axis(side=4)
mtext(side=4,line=3,"Relative Humidity (%)")
par(new=TRUE)
plot(weather.data2$date,weather.data2$vaporpressure,ylim=c(-2,30),type='l',col="steelblue", xlab="", ylab=NA)
mtext(side=2,line=3,"Temperature (C)/Vapour Pressure (mb)/Absolute humidity(g/m^3)")
legend("bottomright", c("Relative Humidity","Temperature","Vapour Pressure","Absolute Humidity 1","Absolute Humidity 2"),lty=1,col = c("orange","red","steelblue","blue","green"),bty="n")
legend("bottomleft",c("Household Contact Enrollment Date"),pch=19,col=c("red"),bty="n")
But when I plotted it, it looked like this...
I want it to look roughly like this, though not exactly (this one is a loess regression fitted to daily averages; that is why I calculated monthly averages, hoping the result would look better than the one below)
Does the following help at all?
x$date=as.Date(x$date, format='%d/%m/%Y')
library(reshape2)
library(ggplot2)
x=melt(x,id='date',value.name='VALUE',variable.name='FACTOR')
x$VALUE=as.numeric(x$VALUE)
ggplot(x, aes(date, VALUE, group=FACTOR, color=FACTOR))+geom_line()
ggplot2 will not draw two independently scaled y-axes, but you can use faceting and improve how the plot looks. Is this going in the right direction?
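If you would rather stay with base graphics, here is a minimal sketch of the two-axis layout. It calls plot() once for the left-hand axis, adds further series on the same scale with lines(), and uses par(new=TRUE) only once for the series that needs the right-hand axis, so the axes are not drawn on top of each other repeatedly. The data frame wd and its values are made up for illustration (an assumption), with column names borrowed from the question.

```r
# Made-up stand-in for weather.data2 (assumption): one year of monthly values
wd <- data.frame(
  date  = seq(as.Date("2008-01-01"), by = "month", length.out = 12),
  mtemp = c(16, 13, 20, 23, 25, 27, 28, 28, 29, 27, 22, 18),
  ah1   = c(10.5, 8.5, 13.5, 17.9, 19.9, 22.7, 23.2, 22.5, 21.8, 19.7, 13.5, 10.2),
  mrh   = c(75, 72, 76, 85, 83, 88, 82, 79, 75, 77, 65, 63)
)

op <- par(mar = c(4, 4, 2, 5))          # leave room for the right-hand axis

# Left axis: temperature, plus absolute humidity on the same scale
plot(wd$date, wd$mtemp, type = "l", col = "red", ylim = c(0, 30),
     xlab = "Month", ylab = "Temperature (C) / absolute humidity (g/m^3)")
lines(wd$date, wd$ah1, col = "blue")

# Right axis: relative humidity, drawn over the same plot region
par(new = TRUE)
plot(wd$date, wd$mrh, type = "l", col = "orange", ylim = c(0, 100),
     axes = FALSE, xlab = "", ylab = "")
axis(side = 4)
mtext("Relative humidity (%)", side = 4, line = 3)

legend("bottomright", c("Temperature", "Absolute humidity", "Relative humidity"),
       lty = 1, col = c("red", "blue", "orange"), bty = "n")
par(op)                                 # restore the previous margins
```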
How to create a categorical bubble plot, using GNU R, similar to that used in systematic mapping studies (see below)?
EDIT: ok, here's what I've tried so far. First, my dataset (Var1 goes to the x-axis, Var2 goes to the y-axis):
> grid
Var1 Var2 count
1 Does.Not.apply Does.Not.apply 53
2 Not.specified Does.Not.apply 15
3 Active.Learning..general. Does.Not.apply 1
4 Problem.based.Learning Does.Not.apply 2
5 Project.Method Does.Not.apply 4
6 Case.based.Learning Does.Not.apply 22
7 Peer.Learning Does.Not.apply 6
10 Other Does.Not.apply 1
11 Does.Not.apply Not.specified 15
12 Not.specified Not.specified 15
21 Does.Not.apply Active.Learning..general. 1
23 Active.Learning..general. Active.Learning..general. 1
31 Does.Not.apply Problem.based.Learning 2
34 Problem.based.Learning Problem.based.Learning 2
41 Does.Not.apply Project.Method 4
45 Project.Method Project.Method 4
51 Does.Not.apply Case.based.Learning 22
56 Case.based.Learning Case.based.Learning 22
61 Does.Not.apply Peer.Learning 6
67 Peer.Learning Peer.Learning 6
91 Does.Not.apply Other 1
100 Other Other 1
Then, trying to plot the data:
# Based on http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/
grid <- subset(grid, count > 0)
radius <- sqrt( grid$count / pi )
symbols(grid$Var1, grid$Var2, radius, inches=0.30, xlab="Research type", ylab="Research area")
text(grid$Var1, grid$Var2, grid$count, cex=0.5)
Here's the result:
Problems: axis labels are wrong, the dashed grid lines are missing.
Here is a ggplot2 solution. First, add radius as a new variable to your data frame.
grid$radius <- sqrt( grid$count / pi )
You should play around with the size of the points and of the text labels inside the plot until they fit well.
library(ggplot2)
ggplot(grid,aes(Var1,Var2))+
geom_point(aes(size=radius*7.5),shape=21,fill="white")+
geom_text(aes(label=count),size=4)+
scale_size_identity()+
theme(panel.grid.major=element_line(linetype=2,color="black"),
axis.text.x=element_text(angle=90,hjust=1,vjust=0))
This will get you started by adding the tick marks to your x-axis.
To add the grid lines, draw a line at each factor level (for example with abline).
ggs <- subset(gg, count > 0)
radius <- sqrt( ggs$count / pi )
# ggs$Var1 <- as.character(ggs$Var1)
# set up your tick marks
# (this can all be put into a single line in `axis`, but it's placed separate here to be more readable)
#--------------
# at which values to place the x tick marks
x_at <- seq_along(levels(gg$Var1))
# the string to place at each tick mark
x_labels <- levels(gg$Var1)
# use xaxt="n" to suppress the standard axis ticks
symbols(ggs$Var1, ggs$Var2, radius, inches=0.30, xlab="Research type", ylab="Research area", xaxt="n")
axis(side=1, at=x_at, labels=x_labels)
text(ggs$Var1, ggs$Var2, ggs$count, cex=0.5)
Also, notice that instead of calling the object grid I called it gg, and then ggs for the subset. grid is a function in R. While it is "allowed" to mask the function with an object of the same name, it is not recommended and can lead to annoying bugs down the line.
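To make the "line at each level" suggestion concrete, here is a minimal base-graphics sketch that labels both axes with the factor levels and draws dashed grid lines through them. The small data frame and its level names are made up for illustration (an assumption); with the real data, gg would be the data frame from the question.

```r
# Made-up stand-in for the question's data (assumption)
lev <- c("TypeA", "TypeB", "TypeC")
gg <- data.frame(
  Var1  = factor(c("TypeA", "TypeB", "TypeC", "TypeA", "TypeB"), levels = lev),
  Var2  = factor(c("TypeA", "TypeA", "TypeA", "TypeB", "TypeB"), levels = lev),
  count = c(53, 15, 1, 15, 15)
)

x <- as.integer(gg$Var1)                # factor codes used as plot positions
y <- as.integer(gg$Var2)
radius <- sqrt(gg$count / pi)           # bubble area proportional to count

# Empty-looking first pass sets up the coordinate system without default axes
symbols(x, y, circles = radius, inches = 0.30,
        xlim = c(0.5, nlevels(gg$Var1) + 0.5),
        ylim = c(0.5, nlevels(gg$Var2) + 0.5),
        xlab = "Research type", ylab = "Research area",
        xaxt = "n", yaxt = "n")

# Axis labels and dashed grid lines at every factor level
axis(side = 1, at = seq_along(levels(gg$Var1)), labels = levels(gg$Var1))
axis(side = 2, at = seq_along(levels(gg$Var2)), labels = levels(gg$Var2), las = 1)
abline(h = seq_along(levels(gg$Var2)), v = seq_along(levels(gg$Var1)), lty = 2)

# Redraw the bubbles with a white fill so they sit on top of the grid lines
symbols(x, y, circles = radius, inches = 0.30, bg = "white", add = TRUE)
text(x, y, gg$count, cex = 0.5)
```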
Here is a version using levelplot (from lattice) together with panel.levelplot.points from latticeExtra.
library(latticeExtra)
levelplot(count~Var1*Var2,data=dat,
panel=function(x,y,z,...)
{
panel.abline(h=y,v=x,lty=2) ## horizontal lines at the y levels, vertical lines at the x levels
cex <- scale(z)*3
panel.levelplot.points(x,y,z,...,cex=5)
panel.text(x,y,label=z,cex=0.8)
},scales=list(x=list(abbreviate=TRUE))) ## to get short labels on the x-axis
To make the size of each bubble depend on its count, you can do this:
library(latticeExtra)
levelplot(count~Var1*Var2,data=dat,
panel=function(x,y,z,...)
{
panel.abline(h=y,v=x,lty=2)
cex <- 1 + 4*sqrt(z/max(z)) ## positive sizes that grow with the count (the constants are arbitrary)
panel.levelplot.points(x,y,z,...,cex=cex)
panel.text(x,y,label=z,cex=0.8)
})
I don't show the result because the rendering is less clear than in the fixed-size case.
I have fitted a nonlinear function to my data with nls. The fit seems to be working correctly:
#define a function
fncTtr <- function(n,d) (d/n)*((sqrt(1+2*(n/d))-1))
#fit
dFit <- nls(dData$ttr~fncTtr(dData$n,d),data=dData,start=list(d=25),trace=T)
summary(dFit)
plot(dData$ttr~dData$n,main="Fitted d value",pch=19)
xl <- seq(min(dData$n),max(dData$n), (max(dData$n) - min(dData$n))/1000)
lines(xl,predict(dFit,newdata=xl),col="blue")
The plot of my observations comes out correctly, but I am having trouble displaying the best-fit curve on it. I create the independent variable xl with 1000 values and want to compute the corresponding fitted values from the model. When I call "lines", I get the error message:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
If I try to execute only the predict function:
a <-predict(dFit,newdata=xl)
str(a)
I can see that xl has 1000 components but "a" has only 16 components. Shouldn't I have the same number of values in a?
data used:
n ttr d
1 35 0.6951 27.739
2 36 0.6925 28.072
3 37 0.6905 28.507
4 38 0.6887 28.946
5 39 0.6790 28.003
6 40 0.6703 27.247
7 41 0.6566 25.735
8 42 0.6605 26.981
9 43 0.6567 27.016
10 44 0.6466 26.026
11 45 0.6531 27.667
12 46 0.6461 27.128
13 47 0.6336 25.751
14 48 0.6225 24.636
15 49 0.6214 24.992
16 50 0.6248 26.011
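OK, I think I found the solution.
When calling predict on an nls fit, what you pass to the newdata argument has to be a named list or data frame, named after the variable you are predicting from (here n), and that name has to match the one used in the original call to nls. As far as I can tell, because the original formula referred to dData$n directly, predict kept evaluating that fixed 16-element column whatever newdata contained, which is why it returned only 16 values.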
#Here I replaced dData$n with n
dFit <- nls(ttr~fncTtr(n,d),data=dData,start=list(d=25),trace=T)
plot(dData$ttr~dData$n,main="Fitted d value",pch=19)
xl <- seq(min(dData$n),max(dData$n), (max(dData$n) - min(dData$n))/1000)
a <- predict(dFit,newdata=list(n=xl))
length(a)==length(xl)
[1] TRUE
lines(xl,a,col="blue")
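To illustrate why the variable names matter, here is a minimal sketch that uses the n and ttr values from the question. It evaluates the two versions of the right-hand side against a list of new values, which is roughly what predict does with the formula (that description of the internals, and the value d = 27 used for the illustration, are assumptions), and then checks the length of the prediction from the corrected fit.

```r
fncTtr <- function(n, d) (d / n) * (sqrt(1 + 2 * (n / d)) - 1)

# n and ttr as listed in the question
dData <- data.frame(
  n   = 35:50,
  ttr = c(0.6951, 0.6925, 0.6905, 0.6887, 0.6790, 0.6703, 0.6566, 0.6605,
          0.6567, 0.6466, 0.6531, 0.6461, 0.6336, 0.6225, 0.6214, 0.6248)
)
xl <- seq(min(dData$n), max(dData$n), length.out = 1001)

# Right-hand sides of the two formulas, evaluated against new values of n
new_vals <- list(n = xl, d = 27)        # d = 27 is an illustrative value only
rhs_with_dollar <- quote(fncTtr(dData$n, d))
rhs_bare_name   <- quote(fncTtr(n, d))
len_dollar <- length(eval(rhs_with_dollar, new_vals))  # dData$n ignores new_vals
len_bare   <- length(eval(rhs_bare_name, new_vals))    # n is taken from new_vals

# The corrected fit predicts one value per element of xl
dFit <- nls(ttr ~ fncTtr(n, d), data = dData, start = list(d = 25))
a <- predict(dFit, newdata = data.frame(n = xl))

plot(ttr ~ n, data = dData, main = "Fitted d value", pch = 19)
lines(xl, a, col = "blue")
```

Here len_dollar is 16 and len_bare is 1001, which mirrors the 16 versus 1000-odd values seen in the question.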
I have a dataset that looks like so:
x y
1 0.0000 0.4459183993
2 125.1128 0.4068805502
3 250.2257 0.3678521348
4 375.3385 0.3294434397
5 500.4513 0.2922601919
6 625.5642 0.2566381551
7 750.6770 0.2229130927
8 875.7898 0.1914207684
9 1000.9026 0.1624969456
10 1126.0155 0.1364773879
11 1251.1283 0.1136978589
12 1376.2411 0.0944717371
13 1501.3540 0.0786550515
14 1626.4668 0.0656763159
15 1751.5796 0.0549476349
16 1876.6925 0.0458811131
17 2001.8053 0.0378895151
18 2126.9181 0.0304416321
19 2252.0309 0.0231041362
20 2377.1438 0.0154535572
21 2502.2566 0.0070928195
22 2627.3694 -0.0020708606
23 2752.4823 -0.0119351534
24 2877.5951 -0.0223944877
25 3002.7079 -0.0332811155
26 3127.8208 -0.0442410358
27 3252.9336 -0.0548855203
...
Full data available here.
It's easier to see visually by plotting x and y with a zero intercept line:
ggplot(dat,aes(x,y)) + geom_line() + geom_hline(yintercept=0)
You can see the plot here (if you don't want to download the data and plot it yourself.)
I want to pick out 'patches' defined as the distance along x from when the line goes above zero on the y till it goes below zero. This will always happen at least once (since the line starts above zero), but can happen many times.
Picking out the first patch is easy.
patch1=dat[min(which(dat$y<=0.000001)),]
But how would I loop through and pick up subsequent patches?
Here's a complete working solution:
# sample data
df <- data.frame(x=1:10, y=rnorm(10))
# find positive changes in "y"
idx <- which(c(FALSE, diff(df$y > 0) == 1))
# get the change in "x"
patches <- diff(c(0, df[idx, "x"]))
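As an alternative sketch (not the method used in the answer above), run-length encoding gives the start and end of every stretch where y is above zero, so each patch length can be measured from the point where the line goes above zero to the first point where it is no longer above zero. The data and the names above, runs and patches are made up for illustration (an assumption).

```r
# Small deterministic example (assumption): y crosses zero several times
df <- data.frame(
  x = 0:9,
  y = c(0.4, 0.2, -0.1, -0.3, 0.2, 0.5, 0.1, -0.2, 0.3, -0.1)
)

above  <- df$y > 0
runs   <- rle(above)                      # consecutive runs of TRUE/FALSE
ends   <- cumsum(runs$lengths)            # last row index of each run
starts <- ends - runs$lengths + 1         # first row index of each run

pos   <- runs$values                      # keep only the runs above zero
first <- starts[pos]
# first row after the run where y is no longer above zero
# (or the last row if the run reaches the end of the data)
stop_at <- pmin(ends[pos] + 1, nrow(df))

patches <- data.frame(start_x = df$x[first],
                      end_x   = df$x[stop_at])
patches$length <- patches$end_x - patches$start_x
patches
# patch lengths here are 2, 3 and 1
```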