Applying window constraint in DTW - r

I use dtw package in r.
I am comparing the results of the simulation against those of experiment results.
I need to adjust the time shift (neither the scale in x-axis nor in y-axis), because the timing of events can be a little bit different.
I use the following function.
alignment<-dtw(exp, sim, keep=TRUE,
window.type='sakoechiba',
window.size=1);
Even though I set the window.size=1, the points that are quite far to each other are matched together.
How can I fix this issue? thanks,

Related

Normalise positively skewed data with heavy tailed distribution

I am trying to figure out how to normalised some positively skewed data.
data
I really need it to have some parvence of positive distribution, but I have already tried log-transforming and it simply does not work. I get this kind of distribution.
log.data
I also tried sqrt(), but still no joy.
Should I just get rid of some of the extreme values on the tail? Why is log() not really doing much in terms of normalising my data?
Log transforming your data won't necessarily make it unskewed, but it does reduce the data range in the axis it was applied. Read this paper about using log transformations.
Nevertheless, a simple log transformation formatted your x-axis from a 1.2 e+07 range to a 0.2 range according to your image.

How to reproduce this graphical explanation (a scatter plot) of how covariance works?

I found this graphical intuitive explanation of covariance:
32 binormal points drawn from distributions with the given covariances, ordered from most negative (bluest) to most positive (reddest)
The whole material can be found at:
https://stats.stackexchange.com/questions/18058/how-would-you-explain-covariance-to-someone-who-understands-only-the-mean
I would like to recreate this sort of graphical illustration in R, but I'm not sufficiently familiar with R's plotting tools. I don't even know where to start in order to get those colored rectangles between each pair of data points, let alone make them semi-transparent.
I think this could make a very efficient teaching tool.
The cor.rect.plot function in the TeachingDemos package makes plots similar to what is shown. You can modify the code for the function to make the plot even more similar if you desire.

Align bars in ciplot

I'm working with the ciplot graphing module for Stata and am encountering a problem with the alignment of bars when I use the by() option. Here's a trivial example demonstrating the issue:
webuse citytemp, clear
ciplot heatdd cooldd, by(region) horizontal recast(conn)
So, the graph shows means and confidence intervals for two variables across categories of the region variable. The bars for the different variables do not align horizontally, though. For each region, the point and bar for heatdd is one line above, and the point and bar for cooldd is one line below, the category label. I would like these to be on the same line, but I can't figure out how to achieve it.
I'm open to solutions that do not involve ciplot, but I have found it to be useful for the specific task I'm working on.
This is my program (in Stata terms, downloadable via ssc install ciplot) so I can speak confidently. (On Statalist, it's expected that you explain the exact provenance of user-written programs; that would be good practice here too.)
It's not a bug; it's a feature (supposedly).
The offsets are entirely deliberate, to avoid messes when two or more intervals would just overlap and occlude each other, which is entirely likely when groups or comparable variables have similar values, which in turn is common when you do this. Even in your example, intervals for heating and cooling degree-days for the South would overlap otherwise, so the graph makes the point for me.
I can see that it's not what you want, but
There is no option in ciplot to remove the offset. I can see a case for one, but
My advice is now to use statsby to get a reduced dataset containing the confidence interval information, and then the graphics are typically a couple of command lines and you get to choose what you want. This approach is documented in a paper easily accessible from the Stata Journal.
You are always welcome to clone the program and modify the code using a different program name, with notional mention of the original.

How to avoid overplotting (for points) using base-graph?

I am in my way of finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to transmit as much information as possible, to create the following graph that present both in the foreground the means and in the background the raw data:
However, one problem remains and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exists with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Manually doing it is not an option (too many graphs and points like this). Furthermore, ggplot2 is also not what I want to learn to deal with this single problem (one reason is that I tend to like dual-axes what is not supprted in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
Standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot. eg:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=T))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending which you can obtain (on the graphics devices supporing it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is using a rug plot (rug function), this places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, while it would be hard to implement it here. I would use alpha-blending, as Dirk suggested.

R: update plot [xy]lims with new points() or lines() additions?

Background:
I'm running a Monte Carlo simulation to show that a particular process (a cumulative mean) does not converge over time, and often diverges wildly in simulation (the expectation of the random variable = infinity). I want to plot about 10 of these simulations on a line chart, where the x axis has the iteration number, and the y axis has the cumulative mean up to that point.
Here's my problem:
I'll run the first simulation (each sim. having 10,000 iterations), and build the main plot based on its current range. But often one of the simulations will have a range a few orders of magnitude large than the first one, so the plot flies outside of the original range. So, is there any way to dynamically update the ylim or xlim of a plot upon adding a new set of points or lines?
I can think of two workarounds for this: 1. store each simulation, then pick the one with the largest range, and build the base graph off of that (not elegant, and I'd have to store a lot of data in memory, but would probably be laptop-friendly [[EDIT: as Marek points out, this is not a memory-intense example, but if you know of a nice solution that'd support far more iterations such that it becomes an issue (think high dimensional walks that require much, much larger MC samples for convergence) then jump right in]]) 2. find a seed that appears to build a nice looking version of it, and set the ylim manually, which would make the demonstration reproducible.
Naturally I'm holding out for something more elegant than my workarounds. Hoping this isn't too pedestrian a problem, since I imagine it's not uncommon with simulations in R. Any ideas?
I'm not sure if this is possible using base graphics, if someone has a solution I'd love to see it. However graphics systems based on grid (lattice and ggplot2) allow the graphics object to be saved and updated. It's insanely easy in ggplot2.
require(ggplot2)
make some data and get the range:
foo <- as.data.frame(cbind(data=rnorm(100), numb=seq_len(100)))
make an initial ggplot object and plot it:
p <- ggplot(as.data.frame(foo), aes(numb, data)) + layer(geom='line')
p
make some more data and add it to the plot
foo <- as.data.frame(cbind(data=rnorm(200), numb=seq_len(200)))
p <- p + geom_line(aes(numb, data, colour="red"), data=as.data.frame(foo))
plot the new object
p
I think (1) is the best option. I actually don't think this isn't elegant. I think it would be more computationally intensive to redraw every time you hit a point greater than xlim or ylim.
Also, I saw in Peter Hoff's book about Bayesian statistics a cool use of ts() instead of lines() for cumulative sums/means. It looks pretty spiffy:

Resources