R: How does stat_density_ridges modify and plot data?

<Disclaimer(s) - (1) This is my first post, so please be gentle, specifically regarding formatting and (2) I did try to dig as much as I could on this topic before posting the question here>
I have a simple data vector containing returns of 40 portfolios on the same day:
Year Return
Now -17.39862061
Now -12.98954582
Now -12.98954582
Now -12.86928749
Now -12.37044334
Now -11.07007504
Now -10.68971539
Now -10.07578182
Now -9.984867096
Now -8.764036179
Now -8.698093414
Now -8.594026566
Now -8.193638802
Now -7.818599701
Now -7.622627735
Now -7.535216808
Now -7.391239166
Now -7.331315517
Now -5.58059597
Now -5.579797268
Now -4.525201797
Now -3.735909224
Now -2.687532902
Now -2.65363884
Now -2.177522898
Now -1.977644682
Now -1.353205681
Now -0.042584345
Now 0.096564181
Now 0.275416046
Now 0.638839543
Now 1.959529042
Now 3.715519428
Now 4.842819691
Now 5.475946426
Now 6.380955219
Now 6.535937309
Now 8.421762466
Now 8.556800842
Now 10.39185524
I am trying to plot these returns so that I can compare them against other days (for example, the rest of my history). I tried stat_density_ridges, as in the code block below:
library(ggplot2)
library(ggridges)
ggplot(data = data.plot, aes(x = Return, y = Year, fill = factor(after_stat(quantile)))) +
  stat_density_ridges(geom = "density_ridges_gradient", calc_ecdf = TRUE,
                      quantiles = c(0.025, 0.5, 0.975),
                      quantile_lines = TRUE)
As you can see, the "Year" is the same for every row, i.e. there is no height parameter, yet I get a nice ridge chart. While the chart is beautiful to behold, and very, very awesome, I am at a loss to determine how the plotting function computes the density in this case, especially the height.
This is the output chart I get (I have omitted the formatting code here since it doesn't make a difference to my question):
[Figure: Portfolio Return Distribution Plots - US versus Europe]
I tried digging into the code of the function itself, but drew a total blank. The documentation didn't help (except perhaps to hint that the function plots continuous distributions).
Any help, or guidance, or even a nudge in the right direction would be extremely helpful.
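A rough sketch of what appears to happen under the hood, using only base R. As far as I understand ggridges, stat_density_ridges needs no height aesthetic because it runs a kernel density estimate on the x values of each y group and uses the estimated density as the ridge height. The sketch below assumes the estimate comes from stats::density() with a bandwidth from stats::bw.nrd0(); the exact range limits and the way the quantile lines are placed inside the package may differ. The names returns, dens, cdf and quantile_x are mine, chosen for illustration.

```r
# Returns of the 40 portfolios from the question
returns <- c(-17.39862061, -12.98954582, -12.98954582, -12.86928749,
             -12.37044334, -11.07007504, -10.68971539, -10.07578182,
             -9.984867096, -8.764036179, -8.698093414, -8.594026566,
             -8.193638802, -7.818599701, -7.622627735, -7.535216808,
             -7.391239166, -7.331315517, -5.58059597, -5.579797268,
             -4.525201797, -3.735909224, -2.687532902, -2.65363884,
             -2.177522898, -1.977644682, -1.353205681, -0.042584345,
             0.096564181, 0.275416046, 0.638839543, 1.959529042,
             3.715519428, 4.842819691, 5.475946426, 6.380955219,
             6.535937309, 8.421762466, 8.556800842, 10.39185524)

# Assumption: rule-of-thumb bandwidth, then a Gaussian kernel density estimate
bw   <- stats::bw.nrd0(returns)
dens <- stats::density(returns, bw = bw, n = 512)

# dens$x is the grid along the Return axis; dens$y plays the role of "height"
head(data.frame(x = dens$x, height = dens$y))

# Approximate quantile positions from the cumulative estimated density
cdf        <- cumsum(dens$y) / sum(dens$y)
quantile_x <- approx(cdf, dens$x, xout = c(0.025, 0.5, 0.975), ties = mean)$y
quantile_x
```

Plotting dens$y against dens$x with plot(dens) should give a curve of the same general shape as the ridge, which is a quick way to check whether this reading of the function matches your chart.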

Related

Empirical Cumulative Density Function - R software

I have a problem with plotting an ECDF. I am trying to reverse the values, as in 1 - (the function), because I want the curve to decrease along the graph, as in my reference graph.
load("91-20.RData")
ts <- data.frame(dat91,dat92,dat93,dat94,dat95,dat96,dat97,
dat98,dat99,dat00,dat11,dat12,dat12,dat13,
dat14,dat15,dat16,dat17,dat18,dat19,dat20)
ts
tsclean <- na.omit(ts)
#--------------------------------------------------------
ggplot(tsclean, aes(x = dat91)) +
  stat_ecdf(geom = "step")
This is the graph I have, but I want to reproduce the one in the reference.
I think the graph you're looking for is called an "exceedance" graph. A web search finds some resources; try a web search for "R exceedance graph".
EDIT: This is more suitable as a comment than an answer, but my web browser is being unhelpful at the moment; sorry for the distraction.
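For what it's worth, a minimal sketch of an exceedance curve in base R, computed as 1 minus the ECDF. The vector x holds made-up values standing in for a single column such as dat91, and the names Fn, xs, exceedance and exceed_df are only illustrative.

```r
# Hypothetical sample values standing in for one column such as dat91
x <- c(2.1, 0.4, 3.7, 1.2, 5.0, 0.9, 2.8)

# Empirical CDF as a function, evaluated at the sorted observations
Fn <- ecdf(x)
xs <- sort(x)

# Exceedance: share of observations strictly greater than each value
exceedance <- 1 - Fn(xs)

exceed_df <- data.frame(value = xs, exceedance = exceedance)
exceed_df
```

The result can then be drawn as a step curve, for example with plot(xs, exceedance, type = "s") or with ggplot(exceed_df, aes(value, exceedance)) + geom_step().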

Finding the x-value at a certain y-value on a ggplot

I am having difficulty finding the 50% effective concentration (EC50) for one of my datasets. In short, the data show how glutathione (GSH) levels in cells are depleted from 100% when the cells are exposed to a substance known as HEMA.
GSH50 <- read.table("Master list for all GSH data T9 TVN.csv", header = TRUE, sep = ";", dec = ",")
After some further subsetting, I end up with a plot like this
[Figure: GSH plot]
I have several more plots in addition to this one, so I need to find the EC50 value for each of them and then compare them with each other (the problem is consistent across several plots, so if it can be fixed here it should be fixed for the others as well).
From an earlier dataset with almost the same setup (the only difference being x-axis values) I managed to get fairly correct EC50 using a setup like this:
HG <- approxfun(x, y)
optimize(function(t0) abs(HG(t0) - 50), interval = range(x))
I then took my EC50 value from the output of optimize. However, this does not work on this data for some reason: when I use the value returned by optimize, I end up with this plot instead: [Figure: GSH plot]
If somebody has any idea how I can fix this issue, it would be most appreciated.
Edit
If you want a reproducible dataset I gathered the averages of the data, and as such the plot should still be similar to the GSH plots I have shown:
library(ggplot2)
Concentration <- seq(from = 0, to = 9, by = 1)
GSH <- c(100, 67.405, 47.78, 39.2325, 33.97, 28.435, 26.97, 24.5125, 23.5275, 21.565)
df <- data.frame(Concentration, GSH)
ggplot(df, aes(Concentration, GSH)) + geom_smooth()
I am quite certain that the dose is high enough to reach the lower level, but I have not stored the model somewhere. I hope the example data provided is enough.
Edit2
I should mention that the approx and optimize code does work for the example when we use geom_line(), but for some reason it is not as accurate with geom_smooth().
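One possible explanation, offered as a sketch rather than a diagnosis: approxfun() interpolates linearly between the raw points, which is the curve geom_line() draws, whereas geom_smooth() draws a fitted smoother that does not pass through the points. Assuming the smoother is a default loess fit (which I believe is what geom_smooth() uses for small datasets), the EC50 of the smoothed curve can be found by fitting loess() directly and solving for the concentration where the prediction equals 50. uniroot() is swapped in here for optimize(); the names fit, smooth_minus_50, ec50_linear and ec50_loess are mine.

```r
# Example data from the question
Concentration <- seq(from = 0, to = 9, by = 1)
GSH <- c(100, 67.405, 47.78, 39.2325, 33.97, 28.435, 26.97, 24.5125, 23.5275, 21.565)
df <- data.frame(Concentration, GSH)

# EC50 on the piecewise-linear curve (what geom_line() shows):
# GSH decreases monotonically, so interpolate Concentration as a function of GSH
ec50_linear <- approx(x = df$GSH, y = df$Concentration, xout = 50)$y

# EC50 on a smoothed curve. Assumption: a default loess fit, which may not
# match the geom_smooth() settings used for the real plots
fit <- loess(GSH ~ Concentration, data = df)
smooth_minus_50 <- function(conc) {
  predict(fit, newdata = data.frame(Concentration = conc)) - 50
}
ec50_loess <- uniroot(smooth_minus_50, interval = range(df$Concentration))$root

c(linear = ec50_linear, loess = ec50_loess)
```

The two values differ because they describe two different curves. Root finding also avoids a weakness of minimizing abs(HG(t0) - 50), which can settle in a local minimum when the curve is not monotone.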

Mosaic plot and text values

I created a structable from the Titanic dataset and used the mosaic function on it. Everything worked great; however, I also wanted to label each box of the mosaic plot with the number of Titanic passengers for the given Class, Survival and Sex. As it turns out, I am not able to do that. I know I need to use labeling_cells to achieve it, but I am not able to use it (and I wasn't able to find any example) in combination with structable and the code below.
library("vcd")
struct <- structable(~ Class + Survived + Sex, data = Titanic)
mosaic(struct, data = Titanic, shade = TRUE, direction = "v")
If I understand your question correctly, then the last example in ?labeling_cells is pretty close to what you want to do. Using your example, the labeling_cells() can be added afterwards provided that the viewport tree is not popped. The only aspect that is somewhat awkward is that the struct object has to be a regular table again for the labeling. I have to ask David, the main author, whether this could be handled automatically.
mosaic(struct, shade = TRUE, direction = "v", pop = FALSE)
labeling_cells(text = as.table(struct), margin = 0)(as.table(struct))
Fixed upstream in vcd 1.4-4, but note that you can simply use
mosaic(struct, labeling = labeling_values)

Graphing a polynomial output of poly.calc

I apologize first for bringing what I imagine to be a ridiculously simple problem here, but I have been unable to glean from the help file for package 'polynom' how to solve this problem. For one out of several years, I have two vectors of x (d for day of year) and y (e for an index of egg production) data:
d=c(169,176,183,190,197,204,211,218,225,232,239,246)
e=c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
I want to, for each year, use the poly.calc function to create a polynomial function that I can use to interpolate the timing of maximum egg production. I want then to superimpose the function on a plot of the data. To begin, I have no problem with the poly.calc function:
egg1996<-poly.calc(d,e)
egg1996
3216904000 - 173356400*x + 4239900*x^2 - 62124.17*x^3 + 605.9178*x^4 - 4.13053*x^5 +
0.02008226*x^6 - 6.963636e-05*x^7 + 1.687736e-07*x^8
I can then simply
plot(d,e)
But when I try to use the lines function to superimpose the function on the plot, I get confused. The help file states that the output of poly.calc is an object of class polynomial, and so I assume that "egg1996" will be the "x" in:
lines(x, len = 100, xlim = NULL, ylim = NULL, ...)
But I cannot seem to, based on the example listed:
lines (poly.calc( 2:4), lty = 2)
Or based on the arguments:
x an object of class "polynomial".
len size of vector at which evaluations are to be made.
xlim, ylim the range of x and y values with sensible defaults
Come up with a command that successfully graphs the polynomial "egg1996" onto the raw data.
I understand that this question is beneath you folks, but I would be very grateful for a little help. Many thanks.
I don't work with the polynom package, but the resultant data set is on a completely different scale (both X & Y axes) than the first plot() call. If you don't mind having it in two separate panels, this provides both plots for comparison:
library(polynom)
d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
egg1996 <- poly.calc(d,e)
par(mfrow=c(1,2))
plot(d, e)
plot(egg1996)

What does "negative length vectors" in a wireframe plot (lattice package) mean?

I want to plot a wireframe in R using the lattice package. However, I get the following error message "error using packet 1 negative length vectors are not allowed". The data looks like the following:
> result_mean
experiment alpha beta packet
1 0 1.0 1 3.000000
2 0 1.1 1 2.571429
The command to create the plot is the following:
library(lattice)
png(file = "foobar.png", width = 1280, height = 1280)
plot <- wireframe(packet ~ alpha * beta, data = result_mean,
  scales = list(arrows = FALSE, cex = .45, col = "black", font = 3),
  drape = TRUE, colorkey = TRUE, main = "Foo",
  col.regions = terrain.colors(100),
  screen = list(z = -60, x = -60),
  xlab = "alpha", ylab = "beta", zlab = "mean \npackets")
print(plot)
dev.off()
I'm wondering what this error message means and if there is a good way to debug this?
Thanks in advance!
Debugging lattice graphics is a bit difficult because (a) the code is complex and multi-layered and (b) the errors get trapped in a way that makes them hard to intercept. However, you can at least get some way in diagnosing the problem.
First create a minimal example. I suspected that your problem was that your data fall on a single line, so I created data that looked like that:
d <- data.frame(x=c(1,1.1),
y=c(1,1),
z=c(2,3))
library(lattice)
wireframe(z~y*x,data=d)
Now confirm that fully three-dimensional data (data that define a plane) work just fine:
d2 <- data.frame(expand.grid(x=c(1,1.1),
y=c(1,1.1)),
z=1:4)
wireframe(z~y*x,data=d2)
So the question is really -- did you intend to draw a wireframe of two points lying on a line? If so, what did you want to have appear in the plot? You could hack things a little bit to set the y values to differ by a tiny bit -- I tried it, though, and got no wireframe appearing (but no error either).
edit: I did a bit more tracing, with various debug() incantations (and searching the source code of the lattice package and R itself for "negative length") to deduce the following: within a function called lattice:::panel.3dwire, there is a call to a C function wireframePanelCalculations, which you can see at https://r-forge.r-project.org/scm/viewvc.php/pkg/src/threeDplot.c?view=markup&root=lattice
Within this function:
nh = (nx-1) * (ny-1) * ng; /* number of quadrilaterals */
sHeights = PROTECT(allocVector(REALSXP, nh));
In this case nx is zero, so this code is asking R to allocate a negative-length vector, which is where the error comes from.
In this case, though, I think the diagnosis is more useful than the explicit debugging.
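Following from that diagnosis, a small pre-check can flag degenerate input before wireframe() is called. This is only a sketch: can_draw_wireframe is a hypothetical helper of my own, not part of lattice, and the condition it tests (at least two distinct values on each horizontal axis) is necessary for a surface but may not be sufficient.

```r
# Hypothetical helper (not part of lattice): a surface needs at least
# two distinct values on each of the two horizontal axes
can_draw_wireframe <- function(x, y) {
  length(unique(x)) >= 2 && length(unique(y)) >= 2
}

# The degenerate example from above: all points share the same y
d <- data.frame(x = c(1, 1.1), y = c(1, 1), z = c(2, 3))

# The working example from above: a full 2 x 2 grid
d2 <- data.frame(expand.grid(x = c(1, 1.1), y = c(1, 1.1)), z = 1:4)

can_draw_wireframe(d$x, d$y)    # FALSE
can_draw_wireframe(d2$x, d2$y)  # TRUE
```

Applied to the question's data, this would be can_draw_wireframe(result_mean$alpha, result_mean$beta).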
