r - different result with boxplot/quantile function

I have created a boxplot with the code
boxplot(X,horizontal = TRUE, axes = FALSE, staplewex = 1, boxwex = 0.05)
text(x = boxplot.stats(X[,1])$stats, labels = boxplot.stats(X[,1])$stats, y = 1.04)
However, the values I get from boxplot.stats (and hence the labels on the plot) differ from those of quantile(X[,1], 0.25) or summarize(X). I think boxplot may use a different definition of the quartiles, but the boxplot documentation is hard to read and left me confused. Could someone explain the difference?
Thank you for your help!
Simon
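A short sketch of where the discrepancy can come from: boxplot.stats() takes the box edges from fivenum() (Tukey's hinges), whereas quantile() defaults to type = 7 interpolation, so the two can disagree depending on the sample size. The vector x below is made-up illustration data, not the X from the question.

```r
# Made-up data for illustration only (not the asker's X)
x <- 1:8

# Box edges used by boxplot(): min, lower hinge, median, upper hinge, max
hinges <- boxplot.stats(x)$stats

# fivenum() returns the same five numbers here, because there are no outliers
five <- fivenum(x)

# Default quantile() (type = 7) interpolates differently
q7 <- unname(quantile(x, probs = c(0.25, 0.75)))

print(hinges)  # lower hinge is 2.5, upper hinge is 6.5
print(q7)      # 2.75 and 6.25
```

If the labels on the plot should match quantile(), one option is to compute the label values with quantile() instead of boxplot.stats(); the box itself is still drawn at the hinges.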

Related

Lag 0 is not plotted in GGCcf

With the following code I plotted the cross-correlation of my data. Everything works well, except that the plot does not show lag 0, which is highly important for my study.
p <- ggCcf(
  df_ccf$Asia_Co,
  df_ccf$EU_USA,
  lag.max = 10,
  type = c("correlation", "covariance"),
  plot = TRUE,
  na.action = na.contiguous)
plot(p)
[Figure: the resulting ggCcf plot]
[Table: head of the data]
I encountered the same issue; it might be a bug in ggCcf from the forecast package. I couldn't get ggCcf to work, no matter what I tried. To reproduce this behaviour, try:
ggCcf(c(1,2,3,4),c(2,3,4,6))
The workaround is to use base R's ccf:
max_lag = 10
result = ccf(series1, series2, lag.max = max_lag)
y = result$acf
x = c(-max_lag:max_lag)
You can use x and y to plot the CCF with ggplot2, choosing an appropriate ylim.
The downside is less convenience, but the upside is that you can add some flair/styling to your plot now that you are doing everything yourself anyway ;).
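A minimal sketch of that workaround, assuming ggplot2 is installed. series1 and series2 are simulated placeholders standing in for the two columns of df_ccf, and the dashed lines use the usual large-sample approximation for the 95% bounds; neither comes from the original post.

```r
library(ggplot2)

# Simulated placeholder series (assumption: stand-ins for the real data)
set.seed(1)
series1 <- cumsum(rnorm(100))
series2 <- series1 + rnorm(100)

max_lag <- 10
# plot = FALSE: only compute the cross-correlation, do not draw the base plot
result <- ccf(series1, series2, lag.max = max_lag, plot = FALSE)

# Flatten the lag and acf arrays into a plain data frame; lag 0 is included
ccf_df <- data.frame(lag = as.vector(result$lag),
                     acf = as.vector(result$acf))

# Approximate 95% bounds, as drawn by the base ccf plot
ci <- qnorm(0.975) / sqrt(result$n.used)

p <- ggplot(ccf_df, aes(x = lag, y = acf)) +
  geom_hline(yintercept = 0) +
  geom_hline(yintercept = c(-ci, ci), linetype = "dashed", colour = "blue") +
  geom_segment(aes(xend = lag, yend = 0)) +
  scale_x_continuous(breaks = seq(-max_lag, max_lag, by = 2)) +
  ylim(-1, 1)
print(p)
```

Taking the lags from result$lag rather than rebuilding them by hand keeps the x values aligned with the acf values even if ccf returns fewer lags than requested.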

Change name of groups in bal.plot

I am trying to visualize results from a MatchIt procedure with bal.plot() from the cobalt package.
It works just fine, except that I would like to change the labels for the groups, which by default are "Unadjusted sample" and "Adjusted sample".
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS", which = "both",
type = "histogram", mirror = F,
weights = AHEAD_nomiss$att.weights, treat = AHEAD_nomiss$group)
Author of cobalt package here. Thank you for using my package!
Edit. Original post at the bottom.
I just added some functionality to bal.plot for this in the development version of cobalt, which can be installed with devtools::install_github("ngreifer/cobalt"). Use the sample.names argument to supply a vector of names to give bal.plot and they'll appear in the facet labels. The vector should be as long as the number of samples (in your case, 2). Your new code should look like this:
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS", which = "both",
type = "histogram", mirror = F,
weights = AHEAD_nomiss$att.weights, treat = AHEAD_nomiss$group,
sample.names = c("UNWEIGHTED", "WEIGHTED"))
Of course you can change the names. If you don't want to install the development version of cobalt (it's not guaranteed to be stable), you can use my solutions below.
I didn't intend bal.plot to be used for publication, so I didn't make it super flexible, unlike love.plot. One thing you can do is manually program the histograms using ggplot2. Of course, this requires learning how to use ggplot2, which can be a challenge, and looking at the source code of bal.plot probably won't help because of all the checks and transformations that occur. Here's some code that might work for you:
library(ggplot2)
# One copy of the data with unit weights, one with the matching weights
unweighted <- data.frame(KCH_TKS = AHEAD_nomiss$KCH_TKS,
                         treat = factor(AHEAD_nomiss$group),
                         weights = 1,
                         adj = "UNWEIGHTED",
                         stringsAsFactors = FALSE)
weighted <- data.frame(KCH_TKS = AHEAD_nomiss$KCH_TKS,
                       treat = factor(AHEAD_nomiss$group),
                       weights = AHEAD_nomiss$att.weights,
                       adj = "WEIGHTED",
                       stringsAsFactors = FALSE)
data <- rbind(unweighted, weighted)
# Weighted histograms, one facet per sample; adj supplies the facet labels
ggplot(data, aes(x = KCH_TKS, fill = treat)) +
  geom_histogram(aes(weight = weights), bins = 10, alpha = .4, color = "black") +
  facet_grid(~adj)
One way you can hack bal.plot is to provide a set of weights that are all equal to 1 as well as your desired weights and leave which at its default. If you give the weights names, those names will appear on the facet labels. So, for your example, try
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS",
         type = "histogram", mirror = F,
         weights = list(UNWEIGHTED = rep(1, nrow(AHEAD_nomiss)),
                        WEIGHTED = AHEAD_nomiss$att.weights),
         treat = AHEAD_nomiss$group)
You should see that "UNWEIGHTED" and "WEIGHTED" are the new facet label names. You can of course change them to be whatever you want.

R-package beeswarm generates same x-coordinates

I am working on a script where I need to calculate the coordinates for a beeswarm plot without immediately plotting. When I use beeswarm, I get x-coordinates that aren't swarmed, and more or less the same value:
But if I generate the same plot again it swarms correctly:
And if I use dev.off() I again get no swarming:
The code I used:
library(beeswarm)
n <- 250
df <- data.frame(x = floor(runif(n, 0, 5)),
                 y = rnorm(n = n, mean = 500, sd = 100))
# Plot 1:
A <- with(df, beeswarm(y ~ x, do.plot = FALSE))
plot(x = A$x, y = A$y)
# Plot 2:
A <- with(df, beeswarm(y ~ x, do.plot = FALSE))
plot(x = A$x, y = A$y)
dev.off()
# Plot 3:
A <- with(df, beeswarm(y ~ x, do.plot = FALSE))
plot(x = A$x, y = A$y)
It seems to me like beeswarm uses something like the current plot parameters (or however it is called) to do the swarming and therefore chokes when a plot isn't showing. I have tried to play around with beeswarm parameters such as spacing, breaks, corral, corralWidth, priority, and xlim, but it does not make a difference. FYI: If do.plot is set to TRUE the x-coordinates are calculated correctly, but this is not helpful as I don't want to plot immediately.
Any tips or comments are greatly appreciated!
You're right; beeswarm uses the current plot parameters to calculate the amount of space to leave between points. It seems that setting "do.plot=FALSE" does not do what one would expect, and I'm not sure why I included this parameter.
If you want to control the parameters manually, you could use the functions swarmx or swarmy instead. These functions must be applied to each group separately, e.g.
dfsplitswarmed <- by(df, df$x, function(aa) swarmx(aa$x, aa$y, xsize = 0.075, ysize = 7.5, cex = 1, log = ""))
dfswarmed <- do.call(rbind, dfsplitswarmed)
plot(dfswarmed)
In this case, I set the xsize and ysize values based on what the function would default to for this particular data set. If you can find a set of xsize/ysize values that work for your data, this approach might work for you.
Otherwise, perhaps a simpler approach would be to leave do.plot=TRUE, and then discard the plots.
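One way to "discard the plots" is to draw into a throwaway graphics device. The sketch below assumes that a null PDF device (pdf(NULL), which creates no file) is an acceptable stand-in for the device you will eventually plot on; because the swarming depends on the device and plot size, the coordinates may need recomputing if the final plot has different dimensions.

```r
library(beeswarm)

set.seed(1)
n <- 250
df <- data.frame(x = floor(runif(n, 0, 5)),
                 y = rnorm(n = n, mean = 500, sd = 100))

# Open a null PDF device: drawing happens, but no file is written
pdf(NULL)
# Leave do.plot at its default (TRUE) so the swarm is computed against a real plot
A <- beeswarm(y ~ x, data = df)
# Close the throwaway device; the returned coordinates are kept in A
invisible(dev.off())

head(A[, c("x", "y")])
```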

R: PCA plot with different colors for Sites

I have recently been trying to analyse my data and want to make the graphs a little nicer, but I'm failing at this.
So I have a data set with 144 sites and 5 environmental variables. It's basically about the substrate composition around an island and the fish abundance. On this island there is supposed to be a difference in the substrate composition between the north and the south side. Right now I am doing a PCA, and with the biplot function it works quite well, but I would like to change the plot a bit.
I need one where the sites are just points and not numbered, arrows point to the different variables, and the sites are colored according to their location (north or south side). So I tried everything I could find.
Most examples were with the dune data and suggested something like this:
library(vegan)
library(biplot)
data(dune)
mod <- rda(dune, scale = TRUE)
biplot(mod, scaling = 3, type = c("text", "points"))
So according to this I would just need to say text and points, and R would label the variables and just draw points for the sites. When I do this, however, I get the error:
Error in plot.default(x, type = "n", xlim = xlim, ylim = ylim, col = col[1L], :
formal argument "type" matched by multiple actual arguments
I have no idea how to get around this.
The next strategy I found is to make the plot manually like this:
require("vegan")
data(dune, dune.env)
mod <- rda(dune, scale = TRUE)
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
scaling = scl, pch = 21, bg = colvec[Use]))
text(mod,display="species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("bottomright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
This works fine so far as well: I get different colors and points, but now the arrows are missing. I found that this should be easy to correct by putting display = "bp" in the text line. But this doesn't work either. Every time I use "bp", R says:
Error in match.arg(display) :
argument "display" is missing, with no default
So I'm kind of desperate now. I looked through all the answers here and I don't understand why display = "bp" and type = c("text", "points") are not working for me.
If anyone has an idea I would be super grateful.
https://www.dropbox.com/sh/y8xzq0bs6mus727/AADmasrXxUp6JTTHN5Gr9eufa?dl=0
This is the link to my Dropbox folder. It contains my R script and the csv files. The one named environmentalvariables_Kon1 also contains the data about the north and south side.
So yeah, if anyone could help me, that would be awesome. I really don't know what to do anymore.
Best regards,
Nancy
You can add arrows with arrows(). See the code for vegan:::biplot.rda to see how it works in the original function.
With your plot, add
g <- scores(mod, display = "species")
len <- 1
arrows(0, 0, len * g[, 1], len * g[, 2], length = 0.05, col = "darkcyan")
You might want to adjust the value of len to make the arrows longer.
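Put together with the manual plot from the question, the whole thing might look like the sketch below. One assumption beyond the answer: scores() is called with the same scaling as the plot, so that the arrow tips land where text() puts the species labels (without it, scores() falls back to its own default scaling). The dune data and the Use grouping stand in for the real sites and the north/south factor.

```r
library(vegan)

data(dune, dune.env)
mod <- rda(dune, scale = TRUE)
scl <- 3  # same scaling for sites, arrows and labels
colvec <- c("red2", "green4", "mediumblue")

# Empty plot, then sites as coloured points
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
                      scaling = scl, pch = 21, bg = colvec[Use]))

# Species scores in the plot's scaling, drawn as arrows from the origin
g <- scores(mod, display = "species", scaling = scl)
len <- 1  # increase to lengthen the arrows
arrows(0, 0, len * g[, 1], len * g[, 2], length = 0.05, col = "darkcyan")

# Labels and legend
text(mod, display = "species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("bottomright", legend = levels(Use), bty = "n",
                      col = colvec, pch = 21, pt.bg = colvec))
```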

Principal Component Analysis in R data color

Hi everyone, I have a simple question for which I haven't been able to find an answer in any tutorial. I've done a simple principal component analysis on a set of data and then plotted my data with biplot.
CP <- prcomp(dat, scale. = T)
summary(CP)
biplot(CP)
With this I get a scatter plot of my data in terms of the first and second components. I wish to separate my data by color, telling R to paint my first 20 observations in red and the next 20 in blue. I don't know how to tell R to color those two sets of data.
Any help will be much appreciated. Thanks!
(I'm very new to R)
Disclaimer: This is not a direct answer, but it can be tweaked to obtain the desired output.
library(ggbiplot)
data(wine)
wine.pca <- prcomp(wine, scale. = TRUE)
print(ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class, ellipse = TRUE, circle = TRUE))
Using plot() gives you more flexibility. You can use it alone or with text() for text labels, as below (thanks to @flodel for useful comments):
col = rep(c("red","blue"),each=20)
plot(CP$x[,1], CP$x[,2], pch="", main = "Your Plot Title", xlab = "PC 1", ylab = "PC 2")
text(CP$x[,1], CP$x[,2], labels=rownames(CP$x), col = col)
However if you want to use biplot() try this code:
biplot(CP$x[1:20,], CP$x[21:40,], col=c("red","blue"))
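If plain coloured points are enough (no labels, no variable arrows), a base-R sketch along the same lines is below. The matrix dat is simulated stand-in data with 40 rows, where the first 20 and the next 20 rows form the two groups; substitute your own data.

```r
# Simulated stand-in data: 40 observations, 4 variables (assumption)
set.seed(42)
dat <- rbind(matrix(rnorm(20 * 4, mean = 0), ncol = 4),
             matrix(rnorm(20 * 4, mean = 2), ncol = 4))
colnames(dat) <- paste0("V", 1:4)

CP <- prcomp(dat, scale. = TRUE)

# One colour per observation: first 20 red, next 20 blue
grp_col <- rep(c("red", "blue"), each = 20)

# Scores on the first two components, coloured by group
plot(CP$x[, 1], CP$x[, 2], col = grp_col, pch = 19,
     xlab = "PC 1", ylab = "PC 2")
legend("topright", legend = c("first 20", "next 20"),
       col = c("red", "blue"), pch = 19)
```

The colour vector must have one entry per row of CP$x and be in the same order as the rows of the data.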
