I'm new to PCA. I'm plotting the scores using autoplot from ggfortify and ggplot. Both plots have the same shape but different values on the x and y axes. E.g. autoplot goes from -0.2 to 0.2 on the y-axis, and ggplot goes from -0.6 to 0.6. The points on the graphs look exactly the same; only the axis values changed. Why is that?
Edit:
I can't really give the full data here as it's very long. I tried these two:
library(ggfortify)
pca.data <- prcomp(my_data)
autoplot(pca.data)
and
my_dataframe <- data.frame(Sample = rownames(pca.data$x),
X = pca.data$x[,1],
Y = pca.data$x[,2])
ggplot(data = my_dataframe, aes(x=X, y=Y, label=Sample)) +
geom_point() +
xlab("PC1") +
ylab("PC2") +
ggtitle("PCA Graph")
According to the vignette, autoplot scales in the same way as the biplot() function. If you don't want it to, you can instead use:
autoplot(pca.data, scale=0)
which (except for axis labels) gives the same result as the ggplot command that you used.
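To see where the different axis values come from, here is a minimal sketch in base R. It assumes autoplot follows the scale = 1 convention of stats::biplot(), where each score column is divided by its standard deviation times sqrt(n). The built-in iris data is only a stand-in for my_data, which is not shown in the question.

```r
# Stand-in data (assumption: the question's my_data is not available)
pca <- prcomp(iris[, 1:4])
n <- nrow(pca$x)

raw_scores <- pca$x[, 1:2]                       # what the manual ggplot call plots
lam <- pca$sdev[1:2] * sqrt(n)                   # biplot()-style factor for scale = 1
scaled_scores <- sweep(raw_scores, 2, lam, "/")  # assumed to match autoplot's default axes

# Same point pattern, different axis ranges
apply(raw_scores, 2, range)
apply(scaled_scores, 2, range)
```

Each axis is only divided by a constant, which is why the point cloud looks identical while the tick values shrink.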
Related
I am trying to combine a line plot and horizontal barplot on the same plot. The difficult part is that the barplot is actually counts of the y values of the line plot.
Can someone show me how this can be done using the example below ?
library(ggplot2)
library(plyr)
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
counts <- ddply(dff, ~ y1, summarize, y2 = sum(y2))
# line plot
ggplot(data=dff) + geom_line(aes(x=x,y=y1))
# bar plot
ggplot() + geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
I believe what I need is presented in the pseudocode below but I do not know how to write it out in R.
Apologies. I actually meant the secondary x axis representing the value of counts for the barplot, while primary y-axis is the y1.
ggplot(data=dff) + geom_line(aes(x=x,y=y1)) + geom_bar(data=counts , aes(primary y axis = y1,secondary x axis =y2),stat="identity")
I just want the bars to be plotted horizontally, so I tried the code below, which flips both the line chart and the barplot. That is also not what I wanted.
ggplot(data=dff) +
geom_line(aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y2,y=y1),stat="identity") + coord_flip()
You can combine two plots in ggplot like you want by specifying different data = arguments in each geom_ layer (and none in the original ggplot() call).
ggplot() +
geom_line(data=dff, aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
The following plot is the result. However, since x and y1 have different ranges, are you sure this is what you want?
Perhaps you want y1 on the vertical axis for both plots. Something like this works:
ggplot() +
geom_line(data=dff, aes(x=y1 ,y = x)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity", color = "red") +
coord_flip()
Maybe you are looking for this. Based on your last code, you seem to want a double axis. Using dplyr you can store the counts in the same data frame and then plot all variables. Here is the code:
library(ggplot2)
library(dplyr)
#Data
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
#Code
dff %>% group_by(y1) %>% mutate(Counts=sum(y2)) -> dff2
#Scale factor
sf <- max(dff2$y1)/max(dff2$Counts)
# Plot
ggplot(data=dff2)+
geom_line(aes(x=x,y=y1),color='blue',size=1)+
geom_bar(stat='identity',aes(x=x,y=Counts*sf),fill='tomato',color='black')+
scale_y_continuous(name="y1", sec.axis = sec_axis(~./sf, name="Counts"))
Output:
I'm trying to plot a distribution CDF using R and ggplot2. However, I am finding difficulties in plotting the CDF function after I transform the Y axis to obtain a straight line.
This kind of plot is frequently used in Gumbel paper plots, but here I'll use as example the normal distribution.
I generate the data, and plot the cumulative distribution function of the data along with the theoretical function. They fit well. However, when I apply a Y axis transformation, they don't fit anymore.
sim <- rnorm(100) #Simulate some data
sim <- sort(sim) #Sort it
cdf <- seq(0,1,length.out=length(sim)) #Compute data CDF
df <- data.frame(x=sim, y=cdf) #Build data.frame
library(scales)
library(ggplot2)
#Now plot!
gg <- ggplot(df, aes(x=x, y=y)) +
geom_point() +
stat_function(fun = pnorm, colour="red")
gg
And the output should be something along the lines of:
Good!
Now I try to transform the Y axis according to the distribution used.
#Apply transformation
gg + scale_y_continuous(trans=probability_trans("norm"))
And the result is:
The points are transformed correctly (they lie on a straight line), but the function is not!
However, everything seems to work fine if I do like this, calculating the CDF with ggplot:
ggplot(data.frame(x=sim), aes(x=x)) +
stat_ecdf(geom = "point") +
stat_function(fun="pnorm", colour="red") +
scale_y_continuous(trans=probability_trans("norm"))
The result is OK:
Why is this happening? Why doesn't calculating the CDF manually work with scale transformations?
This works:
gg <- ggplot(df, aes(x=x, y=y)) +
geom_point() +
stat_function(fun ="pnorm", colour="red", inherit.aes = FALSE) +
scale_y_continuous(trans=probability_trans("norm"))
gg
Possible explanation:
Documentation States:
inherit.aes If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.
My guess:
As scale_y_continuous acts on the aesthetics of the main plot, we need to turn off the default inherit.aes=TRUE. It seems that with inherit.aes=TRUE, stat_function picks up the plot's default aesthetics (including y), so the function is not transformed as expected unless inheritance is switched off explicitly.
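As a quick check of why the red curve should become a straight line after the transformation, this sketch applies the scale's own transform to pnorm values. It only assumes that the object returned by probability_trans("norm") exposes a transform function, as scales transformation objects do.

```r
library(scales)

tr <- probability_trans("norm")   # transform is qnorm, inverse is pnorm
x <- seq(-2, 2, by = 0.5)

# Transforming the theoretical CDF gives back x itself,
# so on the transformed axis pnorm must appear as the line y = x
transformed <- tr$transform(pnorm(x))
all.equal(transformed, x)
```

Any curve that is not straight on the transformed axis therefore means the function layer was not evaluated or transformed the way the points were.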
I am trying to plot some data on directions which vary from 0 to 360 deg. The most intuitive way of doing this is around a circle where I can plot each point (I only have 13 points to plot).
cont=c(319,124,182,137,55,302,221,25,8,36,132,179,152)
My data for one plot
I tried following the ggplot2 guides and have not got it to work. I'm not very good at ggplot though...
(my dataframe is called "data")
ggplot(data, aes(x=1), ) + coord_polar(theta = "y") +geom_point(y=cont)
It works if you add y to the ggplot mapping:
data <- data.frame(cont = cont)
ggplot(data, aes(x=1, y = cont)) + coord_polar(theta = "y") + geom_point()
You can add other ggplot parameters to improve the appearance.
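One such parameter is the range of the angular scale. With only y = cont mapped, the circle spans just the range of the data (8 to 319 here), so the angles do not correspond to true degrees. A minimal sketch, assuming the values are meant as directions on a full 0 to 360 circle:

```r
library(ggplot2)

cont <- c(319, 124, 182, 137, 55, 302, 221, 25, 8, 36, 132, 179, 152)
data <- data.frame(cont = cont)

p <- ggplot(data, aes(x = 1, y = cont)) +
  geom_point() +
  coord_polar(theta = "y") +
  # Fix the angular scale to the full circle, with a tick every 45 degrees
  scale_y_continuous(limits = c(0, 360), breaks = seq(0, 315, by = 45))
p
```

The break at 360 is left out on purpose because it would land on the same angle as 0.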
Have you tried polar.plot from plotrix library?
I would like to use ggplot2 to draw a lattice plot of densities produced from different methods, in which the same y-axis scale is used throughout.
I would like to set the upper limit of the y axis to a value below the highest density value for any one method. However ggplot by default removes sections of the geom that are outside of the plotted region.
For example:
# Toy example of problem
xval <- rnorm(10000)
#Base1
plot(density(xval))
#Base2
plot(density(xval), ylim=c(0, 0.3)) # densities > 0.3 not removed from plot
xval <- as.data.frame(xval)
ggplot(xval, aes(x=xval)) + geom_density() #gg1 - looks like Base1
ggplot(xval, aes(x=xval)) + geom_density() + ylim(0, 0.3)
#gg2: does not look like Base2 due to removal of density values > 0.3
These produce the images below:
How can I make the ggplot image not have the missing section?
Using xlim() or ylim() directly will drop all data points that are not within the specified range. This yields the discontinuity of the density plot. Use coord_cartesian() to zoom in without losing the data points.
ggplot(xval, aes(x=xval)) +
geom_density() +
coord_cartesian(ylim = c(0, 0.3))
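To confirm that coord_cartesian() only zooms, the following sketch inspects the values ggplot2 computed for the layer. The seed and the object names p_zoom and d_zoom are illustrative choices, not part of the original question.

```r
library(ggplot2)

set.seed(1)  # illustrative seed so the check is reproducible
xval <- data.frame(xval = rnorm(10000))

p_zoom <- ggplot(xval, aes(x = xval)) +
  geom_density() +
  coord_cartesian(ylim = c(0, 0.3))

# The computed density values are all still there, including those above 0.3
d_zoom <- layer_data(p_zoom)
max(d_zoom$y)
anyNA(d_zoom$y)
```

Running the same check on the ylim(0, 0.3) version should instead show missing values wherever the density exceeds the limit, which is what produces the gap in the curve.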
I am trying to do some density plots in R. I originally used density plot but I changed to the density plot in ggplot2 because I visually prefer ggplot2.
So I did a density plot using the base density plot function and another one in ggplot2 (see below), but I found the plots were not identical. It looks like some of the y-values have been lost or dropped in the ggplot2 version (right plot). Is there any particular reason for this? How can I make the ggplot identical to the density plot (left plot)?
Code:
library(ggplot2)
library(grid)
par(mfrow=c(1,2))
# define function to create multi-plot setup (nrow, ncol)
vp.setup <- function(x,y){
grid.newpage()
pushViewport(viewport(layout = grid.layout(x,y)))
}
# define function to easily access layout (row, col)
vp.layout <- function(x,y){
viewport(layout.pos.row=x, layout.pos.col=y)
}
vp.setup(1,2)
dat <- read.table(textConnection("
low high
10611.0 14195.0
10759.0 14437.0
10807.0 14574.0
10714.0 14380.0
10768.0 14448.0
10601.0 14239.0
10579.0 14218.0
10806.0 14510.0
"), header=TRUE, sep="\t")
plot(density(dat$low))
dat.low = data.frame(low2 = c(dat$low), lines = rep(c("low")))
low_plot_gg = (ggplot(dat.low, aes(x = low2, fill = lines)) +
stat_density(aes(y = ..density..)) +
coord_cartesian(xlim = c(10300, 11000))
)
print(low_plot_gg, vp=vp.layout(1,2))
Based on some trial and error, it looks like you want
+ xlim(c(10300,11000))
rather than
+ coord_cartesian(xlim = c(10300, 11000))
coord_cartesian extends the limits of the plots but doesn't change what's drawn inside them at all ...
It's not a problem of lost values. plot(density()) extends the smoothed curve beyond the extreme values, which is not very accurate for such a small dataset. For a bigger dataset the two plots will be the same.
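A sketch of where the two curves differ, using base R only. It assumes (this is not stated in the answers) that ggplot2 evaluates the density only over the range of the x scale, which is the data range unless xlim() widens it, whereas density() by default extends the curve three bandwidths beyond the data through its cut argument.

```r
low <- c(10611, 10759, 10807, 10714, 10768, 10601, 10579, 10806)

d_full <- density(low)                                   # default cut = 3: tails beyond the data
d_trim <- density(low, from = min(low), to = max(low))   # evaluated on the data range only

range(low)
range(d_full$x)   # wider than the data
range(d_trim$x)   # stops at the smallest and largest observation
```

Under that assumption, widening the x scale with xlim(c(10300, 11000)) gives the ggplot2 curve room for the same tails that plot(density()) draws.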