I'm having trouble constructing a histogram from a matrix in R.
The matrix contains 3 treatments (lambda0.001, lambda0.002, lambda0.005) for 4 populations (rec1, rec2, rec3, con1). The matrix is:
lambda0.001 lambda0.002 lambda.003
rec1 1.0881688 1.1890554 1.3653264
rec2 1.0119031 1.0687678 1.1751051
rec3 0.9540271 0.9540271 0.9540271
con1 0.8053506 0.8086985 0.8272758
My goal is to plot a histogram with lambda on the Y axis and four groups of three treatments on the X axis. The four groups should be separated from each other by a small gap.
I need help; it doesn't matter whether it's done in ggplot2 or in base R graphics.
Thanks a lot!
I agree with docendo discimus that a bar plot is probably what you're looking for. Based on what you're asking, though, I would first reshape your data to make it easier to work with; you can then draw the bars directly from the values with stat = "identity" (which is what geom_col() does).
library(tidyr)    # gather() comes from tidyr, not dplyr
library(ggplot2)
# b is the matrix from the question (populations as row names, lambdas as columns)
# convert from matrix to data frame and preserve row names as a column
b <- data.frame(population = row.names(b), as.data.frame(b), row.names = NULL)
# gather into a tidy (long) format for ease of use in ggplot2
b <- gather(b, lambda, value, -population)
# plot 1 as described in the question: bars dodged by lambda within each population
# (geom_col() is the same as geom_bar(stat = "identity"))
ggplot(b, aes(x = population, y = value)) + geom_col(aes(fill = lambda), position = "dodge")
# plot 2 using facets to separate the lambdas, as an alternative
ggplot(b, aes(x = population, y = value)) + geom_col() + facet_grid(. ~ lambda)
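Since base R graphics are also acceptable to the OP, here is a minimal base R sketch using barplot() with beside = TRUE, which draws one cluster of bars per column of its input. It rebuilds the matrix as printed in the question; the third column name is copied as printed (lambda.003), although the question text calls it lambda0.005, and `m` is a hypothetical variable name.

```r
# Matrix as printed in the question (populations as rows, lambdas as columns)
m <- matrix(c(1.0881688, 1.1890554, 1.3653264,
              1.0119031, 1.0687678, 1.1751051,
              0.9540271, 0.9540271, 0.9540271,
              0.8053506, 0.8086985, 0.8272758),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("rec1", "rec2", "rec3", "con1"),
                            c("lambda0.001", "lambda0.002", "lambda.003")))

# barplot() groups bars by column, so transpose: one group of 3 bars per population.
# space = c(0, 1): no gap between bars within a group, one bar-width gap between groups.
mids <- barplot(t(m), beside = TRUE, space = c(0, 1),
                legend.text = TRUE, args.legend = list(x = "topright", bty = "n"),
                ylab = "value", ylim = c(0, 1.5))
```

barplot() invisibly returns the bar midpoints (one row per lambda, one column per population), which is handy for adding text labels. Increasing the second element of space widens the break between the population groups.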
The grouping variable for a geom_violin() plot in ggplot2 is expected to be discrete, for obvious reasons. However, my discrete values are numbers, and I would like to show them on a continuous scale so that I can overlay a continuous function of those numbers on top of the violins. Toy example:
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df) + geom_violin(aes(x=factor(x), y=y))
This works as you'd imagine: violins with their x axis values (equally spaced) labelled 1, 2, and 5, with their means at y=1,2,5 respectively. I want to overlay a continuous function such as y=x, passing through the means. Is that possible? Adding + scale_x_continuous() predictably gives Error: Discrete value supplied to continuous scale. A solution would presumably spread the violins horizontally by the numeric x values, i.e. three times the spacing between 2 and 5 as between 1 and 2, but that is not the only thing I'm trying to achieve - overlaying a continuous function is the key issue.
If this isn't possible, alternative visualisation suggestions are welcome. I know I could replace violins with a simple scatter plot to give a rough sense of density as a function of y for a given x.
The functionality to plot violin plots on a continuous scale is directly built into ggplot.
The key is to keep the original continuous variable (instead of converting it into a factor) and to specify how to group it in the aesthetic mapping of geom_violin(). The width of each group is set by the width argument of cut_width() (1 here) and can be adjusted to the data at hand.
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'lm')
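The question's key goal, overlaying a known function such as y = x rather than a fitted line, can be sketched the same way with stat_function(), because x stays continuous. This is a minimal sketch assuming the toy data from the question (a data.frame instead of a tibble, with a seed added); the red line is the function y = x, not a model fit.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = sample(c(1, 2, 5), size = 1000, replace = TRUE))
df$y <- rnorm(1000, mean = df$x)

p <- ggplot(df, aes(x = x, y = y)) +
  # violins grouped by binning the numeric x, so the x scale stays continuous
  geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
  # overlay the known function y = x (not a fitted line) on the same scale
  stat_function(fun = function(x) x, colour = "red")
p

# the second layer holds the evaluated function values
fun_layer <- ggplot_build(p)$data[[2]]
```

Any other function of x (for example function(x) x^2) can be swapped into fun without changing the violin layer.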
With this approach, any geom for continuous data can be combined with the violin plots; for example, we could easily replace the line with a loess curve and add a scatter plot of the points.
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'loess') +
geom_point()
More examples can be found in the ggplot2 help page for violin plots (?geom_violin).
Try this. As you already guessed, spreading the violins by numeric values is the key to the solution. To this end I expand the df to include all x values in the interval min(x) to max(x) and use scale_x_discrete(drop = FALSE) so that all values are displayed.
Note: Thanks @ChrisW for the more general example of my approach.
library(tidyverse)
set.seed(42)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T), y = rnorm(1000, mean = x^2))
# y = x^2
# add missing x values
x.range <- seq(from=min(df$x), to=max(df$x))
df <- df %>% right_join(tibble(x = x.range))
#> Joining, by = "x"
# Whatever the desired continuous function is:
df.fit <- tibble(x = x.range, y=x^2) %>%
mutate(x = factor(x))
ggplot() +
geom_violin(data=df, aes(x = factor(x, levels = 1:5), y=y)) +
geom_line(data=df.fit, aes(x, y, group=1), color = "red") +
scale_x_discrete(drop = FALSE)
#> Warning: Removed 2 rows containing non-finite values (stat_ydensity).
Created on 2020-06-11 by the reprex package (v0.3.0)
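The spacing trick above rests on factor levels: declaring every integer between min(x) and max(x) as a level creates empty categories (3 and 4 here) that scale_x_discrete(drop = FALSE) keeps as blank slots. A minimal sketch of just that step, assuming integer-valued x (the slots are spaced by level count, so non-integer values such as 1.5 would not be placed proportionally):

```r
set.seed(42)
x <- sample(c(1, 2, 5), size = 1000, replace = TRUE)

# Every integer in the range becomes a level, even those with no observations
fx <- factor(x, levels = seq(min(x), max(x)))
levels(fx)   # "1" "2" "3" "4" "5"
table(fx)    # levels 3 and 4 have zero counts
```

The rows that right_join() adds for x = 3 and x = 4 carry NA for y, which is why the reprex output above reports 2 removed rows.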
I want to overlay two density plots; one of data prior to transformation and one after. I don't care about the x and y values, only the shape of the curve.
I want to superimpose the two charts for a given Predictor on top of each other, even though their x-axes differ. I find it hard to compare across the two facets. In reality there will also be many more plots, so combining the non-transformed and transformed data into one plot would be the best solution.
library(tidyverse)
require(caret)
data(BloodBrain)
bbbTrans <- preProcess(select(bbbDescr, adistd, adistm, dpsa3, inthb), method = "YeoJohnson")
bbbTransData <- predict(bbbTrans, select(bbbDescr, adistd, adistm, dpsa3, inthb))
dat <- bbbTransData %>%
gather(Predictor, Value) %>%
mutate(Transformation = "Yeo-Johnson") %>%
bind_rows(data.frame(gather(select(bbbDescr, adistd, adistm, dpsa3, inthb), Predictor, Value), Transformation = "NA", stringsAsFactors = FALSE))
# For the predictor adistd, I would like the x-axis range to be 0:12.5 for the
# "Yeo-Johnson" transformation and 0:250 for no transformation. In this plot, it
# is hard to see the shape of the transformed variables due to the different x-value range.
dat %>% ggplot(aes(x = Value, color = Transformation)) +
geom_density(aes(y = ..scaled..), position = "dodge") +
facet_wrap(~Predictor, scales = "free")
# i.e., I want to superimpose the 2 charts for a given Predictor on top of each other, even though the x-axis is different
# I find it hard to look across the two facets. In reality, as well, there will be a lot more plots, so combining the non-transformed and transformed data into the one plot using colour would be the best solution.
filter(dat, Transformation != 'NA') %>% ggplot(aes(x = Value, y = ..scaled..)) +
geom_density() +
facet_wrap(~Predictor, scales = "free")
filter(dat, Transformation == 'NA') %>% ggplot(aes(x = Value, y = ..scaled..)) +
geom_density() +
facet_wrap(~Predictor, scales = "free")
Edit: The algorithm I think I need is (and I'd prefer to do it with the tidyverse):
1. Group by predictor/transformation.
2. Get the density for each group.
3. Transform the density's x to (x - xmin)/(xmax - xmin) so that it lies between 0 and 1.
4. Plot the transformed density$x against density$y.
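Since the OP prefers the tidyverse, the four steps above can be sketched with dplyr's group_modify(). The data here is a hypothetical stand-in for dat (caret's BloodBrain data is not loaded), and dividing y by its maximum is an added assumption that mimics ..scaled.., since only the shape matters.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for `dat`: one predictor whose raw and transformed
# values live on very different x ranges
dat <- tibble(
  Predictor = "adistd",
  Transformation = rep(c("NA", "Yeo-Johnson"), each = 200),
  Value = c(rexp(200, rate = 1 / 50), rnorm(200, mean = 6, sd = 2))
)

dens <- dat %>%
  group_by(Predictor, Transformation) %>%            # step 1
  group_modify(function(g, key) {
    d <- density(g$Value)                             # step 2 (512 points by default)
    tibble(
      x = (d$x - min(d$x)) / (max(d$x) - min(d$x)),   # step 3: rescale to [0, 1]
      y = d$y / max(d$y)                              # optional: peak = 1, like ..scaled..
    )
  }) %>%
  ungroup()

p <- ggplot(dens, aes(x, y, colour = Transformation)) +  # step 4
  geom_line() +
  facet_wrap(~ Predictor)
p
```

With the real dat, the same pipeline yields one panel per predictor with both curves overlaid on a shared 0 to 1 x-axis.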
This solution standardizes each group with base::scale and then computes its density with stats::density. Because density() returns the same number of equally spaced points for every group, the point index can be rescaled to run from 0 to 1 (as the OP wants).
# How many points we want
nPoints <- 1e3
# Final result
res <- list()
# Using simple loop to scale and calculate density
combinations <- expand.grid(unique(dat$Predictor), unique(dat$Transformation))
for(i in 1:nrow(combinations)) {
# Subset data
foo <- subset(dat, Predictor == combinations$Var1[i] & Transformation == combinations$Var2[i])
# Perform density on scaled signal
densRes <- density(x = scale(foo$Value), n = nPoints)
# Position signal from 1 to wanted number of points
res[[i]] <- data.frame(x = 1:nPoints, y = densRes$y,
pred = combinations$Var1[i], trans = combinations$Var2[i])
}
res <- do.call(rbind, res)
ggplot(res, aes(x / nPoints, y, color = trans, linetype = trans)) +
geom_line(alpha = 0.5, linewidth = 1) +
facet_wrap(~ pred, scales = "free")
I am new to R and have been trying for a few days to plot a histogram / bar chart to view the trend. I have a categorical variable, countryx, which I coded as 1, 2, 3.
I have tried the two scripts below, with the following results:
Output 1: a blank chart with x and y axes but no bars
qplot(DI$countryx,geom = "histogram",ylab = "count",
xlab = "countryx",binwidth=5,colour=I("blue"),fill=I("wheat"))
Output 2: error message: ggplot2 doesn't know how to deal with data of class integer
ggplot(DI$countryX, aes(x=countryx))
+ geom_bar(aes(y=count), stat = "count",position ="stack",...,
width =5,aes=true)
I'd appreciate any advice.
Thank you very much for your help!
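There are multiple problems with your code. ggplot() takes a data frame, not a vector, but you're supplying a vector. Also, the + that continues a ggplot call must go at the end of a line, not at the start of the next one. Assuming DI already has a count column per country, try this (leaving the bar width at its default):
ggplot(DI, aes(x = countryx, y = count)) + geom_col()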
As @yeedle mentioned, you need a data.frame (as.data.frame() can convert one).
How about:
library(ggplot2)
df <- data.frame(countryx = 1:3, count = rbinom(3, 10, 0.3))
p <- ggplot(df, aes(x = countryx, y = count)) + ylab("count")
p + geom_col(aes(fill = factor(countryx)))
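If DI instead holds one row per observation (no count column), geom_bar() can do the counting itself through stat_count(). A minimal sketch, assuming a hypothetical DI with a single integer column countryx coded 1, 2, 3; factor() makes the codes discrete so each gets its own bar.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for DI: one row per respondent, countryx coded 1, 2, 3
DI <- data.frame(countryx = sample(1:3, size = 50, replace = TRUE))

p <- ggplot(DI, aes(x = factor(countryx))) +
  geom_bar() +                       # bar height = number of rows per code
  labs(x = "countryx", y = "count")
p

# the counts that stat_count() computed for each bar
counts <- ggplot_build(p)$data[[1]]$count
```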
So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:
require(reshape2)
require(ggplot2)
require(scales)
require(grid)
require(gridExtra)
df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))
d <- ggplot(data = df, aes(x = year, y = value)) +
geom_boxplot(aes(fill = station)) +
facet_grid(station~.) +
scale_y_continuous(limits = c(0, 15)) +
theme(legend.position = "none")
d
However, when you dig a little deeper, problems creep in that freak me out. When I labeled the boxplot medians with their values, the following plot results.
df.m <- aggregate(value ~ year + station, data = df, FUN = median)
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value))
d
The medians plotted by geom_boxplot aren't at the medians at all. The labels are plotted at the correct y-axis value, but the middle hinges of the boxplots are definitely not at the medians. I've been stumped by this for a few days now.
What is the reason for this? How can this type of display be produced with correct medians? How can this plot be debugged or diagnosed?
The solution to this question lies in how scale_y_continuous is applied. ggplot2 performs operations in the following order:
1. Scale transformations
2. Statistical computations
3. Coordinate transformations
In this case, setting limits on the y scale turns every value outside 0 to 15 into NA, and those rows are removed before the boxplot statistics are computed, so the hinges are calculated only from the data inside the limits. The medians calculated by aggregate() and used in geom_text(), however, come from the entire dataset. That is why the middle hinges and the text labels disagree.
The solution is to omit the scale_y_continuous instruction and instead use:
d <- ggplot(data = df, aes(x = year, y = value)) +
geom_boxplot(aes(fill = station)) +
facet_grid(station~.) +
theme(legend.position = "none") +
coord_cartesian(ylim = c(0, 15))
This lets ggplot2 calculate the boxplot hinge statistics from the entire dataset, while coord_cartesian() only zooms the visible y range to 0 to 15.
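To diagnose this kind of mismatch, ggplot_build() exposes the statistics ggplot2 actually computed; for geom_boxplot the middle column is the plotted median. A minimal sketch on hypothetical skewed data (many values above 15, not the OP's CSV) comparing scale limits with coord_cartesian():

```r
library(ggplot2)

set.seed(1)
# Hypothetical skewed data: roughly a third of the values lie above 15
df <- data.frame(g = "a", value = rexp(500, rate = 1 / 12))

p_limits <- ggplot(df, aes(g, value)) + geom_boxplot() +
  scale_y_continuous(limits = c(0, 15))   # drops values > 15 before the stat
p_coord <- ggplot(df, aes(g, value)) + geom_boxplot() +
  coord_cartesian(ylim = c(0, 15))        # only zooms; the stat sees all data

# 'middle' is the median drawn as the box's centre line
med_limits <- ggplot_build(p_limits)$data[[1]]$middle
med_coord  <- ggplot_build(p_coord)$data[[1]]$middle
c(true = median(df$value), limits = med_limits, coord = med_coord)
```

Building p_limits also prints the "Removed ... rows containing non-finite values" warning, which is the hint that data is being dropped before the statistics are computed.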
I have a large matrix mdat (1000 rows and 16 columns) whose first column is the x variable and whose other columns are y variables. I want to make scatter plots in R with all 15 figures in the same window. For example:
mdat <- matrix(c(1:50), nrow = 10, ncol=5)
In the above matrix, I have 10 rows and 5 columns. Is it possible to use the first column as the x-axis variable and each of the other columns as a y-axis variable, so that I get four different scatter plots in the same window? I would prefer not to use par(mfrow=...), because then I have to run each graph separately to put them in the same window. What I need is a package where I just give it the data and the x and y variables and get all the graphs in the same window.
Is there some package available that can do this? I cannot find one.
Perhaps the simplest base R way is mfrow (or mfcol)
par(mfrow = c(2, 2)) ## the window will have 2 rows and 2 columns of plots
for (i in 2:ncol(mdat)) plot(mdat[, 1], mdat[, i])
See ?par for everything you might want to know about further adjustments.
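To get the "give it the data and get all the plots" behaviour the OP asks for without a package, the mfrow loop can be wrapped in a small function. A minimal sketch: plot_against_first() is a hypothetical helper, the matrix is random stand-in data, and grDevices::n2mfrow() is used to pick a grid large enough for all panels.

```r
# Hypothetical helper: scatter plot of every column of m against column 1
plot_against_first <- function(m, ...) {
  n <- ncol(m) - 1
  # n2mfrow() picks a rows x columns grid large enough for n panels
  old <- par(mfrow = grDevices::n2mfrow(n), mar = c(4, 4, 1, 1))
  on.exit(par(old))  # restore the previous layout when done
  for (j in 2:ncol(m)) {
    plot(m[, 1], m[, j], xlab = "column 1", ylab = paste("column", j), ...)
  }
  invisible(n)
}

set.seed(1)
mdat <- matrix(rnorm(1000 * 16), nrow = 1000, ncol = 16)  # hypothetical data
n_panels <- plot_against_first(mdat, pch = ".")
```

Extra arguments such as pch or col are passed through to plot(), and on.exit() resets par() so later plots are not stuck in the grid.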
Another good option in base R is layout (the help has some nice examples). To be fancy and pretty, you could use the ggplot2 package, but you'll need to reshape your data into a long format.
require(ggplot2)
require(reshape2)
molten <- melt(as.data.frame(mdat), id.vars = "V1")
ggplot(molten, aes(x = V1, y = value)) +
facet_wrap(~ variable, nrow = 2) +
geom_point()
Alternatively with colors instead of facets:
ggplot(molten, aes(x = V1, y = value, color = variable)) +
geom_point()
@user4299 You can rewrite shujaa's ggplot command in this form using qplot ('quick plot'), which is easier when starting out. Then, instead of faceting, use variable to drive the color. The first command produces essentially the same plot as shujaa's answer (with the facets in a single row), and the second puts all the points on one plot with different colors and a legend.
qplot(data = molten, x = V1, y = value, facets = . ~ variable, geom = "point")
qplot(data = molten, x = V1, y = value, color = variable, geom = "point")
Maybe
library(lattice)
x = mdat[,1]; y = mdat[,-1]
df = data.frame(X = x, Y = as.vector(y),
Grp = factor(rep(seq_len(ncol(y)), each=length(x))))
xyplot(Y ~ X | Grp, df)