This an example of my data
library(ggplot)
set.seed(1)
df <- data.frame(Groups = factor(rep(1:10, each = 10)))
x <- sample(1:100, 50)
df[x, "Style"] <- "Lame"
df[-x, "Style"] <- "Cool"
df$Style <- factor(df$Style)
p <- ggplot() + stat_summary(data = df, aes(Groups, Style, fill = Style),
geom = "bar", fun.y = length, position=position_dodge())
(Sorry, this is my first question... I don't know how to present code snippets like head(df) or the actual plot in SO. Please run this code to understand my question.)
So the plot adequately presents the count of every 'Style' per 'Groups'. However, the y axis scale shows the levels of the factor variable 'Style'. Although values I am plotting are originally discrete, the count of every 'Cool' and 'Lame' per 'Groups' is continuous.
How do I change the 'y' scale of my barplot from discrete to continuous in ggplot2, in order to correspond to the count values and not the original factor levels???
You can take advantage of ggplot grouping and the histogram to do this for you
p <- ggplot(df, aes(Groups, fill=Style)) + geom_histogram(position=position_dodge())
Related
I have a question about using geom_segment in R ggplot2.
For example, I have three facets and two clusters of points(points which have the same y values) in each facets, how do I draw multiple vertical line segments for each clustering with geom_segment?
Like if my data is
x <- (1:24)
y <- (rep(1,2),2,rep(2,2),1,rep(3,2),4, rep(4,1),5,6, ..rep(8,2),7)
facets <-(1,2,3)
factors <-(1,2,3,4,5,6)
xmean <- ( (1+2+3)/3, (4+5+6)/3, ..., (22+23+24)/3)
Note: (1+2+3)/3 is the mean first cluster in the first facet and (4+5+6)/3 is the mean second cluster in the second facet and (7+8+9)/3 is the first cluster in the second facet.
My Code:
ggplot(,aes(x=as.numeric(x),y=as.numeric(y),color=factors)+geom_point(alpha=0.85,size=1.85)+facet_grid(~facets)
+geom_segment(what should I put here to draw this line in different factors?)
Desired result:
Please see the picture!
Please see the updated picture!
Thank you so much! Have a nice day :).
Maybe this is what you are looking for. Instead of working with vectors put your data in a dataframe. Doing so you could easily make an aggregated dataframe with the mean values per facet and cluster which makes it easy to the segments:
Note: Wasn't sure about the setup of your data. You talk about two clusters per facet but your data has 8. So I slightly changed the example data.
library(ggplot2)
library(dplyr)
df <- data.frame(
x = 1:24,
y = rep(1:6, each = 4),
facets = rep(1:3, each = 8)
)
df_sum <- df %>%
group_by(facets, y) %>%
summarise(x = mean(x))
#> `summarise()` has grouped output by 'facets'. You can override using the `.groups` argument.
ggplot(df, aes(x, y, color = factor(y))) +
geom_point(alpha = 0.85, size = 1.85) +
geom_segment(data = df_sum, aes(x = x, xend = x, y = y - .25, yend = y + .25), color = "black") +
facet_wrap(~facets)
I would like to create a sub grouping in a ggplot2 (geom_point), meaning that I would like to shift discrete x values slightly according to a subgroup (see Figure).
I could achieve that by changing the discrete values to continuous and add a subgroup dependent shift value (see Fig.B), and than manually adjust the x labels. But I thought there is probably a more elegant way which deals with spacing and labeling issues.Below is a minimal example which hopefully describes what I mean.
library(ggplot2)
set.seed(1)
df <- data.frame(
ID = rep(seq(1,8),2),
group = rep(LETTERS[1:4],4),
subgroup = c(rep("a",8),rep("b",8)),
value = runif(16)
)
df$xpos <- as.numeric(df$group)+(as.numeric(df$subgroup)/4)
ggplot(data=df, aes(x=group, y= value, color=subgroup))+
geom_point()+
ggtitle("How it is")
ggplot(data=df, aes(x=xpos, y= value, color=subgroup))+
geom_point() +
ggtitle("How I would like it (without adjusted xAxes Labels)")
We can use position_dodge:
library(ggplot2)
ggplot(data=df, aes(x=group, y= value, color=subgroup))+
geom_point(position=position_dodge(width=0.5))+
ggtitle("How it is")
Data
set.seed(1)
df <- data.frame(
ID = rep(seq(1,8),2),
group = rep(LETTERS[1:4],4),
subgroup = c(rep("a",8),rep("b",8)),
value = runif(16)
)
The grouping variable for creating a geom_violin() plot in ggplot2 is expected to be discrete for obvious reasons. However my discrete values are numbers, and I would like to show them on a continuous scale so that I can overlay a continuous function of those numbers on top of the violins. Toy example:
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df) + geom_violin(aes(x=factor(x), y=y))
This works as you'd imagine: violins with their x axis values (equally spaced) labelled 1, 2, and 5, with their means at y=1,2,5 respectively. I want to overlay a continuous function such as y=x, passing through the means. Is that possible? Adding + scale_x_continuous() predictably gives Error: Discrete value supplied to continuous scale. A solution would presumably spread the violins horizontally by the numeric x values, i.e. three times the spacing between 2 and 5 as between 1 and 2, but that is not the only thing I'm trying to achieve - overlaying a continuous function is the key issue.
If this isn't possible, alternative visualisation suggestions are welcome. I know I could replace violins with a simple scatter plot to give a rough sense of density as a function of y for a given x.
The functionality to plot violin plots on a continuous scale is directly built into ggplot.
The key is to keep the original continuous variable (instead of transforming it into a factor variable) and specify how to group it within the aesthetic mapping of the geom_violin() object. The width of the groups can be modified with the cut_width argument, depending on the data at hand.
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'lm')
By using this approach, all geoms for continuous data and their varying functionalities can be combined with the violin plots, e.g. we could easily replace the line with a loess curve and add a scatter plot of the points.
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'loess') +
geom_point()
More examples can be found in the ggplot helpfile for violin plots.
Try this. As you already guessed, spreading the violins by numeric values is the key to the solution. To this end I expand the df to include all x values in the interval min(x) to max(x) and use scale_x_discrete(drop = FALSE) so that all values are displayed.
Note: Thanks #ChrisW for the more general example of my approach.
library(tidyverse)
set.seed(42)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T), y = rnorm(1000, mean = x^2))
# y = x^2
# add missing x values
x.range <- seq(from=min(df$x), to=max(df$x))
df <- df %>% right_join(tibble(x = x.range))
#> Joining, by = "x"
# Whatever the desired continuous function is:
df.fit <- tibble(x = x.range, y=x^2) %>%
mutate(x = factor(x))
ggplot() +
geom_violin(data=df, aes(x = factor(x, levels = 1:5), y=y)) +
geom_line(data=df.fit, aes(x, y, group=1), color = "red") +
scale_x_discrete(drop = FALSE)
#> Warning: Removed 2 rows containing non-finite values (stat_ydensity).
Created on 2020-06-11 by the reprex package (v0.3.0)
Lets say I have a data frame :
df <- data.frame(x = c("A","B","C"), y = c(10,20,30))
and I wish to plot it with ggplot2 such that I get a plot like a histogram ( where instead of plotting count I plot my y column values from the data frame. ( I don't mind if the x column is a factor column or a character column.
I will add that I know how to reorder a bar chart by descending/ascending, but ordering like a histogram (highest values in the middle- around the mean and decreasing to both sides) is still beyond me.
I thought of transmuting the data such that I can fit it in a histogram - like creating a vector with 10 "A"objects, 20 "B" and 30 "C" and then running a histogram on that. But its not practical for what I'm trying to do as it seems like a lazy and highly inefficient way to do it. Also the df data frame is huge as it is- so multiplying by millions etc is not going to be kind on my system.
This seems like a strange thing to want to do, since if the ordering is not already implicit in your x variables, then ordering as a bell curve is at best artificial. However, it's fairly trivial to implement if you really want to...
library(ggplot2)
df <- data.frame(yvals = floor(abs(rnorm(26)) * 100),
xvals = LETTERS,
stringsAsFactors = FALSE)
ggplot(data = df, aes(x = xvals, y = yvals)) + geom_bar(stat = "identity")
ordered <- order(df$yvals)
left_half <- ordered[seq(1, length(ordered), 2)]
right_half <- rev(ordered[seq(2, length(ordered), 2)])
new_order <- c(left_half, right_half)
df2 <- df[new_order,]
df2$xvals <- factor(df2$xvals, levels = df2$xvals)
ggplot(data = df2, aes(x = xvals, y = yvals)) + geom_bar(stat = "identity")
I have 40 subjects, of two groups, over 15 weeks, with some measured variable (Y).
I wish to have a plot where: x = time, y = T, lines are by subjects and colours by groups.
I found it can be done like this:
TIME <- paste("week",5:20)
ID <- 1:40
GROUP <- sample(c("a","b"),length(ID), replace = T)
group.id <- data.frame(GROUP, ID)
a <- expand.grid(TIME, ID)
colnames(a) <-c("TIME", "ID")
group.id.time <- merge(a, group.id)
Y <- rnorm(dim(group.id.time)[1], mean = ifelse(group.id.time$GROUP =="a",1,3) )
DATA <- cbind(group.id.time, Y)
qplot(data = DATA,
x=TIME, y=Y,
group=ID,
geom = c("line"),colour = GROUP)
But now I wish to add to the plot something to show the difference between the two groups (for example, a trend line for each group, with some CI shadelines) - how can it be done?
I remember once seeing the ggplot2 can (easily) do this with geom_smooth, but I am missing something about how to make it work.
Also, I wondered at maybe having the lines be like a boxplot for each group (with a line for the different quantiles and fences and so on). But I imagine answering the first question would help me resolve the second.
Thanks.
p <- ggplot(data=DATA, aes(x=TIME, y=Y, group=ID)) +
geom_line(aes(colour=GROUP)) +
geom_smooth(aes(group=GROUP))
geom_smooth plot http://img143.imageshack.us/img143/7678/geomsmooth.png