I am using dotplot to do an analysis with two y variables per x variable. I'd like to arrange the chart so that it descends by one of the y variables. I used the reorder() function in the aes() and it reorders it slightly, but not entirely. Chart 1 is what it looks like before, and chart 2 is what it looks like after I use reorder().
Chart 1:
Chart 2:
Here's the code:
answers %>%
ggplot(aes(x = reorder(locale, -percent) , y = percent, fill = box)) +
geom_dotplot(binaxis='y',
stackdir='center',
dotsize = 1,
binwidth = 0.01) +
geom_errorbar(aes(ymin = ci_lo, ymax = ci_hi), width = .5, position = position_dodge(0))
And this is what the "answers" df looks like. The two variables being plotted per locale are in the "box" column - there's a top_box and bottom_box row for each locale:
As pointed out in the comments, you did not provide any data, but I think I have an idea of where you're going wrong.
Here is some example data. I'm going to use a modified mtcars for the example where we will look at the min and max weight of the cars by make.
library(tidyverse)
df <- mtcars %>% rownames_to_column() %>%
select(car = rowname, wt) %>%
mutate(car = gsub("\\s.*?$", "", car)) %>%
group_by(car) %>%
mutate(n = n()) %>%
filter(n > 1) %>%
arrange(car,wt) %>%
filter(row_number() == max(row_number()) | row_number() == min(row_number())) %>%
select(-n) %>%
ungroup() %>%
mutate(stat = rep(c("min", "max"), nrow(.)/2)) %>%
spread(stat, wt)
print(df)
# car max min
# Fiat 2.2 1.94
# Hornet 3.44 3.22
# Mazda 2.88 2.62
# Merc 4.07 3.15
# Toyota 2.46 1.84
Here is what the plot for that would look like:
df %>%
ggplot(aes(x = car))+
geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
Now let's talk about what you're trying to do. You say that you would like to sort in descending order by one of your variables. To order by max:
df %>%
arrange(-max)%>%
mutate(car = factor(car, levels = car))%>%
ggplot(aes(x = car))+
geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
or
df %>%
arrange(-min)%>%
mutate(car = factor(car, levels = car))%>%
ggplot(aes(x = car))+
geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
I think the key here is that you want to arrange the data and then set the factor levels to get the desired output. If your data is not a factor, then ggplot will use alphabetical order. You may need to spread your data in order to use the exact method outlined above.
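One possible explanation for the partial reordering in the question: reorder() summarises its second argument per level with mean() by default, so with two rows per locale the order follows the average of both boxes rather than one of them. Below is a minimal base-R sketch of the difference. The data frame is a made-up stand-in for the asker's answers data, and its values are hypothetical.

```r
# Hypothetical stand-in for the asker's `answers` data: two rows per locale
answers <- data.frame(
  locale  = rep(c("a", "b", "c"), each = 2),
  box     = rep(c("top_box", "bottom_box"), times = 3),
  percent = c(0.50, 0.10, 0.70, 0.05, 0.60, 0.40)
)

# Default reorder(): levels follow the mean of both boxes per locale
mean_order <- levels(reorder(answers$locale, -answers$percent))

# Order by top_box only: sort that subset, then fix the factor levels
top <- answers[answers$box == "top_box", ]
top <- top[order(-top$percent), ]
answers$locale <- factor(answers$locale, levels = as.character(top$locale))
top_order <- levels(answers$locale)

print(mean_order)
print(top_order)
```

The two orders differ for this toy data, which matches the "slightly, but not entirely" symptom described in the question.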
Update
You could do this without spreading your data, by arranging with two variables.
Here we will modify the data above to long format
df2 <- df %>% gather(measure, value, -car)
Which plots like this
df2 %>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
and then we can arrange without spreading
df2 %>%
arrange(-value, measure) %>%
mutate(car = factor(car, levels = unique(car)))%>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
or for descending by min
df2 %>%
arrange(desc(measure), -value) %>%
mutate(car = factor(car, levels = unique(car)))%>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
I would like to make plots with three colors per graph, so for my sample data I will need two graphs. Also, is it possible to plot only the top three colors with the most observations? What should I do? Currently I have six in one graph. This is just sample data; my real data has about 50 levels, and my code can't produce anything readable at that size. It is too crowded.
The code is:
library(dplyr)
library(ggplot2)
ID<- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18")
Group<-c("A","B","C","D","D","D","A","B","D","C","B","D","A","A","C","B","B","B")
Color<-c("Green","Blue","Red","Red","Black","Yellow","Green","Green","Yellow","Purple","Red","Yellow","Yellow","Yellow","Green","Red","Red","Green")
Realy_Love<-c("Y","N","Y","Y","N","N","Y","Y","Y","N","N","Y","N","Y","N","Y","N","Y")
Sample.data <- data.frame(ID, Group, Color, Realy_Love)
Sample<-Sample.data %>%
count(Group, Color, sort = TRUE)
Sample<-Sample.data %>%
count(Group, Color, Realy_Love, sort = TRUE)
Sample.data %>%
count(Group, Color, sort = TRUE) %>%
ggplot(aes(x = Group, y = n, fill = Color)) +
geom_col() +
facet_wrap(~ Color)
Thanks.
For your first question (making two graphs with three colors each), you need to create a variable that assigns each color to a facet. We will use the case_when() function for this:
Sample.data %>%
count(Group, Color, sort = TRUE) %>%
mutate(Facet = case_when(Color %in% c("Black", "Blue", "Green") ~ "Group 1",
Color %in% c("Purple", "Red", "Yellow") ~ "Group 2")) %>%
ggplot(aes(x = Group, y = n, fill = Color)) +
geom_col() +
facet_wrap(~ Facet)
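With about 50 levels, writing every case_when() branch by hand gets tedious. As a rough sketch, the facet label could instead be computed from each color's position in a sorted list. The colors_seen vector and the facet_lookup name are hypothetical, introduced only for this illustration.

```r
library(dplyr)

# Hypothetical stand-in for the distinct values of the Color column
colors_seen <- c("Green", "Blue", "Red", "Black", "Yellow", "Purple")

# One row per color in alphabetical order, three colors per facet
facet_lookup <- tibble(Color = sort(colors_seen)) %>%
  mutate(Facet = paste("Group", ceiling(row_number() / 3)))

print(facet_lookup)
```

The lookup could then be attached to the counted data with left_join(facet_lookup, by = "Color") before the ggplot() call, in place of the case_when() step.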
For your second request, about plotting only the 3 colors with the most observations, we can use the fct_lump() function from the forcats package (load it with library(forcats) or library(tidyverse)):
Sample.data %>%
mutate(Color = fct_lump(f = Color,
n = 3,
other_level = "Other colors")) %>%
filter(Color != "Other colors") %>%
count(Group, Color, sort = TRUE) %>%
ggplot(aes(x = Group, y = n, fill = Color)) +
geom_col() +
facet_wrap(~ Color)
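If you would rather not turn Color into a factor, a similar result can be sketched with dplyr alone: count the colors, keep the three most frequent, and filter on them. The obs and top_colors names are assumptions made for this example, and with_ties = FALSE is a choice to return exactly three colors even when counts are tied.

```r
library(dplyr)

# Same colors as in the question, kept in a one-column tibble
obs <- tibble(Color = c("Green", "Blue", "Red", "Red", "Black", "Yellow",
                        "Green", "Green", "Yellow", "Purple", "Red", "Yellow",
                        "Yellow", "Yellow", "Green", "Red", "Red", "Green"))

# Names of the three most frequent colors
top_colors <- obs %>%
  count(Color, sort = TRUE) %>%
  slice_max(order_by = n, n = 3, with_ties = FALSE) %>%
  pull(Color)

# Keep only the rows that belong to those colors
top_obs <- obs %>% filter(Color %in% top_colors)

print(top_colors)
```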
I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, TRUE)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()
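We may add one extra line
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
before plotting. The ungroup() matters because df is still grouped by year after summarise(), and the reordering has to see all years at once. The full pipeline is
df %>%
ungroup() %>%
mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()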
In this way we look at the last values of mean and reorder variable accordingly.
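To see what the reordering does, here is a small sketch on made-up numbers. The toy data frame is hypothetical, and it assumes the rows are already sorted by year so that tail(n = 1) picks the most recent value for each variable.

```r
library(forcats)

# Hypothetical data: two years, three variables, rows sorted by year
toy <- data.frame(
  year     = rep(c(2000, 2010), each = 3),
  variable = rep(c("x", "y", "z"), times = 2),
  mean     = c(0.9, 0.5, 0.1, 0.2, 0.8, 0.4)
)

# Last value per variable: x = 0.2, y = 0.8, z = 0.4
new_levels <- levels(
  fct_reorder(toy$variable, toy$mean, .fun = tail, n = 1, .desc = TRUE)
)

print(new_levels)
```

The levels come out from the highest to the lowest final value, which is the order the legend then follows.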
There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it's not advisable in your case, you can order the legend by the first (earliest) values in your plot with
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html).
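As a quick illustration of the two helpers, both take an x vector and a y vector and return the y value found at the smallest or largest x. The years and means vectors below are made up for the example.

```r
library(forcats)

# Hypothetical, deliberately unsorted inputs
years <- c(1950, 2010, 1900)
means <- c(0.5, 0.9, 0.1)

at_latest   <- last2(years, means)   # value of `means` at the largest year
at_earliest <- first2(years, means)  # value of `means` at the smallest year

print(c(latest = at_latest, earliest = at_earliest))
```

fct_reorder2() applies the chosen helper within each level and sorts the levels in descending order by default, which is why the legend lines up with the right-hand end of the lines.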
I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies in the data so that it counts across the row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I'm only able to figure out how to graph the species of one taxa but not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made in Google Sheets, so they don't look like ggplot output, but it would be easier to make them in R with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
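The reshaping step in both plots uses gather(). In current versions of tidyr the same wide-to-long step can be written with pivot_longer(). Here is a sketch for the per-index case, where the long name is an assumption made for this example.

```r
library(dplyr)
library(tidyr)

dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

# One row per (index, taxa) pair, plus each taxa's share within its index
long <- dataf %>%
  pivot_longer(cols = taxa1:taxa3, names_to = "taxa", values_to = "value") %>%
  group_by(index) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()

print(long)
```

Grouping by group instead of index before computing prop would give the data for the second plot.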
You need to reshape the data before doing that.
library(reshape2)
library(ggplot2)
dfm <- melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
# apply the complete theme first so it does not reset the axis-text tweak
theme_gray() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))
How do you dodge a ggstance::geom_linerangeh in ggplot2?
library(tidyverse)
library(ggstance)
mtcars %>%
group_by(cyl, am) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ggplot() +
geom_linerangeh(aes(y = am %>%
factor,
xmin = lo,
xmax = hi,
group = am %>%
factor),
position = position_dodgev(height = .25)) +
facet_wrap(~cyl, ncol = 1)
results in:
whereas I would like to see the lines sitting slightly below the horizontals, consistent with the standard behaviour of position_dodge elsewhere.
To get dodging, you need to map colour or linetype to another variable that splits am into sub-categories based on that third variable; otherwise there's only one category for each level of am and therefore nothing to dodge.
For example, let's use vs as that other variable and we'll map it to color. We also add rows (using complete) for missing combinations of am,vs, and cyl to ensure that dodging occurs even for combinations of cyl and am where only one level of vs is present in the data.
library(tidyr)
mtcars %>%
group_by(vs=factor(vs), cyl=factor(cyl), am=factor(am)) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ungroup() %>%
complete(am, cyl, nesting(vs)) %>%
ggplot() +
geom_linerangeh(aes(y = am, colour=vs, xmin = lo, xmax = hi),
position = position_dodgev(height = 0.5)) +
facet_wrap(~cyl, ncol = 1) +
theme_bw()
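To see what the complete() step contributes, the summary can be inspected before and after it. This sketch reuses the same grouping as above; the ranges and filled names are assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Range of mpg for every (vs, cyl, am) combination present in mtcars
ranges <- mtcars %>%
  group_by(vs = factor(vs), cyl = factor(cyl), am = factor(am)) %>%
  summarize(lo = min(mpg), hi = max(mpg), .groups = "drop")

# Add rows for the combinations that never occur; lo and hi are NA there
filled <- ranges %>% complete(am, cyl, nesting(vs))

print(nrow(ranges))
print(nrow(filled))
print(sum(is.na(filled$lo)))
```

The added rows carry NA for lo and hi, so nothing is drawn for them, but they keep a slot for each level of vs so that the dodge offsets stay consistent across panels.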
I have a dodged bar chart generated from the mtcars dataset showing a histogram of gear vs cyl. I'd like to mark a particular make of car on the chart; the attached chart illustrates this better. The code is doing what it should (points are placed in line with the x-axis label cyl), but I'd like the points to align with the correct bar showing the number of gears instead. Any ideas, please?
library(tidyverse)
carsraw <- mtcars
cars <- mtcars %>%
select(cyl,gear) %>%
mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
group_by(cyl,gear) %>%
tally() %>%
rename(count=n)
Hornet <- mtcars %>%
tibble::rownames_to_column("model") %>%
filter(model %in% c("Hornet 4 Drive","Hornet Sportabout")) %>%
mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
mutate(count=7) %>%
select(model,cyl,gear,count)
(ggplot(data=cars)+
aes(x=cyl,y=count,fill=gear) +
geom_bar( position = "dodge", stat="identity")+
geom_point(data=Hornet,size=5)+
aes(x=cyl,y=count,fill=gear))
You could try using position_nudge in this particular case, although it is a bit of a hack:
(ggplot(data=cars)+
aes(x=cyl,y=count,fill=gear) +
geom_bar( position = "dodge", stat="identity")+
geom_point(data=Hornet,size=5, position = position_nudge(-0.25))+
aes(x=cyl,y=count,fill=gear))
Edit
Given the comments by the OP, here is an alternative that changes the preparation of the data a little. Inside summarise(), a Hornet column records whether the model is present in each group; it is NA for (cyl, gear) pairs that don't include the specific model. Additionally, you have to use group = gear and position = position_dodge(1):
library(tidyverse)
cars <- mtcars %>%
rownames_to_column("model") %>%
mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
group_by(cyl,gear) %>%
summarise(
count = n(),
Hornet = ifelse(
any(model %in% c("Hornet 4 Drive","Hornet Sportabout")),
7,
NA
)
) %>% ungroup()
cars %>% ggplot() +
geom_col(
aes(
x = cyl,
y = count,
fill = gear
),
position = "dodge"
) +
geom_point(
aes(
x = cyl,
y = Hornet,
group = gear
),
na.rm = TRUE,
position = position_dodge(1),
size = 5
)
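A possible refinement, offered as an assumption rather than a tested fix for the OP's chart: bars drawn with geom_col() have a default width of 0.9, so giving the points the same dodge width should put them on the bar centres. The sketch below shares one position object between both layers; the dodge and p names are hypothetical.

```r
library(dplyr)
library(tibble)
library(ggplot2)

cars <- mtcars %>%
  rownames_to_column("model") %>%
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%
  group_by(cyl, gear) %>%
  summarise(
    count = n(),
    # 7 is the arbitrary marker height used in the question
    Hornet = ifelse(any(model %in% c("Hornet 4 Drive", "Hornet Sportabout")), 7, NA),
    .groups = "drop"
  )

# Assumption: 0.9 matches the default bar width of geom_col()
dodge <- position_dodge(width = 0.9)

p <- ggplot(cars, aes(x = cyl, group = gear)) +
  geom_col(aes(y = count, fill = gear), position = dodge) +
  geom_point(aes(y = Hornet), position = dodge, size = 5, na.rm = TRUE)

print(p)
```

Using a single position object for both layers keeps the two offsets in step if the width is changed later.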