Number of geom_point points matches the value - R

I have an existing ggplot with geom_col and some observations from a dataframe. The dataframe looks something like :
over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0
The geom_col represents the runs data column and now I want to represent the wickets column using geom_point in a way that the number of points represents the wickets.
I want my graph to look something like this :

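As far as I know, we'll need to transform your data to have one row per point. This method requires dplyr version 1.0.0 or later, which allows summarize to return more than one row per group. (From dplyr 1.1.0 onward, reframe() is the recommended function for this, and summarize warns when it returns multiple rows per group.)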
You can adjust the spacing of the wickets by multiplying seq(wickets), though with your sample data a spacing of 1 unit looks pretty good to me.
library(dplyr)
library(ggplot2)
wicket_data = dd %>%
filter(wickets > 0) %>%
group_by(over) %>%
summarize(wicket_y = runs + seq(wickets))
ggplot(dd, aes(x = over)) +
geom_col(aes(y = runs), fill = "#A6C6FF") +
geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
theme_bw()
Using this sample data:
dd = read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)
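As an alternative, here is a minimal base R sketch of the same "one row per point" transformation that does not rely on summarize returning several rows per group. It assumes the sample data above; the names idx and wicket_data are only illustrative.

```r
# Sample data from the answer above
dd <- read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)

# Repeat each row index once per wicket (rows with 0 wickets drop out)
idx <- rep(seq_len(nrow(dd)), times = dd$wickets)

# sequence() yields 1..k for each over with k wickets, giving the vertical offset
wicket_data <- data.frame(
  over = dd$over[idx],
  wicket_y = dd$runs[idx] + sequence(dd$wickets)
)
print(wicket_data)
```

The resulting wicket_data has the same columns as in the dplyr version, so it can be passed to geom_point(data = wicket_data, aes(y = wicket_y)) in the same way.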

Related

ggplot2 geom_bar position failure

I am using the ..count.. transformation in geom_bar and get the warning
position_stack requires non-overlapping x intervals when some of my categories have few counts.
This is best explained using some mock data (my data involves direction and windspeed and I retain names relating to that)
#make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20 #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions
#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)
# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()
This works fine, and the resulting plot shows the frequency of directions grouped according to speed. It is of relevance that the velocity class with the fewest counts (here "[40,60)") will have 5 counts.
However more velocity classes leads to a warning. For instance, with
FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)
the velocity class with the fewest counts (now "[45,60)") will have only 3 counts and ggplot2 will warn that
position_stack requires non-overlapping x intervals
and the plot will show data in this category spread out along the x axis.
It seems that 5 is the minimum size for a group to have for this to work correctly.
I would appreciate knowing if this is a feature or a bug in stat_bin (which geom_bar is using) or if I am simply abusing geom_bar.
Also, any suggestions how to get around this would be appreciated.
Sincerely
This occurs because df$dir is numeric, so ggplot assumes a continuous x-axis, and the group aesthetic is based on the only known discrete variable (fill = grp).
As a result, when there simply aren't that many dir values in grp = [45,60), ggplot gets confused over how wide each bar should be. This becomes more visually obvious if we split the plot into different facets:
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar() +
facet_wrap(~ grp)
> for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
[1] 1 2 3 4 6 7 8 9 10
[1] 1 2 3 4 5 6 7 8 9 10
[1] 2 3 4 5 7 9 10
[1] 2 4 7
We can also check manually that the minimum difference between sorted df$dir values is 1 for the first three grp values, but 2 for the last one. The default bar width is thus wider.
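The manual check described above can be sketched in base R as follows. The toy data and the helper name min_gap are hypothetical, chosen so that one group only occurs at x = 2, 4 and 7, as in the last group of the output above.

```r
# Smallest gap between the distinct x values of one group
min_gap <- function(x) min(diff(sort(unique(x))))

# Hypothetical toy data: group "b" only occurs at x = 2, 4, 7
toy <- data.frame(
  dir = c(1, 2, 3, 4, 2, 4, 7),
  grp = c("a", "a", "a", "a", "b", "b", "b")
)

# One minimum gap per group
gaps <- tapply(toy$dir, toy$grp, min_gap)
print(gaps)
```

A group whose minimum gap is larger than that of the others is the one whose bars would come out wider by default.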
The following solutions should all achieve the same result:
1. Explicitly specify the same bar width for all groups in geom_bar():
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar(width = 0.9)
2. Convert dir to a categorical variable before passing it to aes(x = ...):
ggplot(data=df,
aes(x=factor(dir), y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar()
3. Specify that the group parameter should be based on both df$dir & df$grp:
ggplot(data=df,
aes(x=dir,
y=(..count..)/sum(..count..),
group = interaction(dir, grp),
fill = grp)) +
geom_bar()
This doesn't directly solve the issue, because I also don't get what's going on with the overlapping values, but it's a dplyr-powered workaround, and might turn out to be more flexible anyway.
Instead of relying on geom_bar to take the cut factor and give you shares via ..count../sum(..count..), you can easily enough just calculate those shares yourself up front, and then plot your bars. I personally like having this type of control over my data and exactly what I'm plotting.
First, I put dir and FF into a data frame/tbl_df and cut FF. Then count groups the data by dir and grp and counts the observations for each combination of those two variables, after which I calculate the share of each n over the sum of n. I'm using geom_col, which is like geom_bar but takes the bar height from a y value that you supply in aes.
library(tidyverse)
set.seed(12345)
FF <- rweibull(100,1.7,1) * 20 #mock speeds
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE) # mock directions
shares <- tibble(dir = dir, FF = FF) %>%
mutate(grp = cut(FF, breaks = seq(0, 60, by = 15), ordered_result = TRUE, right = FALSE, drop = FALSE)) %>%
count(dir, grp) %>%
mutate(share = n / sum(n))
shares
#> # A tibble: 29 x 4
#> dir grp n share
#> <int> <ord> <int> <dbl>
#> 1 1 [0,15) 3 0.03
#> 2 1 [15,30) 2 0.02
#> 3 2 [0,15) 4 0.04
#> 4 2 [15,30) 3 0.03
#> 5 2 [30,45) 1 0.01
#> 6 2 [45,60) 1 0.01
#> 7 3 [0,15) 6 0.06
#> 8 3 [15,30) 1 0.01
#> 9 3 [30,45) 2 0.02
#> 10 4 [0,15) 6 0.06
#> # ... with 19 more rows
ggplot(shares, aes(x = dir, y = share, fill = grp)) +
geom_col()
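For readers without the tidyverse, the count-then-share step can be sketched in base R as well. The toy data below is hypothetical and only stands in for the mock directions and speed classes; like count, aggregate keeps only the combinations that actually occur.

```r
# Hypothetical toy data standing in for dir and the cut speed classes
toy <- data.frame(
  dir = c(1, 1, 2, 2, 2, 3),
  grp = c("[0,15)", "[15,30)", "[0,15)", "[0,15)", "[45,60)", "[0,15)")
)

# Count observations per (dir, grp) combination
toy$n <- 1
shares <- aggregate(n ~ dir + grp, data = toy, FUN = sum)

# Share of each combination over all observations
shares$share <- shares$n / sum(shares$n)
print(shares)
```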

R - reshaped data from wide to long format, now want to use created timevar as factor

I am working with longitudinal data and assess the utilization of a policy over 13 months. In order to get some barplots with the different months on my x-axis, I converted my data from wide format to long format.
So now, my dataset looks like this
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
I thought that after reshaping I could easily use my newly created "month" variable as a factor and plot some graphs. However, it does not work and tells me it's a list or an atomic vector. Transforming it into a factor did not work either, and I desperately need it as a factor.
Does anybody know how to turn it into a factor?
Thank you very much for your help!
EDIT.
The OP's graph code was posted in a comment. Here it is.
library(ggplot2)
ggplot(data, aes(x = hours, y = month)) + geom_density() + labs(title = 'Distribution of hours')
# Loading ggplot2
library(ggplot2)
# Placing example in dataframe
data <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
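# Converting month to factor (levels 1:13, since the question covers 13 months;
# with levels = 1:12 a 13th month would become NA)
data$month <- factor(data$month, levels = 1:13, labels = 1:13)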
# Plotting grouping by id
ggplot(data, aes(x = month, y = hours, group = id, color = factor(id))) + geom_line()
# Plotting hour density by month
ggplot(data, aes(hours, color = month)) + geom_density()
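To check that the conversion actually worked, a small base R sketch along these lines may help. It rebuilds the example data by hand (the name long is only illustrative) and also shows how to get the original month numbers back, since as.numeric() on a factor returns the internal level codes rather than the labels.

```r
# Example data from the question, typed in by hand
long <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  month = c(1, 2, 3, 1, 2, 3),
  hours = c(13, 16, 20, 0, 0, 10)
)

# Convert month to a factor and inspect it
long$month <- factor(long$month)
print(is.factor(long$month))
print(levels(long$month))

# Recover the original numbers: go through as.character() first
month_values <- as.numeric(as.character(long$month))
print(month_values)
```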
The problem seems to be in the aes. geom_density only needs an x value; if you think about it a little, y doesn't make sense. You want the density of the x values, so the vertical axis will show the values of that density, not some other values present in the dataset.
First, read in the data.
Indirekte_long <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
Now graph it.
library(ggplot2)
g <- ggplot(Indirekte_long, aes(hours))
g + geom_density() + labs(title = 'Distribution of hours')
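To see why no y variable is needed, here is a rough base R sketch using stats::density(), a kernel density estimate of the kind geom_density draws. The exact curve ggplot2 plots may differ with its settings, so treat this as an illustration only.

```r
# Hours from the example data
hours <- c(13, 16, 20, 0, 0, 10)

# Kernel density estimate: the y values are computed from x alone
d <- density(hours)

# d$x is a grid of hour values, d$y the estimated density at each grid point
print(head(data.frame(x = d$x, y = d$y)))

# The curve encloses an area of roughly 1 (simple Riemann sum over the grid)
area <- sum(d$y) * diff(d$x[1:2])
print(area)
```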

changing colors of geom_col ggplot2 to show categorical variable at 3 sites

Long time fan of this site, first time user though.
Been searching for a similar/working result for this question.
I am trying to show the PROPORTION that each level of a 2 level factor appear at three locations. All in a side by side bar chart in ggplot.
Here is the code I've been using to (try to) create the chart. The result has been two charts: one using geom_bar and one using geom_col. What I'd like is essentially a combination of the two: the first, but with the colors and y-axis of the second.
Thank you!
ggplot(df,aes(x = Stream,fill = death)) +
geom_bar(position = "dodge")+
scale_fill_manual(values = c(rep(c("gray45", "gray75"))))+
labs(fill="Time of Death")
death_stream <-df %>%
group_by(Stream,Tree_Death)%>%
summarise (n = n()) %>%
mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
death_stream %>%
ggplot(aes(x = Stream,y = rel.freq)) +
geom_col(position = "dodge",fill = "grey50", colour = "black")+
labs(fill="Time of Death")
Thanks Axeman, I figured it out.
The class of "rel.freq" was character, because paste0() appends "%" and so turns the number into a string. I tried specifying it as numeric, but instead of
<int>
it produced
<dbl>
It turns out all I had to do was convert the tibble back to a data.frame and specify the column as numeric. Another way is to export the data as an Excel file and change the column "rel.freq" to a number format in Excel.
death_stream
# A tibble: 6 x 4
Stream Tree_Death n percent
<int> <int> <int> <int>
1 1 0 25 33
2 1 1 50 67
3 2 0 17 30
4 2 1 40 70
5 3 0 120 70
6 3 1 51 30
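One way to avoid the character column in the first place is to keep the percentage numeric and build the "%" label separately. The following is a minimal base R sketch using the counts from the table above; the column name label and the helper variable stream_total are assumptions for illustration.

```r
# Counts taken from the table above
death_stream <- data.frame(
  Stream = c(1, 1, 2, 2, 3, 3),
  Tree_Death = c(0, 1, 0, 1, 0, 1),
  n = c(25, 50, 17, 40, 120, 51)
)

# Total count per stream, repeated for each row of that stream
stream_total <- ave(death_stream$n, death_stream$Stream, FUN = sum)

# Numeric percentage for plotting, character label only for display
death_stream$percent <- round(100 * death_stream$n / stream_total)
death_stream$label <- paste0(death_stream$percent, "%")
print(death_stream)
```

With this layout, percent can be mapped to y in geom_col and factor(Tree_Death) to fill, while label stays available for text annotations.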

barplots in R comparing data from two columns

I have the following:
> ArkHouse2014 <- read.csv(file="C:/Rwork/ar14.csv", header=TRUE, sep=",")
> ArkHouse2014
DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349
What I would like to do is make a barplot (or series of barplots) to compare the totals in the second and third columns on the y-axis while the x-axis would display the information in the first column.
It seems like this should be very easy to do, but most of the information on making barplots that I can find has you make a table from the data and then barplot that, e.g.,
> table(ArkHouse2014$GOP)
2,936 3,258 3,508 3,573 3,581 3,588 3,638 3,830 3,899 3,951 4,133 4,166 4,319 4,330 4,345 4,391 4,396 4,588
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4,969 5,130 5,177 5,343 5,425 5,466 5,710 5,991 6,070 6,100 6,234 6,490 6,550 6,980 7,847 8,846
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I don't want the counts of how many have each total, I'd like to just represent the quantities visually. I feel pretty stupid not being able to figure this out, so thanks in advance for any advice you have to offer me.
Here's an option using libraries reshape2 and ggplot2:
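I first read your data (with dec = ",", which treats the comma as a decimal mark, so 3,951 is read as 3.951; the relative bar heights are unaffected, but the axis values are smaller by a factor of 1000):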
df <- read.table(header=TRUE, text="DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349", dec = ",")
Then reshape it to long format:
library(reshape2)
df_long <- melt(df, id.var = "DISTRICT")
Then create a barplot using ggplot:
library(ggplot2)
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
or if you want the bars stacked:
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity")
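If the commas are thousands separators and the totals should keep their real magnitude, one option is to read the columns as text and strip the commas. This is a sketch on the first three rows only; the helper name to_number is hypothetical, and the base barplot() call at the end is just one way to draw the grouped bars.

```r
# First three rows of the data, read as text because of the commas
votes <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "DISTRICT GOP DEM
AR-60 3,951 4,001
AR-61 3,899 4,634
AR-62 5,130 4,319")

# Remove thousands separators and convert to numeric
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))
votes$GOP <- to_number(votes$GOP)
votes$DEM <- to_number(votes$DEM)
print(votes)

# Grouped bars in base graphics: one pair of bars per district
barplot(t(as.matrix(votes[, c("GOP", "DEM")])),
        beside = TRUE, names.arg = votes$DISTRICT, legend.text = TRUE)
```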

How to melt R data.frame and plot group by bar plot

I have following R data.frame:
group match unmatch unmatch_active match_active
1 A 10 4 0 0
2 B 116 20 0 3
3 c 160 27 1 4
4 D 79 17 0 3
5 E 309 84 4 14
6 F 643 244 10 23
...
My goal is to plot a group by bar plot (http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/ section-Graphs with more variables) as shown in the link.
I realize that before getting to that I need to get the data in to following format
group variable value
1 A match 10
2 B match 116
3 C match 160
4 D match 79
5 E match 309
6 F match 643
7 A unmatch 4
8 B unmatch 20
...
I used the melt function:
groups.df.melt <- melt(groups.df[,c('group','match','unmatch', 'unmatch_active', 'match_active')],id.vars = 1)
I don't think I am doing the melt correctly because after I execute above groups.df.melt has 1000+ lines which doesn't make sense to me.
I looked at "Draw histograms per row over multiple columns in R" and tried to follow the same approach, yet I don't get the graph I want.
In addition, I get the following error when I try to do the plotting:
ggplot(groups.df.melt, aes(x='group', y=value)) + geom_bar(aes(fill = variable), position="dodge") + scale_y_log10()
Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
Error in pmin(y, 0) : object 'y' not found
Try the following (ddf stands for your data frame; note that group is not quoted inside aes()):
mm <- melt(ddf, id='group')
ggplot(data = mm, aes(x = group, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = 'dodge')
or
ggplot(data = mm, aes(x = group, y = value, fill = variable)) +
# `geom_col()` uses `stat_identity()`: it leaves the data as is.
geom_col(position = 'dodge')
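On the worry about the row count after melting: a long table has one row per group and measured column, so a data frame with many rows and four measure columns quickly reaches 1000+ rows. A small base R sketch with hypothetical two-column data illustrates the shape; stack() is used here in place of melt().

```r
# Hypothetical cut-down version of the data: 2 groups, 2 measure columns
groups <- data.frame(
  group = c("A", "B"),
  match = c(10, 116),
  unmatch = c(4, 20)
)

# stack() puts all measure columns into one 'values' column plus an 'ind' column
long <- cbind(group = groups$group, stack(groups[, c("match", "unmatch")]))
names(long) <- c("group", "value", "variable")
print(long)

# Rows in long format = rows in wide format * number of measure columns
print(nrow(long) == nrow(groups) * 2)
```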
