I want to display this matrix with ggplot as a line chart.
Example: on the x axis the portion from 1 to 12, and on the y axis there are 5 lines (categories) with different colors, showing their corresponding values.
Example: the first point is x=1 and Y=12.25 in red.
The second point is x=2 and Y=0.9423 in green.
DF <- read.table(text = "
Portion 1 2 3 4 5
1 1 12.250000 0.9423077 33.92308 0.0000000 1.8846154
2 2 6.236364 1.7818182 38.30909 0.8909091 1.7818182
3 3 9.333333 1.8666667 28.00000 0.0000000 2.8000000
4 4 9.454545 2.8363636 34.03636 4.7272727 0.9454545
5 5 27.818182 0.0000000 19.47273 2.7818182 0.9272727
6 6 19.771930 2.5789474 19.77193 0.8596491 6.0175439
7 7 22.350877 1.7192982 22.35088 0.8596491 1.7192982
8 8 17.769231 4.0384615 15.34615 0.8076923 4.0384615
9 9 16.925373 8.8656716 23.37313 2.4179104 2.4179104
10 10 10.036364 8.3636364 25.09091 0.8363636 1.6727273
11 11 8.937500 8.9375000 8.12500 0.0000000 0.0000000
12 12 12.157895 5.2105263 14.76316 0.8684211 0.0000000", header = TRUE)
newResults <- as.data.frame(DF)
library(reshape2)
library(ggplot2)
R = data.frame(Portion = c('1','2','3','4','5','6','7','8','9','10','11','12'), newResults[,1], newResults[,2], newResults[,3], newResults[,4], newResults[,5])
meltR = melt(R, id = "Portion")
ggplot(meltR, aes(reorder(Portion, -value), y = value, group = variable, colour = variable)) + geom_line()
Why are my X values not ordered? And is this the right way to do it?
Thanks a lot.
Try:
meltR = melt(DF, id = "Portion")
ggplot(meltR, aes(x=Portion, y = value, group = variable, colour = variable)) + geom_line()
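In this case there is no need to reorder anything in the aesthetic for ggplot: Portion is numeric in DF, so the x axis is already in order, whereas reorder(Portion, -value) sorts the axis by value instead of by portion. This gives you one coloured line per category.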
You may want to change the names of the variables, either by renaming them in the first step, or by providing custom labels to ggplot.
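A minimal sketch of the second option, providing custom legend labels to ggplot. The labels "Category 1" and "Category 2" and the legend title are placeholders made up for illustration, not names from the question. The small data frame is a stand-in for DF with only two of the five categories (read.table turns the numeric headers into X1, X2, and so on).

```r
library(reshape2)
library(ggplot2)

# Stand-in for DF: the first three rows and first two categories of the question's data
DF <- data.frame(Portion = 1:3,
                 X1 = c(12.250000, 6.236364, 9.333333),
                 X2 = c(0.9423077, 1.7818182, 1.8666667))

meltR <- melt(DF, id = "Portion")

# Hypothetical labels, given in the same order as the levels of `variable` (X1, X2)
p <- ggplot(meltR, aes(x = Portion, y = value, group = variable, colour = variable)) +
  geom_line() +
  scale_colour_discrete(name = "Category", labels = c("Category 1", "Category 2"))
p
```

Renaming the columns of DF before melting would have the same effect on the legend, since melt() stores the column names in `variable`.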
I have data like this
df<- structure(list(Number = 1:23, Value1 = c(0.054830335, 1.19531842,
3.27820329, 1.03530176, 5.77430976, 3.72944, -0.683513395, 0.029550239,
2.487922644, 0.533448117, 0.098825565, -1.089022938, 2.301631235,
-0.095666867, -1.359480317, -1.359480317, 1.089441628, 3.307589929,
4.67838434, 3.562761178, 2.630726653, 1.795107015, 2.616255192
), Value2 = c(-0.296874921, 1.491747294, 2.951219257, 1.258677675,
-8.68096591, 3.361029751, -1.824459195, -1.445827538, 1.889631269,
-15.47774216, 3.085461276, -1.078286963, 0.948056999, -2.109354753,
-1.36703068, -1.36703068, 1.074642842, 2.945589842, 3.757911793,
2.765225717, 2.44452491, 1.784451022, 1.158493893)), class = "data.frame", row.names = c(NA,
-23L))
I am trying to make a dot plot with Value1 versus Number in one color and Value2 versus Number in another. Then I want to show the first 5 values and the last 5 values in a bigger size.
I tried to plot it like this
df$Number <- factor(df$Number, levels = paste0("D", 1:23), ordered = TRUE)
ggplot(df, aes(x=Value1, y=Value2, color= Number)) +
geom_text()+
theme_classic()
I can plot one of them like this
ggplot(data = df, aes(x = Number, y = Value1))+
geom_point()
When it comes to having the second one on the same plot, it gets fuzzy for me.
I can put them together in this way
# wide to long format
plotDf <- gather(df, Group, Myvalue, -1)
# plot
ggplot(plotDf, aes(Number, Myvalue, col = Group)) +
geom_point()
I still don't know how to show the first 5 values and the last 5 values in a bigger size.
By the first 5 and the last 5 I mean these:
df
Number Value1 Value2
1 1 0.05483034 -0.2968749
2 2 1.19531842 1.4917473
3 3 3.27820329 2.9512193
4 4 1.03530176 1.2586777
5 5 5.77430976 -8.6809659
6 6 3.72944000 3.3610298
7 7 -0.68351339 -1.8244592
8 8 0.02955024 -1.4458275
9 9 2.48792264 1.8896313
10 10 0.53344812 -15.4777422
11 11 0.09882557 3.0854613
12 12 -1.08902294 -1.0782870
13 13 2.30163123 0.9480570
14 14 -0.09566687 -2.1093548
15 15 -1.35948032 -1.3670307
16 16 -1.35948032 -1.3670307
17 17 1.08944163 1.0746428
18 18 3.30758993 2.9455898
19 19 4.67838434 3.7579118
20 20 3.56276118 2.7652257
21 21 2.63072665 2.4445249
22 22 1.79510701 1.7844510
23 23 2.61625519 1.1584939
These are the first 5
1 1 0.05483034 -0.2968749
2 2 1.19531842 1.4917473
3 3 3.27820329 2.9512193
4 4 1.03530176 1.2586777
5 5 5.77430976 -8.6809659
and these are the last 5
19 19 4.67838434 3.7579118
20 20 3.56276118 2.7652257
21 21 2.63072665 2.4445249
22 22 1.79510701 1.7844510
23 23 2.61625519 1.1584939
Using the original data (without factor):
ggplot(df, aes(Number, Value1, size = (Number <= 5 | Number > 18))) +
geom_point() +
geom_point(aes(y=Value2)) +
scale_size_manual(name = NULL, values = c("TRUE" = 2, "FALSE" = 0.5)) +
scale_x_continuous(breaks = function(z) do.call(seq, as.list(round(z,0))))
Because size= is mapped to a logical condition, the names in the manual values vector must be the character versions of the values that occur, that is "TRUE" and "FALSE". My choice of 2 and 0.5 is arbitrary.
Feel free to give the legend a better title with name="some name" if desired. If you want no legend (which makes sense here), you can use
... +
scale_size_manual(guide = "none", values = c("TRUE" = 2, "FALSE" = 0.5))
instead.
Another alternative, in case you want to distinguish the dots by which Value column they come from, is to melt the data into a long format before plotting.
ggplot(reshape2::melt(df, "Number"),
aes(Number, value, color = variable,
size = (Number <= 5 | Number > 18))) +
geom_point() +
scale_size_manual(guide = "none", values = c("TRUE" = 2, "FALSE" = 0.5))
One can use tidyr::pivot_longer or data.table::melt with similar results; see "Reshaping data.frame from wide to long format".
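For instance, a rough tidyr equivalent of the melt step might look like the sketch below. The three-row data frame is a rounded stand-in for the question's df, and the column names "variable" and "value" are chosen only to mirror what reshape2::melt produces.

```r
library(tidyr)

# Stand-in for df: the first three rows of the question's data, rounded
df <- data.frame(Number = 1:3,
                 Value1 = c(0.0548, 1.1953, 3.2782),
                 Value2 = c(-0.2969, 1.4917, 2.9512))

# Same columns as reshape2::melt(df, "Number"): one row per Number/column pair
long <- pivot_longer(df, cols = -Number,
                     names_to = "variable", values_to = "value")
long
```

The result can be passed to the same ggplot call in place of reshape2::melt(df, "Number"). One difference to be aware of is that pivot_longer returns `variable` as a character column rather than a factor.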
I have an existing ggplot with geom_col and some observations from a dataframe. The dataframe looks something like this:
over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0
The geom_col represents the runs data column and now I want to represent the wickets column using geom_point in a way that the number of points represents the wickets.
I want my graph to look something like this:
As far as I know, we'll need to transform your data to have one row per point. This method requires dplyr version 1.0.0 or later, which allows summarize to return more than one row per group.
You can adjust the spacing of the wickets by multiplying seq(wickets), though with your sample data a spacing of 1 unit looks pretty good to me.
library(dplyr)
library(ggplot2)
wicket_data = dd %>%
filter(wickets > 0) %>%
group_by(over) %>%
summarize(wicket_y = runs + seq(wickets))
ggplot(dd, aes(x = over)) +
geom_col(aes(y = runs), fill = "#A6C6FF") +
geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
theme_bw()
Using this sample data:
dd = read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = T)
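As far as I know, dplyr 1.1.0 and later deprecate summarize() calls that return several rows per group in favour of reframe(). If that is a concern, the one-row-per-wicket table can also be built without dplyr. This is only a sketch in base R, using the same sample data and the same spacing of 1 unit:

```r
dd <- read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)

# Repeat each over once per wicket, then stack the points above the top of the bar
wicket_data <- data.frame(
  over     = rep(dd$over, times = dd$wickets),
  wicket_y = rep(dd$runs, times = dd$wickets) + sequence(dd$wickets)
)
wicket_data
#>   over wicket_y
#> 1    3       10
#> 2    3       11
#> 3    4        4
```

Overs with zero wickets drop out on their own, because rep() repeats them zero times, so no filter step is needed.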
I am using the ..count.. transformation in geom_bar and get the warning
position_stack requires non-overlapping x intervals when some of my categories have few counts.
This is best explained using some mock data (my data involves direction and windspeed and I retain names relating to that)
#make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20 #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions
#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)
# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()
This works fine, and the resulting plot shows the frequency of directions grouped according to speed. It is of relevance that the velocity class with the fewest counts (here "[40,60)") will have 5 counts.
However more velocity classes leads to a warning. For instance, with
FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)
the velocity class with the fewest counts (now "[45,60)") will have only 3 counts and ggplot2 will warn that
position_stack requires non-overlapping x intervals
and the plot will show data in this category spread out along the x axis.
It seems that 5 is the minimum size for a group to have for this to work correctly.
I would appreciate knowing if this is a feature or a bug in stat_bin (which geom_bar is using) or if I am simply abusing geom_bar.
Also, any suggestions how to get around this would be appreciated.
Sincerely
This occurs because df$dir is numeric, so ggplot assumes a continuous x-axis, and the group aesthetic is based on the only discrete variable it knows about (fill = grp).
As a result, when there simply aren't that many dir values in grp = [45,60), ggplot gets confused over how wide each bar should be. This becomes more visually obvious if we split the plot into different facets:
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar() +
facet_wrap(~ grp)
> for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
[1] 1 2 3 4 6 7 8 9 10
[1] 1 2 3 4 5 6 7 8 9 10
[1] 2 3 4 5 7 9 10
[1] 2 4 7
We can also check manually that the minimum difference between sorted df$dir values is 1 for the first three grp values, but 2 for the last one. The default bar width for that last group is therefore wider than for the others.
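The manual check can be sketched as follows. The dir values are copied from the printed output above rather than regenerated, and the factor of 0.9 reflects ggplot2's documented default of using 90% of the resolution of the data as the bar width; treat the width figures as an approximation of what the stat computes per group.

```r
# dir values observed in each grp level, copied from the printed output above
dir_by_grp <- list(
  "[0,15)"  = c(1, 2, 3, 4, 6, 7, 8, 9, 10),
  "[15,30)" = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  "[30,45)" = c(2, 3, 4, 5, 7, 9, 10),
  "[45,60)" = c(2, 4, 7)
)

# Smallest gap between distinct x values within each group
min_gap <- vapply(dir_by_grp,
                  function(d) min(diff(sort(unique(d)))),
                  numeric(1))
min_gap
#>  [0,15) [15,30) [30,45) [45,60)
#>       1       1       1       2

# Approximate default bar width per group (90% of the smallest gap)
0.9 * min_gap
```

The last group ends up with bars twice as wide as the others, which is why its bars overlap their neighbours when stacked.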
The following solutions should all achieve the same result:
1. Explicitly specify the same bar width for all groups in geom_bar():
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar(width = 0.9)
2. Convert dir to a categorical variable before passing it to aes(x = ...):
ggplot(data=df,
aes(x=factor(dir), y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar()
3. Specify that the group parameter should be based on both df$dir & df$grp:
ggplot(data=df,
aes(x=dir,
y=(..count..)/sum(..count..),
group = interaction(dir, grp),
fill = grp)) +
geom_bar()
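Note that ggplot2 3.4.0 and later deprecate the ..count.. notation in favour of after_stat(). A sketch of solution 2 in the newer syntax, using the question's mock data with the 15-unit speed classes (the unused drop argument to cut() is left out here):

```r
library(ggplot2)

set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20                    # mock speeds
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE)   # mock directions

df <- data.frame(dir = dir,
                 grp = cut(FF, breaks = seq(0, 60, by = 15),
                           ordered_result = TRUE, right = FALSE))

# after_stat(count) replaces ..count..; factor(dir) makes the x axis discrete
p <- ggplot(df, aes(x = factor(dir),
                    y = after_stat(count / sum(count)),
                    fill = grp)) +
  geom_bar()
p
```

The same substitution should work for solutions 1 and 3; only the y aesthetic changes.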
This doesn't directly solve the issue, because I also don't get what's going on with the overlapping values, but it's a dplyr-powered workaround, and might turn out to be more flexible anyway.
Instead of relying on geom_bar to take the cut factor and give you shares via ..count../sum(..count..), you can easily enough just calculate those shares yourself up front, and then plot your bars. I personally like having this type of control over my data and exactly what I'm plotting.
First, I put dir and FF into a data frame/tbl_df, and cut FF. Then count lets me group the data by dir and grp and count up the number of observations for each combination of those two variables, after which I calculate the share of each n over the sum of n. I'm using geom_col, which is like geom_bar but for when you already have a y value in your aes.
library(tidyverse)
set.seed(12345)
FF <- rweibull(100,1.7,1) * 20 #mock speeds
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE) # mock directions
shares <- tibble(dir = dir, FF = FF) %>%
mutate(grp = cut(FF, breaks = seq(0, 60, by = 15), ordered_result = T, right = F, drop = F)) %>%
count(dir, grp) %>%
mutate(share = n / sum(n))
shares
#> # A tibble: 29 x 4
#> dir grp n share
#> <int> <ord> <int> <dbl>
#> 1 1 [0,15) 3 0.03
#> 2 1 [15,30) 2 0.02
#> 3 2 [0,15) 4 0.04
#> 4 2 [15,30) 3 0.03
#> 5 2 [30,45) 1 0.01
#> 6 2 [45,60) 1 0.01
#> 7 3 [0,15) 6 0.06
#> 8 3 [15,30) 1 0.01
#> 9 3 [30,45) 2 0.02
#> 10 4 [0,15) 6 0.06
#> # ... with 19 more rows
ggplot(shares, aes(x = dir, y = share, fill = grp)) +
geom_col()
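If you would rather not depend on dplyr for this step, the same shares can be computed in base R. This is only a sketch under the same mock data; it is not part of the approach above, and the intermediate object names are my own.

```r
set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20                    # mock speeds
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE)   # mock directions
grp <- cut(FF, breaks = seq(0, 60, by = 15), ordered_result = TRUE, right = FALSE)

# Cross-tabulate, convert counts to shares of the grand total, and flatten to long form
shares <- as.data.frame(prop.table(table(dir = dir, grp = grp)),
                        responseName = "share")

# table() keeps empty dir/grp combinations; drop them to match what count() returns
shares <- shares[shares$share > 0, ]

# as.data.frame() returns dir as a factor; restore the integer values
shares$dir <- as.integer(as.character(shares$dir))

head(shares)
```

The resulting data frame has the columns dir, grp and share, so it can be passed to the same ggplot(shares, aes(x = dir, y = share, fill = grp)) + geom_col() call.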
Here is an excerpt of the dataset I am working on.
Name Value ID Total
A 10 1 3
A 11 2 3
A 10 3 3
B 10 1 4
B 11 2 4
B 11 3 4
B 11 4 4
What I want to do is plot Name on the x-axis and ID on the y-axis for all rows where Value is 11. On top of that I want to overlay Total, so that when the graph is read it is possible to see the count of items per Name group. This might be achieved using the length of each group in the Name variable or using Total. Here is what I did, and a sample of the desired output.
mydf <- read.csv("./test1.csv", header = T)
x <- ggplot(mydf, aes(Name, ID)) +
  geom_point(data = subset(mydf, Value == 11), size = 3, colour = "tomato3") +
  scale_y_continuous(name = "Class ID", limits = c(1, 4), breaks = seq(1, 4, by = 1))
y <- x + xlab("Class") + theme_bw()
z <- y + scale_x_discrete(limits = c("A", "B", "C"))
The three orange asterisks at (A,3) and (B,4) are manual text annotation that I want to replace with either a short line or a circle to indicate the total number of items.
Thank you for your help.
I have following R data.frame:
group match unmatch unmatch_active match_active
1 A 10 4 0 0
2 B 116 20 0 3
3 c 160 27 1 4
4 D 79 17 0 3
5 E 309 84 4 14
6 F 643 244 10 23
...
My goal is to plot a grouped bar plot, as shown in the "Graphs with more variables" section of http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/.
I realize that before getting to that I need to get the data into the following format
group variable value
1 A match 10
2 B match 116
3 C match 160
4 D match 79
5 E match 309
6 F match 643
7 A unmatch 4
8 B unmatch 20
...
I used the melt function:
groups.df.melt <- melt(groups.df[,c('group','match','unmatch', 'unmatch_active', 'match_active')],id.vars = 1)
I don't think I am doing the melt correctly, because after I execute the above, groups.df.melt has 1000+ rows, which doesn't make sense to me.
I looked at "Draw histograms per row over multiple columns in R" and tried to follow it, yet I don't get the graph I want.
In addition, I get the following error when I try to do the plotting:
ggplot(groups.df.melt, aes(x='group', y=value)) + geom_bar(aes(fill = variable), position="dodge") + scale_y_log10()
Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
Error in pmin(y, 0) : object 'y' not found
Try:
mm <- melt(ddf, id='group')
ggplot(data = mm, aes(x = group, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = 'dodge')
or
ggplot(data = mm, aes(x = group, y = value, fill = variable)) +
# `geom_col()` uses `stat_identity()`: it leaves the data as is.
geom_col(position = 'dodge')
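To see what melt returns, here is a small sketch using only the six rows shown in the question (the remaining rows are elided there, and group "c" is written as "C" as in the question's long-format table). The name ddf follows the answer above and stands in for groups.df.

```r
library(reshape2)

# The six rows shown in the question; the rest of the data is not available here
ddf <- data.frame(
  group          = c("A", "B", "C", "D", "E", "F"),
  match          = c(10, 116, 160, 79, 309, 643),
  unmatch        = c(4, 20, 27, 17, 84, 244),
  unmatch_active = c(0, 0, 1, 0, 4, 10),
  match_active   = c(0, 3, 4, 3, 14, 23)
)

mm <- melt(ddf, id = "group")

# One row per group and measure column: 6 groups x 4 columns = 24 rows
nrow(mm)
#> [1] 24
```

So a melted table with four times as many rows as the original is expected, and 1000+ rows would simply mean the full data has a few hundred groups. The other difference from the question's plotting call is that group is passed unquoted in aes(); a quoted 'group' maps every bar to the same constant string instead of the column.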