Hi, I have a data set like this:
ALL Critical Error Warning Review
2016 1412 475 4 125
154 45 49 2 58
116 86 12 1 17
I want to plot a stacked bar graph using ggplot2 where a single bar represents 100% of "ALL", and "Critical", "Error", "Warning" and "Review" are stacked on top of one another according to their contribution to "ALL".
I have tried, with no luck. Could someone give me a hand? Thanks.
I'm not quite sure your description of the desired plot is unambiguous.
My interpretation would be the following:
## Copied from user1317221_G - Thanks for that.
babydf <- structure(list(ALL = c(2016L, 154L, 116L), Critical = c(1412L,
45L, 86L), Error = c(475L, 49L, 12L), Warning = c(4L, 2L, 1L),
Review = c(125L, 58L, 17L)), .Names = c("ALL", "Critical",
"Error", "Warning", "Review"), class = "data.frame", row.names = c(NA,
-3L))
# Add IDs
babydf <- cbind(id = 1:nrow(babydf), babydf)
library(reshape2)
library(ggplot2)
# reshape the dataframe:
df.reshaped <- melt(babydf, id.vars='id')
ggplot(subset(df.reshaped, variable != 'ALL'), aes(x=id, y=value, fill=variable)) + geom_bar(stat='identity')
If you want all bars to have the same height, convert the counts to percentages of ALL with
babydf[, 3:6] <- babydf[, 3:6] / babydf$ALL * 100
before calling melt().
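For comparison, here is a minimal base R sketch of the same percentage conversion. It is a swapped-in alternative, not part of the ggplot2 answer, and the names parts and pct are illustrative. It checks that the four parts add up to 100% of ALL in every row (true for this data, because Critical + Error + Warning + Review equals ALL), then uses barplot(), which stacks the rows of a matrix within each column by default.

```r
# The question's data (without the id column added in the answer)
babydf <- data.frame(ALL = c(2016L, 154L, 116L),
                     Critical = c(1412L, 45L, 86L),
                     Error = c(475L, 49L, 12L),
                     Warning = c(4L, 2L, 1L),
                     Review = c(125L, 58L, 17L))
parts <- c("Critical", "Error", "Warning", "Review")
# Each part as a percentage of ALL in its own row
pct <- babydf[, parts] / babydf$ALL * 100
# Because the parts add up to ALL, each row sums to 100 (up to rounding)
rowSums(pct)
# barplot() stacks the rows of a matrix inside each column, so transpose
# to get one stacked bar per original row
barplot(t(as.matrix(pct)), legend.text = parts, ylab = "% of ALL")
```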
I want to build a network diagram from a data frame I have, but I am having trouble.
This is what the data frame looks like:
Shop Manager
S1 34
S1 12
S2 11
S2 34
S3 34
S4 50
For example, S1 should be connected to S2 and S3 since they have the same manager and so on. Also, is it possible to set the size of the dot based on the number of managers a shop has?
I really appreciate the help. Thanks!
You can try graph_from_adjacency_matrix + tcrossprod + table
library(igraph)
g <- graph_from_adjacency_matrix(as.dist(tcrossprod(table(df))))
and plot(g) draws the network.
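To see what the matrix handed to graph_from_adjacency_matrix() contains, the table() + tcrossprod() step can be run on its own in base R. This is a sketch using the data from this thread; shared and n_managers are illustrative names. Off-diagonal entries count the managers two shops have in common, and the diagonal counts each shop's own managers, which is one way to answer the dot-size question. Assuming the vertex order of g follows the rows of this matrix, those counts could be passed to plot(g, vertex.size = ...).

```r
# Shop/manager pairs from the question
df <- data.frame(Shop = c("S1", "S1", "S2", "S2", "S3", "S4"),
                 Manager = c(34L, 12L, 11L, 34L, 34L, 50L))
# Incidence table: rows = shops, columns = managers, 1 if the shop has that manager
inc <- table(df)
# Entry [i, j] = number of managers shared by shops i and j
shared <- tcrossprod(inc)
shared
# Diagonal = number of managers per shop, usable as a vertex size
n_managers <- diag(shared)
n_managers
```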
Another way is bipartite.projection():
library(magrittr)  # %>%
library(purrr)     # pluck()
df %>%
graph_from_data_frame() %>%
set_vertex_attr(name = "type", value = names(V(.)) %in% df$Shop) %>%
bipartite.projection() %>%
pluck(2) %>%
plot()
Data
> dput(df)
structure(list(Shop = c("S1", "S1", "S2", "S2", "S3", "S4"),
Manager = c(34L, 12L, 11L, 34L, 34L, 50L)), class = "data.frame", row.names = c(NA,
-6L))
The following is my dataset:
Result course1 course2 course3
pass 15 17 18
pass 12 14 19
Fail 9 13 3
Fail 3 2 0
pass 14 11 20
Fail 5 0 7
I want to plot a grouped bar graph. I can plot the following two graphs separately, but I want both results in the same graph.
library(dplyr)         # %>%, filter(), summarize()
library(RColorBrewer)  # brewer.pal()
par(mfrow=c(1,1))
options(scipen=999)
coul <- brewer.pal(3, "Set2")
# Bar graph for passed courses (the data uses lowercase 'pass')
result_pass <- data %>% filter(Result == 'pass') %>% summarize(c1_tot = sum(course1),
c2_tot = sum(course2), c3_tot = sum(course3))
col_sum <- colSums(result_pass[,1:3])
barplot(col_sum, xlab = "Courses", ylab = "Total Marks", col = coul, ylim=range(pretty(c(0, col_sum))), main = "Passed courses ")
# Bar graph for failed courses
result_fail <- data %>% filter(Result == 'Fail') %>% summarize(c1_tot = sum(course1),
c2_tot = sum(course2), c3_tot = sum(course3))
col_sum <- colSums(result_fail[,1:3])
barplot(col_sum, xlab = "Courses", ylab = "Total Marks", col = coul, ylim=range(pretty(c(0, col_sum))), main = "Failed courses ")
Any suggestions on how to merge the two plots above into a single grouped bar graph of passed and failed courses?
It's probably easier than you think. Pass the data directly to aggregate() with the formula . ~ Result, where . means all other columns. Removing the first column with [-1] and coercing with as.matrix() (barplot() draws a matrix as groups of bars, one group per column) yields exactly the format we need.
This is the basic code:
barplot(as.matrix(aggregate(. ~ Result, data, sum)[-1]), beside=TRUE)
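To see the matrix that barplot() receives, the aggregation step can be run on its own. This sketch uses the data at the end of this answer; setting the row names to the Result groups is an illustrative addition that makes the printout easier to read. Each column of m becomes one group of side-by-side bars (Fail and pass) when beside = TRUE.

```r
data <- data.frame(Result = c("pass", "pass", "Fail", "Fail", "pass", "Fail"),
                   course1 = c(15L, 12L, 9L, 3L, 14L, 5L),
                   course2 = c(17L, 14L, 13L, 2L, 11L, 0L),
                   course3 = c(18L, 19L, 3L, 0L, 20L, 7L))
# Sum every course column within each Result group
agg <- aggregate(. ~ Result, data = data, FUN = sum)
# Drop the grouping column and keep the group labels as row names
m <- as.matrix(agg[-1])
rownames(m) <- agg$Result
m
```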
And here with some visual enhancements:
barplot(as.matrix(aggregate(. ~ Result, data, sum)[-1]), beside=TRUE, ylim=c(0, 70),
col=hcl.colors(2, palette='viridis'), legend.text=sort(unique(data$Result)),
names.arg=names(data)[-1], main='Here could be your title',
args.legend=list(x='topleft', cex=.9))
box()
Data:
data <- structure(list(Result = c("pass", "pass", "Fail", "Fail", "pass",
"Fail"), course1 = c(15L, 12L, 9L, 3L, 14L, 5L), course2 = c(17L,
14L, 13L, 2L, 11L, 0L), course3 = c(18L, 19L, 3L, 0L, 20L, 7L
)), class = "data.frame", row.names = c(NA, -6L))
I have a three-column CSV file like this:
x,y1,y2
100,50,10
200,10,20
300,15,5
I want a bar plot in R with the first column's values on the x axis and the second and third columns' values as grouped bars for each x. I hope that is clear. Can someone please help me with this? My data is huge, so I have to import the CSV file rather than enter the data by hand. I found related posts, but none addressed exactly this.
Thank you
Use the following code
library(tidyverse)
df %>% pivot_longer(names_to = "y", values_to = "value", -x) %>%
ggplot(aes(x,value, fill=y))+geom_col(position = "dodge")
Data
df = structure(list(x = c(100L, 200L, 300L), y1 = c(50L, 10L, 15L),
y2 = c(10L, 20L, 5L)), class = "data.frame", row.names = c(NA,
-3L))
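Since the question is about importing a large CSV, here is a hedged base R alternative to the tidyverse answer. read.csv() loads the data; the text = argument is used only so the sketch runs without a file, and in practice you would pass the file path instead (for example a hypothetical "data.csv"). barplot() with beside = TRUE then draws one group per column, so the y columns are transposed into rows.

```r
# With a real file: df <- read.csv("data.csv")  (hypothetical path)
df <- read.csv(text = c("x,y1,y2", "100,50,10", "200,10,20", "300,15,5"))
# Rows = bars within a group (y1, y2); columns = groups (x values)
m <- t(as.matrix(df[, c("y1", "y2")]))
colnames(m) <- df$x
m
barplot(m, beside = TRUE, legend.text = rownames(m), xlab = "x")
```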
I want to rank the variables in my dataset in descending order of the number of plants used. I tried ranking them in the .csv file and then importing it into R, but even then the plot was not in the required order. Here is my dataset:
df <- structure(list(Lepidoptera.Family = structure(c(3L, 2L, 5L, 1L, 4L, 6L),
.Label = c("Hesperiidae", "Lycaenidae", "Nymphalidae", "Papilionidae", "Pieridae","Riodinidae"), class = "factor"),
LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)),
.Names = c("Lepidoptera.Family", "LHP.Families"),
class = "data.frame", row.names = c(NA, -6L))
library(ggplot2)
library(reshape2)
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+
geom_bar(stat="identity")+
coord_flip()+facet_grid(Type~.)
How do I rank them in descending order? Also, I want to combine the three plots into one. How can I go about it?
This happens because ggplot places a factor x variable in the order of its levels, i.e. of the underlying integer codes (recall that factors are stored as integers under the covers). To plot the categories in a different order, change the order of the levels before plotting:
gg$Lepidoptera.Family<-with(gg,
factor(Lepidoptera.Family,
levels=Lepidoptera.Family[order(LHP.Families)]))
The trick is to reorder the levels of the Lepidoptera.Family factor, which are alphabetical by default:
df <- within(df, {
Lepidoptera.Family <- reorder(Lepidoptera.Family, LHP.Families)
})
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+ geom_bar(stat="identity")+ coord_flip()+facet_grid(Type~.)
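The effect of reorder() on the levels can be checked without plotting. A minimal base R sketch with the question's data (rev_levels is an illustrative name): reorder() sorts the levels by LHP.Families in ascending order, and because coord_flip() draws the first level at the bottom, the largest bars end up on top. Negating the score gives the opposite order. Note that Lycaenidae and Nymphalidae are tied at 55.

```r
df <- data.frame(
  Lepidoptera.Family = factor(c("Nymphalidae", "Lycaenidae", "Pieridae",
                                "Hesperiidae", "Papilionidae", "Riodinidae")),
  LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L))
# Default levels are alphabetical; ggplot uses this order for the axis
levels(df$Lepidoptera.Family)
# Sort the levels by LHP.Families (ascending)
df$Lepidoptera.Family <- reorder(df$Lepidoptera.Family, df$LHP.Families)
levels(df$Lepidoptera.Family)
# Negating the score puts the largest values first instead
rev_levels <- levels(reorder(df$Lepidoptera.Family, -df$LHP.Families))
rev_levels
```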
I am building a quantile-quantile plot of a variable called x from a data frame called df in the working example below. I would like to label the points with the name variable of df.
Is it possible to do this in ggplot2 without resorting to the painful solution (coding the theoretical distribution by hand and then plotting it against the empirical one)?
Edit: it turns out that yes, it is possible, thanks to a user who posted and then deleted his answer; see the comments on Arun's answer below. Thanks to Didzis for his otherwise clever solution with ggplot_build().
# MWE
df <- structure(list(name = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 9L,
10L, 6L, 12L, 13L, 14L, 15L, 16L, 17L, 19L, 18L, 20L, 21L, 22L,
8L, 23L, 11L, 24L), .Label = c("AUS", "AUT", "BEL", "CAN", "CYP",
"DEU", "DNK", "ESP", "FIN", "FRA", "GBR", "GRC", "IRL", "ITA",
"JPN", "MLT", "NLD", "NOR", "NZL", "PRT", "SVK", "SVN", "SWE",
"USA"), class = "factor"), x = c(-0.739390016757746, 0.358177826874146,
1.10474523846099, -0.250589535389937, -0.423112615445571, -0.862144579740376,
0.823039669834058, 0.079521521937704, 1.08173649722493, -2.03962942823921,
1.05571087029737, 0.187147291278723, -0.144770773941437, 0.957990771847331,
-0.0546549555439176, -2.70142550075757, -0.391588386498849, -0.23855544527369,
-0.242781575907386, -0.176765072121165, 0.105155860923456, 2.69031085872414,
-0.158320176671995, -0.564560815972446)), .Names = c("name",
"x"), row.names = c(NA, -24L), class = "data.frame")
library(ggplot2)
qplot(sample = x, data = df) + geom_abline(linetype = "dotted") + theme_bw()
# ... using names instead of points would allow to spot the outliers
I am working on an adaptation of this gist, and if I have questions about the regression diagnostics that might interest CV users, I will post them on CrossValidated.
You can save your original QQ plot as an object (using ggplot() and stat_qq() instead of qplot()):
g<-ggplot(df, aes(sample = x)) + stat_qq()
Then, with ggplot_build(), you can extract the data used for plotting. They are stored in element data[[1]]; save them as a new data frame.
df.new<-ggplot_build(g)$data[[1]]
head(df.new)
x y sample theoretical PANEL group
1 -2.0368341 -2.7014255 -2.7014255 -2.0368341 1 1
2 -1.5341205 -2.0396294 -2.0396294 -1.5341205 1 1
3 -1.2581616 -0.8621446 -0.8621446 -1.2581616 1 1
4 -1.0544725 -0.7393900 -0.7393900 -1.0544725 1 1
5 -0.8871466 -0.5645608 -0.5645608 -0.8871466 1 1
6 -0.7415940 -0.4231126 -0.4231126 -0.7415940 1 1
Now you can add the observation names to the new data frame. It is important to use order(), because the rows of the new data frame are sorted by sample value.
df.new$name<-df$name[order(df$x)]
Now plot the new data frame as usual, but use geom_text() instead of geom_point():
ggplot(df.new,aes(theoretical,sample,label=name))+geom_text()+
geom_abline(linetype = "dotted") + theme_bw()
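Why order() lines the names up correctly can be seen by reproducing the stat_qq() columns in base R. This sketch assumes stat_qq()'s default normal distribution, whose theoretical quantiles are qnorm(ppoints(n)), paired with the sorted sample. To keep it short it uses only the first five rows of the question's df, with x rounded to three decimals; qq is an illustrative name.

```r
# First five rows of the question's df (x rounded to 3 decimals)
df <- data.frame(name = c("AUS", "AUT", "BEL", "CAN", "CYP"),
                 x = c(-0.739, 0.358, 1.105, -0.251, -0.423))
n <- nrow(df)
# stat_qq() pairs the sorted sample with normal quantiles at ppoints(n)
qq <- data.frame(theoretical = qnorm(ppoints(n)), sample = sort(df$x))
# Sorting the names the same way puts each name on the row of its own value
qq$name <- as.character(df$name[order(df$x)])
qq
plot(qq$theoretical, qq$sample, type = "n", xlab = "theoretical", ylab = "sample")
text(qq$theoretical, qq$sample, labels = qq$name)
```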
The points are too close together. I would do something like this:
df <- df[with(df, order(x)), ]
df$t <- quantile(rnorm(1000), seq(0, 100, length.out = nrow(df))/100)
p <- ggplot(data = df, aes(x = t, y = x)) + geom_point(aes(colour = name))
p
If you insist on having labels inside the plot, then, you could try something like:
df <- df[with(df, order(x)), ]
df$t <- quantile(rnorm(1000), seq(0, 100, length.out = nrow(df))/100)
p <- ggplot(data = df, aes(x = t, y = x)) + geom_point(aes(colour = name))
p <- p + geom_text(aes(x = t - 0.05, y = x - 0.15, label = name, colour = name))
p
You can play around with the x and y coordinates and if you want you can always remove the colour aesthetics.
@Arun has a good solution in the comments above, but this works with R 4.0.3:
library(ggrepel)  # geom_text_repel()
ggplot(data = df, aes(sample = x)) + geom_qq() + geom_text_repel(label=df$name[order(df$x)], stat="qq") + stat_qq_line()
Basically the same thing, with the addition of stat_qq_line() and [order(df$x)] in the label. Without order(), the labels end up attached to the wrong points, which is very misleading.
Here's hoping this saves someone else some hours of their life.