R: Plot a bar graph from a transposed data frame

I'm trying to plot the following data frame as a bar plot, where the values for each filteredprovince are listed in a separate column (n).
Usually ggplot and the other plotting functions work on a "horizontal" data frame, and after several searches I haven't been able to find a way to plot this "transposed" version of the data frame.
The cluster should group the bars, and within each cluster I would like to plot each filteredprovince based on the value of the n column.
Thank you for the support.
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments, I have almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages on the y-axis instead of frequencies?

If I understand correctly, you're trying to use geom_bar(), which gives you problems because by default it counts the rows in each group (like a histogram), whereas your data is already summarized in n.
(If you had provided the code you have tried so far, I would not have to guess.)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
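A minimal sketch of the difference between the two stats, assuming ggplot2 is installed. It rebuilds the question's cluster, filteredprovince and n columns (PROVINCIA is omitted) and inspects the computed bar heights with ggplot2's layer_data(): geom_bar() counts rows per x value, while geom_col() stacks the n values as they are.

```r
library(ggplot2)

# The question's data (PROVINCIA omitted for brevity)
d <- data.frame(
  cluster = 1,
  filteredprovince = c("08", "28", "41", "11", "46", "18", rep("other", 4)),
  n = c(765, 665, 440, 437, 276, 229, 181, 170, 165, 153)
)

# geom_bar() defaults to stat = "count": bar height = number of rows per x value
p_bar <- ggplot(d, aes(x = filteredprovince)) + geom_bar()
# geom_col() uses stat = "identity": bar height = the n values, stacked per x value
p_col <- ggplot(d, aes(x = filteredprovince, y = n)) + geom_col()

counts <- layer_data(p_bar)$count   # one count per distinct filteredprovince
heights <- layer_data(p_col)$ymax   # top of each stacked segment
```

With stat = "count", the "other" bar has height 4 (four rows), not the sum of their n values, which is why geom_bar() looks wrong for pre-summarized data.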
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
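Note that sum(d$n) is the total over all rows; with more than one cluster you would probably want each cluster's bar to add up to 100%. A sketch of that variant, assuming the scales package is available for its percent label formatter: the share is computed per cluster with base R's ave(), and the y-axis labels are formatted as percentages.

```r
library(ggplot2)
library(scales)

d <- data.frame(
  cluster = 1,
  filteredprovince = c("08", "28", "41", "11", "46", "18", rep("other", 4)),
  n = c(765, 665, 440, 437, 276, 229, 181, 170, 165, 153)
)

# Share of n within each cluster; ave() applies sum() separately per cluster
d$share <- d$n / ave(d$n, d$cluster, FUN = sum)

p <- ggplot(d, aes(x = factor(cluster), y = share, fill = filteredprovince)) +
  geom_col() +
  # label the 0-1 proportions as percentages, e.g. 0.25 -> "25%"
  scale_y_continuous(labels = percent)
```

Alternatively, geom_col(position = "fill") rescales each stacked bar to a total of 1 without a precomputed column; combined with scale_y_continuous(labels = percent) it gives the same axis.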

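I'm not sure I understand perfectly, but if the problem is the orientation of your data frame, you can transpose it with t(data), where data is your data frame. Note that t() returns a matrix rather than a data frame, so wrap the result in as.data.frame() before passing it to ggplot().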
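A small sketch of what t() does to a data frame, using a two-row excerpt of the question's data: it goes through as.matrix(), so columns become rows and, because the columns have mixed types, every value is coerced to character.

```r
# Two rows from the question's data
d <- data.frame(cluster = c(1, 1), filteredprovince = c("08", "other"),
                n = c(765, 181), stringsAsFactors = FALSE)

td <- t(d)   # 3 x 2 matrix: one row per original column
class(td)    # a matrix, not a data.frame
typeof(td)   # character, because the columns had mixed types
```

For ggplot2, the long format the question already has is usually what you want; transposing tends to make plotting harder, not easier.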

Related

Line graph with multiple variables on the y-axis, stepwise

I need some help. Here is the data I want to plot. I want to keep Path.ID on the y-axis and the numeric values of all the other columns added stepwise. This is a subset of a very large dataset, so I want the Path.ID labels attached to each line, and, if possible, the values of the other columns shown at each point.
head(table)
Path.ID sc st rc rt
<chr> <dbl> <dbl> <dbl> <dbl>
1 map00230 1 12 5 52
2 map00940 1 20 10 43
3 map01130 NA 15 8 34
4 map00983 NA 14 5 28
5 map00730 NA 5 3 26
6 map00982 NA 16 2 24
I'd like something like this.
Thank you
Here is some rough code to start from:
library(tidyr)
library(dplyr)
library(ggplot2)
# convert the table into long format (one row per Path.ID/column pair)
table_long <- table %>%
  gather(x_axis, value, sc:rt) %>%
  # keep the original column order on the x-axis instead of alphabetical order
  mutate(x_axis = factor(x_axis, levels = c("sc", "st", "rc", "rt")))
# Plot with ggplot2
ggplot() +
  # draw one line per Path.ID
  geom_line(data = table_long, aes(x = x_axis, y = value, group = Path.ID, color = Path.ID)) +
  # label each line at the last x position, which here is rt
  geom_label(data = table_long %>% filter(x_axis == "rt"),
             aes(x = x_axis, y = value, label = Path.ID, fill = Path.ID),
             color = "#FFFFFF")
Note that with this code, a Path.ID that has no rt value will not get a label.
p <- ggplot() +
  # draw one line per Path.ID
  geom_line(data = table_long, aes(x = x_axis, y = value, group = Path.ID, color = Path.ID)) +
  # plain-text labels at rt; check_overlap = TRUE skips labels that would overlap
  geom_text(data = table_long %>% filter(x_axis == "rt"),
            aes(x = x_axis, y = value, label = Path.ID),
            color = "#050505", size = 3, check_overlap = TRUE)
p + labs(title = "title", x = "x-label", y = "y-label")
I had to use geom_text() because I have a large dataset, and it gave me a somewhat clearer graph.
Thank you @sinh, it helped a lot.
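The same idea can be sketched with tidyr's pivot_longer(), which supersedes gather(). This version, built on the six rows shown in head(table), also prints each value next to its point, which the question asked for. It assumes dplyr, tidyr and ggplot2 are installed; the label offsets (vjust, hjust) are arbitrary choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The six rows shown in head(table)
table <- tibble(
  Path.ID = c("map00230", "map00940", "map01130", "map00983", "map00730", "map00982"),
  sc = c(1, 1, NA, NA, NA, NA),
  st = c(12, 20, 15, 14, 5, 16),
  rc = c(5, 10, 8, 5, 3, 2),
  rt = c(52, 43, 34, 28, 26, 24)
)

# One row per Path.ID/column pair; values_drop_na = TRUE drops the missing sc values
table_long <- table %>%
  pivot_longer(sc:rt, names_to = "x_axis", values_to = "value",
               values_drop_na = TRUE) %>%
  # keep the original column order on the x-axis
  mutate(x_axis = factor(x_axis, levels = c("sc", "st", "rc", "rt")))

p <- ggplot(table_long, aes(x = x_axis, y = value, group = Path.ID, colour = Path.ID)) +
  geom_line() +
  geom_point() +
  # print each value just above its point
  geom_text(aes(label = value), vjust = -0.6, size = 3, show.legend = FALSE) +
  # label each line once, to the right of its rt value
  geom_text(data = filter(table_long, x_axis == "rt"),
            aes(label = Path.ID), hjust = -0.1, size = 3, show.legend = FALSE)
```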

How to incorporate the "significance" of Tukey's HSD test directly into ggplot2 graphs in R?

I have the following data (dat):
V W X Y Z
1 8 89 3 900
1 8 100 2 800
0 9 333 4 980
0 9 560 1 999
I wish to perform Tukey's HSD pairwise test on the above data set. From the results of the test, I want to show the significant comparisons in the graph (a "*" or "**" sign between the groups that differ significantly).
This is the code I have tried:
library(ggplot2)
library(tidyr)  # gather() comes from tidyr, not reshape2
dat1 <- gather(dat)
# mean_cl_normal requires the Hmisc package to be installed
ggplot(data = dat1, aes(x = key, y = value)) + stat_summary(fun.data = "mean_cl_normal", colour = "red", size = 1)
pairwise.t.test(dat1$value, dat1$key, p.adjust.method = "holm")
I do not know whether, or how, I can incorporate the results (the "significance") of the test directly into the graph without saving each result in an external array and then passing it to ggplot2.
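One possible sketch, not the only approach: run a one-way ANOVA with aov(), pass it to TukeyHSD(), turn the adjusted p-values into stars, and draw a bracket with a star label for every significant pair using plain ggplot2 layers. The star thresholds (0.05, 0.01, 0.001), the bracket spacing and the use of mean_se instead of mean_cl_normal (to avoid the Hmisc dependency) are assumptions. The code also assumes at least one comparison is significant.

```r
library(ggplot2)
library(tidyr)

dat <- data.frame(V = c(1, 1, 0, 0), W = c(8, 8, 9, 9),
                  X = c(89, 100, 333, 560), Y = c(3, 2, 4, 1),
                  Z = c(900, 800, 980, 999))

# Long format: one row per (key, value) pair
dat1 <- pivot_longer(dat, everything(), names_to = "key", values_to = "value")
dat1$key <- factor(dat1$key)

# One-way ANOVA followed by Tukey's HSD on all pairwise differences
tuk <- TukeyHSD(aov(value ~ key, data = dat1))$key
res <- data.frame(comparison = rownames(tuk), p = tuk[, "p adj"],
                  stringsAsFactors = FALSE)

# Convert adjusted p-values to star labels and keep only significant pairs
res$stars <- as.character(cut(res$p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                              labels = c("***", "**", "*", "ns")))
sig <- res[res$stars != "ns", ]

# x positions of the two groups in each comparison (row names look like "Z-V")
groups <- do.call(rbind, strsplit(sig$comparison, "-", fixed = TRUE))
sig$x1 <- match(groups[, 2], levels(dat1$key))
sig$x2 <- match(groups[, 1], levels(dat1$key))
# Stack the brackets above the data so they do not overlap
sig$y <- max(dat1$value) * (1.05 + 0.07 * seq_len(nrow(sig)))

p <- ggplot(dat1, aes(x = key, y = value)) +
  stat_summary(fun.data = mean_se, colour = "red") +
  geom_segment(data = sig, aes(x = x1, xend = x2, y = y, yend = y),
               inherit.aes = FALSE) +
  geom_text(data = sig, aes(x = (x1 + x2) / 2, y = y, label = stars),
            vjust = -0.2, inherit.aes = FALSE)
```

Everything stays in one script, so no external array is needed; with many significant pairs the stacked brackets get crowded, and you may prefer to plot only the comparisons you care about.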

ggplot2 adding custom legend when plotting two lines from subset of columns

I've looked all over Stack Overflow and other sites to fix my code but can't see what's wrong. I am trying to plot 2 lines on the same ggplot graph, each taken from a portion of 2 different columns. For example, I have a column of length 8 in which the first four rows are M (male) and the last four rows are F (female). I have two columns of data and one column for the condition (factor).
ModelMF <- data.frame(ProbGender, ProbCond, ProbMF, Act_pct)
where:
ProbGender ProbCond ProbMF Act_pct
M 0 .75 .71
M 10 .67 .69
M 20 .61 .54
M 30 .81 .77
F 0 .88 .82
F 10 .73 .71
F 20 .67 .71
F 30 .60 .63
I have tried the following but I keep getting errors (see below):
ggplot(data = ModelMF, aes(x = ProbCond)) +
  geom_line(data = ModelMF[ModelMF$ProbGender=="M",], aes(y=ProbMF), color = 'col1') +
  geom_point(data = ModelMF[ModelMF$ProbGender=="M",], aes(y = ProbMF)) +
  geom_line(data = ModelMF[ModelMF$ProbGender=="M",], aes(y=Act_pct), color = 'col2') +
  geom_point(data = ModelMF[ModelMF$ProbGender=="M",], aes(y = Act_pct)) +
  scale_color_manual(values = c('col1' = 'darkblue', 'col2' ='lightblue'))
Preferably I would like to create a custom legend that maps the colors, as I've attempted to do with scale_color_manual, but I get the following error:
Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'col1'
I'm not sure whether this is because I'm subsetting the data within the data frame or because of something else I'm missing. Also, if I add the female lines, I assume I can simply follow the same procedure?
Thanks in advance.
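The error occurs because color = 'col1' outside aes() is taken as a literal colour name, and 'col1' is not a valid colour. A sketch of the usual fix: put the key inside aes(), so that it becomes a legend entry, and let scale_color_manual() translate each key into a real colour. The legend labels ("ProbMF", "Act_pct") are an assumption about what you want to display.

```r
library(ggplot2)

ModelMF <- data.frame(
  ProbGender = rep(c("M", "F"), each = 4),
  ProbCond = rep(c(0, 10, 20, 30), times = 2),
  ProbMF = c(.75, .67, .61, .81, .88, .73, .67, .60),
  Act_pct = c(.71, .69, .54, .77, .82, .71, .71, .63)
)
male <- ModelMF[ModelMF$ProbGender == "M", ]

# colour = "col1" inside aes() is a legend key, not a colour name;
# scale_color_manual() then maps each key to an actual colour
p <- ggplot(male, aes(x = ProbCond)) +
  geom_line(aes(y = ProbMF, colour = "col1")) +
  geom_point(aes(y = ProbMF, colour = "col1")) +
  geom_line(aes(y = Act_pct, colour = "col2")) +
  geom_point(aes(y = Act_pct, colour = "col2")) +
  scale_color_manual(values = c(col1 = "darkblue", col2 = "lightblue"),
                     breaks = c("col1", "col2"),
                     labels = c("ProbMF", "Act_pct"),
                     name = NULL)
```

For the female lines you can follow the same pattern with two more keys (for example "col3" and "col4") added to both the aes() calls and the values vector.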

Modifying y-axis with ggplot2

I'm trying to plot the number of observations for each instance of a word, both of which are stored in a data frame.
I can generate the plot with ggplot2, but the y-axis displays "1e+05", "2e+05", and so on, instead of plain numbers.
How can I modify this code so that the y-axis displays numbers instead?
Here is my code:
> w
p.word p.freq
1 the 294571
2 and 158624
3 you 84152
4 for 77117
5 that 71672
6 with 47987
7 this 42768
8 was 41088
9 have 39835
10 are 36458
11 but 33899
12 not 30370
13 all 27079
14 your 26923
15 just 25507
16 from 24497
17 out 22578
18 like 22501
19 what 22150
20 will 21530
21 they 21435
22 about 21184
23 one 20877
24 its 20109
ggplot(w, aes(x = p.word, y = p.freq))+ geom_bar(stat = "identity")
Here is the plot that is generated:
"1e+05" and so on are numerical values, printed in scientific notation.
If you want the long notation (e.g. "100,000") use library(scales) and the comma formatter:
library(scales)
ggplot(w, aes(x = p.word, y = p.freq))+ geom_bar(stat = "identity") +
scale_y_continuous(labels=comma)
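If you would rather not load scales, a minimal sketch using base R's format() as the label function also works. The helper name comma_labels is hypothetical, and only the first five rows of w are reproduced here.

```r
library(ggplot2)

# First five rows of the question's data frame w
w <- data.frame(p.word = c("the", "and", "you", "for", "that"),
                p.freq = c(294571, 158624, 84152, 77117, 71672))

# Hypothetical label helper: fixed (non-scientific) notation with a thousands separator
comma_labels <- function(x) format(x, big.mark = ",", scientific = FALSE, trim = TRUE)

p <- ggplot(w, aes(x = p.word, y = p.freq)) +
  geom_col() +
  scale_y_continuous(labels = comma_labels)
```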

How to melt R data.frame and plot group by bar plot

I have the following R data.frame:
group match unmatch unmatch_active match_active
1 A 10 4 0 0
2 B 116 20 0 3
3 c 160 27 1 4
4 D 79 17 0 3
5 E 309 84 4 14
6 F 643 244 10 23
...
My goal is to draw a grouped bar plot like the one in the "Graphs with more variables" section of http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/.
I realize that before getting there I need to get the data into the following format:
group variable value
1 A match 10
2 B match 116
3 C match 160
4 D match 79
5 E match 309
6 F match 643
7 A unmatch 4
8 B unmatch 20
...
I used the melt function:
groups.df.melt <- melt(groups.df[,c('group','match','unmatch', 'unmatch_active', 'match_active')],id.vars = 1)
I don't think I am doing the melt correctly, because after I execute the above, groups.df.melt has 1000+ rows, which doesn't make sense to me.
I looked at "Draw histograms per row over multiple columns in R" and tried to follow the same approach, yet I don't get the graph I want.
In addition, I get the following error when I try to plot:
ggplot(groups.df.melt, aes(x='group', y=value)) + geom_bar(aes(fill = variable), position="dodge") + scale_y_log10()
Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
Error in pmin(y, 0) : object 'y' not found
Try:
library(reshape2)
library(ggplot2)
# one row per (group, variable) pair
mm <- melt(groups.df, id.vars = 'group')
ggplot(data = mm, aes(x = group, y = value, fill = variable)) +
  geom_bar(stat = 'identity', position = 'dodge')
or
ggplot(data = mm, aes(x = group, y = value, fill = variable)) +
# `geom_col()` uses `stat_identity()`: it leaves the data as is.
geom_col(position = 'dodge')
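On the 1000+ rows: melt() (and tidyr's newer pivot_longer()) produces one row per group per measured column, so with 4 measured columns a data frame of 250 groups becomes 1,000 rows. That is expected. A sketch with pivot_longer() on the six rows shown in the question, assuming tidyr and ggplot2 are installed:

```r
library(ggplot2)
library(tidyr)

# The six rows shown in the question
groups.df <- data.frame(
  group = c("A", "B", "c", "D", "E", "F"),
  match = c(10, 116, 160, 79, 309, 643),
  unmatch = c(4, 20, 27, 17, 84, 244),
  unmatch_active = c(0, 0, 1, 0, 4, 10),
  match_active = c(0, 3, 4, 3, 14, 23)
)

# Every column except group becomes a (variable, value) pair: 6 x 4 = 24 rows
mm <- pivot_longer(groups.df, -group, names_to = "variable", values_to = "value")

# Dodged bars: one bar per variable within each group
p <- ggplot(mm, aes(x = group, y = value, fill = variable)) +
  geom_col(position = "dodge")
```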
