How to create a stacked bar chart from summarized data in ggplot2 - r

I'm trying to create a stacked bar graph using ggplot2. My data, in its wide form, looks like this. The numbers in each cell are the frequency of responses.
activity yes no dontknow
Social events 27 3 3
Academic skills workshops 23 5 8
Summer research 22 7 7
Research fellowship 20 6 9
Travel grants 18 8 7
Resume preparation 17 4 12
RAs 14 11 8
Faculty preparation 13 8 11
Job interview skills 11 9 12
Preparation of manuscripts 10 8 14
Courses in other campuses 5 11 15
Teaching fellowships 4 14 16
TAs 3 15 15
Access to labs in other campuses 3 11 18
Interdisciplinary research 2 11 18
Interdepartamental projects 1 12 19
I melted this table using reshape2:
melted.data <- melt(wide.data, id.vars=c("activity"), measure.vars=c("yes","no","dontknow"), variable.name="haveused", value.name="responses")
That's as far as I can get. I want to create a stacked bar chart with activities on the x axis, frequency of responses on the y axis, and each bar showing the distribution of the yes, no, and dontknow responses.
I've tried
ggplot(melted.data,aes(x=activity,y=responses))+geom_bar(aes(fill=haveused))
but I'm afraid that's not the right solution.
Any help is much appreciated.

You haven't said what is wrong with your solution. Some things that could be construed as problems, with one possible solution for each, are:
The x axis tick mark labels run into each other. SOLUTION - rotate the tick mark labels;
The order in which the labels (and their corresponding bars) appear is not the same as the order in the original data frame. SOLUTION - reorder the levels of the factor 'activity';
The bars carry no value labels. SOLUTION - add the counts with geom_text, and set the vjust parameter in position_stack to 0.5 to center each label inside its bar segment.
Also note that geom_bar counts rows by default and does not accept a y aesthetic. Since your data are already summarized, use geom_col (or geom_bar(stat = "identity")) instead.
The following might be a start.
# Load required packages
library(ggplot2)
library(reshape2)
# Read in data
df = read.table(text = "
activity yes no dontknow
Social.events 27 3 3
Academic.skills.workshops 23 5 8
Summer.research 22 7 7
Research.fellowship 20 6 9
Travel.grants 18 8 7
Resume.preparation 17 4 12
RAs 14 11 8
Faculty.preparation 13 8 11
Job.interview.skills 11 9 12
Preparation.of.manuscripts 10 8 14
Courses.in.other.campuses 5 11 15
Teaching.fellowships 4 14 16
TAs 3 15 15
Access.to.labs.in.other.campuses 3 11 18
Interdisciplinary.research 2 11 18
Interdepartamental.projects 1 12 19", header = TRUE, sep = "")
# Melt the data frame
dfm = melt(df, id.vars = c("activity"), measure.vars = c("yes", "no", "dontknow"),
           variable.name = "haveused", value.name = "responses")
# Reorder the levels of activity
dfm$activity = factor(dfm$activity, levels = df$activity)
# Draw the plot
ggplot(dfm, aes(x = activity, y = responses, group = haveused)) +
  geom_col(aes(fill = haveused)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25)) +
  geom_text(aes(label = responses), position = position_stack(vjust = .5), size = 3) # labels inside the bar segments

Related

Scatterplot - adding equation and r square value

I am a newbie at R. I want to plot two variables and show the regression line together with the boxplots. I can show all of that except the equation and the R squared value.
Below is the script I use to draw the graph:
library (car)
scatterplot(FIRST_S2A_NDVI, MEAN_DRONE_NDVI,
main = "NDVI Value from Sentinel and Drone",
xlab = "NDVI Value from Sentinel",
ylab = "NDVI Value from Drone",
pch = 15, col = "black",
regLine = list(col="green"), smooth = FALSE)
The figure is like this.
Now, the final touch is to add the equation and the R squared value to my figure. What script do I need to write? I tried the script from "Add regression line equation and R^2 on graph", but I still have no idea how to show them.
Thanks for reading, and hopefully for helping me with this.
p.s.
Content of my data
OBJECTID SAMPLE_GRID FIRST_S2A_NDVI MEAN_DRONE_NDVI
1 1 1 0.6411405 0.8676092
2 2 2 0.4335293 0.5697814
3 3 3 0.7350439 0.7321858
4 4 4 0.7268013 0.8271566
5 5 5 0.3638939 0.5682631
6 6 6 0.1953890 0.3168246
7 7 7 0.4841993 0.7380627
8 8 8 0.4137447 0.3239288
9 9 9 0.8219178 0.8676065
10 10 10 0.2647872 0.2296441
11 11 11 0.8126657 0.8519964
12 12 12 0.2648504 0.2465738
13 13 13 0.5992035 0.8016030
14 14 14 0.2420299 0.3933670
15 15 15 0.5059137 0.7593807
16 16 16 0.7713419 0.8026068
17 17 17 0.3762540 0.5941540
18 18 18 0.5876435 0.7763927
19 19 19 0.2491609 0.5095306
20 20 20 0.3213648 0.4456958
21 21 21 0.2101466 0.1960858
22 22 22 0.3749034 0.4956361
23 23 23 0.5712630 0.7350484
24 24 24 0.8444895 0.8577550
25 25 25 0.3331450 0.4390229
26 26 26 0.1851611 0.4573663
27 27 27 0.4914998 0.2750837
28 28 28 0.7121390 0.7780228
To add the equation and the R squared value to your current plot, you can fit a model with the y and x variables, format the equation as text, and paste it over the plot using the mtext function.
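# Fit the linear model (the variables must be visible, e.g. attached or passed via a data argument)
m <- lm(MEAN_DRONE_NDVI ~ FIRST_S2A_NDVI)
# Build the equation text "y = <slope>x + <intercept>"
eq <- paste0("y = ", round(coef(m)[2], 3), "x ",
             ifelse(coef(m)[1] < 0, round(coef(m)[1], 3),
                    paste("+", round(coef(m)[1], 3))))
# Write the equation and R^2 just inside the top edge of the plot
mtext(eq, 3, -1)
mtext(paste0("R^2 = ", round(summary(m)$r.squared, 3)), 3, -3)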
You can change the variables in your model, and you can change the position of the text with the 2nd and 3rd arguments (side and line) of the mtext function.
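For a self-contained check of the approach above, here is a minimal sketch using the first ten rows of the posted data. It assumes base graphics (plot and abline) as a stand-in for car::scatterplot, and the data frame name ndvi is made up for the example; mtext is used the same way as in the answer.

```r
# First ten rows of the posted data; `ndvi` is a hypothetical name
ndvi <- data.frame(
  FIRST_S2A_NDVI  = c(0.6411405, 0.4335293, 0.7350439, 0.7268013, 0.3638939,
                      0.1953890, 0.4841993, 0.4137447, 0.8219178, 0.2647872),
  MEAN_DRONE_NDVI = c(0.8676092, 0.5697814, 0.7321858, 0.8271566, 0.5682631,
                      0.3168246, 0.7380627, 0.3239288, 0.8676065, 0.2296441)
)

# Fit the model with a data argument so nothing needs to be attached
m  <- lm(MEAN_DRONE_NDVI ~ FIRST_S2A_NDVI, data = ndvi)
r2 <- round(summary(m)$r.squared, 3)

# The "+" flag in sprintf prints the sign of the intercept explicitly
eq <- sprintf("y = %.3fx %+.3f", coef(m)[[2]], coef(m)[[1]])

# Base-graphics stand-in for car::scatterplot (an assumption of this sketch)
plot(MEAN_DRONE_NDVI ~ FIRST_S2A_NDVI, data = ndvi, pch = 15)
abline(m, col = "green")
mtext(eq, side = 3, line = -1)
mtext(paste0("R^2 = ", r2), side = 3, line = -3)
```

Passing data = ndvi to lm avoids relying on attached variables, and summary(m)$r.squared reads the R squared value by name.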

In R, how do I create a bar graph using a categorical x and the average of a numeric y with ggplot2?

I am using R version 3.5.1.
I started with this problem and applied the code suggested by the top comment to my own dataset. Here time_len is a categorical variable telling how long it takes to play game i, with values short, medium, and long, and num_fans is a numeric variable telling how many fans game i has:
ggplot(cdata) +
aes(x = time_len, y = num_fans) +
geom_bar(stat = "identity")
Here is the plot:
I created the bar chart using the code above, but the problem is that the num_fans variable looks like it is a total. I want it to show the average for each category.
EDIT: here is a sample of my data.
game_id min_players single_player max_players family_game avg_time time_len year life avg_rating
1 161936 2 multi 4 family 60 short 2015 4 8.65105
2 187645 2 multi 4 family 240 long 2016 3 8.45420
3 12333 2 multi 2 small 180 long 2005 14 8.32843
4 193738 2 multi 4 family 150 medium 2016 3 8.28646
5 162886 1 single 4 family 120 medium 2017 2 8.39401
6 84876 2 multi 4 family 90 medium 2011 8 8.12803
geek_rating rating age owned num_fans
1 8.49375 8.572400 13 47498 2168
2 8.16391 8.309055 14 23989 1594
3 8.18051 8.254470 13 45955 3639
4 8.07144 8.178950 12 21513 835
5 7.96162 8.177815 13 13743 1002
6 8.01120 8.069615 12 49353 1866
Again, I am only asking about time_len. Is there a way to show the average of num_fans?
Try replacing the
geom_bar(stat = "identity")
line with
stat_summary(geom = "bar", fun.y = "mean")
In ggplot2 3.3.0 and later the fun.y argument is deprecated; use fun = "mean" there instead.
(Note: this is the answer contributed by Z.Lin, who told me that I could post it as an answer.)
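An alternative that does not depend on the stat_summary argument name is to compute the per-category means first and plot those. The sketch below uses only the time_len and num_fans columns from the six sample rows in the question; the name fan_means is made up for the example.

```r
library(ggplot2)

# time_len and num_fans from the six sample rows in the question
cdata <- data.frame(
  time_len = c("short", "long", "long", "medium", "medium", "medium"),
  num_fans = c(2168, 1594, 3639, 835, 1002, 1866)
)

# Mean of num_fans within each time_len category
fan_means <- aggregate(num_fans ~ time_len, data = cdata, FUN = mean)

# geom_col draws bars whose heights are the values already in the data
p <- ggplot(fan_means, aes(x = time_len, y = num_fans)) +
  geom_col()
print(p)
```

Because the averaging happens before plotting, fan_means can be inspected directly to confirm the bar heights.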

How to order both positive and negative values in ggplot

How can I create an ordered bar plot in ggplot2 with both positive and negative values? Here is the data:
down -11
down -10
down -9
down -6
up 6
up 6
up 6
up 6
up 7
up 7
up 8
up 8
up 8
up 8
up 8
up 8
up 8
up 10
up 10
up 11
up 11
up 12
up 14
up 14
up 21
up 21
up 24
I have tried this code:
ggplot(GO, aes(x = d1, y = order(d2), fill = factor(d1))) +
  geom_bar(stat = "identity", position = "identity", width = 0.6)
This is not working.
I would like to order the bars in the plot. Can anybody please suggest some code?
Please check out my answer to a similar question. Set your vector up in the order you want and then add scale_y_discrete(limits = yourOrderedData); the plot should then follow that order.
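As a rough sketch of that idea on the posted data: it assumes each row should be drawn as its own bar, so a per-row label id is added (a made-up column), and because the bars sit on the x axis here, scale_x_discrete is used in place of scale_y_discrete. The column names d1 and d2 and the data frame name GO follow the question's code.

```r
library(ggplot2)

# The posted data; `id` is a hypothetical per-row label, one per bar
GO <- data.frame(
  d1 = rep(c("down", "up"), times = c(4, 23)),
  d2 = c(-11, -10, -9, -6, 6, 6, 6, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8,
         10, 10, 11, 11, 12, 14, 14, 21, 21, 24)
)
GO$id <- as.character(seq_len(nrow(GO)))

# The order you want the bars in: here, largest value first
bar_order <- GO$id[order(GO$d2, decreasing = TRUE)]

# limits fixes the left-to-right order of the discrete axis
p <- ggplot(GO, aes(x = id, y = d2, fill = d1)) +
  geom_col(width = 0.6) +
  scale_x_discrete(limits = bar_order)
print(p)
```

Any other ordering works the same way: build the vector of labels in the order you want and pass it as limits.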

ggplot2 is plotting a line strangely

I am trying to plot the time series x_t = A + (-1)^t B.
To do this I am using the following code. The problem is that the ggplot output is wrong.
require (ggplot2)
set.seed(42)
N<-2
A<-sample(1:20,N)
B<-rnorm(N)
X<-c(A+B,A-B)
dat<-sapply(1:N,function(n) X[rep(c(n,N+n),20)],simplify=FALSE)
dat<-data.frame(t=rep(1:20,N),w=rep(A,each=20),val=do.call(c,dat))
ggplot(data=dat,aes(x=t, y=val, color=factor(w)))+
geom_line()+facet_grid(w~.,scale = "free")
Looking at the head of dat, everything looks right:
> head(dat)
t w val
1 1 12 10.5533
2 2 12 13.4467
3 3 12 10.5533
4 4 12 13.4467
5 5 12 10.5533
6 6 12 13.4467
So the lower (blue) line should only have values 10.5533 and 13.4467. But it also takes different values. What is wrong in my code?
Thanks in advance for any help
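You really should be more careful before asserting that something is "wrong". The way you are creating dat, val has 40 * N elements while t and w have only 20 * N, so data.frame recycles t and w and each (w, t) pair ends up with two values. head(...) does not display the extra values because the rows are not ordered by dat$t: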
head(dat[order(dat$w,dat$t),],10)
# t w val
# 21 1 18 18.43530
# 61 1 18 18.36313
# 22 2 18 19.56470
# 62 2 18 17.63687
# 23 3 18 18.43530
# 63 3 18 18.36313
# 24 4 18 19.56470
# 64 4 18 17.63687
# 25 5 18 18.43530
# 65 5 18 18.36313
Note the row numbers.
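One way to avoid the recycling, assuming the intent is 20 time points per series, is to build the data frame so that t, w, and val all have the same length. This is a sketch rather than the original poster's code: it computes val directly from the formula x_t = A + (-1)^t B, and the names n_t and series are made up for the example.

```r
set.seed(42)
N   <- 2
n_t <- 20                      # time points per series (assumed)
A   <- sample(1:20, N)
B   <- rnorm(N)

# One row per (series, time) pair; every column has N * n_t elements
dat <- data.frame(
  t = rep(seq_len(n_t), times = N),
  w = rep(A, each = n_t)
)
series  <- rep(seq_len(N), each = n_t)
dat$val <- A[series] + (-1)^dat$t * B[series]

# Returns 0 when no (w, t) pair is repeated
anyDuplicated(dat[c("w", "t")])
```

With dat built this way, the ggplot call from the question can be reused unchanged, and each facet alternates between exactly two values.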

Frequency distribution with custom format data

I need help with an R plot, using a data format I have not worked with before. Please help if you know.
NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3
I need a bar plot with the numbers on the X axis (continuous, not binned as in a histogram) and frequency on Y, but combined,
like
10 46
11 3
12 6
It seems simple enough, but I have 10,000 rows and large numbers in the real data, so I am looking for a good solution in R without doing it manually.
What about reading in your data:
dd = read.table(textConnection("NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3"), header=TRUE)
and then:
## tapply splits dd$FREQUENCY by dd$NUMBER and sums each group
barplot(tapply(dd$FREQUENCY, dd$NUMBER, sum))
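If you would rather have the combined table as a data frame, for example to inspect it or pass it on to another plotting function, aggregate gives the same totals. A minimal sketch on the sample data; the name totals is made up for the example.

```r
dd <- read.table(text = "NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3", header = TRUE)

# Sum FREQUENCY within each distinct NUMBER
totals <- aggregate(FREQUENCY ~ NUMBER, data = dd, FUN = sum)
totals

# names.arg labels each bar with its NUMBER
barplot(totals$FREQUENCY, names.arg = totals$NUMBER)
```

On the sample data this yields 46, 3, and 6 for the numbers 10, 11, and 12, matching the combined table in the question.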
