ggplot2 to generate a geom_bar() - r

I have the following table in RStudio, and I am trying to create a bar chart with geom_bar() in ggplot2 to represent the percentage of students who received financial assistance and those who did not.
comparison_table <- data.frame(Students1, Percentage1, financial_assistance1, stringsAsFactors = F)
comparison_table
Students1 Percentage1 financial_assistance1
1 PELL 0.4059046 True
2 NOPELL 0.5018954 False
3 LOAN 0.4053371 True
4 NOLOAN 0.2290538 False
My code for the bar plot is:
PELL<-mean(na.omit(completion_rate_based_on_financial_assistance$percent_of_students_with_Pell_Grant_and_completed_in_4_years))
NOPELL<-mean(na.omit(completion_rate_based_on_financial_assistance$percent_of_students_without_Pell_Grant_and_completed_in_4_years))
LOAN<-mean(na.omit(completion_rate_based_on_financial_assistance$percent_of_students_with_federal_loan_and_completed_in_4_years))
NOLOAN<-mean(na.omit(completion_rate_based_on_financial_assistance$percent_of_students_without_federal_loan_and_completed_in_4_years))
tab1<-cbind(PELL,LOAN)
tab2<-cbind(NOPELL,NOLOAN)
tab<-rbind(tab1,tab2)
rownames(tab) <- c("PELL","LOAN")
colnames(tab) <- c("With Financial Help","Without Financial Help")
barplot(tab,beside = F,legend.text= rownames(tab),xlab = "Financial Help",col=c("lightblue","pink"))
My question is: how can I generate this bar plot using ggplot2 and geom_bar()? For visualization purposes, I wish to generate two stacked bars, one that contains the percentage of students who received Pell Grants and loans (PELL & LOAN) and another that contains the percentage of students who did not receive Pell Grants and loans (NOPELL & NOLOAN).

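library(ggplot2)
tb <- data.frame(students1 = c("PELL", "NOPELL", "LOAN", "NOLOAN"), percentage1 = c(40, 50, 40, 23), financial_assistance1 = c(TRUE, FALSE, TRUE, FALSE))
# x = whether assistance was received, y = percentage
g <- ggplot(tb, aes(financial_assistance1, percentage1))
# stat = "identity" uses percentage1 as the bar height; fill splits each bar by group
g + geom_bar(stat = "identity", aes(fill = students1))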
Explanation: It's pretty simple. Create the ggplot with the x (financial_assistance1) and y (percentage1) variables, then add the geom_bar. The only thing to remember about geom_bar is that it defaults to counting how many cases there are of each "x" value and showing that count as the bar height. In this case, you want to use the y variable as the value, so that's the stat="identity" bit. The aes(fill=students1) is there to add the two colours for the stacked bars.
UPDATE: Just noticed I misread what you tried to achieve, edited the code to correct for it.
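As a hedged sketch of the same idea: geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"). The version below uses the proportions from the question's table rather than rounded values. The helper column `help` and the axis and legend titles are my own additions for illustration, not part of the original answer.

```r
library(ggplot2)

# Values copied from the comparison_table shown in the question
tb <- data.frame(
  students1 = c("PELL", "NOPELL", "LOAN", "NOLOAN"),
  percentage1 = c(0.4059046, 0.5018954, 0.4053371, 0.2290538),
  financial_assistance1 = c(TRUE, FALSE, TRUE, FALSE)
)

# Hypothetical helper column: readable x-axis labels instead of TRUE/FALSE
tb$help <- ifelse(tb$financial_assistance1,
                  "With financial help", "Without financial help")

p <- ggplot(tb, aes(x = help, y = percentage1, fill = students1)) +
  geom_col() +  # bars are stacked by default
  labs(x = "Financial help", y = "Share of students", fill = "Group")

print(p)
```

Each bar stacks the two groups that share the same value of financial_assistance1, which is the layout the question asks for.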

Related

Traminer: Mean time barplot with number of observations

Because I am still new to TraMineR, my problem may seem trivial to most of you. I'm working on mean time plots with my data, and I would like to display on the bars the mean time spent in the different states. Is there a command for this in TraMineR?
The option to add bar labels on the mean time plot has been implemented in TraMineR version 2.2-3. The option is available through the arguments bar.labels, cex.barlab, and offset.barlab of the plot method for the outcome of seqmeant. These arguments can be passed as ... arguments to seqmtplot. In this latter case, when groups are specified, bar.labels should be a matrix with the labels for each group in columns.
I show, using the actcal data, how to display the mean times over the bars. The group here is sex, but it can of course be your clusters.
library(TraMineR)
data(actcal)
## We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal),300),]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal,13:24,labels=actcal.lab)
group <- factor(actcal$sex)
blab <- NULL
for (i in seq_along(levels(group))) {
  blab <- cbind(blab, seqmeant(actcal.seq[group == levels(group)[i], ]))
}
seqmtplot(actcal.seq, group = group,
          bar.labels = round(blab, digits = 2), cex.barlab = 1.2)

Customize Barplot in base function barplot()

I have a data frame (df) with 2 columns: one numerical and one as.factor() with three levels:
Pre
Post
Blank
I want to make a barplot() with each bar colored according to its group (easy), and change the order of the plot so that bars from the same group appear next to each other (this is where I'm stuck).
I followed the same logic as I would with a boxplot(), but it does not appear to work the same. I also tried following examples from several stackoverflow threads, including (but not limited to) this one:
Re-ordering bars in R's barplot()
But still can't get it to work.
Here is what I've tried, and it works with the boxplot function quite well:
df <- read.table("https://pastebin.com/raw/zaETq28M", header = T)
df$Treatment <- as.factor(df$Treatment)
levels(df$Treatment) # note: I would like to display order to be: Pre, Post, then Blank.
df$Treatment <- ordered(df$Treatment, levels = c("Pre","Post","Blank")) # set to the right order
barplot(df$Cq,names.arg = df$Treatment ,col = df$Treatment, ylim=c(0,30), main = "Not the right order bar plot", cex.main=2)
In total, I should have 66 individual bars (which I do), but the order of the graph is not what I set, and the groups are still separated. How can I simply get 3 distinct groups? That is, first show all "Pre", then all "Post", followed by all "Blank".
General questions for future posts:
How do I get my graphs to be displayed on Stack Overflow when I post a question? For some reason, my posts never include my graphs.
Also, any suggestions on using a color-blind-friendly palette would be great, but I can do this manually if needed. I am just curious whether there is an automatic way of doing it, so I do not need to set it manually in all my graphs.
Thank you for your help
Do you mean this?
First the Pre, then Post then blank. Within each group order is preserved. Legend added with blank == No Treatment.
df <- read.table("https://pastebin.com/raw/zaETq28M", header = T)
df_Pre <- df[which(df$Treatment == 'Pre'),]
df_Post <- df[which(df$Treatment == 'Post'),]
df_Blank <- df[which(df$Treatment == 'Blank'),]
ddf <- rbind(df_Pre, df_Post, df_Blank)
ddf$color <- c(rep('blue', nrow(df_Pre)), rep('red', nrow(df_Post)), rep('magenta', nrow(df_Blank)))
barplot(ddf$Cq, col = ddf$color, names = rownames(ddf))
legend("bottomleft",
       legend = c("Pre-Treatment", "Post-Treatment", "No Treatment"),
       fill = c("blue", "red", "magenta")) # same colours as the bars
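A more compact variant of the answer above, offered as a sketch: barplot() draws a vector in the order it is given, so setting factor levels alone does not move the bars; the rows have to be sorted by that factor. The small data frame below is a hypothetical stand-in for the pastebin file, and the Okabe-Ito palette (via palette.colors(), available in R 4.0.0 and later) is one possible answer to the color-blind question, not the only one.

```r
# Hypothetical stand-in for the pastebin data
df <- data.frame(
  Cq = c(21.5, 18.2, 25.1, 19.9, 24.3, 17.6),
  Treatment = c("Post", "Pre", "Blank", "Pre", "Blank", "Post")
)

# Set the display order, then sort the rows by it;
# barplot() draws bars in row order, not in factor-level order
df$Treatment <- factor(df$Treatment, levels = c("Pre", "Post", "Blank"))
ddf <- df[order(df$Treatment), ]

# Okabe-Ito is a colour-blind-friendly palette shipped with R >= 4.0.0
cols <- palette.colors(n = 4, palette = "Okabe-Ito")[2:4]  # skip black
names(cols) <- levels(ddf$Treatment)
bar_cols <- cols[as.character(ddf$Treatment)]

barplot(ddf$Cq, col = bar_cols, names.arg = as.character(ddf$Treatment),
        las = 2, ylim = c(0, 30))
legend("topright", legend = names(cols), fill = cols)
```

Because order() keeps ties in their original order, the bars within each group stay in the order they had in the data.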

Q: How to combine two types of lines using ggplot?

I am trying to plot the following graph:
This plot was made using a command in R; however, I need to change the x-axis. As you can see, the x-axis starts at 0 and finishes at 46. I want the x-axis to start at 1972 and finish at 2018, i.e. seq(1972, 2018). The data used for this graph are the following:
For regime one
structure(c(0.996336942021931, 0.982749831853788, 0.25257000136794,
0.707797489518183, 0.339372705184362, 0.999209103898399, 0.348786927897612,
0.821500770877589, 0.569473419352121, 0.544946043345147, 0.15347485404411,
0.987921203799956, 0.00247541125926418, 0.999925918450173, 0.996940249283586,
0.0141234625702467, 0.105466117156579, 0.999992944275275, 0.991723355647765,
0.0958472062267191, 0.0362729940372193, 0.999999790503447, 0.0750715811130157,
0.999975836828039, 0.998991768987905, 0.327943641159186, 5.05723080618291e-05,
0.999999999869691, 0.995538324405397, 0.123355227931813, 0.999776636825943,
0.00875781169836433, 0.696284480883101, 0.854839147672286, 0.113243492249383,
0.00984853715078062, 0.442061195271808, 0.999959859676686, 0.0249739384218217,
0.715262186931097, 0.269481397703521, 0.708458897302807, 0.0444979324520481,
0.000133950914911277, 0.997976154782607, 0.191386380576805, 0.99775339928206,
0.97921531595208, 0.27690132186733, 0.671995422154737, 0.458800347851363,
0.999155966774432, 0.417000082142666, 0.838969001100901, 0.576424593247709,
0.439169303472056, 0.227227711549776, 0.978527102362448, 0.00408165810824898,
0.999955057843957, 0.994643622809094, 0.00847570472458959, 0.163000467960203,
0.999995704786608, 0.987482614312069, 0.0569007267419926, 0.0585312256476362,
0.999999671060746, 0.118213072794827, 0.99998536150034, 0.998897081324845,
0.212968271334585, 8.35316288758489e-05, 0.999999999920876, 0.993537683112221,
0.188538497918178, 0.999604116439039, 0.00905848219612739, 0.769430430615986,
0.794457999021984, 0.0665707154963958, 0.00776458004359329, 0.5668500474175,
0.999931021995446, 0.0265573724408095, 0.661699294173752, 0.296009575623967,
0.587638579198176, 0.0251758869152202, 0.000220356219397782,
0.997352716237698, 0.191386380576805), .Dim = c(46L, 2L))
for regime 2:
structure(c(0.00366305797806813, 0.0172501681462116, 0.74742999863206,
0.292202510481817, 0.660627294815638, 0.000790896101601132, 0.651213072102388,
0.178499229122411, 0.430526580647879, 0.455053956654853, 0.846525145955889,
0.0120787962000438, 0.997524588740736, 7.40815498269273e-05,
0.00305975071641352, 0.985876537429753, 0.894533882843421, 7.05572472485335e-06,
0.00827664435223535, 0.904152793773281, 0.963727005962781, 2.09496553467159e-07,
0.924928418886985, 2.41631719608902e-05, 0.00100823101209502,
0.672056358840815, 0.999949427691938, 1.30308744399533e-10, 0.00446167559460289,
0.876644772068187, 0.00022336317405711, 0.991242188301636, 0.303715519116899,
0.145160852327714, 0.886756507750617, 0.990151462849219, 0.557938804728191,
4.01403233139628e-05, 0.975026061578178, 0.284737813068903, 0.730518602296479,
0.291541102697193, 0.955502067547952, 0.999866049085089, 0.00202384521739295,
0.808613619423195, 0.00224660071793958, 0.0207846840479196, 0.72309867813267,
0.328004577845263, 0.541199652148637, 0.000844033225568314, 0.582999917857334,
0.161030998899099, 0.423575406752291, 0.560830696527944, 0.772772288450224,
0.0214728976375518, 0.995918341891751, 4.49421560426429e-05,
0.00535637719090558, 0.99152429527541, 0.836999532039797, 4.29521339242403e-06,
0.0125173856879312, 0.943099273258007, 0.941468774352364, 3.28939253926857e-07,
0.881786927205173, 1.46384996596921e-05, 0.00110291867515508,
0.787031728665414, 0.999916468371124, 7.91243531099699e-11, 0.00646231688777926,
0.811461502081822, 0.00039588356096145, 0.990941517803873, 0.230569569384014,
0.205542000978016, 0.933429284503604, 0.992235419956407, 0.4331499525825,
6.89780045536876e-05, 0.973442627559191, 0.338300705826248, 0.703990424376033,
0.412361420801824, 0.97482411308478, 0.999779643780602, 0.00264728376230197,
0.808613619423195), .Dim = c(46L, 2L))
I know that the red line can be plotted using geom_line, but I do not know how to plot the black bars. Maybe using geom_bar? Also, how can I merge the two plots?
Thanks for your help
It's actually plotted using base R (good old times). Using your first data set (regime one), stored as a matrix named Regime1:
plot(Regime1[,1],type="h",xaxt="n",ylab="",cex.axis=0.6,xlab="",xlim=c(0,46))
lines(Regime1[,2],col="red")
mtext("Smoothed Probabilities",2,padj=-5,col="red",cex=0.7)
mtext("Fitted Probabilities",4,padj=1,cex=0.7)
axis(side=1,at=c(0,20,46),labels=c(1972,1992,2018))
The x-axis values are really just positions from 0 to 46, so you turn off the default x-axis ticks using xaxt="n" and then, with axis(), place ticks at 0, 20 and 46 with the labels 1972, 1992 and 2018.
The result also depends on your plotting device, so you might have to change the padj values in the mtext() calls to adjust the axis labels. You can check out posts like this one for base R plotting functions.
In ggplot2, I guess you just create a data.frame with the index as the years you need, and call geom_segment() to plot the vertical lines:
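As an alternative sketch in base R, the years can be passed to plot() directly as x values, so no custom axis() call is needed. The matrix below is a synthetic stand-in for the question's data, and mapping the 46 rows to 1973-2018 is my assumption; it matches the tick labels used above, where position 0 is labelled 1972 and position 46 is labelled 2018.

```r
# Synthetic stand-in for the question's 46 x 2 matrix
# (column 1: vertical bars, column 2: red line)
set.seed(1)
Regime1 <- cbind(runif(46), runif(46))

# Assumption: row i corresponds to year 1972 + i
years <- 1973:2018

plot(years, Regime1[, 1], type = "h", xlab = "", ylab = "",
     xlim = c(1972, 2018), cex.axis = 0.6)
lines(years, Regime1[, 2], col = "red")
mtext("Smoothed Probabilities", side = 2, line = 2.5, col = "red", cex = 0.7)
mtext("Fitted Probabilities", side = 4, line = 0.5, cex = 0.7)
```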
library(ggplot2)
Regime1 <- data.frame(Regime1)
colnames(Regime1) <- c("Fitted", "Smoothed")
Regime1$index <- 1:nrow(Regime1) + 1972
ggplot(Regime1, aes(x = index)) +
  geom_segment(aes(xend = index, y = 0, yend = Fitted, col = "Fitted")) +
  geom_line(aes(y = Smoothed, col = "Smoothed")) + theme_minimal() +
  scale_color_manual(values = c("black", "red"))
For a ggplot2 solution, you are going to need a data.frame or tibble with 4 columns (Regime, Year, Smoothed, and Fitted). Based on the data you provided, this would have 92 rows.
Now assuming you use those column names (and storing your data into the variable example.dat), a ggplot2 solution is
library(ggplot2)
library(magrittr) # provides the %>% pipe
example.dat %>%
  ggplot(aes(x = Year)) +
  geom_line(aes(y = Smoothed), color = "red") +
  geom_linerange(aes(ymax = Fitted), ymin = 0) +
  facet_wrap(~ Regime, ncol = 1)
Then you might need to adjust some of the scales to get the best plot.
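One way the 92-row data frame could be assembled is sketched below in base R. The matrices are synthetic stand-ins for the two 46 x 2 matrices in the question. Treating column 1 as Fitted and column 2 as Smoothed, and mapping the rows to 1973-2018, are assumptions on my part; the helper to_long() is hypothetical.

```r
# Synthetic stand-ins for the two 46 x 2 matrices printed in the question
set.seed(42)
Regime1 <- cbind(runif(46), runif(46))
Regime2 <- 1 - Regime1  # in the question, regime 2 is the complement of regime 1

# Hypothetical helper. Assumptions: column 1 is "Fitted",
# column 2 is "Smoothed", and the rows cover 1973-2018
to_long <- function(m, regime, years = 1973:2018) {
  data.frame(Regime = regime, Year = years,
             Smoothed = m[, 2], Fitted = m[, 1])
}

example.dat <- rbind(to_long(Regime1, "Regime 1"),
                     to_long(Regime2, "Regime 2"))
str(example.dat)
```

The resulting example.dat has the four columns the plotting code above expects, with one row per regime and year.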

How to put 2 boxplot in one graph in R without additional libraries?

I have this kind of dataset
Defect.found Treatment Program
1 Testing Counter
1 Testing Correlation
0 Inspection Counter
3 Testing Correlation
2 Inspection Counter
I would like to create two boxplots, one of detected defects per program and one of detected defects per technique, but in a single graph.
Meaning having:
boxplot(exp$Defect.found ~ exp$Treatment)
boxplot(exp$Defect.found ~ exp$Program)
In a joined graph.
Searching on Stackoverflow I was able to create it but with lattice library typing:
bwplot(exp$Treatment + exp$Program ~ exp$Defects.detected)
but I would like to know if it's possible to create the graph without additional libraries like ggplot2 and lattice.
Prepare the plot window to receive two plots in one row and two columns (default is obviously one row and one column):
par(mfrow = c(1, 2))
My suggestion is to avoid using the word exp, because it is already used for the exponential function. Use for instance mydata.
Defects found against treatment (frame = F suppresses the external box):
with(mydata, plot(Defect.found ~ Treatment, frame = F))
Defects found against program (ylab = NA suppresses the y label because it is already shown in the previous plot):
with(mydata, plot(Defect.found ~ Program, frame = F, ylab = NA))
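Put together as a self-contained sketch: the five rows below are typed in from the question's sample, and the name mydata follows the advice above. I call boxplot() explicitly here because plot() with a formula only draws boxplots when the grouping column is a factor, which may not be the case if the columns were read in as character strings.

```r
# Rows typed in from the question's sample data
mydata <- data.frame(
  Defect.found = c(1, 1, 0, 3, 2),
  Treatment = c("Testing", "Testing", "Inspection", "Testing", "Inspection"),
  Program = c("Counter", "Correlation", "Counter", "Correlation", "Counter")
)

old_par <- par(mfrow = c(1, 2))  # one row, two columns
b1 <- boxplot(Defect.found ~ Treatment, data = mydata)
b2 <- boxplot(Defect.found ~ Program, data = mydata, ylab = "")
par(old_par)  # restore the previous layout
```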

want to use another df for errorbars in R with barplot

I have these two df.
x;
experiment expression
1 HC 50
2 LC 4
3 HR 10
4 LR 2
y;
HC_conf_lo HC_conf_hi LC_conf_lo LC_conf_hi HR_conf_lo HR_conf_hi LR_conf_lo LR_conf_hi
1 63.3293 109.925 2.33971 5.26642 8.8504 16.7707 0.124013 0.434046
I want to use df:y to plot low and high conf. points. Output should be a barplot with errorbars. Can someone show me using lines in the basic package how to do this?
I don't know whether your data are valid, so I'm assuming the confidence intervals are correct.
Here's what you can do to get error bars on your plot:
# First, read in your data
x <- read.table("x.txt", header = TRUE)
y <- read.table("y.txt", header = TRUE)
# Reshape y so it can be merged with x
y.wide <- data.frame(matrix(t(y), ncol = 2, byrow = TRUE)) # transpose y into a
# two-column matrix filled by row,
# so each row holds one lo and one hi value
names(y.wide) <- c("lo", "hi") # name the columns in y.wide
# Make a data.frame of x and y.wide
xy.df <- data.frame(x, y.wide) # this will be used for plotting the error bars
# Make a matrix for use with barplot (barplot takes a vector or a matrix)
xy <- as.matrix(cbind(expression = x$expression, y.wide))
rownames(xy) <- x$experiment # rownames, so barplot can label the bars
# Get the y limits for barplot; xy is a matrix, so use xy.df for the $ lookups
ylimits <- range(xy.df$expression, xy.df$lo, xy.df$hi)
barx <- barplot(xy[, 1], ylim = c(0, ylimits[2])) # get the x co-ords of the bars
barplot(xy[, 1], ylim = c(0, ylimits[2]), main = "barplot of Expression with ? bars")
# ? as I don't know if it's a C.I. or something else
with(xy.df, arrows(barx, expression, barx, lo, angle = 90, code = 1, length = 0.1))
with(xy.df, arrows(barx, expression, barx, hi, angle = 90, code = 1, length = 0.1))
Resultant Plot
But it doesn't look right. This is because your expression values don't fall between the lo and hi values.
With the hack below,
barplot(xy[,1],ylim=c(0,ylimits[2]),main = "barplot of Expression with ? bars")
with(xy.df, arrows(barx,lo,barx,hi,angle=90, code=2,length=0.1))
with(xy.df, arrows(barx,hi,barx,lo,angle=90, code=2,length=0.1))
The resultant plot
Compare the two sets of arrows() calls carefully and you will see how I achieved it.
I would recommend double-checking your calculations, though.
This is also far easier with ggplot2. See this page for examples and code:
http://docs.ggplot2.org/0.9.3.1/geom_errorbar.html
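For reference, here is a compact, self-contained base R sketch of the same idea with the question's numbers typed in directly instead of read from files. It draws each interval as a single arrows() call with code = 3, which puts a flat cap on both ends. This is a simplification of the answer's approach rather than a replacement for it, and the intervals are plotted exactly as given, whether or not they contain the bar height.

```r
# Data typed in from the question
x <- data.frame(experiment = c("HC", "LC", "HR", "LR"),
                expression = c(50, 4, 10, 2))
lo <- c(63.3293, 2.33971, 8.8504, 0.124013)
hi <- c(109.925, 5.26642, 16.7707, 0.434046)

# Make the y-axis tall enough for both the bars and the intervals
ylimits <- range(0, x$expression, lo, hi)

# barplot() returns the x positions of the bar midpoints
barx <- barplot(x$expression, names.arg = x$experiment, ylim = ylimits)

# code = 3 draws a flat cap (angle = 90) at both ends of each segment
arrows(barx, lo, barx, hi, angle = 90, code = 3, length = 0.1)
```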

Resources