How to fix colors of bar graph - r

I want to produce a bar chart where the top bar is one color, and the bottom bars are a second color. For some reason my code keeps producing a graph where all the bars are the same color. How can I fix this?
counts <- structure(list(
`Residency Program` = c(4),
`Coursework & Degree` =c(3),
`Generalized Job Placement` =c(3),
`Accelerated Program Duration (13 months)` =c(3),
`Faculty and Staff Support` =c(3),
`Self-Efficacy in Teaching` =c(1),
`Perceived Impact on Students` =c(1),
`Stipend` =c(1),
`Resources` =c(1),
`Opportunities (generalized)` =c(1),
`MicroCredentials` =c(1),
`Hybrid (F2F and Online) Delivery Format` =c(1),
`Teaching Fellows Program` =c(1)
),
class = "data.frame", row.names = c(NA, -1L))
xFun <- function(x) x/1 + c(0.2, cumsum(x)[-length(x)])
counts <- counts[, order(colSums(counts))]
par(mar=c(5, 20, 4.1, 5))
byc <- barplot(as.matrix(counts), horiz=TRUE,
col=c("dodgerblue4","slategray3","slategray3","slategray3",
"slategray3","slategray3","slategray3","slategray3",
"slategray3","slategray3","slategray3","slategray3"), # assign `byc`
border=FALSE, las=1, xaxt='n', ylim = range(0,16.1),xlim = range(0,5) )
text(1.19, 0.75, "1")
text(1.19, 1.85, "1")
text(1.19, 3.1, "1")
text(1.19, 4.3, "1")
text(1.19, 5.5, "1")
text(1.19, 6.7, "1")
text(1.19, 7.9, "1")
text(1.19, 9.1, "1")
text(3.19, 10.3, "3")
text(3.19, 11.5, "3")
text(3.19, 12.7, "3")
text(3.19, 13.9, "3")
text(4.19, 15.1, "4")

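Unlist your data first. Passing a one-row matrix makes barplot treat each column as a stacked bar with a single segment, and col is applied to the segments within a bar, so only the first color is ever used. With a plain vector, col is applied bar by bar. Use rep for the repeated color, and note that the highlight color goes last, because with horiz=TRUE the bars are drawn from the bottom up.
For the labels, you can reuse the object byc you have already assigned: it holds the bar midpoints, which serve as the y coordinates here. For x, use the values themselves shifted by, say, +.25. This works because text accepts vectors for its x and y arguments.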
counts.plot <- unlist(counts)
counts.plot <- counts.plot[order(counts.plot)]
op <- par(mar=c(5, 20, 4.1, 5)) ## set par/store defaults
byc <- barplot(counts.plot, horiz=TRUE,
col=c(rep("slategray3", length(counts.plot) - 1), "dodgerblue4"),
border=FALSE, las=1, xaxt='n', ylim = range(0, 16.1),
xlim = range(0, 5))
text(counts.plot + .25, byc, labels=counts.plot)
par(op) ## restore defaults
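If you would rather not hard-code which position gets the highlight color, one option is to derive the color vector from the data. This is only a sketch on a shortened copy of the data (the four values come from the question; `bar_cols` is a name made up for this example):

```r
# Shortened copy of the question's data, sorted ascending so the
# largest value ends up as the top bar when horiz = TRUE
counts.plot <- sort(c(Stipend = 1, Resources = 1,
                      `Faculty and Staff Support` = 3,
                      `Residency Program` = 4))

# One color per bar: highlight whichever bar holds the maximum
bar_cols <- ifelse(counts.plot == max(counts.plot),
                   "dodgerblue4", "slategray3")

op <- par(mar = c(5, 12, 4.1, 5))            # room for the long labels
byc <- barplot(counts.plot, horiz = TRUE, col = bar_cols,
               border = FALSE, las = 1, xaxt = "n", xlim = c(0, 5))
text(counts.plot + 0.25, byc, labels = counts.plot)
par(op)                                      # restore previous settings
```

If several bars tie for the maximum, all of them get the highlight color, which may or may not be what you want.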

Related

parameters of histogram with R

First, I want the x-axis (abscissa) to show decimal numbers (for example 1.5, 2.6, ...), but when I draw the histogram with my code, the x-axis automatically shows whole numbers (in my picture of the histogram I have circled in red what I would like to change).
How can I change the parameters so that these whole numbers become decimals?
Secondly, I would like the numbers that appear on the x-axis to correspond exactly to my breaks vector.
Could someone please help me?
Here is my code:
my_data <- transform(my_data, new = as.numeric(new/1000000))
sal_hist_default = hist(my_data$new, breaks = c(1,6.3,11.6,16.9,22.2,27.5), col = "blue", border = "black", las = 1, include.lowest=TRUE,right=FALSE, main="Salary Of best category", xlab = "salaries", ylab = "num of players",xlim = c(1,27.5), ylim = c(0,600))
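You should really provide sample data, but try this: suppress the default axis with xaxt="n", then draw your own with axis(), placing the ticks and labels at the break values.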
set.seed(42)
new <- rnorm(1000, 14, 3.5)
my_data <- data.frame(new)
sal_hist_default = hist(my_data$new, breaks = c(1, 6.3, 11.6, 16.9, 22.2, 27.5), col = "blue",
border = "black", las = 1, include.lowest=TRUE,right=FALSE, main="Salary Of best category",
xlab = "salaries", ylab = "num of players",xlim = c(1,27.5), ylim = c(0,600), xaxt="n")
axis(1, c(1, 6.3, 11.6, 16.9, 22.2, 27.5), c(1, 6.3, 11.6, 16.9, 22.2, 27.5))
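To keep the breaks and the tick labels from drifting apart, the break vector can be stored once and reused. A minimal sketch, assuming simulated salaries in place of the real `my_data$new` (`salaries`, `brks` and `tick_labs` are names invented here); `sprintf("%.1f", ...)` forces one decimal place, so 1 is printed as 1.0:

```r
set.seed(1)
# Simulated stand-in data, guaranteed to lie inside the breaks
salaries <- runif(500, min = 1, max = 27.5)

brks <- c(1, 6.3, 11.6, 16.9, 22.2, 27.5)   # single source of truth
tick_labs <- sprintf("%.1f", brks)          # labels with one decimal

h <- hist(salaries, breaks = brks, right = FALSE, include.lowest = TRUE,
          col = "blue", border = "black", las = 1,
          main = "Salary Of best category",
          xlab = "salaries", ylab = "num of players",
          xaxt = "n")                       # suppress the default x-axis
axis(1, at = brks, labels = tick_labs)      # ticks exactly at the breaks
```

Note that hist() stops with an error if any value falls outside the range of the breaks, so check range(my_data$new) against the break vector first.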

How to automate positioning of inner labels within a stacked barplot?

I frequently have to produce stacked bar plots with labels. The way I've been coding the labels is very time intensive and I wondered if there was a way to code things more efficiently. I would like the labels to be centered on each section of the bars. I'd prefer base R solutions.
stemdata <- structure(list( #had to round some nums below for 100% bar
A = c(7, 17, 76),
B = c(14, 10, 76),
C = c( 14, 17, 69),
D = c( 4, 10, 86),
E = c( 7, 17, 76),
F = c(4, 10, 86)),
.Names = c("Food, travel, accommodations, and procedures",
"Travel itinerary and dates",
"Location of the STEM Tour stops",
"Interactions with presenters/guides",
"Duration of each STEM Tour stop",
"Overall quality of the STEM Tour"
),
class = "data.frame",
row.names = c(NA, -3L)) #4L=number of numbers in each letter vector#
# attach(stemdata)
print(stemdata)
par(mar=c(0, 19, 1, 2.1)) # this sets margins to allow long labels
barplot(as.matrix(stemdata),
beside = F, ylim = range(0, 10), xlim = range(0, 100),
horiz = T, col=colors, main="N=29",
border=F, las=1, xaxt='n', width = 1.03)
text(7, 2, "14%")
text(19, 2, "10%")
text(62, 2, "76%")
text(7, 3.2, "14%")
text(22.5, 3.2, "17%")
text(65.5, 3.2, "69%")
text(8, 4.4, "10%")
text(55, 4.4, "86%")
text(3.5, 5.6, "7%")
text(15, 5.6, "17%")
text(62, 5.6, "76%")
text(9, 6.9, "10%")
text(55, 6.9, "86%")
Staying base R as OP requested, we can easily automate the inner label positioning (i.e. x coordinates) within a small function.
xFun <- function(x) x/2 + c(0, cumsum(x)[-length(x)])
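As a quick check of what `xFun` returns, here it is applied to the first column of the data (7, 17, 76). Each label sits at half its own segment's width plus the combined width of the segments before it (`seg`, `half_widths`, `left_edges` and `centers` are names used only in this sketch):

```r
xFun <- function(x) x/2 + c(0, cumsum(x)[-length(x)])

seg <- c(7, 17, 76)                            # one stacked bar
half_widths <- seg / 2                         # 3.5  8.5 38.0
left_edges  <- c(0, cumsum(seg)[-length(seg)]) # 0    7   24
centers     <- xFun(seg)                       # 3.5 15.5 62.0
```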
Now, it's good to know that barplot invisibly returns the bar midpoints, which are the y coordinates in a horizontal plot; we can catch them by assignment (here byc <- barplot(.)).
Finally, just assemble coordinates and labels in a data frame labs and "loop" through the text calls with sapply. (Use col="white" or col=0 for white labels, as wished in the other question.)
# barplot
colors <- c("gold", "orange", "red")
par(mar=c(2, 19, 4, 2) + 0.1) # expand margins
byc <- barplot(as.matrix(stemdata), horiz=TRUE, col=colors, main="N=29", # assign `byc`
border=FALSE, las=1, xaxt='n')
# labels
labs <- data.frame(x=as.vector(sapply(stemdata, xFun)), # apply `xFun` here
y=rep(byc, each=nrow(stemdata)), # use `byc` here
labels=as.vector(apply(stemdata, 1:2, paste0, "%")),
stringsAsFactors=FALSE)
invisible(sapply(seq(nrow(labs)), function(x) # `invisible` prevents unneeded console output
text(x=labs[x, 1:2], labels=labs[x, 3], cex=.9, font=2, col=0)))
# legend (set `xpd=TRUE` to plot beyond margins!)
legend(-55, 8.5, legend=c("Medium","High", "Very High"), col=colors, pch=15, xpd=TRUE)
par(mar=c(5, 4, 4, 2) + 0.1) # finally better reset par to default
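Since text accepts whole vectors of coordinates and labels, the sapply loop is optional. Here is a possible variant with a single text call, sketched on a cut-down stand-in for stemdata (`stem_small` and its short column names are made up for the example):

```r
# Three columns of the original data, with short names for brevity
stem_small <- data.frame(A = c(7, 17, 76),
                         B = c(14, 10, 76),
                         C = c(14, 17, 69))
xFun <- function(x) x/2 + c(0, cumsum(x)[-length(x)])

colors <- c("gold", "orange", "red")
byc <- barplot(as.matrix(stem_small), horiz = TRUE, col = colors,
               border = FALSE, las = 1, xaxt = "n")

lab_x   <- as.vector(sapply(stem_small, xFun))  # segment centers, column by column
lab_y   <- rep(byc, each = nrow(stem_small))    # one bar midpoint per segment
lab_txt <- paste0(unlist(stem_small), "%")      # same column-by-column order

# One vectorized call draws all nine labels
text(lab_x, lab_y, labels = lab_txt, cex = 0.9, font = 2, col = "white")
```

The key point is that all three vectors run through the data in the same order (column by column), so each label lands on its own segment.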
Data
stemdata <- structure(list(`Food, travel, accommodations, and procedures` = c(7,
17, 76), `Travel itinerary and dates` = c(14, 10, 76), `Location of the STEM Tour stops` = c(14,
17, 69), `Interactions with presenters/guides` = c(4, 10, 86),
`Duration of each STEM Tour stop` = c(7, 17, 76), `Overall quality of the STEM Tour` = c(4,
10, 86)), class = "data.frame", row.names = c(NA, -3L))
Would you consider a tidyverse solution?
library(tidyverse) # for dplyr, tidyr, tibble & ggplot2
stemdata %>%
rownames_to_column(var = "id") %>%
gather(Var, Val, -id) %>%
group_by(Var) %>%
mutate(id = factor(id, levels = 3:1)) %>%
ggplot(aes(Var, Val)) +
geom_col(aes(fill = id)) +
coord_flip() +
geom_text(aes(label = paste0(Val, "%")),
position = position_stack(0.5))

How to create 2D-Grid, raster or heatmap based on group values that include NAs?

Following data:
df <- data.frame(cbind("Group_ID" = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4), "WBHO" = runif(20, 1.0, 7.0), "SI" = runif(20, 1.0, 7.0), "OORT" = c(2.34, 4.64, NA, 5.32, 3.23, 6.01, 5.43, 4.78, 3.98, 3.80, 4.45, NA, NA, 3.18, 4.87, NA, NA, 5.73, 3.52, 4.89), "LMX" = runif(20, 1.0, 7.0),"RL" = runif(20, 1.0, 7.0),"AL" = c(1.54, NA, 1.08, 6.77, NA, NA, 4.56, NA, 5.34, 4.32, 2.45, 3.86, 6.21, 2.89, 7.32, 6.43, NA, 4.56, 3.89, 6.16),"SL" = runif(20, 1.0, 7.0),"RV" = runif(20, 1.0, 7.0),"PT" = runif(20, 1.0, 7.0),"SD" = runif(20, 1.0, 7.0), "HT" = runif(20, 1.0, 7.0), "RTL" = c(2.45, NA, 6.04, 2.88, 3.49, 2.30, NA, 5.32, 2.39, NA, 3.62, 3.22, 4.87, 2.91, 5.41, NA, NA, 4.78, 6.20, NA), "INB" = runif(20, 1.0, 7.0), "ETB" = runif(20, 1.0, 7.0)))
Now, I want to create a raster, 2D-Grid or Heatmap which gives a nice overview of all the variables for each group ("Group_ID") using the mean (the x-axis showing the groups and the y-axis all the variables), giving a particular field green colour for value 1 to 3, yellow for 3 to 5 and green for 5 to 7. I have the following Code to create a df that combines the variables in one column and has the values and Group-belonging in the other two:
library(dplyr)
library(tidyr)
df %>%
gather(key = "variable", value = "value", - Group_ID) -> df_new
This does not work, however, as there are NAs included. However, I want to keep those rows with NAs. Is there a way with which I can do this in the same step?
Then, I would like to create the raster concerning which I have been given the following code which I am not fully sure how to apply in this case:
library(raster)
r <- raster(ncol=nrow(df_new), nrow=15, xmn=0, xmx=4, ymn=0, ymx=15)
values(r) <- as.vector(as.matrix(df$WBHO, df$SI, df$OORT, df$LMX, df$RL, df$AL, df$SL, df$RV, df$PT, df$SD, df$HT, df$RTL,
df$INB, df$ETB)
plot(r, axes=F, box=F, asp=NA)
axis(1, at=seq(), 0:9)
axis(2, at=seq(), c("", colnames(df_new)), las=1)
Thanks for any help!
We can use dplyr and tidyr to calculate the group means (na.rm = TRUE skips the NAs), then categorize the means with cut and plot a heatmap with geom_tile from ggplot2. Map x to variable, y to Group_ID (converted to a factor), and fill to value2. No raster package is required.
It is not clear why you want two ranges (1-3 and 5-7) to both be green. My example assigns red to the 5-7 range, but you can easily change that to suit your needs.
library(dplyr)
library(tidyr)
df_new <- df %>%
gather(key = "variable", value = "value", - Group_ID) %>%
group_by(Group_ID, variable) %>%
summarise(value = mean(value, na.rm = TRUE)) %>%
mutate(value2 = cut(value, breaks = c(1, 3, 5, 7), labels = c("Low", "Medium", "High"))) %>%
ungroup()
library(ggplot2)
ggplot(df_new, aes(x = variable, y = factor(Group_ID), fill = value2)) +
geom_tile() +
scale_fill_manual(values = c("Low" = "Green", "Medium" = "Yellow", "High" = "Red")) +
labs(
y = "Group_ID"
)
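To see what the two key steps do in isolation, here is a small base R sketch. The first vector is group 1 of OORT from the question; the second vector (`vals`) is made up to show how cut treats values that fall exactly on a break:

```r
# Mean of a group that contains an NA: na.rm = TRUE drops the NA first
oort_g1 <- c(2.34, 4.64, NA, 5.32, 3.23)
m <- mean(oort_g1, na.rm = TRUE)        # (2.34 + 4.64 + 5.32 + 3.23) / 4

# cut() uses right-closed intervals by default: (1,3], (3,5], (5,7]
vals <- c(2.1, 3, 4.2, 5, 6.8, NA)
bins <- cut(vals, breaks = c(1, 3, 5, 7),
            labels = c("Low", "Medium", "High"))
as.character(bins)   # "Low" "Low" "Medium" "Medium" "High" NA
```

A value of exactly 1, or anything above 7, would come back as NA with these breaks; include.lowest = TRUE or wider breaks would cover those cases if they can occur in your data.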

Wrap Axis Labels in Correlation Matrix

I'm attempting to use the ggcorr() function within library(GGally) to create a correlation matrix. The package is working as it is supposed to, but I'm running into an issue where I would like to edit how the axis labels appear on the plot.
Currently, they will automatically add a _ or . to separate names with spaces or other characters between them. Ideally, I would like to create a line break (\n) between spaces in names so that long names and short names can be easily read and don't extend much further beyond the appropriate column and row.
I have found solutions that others have used on SO, including using str_wrap(), but it was within a ggplot() call, not this specific package. I have inspected the R code for the package, but couldn't find where to edit these labels specifically. Whenever I attempt to edit X or Y axis text, it adds an entirely new axis and set of labels.
I currently dcast() a data frame into the resulting data frame and even when I gsub() "\n" into the player names column, they get lost in the dcast() transition.
Here is an example of what I am working with. I would like to be able to automatically create line breaks between first and last name of the labels.
library(GGally)
library(ggplot2)
test <- structure(list(Date = structure(c(17100, 17102, 17103, 17106,
17107), class = "Date"), `Alexis Ajinca` = c(1.2, NA, 9.2, 6.4,
NA), `Anthony Davis` = c(95.7, 76.9, 29, 67, 24.9), `Buddy Hield` = c(9.7,
4.7, 17, 8, 28.3), `Cheick Diallo` = c(NA, NA, 3.2, NA, NA),
`Dante Cunningham` = c(0.5, 27.6, 14, 13.5, -1), `E'Twaun Moore` = c(19.2,
16.1, 22, 20.5, 10.1), `Lance Stephenson` = c(16.1, 31.6,
8, 8.1, 34.8), `Langston Galloway` = c(10.9, 2, 13.8, 2.2,
29.4), `Omer Asik` = c(4.7, 6.6, 9.9, 15.9, 14.2), `Solomon Hill` = c(4.7,
13.2, 12.8, 35.2, 4.4), `Terrence Jones` = c(17.1, 12.4,
9.8, NA, 20.8), `Tim Frazier` = c(40.5, 40.2, 18.3, 44.1,
7.2)), .Names = c("Date", "Alexis Ajinca", "Anthony Davis",
"Buddy Hield", "Cheick Diallo", "Dante Cunningham", "E'Twaun Moore",
"Lance Stephenson", "Langston Galloway", "Omer Asik", "Solomon Hill",
"Terrence Jones", "Tim Frazier"), row.names = c(NA, -5L), class = "data.frame")
ggc <- ggcorr(test[,-1], method = c("pairwise","pearson"),
hjust = .85, size = 3,
layout.exp=2)
ggc
Thank you for any and all help and please, let me know if you have any questions or need any clarification!
A couple of approaches
You can edit the object returned by ggcorr
g = ggplot_build(ggc)
g$data[[2]]$label = gsub("_", "\n", g$data[[2]]$label )
grid::grid.draw(ggplot_gtable(g))
Or you can create a new data frame and add the labels manually using geom_text. This probably gives a bit more control over the text justification and placement.
# I don't see a way to suppress the labels, so just set their size to zero
ggc <- ggcorr(test[,-1], method = c("pairwise","pearson"),
hjust = .85,
size = 0, # set this to zero
layout.exp=2)
# Create labels and plot
dat <- data.frame(x = seq(test[-1]), y = seq(test[-1]),
lbs = gsub(" ", "\n", names(test[-1]) ))
ggc + geom_text(data=dat, aes(x, y, label=lbs), nudge_x = 2, hjust=1)
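The label text itself comes from a plain gsub on the column names, which you can check without GGally. A tiny sketch on three of the player names (`players` and `wrapped` are names used only here):

```r
players <- c("Alexis Ajinca", "E'Twaun Moore", "Omer Asik")

# Replace every space with a line break; fixed = TRUE treats " " literally
wrapped <- gsub(" ", "\n", players, fixed = TRUE)

cat(wrapped[1], "\n")   # prints the first name on two lines
```

Names with more than one space would be split onto more than two lines, so a name like "Luc Mbah a Moute" may need special handling.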

Basic Plotting in "Modeling Techniques in Predictive Analytics"

I am trying to plot the x and y pairs as demonstrated below. Can someone provide me with the basic code to plot x1, y1? I've tried a number of things, including plot(x1,y1), and it's not recognizing these variables.
# The Anscombe Quartet in R
# demonstration data from
# Anscombe, F. J. 1973, February. Graphs in statistical analysis.
# The American Statistician 27: 17-21.
# define the anscombe data frame
anscombe <- data.frame(
x1 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
x2 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
x3 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
x4 = c(8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8),
y1 = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26,10.84, 4.82, 5.68),
y2 = c(9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74),
y3 = c(7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73),
y4 = c(6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89))
# show results from four regression analyses
with(anscombe, print(summary(lm(y1 ~ x1))))
with(anscombe, print(summary(lm(y2 ~ x2))))
with(anscombe, print(summary(lm(y3 ~ x3))))
with(anscombe, print(summary(lm(y4 ~ x4))))
# place four plots on one page using standard R graphics
# ensuring that all have the same scales
# for horizontal and vertical axes
pdf(file = "fig_more_anscombe.pdf", width = 8.5, height = 8.5)
par(mfrow=c(2,2),mar=c(3,3,3,1))
with(anscombe, plot(x1, y1, xlim=c(2,20),ylim=c(2,14),
pch = 19, col = "darkblue", cex = 2, las = 1))
title("Set I")
with(anscombe,plot(x2, y2, xlim=c(2,20),ylim=c(2,14),
pch = 19, col = "darkblue", cex = 2, las = 1))
title("Set II")
with(anscombe,plot(x3, y3, xlim=c(2,20),ylim=c(2,14),
pch = 19, col = "darkblue", cex = 2, las = 1))
title("Set III")
with(anscombe,plot(x4, y4, xlim=c(2,20),ylim=c(2,14),
pch = 19, col = "darkblue", cex = 2, las = 1))
title("Set IV")
dev.off()
par(mfrow=c(1,1),mar=c(5.1, 4.1, 4.1, 2.1)) # return to plotting defaults
# suggestions for the student
# see if you can develop a quartet of your own
# or perhaps just a duet...
# two very different data sets with the same fitted model
Note that the anscombe data set ships with R (in the datasets package), so it does not have to be defined.
The code below sets up a 2x2 grid for plotting and then calculates the overall range for the x and separately for the y variables. Then for i = 1, 2, 3, 4 it creates the ith formula and plots it using the calculated ranges. as.roman is used to get the roman numeral portion of the title. Then we perform a linear regression. We could have just written fm <- lm(fo, anscombe) to calculate the regression but had we done that, the print(summary(fm)) output would have shown literally fo as the formula which is not very nice. Finally we plot the regression line using abline and print the summary.
Try this:
par(mfrow = c(2,2))
xrange <- range(anscombe[1:4])
yrange <- range(anscombe[5:8])
for(i in 1:4) {
fo <- as.formula( sprintf("y%d ~ x%d", i, i) )
plot(fo, anscombe, xlim = xrange, ylim = yrange, main = paste("Set", as.roman(i)))
fm <- do.call("lm", list(fo, quote(anscombe)))
abline(fm)
print( summary(fm) )
}
par(mfrow = c(1,1))
This produces a 2x2 grid of scatterplots with fitted regression lines (output from print(summary(...)) not shown).
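The point about do.call can be checked directly by comparing the call stored in each fitted model. A short sketch using the built-in anscombe data (`fm_plain` and `fm_call` are names chosen for this example):

```r
fo <- as.formula("y1 ~ x1")

fm_plain <- lm(fo, anscombe)                            # call records the symbol `fo`
fm_call  <- do.call("lm", list(fo, quote(anscombe)))    # call records the formula itself

fm_plain$call   # lm(formula = fo, data = anscombe)
fm_call$call    # shows the actual formula instead of `fo`

# The fitted coefficients are the same either way
all.equal(coef(fm_plain), coef(fm_call))
```

So the do.call version changes only how the model prints, not what is estimated.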
If all you want to do is plot x1 and y1, try:
plot(anscombe$x1,anscombe$y1)
or (from your code):
with(anscombe, plot(x1, y1, xlim=c(2,20),ylim=c(2,14),
pch = 19, col = "darkblue", cex = 2, las = 1))
Your above code is plotting them to a pdf file, starting at the line:
pdf(file = "fig_more_anscombe.pdf", width = 8.5, height = 8.5)
and not ending until you terminate the pdf at:
dev.off()
While the pdf device is open, every plot is written to the file instead of the screen, so nothing appears in the R plot window until you close it. If you have run the code multiple times, make sure no pdf devices are left open by running:
dev.off()
until you see:
Error in dev.off() : cannot shut down device 1 (the null device)
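The whole device life cycle can be seen in a few lines. This is a sketch that writes to a temporary file (`out_file` is a hypothetical path, not the book's file name) and uses graphics.off(), which closes every open device in one go and does not raise the error shown above:

```r
out_file <- tempfile(fileext = ".pdf")   # hypothetical output path

pdf(file = out_file, width = 8.5, height = 8.5)  # open the pdf device
with(anscombe, plot(x1, y1, pch = 19))           # drawn into the file, not on screen
dev.off()                                        # close it; the file is now complete

file.exists(out_file)    # the plot was written to disk
graphics.off()           # close any devices that are still open
dev.cur()                # device 1, the "null device", means nothing is open
```

After this, the next plot call opens a fresh screen device (in an interactive session) and the plot shows up as usual.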
