I have a tibble created like this:
tibble(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))
Now I would like to know how the type of housing is distributed per district. Since the number of respondents differs per district, I would like to work with percentages. Basically, I'm looking for two plots:
1) One bar plot in which the percentage of each housing category is shown in one bar per district (since these are percentages, all the bars would be of equal height).
2) A pie chart for every district, with the percentage of housing categories for that specific district.
However, I am unable to group the data in the desired way, let alone compute percentages from it. How can I make those plots?
Thanks ahead!
Give this a shot:
library(tidyverse) # attaches dplyr and ggplot2
# original data
df <- data.frame(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))
# group by district
df <- df %>%
group_by(district) %>%
summarise(housing=sum(housing))
# make percentages
df <- df %>%
mutate(housing_percentage=housing/sum(df$housing)) %>%
mutate(district=as.character(district)) %>%
mutate(housing_percentage=round(housing_percentage,2))
# bar graph
ggplot(data=df) +
geom_col(aes(x=district, y=housing_percentage))
# pie chart
ggplot(data=df, aes(x='',y=housing_percentage, fill=district)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0) +
theme_void()
Which yields the following plots:
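Note that the code above sums the housing codes per district, which is not quite the same as the share of each housing category within a district. A minimal sketch of the latter, assuming dplyr and ggplot2 are installed (the names pct, share, bar_plot and pie_plot are made up for illustration), might look like this:

```r
library(dplyr)
library(ggplot2)

df <- tibble(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
             housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))

# Count respondents per district/housing pair, then convert the counts
# to shares within each district
pct <- df %>%
  count(district, housing) %>%
  group_by(district) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  mutate(district = factor(district), housing = factor(housing))

# 1) One stacked bar per district; shares sum to 1, so bars have equal height
bar_plot <- ggplot(pct, aes(x = district, y = share, fill = housing)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent)

# 2) One pie per district: a stacked bar in polar coordinates, faceted
pie_plot <- ggplot(pct, aes(x = "", y = share, fill = housing)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  facet_wrap(~ district) +
  theme_void()

print(bar_plot)
print(pie_plot)
```

Grouping by district before dividing is what makes the shares add up to 100% within each district rather than across the whole data set.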
I want to make a bar plot with ggplot2 that displays multiple bars within each group, but in my plot I get 4 bars instead of 8 for each group. I would appreciate your help.
Here is my code:
levels = c('D', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9')
method = c('G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7','G8')
ave = c(4, 4, 4, 4, 5, 1, 2, 6, 3, 5, 2, 2, 2, 2, 5, 3, 4, 1, 1, 1, 2,
2, 2, 2, 3, 3, 2, 1, 1, 1, 1, 3, 4, 5, 6, 8, 9, 7, 1, 2, 3, 3, 4, 5, 7,
6, 1, 1, 1, 2, 5, 7, 7, 8, 9, 1, 4, 6, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
levels = factor(c(rep(levels,8)))
method = factor(c(rep(method,10)))
dat = data.frame(levels,ave,method)
dodge = position_dodge(width = .9)
p = ggplot(dat,mapping =aes(x = as.factor(levels),y = ave,fill =
as.factor(method)))
p + geom_bar(stat = "identity",position = "dodge") +
xlab("levels") + ylab("Mean")
It looks like geom_bar will only plot bars for observations that exist; if you want bars for every method (assuming you want each level to have a bar for each method), your data need observations for all of those pairings. Currently, each level is paired with only four of the eight methods, because rep(levels, 8) and rep(method, 10) are recycled in step with each other. To artificially generate the missing pairings, you can use tidyr::complete() and tidyr::expand() before plotting. For each new pairing, ave will automatically be assigned NA, but you can change this behavior using the fill parameter in tidyr::complete().
Here's an example where ave is set to 0 for every new pairing instead of NA:
library(tidyverse)
dat %>%
  # add a row for every levels/method pairing that is missing, with ave = 0
  complete(expand(dat, levels, method), fill = list(ave = 0)) %>%
  ggplot(mapping = aes(x = as.factor(levels),
                       y = ave,
                       fill = as.factor(method))) +
  geom_bar(stat = "identity", position = position_dodge(width = 1)) +
  xlab("levels") +
  ylab("Mean")
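To see why only four methods show up per level in the question's data, here is a small base-R sketch. The names levels_raw, method_raw, recycled and crossed are made up for illustration, and whether each = 8 matches the intended ordering of ave is an assumption only the asker can confirm:

```r
levels_raw <- c('D', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9')
method_raw <- c('G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8')

# As in the question: both vectors are simply repeated end to end, so row k
# pairs level (k mod 10) with method (k mod 8) and many pairings never occur
recycled <- data.frame(levels = rep(levels_raw, 8),
                       method = rep(method_raw, 10))
n_recycled <- nrow(unique(recycled))   # distinct level/method pairs

# Repeating each level 8 times in a row gives every level all 8 methods
crossed <- data.frame(levels = rep(levels_raw, each = 8),
                      method = rep(method_raw, times = 10))
n_crossed <- nrow(unique(crossed))

print(c(recycled = n_recycled, crossed = n_crossed))
print(table(recycled$levels, recycled$method))
```

With the recycled construction there are 40 distinct pairings among the 80 rows (each one appearing twice), whereas the each = 8 version yields all 80.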
I am a beginner in R, and have a question about making boxplots of columns in R. I just made a dataframe:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessary complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated','Various function incohorent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration untill I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can also do this with the tidyverse:
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::gather("var", "value", 1:7) %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
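Since gather() has been superseded in tidyr, the same reshaping can also be sketched with pivot_longer(). This assumes tidyr >= 1.0.0 and ggplot2 are installed; the short row names Q1 to Q10 are stand-ins for the full statements, and the column names statement, column and score are made up for illustration:

```r
library(tidyr)
library(ggplot2)

SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
                  WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
                  BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
                  AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- paste0("Q", 1:10)  # stand-ins for the statement texts

# Keep the row names as a regular column, then go from wide to long:
# one row per statement/column combination
SUS$statement <- rownames(SUS)
long <- pivot_longer(SUS, cols = -statement,
                     names_to = "column", values_to = "score")

# One box per statement (that is, per original row); NAs are dropped silently
p <- ggplot(long, aes(x = statement, y = score)) +
  geom_boxplot(na.rm = TRUE) +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))
print(p)
```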
As @blondeclover says in the comment, boxplot() should work fine for doing a boxplot of each column.
If what you want is a boxplot for each row, then actually your current rows need to be your columns. If you need to do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)
My problem is similar to this one: when I generate plot objects (in this case histograms) in a loop, it seems that all of them are overwritten by the most recent plot.
To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.
(I'm using multiplot to make a composite image, but you get the same outcome if you print(myplots[[1]]) through print(myplots[[4]]) one at a time.)
Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.
(By the way, the column classes are factor in the original dataset I am approximating here, but the same problem occurs if they are integer.)
Here is a reproducible example:
library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function
#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3,
3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3,
3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3,
3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2,
3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_histogram(fill="lightgreen") +
xlab(colnames(data2)[ i])
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
When I look at a summary of a plot object in the plot list, this is what I see
> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping: x = data2[, i]
faceting: facet_null()
-----------------------------------
geom_histogram: fill = lightgreen
stat_bin:
position_stack: (width = NULL, height = NULL)
I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.
Thanks!
In addition to the other excellent answer, here’s a solution that uses “normal”-looking evaluation rather than eval. Since for loops have no separate variable scope (i.e. they are performed in the current environment), we need to use local to wrap the for block; in addition, we need to make i a local variable, which we can do by re-assigning it to its own name¹:
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. This works in several possible ways. The following is the easiest in my opinion:
plot_data_column = function (data, column) {
ggplot(data, aes_string(x = column)) +
geom_histogram(fill = "lightgreen") +
xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)
This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
¹ This might seem confusing: why does i <- i have any effect at all? Because by performing the assignment we create a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.
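As a side note, aes_string() is deprecated in recent ggplot2 releases. A sketch of the same helper using the .data pronoun instead, assuming ggplot2 >= 3.0.0 and using randomly generated factor columns as a stand-in for the question's data2:

```r
library(ggplot2)

# Stand-in for the question's data2: four factor columns named A to D
set.seed(1)
data2 <- as.data.frame(lapply(
  c(A = 1, B = 2, C = 3, D = 4),
  function(k) factor(sample(0:6, 60, replace = TRUE))
))

plot_data_column <- function(data, column) {
  # .data[[column]] looks the column up by its name (a string) when the
  # plot is built; column is fixed per function call, so nothing leaks
  ggplot(data, aes(x = .data[[column]])) +
    geom_bar(fill = "lightgreen") +   # geom_bar because the columns are factors
    xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)
print(myplots[[1]])
```

Each call to plot_data_column gets its own function environment, which is why the loop-variable problem from the question does not arise here.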
Because of all the quoting of expressions that get passed around, the i in the mapping is not evaluated until the plot is actually drawn, and by then i is whatever it happens to be at that time, which is its final value. You can get around this by using eval(substitute(...)) to substitute in the right value during each iteration.
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- eval(substitute(
ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_histogram(fill="lightgreen") +
xlab(colnames(data2)[ i])
,list(i = i)))
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
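The same late-evaluation effect can be reproduced without ggplot2. The following base-R sketch (all names are made up for illustration) stores functions instead of plots and compares the plain loop with the local() and lapply() approaches:

```r
# 1) Plain for loop: every stored function refers to the same variable i,
#    which holds its final value by the time the functions are called
fs_loop <- list()
for (i in 1:3) {
  fs_loop[[i]] <- function() i
}
loop_result <- vapply(fs_loop, function(f) f(), integer(1))

# 2) local() with i <- i: each function gets its own private copy of i
fs_local <- list()
for (i in 1:3) {
  fs_local[[i]] <- local({
    i <- i
    function() i
  })
}
local_result <- vapply(fs_local, function(f) f(), integer(1))

# 3) lapply(): each call has its own environment (R >= 3.2.0 also forces
#    the argument, so the value is fixed at call time)
fs_lapply <- lapply(1:3, function(i) function() i)
lapply_result <- vapply(fs_lapply, function(f) f(), integer(1))

print(loop_result)    # 3 3 3
print(local_result)   # 1 2 3
print(lapply_result)  # 1 2 3
```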
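Using lapply works too, as x exists within the anonymous function's environment (using mtcars as data; theme_wsj() and scale_colour_wsj() come from the ggthemes package):
library(ggplot2)
library(ggthemes)
plot <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {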
ggplot(data = mtcars) +
geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
theme_wsj() +
scale_colour_wsj("colors6")
})
I have run the code in the question and in the answer, changing geom_histogram to geom_bar to avoid the error: Error: StatBin requires a continuous x variable.
Here is the code with the visualizations:
Question
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_bar(fill="lightgreen") +
xlab(colnames(data2)[ i])
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Answer
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_bar(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
multiplot(plotlist = myplots, cols = 4)
Same result using lapply:
plot_data_column = function (data, column) {
ggplot(data, aes_string(x = column)) +
geom_bar(fill = "lightgreen") +
xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Created on 2021-04-09 by the reprex package (v0.3.0)
I would like to draw 3 plots in the same window. Each will have a different number of bars. How can I make the bars the same size and equally spaced in all of them without padding the smaller bar plots with NAs? Example code is below. Note that my real data will come from data frame columns (dataframe$column), not from literal vectors of numbers as shown here. I am sure there is some magic way to do this, but I can't seem to find helpful information online. Thanks.
pdf(file="PATH.pdf");
par(mfrow=c(1,3));
par(mar=c(9,6,4,2)+0.1);
barcenter1<- barplot(c(1,2,3,4,5));
mtext("Average Emergent", side=2, line=4);
par(mar=c(9,2,4,2)+0.1);
barcenter2<- barplot(c(1,2,3));
par(mar=c(9,2,4,2)+0.1);
barcenter3<- barplot(c(1,2,3,4,5,6,7));
Or, instead of using par(mfrow = ...) to make a plot window, is there a way to group the barcenter data in a single plot with an empty space between the groups of bars? That way everything would be evenly spaced and look the same.
Using the parameters xlim and width:
par(mfrow = c(1, 3))
par(mar = c(9, 6, 4, 2) + 0.1)
barcenter1 <- barplot(c(1, 2, 3, 4, 5), xlim = c(0, 1), width = 0.1)
mtext("Average Emergent", side = 2, line = 4)
par(mar = c(9, 2, 4, 2) + 0.1)
barcenter2 <- barplot(c(1, 2, 3), xlim = c(0, 1), width = 0.1)
par(mar = c(9, 2, 4, 2) + 0.1)
barcenter3 <- barplot(c(1, 2, 3, 4, 5, 6, 7), xlim = c(0, 1), width = 0.1)
Introducing zeroes:
df <- data.frame(barcenter1 = c(1, 2, 3, 4, 5, 0, 0),
barcenter2 = c(1, 2, 3, 0, 0, 0, 0),
barcenter3 = c(1, 2, 3, 4, 5, 6, 7))
barplot(as.matrix(df), beside = TRUE)
With ggplot2 you can get something like this:
df <- data.frame(x=c(1, 2, 3, 4, 5,1, 2, 3,1, 2, 3, 4, 5, 6, 7),
y=c(rep("bar1",5), rep("bar2",3),rep("bar3",7)))
library(ggplot2)
ggplot(data=df, aes(x = x, y = x)) +
geom_bar(stat = "identity")+
facet_grid(~ y)
For the option you mentioned in your second comment you would need:
x <- c(1, 2, 3, 4, 5, NA, 1, 2, 3, NA, 1, 2, 3, 4, 5, 6, 7)
barplot(x)
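Since the real data come from data frame columns of different lengths, the NA-separated vector can also be built programmatically rather than typed by hand. A minimal base-R sketch, where the list bars is a stand-in for the real columns:

```r
# Stand-in for the real data: one numeric vector per group of bars
bars <- list(c(1, 2, 3, 4, 5),
             c(1, 2, 3),
             c(1, 2, 3, 4, 5, 6, 7))

# Append an NA after every group to create a gap, then drop the trailing NA
x <- head(unlist(lapply(bars, function(b) c(b, NA))), -1)

# NA entries are drawn as empty slots, so all bars share one width and spacing
barplot(x)
```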