I would like to create multiple histograms (ggplot) using a for loop. The problem is that the x-axis label of every plot stays the same: "value". Do you know how to change the x-axis label on each iteration of the loop?
My dataframe for example:
df <- data.frame(variable = c("A", "A", "B", "B", "C", "C"), value = c(1, 2, 4, 5, 2, 3))
So I should get three plots with the x-axis labels "A", "B" and "C".
My code:
for (i in unique(df$variable)){
d <- subset(df, df$variable == i)
print(ggplot(d, aes(x = value)) + geom_histogram())
}
You can use imap to give each plot its own x-axis label after splitting the data by variable.
library(ggplot2)
library(dplyr)  # provides the %>% pipe used below
list_plot <- df %>%
split(.$variable) %>%
purrr::imap(~ggplot(.x, aes(x = value)) +
geom_histogram() + xlab(.y))
Also, have you considered using facets? The x-axis stays the same and A, B, C appear as the facet titles.
ggplot(df, aes(x = value)) + geom_histogram() + facet_wrap(~variable)
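The same relabelling can also be done inside the original for loop by adding xlab(i) to each plot. A minimal sketch, assuming the df from the question; the plots list and bins = 5 are my own additions for illustration, not part of the original code:

```r
library(ggplot2)

df <- data.frame(variable = c("A", "A", "B", "B", "C", "C"),
                 value = c(1, 2, 4, 5, 2, 3))

plots <- list()  # hypothetical container so the plots can be reused later
for (i in unique(df$variable)) {
  d <- df[df$variable == i, ]
  plots[[i]] <- ggplot(d, aes(x = value)) +
    geom_histogram(bins = 5) +  # bins = 5 is an arbitrary choice
    xlab(i)                     # use the current group as the x-axis label
  print(plots[[i]])
}
```

Storing the plots in a named list is optional; print() alone is enough if they only need to be drawn.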
Here is a sample dataset:
#sample data
df <- tibble(year=c(1,1,1,1,2,2,2,2,3,3,3,3),
col=c("a", "b", "c", "d", "a", "b", "c", "d", "a", "b", "c", "d"),
freq=c(2,3,5,1,4,3,8,3,5,7,3,9))
I want to create a bar plot for each year of data independently. I'd also like to print the total sample size for each year as a caption on the respective plot. I'm currently doing it manually like this:
#find total sample size for year 1
df_yr_1 <- df %>%
filter(year==1)
sum(df_yr_1$freq)
#make year 1 histogram
hist <- ggplot(df_yr_1, aes(x=col, y=freq)) +
geom_col() +
labs(caption = "N=11")
hist
So on and so forth for each year.
Is there a way to automate this process? Ideally, all the plots would save to a pdf (1 per page), but if they're saved independently that's fine too. I have a feeling adding the total sample size as a caption might make the process more challenging, so if it's possible to automate the process without the sample size captions that would still be very helpful. Thanks in advance!
We could wrap the code in a loop
pdf( "plots.pdf", onefile = TRUE)
for(i in unique(df$year)) {
df_yr_i <- df %>%
filter(year==i)
hist <- ggplot(df_yr_i, aes(x=col, y=freq)) +
geom_col() +
labs(caption = paste0("N=", sum(df_yr_i$freq)))
print(hist)
}
dev.off()
A simple loop should do the trick:
pdf("my.pdf")
for(i in 1:3) {
plot_df <- df %>% filter(year == i)  # filter on the loop variable, not a fixed year
p <- ggplot(plot_df, aes(col, freq)) +
geom_col() +
labs(title = paste("Year", i), caption = paste0("N=", sum(plot_df$freq)))
print(p)
}
dev.off()
Resulting in a three-page my.pdf (pages 1 to 3), one plot per page.
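If separate files are preferred, which the question says would also be fine, ggsave() can write one file per year. This is only a sketch: the output directory (tempdir()) and the year_<n>.pdf file names are assumptions of mine, and the data is rebuilt as a plain data.frame so the block runs on its own.

```r
library(ggplot2)

df <- data.frame(year = rep(1:3, each = 4),
                 col  = rep(c("a", "b", "c", "d"), times = 3),
                 freq = c(2, 3, 5, 1, 4, 3, 8, 3, 5, 7, 3, 9))

out_dir <- tempdir()    # hypothetical output location
files <- character(0)   # paths of the files written

for (yr in unique(df$year)) {
  df_yr <- df[df$year == yr, ]
  p <- ggplot(df_yr, aes(x = col, y = freq)) +
    geom_col() +
    labs(title = paste("Year", yr),
         caption = paste0("N=", sum(df_yr$freq)))
  f <- file.path(out_dir, paste0("year_", yr, ".pdf"))
  ggsave(f, plot = p, width = 6, height = 4)  # size in inches
  files <- c(files, f)
}
```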
Why are the pies flat?
df<- data.frame(
Day=(1:6),
Var1=c(172,186,191,201,205,208),
Var2= c(109,483,64010,161992,801775,2505264), A=c(10,2,3,4.5,16.5,39.6), B=c(10,3,0,1.4,4.8,11.9), C=c(2,5,2,0.1,0.5,1.2), D=c(0,0,0,0,0.1,0.2))
ggplot() +
geom_scatterpie(data = df, aes(x = Var1 , y = Var2, group = Var1), cols = c("A", "B", "C", "D"))
I have tried using coord_fixed(), but that does not work either.
The problem seems to be the scales of the x- and y-axes. If you rescale them both to have zero mean and unit variance, the plot works. So, one thing you could do is plot the rescaled values, but transform the labels back into the original scale. To do this, you would have to do the following:
Make the data:
df<- data.frame(
Day=(1:6),
Var1=c(172,186,191,201,205,208),
Var2= c(109,483,64010,161992,801775,2505264), A=c(10,2,3,4.5,16.5,39.6), B=c(10,3,0,1.4,4.8,11.9), C=c(2,5,2,0.1,0.5,1.2), D=c(0,0,0,0,0.1,0.2))
Rescale the variables
library(dplyr)  # provides %>% and mutate()
df <- df %>%
mutate(x = c(scale(Var1)),
y = c(scale(Var2)))
Find the linear map that transforms the rescaled values back into their original values. Then, you can use the coefficients from the model to make a function that will transform the rescaled values back into the original ones.
m1 <- lm(Var1 ~ x, data=df)
m2 <- lm(Var2 ~ y, data=df)
trans_x <- function(x)round(coef(m1)[1] + coef(m1)[2]*x)
trans_y <- function(x)round(coef(m2)[1] + coef(m2)[2]*x)
Make the plot, passing the transformation functions as the labels argument of the scale_[xy]_continuous() functions:
library(ggplot2)
library(scatterpie)  # provides geom_scatterpie()
ggplot() +
geom_scatterpie(data=df, aes(x = x, y=y), cols = c("A", "B", "C", "D")) +
scale_x_continuous(labels = trans_x) +
scale_y_continuous(labels = trans_y) +
coord_fixed()
There may be an easier way than this, but it wasn't apparent to me.
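One possibly simpler route, assuming scale() was called with its defaults as above: the standardisation is z = (v - mean) / sd, so its inverse is mean + sd * z and the lm() step can be skipped. The sketch below uses only Var1 from the question; trans_x has the same role as in the answer but is defined differently.

```r
# Data from the question (only Var1 is needed for the illustration).
Var1 <- c(172, 186, 191, 201, 205, 208)

# scale() with its defaults centres on the mean and divides by the sd.
x <- c(scale(Var1))

# Inverse of that standardisation: original = mean + sd * z.
trans_x <- function(z) round(mean(Var1) + sd(Var1) * z)

trans_x(x)  # maps the rescaled values back onto the Var1 scale
```

The same pattern with Var2 would give trans_y; both can be passed as labels to scale_x_continuous() and scale_y_continuous().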
The range on the y-axis is so large it's compressing the disks to lines. Change the y-axis to a log scale, and you can see the shapes. Adding coord_fixed() to keep the pies circular:
ggplot() +
geom_scatterpie(data = df, aes(x = Var1 , y = Var2, group = Var1), cols = c("A", "B", "C", "D")) +
scale_y_log10() +
coord_fixed()
Suppose my data is two columns, one is "Condition", one is "Stars"
food <- data.frame(Condition = c("A", "B", "A", "B", "A"), Stars=c('good','meh','meh','meh','good'))
How can I make a barplot of the frequency of "Stars", grouped by "Condition"?
I read here but would like to expand that answer to include groups.
For now I have
q <- ggplot(food, aes(x=Stars))
q + geom_bar(aes(y=..count../sum(..count..)))
But that is the proportion of the full data set.
How can I make a plot with four bars, grouped by 'Condition'?
E.g. 'Condition A' would have 'good' as 0.66 and 'meh' as 0.33.
I guess this is what you are looking for:
food <- data.frame(Condition = c("A", "B", "A", "B", "A"), Stars=c('good','meh','meh','meh','good'))
library(ggplot2)
library(dplyr)
# Group by Condition first so that freq is the proportion within each Condition
data <- food %>% group_by(Condition, Stars) %>% summarize(n = n()) %>% mutate(freq = n / sum(n))
ggplot(data, aes(x=Stars, fill = Condition, group = Condition)) + geom_bar(aes(y=freq), stat="identity", position = "dodge")
First I calculated the frequencies with the dplyr package and used them as the y aesthetic in geom_bar(). Then I set fill = Condition in ggplot(), which splits the bars by Condition. Finally, I set position = "dodge" to place the bars next to each other, and stat = "identity" because the frequencies are already calculated.
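As a quick cross-check of the numbers the question expects (0.66 and 0.33 for Condition A), the within-Condition proportions can also be computed in base R. This is a small sketch; the names counts and props are mine:

```r
food <- data.frame(Condition = c("A", "B", "A", "B", "A"),
                   Stars = c("good", "meh", "meh", "meh", "good"))

# Cross-tabulate, then divide each row (Condition) by its row total.
counts <- table(food$Condition, food$Stars)
props <- prop.table(counts, margin = 1)

round(props, 2)
```

Each row of props sums to 1, so the values are proportions within a Condition rather than within the whole data set.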
I used the computed variable ..prop.., the group aesthetic and facet_wrap(). With the group aesthetic, proportions are computed within each group, and facet_wrap() plots each condition in its own panel.
library(ggplot2)
food <- data.frame(Condition = c("A", "B", "A", "B", "A"),
Stars=c('good','meh','meh','meh','good'))
ggplot(food) +
geom_bar(aes(x = Stars, y = ..prop.., group = Condition)) +
facet_wrap(~ Condition)
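In ggplot2 3.3.0 and later, the same computed variable can be written as after_stat(prop), which replaces the older ..prop.. notation. A sketch of the equivalent call, assuming a recent ggplot2; the object name p is mine:

```r
library(ggplot2)

food <- data.frame(Condition = c("A", "B", "A", "B", "A"),
                   Stars = c("good", "meh", "meh", "meh", "good"))

# after_stat(prop) refers to the proportion computed by stat_count()
# within each group, here one group per Condition.
p <- ggplot(food) +
  geom_bar(aes(x = Stars, y = after_stat(prop), group = Condition)) +
  facet_wrap(~ Condition)
p
```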
Following is a short code to generate a barplot with an added layer of line plot. I have added comments indicating what works and what doesn't. While my problem is solved, I can't understand why I had a problem or how it got solved. If you can explain or suggest the right way to do it, that would be nice.
library(ggplot2)
library(reshape2)  # provides melt()
factors <- c("A", "B", "C", "D", "B", "A", "C", "B", "D", "D")
data <- data.frame(n=1:10, a= runif(10, 1, 5), b=runif(10, 1, 5),c=runif(10, 1, 5))
gg_data <- melt(data, id.vars="n", variable.name="var")
gg_data$alp <- rep(factors, 3)
gg_data1 <- melt(data.frame(n=1:10, a= runif(10, 2, 3), b=runif(10, 4, 5),c=runif(10, 3, 4)), id.vars="n", variable.name="var")
#this does not work
ggplot(data= gg_data, aes(x=n, y=value, fill=alp))+geom_bar(stat="identity")+ facet_grid( var ~ ., scale="free_y")+geom_line(data= gg_data1, aes(x= n, y=value))
#this gives a weird output
gg_data1$alp <- rep(factors, 3)
ggplot(data= gg_data, aes(x=n, y=value, fill=alp))+geom_bar(stat="identity")+ facet_grid( var ~ ., scale="free_y")+geom_line(data= gg_data1, aes(x= n, y=value))
#this works the way I want it to, don't know why.
gg_data1$alp <- "A"
ggplot(data= gg_data, aes(x=n, y=value, fill=alp))+geom_bar(stat="identity")+ facet_grid( var ~ ., scale="free_y")+geom_line(data= gg_data1, aes(x= n, y=value))
Basically, your plots combine information from the two datasets. Because fill = alp is set in the top-level ggplot() call, ggplot applies it to every layer, including geom_line().
The easiest way to see this is consider this new data.frame:
gg1 <- gg_data1
names(gg1) <- c("n1", "var1", "value1")
gg_combine <- cbind(gg_data, gg1)
Your second graph is equivalent to:
ggplot(data=gg_combine, aes(x=n, y=value, fill=alp))+
geom_bar(stat="identity")+
geom_line(aes(x=n1, y=value1, colour=alp)) +
facet_grid( var ~ ., scale="free_y")
Basically, this says: group everything by "alp" and plot each group separately, which is why you get those lines. Adding colour=alp makes it clear why the lines look that way.
In your last plot, every row of gg_data1 has the same alp value, so the lines fall into a single group; only the bars are effectively grouped by alp. This is equivalent to:
ggplot(data=gg_combine, aes(x=n, y=value))+
geom_bar(aes(fill=alp), stat="identity")+
geom_line(aes(x=n1, y=value1)) +
facet_grid( var ~ ., scale="free_y")
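Another way to keep the line layer from picking up fill = alp, without adding a dummy alp column to the second dataset, is inherit.aes = FALSE on geom_line(). The sketch below builds small stand-in data frames (bars and lines are hypothetical names, in place of gg_data and gg_data1) so it runs without melt():

```r
library(ggplot2)

set.seed(1)
factors <- c("A", "B", "C", "D", "B", "A", "C", "B", "D", "D")

# Stand-ins for gg_data (with alp) and gg_data1 (without alp).
bars <- data.frame(n = rep(1:10, 3),
                   var = rep(c("a", "b", "c"), each = 10),
                   value = runif(30, 1, 5),
                   alp = rep(factors, 3))
lines <- data.frame(n = rep(1:10, 3),
                    var = rep(c("a", "b", "c"), each = 10),
                    value = runif(30, 2, 5))

p <- ggplot(bars, aes(x = n, y = value, fill = alp)) +
  geom_bar(stat = "identity") +
  # inherit.aes = FALSE: this layer ignores fill = alp from ggplot(),
  # so the data does not need an alp column and is not split by it.
  geom_line(data = lines, aes(x = n, y = value), inherit.aes = FALSE) +
  facet_grid(var ~ ., scales = "free_y")
p
```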
Hope this helps.
I am trying to plot a pie chart using the following dataset
dt <- data.frame(name= c("A", "B", "C"),
one = sample(1:10, 3),
two= sample(1:10, 3),
three =sample(1:10, 3))
Of course the data are untidy, so I rearrange the dataset into long form using
library(dplyr)
library(tidyr)  # gather() is exported by tidyr, not dplyr
dt <- dt %>% gather("letter")
colnames(dt)[2] <- "number"
And I am perfectly able to plot a barchart
library(ggplot2)
ggplot(dt, aes(x=letter, y=value, fill=number)) +
geom_bar(stat="identity")
But when I apply the coord_polar() transformation, I can't make the slices look even, nor can I make the pie chart sum to 100%.
ggplot(dt, aes(x=letter, y=value, fill=number)) +
geom_bar(stat="identity") +
coord_polar(theta = "x")