I am struggling with some plots. I have a really big data.frame with some entries. To get an overview I will work with some test data.
Let's assume the following data:
Sender <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akz <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkz <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
data <- data.frame(Sender, Akz, NAkz)
I want to get a (stacked) barplot grouped by the column "Person". For each person I want to illustrate the occurrences in the columns "A" and "NA". That means one bar represents the column "A" with 3 "0"s and 4 "1"s, and next to this bar I want the column "NA" with 4 "0"s and 3 "1"s. It would be great if there were a way to add a legend and the total count of each level.
Thanks and all the best
Peter
PS: I found a picture that illustrates a cool barplot, but I am not able to reproduce it since it works with integers and total amounts.
Your data is a bit messed up, I trust this is what you wanted to post:
data:
Person <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akzept <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkzept <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
df <- data.frame(Person, Akzept, NAkzept)
The key to plotting with ggplot2 is to arrange the data in long format, which you can do with the function gather (superseded by pivot_longer in tidyr 1.0.0 and later, but still available):
library(tidyverse)
df %>%
  gather(var, val, Akzept:NAkzept) %>%
  ggplot() +
  geom_bar(aes(x = interaction(var, Person), fill = val))
or perhaps:
df %>%
  gather(var, val, Akzept:NAkzept) %>%
  ggplot() +
  geom_bar(aes(x = Person, fill = val)) +
  facet_wrap(~var)
with text:
df %>%
  gather(var, val, Akzept:NAkzept) %>%
  ggplot() +
  geom_bar(aes(x = Person, fill = val)) +
  # after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
  geom_text(stat = "count", aes(label = after_stat(count), x = Person, group = val), position = "stack", vjust = 2, hjust = 0.5) +
  facet_wrap(~var)
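The question also asks for the total count of each level. A minimal sketch of one way to get those numbers and print them on the bars, assuming tidyr >= 1.0.0 (for pivot_longer) and ggplot2 >= 3.3.0 (for after_stat); the names `long` and `totals` are made up for illustration:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(Person  = c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD"),
                 Akzept  = factor(c(0, 1, 1, 0, 0, 1, 1)),
                 NAkzept = factor(c(1, 1, 1, 0, 0, 0, 0)))

# Long format: one row per (Person, variable, value) observation
long <- df %>%
  pivot_longer(cols = c(Akzept, NAkzept), names_to = "var", values_to = "val")

# Total number of rows per variable and level
totals <- count(long, var, val)

# One stacked bar per variable, with the count printed in the middle of each segment
p <- ggplot(long, aes(x = var, fill = val)) +
  geom_bar() +
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = position_stack(vjust = 0.5))
```

Printing `totals` shows the counts as a table, and printing `p` draws the plot. Add `facet_wrap(~Person)` if you want the same bars split per person.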
Related
I'm trying to create a ggplot bar chart, and to create different colored fills for some bars.
I copied the code from somewhere, but with my data it just doesn't work.
Here is the code:
df <- data.frame(cat = c( 0, 1, 2, 3, 4),
perc = c(10, 20, 30, 40, 0),
mark = c( 0, 0, 0, 1, 0))
library(ggplot2)
ggplot(df) +
  aes(x = cat, fill = mark, weight = perc) +
  geom_bar()
But the result is a colorless chart, with this warning message:
The following aesthetics were dropped during statistical transformation: fill
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
What am I doing wrong?
The issue is that geom_bar uses stat_count by default, so it simply counts up the number of rows at each value of cat. This summary doesn't know what to do with the fill = mark part of your mapping, since there could be multiple values for mark in each category. In your case this isn't obvious because there is only one value for fill at each value of cat, but the same principle applies; if you are using a grouped summary function then you cannot have a row-wise fill variable.
My guess is that you are looking for geom_col
df <- data.frame(cat = c( 0, 1, 2, 3, 4),
perc = c(10, 20, 30, 40, 0),
mark = c( 0, 0, 0, 1, 0))
library(ggplot2)
ggplot(df) +
  aes(x = cat, fill = mark, y = perc) +
  geom_col()
Created on 2022-11-24 with reprex v2.0.2
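If mark is meant as a discrete flag rather than a continuous value, wrapping it in factor() gives two distinct fill colours instead of a gradient, which is also what the warning message hints at. A minimal sketch, assuming that reading of the data:

```r
library(ggplot2)

df <- data.frame(cat  = c(0, 1, 2, 3, 4),
                 perc = c(10, 20, 30, 40, 0),
                 mark = c(0, 0, 0, 1, 0))

# Assumption: `mark` is a categorical flag, so it is converted to a factor
p <- ggplot(df, aes(x = cat, y = perc, fill = factor(mark))) +
  geom_col() +
  labs(fill = "mark")

# layer_data() returns the data ggplot2 computed for the first layer
bars <- layer_data(p)
```

Each row of `bars` is one drawn bar, and its `fill` column holds the colour assigned to it, so you can check the mapping without looking at the plot.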
I have a dataset with many columns. The last column (Labels) shows the cluster membership of each user (row). How can I edit my code to show only a few labels on the x-axis? Right now the dates overlap and cannot be read. I want to show the first date, the last date, and one out of every five dates in between, for example the dates 1, 5, 10, 15, ..., 133, where 1 and 133 are the first and the last dates.
BTW, I tried scale_x_date() but had no success.
Data Sample
mat <- structure(c(1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 2, 3),
.Dim = c(3L, 5L),
.Dimnames = list(c("A", "B", "C"),
c("2011-1-6", "2011-1-9", "2011-1-15", "2011-2-19", "Labels")))
Code
library(tidyverse)
mat %>%
  as.data.frame() %>%
  mutate(id = 1:nrow(mat),
         Labels = as.factor(Labels)) %>%
  pivot_longer(cols = starts_with("2011")) %>%
  filter(value == 1) %>%
  ggplot(aes(x = name, y = id, color = Labels)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
You can use scale_x_date. Following @Rui Barradas' comment, you first have to convert the dates to class "Date".
Then, with scale_x_date, you can control the breaks with date_breaks. You can also control the format with date_labels. See ?scale_x_date for more info. Here is how to have an axis label every 5 days:
mat %>%
  as.data.frame() %>%
  mutate(id = 1:nrow(mat),
         Labels = as.factor(Labels)) %>%
  pivot_longer(cols = starts_with("2011")) %>%
  mutate(name = as.Date(name)) %>%
  filter(value == 1) %>%
  ggplot(aes(x = name, y = id, color = Labels)) +
  geom_point() +
  scale_x_date(date_labels = "%Y-%m-%d", date_breaks = "5 days") +
  theme(axis.text.x = element_text(angle = 90))
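Note that date_breaks = "5 days" places a label every five calendar days, which is not quite "first, last and every fifth date". If you need exactly that, one option is to compute the breaks yourself and pass them via the breaks argument. A rough sketch, where the 133 consecutive dates and the helper pick_breaks() are made up for illustration:

```r
library(ggplot2)

# Hypothetical data: 133 consecutive dates, one point per date
dates <- seq(as.Date("2011-01-06"), by = "day", length.out = 133)
df <- data.frame(name = dates, id = seq_along(dates))

# Hypothetical helper: keep the first, every `every`-th, and the last distinct date
pick_breaks <- function(d, every = 5) {
  d <- sort(unique(d))
  i <- seq_along(d)
  d[i == 1 | i %% every == 0 | i == length(d)]
}

brks <- pick_breaks(df$name)

p <- ggplot(df, aes(x = name, y = id)) +
  geom_point() +
  scale_x_date(breaks = brks, date_labels = "%Y-%m-%d") +
  theme(axis.text.x = element_text(angle = 90))
```

Because the helper works on the distinct dates present in the data, it also behaves sensibly when the dates are irregularly spaced, as in the sample matrix.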
Let's say that I have a dataframe with two variables, x and var_count. var_count contains only integers and is categorical in nature (i.e. all values will be 1, 2, 3, 4, or 5). I want to plot the x variable on the y-axis and the var_count variable on the x-axis. I then want to use geom_text to label the highest x value for a given var_count. So among all elements with a var_count of 1, I want to label only the element that has the highest x.
I have most of the code written, but not the condition that detects this situation and says "yes, this point needs to be labeled". How do I write this last condition?
So here is some sample code with some sample data.
library(tidyverse)
library(ggplot2)
sample_data <- tibble(var_count = c(1, 1, 1, 1, 2, 2, 2, 5),
x = c(0.2, 1.1, 0.7, 0.5, 2.4, 0.8, 0.9, 1.3))
sample_data %>%
  ggplot(aes(x = var_count, y = x)) +
  geom_point() +
  geom_text(aes(label = ifelse(
    FALSE,
    "my label", ""
  )), hjust = 1.2)
I want to replace the FALSE in the ifelse condition with something that labels the points circled in the following image:
Pretty much, I want to label the highest x value for a given var_count.
It's easiest to do this inside the dataframe itself. Make a new column where you do the ifelse() or case_when(), giving the column a blank string "" if the condition is FALSE. Then use that as the label text.
library(tidyverse)
library(ggrepel)
my_iris <-
  iris %>%
  group_by(Species) %>%
  mutate(my_label = ifelse(Sepal.Width == max(Sepal.Width),
                           paste(Sepal.Width, "cm"), "")) %>%
  ungroup()
ggplot(my_iris, aes(x = Petal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  geom_text_repel(aes(label = my_label), box.padding = 2)
Created on 2019-11-22 by the reprex package (v0.2.1)
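Applied to the sample data from the question, the same idea might look like the sketch below. It uses plain geom_text as in the question rather than ggrepel, and the name `labelled` is made up for illustration:

```r
library(dplyr)
library(ggplot2)

sample_data <- data.frame(var_count = c(1, 1, 1, 1, 2, 2, 2, 5),
                          x = c(0.2, 1.1, 0.7, 0.5, 2.4, 0.8, 0.9, 1.3))

# Within each var_count group, label only the row holding the maximum x
labelled <- sample_data %>%
  group_by(var_count) %>%
  mutate(my_label = ifelse(x == max(x), "my label", "")) %>%
  ungroup()

p <- ggplot(labelled, aes(x = var_count, y = x)) +
  geom_point() +
  geom_text(aes(label = my_label), hjust = 1.2)
```

If several rows tie for the maximum within a group, all of them get a label with this condition.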
I have a data.frame that looks like this:
df <- data.frame(mean_swd = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
afd_pot = c(0, 1, 0, 0, 1),
union_pot = c(0, 1, 1, 1, 1),
spd_pot = c(0, 1, 0, 0, 1),
fdp_pot = c(0, 1, 1, 0, 0),
green_pot = c(0, 1, 0, 1, 1),
linke_pot = c(1, 0, 1, 1, 1))
> df
  mean_swd afd_pot union_pot spd_pot fdp_pot green_pot linke_pot
1   4.0000       0         0       0       0         0         1
2   5.3333       1         1       1       1         1         0
3   6.3333       0         1       0       1         0         1
4   5.6666       0         1       0       0         1         1
5   3.6666       1         1       1       0         1         1
The pot variables represent a potential (1) or no potential (0) to vote for a party, mean_swd stands for a mean score on an attitude scale (from 1-7), the rows represent individuals.
I want to produce a grouped barplot using ggplot2 that actually puts several barplots into one plot. It should plot the mean of mean_swd against the 6 pot variables separately, so that I can compare the mean scores on mean_swd for the individual groups of persons for which ..._pot == 1 (additionally, but not necessarily, grouping by the levels of these variables (1/0), so that I can compare mean_swd between those that have a potential of voting for that party vs those that don't).
As I don't have a single categorical variable to group by, I can't figure out how to code this and haven't found any solutions to the problem. The grouping solutions I found all work with a single categorical grouping variable. But I can't transform these six variables into one, as these potentials are not mutually exclusive. The separate barplots thus need to be calculated from varying sets of individual observations. I also thought about grouping by boolean expressions but couldn't find any sources for this.
Any suggestions? Thank you in advance. Also feel free to criticize the presentation of my problem, as this is my first posting ever.
Welcome to stackoverflow!
Are you looking for something like this? Is this going in the right direction?
library(magrittr)
library(dplyr)
library(reshape2)
library(ggplot2)
df <- data.frame(mean_swd = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
afd_pot = c(0, 1, 0, 0, 1),
union_pot = c(0, 1, 1, 1, 1),
spd_pot = c(0, 1, 0, 0, 1),
fdp_pot = c(0, 1, 1, 0, 0),
green_pot = c(0, 1, 0, 1, 1),
linke_pot = c(1, 0, 1, 1, 1))
dat <- df %>%
  melt(id.vars = "mean_swd") %>%
  group_by(variable, value) %>%
  summarise(mean = mean(mean_swd))
dat$value %<>% as.factor()
ggplot(dat, aes(variable, mean, fill = value)) + geom_col()
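For comparison, here is a sketch of the same summary using tidyr's pivot_longer instead of reshape2's melt, with the bars placed side by side so the 0 and 1 groups can be compared directly. The column names `party` and `potential` are made up for illustration, and `.groups = "drop"` assumes dplyr >= 1.0.0:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(mean_swd  = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
                 afd_pot   = c(0, 1, 0, 0, 1),
                 union_pot = c(0, 1, 1, 1, 1),
                 spd_pot   = c(0, 1, 0, 0, 1),
                 fdp_pot   = c(0, 1, 1, 0, 0),
                 green_pot = c(0, 1, 0, 1, 1),
                 linke_pot = c(1, 0, 1, 1, 1))

# One row per person and party, then the mean attitude score per party and potential
dat <- df %>%
  pivot_longer(cols = ends_with("_pot"),
               names_to = "party", values_to = "potential") %>%
  group_by(party, potential) %>%
  summarise(mean = mean(mean_swd), .groups = "drop") %>%
  mutate(potential = factor(potential))

# Dodged bars: potential = 0 and potential = 1 next to each other for each party
p <- ggplot(dat, aes(x = party, y = mean, fill = potential)) +
  geom_col(position = "dodge")
```

Since every person contributes one row per party after reshaping, the non-exclusive potentials are handled without having to collapse the six columns into one grouping variable.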
Is this what you are after? Feel free to clarify. I'm not sure if you'd rather have one that counts 1s and 0s and plots that against the average though.
library(tidyr)    # for gather()
library(ggplot2)
df <- data.frame(mean_swd = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
                 afd_pot = c(0, 1, 0, 0, 1),
                 union_pot = c(0, 1, 1, 1, 1),
                 spd_pot = c(0, 1, 0, 0, 1),
                 fdp_pot = c(0, 1, 1, 0, 0),
                 green_pot = c(0, 1, 0, 1, 1),
                 linke_pot = c(1, 0, 1, 1, 1),
                 Group = c(1, 2, 3, 4, 5))
df1 <- gather(df, key = variables, value = value, mean_swd:linke_pot)
ggplot(df1, aes(x = variables, y = value, fill = factor(Group))) +
  facet_wrap(~Group) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(fill = "Groups")
I have a data frame with the following columns: product_id, ..., p1, p2, p3, ... etc. The p-columns only have 0 or 1 as their cell data.
I want a bar chart that sums up (or count) p1, p2 etc. and shows each p-column as a bar with the value of the sum (with ggplot).
Additionally I want to fill the color by product_id.
It seems like reshaping the data into long format could be helpful, but I'm still stuck.
Here's the minimal data set, already reshaped:
library(reshape2)  # for melt()
product_id <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
p1 <- c(0, 0, 1, 1, 0, 0, 1, 0, 0)
p2 <- c(1, 0, 1, 0, 1, 0, 1, 1, 0)
p3 <- c(0, 0, 1, 1, 0, 1, 0, 1, 1)
df1 <- data.frame(product_id, p1, p2, p3)
df2 <- melt(df1, id.vars = "product_id",
            measure.vars = grep("^p[0-9]", names(df1), value = TRUE),
            variable.name = "p",
            value.name = "p-active")
There are dozens of ggplot2 tutorials, but I'm feeling generous:
ggplot(df2,
       # map columns to aesthetics:
       aes(x = p, y = `p-active`,
           # important to use a factor for discrete values:
           fill = factor(product_id),
           color = factor(product_id))) +
  # summarize data (`fun` replaces the deprecated `fun.y` as of ggplot2 3.3.0):
  stat_summary(fun = sum,
               # the geom:
               geom = "bar",
               # positioning:
               position = "dodge")
I'm not sure I understood exactly what you want, but I'll give it a try:
I changed the reshaping a bit, because it is not a good idea to use - in the name of a data frame column:
df2 <- melt(df1, id.vars = "product_id",
            measure.vars = grep("^p[0-9]", names(df1), value = TRUE),
            variable.name = "p",
            value.name = "p_active")
The next step is to sum up the values in p_active per value for p and product_id:
library(dplyr)
df2_summed <- group_by(df2, product_id, p) %>%
  summarise(p_active_summed = sum(p_active))
And finally, I create the plot:
library(ggplot2)
ggplot(df2_summed, aes(x = p, y = p_active_summed, fill = as.factor(product_id))) +
  geom_col()
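Put together as one self-contained script, the steps above could look roughly like this. As an assumption on my side, it swaps reshape2's melt for tidyr's pivot_longer (tidyr >= 1.0.0) and uses `.groups = "drop"` (dplyr >= 1.0.0); the summing and plotting are the same idea:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df1 <- data.frame(product_id = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  p1 = c(0, 0, 1, 1, 0, 0, 1, 0, 0),
                  p2 = c(1, 0, 1, 0, 1, 0, 1, 1, 0),
                  p3 = c(0, 0, 1, 1, 0, 1, 0, 1, 1))

# Long format: one row per product and p-column
df2 <- df1 %>%
  pivot_longer(cols = matches("^p[0-9]"),
               names_to = "p", values_to = "p_active")

# Sum of the 0/1 flags per product and p-column
df2_summed <- df2 %>%
  group_by(product_id, p) %>%
  summarise(p_active_summed = sum(p_active), .groups = "drop")

# Stacked bars: total bar height is the overall sum for each p-column
plt <- ggplot(df2_summed,
              aes(x = p, y = p_active_summed, fill = factor(product_id))) +
  geom_col()
```

Use `geom_col(position = "dodge")` instead if you would rather see one bar per product next to each other.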