I want to have two charts containing multiple horizontal bar graphs, each showing mean values of one of the two variables: fear and expectation. The bar graphs should be grouped by the dummies.
I have created single bar graphs with the mean values of fear and expectation grouped by each of the dummies but I don't know how to combine them properly.
x = data.frame(
id = c(1, 2, 3, 4, 5),
sex = c(1, 0, 1, 0, 1),
migration = c(0, 1, 0, 1, 0),
handicap = c(0, 1, 1, 1, 0),
east = c(0, 1, 1, 1, 0),
fear = c(1, 3, 4, 6, 3),
expectation = c(2, 3, 2, 5, 4))
I want to have it look like this basically:
https://ibb.co/3fz0GQ4
Any help would be greatly appreciated.
To get to the plot you show, you will need to reshape your data a bit:
library(tidyverse)
x2 <- x%>%
gather(fear, expectation, key = "group", value = "value")%>%
gather(sex, migration, handicap, east, key = "dummies", value = "dum_value")%>%
group_by(group, dummies, dum_value)%>%
summarize(prop = mean(value))
Then you can easily get to the plot:
x2%>%
ggplot(aes(y= prop, x = dummies, fill = factor(dum_value)))+
geom_bar(stat = "identity", position = "dodge")+
coord_flip()+
facet_wrap(~group)
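gather() still works, but tidyr now marks it as superseded in favour of pivot_longer(). As a rough sketch only (assuming a tidyr version that provides pivot_longer(); the object name p is just for illustration), the same reshaping and plot could look like this:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question
x <- data.frame(
  id = c(1, 2, 3, 4, 5),
  sex = c(1, 0, 1, 0, 1),
  migration = c(0, 1, 0, 1, 0),
  handicap = c(0, 1, 1, 1, 0),
  east = c(0, 1, 1, 1, 0),
  fear = c(1, 3, 4, 6, 3),
  expectation = c(2, 3, 2, 5, 4)
)

x2 <- x %>%
  # one row per person and outcome variable
  pivot_longer(c(fear, expectation), names_to = "group", values_to = "value") %>%
  # one row per person, outcome variable and dummy
  pivot_longer(c(sex, migration, handicap, east),
               names_to = "dummies", values_to = "dum_value") %>%
  group_by(group, dummies, dum_value) %>%
  summarise(prop = mean(value), .groups = "drop")

# p is a hypothetical name; print(p) draws the faceted horizontal bars
p <- ggplot(x2, aes(x = dummies, y = prop, fill = factor(dum_value))) +
  geom_col(position = "dodge") +
  coord_flip() +
  facet_wrap(~group)
```

The summary has one row per combination of outcome, dummy and dummy level, which is what the dodged bars need.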
Trying to make a stacked histogram, but it just comes out grey, with no stacking. I don't understand what is different from all the examples on here, or the built in 'iris' example, unless using time as x variable is a problem.
I have a big df, in long format, cut down to 25 rows and named 'mini' for this example:
> dput(mini)
structure(list(maxdep = c(203.9540564, 212.9573869, 13.45896065,
209.961431, 162.9633891, 13.97961439, 85.48389032, 102.4905817,
100.0035986, 88.02608837, 89.02947373, 22.0301996, 20.03060219,
19.03098037, 29.03141345, 13.03170014, 82.0328164, 55.03384725,
15.03437183, 17.53463412, 37.5352136, 70.03588457, 90.53687883,
91.53861116, 10.03902594), st_time = structure(c(1633321800,
1633328510, 1633331050, 1633331285, 1633334080, 1633347960, 1633348185,
1633355115, 1633279830, 1633298825, 1633301480, 1633302985, 1633303300,
1633303600, 1633303825, 1633304280, 1633304430, 1633305635, 1633306445,
1633306610, 1633306890, 1633307310, 1633307960, 1633309380, 1633310320
), class = c("POSIXct", "POSIXt"), tzone = ""), dbin = c(2, 2,
1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1)), row.names = c(NA, 25L), class = "data.frame")
the code is simple:
gg3 <- ggplot(data = mini, aes(x = st_time, fill = dbin)) #
gg3 <- gg3 + geom_histogram(position = "stack", binwidth = 3600) # gives hourly columns in histogram
gg3
This should plot the start time on the x axis (correct) against the count on the y axis (correct), and stack in colour by dbin value (e.g. 1 through 5), producing five colours of histogram stacked on top of each other (only two are present in the sample data above).
Instead I get one grey plot of all the data (25 counts in total). Please help me understand what is wrong.
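dbin is numeric, so ggplot2 treats the fill as a continuous variable and does not split the histogram into groups. Change dbin to a factor: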
mini %>%
ggplot(aes(x = st_time, fill = as.factor(dbin) )) +
geom_histogram(position = "stack", binwidth = 3600)
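To see the difference without the full data, here is a small sketch on made-up data (toy, p_num and p_fac are hypothetical names, not part of the question). It counts how many groups ggplot2 builds for the numeric fill versus the factor fill:

```r
library(ggplot2)

# Hypothetical toy data standing in for `mini`
toy <- data.frame(
  st_time = as.POSIXct("2021-10-04 00:00:00", tz = "UTC") +
    c(0, 600, 1200, 4000, 4500, 8000),
  dbin = c(1, 2, 1, 1, 2, 1)
)

# Numeric fill: no discrete variable, so everything lands in one group
p_num <- ggplot(toy, aes(x = st_time, fill = dbin)) +
  geom_histogram(position = "stack", binwidth = 3600)

# Factor fill: one group per level, so the bars can be stacked
p_fac <- ggplot(toy, aes(x = st_time, fill = factor(dbin))) +
  geom_histogram(position = "stack", binwidth = 3600)

# Count the distinct groups in the built layer data
n_groups_num <- length(unique(ggplot_build(p_num)$data[[1]]$group))
n_groups_fac <- length(unique(ggplot_build(p_fac)$data[[1]]$group))
```

With the numeric fill there is a single group, which is why the plot comes out as one grey histogram.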
This is the data to use as a reference:
df <- data.frame(a = c(3,3,3,3,3,2,2,3,2,1,1,1,3,1,3), b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2), c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2) )
I want to draw a bargraph with the proportion of a for each facet. At the same time I want the bars to be colored according to the b value.
The variable b is not relevant for calculating the percentage. This is what I came up with, when I set the fill = c, it divides the stacked color in two, one corresponding to 1, and the other as NA.
ggplot(df, aes(x = a, y = ..prop.., group = 1, fill = b)) +
geom_bar(position = "stack") +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c")
how can I have a result similar to this one but with the proportions of a for each facet wrap instead of the absolute values?
Thank you!
Here's an approach using the ..count.. and ..PANEL.. special symbols:
ggplot(df, aes(x = a, fill = as.factor(b))) +
geom_bar(aes(y = ..count.. / tapply(..count..,..PANEL..,sum)[..PANEL..])) +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c", fill = "b", y = "Proportion")
If you weren't using facet_wrap, this would be trivial: just set y = ..prop.. in the aesthetics. However, ..prop.. is not calculated properly across facets. To get around this, we can use tapply and the ..PANEL.. special symbol to sum ..count.. within each panel only. The trailing [..PANEL..] then indexes the resulting vector of panel totals, so that each bar is divided by the total of its own panel.
The other issue you had was that b is class numeric, so you need to convert that to factor.
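If you would rather not rely on the special symbols, an alternative is to compute the per-facet proportions yourself and plot them with geom_col. This is only a sketch of that idea; props and p are hypothetical names:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  a = c(3, 3, 3, 3, 3, 2, 2, 3, 2, 1, 1, 1, 3, 1, 3),
  b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2),
  c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2)
)

props <- df %>%
  count(c, a, b) %>%              # rows per facet, x value and fill value
  group_by(c) %>%
  mutate(prop = n / sum(n)) %>%   # share of each cell within its facet
  ungroup()

# Stacked bars: the b segments of each a add up to the proportion of a in that facet
p <- ggplot(props, aes(x = a, y = prop, fill = factor(b))) +
  geom_col() +
  facet_wrap(~c, nrow = 1) +
  labs(title = "Proportion of a among c", fill = "b", y = "Proportion")
```

Because the division happens within each value of c, the bars in every facet sum to 1.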
I have a data.frame that looks like this:
df <- data.frame(mean_swd = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
afd_pot = c(0, 1, 0, 0, 1),
union_pot = c(0, 1, 1, 1, 1),
spd_pot = c(0, 1, 0, 0, 1),
fdp_pot = c(0, 1, 1, 0, 0),
green_pot = c(0, 1, 0, 1, 1),
linke_pot = c(1, 0, 1, 1, 1))
> df
mean_swd afd_pot union_pot spd_pot fdp_pot green_pot linke_pot
1 4.0000 0 0 0 0 0 1
2 5.3333 1 1 1 1 1 0
3 6.3333 0 1 0 1 0 1
4 5.6666 0 1 0 0 1 1
5 3.6666 1 1 1 0 1 1
The pot variables represent a potential (1) or no potential (0) to vote for a party, mean_swd stands for a mean score on an attitude scale (from 1-7), the rows represent individuals.
I want to produce a grouped barplot using ggplot2 that actually puts several barplots into one plot. It should plot the mean of mean_swd against the 6 pot variables separately, so that I can compare the mean scores on mean_swd for the individual groups of persons for which ..._pot == 1 (additionally, but not necessarily, grouping by the levels of these variables (1/0), so that I can compare mean_swd between those that have a potential of voting for that party vs those that don't).
As I don't have a single categorical variable by which to group, I can't figure out how to code this and haven't found any solutions to the problem. The grouping solutions I found all work with single categorical variables for grouping. But I can't transform these six variables into one, as these potentials are not exclusive. The separate barplots thus need to be calculated with varying individual observations. I also thought about grouping by boolean expressions but couldn't find any sources for this.
Any suggestions? Thank you in advance. Also feel free to criticize the presentation of my problem, as this is my first posting ever.
Welcome to stackoverflow!
Are you looking for something like this? Is this going in the right direction?
library(magrittr)
library(dplyr)
library(reshape2)
library(ggplot2)
df <- data.frame(mean_swd = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
afd_pot = c(0, 1, 0, 0, 1),
union_pot = c(0, 1, 1, 1, 1),
spd_pot = c(0, 1, 0, 0, 1),
fdp_pot = c(0, 1, 1, 0, 0),
green_pot = c(0, 1, 0, 1, 1),
linke_pot = c(1, 0, 1, 1, 1))
dat <- df %>%
melt(id.vars = "mean_swd") %>%
group_by(variable, value) %>%
summarise(mean = mean(mean_swd))
dat$value %<>% as.factor()
# dodge the bars so the two means sit side by side instead of being stacked
ggplot(dat, aes(variable, mean, fill = value)) + geom_col(position = "dodge")
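If only the people with a potential (..._pot == 1) should enter each bar, which is the first variant described in the question, one possible sketch is to reshape and then filter before averaging. It assumes tidyr's pivot_longer() is available; means_pot1 and p are hypothetical names:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(mean_swd = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
                 afd_pot = c(0, 1, 0, 0, 1),
                 union_pot = c(0, 1, 1, 1, 1),
                 spd_pot = c(0, 1, 0, 0, 1),
                 fdp_pot = c(0, 1, 1, 0, 0),
                 green_pot = c(0, 1, 0, 1, 1),
                 linke_pot = c(1, 0, 1, 1, 1))

means_pot1 <- df %>%
  # one row per person and party; a person can appear under several parties
  pivot_longer(ends_with("_pot"), names_to = "party", values_to = "pot") %>%
  filter(pot == 1) %>%
  group_by(party) %>%
  summarise(mean_swd = mean(mean_swd), n = n(), .groups = "drop")

p <- ggplot(means_pot1, aes(x = party, y = mean_swd)) +
  geom_col()
```

Since the potentials are not exclusive, the same person contributes to every party for which their potential is 1; the n column records how many people are behind each bar.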
Is this what you are after? Feel free to clarify. I'm not sure if you'd rather have one that counts 1s and 0s and plots that against the average though.
df <- data.frame(mean_swd = c(4.0000, 5.3333, 6.3333, 5.6666, 3.6666),
afd_pot = c(0, 1, 0, 0, 1),
union_pot = c(0, 1, 1, 1, 1),
spd_pot = c(0, 1, 0, 0, 1),
fdp_pot = c(0, 1, 1, 0, 0),
green_pot = c(0, 1, 0, 1, 1),
linke_pot = c(1, 0, 1, 1, 1),
Group = c(1,2,3,4,5))
library(tidyr)    # for gather()
library(ggplot2)
df1 <- gather(df, key = variables, value = value, mean_swd:linke_pot)
ggplot(df1, aes(x = variables, y = value, fill = factor(Group))) +
facet_wrap(~Group) +
geom_bar(stat = "identity", color = "black", position = position_dodge()) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(fill = "Groups")
I have a data frame that contains data on the number of TVs and radios owned by survey respondents now and before:
DF <- data.frame(TV_now = as.numeric(c(4, 9, 1, 0, 4, NA)),
TV_before = as.numeric(c(4, 1, 2, 4, 5, 2)),
Radio_now = as.numeric(c(4, 5, 1, 5, 6, 9)),
Radio_before = as.numeric(c(6, 5, 3, 6, 7, 10)))
I want to sum the total value of each variable and then create a barplot that shows the number of TVs and radios owned by survey respondents now and before.
I can manually create a new dataframe that contains just the sum of the value of each variable in the original DF
DFsum <- data.frame(TV_now = as.numeric(c(sum(DF$TV_now,na.rm = TRUE))),
TV_before = as.numeric(c(sum(DF$TV_before,na.rm = TRUE))),
Radio_now = as.numeric(c(sum(DF$Radio_now,na.rm = TRUE))),
Radio_before = as.numeric(c(sum(DF$Radio_before,na.rm = TRUE))))
and then use tidyr to do the following:
library(tidyr)
library(ggplot2)
DFsum %>%
gather(key=Device, value=Number) %>%
ggplot(aes(x=Number,fill=Device)) +
geom_bar(aes(x = Device, y = Number), position = "dodge", stat = "identity")
This gives me the result I want, but seems unnecessarily complicated for what should be easy to achieve. Is there an easier way to plot this?
You can simplify your code with dplyr::summarise_all, since you are summarizing all your columns:
library(tidyverse)
library(ggplot2)
DF %>% summarise_all(sum, na.rm = TRUE) %>%
gather(key=Device, value=Number) %>%
ggplot(aes(x=Device,fill=Device)) +
geom_bar(aes(x = Device, y = Number), position = "dodge", stat = "identity")
Simplify data creation. R knows that 4, 9, 1, etc., are numbers, you don't need as.numeric.
DF <- data.frame(TV_now = c(4, 9, 1, 0, 4, NA),
TV_before = c(4, 1, 2, 4, 5, 2),
Radio_now = c(4, 5, 1, 5, 6, 9),
Radio_before = c(6, 5, 3, 6, 7, 10))
Simplify the data manipulation. Tidy your data (convert it to long format) first, then do other things:
DF_long = gather(DF, key = "device") %>%
group_by(device) %>%
summarize(number = sum(value, na.rm = TRUE))
Simplify the plotting. Aesthetics are inherited - you don't need to specify them multiple times. geom_col is preferred to geom_bar with stat = "identity". position = "dodge" does nothing when there is one group per x index.
ggplot(DF_long, aes(x = device, y = number, fill = device)) +
geom_col()
I generally prefer to do my own data manipulation, but we can also lean on ggplot's bar stacking to replace the summing, making the entire code:
gather(DF, key = "device", value = "number") %>%
ggplot(aes(x = device, y = number, fill = device)) +
geom_col()
Base approach
dev = colSums(DF, na.rm = TRUE)
barplot(dev, col = factor(names(dev)))
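Since the column names encode both the device and the period, another option is to split them while reshaping, so that "now" and "before" can sit side by side for each device. The following is just a sketch, assuming a tidyr version with pivot_longer(); totals and p are hypothetical names:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

DF <- data.frame(TV_now = c(4, 9, 1, 0, 4, NA),
                 TV_before = c(4, 1, 2, 4, 5, 2),
                 Radio_now = c(4, 5, 1, 5, 6, 9),
                 Radio_before = c(6, 5, 3, 6, 7, 10))

totals <- DF %>%
  # "TV_now" becomes device = "TV", period = "now"
  pivot_longer(everything(),
               names_to = c("device", "period"),
               names_sep = "_",
               values_to = "number") %>%
  group_by(device, period) %>%
  summarise(number = sum(number, na.rm = TRUE), .groups = "drop")

# One pair of bars (before / now) per device
p <- ggplot(totals, aes(x = device, y = number, fill = period)) +
  geom_col(position = "dodge")
```

Here position = "dodge" does have an effect, because there are two groups per x value.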
I have a list of time-ordered pairwise interactions. I want to plot a temporal network of these interactions, which would look something like the diagram below.
My data looks like the example below. The id1 and id2 values are the unique identifiers of individuals. The time indicates when an interaction between those individuals occurred. So at time = 1, I want to plot a connection between individual-1 and individual-2.
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
According to this StackOverflow question, I can see that it is possible to draw vertical lines between positions on the y-axis in ggplot. This is achieved by reshaping the data into a long format. This is fine when there is only one pair per time value, but not when there is more than one interacting pair at a time. For example in my dummy data, at time = 2, there are three pairs (in the plot I would show these by overlaying lines with reduced opacity).
My question is, how can I organise these data in a way that ggplot will be able to plot potentially multiple interacting pairs at specified time points?
I have been trying to reorganise the data by assigning an extra identifier to each of the multiple pairs that occur at the same time. I imagined the data table to look like this, but I haven't figured out how to make this in R... In this example the three interactions at time = 2 are identified by an extra grouping of either 1, 2 or 3. Even if I could arrange this I'm still not sure how I would get ggplot to read it.
Ultimately I'm trying to create something that looks like Fig. 2 in this scientific paper.
Any help would be appreciated!
You can do this without reshaping the data, just set one id to y and the other id to yend in geom_curve:
ggplot(df, aes(x = time, y = id1)) +
geom_curve(aes(xend = time, yend = id2), curvature = 0.3) +
geom_hline(yintercept = 1:7, colour = scales::muted("blue")) +
geom_point(size = 3) +
geom_point(aes(y = id2), size = 3) +
coord_cartesian(xlim = c(0, max(df$time) + 1)) +
theme_bw()
Libraries:
library('ggplot2')
library('data.table')
Data:
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
setDT(df)
df1 <- melt.data.table( df, id.vars = c('time'))
Plot:
p <- ggplot( df1, aes(time, value)) +
geom_point() +
geom_curve( mapping = aes(x = time, y = id1, xend = time, yend = id2, colour = "curve"),
data = df,
curvature = 0.2 )
print(p)
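The question also asks how to add an extra identifier for pairs that share a time point, and mentions overlaying lines with reduced opacity. A possible sketch of both, assuming dplyr is available (df_pairs and the pair column are hypothetical names):

```r
library(dplyr)
library(ggplot2)

id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)

# Number the interactions within each time point: 1, 2, 3, ...
df_pairs <- df %>%
  group_by(time) %>%
  mutate(pair = row_number()) %>%
  ungroup()

# geom_curve draws one curve per row, so overlapping pairs at the same
# time are all drawn; alpha makes the overlaps visible
p <- ggplot(df_pairs, aes(x = time, y = id1)) +
  geom_curve(aes(xend = time, yend = id2), curvature = 0.3, alpha = 0.4) +
  geom_point(size = 3) +
  geom_point(aes(y = id2), size = 3) +
  theme_bw()
```

The pair column is not needed for drawing, because each row already yields its own curve, but it can be mapped to colour or linetype if the pairs at one time point should be told apart.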