Code:
library(plotly)
library(tidyverse)
df <- data.frame(protein = c("Chicken", "Beef", "Pork", "Fish",
"Chicken", "Beef", "Pork", "Fish"),
y1 = c(3, 24, 36, 49, 7, 15, 34, 49),
y2 = c(9, 28, 40, 47, 8, 20, 30, 40 ),
gender = c("Male", "Male", "Male", "Male",
"Female", "Female", "Female", "Female"))
df %>%
  plot_ly() %>%
  add_bars(y = ~y1, x = ~protein, name = "y1.male") %>%
  add_bars(y = ~y2, x = ~protein, color = I("green"), name = "y2.male") %>%
  add_bars(y = ~y1, x = ~protein, color = I("black"), name = "y1.female") %>%
  add_bars(y = ~y2, x = ~protein, color = I("red"), name = "y2.female")
My desired result is to create something similar to this:
However, when you run the code, you'll see that it has stacked the "Male" and "Female" values in each bar. I would like "y1.male" to represent the "Male" data when y = y1, "y2.male" the "Male" data when y = y2, "y1.female" the "Female" data when y = y1, and "y2.female" the "Female" data when y = y2. How can I do this without having to filter with "transforms" in R plotly?
We can rearrange the data to be in long format and then plot it:
df %>%
  pivot_longer(cols = c(y1, y2)) %>%
  unite(gender_var, c(gender, name)) %>%
  plot_ly() %>%
  add_bars(x = ~protein, y = ~value, name = ~gender_var)
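For readers without tidyr, here is a base R sketch of the same reshaping (an alternative, not the answer's code): the y1 and y2 columns are stacked into one value column, and gender and measure are pasted into a single label, as unite() does with its default "_" separator. Each label should then become its own trace.

```r
# Data from the question
df <- data.frame(protein = c("Chicken", "Beef", "Pork", "Fish",
                             "Chicken", "Beef", "Pork", "Fish"),
                 y1 = c(3, 24, 36, 49, 7, 15, 34, 49),
                 y2 = c(9, 28, 40, 47, 8, 20, 30, 40),
                 gender = c("Male", "Male", "Male", "Male",
                            "Female", "Female", "Female", "Female"))

# Long format: stack y1 and y2, remembering which column each value came from
long <- rbind(
  data.frame(protein = df$protein, gender = df$gender, name = "y1", value = df$y1),
  data.frame(protein = df$protein, gender = df$gender, name = "y2", value = df$y2)
)

# One label per gender/measure combination, e.g. "Male_y1"
long$gender_var <- paste(long$gender, long$name, sep = "_")

table(long$gender_var)

# With plotly loaded, the call from the answer can then be used unchanged:
# plot_ly(long) %>% add_bars(x = ~protein, y = ~value, name = ~gender_var)
```

Row order differs from pivot_longer() (all y1 rows come first), which does not matter for the plot.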
I have a column in my dataframe where gender is coded 1 and 0 for male and female respectively. It's not a replica, but looks something like this:
df <- read.csv("df.csv")
  Gender Age Width
1      0  35   1.4
2      0  30   1.4
3      1  32   1.3
4      1  31   1.5
5      0  36   1.4
6      1  39   1.7
I've managed to convert it to a factor and give it labels:
df$Gender <- as.factor(df$Gender)
class(df$Gender)
df$Gender <- factor(df$Gender,
levels = c("1","0"),
labels = c("male", "female"))
However, when I try to print df$Gender, I get all NAs as output.
UPDATE:
Thank you all for your help!
I realised that my code works when I run it the first time. It only becomes "NA" when I rerun the second chunk. Will this be a problem or can I just ignore it?
You can use
library(tidyverse)
df %>%
  mutate(gender = factor(Gender, levels = c(1, 0), labels = c("male", "female")))
or simply
df$gender <- ifelse(df$Gender == 1,"male","female")
or
df %>%
mutate(gender = if_else(Gender == 1,"male","female"))
or
df %>%
mutate(gender = case_when(Gender == 1 ~ "male",
Gender == 0 ~ "female"))
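A quick base R check, using the Gender codes from the question (1 = male, 0 = female), that the factor() and ifelse() recodings agree. It also shows why factor() needs explicit levels here: without them the levels are sorted as 0, 1, so the first label, "male", would be attached to 0.

```r
# Gender coded as in the question: 1 = male, 0 = female
df <- data.frame(Gender = c(0L, 0L, 1L, 1L, 0L, 1L),
                 Age = c(35L, 30L, 32L, 31L, 36L, 39L))

# Character recoding
g_chr <- ifelse(df$Gender == 1, "male", "female")

# Factor recoding with explicit levels: labels are matched to levels in order
g_fac <- factor(df$Gender, levels = c(1, 0), labels = c("male", "female"))

# Without explicit levels the levels are sorted (0, 1), so 0 becomes "male"
g_wrong <- factor(df$Gender, labels = c("male", "female"))

identical(as.character(g_fac), g_chr)
identical(as.character(g_wrong), g_chr)
```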
Data
df = structure(list(Sn = 1:6, Gender = c(0L, 0L, 1L, 1L, 0L, 1L),
Age = c(35L, 30L, 32L, 31L, 36L, 39L), Width = c(1.4, 1.4,
1.3, 1.5, 1.4, 1.7), gender = c("female", "female", "male",
"male", "female", "male")), row.names = c(NA, -6L), class = "data.frame")
Everything you have done seems correct when I reconstruct your input:
Gender <- c(1,0,1,0)
Age <- c(50,30,40,30)
df <- data.frame(Gender,Age)
df$Gender <- factor(df$Gender,
levels = c("1","0"),
labels = c("male", "female"))
print(df$Gender)
If you really want it as a character vector, you can then add:
df$Gender <- as.character(df$Gender)
But I think (as others have already mentioned) it's because of your input data, so try adding stringsAsFactors to your import command:
df <- read.csv("df.csv", stringsAsFactors = FALSE)
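The UPDATE (NA only after rerunning the chunk) is consistent with the conversion being applied to a column that has already been converted. A minimal sketch, assuming the second chunk overwrites df$Gender in place: after the first run the values are "male"/"female", which match neither level "1" nor "0", so factor() turns every element into NA. Writing the labels to a new column keeps the chunk safe to rerun.

```r
df <- data.frame(Gender = c(1, 0, 1, 0))

# First run: the values "1"/"0" match the requested levels
df$Gender <- factor(df$Gender, levels = c("1", "0"), labels = c("male", "female"))
first_run <- df$Gender

# Second run of the same line: the values are now "male"/"female",
# which match neither "1" nor "0", so every element becomes NA
df$Gender <- factor(df$Gender, levels = c("1", "0"), labels = c("male", "female"))
second_run <- df$Gender

# Rerun-safe variant: keep the original codes and store the labels separately
df2 <- data.frame(Gender = c(1, 0, 1, 0))
df2$gender <- factor(df2$Gender, levels = c(1, 0), labels = c("male", "female"))
```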
Problem:
I can't find the right way to plot the values of a variable as points and their mean with a different shape. So far I have found a way of doing this, but the mean also appears in the color legend, which I don't want. How can I get the desired output? Should I use stat_summary?
NOTE: The variables must be ordered by their mean value within each multimorbidity group (in case this matters for the proposed solution); this is why I am using reorder_within and scale_x_reordered.
source("https://raw.githubusercontent.com/dgrtwo/drlib/master/R/reorder_within.R")
library(tidyverse)
foo %>%
group_by(multimorbidity, variables) %>%
mutate(Mean = mean(varimportance),
aux_mean = Mean) %>%
ungroup() %>%
spread(Gender, varimportance) %>%
gather(Gender, varimportance, -multimorbidity, -variables, -aux_mean) %>%
mutate(type = if_else(Gender %in% c("Male", "Female"), "Gender", "Mean")) %>%
ggplot(aes(reorder_within(variables, aux_mean, multimorbidity), varimportance,
color = Gender, shape = type)) +
geom_point() +
scale_x_reordered() +
scale_shape_manual(values = c(21, 24)) +
coord_flip() +
facet_wrap(multimorbidity~., scales = "free")
Created on 2019-03-20 by the reprex package (v0.2.1)
The desired output:
dput for foo:
foo <- structure(list(
Gender = c(
"Male", "Male", "Male", "Male", "Male",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male"
), multimorbidity = c(
"Yes", "Yes", "Yes", "Yes", "Yes",
"No", "No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes",
"No", "No", "No", "No", "No"
), variables = c(
"bmi", "income",
"soccap", "alternattr", "occhaz", "bmi", "income", "soccap",
"alternattr", "occhaz", "bmi", "income", "soccap", "alternattr",
"occhaz", "bmi", "income", "soccap", "alternattr", "occhaz"
),
varimportance = c(
73.1234145437324, 51.0029811829917, 100,
0, 90.9926659603591, 81.1949541852942, 48.2402164701156,
100, 0, 9.10509052698692, 66.7759248406279, 31.69991730502,
100, 4.7914221037359, 93.4636133674693, 70.8853809607131,
75.004433319282, 100, 0, 43.7326141975936
)
), class = c(
"tbl_df",
"tbl", "data.frame"
), row.names = c(NA, -20L))
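One possible approach, sketched here as an assumption rather than a tested answer: compute the means in a separate data frame and draw them with a second geom_point() layer that has a fixed colour and shape. Because that layer does not map colour, it adds nothing to the colour legend. For brevity the sketch rebuilds foo compactly from the dput above and leaves out the reorder_within()/scale_x_reordered() ordering, which would presumably need to be applied to both data frames in the same way. The plotting part only runs if ggplot2 is installed.

```r
# foo rebuilt from the dput above as a plain data.frame (same rows and values)
foo <- data.frame(
  Gender = rep(c("Male", "Female", "Male"), times = c(5, 10, 5)),
  multimorbidity = rep(c("Yes", "No", "Yes", "No"), each = 5),
  variables = rep(c("bmi", "income", "soccap", "alternattr", "occhaz"), times = 4),
  varimportance = c(73.1234145437324, 51.0029811829917, 100, 0, 90.9926659603591,
                    81.1949541852942, 48.2402164701156, 100, 0, 9.10509052698692,
                    66.7759248406279, 31.69991730502, 100, 4.7914221037359,
                    93.4636133674693, 70.8853809607131, 75.004433319282, 100, 0,
                    43.7326141975936)
)

# Mean importance per multimorbidity group and variable (one point per facet row)
means <- aggregate(varimportance ~ multimorbidity + variables, data = foo, FUN = mean)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(foo, aes(x = variables, y = varimportance)) +
    # Individual values: colour is mapped only in this layer
    geom_point(aes(colour = Gender), shape = 21) +
    # Means: fixed colour and shape, no colour mapping, so no colour-legend entry
    geom_point(data = means, shape = 24, colour = "black") +
    coord_flip() +
    facet_wrap(~ multimorbidity, scales = "free")
}
```

The trade-off is that the mean layer gets no legend entry of its own; mapping shape in both layers would be one way to bring it back.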
I have data.table data from which I want to create a stacked chart with grouping, using the code below:
causesDf <- causesDf[, c('Type', 'Gender', 'Total')]
causesSort <- causesDf[, lapply(.SD, sum),
by=list(causesDf$Type, causesDf$Gender)]
The data looks like this:
causesDf causesDf.1 Total
1: Illness (Aids/STD) Female 2892
2: Change in Economic Status Female 4235
3: Cancellation/Non-Settlement of Marriage Female 6126
4: Family Problems Female 133181
5: Illness (Aids/STD) Male 5831
6: Change in Economic Status Male 31175
7: Cancellation/Non-Settlement of Marriage Male 5170
and so on.
I am trying to make the bar plot like this:
barpos <- barplot(sort(causesSort$Total, decreasing=TRUE),
col=c("red","green"), xlab="", ylab="",
horiz=FALSE, las=2)
legend("topright", c("Male","Female"), fill=c("red","green"))
end_point <- 0.2 + nrow(causesSort) + nrow(causesSort) - 0.1
text(seq(0.1, end_point, by=1), par("usr")[3] - 30,
srt=60, adj= 1, xpd=TRUE,
labels=paste(causesSort$causesDf), cex=0.65)
But the x-axis labels are not aligned with the bars. Did I miss anything?
Expected output:
Edited:
causesSort
structure(list(causesDf = c("Illness (Aids/STD)", "Change in Economic Status",
"Cancellation/Non-Settlement of Marriage", "Physical Abuse (Rape/Incest Etc.)",
"Dowry Dispute", "Family Problems", "Ideological Causes/Hero Worshipping",
"Other Prolonged Illness", "Property Dispute", "Fall in Social Reputation",
"Illegitimate Pregnancy", "Failure in Examination", "Insanity/Mental Illness",
"Love Affairs", "Professional/Career Problem", "Divorce", "Drug Abuse/Addiction",
"Not having Children(Barrenness/Impotency", "Causes Not known",
"Unemployment", "Poverty", "Death of Dear Person", "Cancer",
"Suspected/Illicit Relation", "Paralysis", "Property Dispute",
"Unemployment", "Poverty", "Family Problems", "Illness (Aids/STD)",
"Drug Abuse/Addiction", "Other Prolonged Illness", "Death of Dear Person",
"Causes Not known", "Cancer", "Not having Children(Barrenness/Impotency",
"Cancellation/Non-Settlement of Marriage", "Paralysis", "Physical Abuse (Rape/Incest Etc.)",
"Professional/Career Problem", "Love Affairs", "Fall in Social Reputation",
"Dowry Dispute", "Ideological Causes/Hero Worshipping", "Illegitimate Pregnancy",
"Failure in Examination", "Change in Economic Status", "Insanity/Mental Illness",
"Divorce", "Suspected/Illicit Relation", "Not having Children (Barrenness/Impotency",
"Not having Children (Barrenness/Impotency"), causesDf.1 = c("Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Female", "Male"), Total = c(2892,
4235, 6126, 2662, 31206, 133181, 776, 69072, 4601, 4697, 2391,
12054, 33352, 21339, 1596, 2535, 1205, 5523, 148134, 3748, 7905,
4707, 2878, 8093, 2284, 14051, 23617, 24779, 208771, 5831, 28841,
125493, 5614, 304985, 6180, 2299, 5170, 5002, 1330, 10958, 23700,
8767, 764, 1342, 103, 14951, 31175, 60877, 1598, 6818, 544, 222
)), row.names = c(NA, -52L), class = c("data.table", "data.frame"
)
# , .internal.selfref = <pointer: 0x00000000098d1ef0> # seems not to work
)
If you don't need the 45° rotation (that one is a bit more tricky), you could use this solution.
First we need to reshape the data by sex.
library(reshape2)
df2 <- dcast(causesSort, ... ~ causesDf.1 , value.var="Total")
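As a base R alternative to reshape2::dcast(), xtabs() can build a similar cause-by-gender table in one step. A minimal sketch on a few rows copied from the dput: the causes already become the row names, but missing combinations become 0 rather than NA, and the result is a table rather than a data frame.

```r
# A few rows copied from the causesSort dput (column names as produced by data.table)
causesSort <- data.frame(
  causesDf = c("Family Problems", "Poverty", "Divorce",
               "Family Problems", "Poverty", "Divorce"),
  causesDf.1 = c("Female", "Female", "Female", "Male", "Male", "Male"),
  Total = c(133181, 7905, 2535, 208771, 24779, 1598)
)

# Sum Total for every cause/gender combination: rows = causes, columns = genders
tab <- xtabs(Total ~ causesDf + causesDf.1, data = causesSort)

# Order the causes by the Female column, as in the answer
tab <- tab[order(-tab[, "Female"]), ]

# t(tab) has genders as rows and causes as columns, ready for barplot(beside = TRUE)
wide <- t(tab)
```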
Then we generate rownames from the type column and delete this column.
rownames(df2) <- df2[, 1]
df2 <- df2[, -1]
Then we order the data by one column, e.g. by Female.
df2 <- df2[order(-df2$Female), ]
The labels are the rownames.
# labs <- rownames(df2)
However, since they are very long (and hard on the reader's eyes!), we may want shorter ones. A workaround is to keep only the first word of each label, truncated to eight characters.
labs <- substr(sapply(strsplit(rownames(df2), " "),
function(x) x[1]), 1, 8)
Now we are able to apply barplot().
pos <- barplot(t(df2), beside=TRUE, xaxt="n",
col=c("#3C6688", "#45A778"), border="white")
pos is a matrix of bar midpoints. Because the plot is grouped, we need the column means (one per cause), which we use to place the axis labels.
axis(1, colMeans(pos), labs, las=2)
Result
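Why the labels in the question drift: barplot() returns the bar midpoints, and with the defaults width = 1 and space = 0.2 a single series sits at 0.7, 1.9, 3.1, and so on, 1.2 units apart, whereas seq(0.1, end_point, by = 1) steps by 1. The question also sorts the totals but not the labels, so the labels need the same ordering. The sketch below uses toy values (not the real data) and plot = FALSE to get the positions without drawing.

```r
# Toy totals and labels (hypothetical, not the real causesSort values)
totals <- c(50, 300, 100, 200, 150)
labs <- c("E", "A", "C", "B", "D")

# Sort heights and labels with the same permutation
o <- order(totals, decreasing = TRUE)

# plot = FALSE returns the bar midpoints without drawing anything
mid <- as.vector(barplot(totals[o], plot = FALSE))

# Positions hard-coded as in the question: 1 unit apart instead of 1.2
hard_coded <- seq(0.1, by = 1, length.out = length(totals))

# Grouped bars: one column per cause, one row per gender
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2,
            dimnames = list(c("Female", "Male"), c("A", "B", "C")))
pos <- barplot(m, beside = TRUE, plot = FALSE)
centres <- colMeans(pos)  # one x position per group, as passed to axis()

# When drawing for real, pass the returned positions to text(), e.g.
# text(mid, par("usr")[3], labels = labs[o], srt = 60, adj = 1, xpd = TRUE)
```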
Here is a ggplot2 solution, which may give you better control over the final output:
library(dplyr)
library(ggplot2)
# Rename the columns
names(causesDf) <- c("Type", "Gender", "Total")
# Put Male before Female
causesDf$Gender <- factor(causesDf$Gender, levels = c("Male", "Female"), ordered = TRUE)
# Order the types by their total count, in decreasing order
sorted <- causesDf %>% group_by(Type) %>% summarize(gtotal = sum(Total)) %>% arrange(desc(gtotal))
causesDf$Type <- factor(causesDf$Type, levels = sorted$Type, ordered = TRUE)
# Plot the graph
g <- ggplot(causesDf, aes(x = Type, y = Total, group = Gender, fill = Gender)) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = alpha(c("blue", "green"), .5))
print(g)
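The ordering steps above can also be done without dplyr. A minimal base R sketch on a few rows from the dput, with the column names used in this answer: tapply() sums Total per Type, and the sorted names become the factor levels, which ggplot2 uses for the x-axis order.

```r
# A few rows from the dput, with the column names used in the ggplot2 answer
causesDf <- data.frame(
  Type = c("Divorce", "Family Problems", "Poverty",
           "Divorce", "Family Problems", "Poverty"),
  Gender = c("Female", "Female", "Female", "Male", "Male", "Male"),
  Total = c(2535, 133181, 7905, 1598, 208771, 24779)
)

# Total per Type across both genders
type_totals <- tapply(causesDf$Total, causesDf$Type, sum)

# Levels in decreasing order of total: ggplot2 draws the x-axis in level order
causesDf$Type <- factor(causesDf$Type, levels = names(sort(type_totals, decreasing = TRUE)))

# Male first, then Female, as in the answer
causesDf$Gender <- factor(causesDf$Gender, levels = c("Male", "Female"))
```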
This question already has answers here: facet_wrap add geom_hline
I am quite a newbie to RStudio, and I am having problems adding a different vertical line to each of my two facets using facet_wrap.
Here is what I tried:
library(ggplot2)
g <- ggplot(dat, aes(x=index, na.rm= TRUE))
d <- g+ geom_density() + facet_wrap(~gender)
vline.data <- data.frame(z = c(2.36,2.48),gender = c("Female","Male"))
d1 <- d + geom_vline(aes(xintercept = z),vline.data)
But it adds the same two lines to each facet. What do you reckon is the problem? I have thought of somehow splitting the data for the two facets into separate data frames, but I have no idea how to go about it.
P.S. The x-axis (the index) goes from 1 to 4.
Thank you in advance.
index <- c(NA, NA, 4, 4, 4, NA, NA, NA, NA, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, NA, 3, NA, NA, 3, 4, 4, 4, 4, 4, 3, 4, 3, 4, 3, NA, 4, 2,
4, 4, 2, 2, NA, 2, 3, 3, 2, 2, NA, NA, 2)
gender <- c("Female", "Female", "Male", "Male", "Male", "Female", "Female",
"Male", "Female", "Male", "Female", "Male", "Female", "Male",
"Male", "Female", "Female", "Female", "Male", "Female", "Female",
"Male", "Male", "Female", "Female", "Female", "Male", "Male",
"Female", "Female", "Male", "Female", "Female", "Female", "Male",
"Male", "Female", "Male", "Male", "Female", "Female", "Male",
"Male", "Female", "Female", "Male", "Male", "Male", "Male", "Male")
Using the code and data above, I get this plot:
This was asked a long time ago, but here's the answer:
You need to add geom_vline() to your ggplot with data that contains the faceting variable, so that each line is drawn only in its own panel. Instead of keeping the line positions in a separate, hand-typed object, put index and gender into a single dat data frame and compute the positions from it: below, geom_vline() receives a function as its data, which ggplot2 applies to dat to summarise index per gender. The example assumes the values in vline.data are the per-gender means of index; na.rm = TRUE is needed because index contains NAs.
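library(ggplot2)
library(dplyr)
dat <- data.frame(index, gender)
# geom_vline() gets a function as data: it is applied to dat, giving one mean per gender
p <- ggplot(dat, aes(x = index)) +
  geom_density(na.rm = TRUE) +
  facet_wrap(~ gender) +
  geom_vline(data = . %>% group_by(gender) %>% summarise(vl = mean(index, na.rm = TRUE)),
             aes(xintercept = vl))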
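A base R way to build the same line data, sketched with the index and gender vectors from the question: aggregate() computes one mean per gender (its formula method drops rows with NA by default), and because the result keeps a gender column, geom_vline() draws each value only in its own panel. The plotting part only runs if ggplot2 is installed.

```r
index <- c(NA, NA, 4, 4, 4, NA, NA, NA, NA, 2, 2, 2, 2, 2, 3, 3, 3, 3,
           3, NA, 3, NA, NA, 3, 4, 4, 4, 4, 4, 3, 4, 3, 4, 3, NA, 4, 2,
           4, 4, 2, 2, NA, 2, 3, 3, 2, 2, NA, NA, 2)
gender <- c("Female", "Female", "Male", "Male", "Male", "Female", "Female",
            "Male", "Female", "Male", "Female", "Male", "Female", "Male",
            "Male", "Female", "Female", "Female", "Male", "Female", "Female",
            "Male", "Male", "Female", "Female", "Female", "Male", "Male",
            "Female", "Female", "Male", "Female", "Female", "Female", "Male",
            "Male", "Female", "Male", "Male", "Female", "Female", "Male",
            "Male", "Female", "Female", "Male", "Male", "Male", "Male", "Male")
dat <- data.frame(index, gender)

# One mean of index per gender; NA rows are dropped by aggregate()'s formula method
vlines <- aggregate(index ~ gender, data = dat, FUN = mean)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x = index)) +
    geom_density(na.rm = TRUE) +
    facet_wrap(~ gender) +
    # vlines has a gender column, so each line lands in its matching panel
    geom_vline(data = vlines, aes(xintercept = index))
}
```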