I want to draw exactly the same graph in R. However, I want to consider two options:
(1) one shared x-axis for both genders, and
(2) a separate x-axis for each gender. Here is the link to the page where I found the image: https://rpubs.com/WhataBurger/Anovatype3
Thanks for sharing the knowledge.
Here is a randomly generated data set. Please feel free to use your own random data in the responses (if you have any).
structure(list(gender = c("Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female"), education = c("Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Education", "Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education", "No Education", "No Education", "No Education",
"No Education"), salary = c(54395.2435344779, 57698.2251051672,
75587.0831414912, 60705.0839142458, 61292.8773516095, 77150.6498688328,
64609.162059892, 47349.3876539347, 53131.4714810647, 55543.3802990004,
72240.8179743946, 63598.1382705736, 64007.7145059405, 61106.8271594512,
54441.5886524592, 77869.1313680308, 64978.5047822924, 40333.8284337036,
67013.5590156369, 55272.0859227207, 49321.7629401315, 57820.250853417,
49739.9555169276, 52711.0877070886, 53749.6073215074, 54395.2435344779,
57698.2251051672, 75587.0831414912, 60705.0839142458, 61292.8773516095,
77150.6498688328, 64609.162059892, 47349.3876539347, 53131.4714810647,
55543.3802990004, 72240.8179743946, 63598.1382705736, 64007.7145059405,
61106.8271594512, 54441.5886524592, 77869.1313680308, 64978.5047822924,
40333.8284337036, 67013.5590156369, 55272.0859227207, 49321.7629401315,
57820.250853417, 49739.9555169276, 52711.0877070886, 53749.6073215074,
23253.2267570303, 33351.1481779781, 30613.4924713461, 25447.4522519522,
35015.2596842797, 31705.8568859073, 28819.7140680309, 33580.5026441801,
33512.5339501322, 33286.3243265499, 32754.5610164004, 32215.6706141504,
29752.3531576931, 28776.1493450403, 28478.1159959505, 27221.172084318,
29168.3308879216, 24938.4145937269, 38675.8238613541, 34831.84799322,
25507.5656671866, 28388.4606588037, 28133.3785855071, 33119.8604733453,
29666.5237341127, 23253.2267570303, 33351.1481779781, 30613.4924713461,
25447.4522519522, 35015.2596842797, 31705.8568859073, 28819.7140680309,
33580.5026441801, 33512.5339501322, 33286.3243265499, 32754.5610164004,
32215.6706141504, 29752.3531576931, 28776.1493450403, 28478.1159959505,
27221.172084318, 29168.3308879216, 24938.4145937269, 38675.8238613541,
34831.84799322, 25507.5656671866, 28388.4606588037, 28133.3785855071,
33119.8604733453, 29666.5237341127)), class = "data.frame", row.names = c(NA,
-100L))
Look at this code; it may help you get started. Your data is not complete: every Education row is Male and every No Education row is Female, so facet_wrap() cannot show all category combinations. Anyway, I think this may help.
Once your variables are loaded, build a data frame and plot it with ggplot:
library(ggplot2)
df <- data.frame(education, gender, salary)
# plot 1
ggplot(df, aes(x = education, y = salary, fill=gender)) +
geom_boxplot() +
facet_wrap(.~gender) +
theme_bw()
# plot 2
ggplot(df, aes(x = education, y = salary, fill = gender)) +
geom_boxplot() +
theme_bw()
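The gap mentioned above (every Education row is Male, every No Education row is Female) can be checked with a cross-tabulation before plotting. The sketch below uses a small made-up data frame, toy, purely for illustration; the column names follow the question, but the values do not come from it.

```r
# Hypothetical toy data with the same confounding as the question's data
toy <- data.frame(
  gender    = rep(c("Male", "Female"), each = 3),
  education = rep(c("Education", "No Education"), each = 3),
  salary    = c(54000, 57000, 75000, 23000, 33000, 30000)
)

# Cross-tabulate the two grouping variables
counts <- table(toy$gender, toy$education)
print(counts)

# Any zero cell is a gender/education combination with no observations,
# so no boxplot can be drawn for that combination
empty_cells <- sum(counts == 0)
empty_cells
```

If empty_cells is greater than zero, the faceted plot will have missing boxes no matter how the axes are arranged.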
I have a data frame with a character column of names, and I would like to randomly generate pairs of names from it. My code gives all combinations, but I want every name to be paired exactly once, in random order; a name cannot be paired with itself.
My code is:
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish", "John", "ruban", "mary", "barath", "leema", "joshi", "indhu", "praveen", "joshua",
"alex", "martin", "stella", "veronica", "henry", "rajesh", "yusuf", "jenita", "johana", "jerald", "jegan", "lincy",
"jona", "rani", "julie", "ross", "chandler", "monica", "penny", "sheldon"),
"Sex" = c("Female", "Male", "Male", "male", "male", "Female", "male", "Female", "Female", "Female", "male",
"male", "male", "male", "Female", "Female", "male", "male", "male", "Female", "Female", "male",
"male", "Female", "Female", "Female", "Female", "male", "male", "Female", "Female", "male"),
"Number" = c(8937998889, 2598279874, 4589987483, 2876876877, 2876876876, 2487698798, 2879879877, 2887987897, 2878798733,
4309808098, 8748098990, 9883798798, 8734787987, 8973498787, 8734887877, 9798374877, 8786487687, 7275687263,
4379879847, 8943787876, 3874879874, 8978973987, 8978347878, 8839478768, 9378887774, 8467676764, 7246276874,
7478798743, 6576787877, 7328776876, 6648678833, 6378787878)
)
print(df)
# Accessing the first column
cat("Accessing the first column\n")
dat <- df[, 1]
print(dat)
t(combn(unique(dat), 2))
TIA
Get the unique elements of the 'Name' column, shuffle them with sample(), and convert the result to a matrix with 2 columns (assuming the number of unique elements is even). Each row of the matrix is then one pair.
matrix(sample(unique(df$Name)), ncol = 2)
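As a quick check that the shuffle-and-reshape idea meets the requirements, the sketch below runs it on a short, hypothetical name list (standing in for unique(df$Name)) and asserts that every name appears once and nobody is paired with themselves. The seed is arbitrary and only there for reproducibility.

```r
set.seed(42)  # arbitrary seed, only to make the shuffle reproducible

# Hypothetical short name list standing in for unique(df$Name)
names_vec <- c("Amiya", "Raj", "Asish", "John", "ruban", "mary")

# Shuffle once, then fill a two-column matrix: row i is pair i
shuffled <- sample(names_vec)
pairs <- matrix(shuffled, ncol = 2)

# Sanity checks: everyone appears exactly once, nobody is paired with themselves
stopifnot(setequal(as.vector(pairs), names_vec))
stopifnot(!anyDuplicated(as.vector(pairs)))
stopifnot(all(pairs[, 1] != pairs[, 2]))

pairs
```

With an odd number of names, matrix() recycles a value and warns, so one name would have to be left out or handled separately.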
I want to know how to filter a DataFrame to exclude specific and discrete dates.
# Input Data
dates = c("2021-03-31", "2021-05-02", "2021-06-30", "2021-10-22")
dates = as.Date(dates)
x = structure(list(Gender = c("Male", "Female", "Male", "Male", "Female",
"Male", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Male", "Male", "Female",
"Female", "Female", "Male", "Female", "Female", "Male", "Female",
"Male", "Female", "Female", "Female", "Male", "Male", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Male",
"Female", "Male", "Female", "Male", "Female", "Female", "Female",
"Male", "Male", "Female", "Female", "Female", "Male", "Male",
"Female", "Female", "Female", "Male", "Female", "Male", "Female",
"Male", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Male", "Female", "Female", "Female", "Female",
"Female", "Male", "Male", "Female", "Male", "Female", "Female",
"Male", "Female", "Female", "Female", "Female", "Female", "Male",
"Female", "Female", "Male", "Female", "Female", "Female", "Female",
"Female"), `Termination Date` = c("2021-01-05", "2021-02-12",
"2021-02-22", "2021-02-24", "2021-03-12", "2021-03-12", "2021-03-24",
"2021-03-26", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31",
"2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31",
"2021-04-02", "2021-04-02", "2021-04-05", "2021-04-09", "2021-04-30",
"2021-05-05", "2021-05-11", "2021-05-11", "2021-05-14", "2021-05-21",
"2021-05-21", "2021-05-24", "2021-06-01", "2021-06-11", "2021-06-11",
"2021-06-14", "2021-06-24", "2021-06-27", "2021-06-27", "2021-06-27",
"2021-06-27", "2021-07-02", "2021-07-07", "2021-07-23", "2021-07-26",
"2021-07-26", "2021-07-27", "2021-07-30", "2021-08-02", "2021-08-06",
"2021-08-06", "2021-08-09", "2021-08-11", "2021-08-13", "2021-08-13",
"2021-08-13", "2021-08-13", "2021-08-16", "2021-08-18", "2021-08-20",
"2021-08-23", "2021-08-24", "2021-08-25", "2021-08-27", "2021-08-27",
"2021-08-30", "2021-08-30", "2021-08-31", "2021-09-01", "2021-09-03",
"2021-09-03", "2021-09-15", "2021-09-16", "2021-09-20", "2021-09-22",
"2021-09-23", "2021-09-23", "2021-09-24", "2021-09-24", "2021-10-01",
"2021-10-04", "2021-10-06", "2021-10-08", "2021-10-08", "2021-10-08",
"2021-10-11", "2021-10-14", "2021-10-19", "2021-10-20", "2021-10-21",
"2021-10-22", "2021-10-22", "2021-10-29", "2021-11-02", "2021-11-03",
"2021-11-08", "2021-11-09", "2021-11-16", "2021-11-16", "2021-11-17"
)), row.names = c(229L, 8247L, 3068L, 7222L, 3746L, 3912L, 8019L,
3610L, 6078L, 6085L, 6271L, 6284L, 6285L, 6310L, 6321L, 6335L,
6336L, 3697L, 9149L, 8217L, 3734L, 220L, 6729L, 5562L, 7729L,
7933L, 5291L, 7232L, 1647L, 7335L, 3418L, 7189L, 2912L, 7790L,
6088L, 6247L, 6281L, 6338L, 7608L, 6614L, 410L, 2746L, 8296L,
3117L, 177L, 2788L, 3301L, 6221L, 5173L, 2092L, 3577L, 6219L,
6973L, 9020L, 1274L, 1768L, 8218L, 1822L, 2499L, 8107L, 1910L,
4756L, 2739L, 7342L, 7857L, 6519L, 2104L, 3666L, 7506L, 2635L,
3402L, 5566L, 2637L, 3036L, 2976L, 3871L, 8376L, 3112L, 4772L,
6449L, 8200L, 8445L, 3310L, 4005L, 3219L, 8241L, 8266L, 2995L,
3273L, 8401L, 3336L, 3118L, 2272L, 3333L, 3370L, 3952L, 7339L
), class = "data.frame")
Normally, I would do the following, but I assume it doesn't work in this case because I am using the Date class. How would I do this using dates?
# Filter df to exclude rows that were entered on a date from the list
x[!(x$`Termination Date` %in% dates), ]
When I run your example data, I see the Termination Date column is interpreted as the character class, not the date class.
Here is a solution that uses the tidyverse:
# Input Data
dates = c("2021-03-31", "2021-05-02", "2021-06-30", "2021-10-22")
x = structure(list(Gender = c("Male", "Female", "Male", "Male", "Female",
"Male", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Male", "Male", "Female",
"Female", "Female", "Male", "Female", "Female", "Male", "Female",
"Male", "Female", "Female", "Female", "Male", "Male", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Male",
"Female", "Male", "Female", "Male", "Female", "Female", "Female",
"Male", "Male", "Female", "Female", "Female", "Male", "Male",
"Female", "Female", "Female", "Male", "Female", "Male", "Female",
"Male", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Male", "Female", "Female", "Female", "Female",
"Female", "Male", "Male", "Female", "Male", "Female", "Female",
"Male", "Female", "Female", "Female", "Female", "Female", "Male",
"Female", "Female", "Male", "Female", "Female", "Female", "Female",
"Female"), `Termination Date` = c("2021-01-05", "2021-02-12",
"2021-02-22", "2021-02-24", "2021-03-12", "2021-03-12", "2021-03-24",
"2021-03-26", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31",
"2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31",
"2021-04-02", "2021-04-02", "2021-04-05", "2021-04-09", "2021-04-30",
"2021-05-05", "2021-05-11", "2021-05-11", "2021-05-14", "2021-05-21",
"2021-05-21", "2021-05-24", "2021-06-01", "2021-06-11", "2021-06-11",
"2021-06-14", "2021-06-24", "2021-06-27", "2021-06-27", "2021-06-27",
"2021-06-27", "2021-07-02", "2021-07-07", "2021-07-23", "2021-07-26",
"2021-07-26", "2021-07-27", "2021-07-30", "2021-08-02", "2021-08-06",
"2021-08-06", "2021-08-09", "2021-08-11", "2021-08-13", "2021-08-13",
"2021-08-13", "2021-08-13", "2021-08-16", "2021-08-18", "2021-08-20",
"2021-08-23", "2021-08-24", "2021-08-25", "2021-08-27", "2021-08-27",
"2021-08-30", "2021-08-30", "2021-08-31", "2021-09-01", "2021-09-03",
"2021-09-03", "2021-09-15", "2021-09-16", "2021-09-20", "2021-09-22",
"2021-09-23", "2021-09-23", "2021-09-24", "2021-09-24", "2021-10-01",
"2021-10-04", "2021-10-06", "2021-10-08", "2021-10-08", "2021-10-08",
"2021-10-11", "2021-10-14", "2021-10-19", "2021-10-20", "2021-10-21",
"2021-10-22", "2021-10-22", "2021-10-29", "2021-11-02", "2021-11-03",
"2021-11-08", "2021-11-09", "2021-11-16", "2021-11-16", "2021-11-17"
)), row.names = c(229L, 8247L, 3068L, 7222L, 3746L, 3912L, 8019L,
3610L, 6078L, 6085L, 6271L, 6284L, 6285L, 6310L, 6321L, 6335L,
6336L, 3697L, 9149L, 8217L, 3734L, 220L, 6729L, 5562L, 7729L,
7933L, 5291L, 7232L, 1647L, 7335L, 3418L, 7189L, 2912L, 7790L,
6088L, 6247L, 6281L, 6338L, 7608L, 6614L, 410L, 2746L, 8296L,
3117L, 177L, 2788L, 3301L, 6221L, 5173L, 2092L, 3577L, 6219L,
6973L, 9020L, 1274L, 1768L, 8218L, 1822L, 2499L, 8107L, 1910L,
4756L, 2739L, 7342L, 7857L, 6519L, 2104L, 3666L, 7506L, 2635L,
3402L, 5566L, 2637L, 3036L, 2976L, 3871L, 8376L, 3112L, 4772L,
6449L, 8200L, 8445L, 3310L, 4005L, 3219L, 8241L, 8266L, 2995L,
3273L, 8401L, 3336L, 3118L, 2272L, 3333L, 3370L, 3952L, 7339L
), class = "data.frame")
library(dplyr)
df_without_undesired_dates <- x %>%
filter(!`Termination Date` %in% dates)
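If the column should be a real Date rather than character, a base R sketch is to convert both sides with as.Date() and then filter with %in% as in the question. The small data frame x_small below is a made-up stand-in for x, not the question's data.

```r
# Hypothetical miniature version of x
# (check.names = FALSE keeps the space in the column name)
x_small <- data.frame(
  Gender = c("Male", "Female", "Female", "Male"),
  `Termination Date` = c("2021-03-31", "2021-04-02", "2021-10-22", "2021-11-02"),
  check.names = FALSE
)
dates <- as.Date(c("2021-03-31", "2021-05-02", "2021-06-30", "2021-10-22"))

# Convert the character column to Date so both sides of %in% share a class
x_small$`Termination Date` <- as.Date(x_small$`Termination Date`)

# Keep only the rows whose date is not in the exclusion list
kept <- x_small[!(x_small$`Termination Date` %in% dates), ]
kept
```

Comparing like with like (Date against Date, or character against character) avoids depending on how %in% coerces mixed classes.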
I have a data.table from which I want to create a stacked chart, grouped using the code below:
causesDf <- causesDf[, c('Type', 'Gender', 'Total')]
causesSort <- causesDf[, lapply(.SD, sum),
by=list(causesDf$Type, causesDf$Gender)]
and the data looks like this:
causesDf causesDf.1 Total
1: Illness (Aids/STD) Female 2892
2: Change in Economic Status Female 4235
3: Cancellation/Non-Settlement of Marriage Female 6126
4: Family Problems Female 133181
5: Illness (Aids/STD) Male 5831
6: Change in Economic Status Male 31175
7: Cancellation/Non-Settlement of Marriage Male 5170
and so on..
I am trying to make barplot like below:
barpos <- barplot(sort(causesSort$Total, decreasing=TRUE),
col=c("red","green"), xlab="", ylab="",
horiz=FALSE, las=2)
legend("topright", c("Male","Female"), fill=c("red","green"))
end_point <- 0.2 + nrow(causesSort) + nrow(causesSort) - 0.1
text(seq(0.1, end_point, by=1), par("usr")[3] - 30,
srt=60, adj= 1, xpd=TRUE,
labels=paste(causesSort$causesDf), cex=0.65)
but the x-axis labels are not aligned properly. Did I miss anything?
Expected output like:
Edited:
causesSort
structure(list(causesDf = c("Illness (Aids/STD)", "Change in Economic Status",
"Cancellation/Non-Settlement of Marriage", "Physical Abuse (Rape/Incest Etc.)",
"Dowry Dispute", "Family Problems", "Ideological Causes/Hero Worshipping",
"Other Prolonged Illness", "Property Dispute", "Fall in Social Reputation",
"Illegitimate Pregnancy", "Failure in Examination", "Insanity/Mental Illness",
"Love Affairs", "Professional/Career Problem", "Divorce", "Drug Abuse/Addiction",
"Not having Children(Barrenness/Impotency", "Causes Not known",
"Unemployment", "Poverty", "Death of Dear Person", "Cancer",
"Suspected/Illicit Relation", "Paralysis", "Property Dispute",
"Unemployment", "Poverty", "Family Problems", "Illness (Aids/STD)",
"Drug Abuse/Addiction", "Other Prolonged Illness", "Death of Dear Person",
"Causes Not known", "Cancer", "Not having Children(Barrenness/Impotency",
"Cancellation/Non-Settlement of Marriage", "Paralysis", "Physical Abuse (Rape/Incest Etc.)",
"Professional/Career Problem", "Love Affairs", "Fall in Social Reputation",
"Dowry Dispute", "Ideological Causes/Hero Worshipping", "Illegitimate Pregnancy",
"Failure in Examination", "Change in Economic Status", "Insanity/Mental Illness",
"Divorce", "Suspected/Illicit Relation", "Not having Children (Barrenness/Impotency",
"Not having Children (Barrenness/Impotency"), causesDf.1 = c("Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Female", "Male"), Total = c(2892,
4235, 6126, 2662, 31206, 133181, 776, 69072, 4601, 4697, 2391,
12054, 33352, 21339, 1596, 2535, 1205, 5523, 148134, 3748, 7905,
4707, 2878, 8093, 2284, 14051, 23617, 24779, 208771, 5831, 28841,
125493, 5614, 304985, 6180, 2299, 5170, 5002, 1330, 10958, 23700,
8767, 764, 1342, 103, 14951, 31175, 60877, 1598, 6818, 544, 222
)), row.names = c(NA, -52L), class = c("data.table", "data.frame"
)
# , .internal.selfref = <pointer: 0x00000000098d1ef0> # seems not to work
)
If you don't rely on 45° rotation (that one is a bit more tricky) you could use this solution.
First we need to reshape the data by sex.
library(reshape2)
df2 <- dcast(causesSort, ... ~ causesDf.1 , value.var="Total")
Then we generate rownames from the type column and delete this column.
rownames(df2) <- df2[, 1]
df2 <- df2[, -1]
Then we order the data by one column, e.g. by Female.
df2 <- df2[order(-df2$Female), ]
The labels are the rownames.
# labs <- rownames(df2)
However, since they are very long (and bad for the reader's eye!), we may have to think of shorter ones. A workaround is to shorten them a little.
labs <- substr(sapply(strsplit(rownames(df2), " "),
function(x) x[1]), 1, 8)
Now we are able to apply barplot().
pos <- barplot(t(df2), beside=TRUE, xaxt="n",
col=c("#3C6688", "#45A778"), border="white")
pos gives us a matrix of bar positions. Because we have a grouped plot, we need the column means, which we can use to place the axis labels.
axis(1, colMeans(pos), labs, las=2)
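If rotated labels are wanted after all, one common base R approach is to suppress the axis and draw the labels with text() at the group midpoints. This is a sketch on made-up counts (the matrix m and its labels are hypothetical), not the answer's data.

```r
# Hypothetical 2 x 3 matrix: rows are genders, columns are causes
m <- matrix(c(5, 3, 8, 6, 2, 4), nrow = 2,
            dimnames = list(c("Female", "Male"),
                            c("Cause A", "Cause B", "Cause C")))

# Grouped bars without the default x-axis; pos holds one bar midpoint per cell
pos <- barplot(m, beside = TRUE, xaxt = "n",
               col = c("#3C6688", "#45A778"), border = "white")

# One label position per group: the mean of the bar midpoints in each column
mids <- colMeans(pos)

# Draw labels just below the plot region, rotated 45 degrees and right-aligned
y_lab <- par("usr")[3] - 0.03 * diff(par("usr")[3:4])
text(x = mids, y = y_lab, labels = colnames(m),
     srt = 45, adj = 1, xpd = TRUE, cex = 0.8)
legend("topright", legend = rownames(m), fill = c("#3C6688", "#45A778"))
```

Using the positions returned by barplot() rather than a hand-built seq() keeps the labels aligned with the bars whatever spacing barplot() chooses.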
Result
Here is a ggplot2 solution. This may provide better control over the final output.
library(dplyr)
library(ggplot2)
# Rename the columns
names(causesDf) <- c('Type', 'Gender', 'Total')
# Order the Gender factor so that Male comes before Female
causesDf$Gender <- factor(causesDf$Gender, levels = c("Male", "Female"), ordered = TRUE)
# Order the types by their overall total, in decreasing order
sorted <- causesDf %>% group_by(Type) %>% summarize(gtotal = sum(Total)) %>% arrange(desc(gtotal))
causesDf$Type <- factor(causesDf$Type, levels = sorted$Type, ordered = TRUE)
# Plot the graph
g <- ggplot(causesDf, aes(x = Type, y = Total, group = Gender, fill = Gender)) +
geom_col(position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_manual(values = alpha(c("blue", "green"), .5))
print(g)
I am quite a newbie to RStudio and I am having problems adding different vertical lines on each of my two facets using facet_wrap
Here is what I thought:
library(ggplot2)
g <- ggplot(dat, aes(x=index, na.rm= TRUE))
d <- g+ geom_density() + facet_wrap(~gender)
vline.data <- data.frame(z = c(2.36,2.48),gender = c("Female","Male"))
d1 <- d + geom_vline(aes(xintercept = z),vline.data)
But it adds the same two lines to each facet - what would you reckon is the problem? I have thought of somehow splitting the facets into two separate data frames, but I have no idea how to go on about it.
P.S The x-axis (the index) goes from 1 to 4.
Thank you in advance.
index <- c(NA, NA, 4, 4, 4, NA, NA, NA, NA, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, NA, 3, NA, NA, 3, 4, 4, 4, 4, 4, 3, 4, 3, 4, 3, NA, 4, 2,
4, 4, 2, 2, NA, 2, 3, 3, 2, 2, NA, NA, 2)
gender <- c("Female", "Female", "Male", "Male", "Male", "Female", "Female",
"Male", "Female", "Male", "Female", "Male", "Female", "Male",
"Male", "Female", "Female", "Female", "Male", "Female", "Female",
"Male", "Male", "Female", "Female", "Female", "Male", "Male",
"Female", "Female", "Male", "Female", "Female", "Female", "Male",
"Male", "Female", "Male", "Male", "Female", "Female", "Male",
"Male", "Female", "Female", "Male", "Male", "Male", "Male", "Male")
Using the code and data above I get this plot
This was asked a long time ago, but here's the answer:
geom_vline() draws a line only in its own facet when the data passed to that layer contains the faceting variable (here gender); if the layer's data lacks that variable, its lines are repeated in every panel. A robust way to get this right is to compute the line positions from dat itself, so the gender values are guaranteed to match the facets. The question doesn't define dat, so first put index and gender into a single data frame called dat. The example below assumes the values in vline.data are the per-gender means of index; because index contains NAs, mean() needs na.rm = TRUE. Note also that na.rm is not an aesthetic, so it belongs in geom_density() rather than in aes().
library(dplyr)
library(ggplot2)
dat <- data.frame(index, gender)
p <- ggplot(dat, aes(x = index)) +
geom_density(na.rm = TRUE) +
facet_wrap(. ~ gender) +
geom_vline(data = . %>% group_by(gender) %>% summarise(vl = mean(index, na.rm = TRUE)),
aes(xintercept = vl))
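An alternative sketch keeps a separate vline.data, as in the question, but builds it with base aggregate() so that it carries a gender column matching the facets. The short index and gender vectors below are made up for illustration, and the plot is only drawn if ggplot2 is installed.

```r
# Hypothetical miniature data in the same shape as the question's vectors
gender <- c("Female", "Male", "Male", "Female", "Female", "Male", "Male", "Female")
index  <- c(NA, 3, 4, 1, 2, NA, 2, 3)
dat <- data.frame(index, gender)

# Per-gender mean of index; the formula method drops NA rows by default
vline.data <- aggregate(index ~ gender, data = dat, FUN = mean)
names(vline.data)[2] <- "z"
vline.data

# Because vline.data has a gender column, each line lands only in its own facet
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x = index)) +
    geom_density(na.rm = TRUE) +
    facet_wrap(~ gender) +
    geom_vline(data = vline.data, aes(xintercept = z))
  print(p)
}
```

The same pattern works with hand-typed positions such as 2.36 and 2.48, provided the gender column in vline.data has the same name and values as the one used in facet_wrap().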