R replace minimum date values per group - r
I have a df with observations on different groups for a year. However, the date of the first observation can differ slightly per group (generally within the first days of the year). I'm planning to show these groups in one line plot and I want them all to start on "2021-01-01".
How can I recode my date variable so that the first occurrence (min(Date)?) per group is set to "2021-01-01"?
Here is a small subset, in which groups X, Y and Z have different starting dates. Thanks!
structure(list(Date = structure(c(18637, 18644, 18651, 18658,
18665, 18672, 18679, 18686, 18693, 18700, 18707, 18714, 18721,
18728, 18735, 18636, 18643, 18651, 18656, 18665, 18672, 18676,
18686, 18693, 18700, 18707, 18714, 18720, 18727, 18735, 18635,
18643, 18649, 18658, 18662, 18670, 18677, 18684, 18692, 18700,
18707, 18713, 18718, 18728, 18735), class = "Date"), Maand = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("jan",
"feb", "mrt", "apr", "mei", "jun", "jul", "aug", "sep", "okt",
"nov", "dec"), class = c("ordered", "factor")), UPV2 = c(339L,
69L, 59L, 48L, 77L, 95L, 54L, 61L, 99L, 95L, 67L, 71L, 54L, 98L,
98L, 8L, 6L, 11L, 7L, 15L, 7L, 5L, 4L, 22L, 13L, 4L, 5L, 14L,
14L, 7L, 6L, 7L, 8L, 13L, 2L, 9L, 9L, 13L, 4L, 9L, 8L, 8L, 4L,
14L, 4L), VAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L), .Label = c("X", "Y", "Z"), class = "factor")), row.names = c(NA,
-45L), groups = structure(list(VAR = structure(1:3, .Label = c("X",
"Y", "Z"), class = "factor"), .rows = structure(list(1:15, 16:30,
31:45), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
This solution with dplyr (and lubridate) will target every occurrence of the minimum Date for each group, and replace it with your common starting date of DEFAULT_DATE. As of my recent revision, it will also update the custom month abbreviation in Maand.
library(dplyr)
library(lubridate)

# ...
# Code to generate your data.frame "df".
# ...

DEFAULT_DATE <- as.Date("2021-01-01")

df <- df %>%
  group_by(VAR) %>%
  mutate(
    # Update the custom month abbreviation for every "min(Date)" in each group.
    Maand = if_else(Date == min(Date),
                    # Pick out the corresponding level of the factor.
                    ordered(levels(Maand)[month(DEFAULT_DATE)], levels = levels(Maand)),
                    Maand),
    # Replace every "min(Date)" in each group.
    Date = if_else(Date == min(Date), DEFAULT_DATE, Date)
  ) %>%
  ungroup()
Keep in mind that most of the complication here arises from your custom abbreviations for month names, as factorized (with ordering) in the Maand column.
Fortunately, my revised solution addresses this challenge. Suppose a new group "A" were added to the mix, and its earliest Date were 2021-03-07; its Maand would then be "mrt", your custom abbreviation for March. My transformation would set that date to DEFAULT_DATE, which in this case is 2021-01-01. The mutate() would also update Maand to the level of the factor that corresponds to the month of DEFAULT_DATE: here the 1st month of the year, so the 1st level, "jan".
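The group "A" scenario above can also be sketched in base R, for readers who want to check the logic without dplyr or lubridate. This is an illustration rather than the method used in the answer: the toy data frame and the names toy, maand_levels and is_first are assumptions made up for the example, and ave() stands in for group_by() plus min().

```r
# Custom Dutch month abbreviations, in calendar order (as in the Maand factor).
maand_levels <- c("jan", "feb", "mrt", "apr", "mei", "jun",
                  "jul", "aug", "sep", "okt", "nov", "dec")

# Toy data: hypothetical group "A" starts on 2021-03-07, group "X" on 2021-01-10.
toy <- data.frame(
  Date = as.Date(c("2021-03-07", "2021-03-14", "2021-01-10", "2021-01-17")),
  VAR  = factor(c("A", "A", "X", "X"))
)
toy$Maand <- factor(maand_levels[as.POSIXlt(toy$Date)$mon + 1],
                    levels = maand_levels, ordered = TRUE)

DEFAULT_DATE <- as.Date("2021-01-01")

# TRUE for every row that holds the earliest date of its group.
date_num <- as.numeric(toy$Date)
is_first <- date_num == ave(date_num, toy$VAR, FUN = min)

# Overwrite the date and the matching month abbreviation on those rows only.
toy$Date[is_first]  <- DEFAULT_DATE
toy$Maand[is_first] <- maand_levels[as.POSIXlt(DEFAULT_DATE)$mon + 1]

print(toy)
```

Only the first row of each group changes; later observations keep their original Date and Maand.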
Related
Need plot labels from separate column in ggplot
I am plotting the number of people against the number of certain incidents per month, and need to plot each month's label beside each point in the plot. The labels are in a separate column (column 'month'), and I need to find the syntax that puts the abbreviated 3-letter month label beside each associated point. I have done this in base plot previously but can't get it done in ggplot.
My script:
library(dplyr)
library(ggplot2)
new_labels <- c("1995-\n2001","2002-\n2011","2012-\n2019")
df %>%
  mutate(period=factor(period,levels = unique(period), labels = new_labels,ordered = T)) %>%
  ggplot(aes(people,inc)) +
  geom_point(cex=3.5) +
  scale_y_continuous(breaks=seq(0,12,by=2),limit=c(0,12),expand=c(0,1)) +
  scale_x_continuous(breaks=seq(0,75000,by=10000),limit=c(0,75000),expand=c(0,0)) +
  theme_bw(base_size=20) +
  facet_grid(class~category) +
  facet_grid(rows=vars(period)) +
  stat_smooth(method="glm", method.args = list(family = "poisson"),col="black") +
  theme(strip.background = element_rect(fill="lightgrey", size=1, color="black")) +
  theme(strip.text.y = element_text(size=19, color="black",angle=0)) +
  labs(x = "Number of people per month", y = "Incidents per month")
My dataframe:
dput(df)
structure(list(period = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"), month = structure(c(5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L), .Label = c("APR", "AUG", "DEC", "FEB", "JAN", "JUL", "JUN", "MAR", "MAY", "NOV", "OCT", "SEP"), class = "factor"), people = c(4068L, 7251L, 14384L, 20513L, 18748L, 17760L, 23433L, 22878L, 12815L, 8101L, 7477L, 5018L, 6830L, 16278L, 30244L, 45747L, 31807L, 41184L, 54124L, 52565L, 24365L, 12759L, 8307L, 6038L, 16711L, 32187L, 45810L, 53932L, 40082L, 58506L, 71259L, 67564L,
33556L, 22818L, 16508L, 15848L), inc = c(2L, 1L, 3L, 5L, 3L, 0L, 2L, 5L, 1L, 1L, 0L, 0L, 0L, 2L, 1L, 5L, 5L, 2L, 7L, 6L, 1L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)), row.names = c(NA, -36L), class = "data.frame")
How can I easily add one colour to each bar and make it descending?
How can I easily add one color to each bar and sort the bars in descending order?
QG4 %>%
  filter(value=="Yes") %>%
  ggplot(aes(y=Freq, x=variable))+
  geom_bar(position = "dodge", stat = "identity")+
  theme_bw()+
  coord_flip()+
  labs(x="Mode", y=NULL, title = "What is your usual (or most frequently used) mode of travel to work/place of study?")
I used dput(QG4) to avoid using a picture of the dataset:
structure(list(variable = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("Bicycle", "Bicycle (Yélo)", "Bus", "Car", "Car (Yélo)", "Carpool", "Motorcycle/scooter", "On foot", "Scooter (trottinette)", "Train"), class = "factor"), value = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"), Freq = c(1634L, 2143L, 1781L, 1532L, 2281L, 2202L, 2267L, 1331L, 2265L, 2172L, 655L, 146L, 508L, 757L, 8L, 87L, 22L, 958L, 24L, 117L)), class = "data.frame", row.names = c(NA, -20L))
Calculating number of observations per group in R
I would like to calculate column D based on the date column A. Column D should represent the number of observations grouped by column B.
Edit: fake data below
data <- structure(list(date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 9L, 10L, 11L, 12L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("1/1/2015", "1/2/2015", "1/3/2015", "1/4/2015", "1/5/2015", "1/6/2015", "5/10/2015", "5/11/2015", "5/6/2015", "5/7/2015", "5/8/2015", "5/9/2015"), class = "factor"), Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Value = c(215630672L, 1650864L, 124017368L, 128073224L, 97393448L, 128832128L, 14533968L, 46202296L, 214383720L, 243346080L, 85127128L, 115676688L, 79694024L, 109398680L, 235562856L, 235473648L, 158246712L, 185424928L), Number.of.Observations.So.Far = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L )), class = "data.frame", row.names = c(NA, -18L))
What function in R will create a column D like so?
We can group by 'Country' and create a sequence column with row_number()
library(dplyr)
df1 %>%
  group_by(Country) %>%
  mutate(NumberOfObs = row_number())
Or with base R
df1$NumberOfObs <- with(df1, ave(seq_along(Country), Country, FUN = seq_along))
Or with table
df1$NumberOfObs <- sequence(table(df1$Country))
Or in data.table
library(data.table)
setDT(df1)[, NumberOfObs := rowid(Country)][]
data
df1 <- read.csv('file.csv')
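One point worth checking before choosing among these variants: the sequence(table(...)) approach appears to rely on the rows already being sorted by group, while the ave() approach does not. The sketch below uses a made-up, deliberately unsorted vector (country is hypothetical, not the asker's data) to compare the two.

```r
# Hypothetical, unsorted group labels: A and B rows are interleaved.
country <- c("A", "B", "A", "B", "B")

# Running count within each group, computed in the original row order.
by_ave <- ave(seq_along(country), country, FUN = seq_along)

# Counts laid out group by group (all of A, then all of B), regardless of
# where those rows actually sit in the data.
by_table <- as.vector(sequence(table(country)))

print(by_ave)
print(by_table)

# The two agree only when the rows are already ordered by group.
sorted_country <- sort(country)
agree_when_sorted <- all(
  ave(seq_along(sorted_country), sorted_country, FUN = seq_along) ==
    as.vector(sequence(table(sorted_country)))
)
```

In the asker's fake data the rows are already grouped by Country, so every variant gives the same column there.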
Why is assign() behaving oddly in a for() loop with dplyr pipes in R?
I need to loop different functions over data frames in my Global Environment and save the output of each "run" of the loop in a new data frame whose name includes the initial name. To this end, I'm using assign() with a for() loop. It works well, except when I use the dplyr pipe %>%. The function itself works, but there is some error with the name assigned to the output data frame. How can I fix this issue with %>%? If it is not possible to fix, can I replace assign() with another function?
This works well:
code1:
for(i in unique(table$V1)){
  assign(paste0(i, "_target"),table[grepl(i,table$V1),])
}
Explanation: selects the unique entries in column 1 of "table" and subsets the rows with each entry into a new data frame per entry.
Output: the new data frame name is "entry name" + "_target".
This doesn't work well (and I would like to know why):
code2:
for(i in mget(ls(pattern = "_target"))){
  assign(paste0(i, "_slim"),data.frame(i %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
}
Explanation: selects all data frames in the Global Env whose names contain "_target". In each data frame, it takes the mean of the values "(C__)" associated with entries that share the same "(Sample.Name)".
Expected output: the new data frame name is "entry name_target" + "_slim".
Real output: the new data frame contains the mean per sample name, but is named "c(random numbers)_slim".
code2 input: STA_target <- structure(list(Well = structure(c(8L, 9L, 10L, 21L, 22L, 23L, 33L, 34L, 35L, 46L, 47L, 48L, 58L, 59L, 60L, 73L, 74L, 75L, 85L, 86L, 87L, 97L, 98L, 99L), .Label = c("", "A1", "A10", "A11", "A12", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "Analysis Type", "B1", "B10", "B11", "B12", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9", "C1", "C10", "C11", "C12", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "Chemistry", "D1", "D10", "D11", "D12", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "E1", "E10", "E11", "E12", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9", "Endogenous Control", "Experiment File Name", "Experiment Run End Time", "F1", "F10", "F11", "F12", "F2", "F3", "F4", "F5", "F6", "F7", "F8", "F9", "G1", "G10", "G11", "G12", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "H1", "H10", "H11", "H12", "H2", "H3", "H4", "H5", "H6", "H7", "H8", "H9", "Instrument Type", "Passive Reference", "Reference Sample", "RQ Min/Max Confidence Level", "Well"), class = "factor"), Sample.Name = c("Control_in", "Control_in", "Control_in", "Sample2_in", "Sample2_in", "Sample2_in", "Sample5_in", "Sample5_in", "Sample5_in", "Sample3_in", "Sample3_in", "Sample3_in", "Control_c", "Control_c", "Control_c", "Sample2_c", "Sample2_c", "Sample2_c", "Sample3_c", "Sample3_c", "Sample3_c", "Sample5_c", "Sample5_c", "Sample5_c"), Target.Name = c("STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA"), Task = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "Task", "UNKNOWN"), class = "factor"), Reporter = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L ), .Label = c("", "Reporter", "SYBR"), class = "factor"), Quencher = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L ), .Label = c("", "None", "Quencher"), class = "factor"), RQ = structure(c(12L, 12L, 12L, 8L, 8L, 8L, 6L, 6L, 6L, 11L, 11L, 11L, 1L, 1L, 1L, 5L, 5L, 5L, 14L, 14L, 14L, 18L, 18L, 18L), .Label = c("", "0.706286132", "0.714652956", "0.724364996", "0.7665869", "0.828774512", "0.838611245", "0.846661508", "0.863589227", "0.896049678", "0.929288268", "1", "1.829339266", "15.57538891", "17.64183807", "27.67574501", "3.064466953", "34.78881073", "41.82569504", "8.117406845", "8.884188652", "RQ"), class = "factor"), RQ.Min = structure(c(9L, 9L, 9L, 7L, 7L, 7L, 8L, 8L, 8L, 10L, 10L, 10L, 1L, 1L, 1L, 2L, 2L, 2L, 21L, 21L, 21L, 17L, 17L, 17L), .Label = c("", "0.032458056", "0.429091513", "0.460811675", "0.541289926", "0.611138761", "0.674698055", "0.71383971", "0.742018044", "0.753834546", "0.772591949", "0.7868222", "0.803419232", "0.820919514", "0.826185584", "0.989573121", "22.58564949", "27.2142868", "4.501103401", "4.745172024", "4.843928814", "4.979007244", "9.076541901", "RQ Min"), class = "factor"), RQ.Max = structure(c(13L, 13L, 13L, 8L, 8L, 8L, 6L, 6L, 6L, 9L, 9L, 9L, 1L, 1L, 1L, 16L, 16L, 16L, 19L, 19L, 19L, 20L, 20L, 20L), .Label = c("", "0.858568788", "0.910271943", "0.943540215", "0.947846115", "0.962214947", "0.971821666", "1.062453985", "1.145578504", "1.162549496", "1.218146205", "1.244680166", "1.347676158", "14.63914394", "15.85231876", "18.10507202", "20.37916756", "3.381742954", "50.08181381", "53.58541107", "64.28199768", "65.58969879", "84.38751984", "RQ Max"), class = "factor"), C_ = c(25.48042297, 25.4738903, 25.83390617, 25.7304306, 25.78297043, 25.41260529, 25.49670792, 25.52298164, 25.6956234, 25.34812355, 25.51462555, 25.15455437, 0, 0, 0, 32.29237366, 37.10370636, 32.22016525, 29.50172043, 30.18544579, 29.91492081, 25.14842796, 24.89806747, 24.99397278), C_.Mean = c(25.59607506, 25.59607506, 25.59607506, 25.64200401, 25.64200401, 25.64200401, 25.57177162, 25.57177162, 25.57177162, 25.33910179, 25.33910179, 25.33910179, NA, NA, NA, 
33.87208176, 33.87208176, 33.87208176, 29.86736107, 29.86736107, 29.86736107, 25.01348877, 25.01348877, 25.01348877), C_.SD = structure(c(21L, 21L, 21L, 20L, 20L, 20L, 12L, 12L, 12L, 19L, 19L, 19L, 1L, 1L, 1L, 31L, 31L, 31L, 23L, 23L, 23L, 14L, 14L, 14L), .Label = c("", "0.039937571", "0.043110434", "0.049541138", "0.05469643", "0.061177365", "0.066671595", "0.07365533", "0.079849631", "0.082057081", "0.095515646", "0.108060829", "0.120047837", "0.126316145", "0.129658803", "0.130481929", "0.142733917", "0.172286868", "0.180205062", "0.200392827", "0.205995336", "0.236968249", "0.344334781", "0.36769405", "0.413046211", "0.445171326", "0.514641941", "0.640576839", "0.895943522", "0.993181109", "2.798901796", "C_ SD"), class = "factor"), `_C_` = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "_C_"), class = "factor"), `_C_.Mean` = structure(c(8L, 8L, 8L, 5L, 5L, 5L, 4L, 4L, 4L, 7L, 7L, 7L, 1L, 1L, 1L, 3L, 3L, 3L, 13L, 13L, 13L, 14L, 14L, 14L), .Label = c("", "_C_ Mean", "-0.577166259", "-0.68969661", "-0.720502198", "-0.776381195", "-0.85484314", "-0.96064502", "-1.058534026", "-2.04822278", "-2.545912504", "-3.293611526", "-4.921841145", "-6.081196308", "0.477069855", "1.373315215", "2.092705965", "2.244637728", "2.251055479", "2.346632004", "2.456220627", "2.557917356", "2.729323149", "2.746313095" ), class = "factor"), `_C_.SE` = structure(c(13L, 13L, 13L, 11L, 11L, 11L, 6L, 6L, 6L, 9L, 9L, 9L, 1L, 1L, 1L, 24L, 24L, 24L, 21L, 21L, 21L, 15L, 15L, 15L), .Label = c("", "_C_ SE", "0.042180877", "0.042606823", "0.048373949", "0.077573851", "0.088320434", "0.102536619", "0.108728357", "0.113733612", "0.117972165", "0.144372106", "0.155044988", "0.223316222", "0.224465802", "0.258952528", "0.300881863", "0.306413502", "0.319273174", "0.579304695", "0.606897891", "0.635279417", "0.682336032", "1.643036604"), class = "factor"), HK.Control._C_.Mean = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "HK Control _C_ Mean" ), class = "factor"), HK.Control._C_.SE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "HK Control _C_ SE" ), class = "factor"), `__C_` = structure(c(12L, 12L, 12L, 16L, 16L, 16L, 18L, 18L, 18L, 13L, 13L, 13L, 1L, 1L, 1L, 19L, 19L, 19L, 7L, 7L, 7L, 10L, 10L, 10L), .Label = c("", "__C_", "-0.871322632", "-1.61563623", "-3.021018982", "-3.15124011", "-3.961196184", "-4.140928745", "-4.790550232", "-5.120551586", "-5.38631773", "0", "0.105801903", "0.15834935", "0.211582825", "0.240142822", "0.253925949", "0.27094841", "0.383478791", "0.465211242", "0.484685272", "0.501675308"), class = "factor"), Automatic.Ct.Threshold = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "Automatic Ct Threshold", "TRUE"), class = "factor"), Ct.Threshold = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "0.056211855", "0.208910329", "0.693888608", "0.704941193", "Ct Threshold" ), class = "factor"), Automatic.Baseline = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "Automatic Baseline", "TRUE"), class = "factor"), Baseline.Start = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "3", "Baseline Start" ), class = "factor"), Baseline.End = structure(c(3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 13L, 14L, 14L, 8L, 12L, 8L, 6L, 7L, 7L, 3L, 3L, 3L), .Label = c("", "21", "22", "23", "25", "26", "27", "29", "30", "31", "32", "34", "35", "39", "Baseline End"), class = "factor"), Efficiency = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "1", 
"Efficiency" ), class = "factor"), Comments = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Comments"), class = "factor"), HIGHSD = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L ), .Label = c("", "HIGHSD", "N", "Y"), class = "factor"), NOAMP = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "N", "NOAMP", "Y"), class = "factor"), OUTLIERRG = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "N", "OUTLIERRG", "Y"), class = "factor"), EXPFAIL = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "EXPFAIL", "N", "Y" ), class = "factor")), .Names = c("Well", "Sample.Name", "Target.Name", "Task", "Reporter", "Quencher", "RQ", "RQ.Min", "RQ.Max", "C_", "C_.Mean", "C_.SD", "_C_", "_C_.Mean", "_C_.SE", "HK.Control._C_.Mean", "HK.Control._C_.SE", "__C_", "Automatic.Ct.Threshold", "Ct.Threshold", "Automatic.Baseline", "Baseline.Start", "Baseline.End", "Efficiency", "Comments", "HIGHSD", "NOAMP", "OUTLIERRG", "EXPFAIL" ), row.names = c(12L, 13L, 14L, 24L, 25L, 26L, 36L, 37L, 38L, 48L, 49L, 50L, 60L, 61L, 62L, 72L, 73L, 74L, 84L, 85L, 86L, 96L, 97L, 98L), class = "data.frame") code2 "output": > dput(`c(8, 9, 10, 21, 22, 23, 33, 34, 35, 46, 47, 48, 58, 59, 60, 73, 74, 75, 85, 86, 87, 97, 98, 99)_slim`) structure(list(Group.1 = c("Sample2_c", "Sample2_in", "Sample3_c", "Sample5_in", "Control_c", "Control_in", "Sample5_c", "Sample3_in" ), x = c(33.8720817566667, 25.6420021066667, 29.8673623433333, 25.5717709866667, 0, 25.5960731466667, 25.0134894033333, 25.3391011566667 )), .Names = c("Group.1", "x"), row.names = c(NA, -8L), class = "data.frame") I don't know if this is really the output because of the given name. 
But the expected output should be something like that with the correct name: STA_slim Thank you for your time
First of all, I strongly suggest you avoid assign() in your R code. It's much better to use one of the many mapping/apply functions in R to build related data in lists. Using get/assign is a sign that you are not doing things in a very R-like way.
Your problem has nothing to do with dplyr really; it's what you are looping over in your loop. When you do
for(i in mget(ls(pattern = "_target"))){
  assign(paste0(i, "_slim"),data.frame(i %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
}
that i isn't the name of the data.frame; because you used mget(), it's the data frame itself. It doesn't make sense to paste that into a new name. To "fix" this, you could do
for(i in ls(pattern = "_target")){
  assign(paste0(i, "_slim"),data.frame(get(i) %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
}
But even then you don't have a column named C__ in your example data set. You have C_ or _C_ or __C_ (what do these names even mean??). So you'd need to fix that.
The better list way would be
slim <- lapply(mget(ls(pattern = "_target$")), function(x) {
  x %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C_))
})
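The difference between looping over names and looping over mget() results can be seen in a small self-contained sketch. Everything here is an assumption made for illustration: the two toy data frames, the env environment that holds them, and the use of base R's aggregate() as a stand-in for group_by() plus summarise() so the example runs without dplyr.

```r
# Hypothetical stand-ins for the *_target data frames, kept in their own
# environment so the example does not touch the global workspace.
env <- new.env()
env$STA_target <- data.frame(Sample.Name = c("s1", "s1", "s2"), C_ = c(1, 3, 10))
env$XYZ_target <- data.frame(Sample.Name = c("s1", "s2", "s2"), C_ = c(4, 6, 8))

target_names <- ls(envir = env, pattern = "_target$")  # character vector of names
targets      <- mget(target_names, envir = env)        # named list of data frames

# Iterating over `targets` hands the loop body a data frame, not a name;
# the names only survive as names(targets).
is_name_vector <- is.character(target_names)
is_frame_list  <- all(vapply(targets, is.data.frame, logical(1)))

# List-based version: one summary per data frame, mean of C_ per Sample.Name.
slim <- lapply(targets, function(x) {
  aggregate(C_ ~ Sample.Name, data = x, FUN = mean)
})

# Derive the new names from the old ones instead of calling assign().
names(slim) <- sub("_target$", "_slim", names(slim))
print(slim)
```

Individual results are then reached as slim$STA_slim or slim[["STA_slim"]], which keeps related outputs together in one object.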
how do you create linear line on geom_bar in ggplot2
I need to create a stacked ggplot bar plot with a linear line drawn, given this data set:
dput(t)
structure(list(Date = structure(c(16436, 16436, 16436, 16467, 16467, 16467, 16467, 16467, 16679, 16679, 16679, 16679, 16679 ), class = "Date"), Applicatio = structure(c(4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L), .Label = c("DB", "Opt", "Tom", "Web"), class = "factor"), Code = structure(c(1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 3L, 1L, 2L, 4L, 3L), .Label = c("ch", "db", "tt", "zz"), class = "factor"), m = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("2015-01", "2015-02", "2015-09"), class = "factor"), count = c(1L, 3L, 1L, 4L, 1L, 7L, 1L, 9L, 1L, 6L, 4L, 7L, 9L), Total = c(1L, 12L, 1L, 2L, 1L, 20L, 1L, 7L, 7L, 9L, 50L, 3L, 6L)), .Names = c("Date", "Applicatio", "Code", "m", "count", "Total"), row.names = c(NA, -13L), class = "data.frame")
I am trying this:
ggplot(subset(t, Date> as.Date(c("2015-01-01", format="%Y-%m-%d"))), aes(m,fill=Code))+
  geom_bar()+
  geom_smooth(aes(m,Total),method="lm", se=FALSE)+
  guides(colour=FALSE)
I am not entirely sure what you are trying to achieve, but it looks like you want this:
ggplot(subset(t, Date > as.Date("2015-01-01", format="%Y-%m-%d")), aes(m,fill=Code))+
  geom_bar()+
  geom_smooth(aes(m,Total,group=1),method="lm", se=FALSE)+
  guides(colour=FALSE)
Basically, you had a c() call inside subset() that was not needed, and you needed to use group=1 inside geom_smooth(), as mentioned by the warning. So yes, you can have a linear line on geom_bar.
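Why the stray c() matters can be checked in base R without any plotting. In the sketch below, wrapping the arguments in c() turns the format string into a second element of the input vector instead of an argument to as.Date(), so the cutoff becomes a length-2 vector whose second element is NA. The dates vector is made up for illustration; the names bad, good, keep_bad and keep_good are hypothetical.

```r
# Original call from the question: "format" ends up inside the data vector.
bad  <- as.Date(c("2015-01-01", format = "%Y-%m-%d"))
# Corrected call from the answer: "format" is a real argument of as.Date().
good <- as.Date("2015-01-01", format = "%Y-%m-%d")

print(length(bad))   # two elements, the second one is NA
print(length(good))  # a single cutoff date

# Hypothetical dates, all after the cutoff.
dates <- as.Date(c("2015-01-15", "2015-02-15", "2015-09-01", "2015-09-01"))

keep_good <- dates > good            # every row passes
keep_bad  <- unname(dates > bad)     # recycled against c(date, NA)

print(keep_good)
print(keep_bad)
```

Because subset() drops rows where the condition is NA, the original call would silently discard part of the data.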