How to add missing months to a column of a data frame?

I have a data frame with a "month" column holding the month number, but the months that have no data are missing entirely.
I would like every month of the year to appear, with 0 for the months that have no data.
Here is an example of my data frame:
(It has a "yearmonth" column (2015 jui - 2022 mai), a "month" column, a "year" column and an "n" column.)
Can you help me? Thanks in advance!

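The following uses complete() from tidyr (loaded with the tidyverse) to add the missing year/month combinations with n = 0. The month names in yearmonth are produced by format(), so they appear in French only if your locale is set to French (using Sys.setlocale()); the output below was produced under an English locale. The sample data stops in 2019, so the year range here is 2015:2019; widen it for the full data.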
# copy of the data -----------------------
df <- structure(list(yearmonth = c("2015 jui", "2015 aoû", "2015 sep",
"2015 nov", "2015 déc", "2016 fév", "2016 mar", "2016 avr",
"2016 jul", "2016 aoû", "2016 oct", "2016 nov", "2017 fév",
"2017 jui", "2017 jul", "2017 aoû", "2017 nov", "2017 déc",
"2018 avr", "2018 mai", "2018 aoû", "2018 sep", "2019 jan",
"2019 fév", "2019 mar", "2019 jul", "2019 aoû", "2019 sep"),
month = c(6, 8, 9, 11, 12, 2, 3, 4, 7, 8, 10, 11, 2, 6, 7,
8, 11, 12, 4, 5, 8, 9, 1, 2, 3, 7, 8, 9), year = c(2015,
2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018,
2018, 2019, 2019, 2019, 2019, 2019, 2019), n = c(1, 2, 1,
1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1,
1, 3, 3, 1, 1, 1)), row.names = c(NA, -28L), class = "data.frame")
# answer ----------------------------------
library(tidyverse)
df %>% complete(year = 2015:2019, month = 1:12, fill = list(n = 0)) %>%
mutate(date = as.Date(paste(year, month, "01", sep = "-")),
yearmonth = format(date, format = "%Y %b")) %>%
filter(between(date, as.Date("2015-06-01"), as.Date("2022-05-01"))) %>%
select(-date)
#> # A tibble: 55 × 4
#>     year month yearmonth     n
#>    <dbl> <dbl> <chr>     <dbl>
#>  1  2015     6 2015 Jun      1
#>  2  2015     7 2015 Jul      0
#>  3  2015     8 2015 Aug      2
#>  4  2015     9 2015 Sep      1
#>  5  2015    10 2015 Oct      0
#>  6  2015    11 2015 Nov      1
#>  7  2015    12 2015 Dec      1
#>  8  2016     1 2016 Jan      0
#>  9  2016     2 2016 Feb      1
#> 10  2016     3 2016 Mar      1
#> # … with 45 more rows
Created on 2022-06-19 by the reprex package (v2.0.1)
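For readers who prefer to avoid the tidyverse, the same idea can be sketched in base R: build a grid of every month in the range, left-join the observed counts onto it with merge(), and replace the resulting NA values with 0. The three-row df below is made-up toy data, not the asker's data frame, and the date range is an assumption chosen for the example.

```r
# Toy input (hypothetical): only three months have observations
df <- data.frame(year = c(2015, 2015, 2015),
                 month = c(6, 8, 11),
                 n = c(1, 2, 1))

# Every month between the first and last month we want to cover
all_months <- seq(as.Date("2015-06-01"), as.Date("2015-12-01"), by = "month")
grid <- data.frame(year = as.numeric(format(all_months, "%Y")),
                   month = as.numeric(format(all_months, "%m")))

# Left join: keep every row of the grid, NA where df has no match
full <- merge(grid, df, by = c("year", "month"), all.x = TRUE)
full$n[is.na(full$n)] <- 0

# Put the rows in calendar order
full <- full[order(full$year, full$month), ]
full
```

Because the grid is built from numeric year and month values, this sketch does not depend on the locale; only a formatted label such as yearmonth would.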

Related

How to create before and after scores in two different columns based on date?

I have two tables: the first has stress scores recorded at various time points, and the second has the date of treatment. I want the stress scores before and after treatment for each participant who received the treatment. I also want columns that say how long before or after treatment each stress score was recorded. I do not know where to begin or what my code should look like.
score.dt = data.table(
participant.index = c(1, 1, 1, 3, 4, 4, 13, 21, 21, 25, 37, 40, 41, 41, 41, 43, 43, 43, 44),
repeat.instance = c(2, 3, 6, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 3, 1),
date.recorded = c(
'2017-07-13',
'2017-06-26',
'2018-09-17',
'2016-04-14',
'2014-03-24',
'2016-05-30',
'2018-06-20',
'2014-08-03',
'2015-07-06',
'2014-12-17',
'2014-09-05',
'2013-06-10',
'2015-10-04',
'2016-11-04',
'2016-04-18',
'2014-02-13',
'2013-05-24',
'2014-09-10',
'2014-11-25'
),
subscale = c(
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress"
),
score = c(18, 10, 18, 36, 16, 30, 28, 10, 12, 40, 16, 12, 10, 14, 6, 32, 42, 26, 18)
)
date.treatment.dt = data.table (
participant.index = c(1, 4, 5, 6, 8, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
date.treatment = c(
'2018 - 06 - 27',
'2001 - 07 - 16',
'2009 - 12 - 09',
'2009 - 05 - 20',
'2009 - 07 - 22',
'2008-07 - 02',
'2009 - 11 - 25',
'2009 - 09 - 16',
'1991 - 07 - 30',
'2016 - 05 - 25',
'2012 - 07 - 25',
'2007 - 03 - 19',
'2012 - 01 - 25',
'2011 - 09 - 21',
'2000 - 03 - 06',
'2001 - 09 - 25',
'1999 - 12 - 20',
'1997 -07 - 28',
'2002 - 03 - 12',
'2008 - 01 - 23'
))
The desired output columns are something like this:
score.date.dt = c("candidate.index.x", "repeat.instance", "subscale", "score", "date.treatment", "date.recorded", "score.before.treatment", "score.after.treatment", "months.before.treatment", "months.after.treatment")
Here the column months.before.treatment indicates how many months before treatment the stress score was measured, and months.after.treatment indicates how many months after treatment it was measured.

In your example set, only four individuals with stress scores have any rows in the treatment table (participants 1, 4, 21, and 25). Only one of these, participant 1, has both a pre-treatment and a post-treatment stress measure.
Here is one way to produce the information you need:
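library(dplyr)    # inner_join, group_by, summarize, mutate
library(tidyr)    # pivot_longer, pivot_wider
library(stringr)  # str_extract
inner_join(score.dt,date.treatment.dt, by="participant.index") %>%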
group_by(participant.index, date.treatment) %>%
summarize(pre_treatment = min(date.recorded[date.recorded<=date.treatment]),
post_treatment = max(date.recorded[date.recorded>=date.treatment])) %>%
pivot_longer(cols = -(participant.index:date.treatment), names_to = "period", values_to = "date.recorded") %>%
left_join(score.dt, by=c("participant.index", "date.recorded" )) %>%
mutate(period=str_extract(period,".*(?=_)"),
months = abs(as.numeric(date.treatment-date.recorded))/(365.25/12)) %>%
pivot_wider(id_cols = participant.index:date.treatment, names_from = period, values_from=c(date.recorded, subscale, months,score))
Output:
  participant.index date.treatment date.recorded_pre date.recorded_post subscale_pre subscale_post months_pre months_post score_pre score_post
              <dbl> <date>         <date>            <date>             <chr>        <chr>              <dbl>       <dbl>     <dbl>      <dbl>
1                 1 2018-06-27     2017-06-26        2018-09-17         stress       stress              12.0        2.69        10         18
2                 4 2001-07-16     NA                2016-05-30         NA           stress               Inf      178.          NA         30
3                21 2000-03-06     NA                2015-07-06         NA           stress               Inf      184.          NA         12
4                25 2002-03-12     NA                2014-12-17         NA           stress               Inf      153.          NA         40
Note: before running the pipeline above, you will have to fix the date columns in the two source tables, like this:
# first correct, your date.treatment column, and convert to date
date.treatment.dt[, date.treatment := as.Date(str_replace_all(date.treatment," ",""), "%Y-%m-%d")]
# second, similarly fix the date column in your stress score table
score.dt[,date.recorded := as.Date(date.recorded,"%Y-%m-%d")]
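The date clean-up and the month arithmetic used in this answer can be checked on a few values in base R. This is only a sketch: the three strings are copied from the question's date.treatment column, gsub() stands in for str_replace_all(), and the month length of 365.25/12 days is the same approximation the answer uses.

```r
# Three raw treatment dates from the question, with inconsistent spacing
raw <- c("2018 - 06 - 27", "2008-07 - 02", "1997 -07 - 28")

# Strip every space, then parse as a Date
clean <- as.Date(gsub(" ", "", raw, fixed = TRUE), format = "%Y-%m-%d")
clean

# Approximate months between treatment and a later recording for participant 1
treatment <- as.Date("2018-06-27")
recorded <- as.Date("2018-09-17")
months_after <- as.numeric(recorded - treatment) / (365.25 / 12)
round(months_after, 2)
```

The last value corresponds to the months_post entry for participant 1 in the output above.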

It seems like there are a few parts to what you're asking. First, you need to merge the two tables. Here I use dplyr::inner_join(), which automatically detects that participant.index is the only column in common, merges on it, and discards records found in only one of the tables. Second, both date columns are converted to a date format so that the elapsed months can be calculated.
library(tidyverse)
library(data.table)
library(lubridate)
score.dt <- structure(list(participant.index = c(1, 1, 1, 3, 4, 4, 13, 21, 21, 25, 37, 40, 41, 41, 41, 43, 43, 43, 44), repeat.instance = c(2, 3, 6, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 3, 1), date.recorded = c("2017-07-13", "2017-06-26", "2018-09-17", "2016-04-14", "2014-03-24", "2016-05-30", "2018-06-20", "2014-08-03", "2015-07-06", "2014-12-17", "2014-09-05", "2013-06-10", "2015-10-04", "2016-11-04", "2016-04-18", "2014-02-13", "2013-05-24", "2014-09-10", "2014-11-25"), subscale = c("stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress"), score = c(18, 10, 18, 36, 16, 30, 28, 10, 12, 40, 16, 12, 10, 14, 6, 32, 42, 26, 18)), row.names = c(NA, -19L), class = c("data.table", "data.frame"))
date.treatment.dt <- structure(list(participant.index = c(1, 4, 5, 6, 8, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), date.treatment = c("2018 - 06 - 27", "2001 - 07 - 16", "2009 - 12 - 09", "2009 - 05 - 20", "2009 - 07 - 22", "2008-07 - 02", "2009 - 11 - 25", "2009 - 09 - 16", "1991 - 07 - 30", "2016 - 05 - 25", "2012 - 07 - 25", "2007 - 03 - 19", "2012 - 01 - 25", "2011 - 09 - 21", "2000 - 03 - 06", "2001 - 09 - 25", "1999 - 12 - 20", "1997 -07 - 28", "2002 - 03 - 12", "2008 - 01 - 23")), row.names = c(NA, -20L), class = c("data.table", "data.frame"))
inner_join(date.treatment.dt, score.dt) %>%
mutate(across(contains("date"), as_date)) %>%
mutate(months.after = interval(date.treatment, date.recorded) %/% months(1)) %>%
mutate(months.before = 0 - months.after)
#> Joining, by = "participant.index"
#> participant.index date.treatment repeat.instance date.recorded subscale
#> 1: 1 2018-06-27 2 2017-07-13 stress
#> 2: 1 2018-06-27 3 2017-06-26 stress
#> 3: 1 2018-06-27 6 2018-09-17 stress
#> 4: 4 2001-07-16 1 2014-03-24 stress
#> 5: 4 2001-07-16 2 2016-05-30 stress
#> 6: 21 2000-03-06 1 2014-08-03 stress
#> 7: 21 2000-03-06 2 2015-07-06 stress
#> 8: 25 2002-03-12 1 2014-12-17 stress
#> score months.after months.before
#> 1: 18 -11 11
#> 2: 10 -12 12
#> 3: 18 2 -2
#> 4: 16 152 -152
#> 5: 30 178 -178
#> 6: 10 172 -172
#> 7: 12 184 -184
#> 8: 40 153 -153
Created on 2022-04-05 by the reprex package (v2.0.1)

How to build dummy panel in R? [duplicate]

How do I generate a dummy variable that is zero before the year given in Year and takes the value 1 from that year onwards, up to 2019?
Original data:
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), Year = c(2017,
2015, 2018, 2018, 2018, 2018, 2018, 2018)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -8L))
What I need:
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), Year = c(2017,
2015, 2018, 2018, 2018, 2018, 2018, 2018), `2015` = c(NA, 1,
NA, NA, NA, NA, NA, NA), `2016` = c(NA, 1, NA, NA, NA, NA, NA,
NA), `2017` = c(1, 1, NA, NA, NA, NA, NA, NA), `2018` = c(1,
1, 1, 1, 1, 1, 1, 1), `2019` = c(1, 1, 1, 1, 1, 1, 1, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -8L))

Split Year by id, extend each start year i to the range i:2019, then reshape from long to wide (df2 is the data frame from the question):
res <- reshape(stack(sapply(split(df2$Year, df2$id), function(i) i:2019)),
timevar = "values", v.names = "values", idvar = "ind",
direction = "wide")
# fix the column names order
res <- res[ sort(colnames(res)) ]
res
# ind values.2015 values.2016 values.2017 values.2018 values.2019
# 1 1 NA NA 2017 2018 2019
# 4 2 2015 2016 2017 2018 2019
# 9 3 NA NA NA 2018 2019
# 11 4 NA NA NA 2018 2019
# 13 5 NA NA NA 2018 2019
# 15 6 NA NA NA 2018 2019
# 17 7 NA NA NA 2018 2019
# 19 8 NA NA NA 2018 2019
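The result above holds the year values rather than 1s. If a 1/NA indicator with plain year column names is wanted, as in the question's target, one possible base R sketch is to compare each start year against every panel year with outer(). The name df2 and the 2015:2019 range are taken from the question; the rest is an illustration, not the answerer's method.

```r
# Data from the question
df2 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8),
                  Year = c(2017, 2015, 2018, 2018, 2018, 2018, 2018, 2018))

years <- 2015:2019

# One row per id, one column per panel year:
# 1 if the panel year is at or after the start year, NA otherwise
dummy <- outer(df2$Year, years,
               function(start, yr) ifelse(yr >= start, 1, NA))
colnames(dummy) <- as.character(years)

panel <- cbind(df2, dummy)
panel
```

Replace NA with 0 in the ifelse() call if zeros are preferred before the start year.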

Merging of two datasets of different lengths

I'm trying to merge two datasets I have.
df1:
day  month  year  lon   lat   month-year
3    5      2009  5.7   53.9  May 2009
8    9      2004  6.9   52.6  Sep 2004
15   9      2004  3.8   50.4  Sep 2004
5    5      2009  2.7   51.2  May 2009
28   7      2005  14.8  62.4  Jul 2005
18   9      2004  5.1   52.5  Sep 2004
df2:
nao-value  sign      month-year
-2.1       Negative  Sep 2004
1.3        Positive  Jul 2005
-1.1       Negative  May 2009
I want to merge these to add the NAO value for each month and year to the occurrence data, meaning the NAO value for a given month should be repeated for all registrations of that month in the occurrence data.
The problem is that I cannot get the NAO values to line up with the occurrence data: either the result has separate month-year.x and month-year.y columns that are not aligned with each other, or the NAO values come back as NA.
I have tried a few different approaches:
df3 <- merge(df1, df2, by="month-year")
df3 <- merge(cbind(df1, X=rownames(df1)), cbind(df2, variable=rownames(df2)))
df3 <- merge(df1,df2, by ="month-year", all.x = TRUE,all.y=TRUE, sort = FALSE)
df3 <- merge(df1, df2, by=intersect(df1$month-year(df1), df2$month-year(df2)))
But none of those give the result I want.
Edit to include dput:
dput(head(df1, 10)) :
structure(list(Day = c(29, 2, 14, 31, 16, 7, 25, 12, 21, 22),
Month = c(7, 7, 7, 8, 8, 7, 8, 6, 6, 9), Year = c(2010, 2015,
2010, 2018, 2016, 2018, 2019, 2004, 2015, 2019), Lon = c(-6.155014,
-5.820868, -5.509842, -5.495277, -5.469389, -5.469389, -5.469389,
-5.466995, -5.461942, -5.457127), Lat = c(59.09478, 59.125228,
57.959196, 57.96022, 57.986825, 57.986825, 57.986825, 57.874527,
57.95972, 58.07697), Date = c("Jul 2010", "Jul 2015", "Jul 2010",
"Aug 2018", "Aug 2016", "Jul 2018", "Aug 2019", "Jun 2004",
"Jun 2015", "Sep 2019")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
dput(head(df2, 10)) :
structure(list(NAO = c(1.04, 1.41, 1.46, 2, -1.53, -0.02, 0.53,
0.97, 1.06, 0.23), Sign = c("Positive", "Positive", "Positive",
"Positive", "Negative", "Negative", "Positive", "Positive",
"Positive", "Positive"), Date = c("jan 1990", "feb 1990", "mar 1990",
"apr 1990", "mai 1990", "jun 1990", "jul 1990", "aug 1990", "sep 1990",
"okt 1990")), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))

merge() is case sensitive, and the Date values in your two data frames differ in case ("Jul 2010" in df1, "jul 1990" in df2). Make the case the same in both data frames and then perform the merge. Try:
result <- merge(transform(df1, Date = tolower(Date)), df2, by = 'Date')

Using tidyverse
library(dplyr)
df1 %>%
mutate(Date = tolower(Date)) %>%
inner_join(df2, by = 'Date')
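Lower-casing handles the case mismatch, but the df2 sample also contains non-English abbreviations ("mai", "okt") that would still not match "may" or "oct". One possible way around this, sketched below in base R, is to join on numeric Month and Year instead of on the text label. The toy data frames and the nor_abb lookup vector are assumptions made for this example (the abbreviations are guessed to be Norwegian from the sample); only the column names Month, Year, Date and NAO come from the question.

```r
# Toy occurrence data (hypothetical values), with numeric Month and Year
df1 <- data.frame(Day = c(3, 8, 15, 28),
                  Month = c(5, 9, 9, 10),
                  Year = c(2009, 2004, 2004, 2010),
                  stringsAsFactors = FALSE)

# Toy NAO table (hypothetical values), month-year stored as text
df2 <- data.frame(NAO = c(-1.1, -2.1, 0.5),
                  Date = c("mai 2009", "sep 2004", "okt 2010"),
                  stringsAsFactors = FALSE)

# Assumed lookup: month abbreviations in calendar order
nor_abb <- c("jan", "feb", "mar", "apr", "mai", "jun",
             "jul", "aug", "sep", "okt", "nov", "des")

# Split "mai 2009" into its month and year parts and make both numeric
parts <- strsplit(df2$Date, " ", fixed = TRUE)
df2$Month <- as.numeric(match(sapply(parts, `[`, 1), nor_abb))
df2$Year <- as.numeric(sapply(parts, `[`, 2))

# Left join: every occurrence row is kept and gets its month's NAO value
merged <- merge(df1, df2[, c("Month", "Year", "NAO")],
                by = c("Month", "Year"), all.x = TRUE)
merged
```

Joining on numbers also sidesteps any locale differences between the two files.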

I'd like to show data values on stacked bar chart in ggplot2

I have successfully created a stacked bar plot, but I cannot add labels indicating the percentages. That is all I am missing.
I do not know how to use geom_label/geom_text correctly. I have tried many solutions, but nothing has worked for me.
When I try geom_text, I keep getting errors.
year Month2 Month Day HE Supply MUnit MPrice MBlock Fuel
2017 1 Jan 01 8 9408 SD2 15.38 126 COAL
2017 1 Jan 01 9 9388 SD3 15.46 218 COAL
2017 1 Jan 01 10 9393 SD3 15.46 218 COAL
2017 1 Jan 01 11 9628 SD4 15.47 203 COAL
2017 1 Jan 01 12 9943 EGC1 21.40 72 GAS
2017 1 Jan 01 13 10106 BR5 21.41 245 COAL
2017 1 Jan 01 14 10114 BR5 21.41 245 COAL
2017 1 Jan 01 15 9971 EGC1 20.75 75 GAS
2017 1 Jan 01 16 10302 BR5 21.41 245 COAL
2017 1 Jan 01 17 10655 TC01 22.77 11 GAS
2017 1 Jan 01 18 10811 CAL1 24.88 25 GAS
2017 1 Jan 01 19 10821 CAL1 24.88 25 GAS
2017 1 Jan 01 20 10765 BIG 26.00 30 HYDRO
2017 1 Jan 02 8 10428 CAL1 22.04 30 GAS
2017 1 Jan 02 9 10723 CAL1 29.97 59 GAS
2017 1 Jan 02 10 10933 BRA 44.50 30 HYDRO
2017 1 Jan 02 11 11107 ANC1 46.46 63 GAS
2017 1 Jan 02 12 11098 ANC1 46.46 38 GAS
2017 1 Jan 02 13 10839 JOF1 26.59 45 GAS
2017 1 Jan 02 14 10814 JOF1 26.09 15 GAS
2017 1 Jan 02 15 10797 BIG 26.00 30 HYDRO
sp <- ggplot(data = MU17) +
geom_bar(mapping = aes(x = factor(Month,levels=month.abb),
fill = factor(Fuel, levels=c("COAL", "GAS","HYDRO","BIOMASS"))),
position = "Fill") +
scale_y_continuous(labels = scales::percent)
sp + scale_fill_manual(breaks=c("COAL", "GAS","HYDRO","BIOMASS"),
values=c("black","yellow","blue","green")) +
labs(x = "2017" , y="Marginal Fuel Between HE8 & HE20") +
labs(fill="Fuel Type")
I am hoping to get the exact same plot that I get, just with labels indicating percentages.

I personally prefer using geom_col over geom_bar and processing the data myself rather than letting ggplot2 do it. This way you have more control over what is going on.
Since you have not provided all of your data, I use the snippet you provided, extended with a second month and a few changed Fuel values.
library(tibble)
MU17 <- tribble(~year, ~Month2, ~Month, ~Day, ~HE, ~Supply, ~MUnit, ~MPrice, ~MBlock, ~Fuel,
2017, 1, "Jan", 01, 8, 9408, "SD2", 15.38, 126, "COAL",
2017, 1, "Jan", 01, 9, 9388, "SD3", 15.46, 218, "COAL",
2017, 1, "Jan", 01, 10, 9393, "SD3", 15.46, 218, "COAL",
2017, 1, "Jan", 01, 11, 9628, "SD4", 15.47, 203, "COAL",
2017, 1, "Jan", 01, 12, 9943, "EGC1", 21.40, 72, "GAS",
2017, 1, "Jan", 01, 13, 10106, "BR5", 21.41, 245, "COAL",
2017, 1, "Jan", 01, 14, 10114, "BR5", 21.41, 245, "COAL",
2017, 1, "Jan", 01, 15, 9971, "EGC1", 20.75, 75, "GAS",
2017, 1, "Jan", 01, 16, 10302, "BR5", 21.41, 245, "COAL",
2017, 1, "Jan", 01, 17, 10655, "TC01", 22.77, 11, "GAS",
2017, 1, "Jan", 01, 18, 10811, "CAL1", 24.88, 25, "GAS",
2017, 1, "Jan", 01, 19, 10821, "CAL1", 24.88, 25, "GAS",
2017, 1, "Jan", 01, 20, 10765, "BIG", 26.00, 30, "HYDRO",
2017, 1, "Jan", 02, 8, 10428, "CAL1", 22.04, 30, "GAS",
2017, 1, "Jan", 02, 9, 10723, "CAL1", 29.97, 59, "GAS",
2017, 1, "Jan", 02, 10, 10933, "BRA", 44.50, 30, "HYDRO",
2017, 1, "Jan", 02, 11, 11107, "ANC1", 46.46, 63, "GAS",
2017, 1, "Jan", 02, 12, 11098, "ANC1", 46.46, 38, "GAS",
2017, 1, "Jan", 02, 13, 10839, "JOF1", 26.59, 45, "GAS",
2017, 1, "Jan", 02, 14, 10814, "JOF1", 26.09, 15, "HYDRO",
2017, 1, "Jan", 02, 15, 10797, "BIG", 26.00, 30, "BIOMASS",
2017, 2, "Feb", 01, 8, 9408, "SD2", 15.38, 126, "COAL",
2017, 2, "Feb", 01, 9, 9388, "SD3", 15.46, 218, "COAL",
2017, 2, "Feb", 01, 10, 9393, "SD3", 15.46, 218, "COAL",
2017, 2, "Feb", 01, 11, 9628, "SD4", 15.47, 203, "COAL",
2017, 2, "Feb", 01, 12, 9943, "EGC1", 21.40, 72, "GAS",
2017, 2, "Feb", 01, 13, 10106, "BR5", 21.41, 245, "COAL",
2017, 2, "Feb", 01, 14, 10114, "BR5", 21.41, 245, "COAL",
2017, 2, "Feb", 01, 15, 9971, "EGC1", 20.75, 75, "GAS",
2017, 2, "Feb", 01, 16, 10302, "BR5", 21.41, 245, "COAL",
2017, 2, "Feb", 01, 17, 10655, "TC01", 22.77, 11, "GAS",
2017, 2, "Feb", 01, 18, 10811, "CAL1", 24.88, 25, "GAS",
2017, 2, "Feb", 01, 19, 10821, "CAL1", 24.88, 25, "GAS",
2017, 2, "Feb", 01, 20, 10765, "BIG", 26.00, 30, "HYDRO",
2017, 2, "Feb", 02, 8, 10428, "CAL1", 22.04, 30, "GAS",
2017, 2, "Feb", 02, 9, 10723, "CAL1", 29.97, 59, "GAS",
2017, 2, "Feb", 02, 10, 10933, "BRA", 44.50, 30, "HYDRO",
2017, 2, "Feb", 02, 11, 11107, "ANC1", 46.46, 63, "GAS",
2017, 2, "Feb", 02, 12, 11098, "ANC1", 46.46, 38, "GAS",
2017, 2, "Feb", 02, 13, 10839, "JOF1", 26.59, 45, "GAS",
2017, 2, "Feb", 02, 14, 10814, "JOF1", 26.09, 15, "HYDRO",
2017, 2, "Feb", 02, 15, 10797, "BIG", 26.00, 30, "BIOMASS"
)
In the processing step I calculate:
the number of occurrences/observations (n)
their relative frequency per month (p)
a percent label of p (p2)
the y-position of each label in the bar chart (pos)
I pipe this data into ggplot. It is important to use geom_col with position = "fill". Since I provide a position value pos for geom_text, it needs position = "identity". Finally, an ifelse() statement sets the colour of the geom_text labels to white (#FFFFFF) on the darker COAL and HYDRO segments.
Good luck using this approach on your original data.
library(ggplot2)
library(dplyr)
MU17 %>%
mutate(Fuel = factor(Fuel),
Month = factor(Month,levels = month.abb)) %>%
group_by(Month, Month2, Fuel) %>%
summarise(n = n()) %>%
group_by(Month) %>%
mutate(p = n / sum(n),
p2 = paste(formatC(p*100, digits = 2, format = "fg"),"%",sep = ""),
pos = cumsum(p) - (0.5 * p)) %>%
ggplot(aes(x = Month, y = p, fill = factor(Fuel, levels = rev(levels(Fuel))))) +
geom_col(width = 0.5, position = "fill") +
scale_y_continuous(limits = c(0, 1), breaks = c(-.5,-.25,0,.25,.5,.75,1), expand = c(0, 0),
labels = scales::percent) +
scale_fill_manual(breaks = c("COAL", "GAS","HYDRO","BIOMASS"),
values = c("black","yellow","blue","green")) +
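geom_text(aes(label = p2, y = pos,
# white text on the dark COAL and HYDRO segments, black elsewhere
colour = ifelse(Fuel %in% c("COAL", "HYDRO"), "#FFFFFF", "#000000")),
position = "identity",
vjust = 0.5) +
# use the hex codes computed above as literal colours
scale_colour_identity() +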
labs(x = "2017" , y = "Marginal Fuel Between HE8 & HE20") +
labs(fill = "Fuel Type")
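The label-position arithmetic (p, pos and the percent label) can be checked on its own, without ggplot2. The sketch below uses base R and a small made-up table of counts; the column names match the answer, but the numbers are purely illustrative.

```r
# Hypothetical counts per month and fuel, listed bottom-to-top within each bar
counts <- data.frame(Month = c("Jan", "Jan", "Jan", "Feb", "Feb"),
                     Fuel = c("COAL", "GAS", "HYDRO", "COAL", "GAS"),
                     n = c(7, 10, 4, 6, 6),
                     stringsAsFactors = FALSE)

# Share of each fuel within its month
counts$p <- counts$n / ave(counts$n, counts$Month, FUN = sum)

# Middle of each segment: cumulative share minus half of the segment's own share
counts$pos <- ave(counts$p, counts$Month, FUN = cumsum) - 0.5 * counts$p

# Text label, formatted the same way as p2 in the answer
counts$label <- paste0(formatC(counts$p * 100, digits = 2, format = "fg"), "%")
counts
```

The positions are only correct if the row order within each month matches the stacking order of the bars, which is why the answer reverses the factor levels of Fuel in the fill aesthetic.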

Joining data with sqldf in R [duplicate]

I have a piece of code:
sql_iv <- "select year, month, day,
count(HR)
from y2
group by year, month, day
order by year, month, day"
y3=sqldf(sql_iv)
Which calculates how many times a measurement was taken in a single day (amount varies day to day):
Year Month Day count(HR)
1 2018 4 7 88
2 2018 4 8 327
3 2018 4 9 318
4 2018 4 10 274
5 2018 4 11 345
6 2018 4 12 275
.
.
.
189 2018 10 12 167
Now I need to take these calculated values and join them with my data, which has every measurement in a separate row (i.e. all the measurements made on April 7th would have the value 88 in the last column). Could anyone help me out with this?
Data structure for first 10 measurements (out of 48650):
structure(list(Date = structure(c(1523119800, 1523119920, 1523119980,
1523120280, 1523120340, 1523120400, 1523120460, 1523120520, 1523120580,
1523120640), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
HR = c("97.0", "98.0", "95.0", "93.0", "94.0", "94.0", "92.0",
"96.0", "89.0", "90.0"), Year = c(2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018), Month = c(4, 4, 4, 4,
4, 4, 4, 4, 4, 4), Day = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7),
Hour = c(16, 16, 16, 16, 16, 17, 17, 17, 17, 17), Minute = c(50,
52, 53, 58, 59, 0, 1, 2, 3, 4)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))

Are you looking for this? Here y2 is your measurement data and y3 is the per-day count table returned by sqldf():
library(dplyr)
y2 %>%
as_tibble() %>%
left_join(y3 %>% as_tibble(), by = c("Year", "Month", "Day"))
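If the only goal is to attach the per-day count to every measurement row, the separate count table and the join can be skipped. A minimal base R sketch, assuming the Year, Month, Day and HR columns from the question and a made-up five-row sample, uses ave() to compute the group size in place. Note that length() counts every row, whereas count(HR) in SQL skips NULL values, so the two differ if HR has missing entries.

```r
# Hypothetical sample: three measurements on April 7th, two on April 8th
y2 <- data.frame(Year = c(2018, 2018, 2018, 2018, 2018),
                 Month = c(4, 4, 4, 4, 4),
                 Day = c(7, 7, 7, 8, 8),
                 HR = c("97.0", "98.0", "95.0", "93.0", "94.0"),
                 stringsAsFactors = FALSE)

# For each row, the number of rows sharing its Year/Month/Day
y2$n_per_day <- ave(seq_along(y2$HR), y2$Year, y2$Month, y2$Day, FUN = length)
y2
```

The column name n_per_day is chosen for this example only.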
