Subtracting consecutive dates by a factor in R

I have the following data frame:
DF<-data.frame(stringsAsFactors = TRUE,
Sample = c(rep("s1",4),rep("s2",4)),
date = c("21/07/2020","24/07/2020","25/07/2020","27/07/2020",
"03/08/2020","06/08/2020","09/08/2020","10/08/2020"))
First, I want to obtain the number of days between consecutive dates within each level of the factor "Sample", so the output would look like this:
DF_2<-data.frame(stringsAsFactors = TRUE,
Sample = c(rep("s1",4),rep("s2",4)),
date = c("21/07/2020","24/07/2020","25/07/2020","27/07/2020",
"03/08/2020","06/08/2020","09/08/2020","10/08/2020"),
days = c(NA,3,1,2,NA,3,3,1))
where the variable "days" is my outcome variable.
Afterwards I want to sum those "days" by factor, which is easy; I will do it like this:
df_3<-aggregate(days~Sample,DF_2,sum)
I would much appreciate help with getting the first step right, that is, getting DF_2.

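We can convert the 'date' column to Date class and then use diff to get the difference in days between consecutive dates within each 'Sample' group
library(dplyr)
library(lubridate)
DF1 <- DF %>%
  # parse the day/month/year strings into Date objects
  mutate(date = dmy(date)) %>%
  group_by(Sample) %>%
  # the first date in each group has no predecessor, hence the leading NA
  mutate(days = c(NA, as.numeric(diff(date)))) %>%
  ungroup()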
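
As a quick check, the answer above can be run end to end together with the aggregate step from the question. This is only a sketch: it rebuilds DF from the question and assumes that NA rows should be dropped from the sum, which is what the formula interface of aggregate does by default.

```r
library(dplyr)
library(lubridate)

# Data frame from the question
DF <- data.frame(
  Sample = c(rep("s1", 4), rep("s2", 4)),
  date = c("21/07/2020", "24/07/2020", "25/07/2020", "27/07/2020",
           "03/08/2020", "06/08/2020", "09/08/2020", "10/08/2020")
)

DF_2 <- DF %>%
  mutate(date = dmy(date)) %>%                      # character -> Date
  group_by(Sample) %>%
  mutate(days = c(NA, as.numeric(diff(date)))) %>%  # days since the previous date in the group
  ungroup()

# Sum the day differences per sample; rows with NA are omitted by the formula method
df_3 <- aggregate(days ~ Sample, DF_2, sum)
df_3
```

With this data, the days column matches the desired DF_2 from the question, and df_3 holds one total per sample.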

Related

Add a new column representing the number of occurrences of a weekday within a specific month in an R data frame

I would like to add a new column to my data frame (image1). This new column represents the number of occurrences of each weekday within the specific month; in the end, I need something like the "working day in the month" column in image2.
How can I achieve this result in R?
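This is a solution if you only have one month (it assumes day_name is the second column of df):
for(i in seq_along(df$day_name))
{
  # day name of the current row
  b <- as.character(df[i,2])
  # day names of all rows up to and including the current one
  c <- df[1:i,2]
  # how many times the current day name has occurred so far
  df$working_day[i] <- length(which(c==b))
}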
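Assuming the data frame is df, you could do something like this:
library(dplyr)
library(lubridate)
# extract month and year from the date column
df <- df %>% mutate(month = month(date), year = year(date))
# count how often each weekday occurs in each month of each year
df <- df %>% group_by(day_name, month, year)
df <- df %>% summarize(working_day_in_month = n())
df <- df %>% arrange(day_name)
df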
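
If the goal is a running occurrence number (first Monday of the month, second Monday, and so on) rather than a total per weekday, one possible reading can be sketched as follows. The data frame here is hypothetical, since the original data was only posted as an image, and the sketch assumes the rows are already sorted by date.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in data: two full weeks in March 2021
df <- data.frame(date = seq(as.Date("2021-03-01"), as.Date("2021-03-14"), by = "day"))

df <- df %>%
  mutate(day_name = weekdays(date),   # weekday name (locale-dependent label)
         month = month(date),
         year = year(date)) %>%
  group_by(year, month, day_name) %>%
  mutate(working_day = row_number()) %>%  # 1 for the first occurrence in the month, 2 for the second, ...
  ungroup()
df
```

Grouping by year as well as month keeps the same month in different years from being counted together.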

How do I sort dates by month in R?

New R student here. I am trying to sort data by month. Here is a sample of the data I need to use, followed by the code, then my results. Any tips for how to accomplish this? I'm super stuck...
This is the latest code I have been trying:
library(readr)
weather <- read_csv("R/weather.csv", col_types = cols(High = col_number(),
Low = col_number(), Precip = col_number(),
Snow = col_number(), Snowd = col_integer()))
View(weather)
library(ggplot2)
library(ggridges)
library(dplyr)
library(lubridate)
class(weather) #what class is dataset = dataframe
head(weather) #structure of the dataset
weather.month <- weather %>% # Group data by month
mutate(weather, 'month') %>%
group_by(month = lubridate::floor_date(weather$Day, 'month')) %>%
summarise(weather.month$High)
These are the errors I get:
Any help getting through this would be greatly appreciated!!!
The code can be fixed by converting Day to Date class (with mdy or dmy, since it is not clear whether the format is month-day-year or day-month-year), applying floor_date by 'month' to build the grouping variable, and then summarising the High column:
library(dplyr)
library(lubridate)
weather %>%
  # one group per calendar month (first day of the month as the key)
  group_by(month = lubridate::floor_date(mdy(Day), 'month')) %>%
  summarise(High = sum(High, na.rm = TRUE))
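
To see the grouping at work, here is a minimal sketch on made-up data. The weather data frame below is hypothetical (the real file was not posted) and assumes Day is stored as month/day/year text; it uses mean instead of sum purely for illustration and adds arrange so the result is sorted by month.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for weather.csv, deliberately out of order
weather <- data.frame(
  Day = c("2/1/2020", "1/15/2020", "1/1/2020", "2/20/2020"),
  High = c(40, 35, 30, 45)
)

weather_month <- weather %>%
  mutate(month = floor_date(mdy(Day), "month")) %>%  # first day of each row's month
  group_by(month) %>%
  summarise(High = mean(High, na.rm = TRUE)) %>%     # average high per month
  arrange(month)                                     # sort chronologically
weather_month
```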

Creating a column to return the value in 3 months

I am just starting off in R and I am stuck on a fairly simple problem. I have the following test dataset:
T1 <- data.frame(Make = c("Nissan","Nissan","Nissan","Nissan","Nissan","Nissan","Nissan",
"FORD","FORD","FORD","FORD","FORD","FORD"),
YearMonth = c("Apr-13","May-13","Jun-13","Jul-13","Aug-13","Sep-13","Oct-13","Apr-16","May-16",
"Jun-16","Jul-16","Aug-16","Sep-16"),
Value = c(10000,9500,8000,7500,6000,5000,4000,12000,11000,10000,8000,7000,5000))
I would like to create two extra columns returning the "value in 3 months" and the "final value", so something like:
Any help would be greatly appreciated
We could use lead with n = 2 after grouping by 'Make'
library(dplyr)
T1 %>%
group_by(Make) %>%
mutate(Value_in_3_months = lead(Value, 2), FinalValue = last(Value))
If we want to create an index based on the 'YearMonth' column, convert it to yearmon class and do a match against the original column
library(zoo)
T1 %>%
mutate(YearMonth = as.yearmon(YearMonth, "%b-%y")) %>%
group_by(Make) %>%
mutate(Valuein3 = Value[match(YearMonth + 2/12, YearMonth)])
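
To make the behaviour of lead and last within groups concrete, here is a small sketch on hypothetical toy data (the toy data frame and its values are made up for illustration and are not from the question):

```r
library(dplyr)

# Hypothetical toy data with two groups
toy <- data.frame(
  Make = c("A", "A", "A", "A", "B", "B", "B"),
  Value = c(10, 9, 8, 7, 20, 18, 16)
)

res <- toy %>%
  group_by(Make) %>%
  mutate(Value_in_3_months = lead(Value, 2),  # value two rows further down in the same group
         FinalValue = last(Value)) %>%        # last value of the group, repeated on every row
  ungroup()
res
```

lead pads the end of each group with NA, so the last two rows of every group have no "value in 3 months". Note that lead works on row position, so it assumes one row per month with no gaps, whereas the yearmon match looks up the actual month.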

Generating additional rows based on a condition within the same data frame

I have a data frame like DF below which will be imported directly from the database (as tibble).
library(tidyverse)
library(lubridate)
date_until <- dmy("31.05.2019")
date_val <- dmy("30.06.2018")
DF <- data.frame( date_bal = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
department = c("A","A","A","B","B"),
amount = c(10,20,30,40,50)
)
DF <- DF %>%
as_tibble()
DF
It represents the amount of money spent by each department in a specific month. My task is to project how much money will be spent by each department in the following months until a specified date in the future (in this case date_until=31.05.2019)
I would like to use tidyverse to generate additional rows for each department, where the first column date_bal would be a sequence of dates from the last one in the "original" DF up until date_until, which is predefined. Then I would like to add an additional column called "DIFF", which would represent the difference between date_bal and date_val, where date_val is also predefined. My final result would look like this:
Final result
I have managed to do this in the following way:
1. First filter data from DF for department A.
2. Create another DF2 by populating it with a date sequence from min(date_bal) of the data from 1. to date_until.
3. Merge the data frames from 1. and 2. and then add the calculated columns using mutate.
Since I will have to repeat this procedure for many departments I wonder if it's possible to add rows (create date sequence) in existing DF (without creating a second DF and then merging).
Thanks in advance for your help and time.
I add one day to the dates, create a monthly sequence, and then roll back to the last day of the previous month.
seq(min(date_val + days(1)), date_until + days(1), by = 'months')[-1] %>%
rollback() %>%
tibble(date_bal = .) %>%
crossing(DF %>% distinct(department)) %>%
bind_rows(DF %>% select(date_bal, department)) %>%
left_join(DF) %>%
arrange(department, date_bal) %>%
mutate(
amount = if_else(is.na(amount), 0, amount),
DIFF = interval(
rollback(date_val, roll_to_first = TRUE),
rollback(date_bal, roll_to_first = TRUE)) %/% months(1)
)
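
The month-end trick in the first lines of the answer can be looked at in isolation. A minimal sketch, reusing date_val and date_until from the question; the intermediate names month_starts and month_ends are introduced here only for illustration:

```r
library(lubridate)

date_val <- dmy("30.06.2018")
date_until <- dmy("31.05.2019")

# First day of each month, from the day after date_val to the day after date_until
month_starts <- seq(date_val + days(1), date_until + days(1), by = "month")

# Drop the first element, then roll each date back to the last day of the previous month
month_ends <- rollback(month_starts[-1])
month_ends
```

Stepping from the first of the month and rolling back avoids adding a month directly to a month-end date, which can overshoot shorter months.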

function applied to summarise + group_by doesn't work correctly

I read in my data:
fluo <- read.csv("data/ctd_SOMLIT.csv", sep=";", stringsAsFactors=FALSE)
I split the original date (format Y-m-d) into three columns: the day, the month and the year:
fluo$day <- day(as.POSIXlt(fluo$DATE, format = "%Y-%m-%d"))
fluo$month <- month(as.POSIXlt(fluo$DATE, format = "%Y-%m-%d"))
fluo$year <- year(as.POSIXlt(fluo$DATE, format = "%Y-%m-%d"))
This is part of my data frame:
Then I use summarise and group_by in order to apply this function:
prof_DCM = fluo[max(fluo$FLUORESCENCE..Fluorescence.),2]
=> I want the depth at which FLUORESCENCE reaches its maximum, for each month of each year.
mean_fluo <- summarise(group_by(fluo, month, year),
prof_DCM = fluo[max(fluo$FLUORESCENCE..Fluorescence.),2])
mean_fluo <- arrange(mean_fluo, year, month)
View(mean_fluo)
But it's not working.
The values of prof_DCM are the same all the way down column 3 of the data frame:
Maybe try the following code.
library(dplyr)
mean_fluo <- fluo %>%
group_by(month,year) %>%
filter(FLUORESCENCE..Fluorescence. == max(FLUORESCENCE..Fluorescence.)) %>%
arrange(year,month)
View(mean_fluo)
You can choose the variables you want to keep with 'select':
mean_fluo <- fluo %>%
group_by(month,year) %>%
filter(FLUORESCENCE..Fluorescence. == max(FLUORESCENCE..Fluorescence.)) %>%
arrange(year,month)%>%
select(c(month,year,PROFONDEUR))
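
A likely reason the original attempt returns the same value everywhere is that fluo$FLUORESCENCE..Fluorescence. inside summarise refers to the whole data frame rather than the current group, and max returns a value rather than a row position. If a summarise-based version is preferred, one possible sketch uses which.max on the grouped column. The data below is a hypothetical stand-in, and it assumes the depth column is called PROFONDEUR as in the answer above.

```r
library(dplyr)

# Hypothetical stand-in data; column names follow the question and answer
fluo <- data.frame(
  year = c(2019, 2019, 2019, 2019),
  month = c(1, 1, 2, 2),
  PROFONDEUR = c(5, 10, 5, 10),
  FLUORESCENCE..Fluorescence. = c(0.2, 0.9, 0.7, 0.1)
)

mean_fluo <- fluo %>%
  group_by(year, month) %>%
  # depth at the position of the maximum fluorescence within each group
  summarise(prof_DCM = PROFONDEUR[which.max(FLUORESCENCE..Fluorescence.)],
            .groups = "drop") %>%
  arrange(year, month)
mean_fluo
```

Unlike the filter approach, which.max keeps only the first row when the maximum is tied, so each month yields exactly one depth.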
