I'm trying to back into a fake birthdate based on the age of a consumer. I'm using lubridate package. Here is my code:
ymd(today) - years(df$age) - months(sample(1:12, 1)) - days(sample(1:31, 1)).
I want to use this to generate a different dob that equals the age. When I run this inline it gives every row the same month and day and different year. I want the month and day to vary as well.
You can make a date with the year of birth at 1st of January and then add random duration of days to it.
library(lubridate)
library(dplyr)
set.seed(5)
df <- data.frame(age = c(18, 33, 58, 63))
df %>%
mutate(dob = make_date(year(Sys.Date()) - age, 1, 1) +
duration(sample(0:364, n()), unit = "days"))
In base R, we can extract the year from the age column subtract it from current year, select a random month and date, paste the values together and create a Date object.
set.seed(123)
df <- data.frame(age = sample(100, 5))
as.Date(paste(as.integer(format(Sys.Date(), "%Y")) - df$age,
sprintf("%02d", sample(12, nrow(df))),
sprintf("%02d", sample(30, nrow(df))), sep = "-"))
#[1] "1990-01-29" "1940-06-14" "1978-09-19" "1933-05-16" "1928-04-03"
However, in this case you might need to make an extra check for month of February, or to be safe you might want to sample dates only from 28 instead of 30 here.
Related
I have some data that requires that the first day of every year be removed (1/1/xx), but I want to leave all of the other days of January. It doesn't follow traditional date formats, but has a Month and day columns. I've tried subsetting several ways using "&" in an effort to make sure it only removes the row when both requirements are met, but it isn't working and it removes all days from January. Here's a reproducible example of what I have been trying.
Month <- sample(1:12, 1000, replace=TRUE)
Day <- sample(1:30, 1000, replace = TRUE)
X <- sample(1:100, 1000, replace = TRUE)
df.ex <- as.data.frame(cbind(Month, Day, X))
library(dplyr)
df.ex <- df.ex %>%
filter(Month != 1 & Day != 1)
This results in all January dates being removed and all of the 1st days of the months being removed.
Here, you need to use the ! operator to negate your whole condition:
df.ex %>%
filter(!(Month == 1 & Day == 1))
I wish to generate some Tidy data.
26 companies are observed everyday for 10 days.
Each day a value is recorded.
The first day is: 2020/1/1
How do I create a list of dates so that the first 26 rows of the date column of the date frame is "2020/1/1" (Year, Month, Day) and the next 26 rows are "2020/1/2" etc.
Here is the data frame without the date column:
library(tidyverse)
set.seed(33)
date_chunk <- rep(as.Date("2020/1/1"), 26)
# Tidy data. 10 sequential days starting 2020/1/1/
df <- tibble(
company = rep(letters, 10),
value = sample(0:5, 260, replace = TRUE),
color = "grey"
)
You can try this
rep(seq(as.Date("2020-01-01"),as.Date("2020-01-10"),1),each=26)
This will return a list of dates from 2020-01-01 to 2020-01-10 where each date will be repeated 26 times
For each company we can add row_number() to first date_chunk to get an incremental sequence of dates.
library(dplyr)
df %>%
group_by(company) %>%
mutate(date = first(date_chunk) + row_number() - 1)
let's say I have a list of dates from March 1st to July 15th:
daterange = as.data.frame(seq(as.Date("2020-3-1"), as.Date("2020-7-15"), "days"))
I want to group the dates by 1-15 and 16-30/31 for each month. So the dates in March will be separated into two groups: Mar 1-15 and Mar 16-31. Then keep doing this for every month.
I know the lubridate package can sort by week, but I don't know how to set a custom range.
Thanks
We can create a logical vector on day as well as a group on yearmon
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
set_names('Date') %>%
group_by(yearmon = as.yearmon(Date),
Daygroup = (day(Date) > 15) + 1) %>%
mutate(Label = str_c(format(Date, '%b'),
str_c(min(day(Date)), max(day(Date)), sep='-'), sep= ' '))
Using base R, you can create two groups in each month by pasting the month value from each date and assign value 1/2 based on the date.
newdaterange <- transform(daterange, group = paste0(format(date, "%b"), '-group-',
ifelse(as.integer(format(date, "%d")) > 15, 1, 2)))
I have data about ID and the corresponding amount over multiple years. Something like this:
ID <- c(rep("A", 5), rep("B", 7), rep("C", 3))
amount <- c(sample(1:10000, 15))
Date <- c("2016-01-22","2016-07-25", "2016-09-22", "2017-10-22", "2017-01-02",
"2016-08-22", "2016-09-22", "2016-10-22", "2017-08-22", "2017-09-22", "2017-10-22", "2018-08-22",
"2016-10-22","2017-10-25", "2018-10-22")
Now, I want to analyse every year of every ID. Specifically, I am interested in the amount. For one, I want to know the overall amount for every year. Then, i also want to know the overall amount for first 11 months of every year, first 10 months of every year, first 9 months of every year and first 8 months of every year. For this purpose I have calculated the cumSum for every ID per year as follows:
myData <- cbind(ID, amount, Date)
myData <- as.data.table(myData)
# createe cumsum per ID per Year
myData$Date <- as.Date(myData$Date, format = "%Y-%m-%d")
myData[order(clientID, clDate)]
myData[, CumSum := cumsum(amount), by =.(ID, year(Date))]
How can summarise the data.table such that i get columns amount9month, amount10month, amount11month for every ID in every year?
Between cumsum, by and dcast this is almost quite straightforward. The most difficult bit is dealing with those months without any data in. Hence this solution isn't as brief as it almost was, but it does do things the "data.table way" and avoids slow operations like looping through rows.
# Just sort the formatting out first
myData[, Date:=as.Date(Date)]
myData[, `:=`(amount = as.numeric(amount),
year = year(Date),
month = month(Date))]
bycols <- c('ID', 'year', 'month')
# Summarise all transactions for the same ID in the same month
summary <- myData[, .(amt = sum(amount)), by=bycols]
# Create a skeleton table with all possible combinations of ID, year and month, to fill in any gaps.
skeleton <- myData[, CJ(ID, year, month = 1:12, unique = TRUE)]
# Join the skeleton to the actual data, to recreate the data but with no gaps in
result.long <- summary[skeleton, on=bycols, allow.cartesian=TRUE]
result.long[, amt.cum:=cumsum(fcoalesce(amt, 0)), by=c('ID', 'year')]
# Cast the data into wide format to have one column per month
result.wide <- dcast(result.long, ID + year ~ paste0('amount',month,'month'), value.var='amt.cum')
NB. If you don't have fcoalesce, update your data.table package.
In which format do you want it? There are two simple options. You can get the requested result easily in two different formats:
# Prepare the data
ID <- c(rep("A", 5), rep("B", 7), rep("C", 3))
amount <- c(sample(1:1, 15, replace = TRUE))
Date <- c("2016-01-22","2016-07-25", "2016-09-22", "2017-10-22", "2017-01-02", "2016-08-22", "2016-09-22", "2016-10-22", "2017-08-22", "2017-09-22", "2017-10-22", "2018-08-22", "2016-10-22","2017-10-25", "2018-10-22")
myData <- data.frame(ID, amount, Date)
# Add year column
myData$Date <- as.Date(myData$Date, format = "%Y-%m-%d")
myData$year <- format(myData$Date,"%Y")
Please note that I changed the amounts for testing purposes. Now two solutions.
# Format 1
by(myData$amount, list(myData$ID, myData$year), cumsum, simplify = TRUE)
# Format 2
aggregate(myData$amount, list(ID = myData$ID, Date = myData$year), cumsum)
However, you might want to have the result to be a new column in the data frame? You can solve it:
# Format: New column
myData <- myData[order(myData$year, myData$ID),] # sort by year and ID
myData$cumsum <- rep(0, nrow(myData))
for (r in 1:nrow(myData)) {
if (r > 1 && myData$year[r-1] == myData$year[r] && myData$ID[r-1] == myData$ID[r])
myData$cumsum[r] <- myData$cumsum[r-1] + myData$amount[r]
else
myData$cumsum[r] <- myData$amount[r]
}
I do not know a smooth solution with basic R. Maybe someone from the "dplr faction" has a neat trick up their sleeve?
I have seen many questions in SO on converting daily data to weekly using xts, zoo or lubridate packages. None of the answers was found appropriate for my problem. I have tried the following code
library(zoo)
library(lubridate)
library(xts)
library(tidyverse)
#Calculation for multistation
set.seed(123)
df <- data.frame("date"= seq(from = as.Date("1970-1-1"), to = as.Date("2000-12-31"), by = "day"),
"Station1" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 10, 30),
"Station2" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 11, 29),
"Station3" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 9, 28))
head(df)
# Aggregate over week
df %>%
mutate(Week = week(ymd(date)),
Year = year(ymd(date))) %>%
pivot_longer(-c(Week, date, Year), values_to = "value", names_to = "Station") %>%
group_by(Year, Week, Station) %>%
summarise(Weekly = mean(value)) %>%
arrange(Station) %>%
print(n = 55)
From the output you can see that 1970 cotains 53 weeks which I don't want. I want to start the week from the first date of every year and the 52nd week should have 8 days in a nonleap year and in case of leap years 9th and 52nd week should have 8 days so that every year contains 52 weeks only. How to do that in R?
Why not just write a function that gives the meteorological week from the definition you gave? Package lubridate will give you the day of the year with yday, which can act as the index for a vector of the correct week labels. These are straightforward to construct with simple modular math and concatenation.
You then only need to figure out if you are in a leap year, which again is possible using lubridate::leap_year. Combine these in an ifelse and you have an easy-to-use function:
met_week <- function(dates)
{
normal_year <- c((0:363 %/% 7 + 1), 52)
leap_year <- c(normal_year[1:59], 9, normal_year[60:365])
year_day <- lubridate::yday(dates)
return(ifelse(lubridate::leap_year(dates), leap_year[year_day], normal_year[year_day]))
}
and you can do
df %>% mutate(week = met_week(date))
You could just do it manually on the day of the year, not sure there is a function already built for that.
df %>%
mutate(Week = pmin(52, ceiling(yday(date) / 7)),
Year = year(ymd(date)))