I have the following dataframe
FileNumber<-c("510708396","510708396","510708396","510708485","510667325")
EventCode<-c("CASCRT","DISCSENT","DISCSENT","CASCRT","DISCSENT")
EventDate<-c("8/21/2018 12:00:00 AM","12/3/2018 2:41:18 PM","12/3/2018 3:50:16 PM","8/23/2018 12:00:00 AM","12/12/2018 9:11:28 AM")
df<-data.frame(FileNumber,EventCode,EventDate)
FileNumber EventCode EventDate
1 510708396 CASCRT 8/21/2018 12:00:00 AM
2 510708396 DISCSENT 12/3/2018 2:41:18 PM
3 510708396 DISCSENT 12/3/2018 3:50:16 PM
4 510708485 CASCRT 8/23/2018 12:00:00 AM
5 510667325 DISCSENT 12/12/2018 9:11:28 AM
I want to reshape this long-format data frame into wide format, using the EventCode values CASCRT and DISCSENT as the column names. I tried the following:
library(reshape2)
dcast(df,FileNumber~EventCode,value.var = "EventDate")
However, I receive the following output, along with the message "Aggregation function missing: defaulting to length", whereas I was expecting the EventDate values.
FileNumber CASCRT DISCSENT
1 510667325 0 1
2 510708396 1 2
3 510708485 1 0
I'm guessing this has something to do with the non-unique values in FileNumber. How do I make sure that I get the EventDate values instead of 1's and 0's?
You get this message because there are multiple rows with the same FileNumber and EventCode. When casting the data into wide format, dcast does not know how to handle multiple values per cell and falls back to length (i.e. counting how many elements fall into each cell).
You need to decide how you want to proceed when there is more than one value per cell.
You could transform the EventDate column to date-time format, so that you can compute the mean value. Or use only the max or min.
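As a rough sketch of the "use only the max" option in base R: parse the timestamps, keep the latest row per FileNumber/EventCode pair, then widen by merging one column per event code. The UTC time zone and the helper names (latest, cascrt, discsent, wide) are assumptions for illustration. With reshape2, the corresponding idea would be to pass an aggregation function through dcast's fun.aggregate argument.

```r
# Sample data from the question (strings kept as character, not factor).
df <- data.frame(
  FileNumber = c("510708396", "510708396", "510708396", "510708485", "510667325"),
  EventCode  = c("CASCRT", "DISCSENT", "DISCSENT", "CASCRT", "DISCSENT"),
  EventDate  = c("8/21/2018 12:00:00 AM", "12/3/2018 2:41:18 PM",
                 "12/3/2018 3:50:16 PM", "8/23/2018 12:00:00 AM",
                 "12/12/2018 9:11:28 AM"),
  stringsAsFactors = FALSE
)

# Parse the 12-hour timestamps; the UTC time zone is an assumption.
df$EventDate <- as.POSIXct(df$EventDate,
                           format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Sort by time, then keep only the latest row per FileNumber/EventCode pair.
df <- df[order(df$EventDate), ]
latest <- df[!duplicated(df[c("FileNumber", "EventCode")], fromLast = TRUE), ]

# Build one column per event code and join them on FileNumber.
cascrt   <- latest[latest$EventCode == "CASCRT",   c("FileNumber", "EventDate")]
discsent <- latest[latest$EventCode == "DISCSENT", c("FileNumber", "EventDate")]
names(cascrt)[2]   <- "CASCRT"
names(discsent)[2] <- "DISCSENT"
wide <- merge(cascrt, discsent, by = "FileNumber", all = TRUE)
wide
```

Cells with no matching event come out as NA rather than 0, and the remaining cells hold actual date-times instead of counts.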
If you want to keep each date in a list, I'd highly suggest using tidyr's pivot_wider function:
FileNumber<-c("510708396","510708396","510708396","510708485","510667325")
EventCode<-c("CASCRT","DISCSENT","DISCSENT","CASCRT","DISCSENT")
EventDate<-c("8/21/2018 12:00:00 AM","12/3/2018 2:41:18 PM","12/3/2018 3:50:16 PM","8/23/2018 12:00:00 AM","12/12/2018 9:11:28 AM")
df<-data.frame(FileNumber,EventCode,EventDate)
library(dplyr)
library(tidyr)
df2 <- df %>%
pivot_wider(names_from = EventCode,
values_from = EventDate)
This raises a warning, but puts the multiple elements in a list:
df2 is now:
# A tibble: 3 x 3
FileNumber CASCRT DISCSENT
<fct> <list<fct>> <list<fct>>
1 510708396 [1] [2]
2 510708485 [1] [0]
3 510667325 [0] [1]
And we can access the elements in the list:
df2$DISCSENT[1]
Returns:
<list_of<factor<b7763>>[1]>
[[1]]
[1] 12/3/2018 2:41:18 PM 12/3/2018 3:50:16 PM
5 Levels: 12/12/2018 9:11:28 AM ... 8/23/2018 12:00:00 AM
I read in an array from Excel using read_excel, and get two datetime columns, but what I need is two columns of dates
User DOB Answer_dt Question Answer
<chr> <dttm> <dttm> <int> <int>
1 User1 1900-01-01 00:00:00 2017-01-26 00:00:00 1 7
2 User2 1900-01-01 00:00:00 2017-01-26 00:00:00 2 8
I would like the datetime columns to be converted to dates (the times are irrelevant), and have tried using mutate and lubridate in various combinations, but have succeeded only in getting an error message that I don't understand:
> library(lubridate)
> dt <- eML_daily[1, "DOB"]
> dt
# A tibble: 1 x 1
DOB
<dttm>
1 1900-01-01 00:00:00
Warning message:
`...` is not empty.
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
> as_date(dt)
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
> as_date(df[,"DOB"])
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
I don't understand the warning messages, and can't quite see what I am doing wrong. Surely it should be a simple matter to convert from dttm to date and discard the time, which I don't need.
I'd be very appreciative for a pointer.
Sincerely and with many thanks in advance
Thomas Philips
In as_date(dt) you are attempting to convert a tibble, not a datetime vector, to a date. That unsurprisingly fails. In as_date(df[,"DOB"]), I can't say what you are trying to do, as you haven't given us df.
Working example:
library(tidyverse)
library(lubridate)
dt <- tibble(x=as_datetime("2017-01-26 00:00:00"))
dt
# A tibble: 1 x 1
x
<dttm>
1 2017-01-26 00:00:00
dt %>% mutate(x=as_date(x))
# A tibble: 1 x 1
x
<date>
1 2017-01-26
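The same distinction can be sketched in base R: a one-column data frame cannot be handed to as.Date, while the column vector extracted with [[ ]] can. The data frame below is a hypothetical stand-in for eML_daily, and the UTC time zone is an assumption.

```r
# Hypothetical stand-in for the data read from Excel.
eML <- data.frame(
  User      = c("User1", "User2"),
  DOB       = as.POSIXct(c("1900-01-01 00:00:00", "1900-01-01 00:00:00"), tz = "UTC"),
  Answer_dt = as.POSIXct(c("2017-01-26 00:00:00", "2017-01-26 00:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Single brackets with a name return a data frame, which as.Date cannot convert.
one_col <- eML["DOB"]
failed  <- inherits(try(as.Date(one_col), silent = TRUE), "try-error")

# Double brackets return the POSIXct vector itself, which converts cleanly.
dob <- as.Date(eML[["DOB"]], tz = "UTC")

# Convert both datetime columns in place, dropping the time of day.
dt_cols <- c("DOB", "Answer_dt")
eML[dt_cols] <- lapply(eML[dt_cols], as.Date, tz = "UTC")
str(eML)
```

Passing tz explicitly avoids a date shifting by one day when the stored time zone differs from the one used for the conversion.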
You can use as.Date to convert date-time columns to date.
If you want to change columns 2 and 3 to date, you can do:
eML_daily[2:3] <- lapply(eML_daily[2:3], as.Date)
Or with dplyr :
library(dplyr)
eML_daily %>% mutate(across(2:3, as.Date))
#For dplyr < 1.0.0
#eML_daily %>% mutate_at(2:3, as.Date)
Have you tried to convert it to character first?
Here's a quick sample:
x <- tibble(dt = c(Sys.time(),Sys.time() - 345767)) %>%
mutate(dt = as_date(as.character(dt)))
I have three variables: Year, Month, and Day. How can I merge them into one variable ("Date") so that the variable is represented as such:
yyyy-mm-dd
Thanks in advance and best regards!
How do you merge three variables into one variable?
Consider two methods:
Old school
With dplyr, lubridate, and data frames
And consider the data types. You can have:
Number or character
Date or POSIXct final type
Old School Method
The old school method is straightforward. I assume you are using vectors or lists and don't know data frames yet. Let's take your data, force it to a standardized, unambiguous format, and concatenate the data.
> y <- 2012:2015
> y
[1] 2012 2013 2014 2015
> m <- 1:4
> m
[1] 1 2 3 4
> d <- 10:13
> d
[1] 10 11 12 13
Use as.numeric if you want to be safe and convert everything to the same format before concatenation. If you get any NA values you will need to handle them with the is.na function and provide a default value.
Use paste with the sep separator value set to your delimiter, in this case, the hyphen.
> paste(y,m,d, sep = '-')
[1] "2012-1-10" "2013-2-11" "2014-3-12" "2015-4-13"
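The safeguarding step described above might look like the following sketch. The input vectors and the default year 1900 are hypothetical, chosen only to show an unparseable value being replaced before concatenation.

```r
# Hypothetical inputs stored as text, with one unparseable year.
y <- c("2012", "2013", "n/a")
m <- c("1", "2", "3")
d <- c("10", "11", "12")

# Convert everything to numeric; "n/a" becomes NA (the warning is suppressed).
y_num <- suppressWarnings(as.numeric(y))
m_num <- as.numeric(m)
d_num <- as.numeric(d)

# Replace missing years with a default value (1900 is an arbitrary choice).
y_num[is.na(y_num)] <- 1900

# Concatenate with a hyphen separator.
dates <- paste(y_num, m_num, d_num, sep = "-")
dates
```

Whether a default is appropriate at all depends on the data; dropping the affected rows is an equally valid choice.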
Dataframe / Dplyr / Lubridate Way
> df <- data.frame(year = y, mon = m, day = d)
> df
year mon day
1 2012 1 10
2 2013 2 11
3 2014 3 12
4 2015 4 13
Below I do the following:
Take the df object
Create a new variable named Date
Concatenate the numeric columns year, mon, and day with a - separator
Convert the resulting string into a Date with ymd()
> library(dplyr)
> library(lubridate)
> df %>%
mutate(Date = ymd(
paste(year, mon, day, sep = '-')
)
)
year mon day Date
1 2012 1 10 2012-01-10
2 2013 2 11 2013-02-11
3 2014 3 12 2014-03-12
4 2015 4 13 2015-04-13
Below we create year-month-day character strings, yyyy-mm-dd character strings (similar except one digit month and day are zero padded out to 2 digits) and Date class. The last one prints out as yyyy-mm-dd and can be manipulated in ways that character strings can't, for example adding one to a Date class object gives the next day.
First we set up some sample input:
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)
Convert to year-month-day character string. This is not quite yyyy-mm-dd, since 1-digit months and days are not zero-padded to 2 digits:
paste(year, month, day, sep = "-")
## [1] "2017-3-15" "2015-1-9" "2014-10-25"
Convert to Date class. It prints on the console as yyyy-mm-dd. Two alternatives:
as.Date(paste(year, month, day, sep = "-"))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
as.Date(ISOdate(year, month, day))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
Convert to character string yyyy-mm-dd. In this case 1-digit months and days are zero-padded out to 2 characters. Two alternatives:
as.character(as.Date(paste(year, month, day, sep = "-")))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
sprintf("%d-%02d-%02d", year, month, day)
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
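To illustrate the point about Date objects supporting operations that character strings do not, here is a small sketch reusing the sample values above (the variable names next_day, gap_days and earliest are mine):

```r
year  <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day   <- c(15, 9, 25)
dates <- as.Date(paste(year, month, day, sep = "-"))

# Adding one to a Date gives the next calendar day.
next_day <- dates + 1

# Subtracting two Dates gives the number of days between them.
gap_days <- as.numeric(dates[1] - dates[2])

# Dates sort and compare chronologically.
earliest <- min(dates)

format(next_day)
```

None of these operations would work on the output of paste() or sprintf() without converting back to Date first.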
Sample
df <- data.frame(
Birth_Date = c("1952-03-21", "1963-12-20", "1956-02-25", "1974-08-04", "1963-06-13", "1956-11-20", "1974-03-07", "1963-10-23", "1952-11-24", "1974-12-16"),
Items_Amount = c(68,189,69,19,299,79,149,149,29,189)
)
df
I'm trying to analyse a data set that has an Items_Amount column (in $) and customers' birth dates spread across 90 years. The goal is to compare the sales percentage across suitable age groups.
The main data frame contains a "BirthDate" column ranging from "1902-02-13" to "1991-12-11", stored as dates, not strings.
'data.frame': 350241 obs. of 1 variable:
$ BirthDate: Date, format: "1964-06-08" "1964-06-08" "1964-06-08" "1964-06-08" ...
> min(Trans_Cust$Birth_Date)
[1] "1902-02-13"
> difftime(max(Trans_Cust$Birth_Date),min(Trans_Cust$Birth_Date),units = "auto")
Time difference of 32808 days
> max(Trans_Cust$Birth_Date)
[1] "1991-12-11"
How do I find the present ages based on the "Birth_Date" column, store them in a new column "Present_ages", and then calculate sum(Items_Amount) grouped by Present_ages?
I am assuming that your birth dates are just strings, so you need to convert them to some form of date. I am using POSIXct. Once converted, you can just set up the decade boundaries and use cut to break the dates into groups.
BirthDate = c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08",
"1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24")
StartDecade = seq(as.POSIXct("1900-01-01"), as.POSIXct("2000-01-01"), by="10 years")
cut(as.POSIXct(BirthDate), breaks=StartDecade)
[1] 1960-01-01 1960-01-01 1960-01-01 1960-01-01 1900-01-01 1990-01-01 1940-01-01 1920-01-01
It may be prettier to simplify the names
as.numeric(cut(as.POSIXct(BirthDate), breaks=StartDecade)) - 1
[1] 6 6 6 6 0 9 4 2
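For the age-based grouping the question asks about, one possible sketch in base R is shown below. It uses the sample data from the question, a fixed hypothetical reference date (ref_date) so the result is reproducible, and an approximate year length of 365.25 days; Sys.Date() could be used instead to get ages as of today.

```r
df <- data.frame(
  Birth_Date = as.Date(c("1952-03-21", "1963-12-20", "1956-02-25", "1974-08-04",
                         "1963-06-13", "1956-11-20", "1974-03-07", "1963-10-23",
                         "1952-11-24", "1974-12-16")),
  Items_Amount = c(68, 189, 69, 19, 299, 79, 149, 149, 29, 189)
)

# Hypothetical reference date; replace with Sys.Date() for present ages.
ref_date <- as.Date("2020-01-01")

# Approximate age in whole years (365.25 days per year is an approximation).
df$Present_age <- floor(as.numeric(ref_date - df$Birth_Date) / 365.25)

# Ten-year age groups: [0,10), [10,20), ...
df$Age_group <- cut(df$Present_age, breaks = seq(0, 120, by = 10), right = FALSE)

# Total sales per age group and each group's share of the overall total.
totals <- aggregate(Items_Amount ~ Age_group, data = df, FUN = sum)
totals$Percent <- 100 * totals$Items_Amount / sum(totals$Items_Amount)
totals
```

The 365.25 divisor can be off by a day around birthdays; if exact ages matter, comparing the month and day components directly is safer.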
This will return a numeric value "rounded" to the decade:
BirthDate = as.Date(c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08", "1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24"))
BDdecade <- round( as.numeric( format(BirthDate, "%Y"))-5, -1)
BDdecade
#[1] 1960 1960 1960 1960 1900 1990 1940 1920
Needed to extract the year, convert to numeric and subtract 5, since the floor function does not have the same capacity for rounding to tens and hundreds as does round.
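An alternative sketch that avoids the subtract-5 step is integer division, which truncates the year to its decade directly and does not depend on how round() treats values that sit exactly halfway between two decades. The extra 1970 entry is added here only to exercise a year that starts a decade.

```r
BirthDate <- as.Date(c("1964-06-08", "1902-02-13", "1991-12-11",
                       "1944-06-06", "1929-10-24", "1970-01-01"))

# Extract the year as a number.
birth_year <- as.numeric(format(BirthDate, "%Y"))

# Integer-divide by 10 and multiply back to get the start of the decade.
BDdecade_floor <- birth_year %/% 10 * 10
BDdecade_floor
```

The same result can be written as floor(birth_year / 10) * 10.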
It wasn't clear what your desired starting point for the "decades" was supposed to be. This would split on the basis of the minimum date.
> BDdecade2 <- cut(BirthDate, breaks= seq( min(BirthDate), max(BirthDate), by= "10 years"))
> BDdecade2
[1] 1962-02-13 1962-02-13 1962-02-13 1962-02-13 1902-02-13 <NA> 1942-02-13
[8] 1922-02-13
8 Levels: 1902-02-13 1912-02-13 1922-02-13 1932-02-13 1942-02-13 ... 1972-02-13
The NA suggests you might need to add +365 (or perhaps even more) to the max date.
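A sketch of that adjustment: extending the upper end of the sequence by a full ten years (3653 days, an assumption that covers up to three leap days) guarantees that the last break falls after the maximum date, so no value lands outside the bins.

```r
BirthDate <- as.Date(c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08",
                       "1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24"))

# Extend the end point by ten years so the final break exceeds max(BirthDate).
decade_breaks <- seq(min(BirthDate), max(BirthDate) + 3653, by = "10 years")

# Each date is labelled with the start of the ten-year bin it falls into.
BDdecade3 <- cut(BirthDate, breaks = decade_breaks)
BDdecade3
```

With these breaks the 1991-12-11 entry falls into the bin starting 1982-02-13 instead of becoming NA.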
I'm using R's ff package with ffdf objects named MyData, (dim=c(10819740,16)). I'm trying to split the variable Date into Day, Month and Year and add these 3 variables into ffdf existing data MyData.
For instance, my date column is named SalesReportDate, with VirtualVmode and PhysicalVmode = double, after I converted SalesReportDate with as.Date(, format="%m/%d/%Y").
Examples of SalesReportDate are as follows:
> B
SalesReportDate
1 2013-02-01
2 2013-05-02
3 2013-05-04
4 2013-10-06
5 2013-15-10
6 2013-11-01
7 2013-11-03
8 2013-30-02
9 2013-12-12
10 2014-01-01
I've referred to "Split date into different columns for year, month and day" and tried to apply it, but I keep getting errors.
So, is there any way for me to do this? Thanks in advance.
Credit to @jwijffels for this great solution:
require(ffbase)
MyData$SalesReportDateYear <- with(MyData["SalesReportDate"], format(SalesReportDate, "%Y"), by = 250000)
MyData$SalesReportDateMonth <- with(MyData["SalesReportDate"], format(SalesReportDate, "%m"), by = 250000)
MyData$SalesReportDateDay <- with(MyData["SalesReportDate"], format(SalesReportDate, "%d"), by = 250000)
I need to convert a date (m/d/y format) into 3 separate columns on which I hope to run an algorithm. (I'm trying to convert my dates into Julian Day Numbers.) I saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2, ... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetext, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
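If the astronomical Julian Day Number is what the algorithm needs, rather than days since 1970-01-01, it can be derived from julian() by adding a constant offset. This sketch assumes the usual convention that 1970-01-01 corresponds to Julian Day Number 2440588; the variable names are mine.

```r
x <- "10/3/2001"
d <- as.Date(x, "%m/%d/%Y")

# Days since 1970-01-01, with the origin attribute stripped.
days_since_epoch <- as.numeric(julian(d))

# Offset to the astronomical Julian Day Number (assumed convention:
# 1970-01-01 is Julian Day Number 2440588).
jdn <- days_since_epoch + 2440588
jdn
```

Because the offset is a plain addition, the same line works unchanged on a whole vector of dates.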
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as strings is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
To convert a date (m/d/y format) into 3 separate columns, consider the df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day:
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.