I have these columns in my dataframe, df:
year month day hour minute
2013 1 7 21 54
2013 3 20 13 59
2013 1 3 18 40
.. cols(
.. year = col_double(),
.. month = col_double(),
.. day = col_double(),
.. hour = col_double(),
.. minute = col_double(),
I want to have a new column, datetime:
datetime
2013/1/7 21:54
2013/3/20 13:59
2013/1/3 18:40
I have tried this:
library(readr)
library(dplyr)
df$datetime <- with(df, as.POSIXct(paste(year, month, day, hour, minute),
format = "%Y/%m/%d %H:%M:%S"))
and this:
df$DT <- as.POSIXct((paste(df$year, df$month, df$day, df$hour, df$minute)), format="%Y/%m/%d %H:%M:%S")
However, it gives me all NA values.
I could merge just the year, month and day with as.Date() though. How can I add times to it?
Also, how can I sort by datetime later on?
You could use your original syntax, but make sure you put the right separators between the various components of the date-time:
dat <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40)
dat$datetime <- with(dat, as.POSIXct(paste0(year, "/", month, "/", day, " ", hour, ":", minute, ":00"),
format = "%Y/%m/%d %H:%M:%S"))
dat
#> # A tibble: 3 × 6
#> year month day hour minute datetime
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
#> 1 2013 1 7 21 54 2013-01-07 21:54:00
#> 2 2013 3 20 13 59 2013-03-20 13:59:00
#> 3 2013 1 3 18 40 2013-01-03 18:40:00
Created on 2022-12-06 by the reprex package (v2.0.1)
When you tell as.POSIXct() that the format is "%Y/%m/%d %H:%M:%S", it expects a string that looks like that (e.g., "2013/01/03 18:40:00"). Your paste() call joined the components with single spaces, producing something like "2013 1 3 18 40", so when as.POSIXct() tried to parse the string it did not find the expected separators (or the seconds field) and returned NA. You could also have gotten the same result by keeping your original paste() call and changing the format to match:
library(dplyr)
dat <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40)
dat$datetime <- with(dat, as.POSIXct(paste(year, month, day, hour, minute),
format = "%Y %m %d %H %M"))
arrange(dat, desc(datetime))
#> # A tibble: 3 × 6
#> year month day hour minute datetime
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
#> 1 2013 3 20 13 59 2013-03-20 13:59:00
#> 2 2013 1 7 21 54 2013-01-07 21:54:00
#> 3 2013 1 3 18 40 2013-01-03 18:40:00
Created on 2022-12-06 by the reprex package (v2.0.1)
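The mismatch between string and format is easy to reproduce on a single value. The sketch below is a minimal illustration rather than part of the answer above; tz = "UTC" is an assumption added only so the result does not depend on the local time zone.

```r
# One row of the question's data, pasted with the default space separator
s <- paste(2013, 1, 3, 18, 40)
s

# Format with slashes, colons and seconds: it does not match the string
bad <- as.POSIXct(s, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Format that mirrors the string field by field
good <- as.POSIXct(s, format = "%Y %m %d %H %M", tz = "UTC")

is.na(bad)   # the mismatched format yields NA
good         # the matching format yields a date-time
```

Either fixing the string or fixing the format works; what matters is that the two agree.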
The easiest way is to use make_datetime from lubridate. This function accepts the double inputs directly so you don't need to concatenate into a string yourself.
library(dplyr)
library(lubridate)
df |> mutate(datetime = make_datetime(year, month, day, hour, minute))
Output:
# A tibble: 3 × 6
year month day hour minute datetime
<dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
1 2013 1 7 21 54 2013-01-07 21:54:00
2 2013 3 20 13 59 2013-03-20 13:59:00
3 2013 1 3 18 40 2013-01-03 18:40:00
Data:
library(readr)
df <- read_table("year month day hour minute
2013 1 7 21 54
2013 3 20 13 59
2013 1 3 18 40")
Update: This can also be sorted using arrange:
library(dplyr)
library(lubridate)
df |>
mutate(datetime = make_datetime(year, month, day, hour, minute)) |>
arrange(datetime)
Output:
# A tibble: 3 × 6
year month day hour minute datetime
<dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
1 2013 1 3 18 40 2013-01-03 18:40:00
2 2013 1 7 21 54 2013-01-07 21:54:00
3 2013 3 20 13 59 2013-03-20 13:59:00
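If you would rather not add lubridate, base R's ISOdatetime() takes numeric components in much the same way. This is a swapped-in base R alternative, not the make_datetime() approach itself; the data frame name dat and tz = "UTC" are assumptions made for the sketch.

```r
# Same three rows as in the question, built without readr
dat <- data.frame(
  year   = c(2013, 2013, 2013),
  month  = c(1, 3, 1),
  day    = c(7, 20, 3),
  hour   = c(21, 13, 18),
  minute = c(54, 59, 40)
)

# ISOdatetime(year, month, day, hour, min, sec) returns POSIXct; seconds set to 0
dat$datetime <- with(dat, ISOdatetime(year, month, day, hour, minute, 0, tz = "UTC"))

# Sort chronologically with base order()
dat_sorted <- dat[order(dat$datetime), ]
dat_sorted
```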
An alternative to @DaveArmstrong's answer, using tidyverse:
suppressPackageStartupMessages({
library(tidyr)
library(lubridate)
library(dplyr)
})
#> Warning: package 'lubridate' was built under R version 4.2.2
#> Warning: package 'timechange' was built under R version 4.2.2
test <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40)
test
#> # A tibble: 3 × 5
#> year month day hour minute
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013 1 7 21 54
#> 2 2013 3 20 13 59
#> 3 2013 1 3 18 40
test |>
unite(col = datetime, everything(), sep = "-", remove = FALSE) |>
mutate(
datetime = ymd_hm(datetime)
)
#> # A tibble: 3 × 6
#> datetime year month day hour minute
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013-01-07 21:54:00 2013 1 7 21 54
#> 2 2013-03-20 13:59:00 2013 3 20 13 59
#> 3 2013-01-03 18:40:00 2013 1 3 18 40
Created on 2022-12-06 with reprex v2.0.2
library(magrittr)
df <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40,
)
df %>%
# pad date elements with leading zeros so parsing works out
dplyr::mutate(month = stringr::str_pad(month, width = 2, pad = "0"),
day = stringr::str_pad(day, width = 2, pad = "0")) %>%
# parse as actual datetime
dplyr::mutate(datetime = lubridate::ymd_hm(paste0(year, month, day, hour, minute)))
#> # A tibble: 3 x 6
#> year month day hour minute datetime
#> <dbl> <chr> <chr> <dbl> <dbl> <dttm>
#> 1 2013 01 07 21 54 2013-01-07 21:54:00
#> 2 2013 03 20 13 59 2013-03-20 13:59:00
#> 3 2013 01 03 18 40 2013-01-03 18:40:00
Created on 2022-12-06 by the reprex package (v2.0.1)
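The code above pads only month and day, which is enough for the sample because every hour and minute already has two digits. If your real data could contain single-digit hours or minutes (an assumption, the sample has none), a string without separators becomes ambiguous. A minimal base R sketch that pads every component with sprintf() follows; the rows and tz = "UTC" are hypothetical values chosen for illustration.

```r
# Hypothetical rows, including a single-digit hour and minute (09:05)
dat <- data.frame(year = c(2013, 2013), month = c(1, 3), day = c(7, 20),
                  hour = c(9, 13), minute = c(5, 59))

# Zero-pad every field so each one has a fixed width
stamp <- with(dat, sprintf("%04d%02d%02d%02d%02d",
                           as.integer(year), as.integer(month), as.integer(day),
                           as.integer(hour), as.integer(minute)))
stamp

# Parse the fixed-width string
dat$datetime <- as.POSIXct(stamp, format = "%Y%m%d%H%M", tz = "UTC")
dat
```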
Related
I am working with a large list of dataframes that use inconsistent date formats. I would like to conditionally mutate across the list so that any dataframe that contains a string will use one date format, and those that do not contain the string use another format. In other words, I want to distinguish between dataframes launched in year 2019 (which use mdy) and those launched in all others years (which use dmy).
The following code will conditionally mutate rows within a dataframe, but I am unsure how to conditionally mutate across the entire column.
dataframes %>% map(~.x %>%
mutate(date_time = if_else(str_detect(date_time, "/19 "),
mdy_hms(date_time), dmy_hms(date_time))))
Thank you!
edit
Data and code example. There are dataframes that contain a mixture of years.
library(tidyverse)
library(lubridate)
dataframes <- list(
tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2), # July 6th
tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2) # July 6th
)
dataframes %>%
map(~.x %>%
mutate(date_time = if_else(str_detect(date_time, "/19 "),
mdy_hms(date_time), dmy_hms(date_time)),
date = date(date_time),
month = month(date_time),
doy = yday(date_time)))
[[1]]
# A tibble: 2 × 5
date_time num date month doy
<dttm> <int> <date> <dbl> <dbl>
1 2019-07-06 13:00:00 1 2019-07-06 7 187
2 2020-06-07 13:00:00 2 2020-06-07 6 159
[[2]]
# A tibble: 2 × 5
date_time num date month doy
<dttm> <int> <date> <dbl> <dbl>
1 2020-07-06 13:00:00 1 2020-07-06 7 188
2 2021-07-06 13:00:00 2 2021-07-06 7 187
If you are trying to determine the format of the date column for the whole data.frame based on the presence of any date from 2019, then a small tweak of your code should work.
Instead of evaluating each record for the presence of /19 , you set the condition of the if_else() to be any(str_detect(...)) which returns TRUE if any of the values are TRUE. However the result of any() is always of length 1 so you then need to rep() the result to match the length of the whole data.frame using dplyr::n().
library(tidyverse)
library(lubridate)
dataframes <- list(
tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2), # July 6th
tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2) # July 6th
)
dataframes %>%
map( ~ .x %>%
mutate(
date_time = if_else(str_detect(date_time, "/19 ") %>%
any() %>%
rep(n()),
mdy_hms(date_time),
dmy_hms(date_time)),
date = date(date_time),
month = month(date_time),
doy = yday(date_time)
))
#> [[1]]
#> # A tibble: 2 × 5
#> date_time num date month doy
#> <dttm> <int> <date> <dbl> <dbl>
#> 1 2019-07-06 13:00:00 1 2019-07-06 7 187
#> 2 2020-07-06 13:00:00 2 2020-07-06 7 188
#>
#> [[2]]
#> # A tibble: 2 × 5
#> date_time num date month doy
#> <dttm> <int> <date> <dbl> <dbl>
#> 1 2020-07-06 13:00:00 1 2020-07-06 7 188
#> 2 2021-07-06 13:00:00 2 2021-07-06 7 187
Created on 2022-07-20 by the reprex package (v2.0.1)
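Because the choice of parser is made once per data frame, the same idea can also be written with an ordinary if/else instead of a vectorised if_else(). The sketch below is one possible variant under that reading of the question; the helper name parse_by_frame is hypothetical, and base data.frame() and lapply() stand in for tibble() and map().

```r
library(lubridate)

# Hypothetical helper: pick ONE parser for the whole vector
parse_by_frame <- function(x) {
  if (any(grepl("/19 ", x, fixed = TRUE))) mdy_hms(x) else dmy_hms(x)
}

dataframes <- list(
  data.frame(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2),
  data.frame(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2)
)

# Apply the helper to the date_time column of every data frame in the list
parsed <- lapply(dataframes, function(d) {
  d$date_time <- parse_by_frame(d$date_time)
  d
})
parsed
```

Only the parser that is actually needed runs, so a frame in the "wrong" format never produces parse warnings from the unused branch.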
I have the data shown in the input section (dput below) and need to convert it so that all values from the two rows end up in one long column. I tried transposing, but the cell contents were truncated.
I don't want to hardcode anything, since in future the data might come in 3 or 4 rows laid out the same way.
P.S. I also tried pivot_longer, but it didn't help.
structure(list(Header = c("Sat 12/3 \n358a-947a\n1017a-229p HRS 10.02",
"Sat 12/10 \n559a-1106a\n1134a-227p HRS 8.00"), X = c("Sun 12/4 ",
"Sun 12/11 "), X.1 = c("Mon 12/5 \n548a-1121a\n1149a-618p\n650p-845p HRS 13.95",
"Mon 12/12 \n500a-1121a\n1151a-547p\n616p-830p HRS 14.53"),
X.2 = c("Tue 12/6 \n359a-1120a\n1150a-400p HRS 11.53",
"Tue 12/13 \n548a-1120a\n1148a-449p HRS 10.54"), X.3 = c("Wed 12/7 \n548a-1119a\n1149a-515p HRS 10.95",
"Wed 12/14 \n429a-1120a\n1150a-432p HRS 11.56"), X.4 = c("Thu 12/8 \n549a-1120a\n1149a-447p HRS 10.48",
"Thu 12/15 \n429a-1121a\n1152a-431p HRS 11.52"), X.5 = c("Fri 12/9 \n548a-1120a\n1148a-218p HRS 8.03",
"Fri 12/16 \n430a-1120a\n1150a-432p HRS 11.55")), class = "data.frame", row.names = c(NA,
-2L))
My try (with a little help)
pivot_longer(df, cols = c(1:7)) %>%
select(value) %>%
mutate(value=str_replace(value,"HRS","")) %>%
separate(.,value,into=c("day","entry1","entry2","entry3"),sep="\n") %>%
separate(.,entry1,into=c("time_in1","time_out1"),sep="-") %>%
separate(.,entry2,into=c("time_in2","time_out2"),sep="-") %>%
separate(.,time_out2,into=c("time_out2","duration1"),remove = FALSE,sep=" ",extra = "merge") %>%
separate(.,entry3,into=c("time_in3","time_out3"),sep="-") %>%
separate(.,time_out3,into=c("time_out3","duration2"),remove = FALSE,sep=" ") %>%
mutate(duration=coalesce(duration1,duration2)) %>%
select(day, duration, time_in1,time_out1,time_in2,time_out2,time_in3,time_out3) %>%
separate(.,day,into=c("date","day"),extra="merge") %>%
mutate(day=mdy(paste0(day,"2021")),
duration=str_trim(duration))
Approach
The key was tidyr::separate_rows(), which not only separates the cell by "\n" but also splits the components into rows rather than columns.
Here, it is much better to split into rows than into columns. Suppose that most cells have 2 or 3 entries separated by "\n"; but there is a "rogue" cell, with an unusually large number (say 9) of entries, generated by someone who repeatedly clocked in and out throughout the day.
While splitting into columns would create arbitrarily many time_in* | time_out* columns, which remain empty (NA) in all rows except the "rogue" one,
date day duration time_in1 time_out1 time_in2 time_out2 time_in3 time_out3 time_in4 time_out4 time_in5 time_out5 time_in6 time_out6 time_in7 time_out7 time_in8 time_out8 time_in9 time_out9
<chr> <date> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# ... ... ... ... ... ... ... ... ... NA NA NA NA NA NA NA NA NA NA NA NA
splitting into rows will maintain a tame (and stable) columnar structure
date day duration time_in time_out
<date> <chr> <dbl> <chr> <chr>
# ... ... ... ... ...
# ... ... ... ... ...
# ... ... ... ... ...
without any "extraneous" columns (or rows).
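The difference is easiest to see on a toy input. The sketch below uses a made-up two-row data frame named cells (hypothetical, not the question's df) to show that separate_rows() grows the number of rows while leaving the column set unchanged.

```r
library(tidyr)

# Toy data: one cell with 2 entries, one with 3
cells <- data.frame(
  day   = c("Sat", "Mon"),
  times = c("358a-947a\n1017a-229p", "548a-1121a\n1149a-618p\n650p-845p")
)

# One output row per "\n"-separated entry; the columns stay the same
long <- separate_rows(cells, times, sep = "\n")
long
```

A cell with nine entries would simply contribute nine rows here, with no new columns.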
Solution
Given your sample data df
df <- structure(list(Header = c("Sat 12/3 \n358a-947a\n1017a-229p HRS 10.02", "Sat 12/10 \n559a-1106a\n1134a-227p HRS 8.00"),
X = c("Sun 12/4 ", "Sun 12/11 "),
X.1 = c("Mon 12/5 \n548a-1121a\n1149a-618p\n650p-845p HRS 13.95", "Mon 12/12 \n500a-1121a\n1151a-547p\n616p-830p HRS 14.53"),
X.2 = c("Tue 12/6 \n359a-1120a\n1150a-400p HRS 11.53", "Tue 12/13 \n548a-1120a\n1148a-449p HRS 10.54"),
X.3 = c("Wed 12/7 \n548a-1119a\n1149a-515p HRS 10.95", "Wed 12/14 \n429a-1120a\n1150a-432p HRS 11.56"),
X.4 = c("Thu 12/8 \n549a-1120a\n1149a-447p HRS 10.48", "Thu 12/15 \n429a-1121a\n1152a-431p HRS 11.52"),
X.5 = c("Fri 12/9 \n548a-1120a\n1148a-218p HRS 8.03", "Fri 12/16 \n430a-1120a\n1150a-432p HRS 11.55")),
class = "data.frame", row.names = c(NA, -2L))
the following workflow
library(tidyverse)
library(stringr)
# ...
# Code to generate 'df'.
# ...
year_observed <- 2016
results <- df %>%
mutate(id = row_number()) %>%
pivot_longer(!id, names_to = "column") %>%
separate(value, into = c("date", "entries"), sep = "\n", fill = "right", extra = "merge", remove = TRUE) %>%
separate(entries, into = c("times", "duration"), sep = "HRS", fill = "right", extra = "warn", remove = TRUE) %>%
mutate(across(date:duration, trimws),
date = as.Date(paste(str_extract(date, "\\d{1,2}/\\d{1,2}$"), year_observed, sep = "/"), format = "%m/%d/%Y"),
duration = as.numeric(duration),
duration = if_else(is.na(duration), 0, duration),
day = format(date, format = "%a")) %>%
separate_rows(times, sep = "\n") %>%
separate(times, into = c("time_in", "time_out"), sep = "-", fill = "warn", extra = "warn", remove = TRUE) %>%
# ...Further Transformations... %>%
select(id, date, day, duration, time_in, time_out)
# View results.
results
should yield results like
# A tibble: 28 x 6
id date day duration time_in time_out
<int> <date> <chr> <dbl> <chr> <chr>
1 1 2016-12-03 Sat 10.0 358a 947a
2 1 2016-12-03 Sat 10.0 1017a 229p
3 1 2016-12-04 Sun 0 NA NA
4 1 2016-12-05 Mon 14.0 548a 1121a
5 1 2016-12-05 Mon 14.0 1149a 618p
6 1 2016-12-05 Mon 14.0 650p 845p
7 1 2016-12-06 Tue 11.5 359a 1120a
8 1 2016-12-06 Tue 11.5 1150a 400p
9 1 2016-12-07 Wed 11.0 548a 1119a
10 1 2016-12-07 Wed 11.0 1149a 515p
# ... with 18 more rows
where id identifies (by row number) the original record in df.
To pivot into your newly specified output, simply execute this code, or append it to the existing workflow:
wide_results <- results %>%
group_by(id, date) %>% mutate(entry = row_number()) %>% ungroup() %>%
pivot_wider(id_cols = c(date, day, duration), names_from = entry, names_glue = "{.value}_{entry}", values_from = c(time_in, time_out)) %>%
# Select so as to alternate between 'time_in_*' and 'time_out_*'.
select(order(as.numeric(str_extract(colnames(.), "\\d+$")), str_extract(colnames(.), "^time_(in|out)"), na.last = FALSE))
# View results.
wide_results
You should obtain wide_results like:
# A tibble: 14 x 9
date day duration time_in_1 time_out_1 time_in_2 time_out_2 time_in_3 time_out_3
<date> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2016-12-03 Sat 10.0 358a 947a 1017a 229p NA NA
2 2016-12-04 Sun 0 NA NA NA NA NA NA
3 2016-12-05 Mon 14.0 548a 1121a 1149a 618p 650p 845p
4 2016-12-06 Tue 11.5 359a 1120a 1150a 400p NA NA
5 2016-12-07 Wed 11.0 548a 1119a 1149a 515p NA NA
6 2016-12-08 Thu 10.5 549a 1120a 1149a 447p NA NA
7 2016-12-09 Fri 8.03 548a 1120a 1148a 218p NA NA
8 2016-12-10 Sat 8 559a 1106a 1134a 227p NA NA
9 2016-12-11 Sun 0 NA NA NA NA NA NA
10 2016-12-12 Mon 14.5 500a 1121a 1151a 547p 616p 830p
11 2016-12-13 Tue 10.5 548a 1120a 1148a 449p NA NA
12 2016-12-14 Wed 11.6 429a 1120a 1150a 432p NA NA
13 2016-12-15 Thu 11.5 429a 1121a 1152a 431p NA NA
14 2016-12-16 Fri 11.6 430a 1120a 1150a 432p NA NA
Note
You must supply year_observed (here 2016) so that the dates written in m/d format are placed in the correct year. If you instead hardcode 2021, as in your attempt, the parsed dates will fall on different days of the week than those recorded in the cells.
Warning
These dates (12/3, etc.) are in December, and close to the end of the calendar year. If any of these entries "cross over" (from 2016) into the next year (ex. 1/1/2017), they will be incorrectly calibrated to the former year (ex. 1/1/2016), and thus have an incorrect date and weekday.
However, if your dates do cross over, that's a good indication that the full date (12/3/2016) should have been notated in the original cells, in which case
results <- df %>%
# ... %>%
mutate(
# ...
date = as.Date(str_extract(date, "(\\d{1,2}/){2,2}\\d{4,4}$"), format = "%m/%d/%Y")
# ...
) # ... %>%
would have sufficed to properly parse the dates.
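If the cells only ever carry m/d and the entries are known to be in chronological order (an assumption about your data), one heuristic is to bump the year whenever the month number drops. The sketch below illustrates that idea on made-up values; md and start_year are hypothetical names and this is not part of the workflow above.

```r
# Hypothetical m/d strings, in chronological order, crossing a year boundary
md <- c("12/30", "12/31", "1/1", "1/2")
start_year <- 2016

# Month number of each entry
m <- as.integer(sub("/.*$", "", md))

# Each time the month decreases, assume a new year has started
year <- start_year + cumsum(c(FALSE, diff(m) < 0))

dates <- as.Date(paste(md, year, sep = "/"), format = "%m/%d/%Y")
dates
```

The heuristic breaks if the data are unordered or skip more than a year, so storing the full date in the source remains the safer fix.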
I have below-mentioned dataframe in R.
DF
ID Datetime Value
T-1 2020-01-01 15:12:14 10
T-2 2020-01-01 00:12:10 20
T-3 2020-01-01 03:11:11 25
T-4 2020-01-01 14:01:01 20
T-5 2020-01-01 18:07:11 10
T-6 2020-01-01 20:10:09 15
T-7 2020-01-01 15:45:23 15
By utilizing the above-mentioned dataframe, I want to break down the count and sum of Value by month and by time bucket, based on Datetime.
Required Output:
Month Count Sum
Jan-20 7 115
12:00 AM to 05:00 AM 2 45
06:00 AM to 12:00 PM 0 0
12:00 PM to 03:00 PM 1 20
03:00 PM to 08:00 PM 3 35
08:00 PM to 12:00 AM 1 15
You can bin the hours of the day by using hour from the lubridate package and then cut from base R, before summarizing with dplyr.
Here, I am assuming that your Datetime column is actually in a date-time format and not just a character string or factor. If it is a character string or factor, first convert it with DF$Datetime <- as.POSIXct(as.character(DF$Datetime)).
library(tidyverse)
DF$bins <- cut(lubridate::hour(DF$Datetime), c(-1, 5.99, 11.99, 14.99, 19.99, 24))
levels(DF$bins) <- c("00:00 to 05:59", "06:00 to 11:59", "12:00 to 14:59",
"15:00 to 19:59", "20:00 to 23:59")
newDF <- DF %>%
group_by(bins, .drop = FALSE) %>%
summarise(Count = length(Value), Total = sum(Value))
This gives the following result:
newDF
#> # A tibble: 5 x 3
#> bins Count Total
#> <fct> <int> <dbl>
#> 1 00:00 to 05:59 2 45
#> 2 06:00 to 11:59 0 0
#> 3 12:00 to 14:59 1 20
#> 4 15:00 to 19:59 3 35
#> 5 20:00 to 23:59 1 15
And if you want to add January as a first row (though I'm not sure how much sense this makes in this context) you could do:
newDF %>%
summarise(bins = "January", Count = sum(Count), Total = sum(Total)) %>% bind_rows(newDF)
#> # A tibble: 6 x 3
#> bins Count Total
#> <chr> <int> <dbl>
#> 1 January 7 115
#> 2 00:00 to 05:59 2 45
#> 3 06:00 to 11:59 0 0
#> 4 12:00 to 14:59 1 20
#> 5 15:00 to 19:59 3 35
#> 6 20:00 to 23:59 1 15
Incidentally, the reproducible version of the data I used for this was:
structure(list(ID = structure(1:7, .Label = c("T-1", "T-2", "T-3",
"T-4", "T-5", "T-6", "T-7"), class = "factor"), Datetime = structure(c(1577891534,
1577837530, 1577848271, 1577887261, 1577902031, 1577909409, 1577893523
), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10,
20, 25, 20, 10, 15, 15)), class = "data.frame", row.names = c(NA,
-7L))
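The fractional break points (5.99, 11.99, ...) work because hours are whole numbers, but the same bins can be written with integer breaks and left-closed intervals. This is a minimal base R sketch of that variant; the hours vector is typed in by hand from the times shown in the question (an assumption that they are displayed in the intended time zone), and split() plus sapply() replace the dplyr summary.

```r
# Hour of day and Value for the seven rows in the question
hours <- c(15, 0, 3, 14, 18, 20, 15)
value <- c(10, 20, 25, 20, 10, 15, 15)

# right = FALSE makes each bin include its lower bound: [0,6), [6,12), ...
bins <- cut(hours,
            breaks = c(0, 6, 12, 15, 20, 24),
            right  = FALSE,
            labels = c("00:00 to 05:59", "06:00 to 11:59", "12:00 to 14:59",
                       "15:00 to 19:59", "20:00 to 23:59"))

# table() and split() keep empty factor levels, so the empty bin is reported as 0
counts <- table(bins)
totals <- sapply(split(value, bins), sum)

data.frame(bins = names(totals), Count = as.integer(counts), Total = as.numeric(totals))
```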
I want to use the prophet() function in R, but I cannot convert my column "YearWeek" to a Date column.
The "YearWeek" column stores values from 201401 up to 201937, i.e. from week 1 of 2014 up to week 37 of 2019.
I don't know how to declare this column as a date in the form yyyy-ww, which I need in order to use prophet().
Does anyone know how to do this?
Thank you in advance.
One solution could be to append a 01 to the end of your yyyyww values, so that each string also carries a weekday for as.Date() to parse.
Data:
library(tidyverse)
df <- cross2(2014:2019, str_pad(1:52, width = 2, pad = 0)) %>%
map_df(set_names, c("year", "week")) %>%
transmute(date = paste(year, week, sep = "")) %>%
arrange(date)
head(df)
#> # A tibble: 6 x 1
#> date
#> <chr>
#> 1 201401
#> 2 201402
#> 3 201403
#> 4 201404
#> 5 201405
#> 6 201406
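Now let's append the 01 and convert to date. Because %w reads a single digit, the appended 01 is parsed as weekday 0 (Sunday) and the trailing 1 is ignored, so each result is the Sunday that starts the given %U week: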
df %>%
mutate(date = paste(date, "01", sep = ""),
new_date = as.Date(date, "%Y%U%w"))
#> # A tibble: 312 x 2
#> date new_date
#> <chr> <date>
#> 1 20140101 2014-01-05
#> 2 20140201 2014-01-12
#> 3 20140301 2014-01-19
#> 4 20140401 2014-01-26
#> 5 20140501 2014-02-02
#> 6 20140601 2014-02-09
#> 7 20140701 2014-02-16
#> 8 20140801 2014-02-23
#> 9 20140901 2014-03-02
#> 10 20141001 2014-03-09
#> # ... with 302 more rows
Created on 2019-10-10 by the reprex package (v0.3.0)
More information about the numeric week-of-year specifiers (%U, %W) can be found in ?strptime.
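The same conversion can be sketched in base R directly on a numeric YearWeek vector. Here a single "0" (Sunday) is appended so that the weekday digit read by %w is explicit; year_week and week_start are hypothetical names, and the three values are picked from the range described in the question.

```r
# A few YearWeek values in the yyyyww form described in the question
year_week <- c(201401, 201402, 201937)

# Append weekday 0 (Sunday) and parse year, %U week number and weekday
week_start <- as.Date(paste0(year_week, "0"), format = "%Y%U%w")

data.frame(year_week, week_start)
```

Note that %U counts weeks from the first Sunday of the year, which is not the same as ISO 8601 week numbering; check which convention your YearWeek column follows before relying on the result.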
I have dates in the format mm/yyyy in column 1, and then results in column 2.
month Result
01/2018 96.13636
02/2018 96.40000
3/2018 94.00000
04/2018 97.92857
05/2018 95.75000
11/2017 98.66667
12/2017 97.78947
How can I order by month so that it starts with the earliest month (11/2017) and ends with the latest (05/2018)?
I have tried a few 'orders', but none seem to order by year and then by month.
In tidyverse (w/ lubridate added):
library(tidyverse)
library(lubridate)
dfYrMon <-
df1 %>%
mutate(date = parse_date_time(month, "my"),
year = year(date),
month = month(date)
) %>%
arrange(year, month) %>%
select(date, year, month, result)
With data:
df1 <- tibble(month = c("01/2018", "02/2018", "03/2018", "04/2018", "05/2018", "11/2017", "12/2017"),
result = c(96.13636, 96.4, 94, 97.92857, 95.75, 98.66667, 97.78947))
Will get you this 'dataframe':
# A tibble: 7 x 4
date year month result
<dttm> <dbl> <dbl> <dbl>
1 2017-11-01 2017 11 98.66667
2 2017-12-01 2017 12 97.78947
3 2018-01-01 2018 1 96.13636
4 2018-02-01 2018 2 96.40000
5 2018-03-01 2018 3 94.00000
6 2018-04-01 2018 4 97.92857
7 2018-05-01 2018 5 95.75000
Making your data values atomic (year in its own column, month in its own column) generally improves the ease of manipulation.
Or if you want to use base R date manipulations instead of lubridate's:
library(tidyverse)
dfYrMon_base <-
df1 %>%
mutate(date = as.Date(paste("01/", month, sep = ""), "%d/%m/%Y"),
year = format(date, "%Y"),
month = format(date, "%m")
) %>%
arrange(year, month) %>%
select(date, year, month, result)
dfYrMon_base
Note the datatypes created.
# A tibble: 7 x 4
date year month result
<date> <chr> <chr> <dbl>
1 2017-11-01 2017 11 98.66667
2 2017-12-01 2017 12 97.78947
3 2018-01-01 2018 01 96.13636
4 2018-02-01 2018 02 96.40000
5 2018-03-01 2018 03 94.00000
6 2018-04-01 2018 04 97.92857
7 2018-05-01 2018 05 95.75000
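The question's data contains one unpadded value (3/2018), which a plain character sort would misplace. The base R sketch below uses the question's original values and sorts on a Date key built by prefixing a day; it is a compact illustration of the same idea as above, with key and sorted as hypothetical names.

```r
# Values as given in the question, including the unpadded "3/2018"
df1 <- data.frame(
  month  = c("01/2018", "02/2018", "3/2018", "04/2018", "05/2018", "11/2017", "12/2017"),
  Result = c(96.13636, 96.4, 94, 97.92857, 95.75, 98.66667, 97.78947)
)

# Build a sortable Date key by pretending each month starts on day 01
key <- as.Date(paste0("01/", df1$month), format = "%d/%m/%Y")

# Order the original rows by that key; the month column itself is left unchanged
sorted <- df1[order(key), ]
row.names(sorted) <- NULL
sorted
```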
We can convert it to yearmon class and then do the order
library(zoo)
out <- df1[order(as.yearmon(df1$month, "%m/%Y"), df1$Result),]
row.names(out) <- NULL
out
# month Result
#1 11/2017 98.66667
#2 12/2017 97.78947
#3 01/2018 96.13636
#4 02/2018 96.40000
#5 03/2018 94.00000
#6 04/2018 97.92857
#7 05/2018 95.75000
data
df1 <- structure(list(month = c("01/2018", "02/2018", "03/2018", "04/2018",
"05/2018", "11/2017", "12/2017"), Result = c(96.13636, 96.4,
94, 97.92857, 95.75, 98.66667, 97.78947)), .Names = c("month",
"Result"), class = "data.frame",
row.names = c("1", "2", "3",
"4", "5", "6", "7"))