mutate and truncate functions in r not producing desired output - r

I have some date data in the format Start_year = 2018/19, 2019/20, 2020/21 etc. I want to put it in the format 2018, 2019, 2020, as integers, for a group by clause later in the code.
select_data <- data %>%
select(Product, Start_year, Number, Amount) %>%
mutate(Avg_paid = Amount/Number)%>% #This works fine
mutate(Start_year_short = as.integer(str_trunc(Start_year, 4, c("left"))))
The error message I get is:
Problem with `mutate()` column `Start_year_short`.
`Start_year_short = as.integer(str_trunc(Start_year, 4, c("left")))`.
NAs introduced by coercion
If I take the mutate out and do
Start_year <- as.integer(str_trunc(Start_year, 4, c("left")))
I get an object not found error instead.
I really can't work out what's going wrong.

How about this simpler truncation method:
data.frame(Start_year) %>%
mutate(Start_year_short = str_replace(Start_year, "(\\d+).*", "\\1"))
With conversion to integer:
data.frame(Start_year) %>%
mutate(Start_year_short = as.integer(str_replace(Start_year, "(\\d+).*", "\\1")))

Use this instead:
as.integer(substr(Start_year, 1, 4))
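For context on the two errors in the question, here is a minimal sketch using a made-up data frame called data (an assumption, since the real data was not posted). str_trunc() with side = "left" drops characters from the left-hand end and adds an ellipsis, so the result cannot be coerced to an integer. Outside mutate(), the column has to be referenced as data$Start_year, because Start_year is not a standalone object.

```r
library(stringr)

# Hypothetical stand-in for the data in the question
data <- data.frame(Start_year = c("2018/19", "2019/20", "2020/21"))

# side = "left" keeps the right-hand end and prepends an ellipsis,
# so the truncated strings are no longer plain numbers
truncated <- str_trunc(data$Start_year, 4, "left")

# Coercing them therefore gives NA (warning suppressed here)
coerced <- suppressWarnings(as.integer(truncated))

# Taking the first four characters gives the start year
start_year_short <- as.integer(substr(data$Start_year, 1, 4))
start_year_short
```

Inside mutate() the same substr() call can use the bare column name Start_year.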

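I would use stringr::str_split_fixed instead to be more flexible with the length of the numbers. Plain str_split returns a list, which as.integer cannot coerce; str_split_fixed returns a character matrix whose first column holds the start year.
Start_year = c("2018/19", "2019/20", "2020/21")
Start_year <- as.integer(stringr::str_split_fixed(Start_year, pattern = "/", n = 2)[, 1])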

Related

Date format converting in R

I am trying to convert the values in the Date column (the data below) into the date format yyyy-mm-dd.
I did it using as.Date() and then turned the output into a data frame as follows:
date_new = as.Date(df$Date, origin = '1899-12-30')
better_date = data.frame(Date = unlist(date_new))
I then use the converted data to filter for June of each year and to do some other tasks, as follows:
me_ff = df |>
filter(month(better_date) == 6) |>
mutate(sorting_date = better_date %m+% months(1)) |>
select(ticker,sorting_date,me_ff = Mkt_cap)
and the error message is:
"Error in `filter()`:
! Problem while computing `..1 = month(better_date) == 6`.
Caused by error in `as.POSIXlt.default()`:
! do not know how to convert 'x' to class “POSIXlt”
Run `rlang::last_error()` to see where the error occurred."
Could you please help me to solve the problem?
Thank you so much for your help!
It looks to me as if you are confusing the data.frame object (the "table") with the date column. Untested code:
df <- transform(df, Date=as.Date(Date, origin = '1899-12-30')) |>
subset(strftime(Date, "%m")=="06")
should select the June rows. Note the lowercase "%m" (month); uppercase "%M" would return minutes.
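A small base R sketch of the same idea, on made-up data (the tickers, dates and Mkt_cap values are assumptions, since the original data was not posted). The serial numbers are built from known dates so that the conversion can be checked by eye.

```r
# Origin used for Excel-style serial dates, as in the question
origin <- as.Date("1899-12-30")

# Hypothetical data: Date stored as a day count since the origin
df <- data.frame(
  ticker  = c("AAA", "BBB", "CCC"),
  Date    = as.numeric(as.Date(c("2019-06-01", "2019-07-01", "2020-06-01")) - origin),
  Mkt_cap = c(100, 200, 300)
)

# Convert the column in place, so it stays part of df
df <- transform(df, Date = as.Date(Date, origin = origin))

# Keep only the rows whose month is June
june <- subset(df, format(Date, "%m") == "06")
june
```

Keeping the converted dates inside df, rather than in a separate data frame, is what lets filter() or subset() see them as a column.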

Sorting a column of a table in R

I am using the following code :
Daily_intensity %>%
mutate(weekdays = weekdays(date)) %>%
group_by(weekdays) %>%
summarise(minutes_fairly_very_active = sum(fairlyactiveminutes + veryactiveminutes))
The result is not in weekday order. What should I add so that I get the result in order, from Monday to Sunday?
You could use lubridate instead of base and get an ordered factor of the desired kind without needing to specify the order of the weekdays yourself:
mutate(weekdays = lubridate::wday(date, label = TRUE, week_start = 1))
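Here is how that line might fit into the full pipeline, as a sketch on made-up data (the tibble daily and its values are assumptions standing in for Daily_intensity). Because wday() with label = TRUE returns an ordered factor, group_by() and summarise() return the rows in factor order.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for Daily_intensity: two weeks of daily records
daily <- tibble(
  date = as.Date("2016-04-11") + 0:13,
  fairlyactiveminutes = rep(10, 14),
  veryactiveminutes = rep(5, 14)
)

result <- daily %>%
  # Ordered factor of weekday labels, with the week starting on Monday
  mutate(weekdays = wday(date, label = TRUE, week_start = 1)) %>%
  group_by(weekdays) %>%
  summarise(minutes_fairly_very_active = sum(fairlyactiveminutes + veryactiveminutes))

result
```

The weekday labels themselves depend on the locale, but the ordering does not.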

Transform chr to date format in R

I want to transform a column from chr to date format.
I have this value representing year-week:
2020-53
I've tried this:
mutate(semana=as_date(year_week,format="%Y-%U"))
but I get the same date for the whole dataset: 2020-01-18
I also tried
mutate(semana=strptime(year_week, "%Y-%U"))
and got the same result.
Here you can see the wrong conversion
Any idea? Thanks
I think I've got something that does the job.
library(tidyverse)
library(lubridate)
# Set up table like example in post
trybble <- tibble(year_week = c("2020-53", rep("2021-01", 5)),
country = c("UK", "FR", "GER", "ITA", "SPA", "UK"))
# Function to go into mutate with given info of year and week
y_wsetter <- function(fixme, yeargoal, weekgoal) {
lubridate::year(fixme) <- yeargoal
lubridate::week(fixme) <- weekgoal
return(fixme)
}
# Making a random date so col gets set right
rando <- make_datetime(year = 2021, month = 1, day = 1)
# Show time
trybble <- trybble %>%
add_column(semana = rando) %>% # Set up col of dates to fix
mutate(yerr = substr(year_week, 1, 4)) %>% # Get year as chr
mutate(week = substr(year_week, 6, 7)) %>% # Get week as chr
mutate(semana2 = y_wsetter(semana,
as.numeric(yerr),
as.numeric(week))) %>% # fixed dates
select(-c(yerr, week, semana))
Notes:
If you somehow plug in a week greater than 53, lubridate doesn't mind, and goes forward a year.
I really struggled to get mutate to play nicely without writing my own function y_wsetter. In my experience, when a mutate takes multiple inputs, or when I'm changing a "property" of a value instead of the whole value itself, I probably need to write a function. I'm using the lubridate package to change just the year or week based on your year_week column, so this is one such situation where a quick function helps mutate out.
I was having a weird time when I tried setting rando to Sys.Date(), so I manually set it to something using make_datetime. YMMV
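A shorter base R sketch that should give the same dates, assuming the week convention that lubridate::week() documents (week 1 starts on 1 January and each week is a seven-day block). If your weeks follow the ISO or US convention instead, the offsets will differ.

```r
year_week <- c("2020-53", "2021-01")

# Split the string into its year and week parts
yr <- as.integer(substr(year_week, 1, 4))
wk <- as.integer(substr(year_week, 6, 7))

# Start of week n = 1 January plus (n - 1) blocks of seven days
semana <- as.Date(paste0(yr, "-01-01")) + (wk - 1) * 7
semana
```

Both lines are vectorised, so they can go directly inside a single mutate() call.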

Using `mutate_at()` with `as.Date()`

I'm trying to use mutate_at() from dplyr to coerce date-like columns into columns of type Date using as.Date(), but I'm getting an error. Here's the code:
library(dplyr)
df = data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")
df %>%
mutate_at(.vars = c("date_1", "date_2"), .funs = as.Date("%m/%d/%Y"))
This gives me an error: Error in charToDate(x): character string is not in a standard unambiguous format
Not sure what's going on here, so I'd appreciate your help. I prefer dplyr solutions, but if there's a better way to do it, I'm open to that as well.
I personally prefer the syntax below. The . here refers to the column, which needs to be passed to the as.Date function. A formula is used because funs() is deprecated in recent dplyr versions.
library(dplyr)
df = data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")
df %>%
mutate_at(vars(date_1, date_2), ~ as.Date(., "%m/%d/%Y"))
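The original call failed because as.Date("%m/%d/%Y") is evaluated straight away, with the format string treated as the date to parse. In dplyr 1.0.0 and later, the same conversion can also be written with across(), which supersedes mutate_at(). This is a sketch on the example data from the question; the name converted is just for illustration.

```r
library(dplyr)

df <- data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")

converted <- df %>%
  # .x is each selected column in turn; format tells as.Date how to parse it
  mutate(across(c(date_1, date_2), ~ as.Date(.x, format = "%m/%d/%Y")))

str(converted)
```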

using the first and last character of a string to create another variable

My data looks like this:
df <- tibble(code = c("B12345A", "B12345C"))
I want to create a second variable, say 'code_2', that takes the first and last character of the string in the first variable like this:
df <- df %>%
mutate(code_2 = str_sub(code, 1, 1),
code_3 = str_sub(code, 7, 7)) %>%
unite(code_2, 2:3, sep = "", remove = TRUE)
But surely there's a more succinct way to achieve the above using dplyr tools? (I'm thinking I could create a function to achieve this too, but I'm not sure how to go about that either.) Thanks in advance for your help.
mutate(code_2 = paste0(substr(code,1,1), substr(code,7,7)))
Or if the length of the strings can vary:
mutate(code_2 = paste0(substr(code,1,1), substr(code,nchar(code),nchar(code))))
Change substr to str_sub if you prefer the function from the stringr package.
You could also use a regular expression:
mutate(code_2 = gsub("(.).*(.)", "\\1\\2", code))
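Since the question also asks about writing a function, here is one possible sketch. The name first_last is made up for illustration; it wraps the variable-length substr() approach above.

```r
# Hypothetical helper: first and last character of each string
first_last <- function(x) {
  n <- nchar(x)
  paste0(substr(x, 1, 1), substr(x, n, n))
}

code <- c("B12345A", "B12345C")
code_2 <- first_last(code)
code_2
```

Inside a pipeline it would be used as mutate(code_2 = first_last(code)).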
