Transform chr to date format in R - r

I want to convert a chr column to a date.
The values represent year-week, for example:
2020-53
I tried this:
mutate(semana=as_date(year_week,format="%Y-%U"))
but I get the same date, 2020-01-18, for every row in the dataset.
I also tried
mutate(semana=strptime(year_week, "%Y-%U"))
and got the same result.
Here you can see the wrong conversion
Any ideas? Thanks.

I think I've got something that does the job.
library(tidyverse)
library(lubridate)
# Set up table like example in post
trybble <- tibble(year_week = c("2020-53", rep("2021-01", 5)),
country = c("UK", "FR", "GER", "ITA", "SPA", "UK"))
# Function to go into mutate with given info of year and week
y_wsetter <- function(fixme, yeargoal, weekgoal) {
lubridate::year(fixme) <- yeargoal
lubridate::week(fixme) <- weekgoal
return(fixme)
}
# Making a random date so col gets set right
rando <- make_datetime(year = 2021, month = 1, day = 1)
# Show time
trybble <- trybble %>%
add_column(semana = rando) %>% # Set up col of dates to fix
mutate(yerr = substr(year_week, 1, 4)) %>% # Get year as chr
mutate(week = substr(year_week, 6, 7)) %>% # Get week as chr
mutate(semana2 = y_wsetter(semana,
as.numeric(yerr),
as.numeric(week))) %>% # fixed dates
select(-c(yerr, week, semana))
Notes:
If you somehow plug in a week greater than 53, lubridate doesn't mind, and goes forward a year.
I really struggled to get mutate to play nicely without writing my own function y_wsetter. In my experience, when a mutate takes multiple inputs, or when I'm changing a "property" of a value instead of the whole value itself, I probably need to write a function. I'm using the lubridate package to change just the year or week based on your year_week column, so this is one such situation where a quick function helps mutate out.
I was having a weird time when I tried setting rando to Sys.Date(), so I manually set it to something using make_datetime. YMMV
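A base R alternative, as a minimal sketch. It assumes the year_week values are ISO 8601 weeks (the question does not say which week convention the data uses), and the helper name iso_week_start is made up for this example. The original attempts likely fail because a year and a week number alone do not identify a day, so the parser appears to fill in the current month and day. ISO week 1 is the week containing 4 January, so the Monday of any week can be computed from that anchor:

```r
# Hypothetical helper: Monday of the given ISO week, for "YYYY-WW" strings.
# Assumes ISO 8601 week numbering, which the question does not confirm.
iso_week_start <- function(year_week) {
  year <- as.integer(substr(year_week, 1, 4))
  week <- as.integer(substr(year_week, 6, 7))
  jan4 <- as.Date(sprintf("%d-01-04", year))   # always falls in ISO week 1
  wday <- as.integer(format(jan4, "%u"))       # 1 = Monday ... 7 = Sunday
  week1_monday <- jan4 - (wday - 1)            # Monday of ISO week 1
  week1_monday + 7 * (week - 1)
}

iso_week_start(c("2020-53", "2021-01"))
# [1] "2020-12-28" "2021-01-04"
```

Inside a pipeline this would be used as mutate(semana = iso_week_start(year_week)). If the weeks follow a different convention, the anchor date and weekday logic would need to change.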

Related

Can't figure out how to change "X5.13.1996" to date class?

I have dates listed as "X5.13.1996", representing May 13th, 1996. The class for the date column is currently a character.
When I use mdy from lubridate, it keeps returning NA. Is there code I can use to get rid of the "X" so that mdy works? Is there anything else I can do?
You can use substring(date_variable, 2) to drop the first character from the string.
substring("X5.13.1996", 2)
[1] "5.13.1996"
To convert a variable (i.e., column) in your data frame:
library(dplyr)
library(lubridate)
dates <- data.frame(
dt = c("X5.13.1996", "X11.15.2021")
)
dates %>%
mutate(converted = mdy(substring(dt, 2)))
or, without dplyr:
dates$converted <- mdy(substring(dates$dt, 2))
Output:
dt converted
1 X5.13.1996 1996-05-13
2 X11.15.2021 2021-11-15
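The same idea can be sketched in base R without lubridate. This version removes the "X" only when it is actually the first character, and spells out the month.day.year layout explicitly. The variable names raw_dates, cleaned and converted are just for illustration:

```r
raw_dates <- c("X5.13.1996", "X11.15.2021")

# Drop a leading "X" if present; other strings are left unchanged
cleaned <- sub("^X", "", raw_dates)

# Parse as month.day.year
converted <- as.Date(cleaned, format = "%m.%d.%Y")
converted
# [1] "1996-05-13" "2021-11-15"
```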

mutate and truncate functions in r not producing desired output

I have some date data in the format Start_year = 2018/19, 2019/20, 2020/21 etc. I want to put it in the format 2018, 2019, 2020, as integers, for a group by clause later in the code.
select_data <- data %>%
select(Product, Start_year, Number, Amount) %>%
mutate(Avg_paid = Amount/Number)%>% #This works fine
mutate(Start_year_short = as.integer(str_trunc(Start_year, 4, c("left"))))
The error message I get is:
Problem with `mutate()` column `Start_year_short`.
`Start_year_short = as.integer(str_trunc(Start_year, 4, c("left")))`.
NAs introduced by coercion
If I take the mutate out and do
Start_year <- as.integer(str_trunc(Start_year, 4, c("left")))
I get an object not found error instead.
I really can't work out what's going wrong.
How about this simpler truncation method:
data.frame(Start_year) %>%
mutate(Start_year_short = str_replace(Start_year, "(\\d+).*", "\\1"))
With conversion to integer:
data.frame(Start_year) %>%
mutate(Start_year_short = as.integer(str_replace(Start_year, "(\\d+).*", "\\1")))
Use this instead as.integer(substr(Start_year, 1, 4))
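I would use stringr::str_split_fixed instead, to be more flexible with the length of the numbers. (Plain str_split returns a list, which as.integer cannot convert directly.)
Start_year = c("2018/19", "2019/20", "2020/21")
# Split on "/" into a two-column character matrix and keep the first column
Start_year <- as.integer(stringr::str_split_fixed(Start_year, pattern = "/", n = 2)[, 1])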
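A likely reason for the NAs in the question: str_trunc(x, 4, "left") removes characters from the left end and counts the "..." ellipsis toward the width, so "2018/19" ends up as something like "...9", which as.integer cannot parse. A minimal base R sketch that keeps everything before the slash, using the sample vector from the answer above:

```r
Start_year <- c("2018/19", "2019/20", "2020/21")

# Remove the slash and everything after it, then convert to integer
Start_year_short <- as.integer(sub("/.*$", "", Start_year))
Start_year_short
# [1] 2018 2019 2020
```

Inside the original pipeline this would be mutate(Start_year_short = as.integer(sub("/.*$", "", Start_year))).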

R: transforming character to date with only year and month to apply a dateRange input on a boxplot output in a Shiny app

I have dataframe (in the following called df) with the first column df$date being dates with type character, formatted as %Y-%m.
My first question would be: how can I transform the types of all the entries in this column from character to date, keeping the same format with only year and month? I've tried as.Date("2011-08", format(%Y-%m)), as.yearmon("2011-08") (returns a num, but I want a Date), and format(as.Date("2011-08"),"%Y-%m"). None of them worked.
The reason I want to change the type of this column is that I want to implement a dateRange input in a Shiny app, ranging from the minimum to the maximum date in the column mentioned above. Maybe there is another solution to this without needing to change the type?
This is my input in the Shiny-App:
box(width = 4, height = "50px",
dateRangeInput(inputId = 'dateRange',
label = "Period of analysis : ",
format = "yyyy-mm",language="en",
start = min(df$date),
end = max(df$date),
startview = "year", separator = " - ")
)
I want to have min(df$date) resp. max(df$date) for start and end, but it is not working. Again, the problem seems to be, that df$date has the type chr, e.g. "2011-08".
My Output-Code in the server function of the Shiny-App looks like this:
output$PartPlot <- renderPlot({
PartPlot_new <- subset(df, date >= input$dateRange[1] & date <= input$dateRange[2])
boxplot(PartPlot_new[, input$Table2], xlab = "Part", ylab = "Percentage")
})
As you can see, the goal is to have boxplots from the other columns of df (containing percentages).
Appreciate any help! Thanks in advance.
You can transform a date string into a date value with this handy library:
library(anytime)
times <- c("2004-03-21 12:45:33.123456",
"2004/03/21 12:45:33.123456",
"20040321 124533.123456",
"03/21/2004 12:45:33.123456",
"03-21-2004 12:45:33.123456",
"2004-03-21",
"20040321",
"03/21/2004",
"03-21-2004",
"20010101")
anydate(times)
[1] "2004-03-21" "2004-03-21" "2004-03-21" "2004-03-21" "2004-03-21" "2004-03-21"
[7] "2004-03-21" "2004-03-21" "2004-03-21" "2001-01-01"
In your case, if you want to convert strings in %Y-%m format, you can also try:
times <- c("2018-11")
anydate(times)
[1] "2018-11-01"
What novica commented is absolutely correct. Leave your date as a full proper date with year, month, and day, where day can be 1. The column type will then be Date. If you need to show the year and month in a label client-side, take the date value, format it as a string, and drop the day. In your Shiny client page where you select the inputs, you can show the year and month as the "label", but the "value" you pass will be a full date (or you can add the day server-side before using dateRangeInput).
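The advice above can be sketched in base R: append a day so each string becomes a complete date, work with real Date values, and format back to year-month only for display. The data frame below is made-up sample data, and range_start and range_end are illustrative names:

```r
# Made-up sample data in the "%Y-%m" layout described in the question
df <- data.frame(date = c("2011-08", "2012-03", "2011-11"),
                 stringsAsFactors = FALSE)

# Append "-01" so every value is a full date, then convert to Date
df$date <- as.Date(paste0(df$date, "-01"))

range_start <- min(df$date)   # 2011-08-01
range_end   <- max(df$date)   # 2012-03-01

# Year-month label for display only
format(range_start, "%Y-%m")
# [1] "2011-08"
```

With the column stored as Date, range_start and range_end should be usable as the start and end of the dateRangeInput, and the comparison in the subset call would then compare dates with dates rather than with strings.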

Is there a way to create a column based on a stock's return over a user-defined period?

EDIT:
I've tried some changes and opted for the tidyquant package shared in the comments below.
This time I've set a range with variables, but I think I'm having trouble turning it into a function or a vector. This could either be the result of me writing a bad for loop or a limitation of the underlying library.
The idea behind this loop is that it pulls the adjusted prices for the period and then takes the first and last price to calculate a change (aka the return in share price.)
I'm not sure, but would love some thoughts!
start_date = "2019-05-20"
end_date = "2019-05-30"
Symbol_list <- c("CTVA","IBM", "GOOG", "GE")
range_returns <- for (Symbol in Symbol_List) {
frame <- tq_get(Symbol, get = "stock.prices", from = start_date, to = end_date, complete_cases = FALSE)[,7]
(frame[nrow(frame),] - frame[1,]) / frame[1,]
}
Old stuff
Let's say I've got a dataframe
symbol <- c("GOOG", "IBM","GE","F","BKR")
name <- c("Google", "IBM","General Electric","Ford","Berkshire Hathaway")
df <- cbind(symbol, name)
And I want to create a third column - df$custom_return that's defined based on my personal time frame.
I've tried working with the quantmod package and I'm having some trouble with its constraints.
Where I'm at:
I have to pull the entire price history first, which prevents me from creating a new column like so:
start_date <- "2003-01-05"
end_date <- "2019-01-05"
df$defined_period_return <- ROC(getSymbol(df$symbol, src = yahoo, from = start_date, to = end_date, periodicity = "monthly"))
I know that I only want the adjusted close which is the 6th column for the Yahoo source. So, I could add the following and just pull the records into an environment.
price_history <- null
for (Symbol in sp_500$Symbol)
price_history <- cbind(price_history,
getSymbols(df$symbol, from = start_date,
to = end_date, periodicity = "daily",
auto.assign=FALSE)[,6])
Ok, that seems feasible, but it's not exactly seamless, and now I run into an issue if one of my symbols (tickers) falls outside the range of dates provided. For example, CTVA is one of them: it didn't start trading until after the end date. The whole scrape stops right there. How do I skip over that error?
And let's say we solve the "snag" of not finding relevant records...how would you calculate the return for each symbol over different timelines? For example - Google didn't start trading until 2004. getSymbol does pull the price history once it starts trading, but that return timeline is different than GE which had data at the start of my range.
No need for a for loop. You can do everything with tidyquant and dplyr. For the first and last observations of a group you can use the functions first and last from dplyr. See code below for a working example.
library(tidyquant)
library(dplyr)
start_date = "2019-05-20"
end_date = "2019-05-30"
Symbol_list <- c("CTVA","IBM", "GOOG", "GE")
stocks <- tq_get(Symbol_list, get = "stock.prices", from = start_date, to = end_date, complete_cases = FALSE)
stocks %>%
group_by(symbol) %>%
summarise(returns = (last(adjusted) / first(adjusted)) - 1) # calculate returns
# A tibble: 4 x 2
symbol returns
<chr> <dbl>
1 CTVA -0.0172
2 GE -0.0516
3 GOOG -0.0197
4 IBM -0.0402
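The first-versus-last calculation can also be sketched in base R on made-up prices, with no download involved. Note that a for loop in R evaluates to NULL, so assigning the loop itself to range_returns (as in the question) captures nothing; applying a function over groups returns the values instead. The symbols AAA and BBB and their prices are invented for illustration:

```r
# Invented adjusted prices for two hypothetical symbols
prices <- data.frame(
  symbol   = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  date     = as.Date(c("2019-05-20", "2019-05-21", "2019-05-22",
                       "2019-05-20", "2019-05-21")),
  adjusted = c(100, 105, 110, 50, 45)
)

# Make sure rows are in date order within each symbol
prices <- prices[order(prices$symbol, prices$date), ]

# Return over the period: last adjusted price / first adjusted price - 1
range_returns <- sapply(
  split(prices$adjusted, prices$symbol),
  function(p) p[length(p)] / p[1] - 1
)
range_returns   # AAA is about 0.10, BBB is about -0.10
```

Because each symbol is handled on its own rows, symbols with different first trading dates simply get a return over whatever part of the period they cover.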

Changing Dates in R from webscraper but not able to convert

I am trying to complete a problem that pulls from two data sets that need to be combined into one data set. To get to this point, I need to rbind both data sets by the year-month information. Unfortunately, the first data set needs to be tallied by year-month info, and I can't seem to figure out how to change the date so I can have month-year info rather than month-day-year info.
This is data on avalanches, and I need to write code to total the number of avalanches each month for the Snow Season, defined as Dec-Mar. How do I do that?
I keep trying to convert the format of the date to month-year but after I change it with
as.Date(avalancheslc$Date, format="%y-%m")
all the values for Date turn to NA's....help!
# write the webscraper
library(XML)
library(RCurl)
avalanche<-data.frame()
avalanche.url<-"https://utahavalanchecenter.org/observations?page="
all.pages<-0:202
for(page in all.pages){
this.url<-paste(avalanche.url, page, sep=" ")
this.webpage<-htmlParse(getURL(this.url))
thispage.avalanche<-readHTMLTable(this.webpage, which=1, header=T)
avalanche<-rbind(avalanche,thispage.avalanche)
}
# subset the data to the Salt Lake Region
avalancheslc<-subset(avalanche, Region=="Salt Lake")
str(avalancheslc)
avalancheslc$monthyear<-format(as.Date(avalancheslc$Date),"%Y-%m")
# How can I tally the number of avalanches?
The final output of my dataset should be something like:
date avalanches
2000-1 18
2000-2 4
2000-3 10
2000-12 12
2001-1 52
This should work (I tried it on only 1 page, not all 203). Note the use of the option stringsAsFactors = F in the readHTMLTable function, and the need to add names because 1 column does not automatically get one.
library(XML)
library(RCurl)
library(dplyr)
library(lubridate) # provides year() and month(), used below
avalanche <- data.frame()
avalanche.url <- "https://utahavalanchecenter.org/observations?page="
all.pages <- 0:202
for(page in all.pages){
this.url <- paste0(avalanche.url, page) # paste0 adds no separator, so the URL contains no space
this.webpage <- htmlParse(getURL(this.url))
thispage.avalanche <- readHTMLTable(this.webpage, which = 1, header = T,
stringsAsFactors = F)
names(thispage.avalanche) <- c('Date','Region','Location','Observer')
avalanche <- rbind(avalanche,thispage.avalanche)
}
avalancheslc <- subset(avalanche, Region == "Salt Lake")
str(avalancheslc)
avalancheslc <- mutate(avalancheslc, Date = as.Date(Date, format = "%m/%d/%Y"),
monthyear = paste(year(Date), month(Date), sep = "-"))
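The remaining step, tallying avalanches per month within the Dec-Mar season, can be sketched in base R. The observations below are made up, and the object names obs, season and tally are illustrative; the dates use the month/day/year layout parsed in the code above:

```r
# Made-up observation dates in month/day/year layout
obs <- data.frame(
  Date = c("12/24/2019", "12/30/2019", "1/5/2020", "4/2/2020", "1/17/2020"),
  stringsAsFactors = FALSE
)
obs$Date <- as.Date(obs$Date, format = "%m/%d/%Y")

# Keep only the snow season months: December through March
obs$month <- as.integer(format(obs$Date, "%m"))
season <- obs[obs$month %in% c(12, 1, 2, 3), ]

# Count observations per year-month
season$monthyear <- format(season$Date, "%Y-%m")
tally <- as.data.frame(table(monthyear = season$monthyear),
                       responseName = "avalanches")
tally
#   monthyear avalanches
# 1   2019-12          2
# 2   2020-01          2
```

Zero-padded months such as "2020-01" sort correctly as text, which the unpadded "2000-1" style in the question would not once months 10 to 12 are involved.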
