Converting time interval in R - r

My knowledge and experience of R is limited, so please bear with me.
I have measurements of duration in the following form:
d+h:m:s.s
e.g. 3+23:12:11.931139, where d=days, h=hours, m=minutes, and s.s=decimal seconds. I would like to create a histogram of these values.
Is there a simple way to convert such string input into a numerical form, such as seconds? All the information I have found seems to be geared towards date-time objects.
Ideally I would like to be able to pipe a list of data to R on the command line and so create the histogram on the fly.
Cheers
Loris

Another solution based on SO:
op <- options(digits.secs=10)
z <- strptime("3+23:12:11.931139", "%d+%H:%M:%OS")
vec_z <- z + rnorm(100000)
hist(vec_z, breaks=20)
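Short explanation: first, I set the digits.secs option so that fractional seconds are shown. Then I parse your string into a date-time object with strptime; if you now type z into the console you get "2012-05-03 23:12:11.93113". Finally, I create some more date-times and plot a histogram. I think the important step for you is the parsing, and strptime should help you with that. Note that %d is read as a day of the month, so the result is a date-time rather than a duration, and this only works for day counts from 1 to 31.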

I would do it like this:
str = "3+23:12:11.931139"
result = sum(as.numeric(unlist(strsplit(str, "[:\\+]", perl = TRUE))) * c(24*60*60, 60*60, 60, 1))
> result
[1] 342731.9
Then, you can wrap it into a function and apply over the list or vector.
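As a minimal sketch of that wrapping step, assuming every input has the form d+h:m:s.s, the conversion could look like the following. The name duration_to_seconds and the sample values are made up for illustration.

```r
# Hypothetical helper: convert "d+h:m:s.s" strings to seconds.
duration_to_seconds <- function(x) {
  # Split each string on "+" and ":" to get days, hours, minutes, seconds
  parts <- strsplit(as.character(x), "[+:]")
  # Weight each component by its length in seconds and add them up
  vapply(parts,
         function(p) sum(as.numeric(p) * c(24 * 60 * 60, 60 * 60, 60, 1)),
         numeric(1))
}

# Made-up sample durations
durations <- c("3+23:12:11.931139", "0+00:01:30.5", "1+00:00:00")
secs <- duration_to_seconds(durations)
```

The resulting numeric vector can be passed straight to hist(secs). For input piped on the command line, one option under Rscript is to read the strings with readLines(file("stdin")) before calling the helper.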

Related

R: Building urls based on multiple variables of different lengths

I've been struggling to figure this out on my own, so reaching out for some assistance. I am trying to build urls based on multiple variables (months and years) of different lengths so that I have a url for each combination of month and year from the lists I created.
I've done something similar in Python but need to translate it into R, and I'm running into issues with building the function and for loops. Here's the Python code ..
import calendar
# set years and months
oasis_market_yr = ('2020','2019','2018','2017','2016','2015','2014','2013','2012','2011')
oasis_market_mn = ('01','02','03','04','05','06','07','08','09','10','11','12')
# format url string
URL_FORMAT_STRING = 'http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_{year}_M{month}_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime={year}{month}01T07:00-0000&enddatetime={year}{month}{last_day_of_month}T07:00-0000&version=1'
# create function to make urls
def make_url(year, month):
    last_day_of_month = calendar.monthrange(int(year), int(month))[1]
    return URL_FORMAT_STRING.format(year=year, month=month, last_day_of_month=last_day_of_month)
# build urls for download
for y in oasis_market_yr:
    for m in oasis_market_mn:
        url = make_url(y, m)
I've tried using sapply and mapply with str_glue and a few other methods but can't seem to replicate the outcome. I keep getting an error that reads: Error: Variables must be length 1 or 5. Or, for instance with mapply, it maps the first value in one list to the first in the other list and so on, then returns when the short list runs out of values. What I need is all the combinations from both lists.
Any assistance would be much appreciated.
Your syntax is a little too close to Python and won't work like that in R.
In R, the same syntax would look like this:
# set years and months
oasis_market_yr = c('2020','2019','2018','2017','2016','2015','2014','2013','2012','2011')
oasis_market_mn = c('01','02','03','04','05','06','07','08','09','10','11','12')
# create function to make urls
make_url = function(year, month){
  # format url string
  URL_FORMAT_STRING = 'http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_{year}_M{month}_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime={year}{month}01T07:00-0000&enddatetime={year}{month}{last_day_of_month}T07:00-0000&version=1'
  # days per month; February gets 29 days in leap years
  lastdays = c(31,28,31,30,31,30,31,31,30,31,30,31)
  yr = as.integer(year)
  if((yr %% 4 == 0 && yr %% 100 != 0) || yr %% 400 == 0){lastdays[2] = 29}
  last_day_of_month = as.character(lastdays[as.integer(month)])
  # fill in the placeholders one at a time
  fs = gsub("{month}", month, URL_FORMAT_STRING, fixed=TRUE)
  fs = gsub("{year}", year, fs, fixed=TRUE)
  fs = gsub("{last_day_of_month}", last_day_of_month, fs, fixed=TRUE)
  return(fs)
}
# build urls for download
for(y in oasis_market_yr){
  for(m in oasis_market_mn){
    url = make_url(y, m)
    print(url)
  }
}
As I am not aware of a direct equivalent of Python's string formatting method in R, I changed it to replacements (a = gsub(pattern, replacement, a, fixed=TRUE) corresponds to the Python command a = a.replace(pattern, replacement)). It should work beautifully.
Also, you don't really need a calendar package to get the last day of each month. Just supply the days per month as a vector and adjust February for leap years, and Bob's your uncle.
I don't know whether the URLs that are generated are really the ones you need. But you might be able to work from this translation to correct it, if something is wrong.
An option using glue and lubridate. Note I added _i to the {month} and {year} variables to avoid confusion with the month and year functions in lubridate.
library(glue)
library(lubridate)
URL_FORMAT_STRING <- 'http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_{year_i}_M{month_i}_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime={year_i}{month_i}01T07:00-0000&enddatetime={year_i}{month_i}{last_day_of_month}T07:00-0000&version=1'
make_url<- function(year_i, month_i){
last_day_of_month <- day(ceiling_date(my(paste(month_i, year_i)), 'month') - days(1))
glue(URL_FORMAT_STRING)
}
And then rather than a nested for loop you can use mapply to apply your function to all combinations of oasis_market_yr and oasis_market_mn.
df_vars <- expand.grid(year_i = oasis_market_yr, month_i = oasis_market_mn)
mapply(make_url, df_vars$year_i, df_vars$month_i)
# [1] "http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_2020_M01_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime=20200101T07:00-0000&enddatetime=20200131T07:00-0000&version=1"
# [2] "http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_2019_M01_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime=20190101T07:00-0000&enddatetime=20190131T07:00-0000&version=1"
#....
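If you would rather avoid extra packages, here is a rough base R sketch of the same idea: build every year/month combination with expand.grid and get the last day of each month from Date arithmetic. The helper name last_day_of_month, the shortened year and month vectors, and the date_part column are only illustrative; date_part is just the date portion of the query string, not the full URL.

```r
# Hypothetical helper: last day of a month using only base R dates.
last_day_of_month <- function(year, month) {
  first <- as.Date(sprintf("%s-%s-01", year, month))
  # First day of the following month, minus one day
  next_first <- seq(first, by = "month", length.out = 2)[2]
  as.integer(format(next_first - 1, "%d"))
}

# Shortened example vectors (the question uses ten years and twelve months)
years  <- c("2020", "2019")
months <- c("01", "02", "12")

# One row per combination; the first variable (year) varies fastest
grid <- expand.grid(year = years, month = months, stringsAsFactors = FALSE)
grid$last_day <- mapply(last_day_of_month, grid$year, grid$month,
                        USE.NAMES = FALSE)

# Date portion of the query string for each combination
grid$date_part <- sprintf(
  "startdatetime=%s%s01T07:00-0000&enddatetime=%s%s%02dT07:00-0000",
  grid$year, grid$month, grid$year, grid$month, grid$last_day
)
```

Setting stringsAsFactors = FALSE keeps the years and months as plain character vectors, which avoids surprises when they are pasted into strings.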

How to create a new column using function in R?

I have a data frame containing geographic positions. The positions are strings.
This is my function to parse the strings and get the positions in decimal degrees.
Example position: 23º 30.0'N
latitud.decimal <- function(y) {
latregex <- str_match(y,"(\\d+)º\\s(\\d*.\\d*).(.)")
latitud <- (as.numeric(latregex[1,2])) +((as.numeric(latregex[1,3])) / 60)
if (latregex[1,4]=="S") {latitud <- -1*latitud}
return(latitud)
}
Result: 23.5
Then I would like to create a new column in my original data frame by applying the function to every item in the Latitude column.
The same goes for the longitude, which needs another new column.
I know how to do this using Python and Pandas, but I am a newbie in R and cannot find the solution.
I am trying with
lapply(datos$Latitude, 2 , FUN= latitud.decimal(y))
but it does not read the y "argument", which is every column value.
Note that str_match is vectorized, as stated in the help page of the function, help("str_match").
For the sake of answering the question, I lack a reproducible example and data. This page describes how one can make questions more reproducible and thus obtain better answers.
As I lack data and code, I cannot test whether I am actually hitting the spot, but I will give it a shot anyway.
Using the fact that str_match is vectorized, we can apply the entire function without using lapply, and thus create a new column simply. I'll slightly rewrite your function to incorporate the vectorization. Note the missing 1's in latregex[., .]
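library(stringr)  # provides str_match
latitud.decimal <- function(y) {
  # one row per input string: full match, degrees, minutes, hemisphere
  latregex <- str_match(y,"(\\d+)º\\s(\\d*.\\d*).(.)")
  latitud <- as.numeric(latregex[, 2]) + as.numeric(latregex[, 3]) / 60
  # southern latitudes are negative
  which_south <- which(latregex[, 4] == "S")
  latitud[which_south] <- -latitud[which_south]
  latitud
}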
Now that the function is ready, creating a column can be done using the $ operator. If the data is very large, it can be done more efficiently using the data.table package. See this Stack Overflow page for an example of how to assign via data.table.
In base R we would simply perform the action as
datos$new_column <- latitud.decimal(datos$Latitude)
datos$lat_decimal = sapply(datos$Latitude, latitud.decimal)
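Since no sample data was posted, here is a small self-contained sketch with made-up values that covers both latitude and longitude. It is a base R variant (regexec and regmatches instead of str_match), and the helper name to_decimal_degrees, the Longitude column and the data are assumptions for illustration only.

```r
# Hypothetical helper: "23º 30.0'N" style strings to signed decimal degrees.
to_decimal_degrees <- function(pos) {
  pos <- as.character(pos)
  # Capture degrees, decimal minutes and the hemisphere letter
  pattern <- "^([0-9]+)[^0-9]+([0-9]+\\.?[0-9]*)[^0-9]*([NSEW])$"
  parts <- regmatches(pos, regexec(pattern, pos))
  vapply(parts, function(p) {
    if (length(p) != 4) return(NA_real_)  # string did not match the pattern
    value <- as.numeric(p[2]) + as.numeric(p[3]) / 60
    # South and West are negative by convention
    if (p[4] %in% c("S", "W")) -value else value
  }, numeric(1))
}

# Made-up example data
datos <- data.frame(
  Latitude  = c("23º 30.0'N", "10º 15.0'S"),
  Longitude = c("45º 45.0'W", "3º 6.0'E"),
  stringsAsFactors = FALSE
)

# The function is vectorized, so whole columns can be assigned directly
datos$lat_decimal <- to_decimal_degrees(datos$Latitude)
datos$lon_decimal <- to_decimal_degrees(datos$Longitude)
```

Strings that do not fit the expected layout come back as NA rather than raising an error, which makes bad rows easy to spot afterwards.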

Converting time series into vector

I am looking for an approach to convert a time series data into vectors. An example of what I am trying to achieve is given below.
Data x = x1,x2,x3,..x100
Required vectors = V1(x1,x2,x3),V2(x2,x3,x4), V3(x3,x4,x5).. v98(x98,x99,x100)
I could convert the complete time series into Vector. But I do not know how I could achieve the above result.
Thanks for all leads.
I am trying this in R.
Use embed(x,98).
(Entering extra characters just to post this.)
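To make the shape of the result easier to see, here is a short sketch with a made-up series. With embed(x, 3) each row is one window, but in reverse time order, so the columns are flipped back; with embed(x, 98) on 100 values the windows come out as columns instead, last window first.

```r
# Made-up series standing in for x1 ... x10
x <- 1:10

# Each row of embed(x, 3) is (x[i+2], x[i+1], x[i]); reverse the columns
windows <- embed(x, 3)[, 3:1]
first_window <- windows[1, ]             # x1, x2, x3
last_window  <- windows[nrow(windows), ] # x8, x9, x10

# The form from the answer, for a series of length 100:
x100 <- 1:100
cols <- embed(x100, 98)  # 3 rows, 98 columns
v1  <- cols[, 98]        # x1, x2, x3
v98 <- cols[, 1]         # x98, x99, x100
```

Which orientation is more convenient depends on whether the windows should be rows or columns of the resulting matrix.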

Converting time-stamp to correct class in R

I have timestamp data that is of the "factor" class. It looks as follows:
"193:00:11" ; where it is hours:minutes:seconds ...
I am trying to convert this to the right timestamp class so I can perform calculations on it (like determine the mean, max, minimum etc.,). I have tried using lubridate, and doing:
hhmmss(df1$time) ; but this does not work and just gives me the seconds back.
Thank you for the help.
If the strings/factors are always in this format, this will give the number of seconds elapsed. The data must be in a character vector, so convert a factor with as.character() first.
#example data
tm <- c("193:01:11", "96:22:47", "1:01:01", "2:02:02")
tmm <- matrix(as.numeric(unlist(strsplit(tm,":"))),ncol=3, byrow=T)
tmm %*% c(3600, 60, 1)
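Because the question mentions factor data, here is a hedged sketch that wraps the same arithmetic in a function, converts the factor explicitly and returns a plain numeric vector ready for mean, max and min. The name hms_to_seconds is made up, and the data is the example data from above stored as a factor.

```r
# Hypothetical helper: "hours:minutes:seconds" strings or factors to seconds.
hms_to_seconds <- function(x) {
  # as.character() matters for factors; as.numeric() on a factor
  # would return the internal level codes instead of the values
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts,
         function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

# Example data stored as a factor
time_factor <- factor(c("193:01:11", "96:22:47", "1:01:01", "2:02:02"))
secs <- hms_to_seconds(time_factor)

# Ordinary summaries now work on the numeric vector
mean_secs <- mean(secs)
max_secs  <- max(secs)
min_secs  <- min(secs)
```

Unlike the matrix product above, which returns a one-column matrix, this returns a vector that can be stored directly as a data frame column.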

Unable to Convert Chi-Squared Values into a Numeric Column in R

I've been working on a project for a little bit for a homework assignment and I've been stuck on a logistical problem for a while now.
What I have at the moment is a list that returns 10000 values in the format:
[[10000]]
X-squared
0.1867083
(This is the 10000th value of the list)
What I really would like is to just have the chi-squared value alone so I can do things like create a histogram of the values.
Is there any way I can do this? I'm fine with repeating the test from the start if necessary.
My current code is:
nsims = 10000
malig = numeric(nsims)              # preallocate the simulated counts
cancerchi = vector("list", nsims)   # preallocate the list of results
cancer.cells <- c(rep("M",24),rep("B",13))
for (i in 1:nsims) {
  malig[i] <- sum(sample(cancer.cells,21)=="M")
}
benign = 21 - malig
rbenign = 13 - benign
rmalig = 24 - malig
for (i in 1:nsims) {
  test = cbind(c(rbenign[i],benign[i]),c(rmalig[i],malig[i]))
  cancerchi[i] = chisq.test(test,correct=FALSE)
}
It gives me all I need, I just cannot perform follow-up analysis on it such as creating a histogram.
Thanks for taking the time to read this!
I'll provide an answer at the suggestion of @Dr. Mike.
hist requires a vector as input. The reason that hist(cancerchi) will not work is because cancerchi is a list, not a vector.
There are several ways to convert cancerchi from a list into a format that hist can work with. The most direct is to unlist it in the call:
hist(unlist(cancerchi))
Note that if you do not reassign cancerchi it will still be a list and cannot be passed directly to hist.
# i.e
class(cancerchi)
hist(cancerchi) # will still give you an error
If you reassign, it can be another type of object:
(class(cancerchi2 <- unlist(cancerchi)))
(class(cancerchi3 <- as.data.frame(unlist(cancerchi))))
# using the ldply function in the plyr package
library(plyr)
(class(cancerchi4 <- ldply(cancerchi)))
These new objects can be passed to hist (select the column first when the object is a data frame):
hist(cancerchi2)
hist(cancerchi3[,1]) # specify column because cancerchi3 is a data frame, not a vector
hist(cancerchi4[,1]) # specify column because cancerchi4 is a data frame, not a vector
A little extra information: other useful commands for looking at your objects include str and attributes.
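If you are happy to repeat the test from the start, another option is to keep the full test objects and pull out just the statistic afterwards. The following is a sketch under the same setup as the question, with fewer simulations; the names run_one_test, tests and chi_values are made up, and suppressWarnings is there only because chisq.test may warn about small expected counts.

```r
set.seed(1)  # only so the sketch is reproducible
nsims <- 200
cancer.cells <- c(rep("M", 24), rep("B", 13))

# Hypothetical helper: simulate one table and run the test on it
run_one_test <- function(i) {
  malig  <- sum(sample(cancer.cells, 21) == "M")
  benign <- 21 - malig
  tab <- cbind(c(13 - benign, benign), c(24 - malig, malig))
  suppressWarnings(chisq.test(tab, correct = FALSE))
}

# Keep the complete test objects in a list ...
tests <- lapply(seq_len(nsims), run_one_test)

# ... then extract the X-squared value from each into a numeric vector
chi_values <- vapply(tests, function(res) unname(res$statistic), numeric(1))
```

chi_values is a plain numeric vector, so hist(chi_values) works without further conversion, and the p-values remain available through each element's p.value component.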
