Creating a time object in R

I have the following data
>d
2010-07-02
>t
835
I issue the following command
>dt<-paste(d,t)
>dt
"2010-07-02 835"
I then issue the following command, and it returns NA as shown below:
>dtt<-as.POSIXlt(strptime(dt,'%Y-%m-%d %H%M'))
>dtt
NA
so I made the following change
>t=1001
and now when I run
dt followed by dtt, it works fine, returning
dtt
"2010-07-02 10:01:00"
So it seems to me that the problem occurs when the first digit of the hour is 0, which is why NA is generated whenever HHMM is less than 1000. Can anyone please suggest how I can fix this? Thanks!

Use sprintf to format your time string before you pass it to as.POSIXlt:
d <- "2010-07-02"
h <- 835
dtt <- sprintf("%s %04d", d, h)
as.POSIXlt(dtt, format="%Y-%m-%d %H%M")
[1] "2010-07-02 08:35:00"
The string "%s %04d" tells sprintf to concatenate d as a string (%s) and h as a fixed width string of length 4, with leading zeroes (%04d).
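Since sprintf is vectorized, the same padding should also work on whole columns. A minimal sketch, assuming the dates are character strings and the times are whole numbers in HHMM form (the sample values and the names hhmm, stamp and parsed are made up for illustration):

```r
# Hypothetical sample data: dates as strings, times as HHMM numbers
d <- c("2010-07-02", "2010-07-02", "2010-07-03")
hhmm <- c(835, 1001, 5)

# Pad every time to four digits so %H%M always sees two digits for the hour
stamp <- sprintf("%s %04d", d, hhmm)

# tz = "UTC" is an assumption here; use whatever zone the data were recorded in
parsed <- as.POSIXct(stamp, format = "%Y-%m-%d %H%M", tz = "UTC")
formatted <- format(parsed, "%Y-%m-%d %H:%M")
print(formatted)
# [1] "2010-07-02 08:35" "2010-07-02 10:01" "2010-07-03 00:05"
```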

Related

Error in converting character to time variable in r with Lubridate packages

Thanks for your help.
One of variable in my dataset looks like this:
> df$TM
> [1] "000054" "000020" "000056" "000051" "000025" "000116" "000219" "000207" "000233" "000206" "000142" "000126" "000237" "000215" "000236" "000246" "000219"
[18] "000227" "000803" "000920"...
Each pair of characters represents hours, minutes and seconds.
When I apply the hms function from lubridate as follows
> df$TM <- hms(df$TM)
I get this warning: "In .parse_hms(..., order = "HMS", quiet = quiet) :
Some strings failed to parse, or all strings are NAs"
After that, all the values in the column change to NA.
I also tried
> df$TM <- as.POSIXct(df$TM, format = "%H:%M:%S")
and
> df$TM <- chronicle(times = df$TM)
and
> df$TM <- strptime(df$TM, format = "%H:%M:%S")
but these three attempts give the same result.
(All the data changes to NA, so the warning is effectively an error for me.)
I really appreciate your help.
You can make use of this answer to insert a colon after every second character. After that you can convert the resulting character string to a date-time (with day, month and year) or leave it as is.
For completeness, the solution to your problem is then
as.POSIXct(sub(":+$", "", gsub('(.{2})', '\\1:', df$TM)), format = "%H:%M:%S")
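Because the strings are fixed-width HHMMSS, it may also be possible to skip the colon insertion and simply leave the separators out of the format string. A small sketch under that assumption (the sample vector tm and the names clock and secs are only for illustration); it also shows seconds since midnight, in case no date part is wanted:

```r
# Hypothetical sample taken from the values shown in the question
tm <- c("000054", "000803", "000920")

# Parse without separators; the date part defaults to the current day
parsed <- as.POSIXct(tm, format = "%H%M%S", tz = "UTC")
clock <- format(parsed, "%H:%M:%S")
print(clock)
# [1] "00:00:54" "00:08:03" "00:09:20"

# Alternative: plain seconds since midnight, no date involved
secs <- as.numeric(substr(tm, 1, 2)) * 3600 +
  as.numeric(substr(tm, 3, 4)) * 60 +
  as.numeric(substr(tm, 5, 6))
print(secs)
# [1]  54 483 560
```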

Concatenate multiple strings in R

I want to concatenate the URL parts below, and I have written the following function to do so:
library(datetime)
library(lubridate)
get_thredds_url<- function(mon, hr){
a <-"http://abc.co.in/"
b <-"thredds/path/"
c <-paste0("%02d", ymd_h(mon))
d <-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e <-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c, hr))
url <-paste0(a,b,b,d)
return (url)
}
mon = datetime(2017, 9, 26, 0)
hr = 240
url = get_thredds_url(mon,hr)
print (url)
But I get the error below when I execute the definition of get_thredds_url():
Error: unexpected ',' in:
" d<-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e<-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c,"
url <-paste0(a,b,b,d)
Error in paste0(a, b, b, d) : object 'a' not found
return (url)
Error: no function to return from, jumping to top level
}
Error: unexpected '}' in "}"
What is wrong with my function and how can I solve this?
The final output should be:
http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240
Using sprintf allows more control over the values being inserted into the string:
library(lubridate)
get_thredds_url<- function(mon, hr){
sprintf("http://abc.co.in/thredds/path/%s/gfs.t%02dz.pgrb2.0p25.f%03d",
strftime(mon, format = "%Y%m%d%H", tz = "UTC"),
hour(mon),
hr)
}
mon <- make_datetime(2017, 9, 26, 0, tz = "UTC")
hr <- 240
get_thredds_url(mon, hr)
[1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
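The same sprintf template can also be filled for several forecast hours at once, since sprintf recycles its arguments. A sketch in base R only, assuming the run time is a POSIXct value in UTC (the names run, fh and urls are made up for illustration):

```r
# Assumed inputs: one model run time and a vector of forecast hours
run <- as.POSIXct("2017-09-26 00:00", tz = "UTC")
fh <- c(0, 6, 240)

urls <- sprintf(
  "http://abc.co.in/thredds/path/%s/gfs.t%sz.pgrb2.0p25.f%03d",
  format(run, "%Y%m%d%H"),  # 2017092600
  format(run, "%H"),        # zero-padded run hour, 00
  fh                        # padded to three digits by %03d
)
print(urls)
```

The last element is the URL requested in the question; the first two end in f000 and f006.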
It was a bit hard to figure out what you're trying to do. There seem to be quite a few contradictory pieces in your code, especially compared to your desired final output. Therefore, I decided to focus on the desired output and the inputs you provided in your variables.
get_thredds_url <- function(yr, mnth, day, hrs1, hrs2){
part1 <- "http://abc.co.in/"
part2 <- "thredds/path/"
ymdh <- c(yr, formatC(c(mnth, day, hrs1), width=2, flag="0"))
part3 <- paste0(ymdh, collapse="")
pre4 <- formatC(hrs1, width=2, flag="0")
part4 <- paste0("/gfs.t", pre4, "z.pgrb2.0p25.f", hrs2)
return(paste0(part1, part2, part3, part4))
}
get_thredds_url(2017, 9, 26, 0, 240)
# [1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
The key is using paste0() appropriately; formatC() may be new to some people (it was to me).
formatC() is used here to pad the number you provide with leading zeros, which makes sure that 9 is converted to 09, whereas 12 remains 12.
Note that this answer is in base R and does not require additional packages.
Also note that you should avoid url and c as variable names. Both are names of existing base R functions; reusing them for variables masks those functions, which makes the code harder to read and can lead to problems down the road.
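To see the padding on its own, here is a short sketch comparing formatC() with the equivalent sprintf() call. Note that the function above pastes hrs2 without padding, which is fine for 240 but would give f6 for a forecast hour of 6; padding to width 3 is shown as one possible way around that (the variable names are only for illustration):

```r
# Pad to two digits: 9 becomes "09", 12 stays "12"
two <- formatC(c(9, 12), width = 2, flag = "0")
print(two)
# [1] "09" "12"

# Pad a forecast hour to three digits, e.g. for the f%03d part of the URL
three <- formatC(c(6, 240), width = 3, flag = "0")
print(three)
# [1] "006" "240"

# sprintf gives the same strings for integer input
same <- sprintf("%03d", c(6L, 240L))
print(identical(three, same))
# [1] TRUE
```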

Truncate decimal to specified places

This seems like it should be a fairly easy problem to solve but I am having some trouble locating an answer.
I have a vector which contains long decimals and I want to truncate it to a specific number of decimals. I do not wish to round it, but rather just remove the values beyond my desired number of decimals.
For example I would like 0.123456789 to return 0.1234 if I desired 4 decimal digits. This is not an issue of printing a specific number of digits but rather returning the original value truncated to a given number.
Thanks.
trunc(x*10^4)/10^4
yields 0.1234 as expected.
More generally,
trunc <- function(x, ..., prec = 0) base::trunc(x * 10^prec, ...) / 10^prec
print(trunc(0.123456789, prec = 4)) # 0.1234
print(trunc(14035, prec = -2)) # 14000
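A variant of the same idea that does not mask base::trunc, as a minimal sketch (truncate_dec is a made-up name, not a base R function). One caveat: the multiplication happens in binary floating point, so a product that lands just under a whole number (0.29 * 100 is a commonly cited case) may lose a digit when truncated.

```r
# Truncate toward zero at a given number of decimal places.
# Negative 'digits' truncates to tens, hundreds, and so on.
truncate_dec <- function(x, digits = 0) {
  scale <- 10^digits
  trunc(x * scale) / scale
}

print(truncate_dec(0.123456789, 4))    # 0.1234
print(truncate_dec(-0.123456789, 4))   # -0.1234 (toward zero, not floor)
print(truncate_dec(14035, -2))         # 14000
print(truncate_dec(c(1.999, 2.001), 2))
```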
I used the techniques above for a long time. One day I had some issues when copying the results to a text file, and I solved my problem this way:
trunc_number_n_decimals <- function(numberToTrunc, nDecimals){
numberToTrunc <- numberToTrunc + (10^-(nDecimals+5))
splitNumber <- strsplit(x=format(numberToTrunc, digits=20, format=f), split="\\.")[[1]]
decimalPartTrunc <- substr(x=splitNumber[2], start=1, stop=nDecimals)
truncatedNumber <- as.numeric(paste0(splitNumber[1], ".", decimalPartTrunc))
return(truncatedNumber)
}
print(trunc_number_n_decimals(9.1762034354551236, 6), digits=14)
[1] 9.176203
print(trunc_number_n_decimals(9.1762034354551236, 7), digits=14)
[1] 9.1762034
print(trunc_number_n_decimals(9.1762034354551236, 8), digits=14)
[1] 9.17620343
print(trunc_number_n_decimals(9.1762034354551236, 9), digits=14)
[1] 9.176203435
This solution is very handy when it's necessary to write a number with many decimals, such as 16, to a file.
Just remember to convert the number to a string before writing it to the file, using format():
numberToWrite <- format(trunc_number_n_decimals(9.1762034354551236, 9), digits=20)
Not the most elegant way, but it'll work.
string_it<-sprintf("%06.9f", old_numbers)
pos_list<-gregexpr(pattern="\\.", string_it)
pos<-unlist(lapply(pos_list, '[[', 1)) # This returns a vector with the first elements
# you're probably going to have to play around with the pos- numbers here
new_number<-as.numeric(substring(string_it, pos-1,pos+4))

Scrape number of articles on a topic per year from NYT and WSJ?

I would like to create a data frame that scrapes the NYT and WSJ and has the number of articles on a given topic per year. That is:
NYT WSJ
2011 2 3
2012 10 7
I found this tutorial for the NYT, but it is not working for me. When I get to line 30 I get this error:
> cts <- as.data.frame(table(dat))
Error in provideDimnames(x) :
length of 'dimnames' [1] not equal to array extent
Any help would be much appreciated.
Thanks!
PS: This is my code that is not working (A NYT api key is needed http://developer.nytimes.com/apps/register)
# Need to install from source http://www.omegahat.org/RJSONIO/RJSONIO_0.2-3.tar.gz
# then load:
library(RJSONIO)
### set parameters ###
api <- "API key goes here" ###### <<<API key goes here!!
q <- "MOOCs" # Query string, use + instead of space
records <- 500 # total number of records to return, note limitations above
# calculate parameter for offset
os <- 0:(records/10-1)
# read first set of data in
uri <- paste ("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[1], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn="F") # get them
res <- fromJSON(raw.data) # tokenize
dat <- unlist(res$results) # convert the dates to a vector
# read in the rest via loop
for (i in 2:length(os)) {
# concatenate URL for each offset
uri <- paste ("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[i], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn="F")
res <- fromJSON(raw.data)
dat <- append(dat, unlist(res$results)) # append
}
# aggregate counts for dates and coerce into a data frame
cts <- as.data.frame(table(dat))
# establish date range
dat.conv <- strptime(dat, format="%Y%m%d") # need to convert dat into POSIX format for this
daterange <- c(min(dat.conv), max(dat.conv))
dat.all <- seq(daterange[1], daterange[2], by="day") # all possible days
# compare dates from counts dataframe with the whole data range
# assign 0 where there is no count, otherwise take count
# (take out PSD at the end to make it comparable)
dat.all <- strptime(dat.all, format="%Y-%m-%d")
# can't seem to be able to compare POSIX objects with %in%, so coerce them to character for this:
freqs <- ifelse(as.character(dat.all) %in% as.character(strptime(cts$dat, format="%Y%m%d")), cts$Freq, 0)
plot (freqs, type="l", xaxt="n", main=paste("Search term(s):",q), ylab="# of articles", xlab="date")
axis(1, 1:length(freqs), dat.all)
lines(lowess(freqs, f=.2), col = 2)
UPDATE: the repo is now at https://github.com/rOpenGov/rtimes
There is a RNYTimes package created by Duncan Temple-Lang https://github.com/omegahat/RNYTimes - but it is outdated because the NYTimes API is on v2 now. I've been working on one for political endpoints only, but not relevant for you.
I'm rewriting RNYTimes right now. Install it from GitHub; you need to install devtools first to get install_github:
install.packages("devtools")
library(devtools)
install_github("rOpenGov/RNYTimes")
Then try your search with that, e.g.,
library(RNYTimes); library(plyr)
moocs <- searchArticles("MOOCs", key = "<yourkey>")
This gives you number of articles found
moocs$response$meta$hits
[1] 121
You could get word counts for each article by
as.numeric(sapply(moocs$response$docs, "[[", 'word_count'))
[1] 157 362 1316 312 2936 2973 355 1364 16 880
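Once the article dates are available as a vector, the per-year counts from the question can be tabulated in base R. This is only a sketch: the dates below are invented, the YYYYMMDD format is taken from the strptime call in the question, and count_by_year is a hypothetical helper. The length check is there because an empty result from the API is one plausible cause of the table() error in the question.

```r
# Made-up article dates in the YYYYMMDD form used in the question
dat <- c("20110315", "20111102", "20120101", "20120520", "20121231")

count_by_year <- function(dates) {
  # Guard against an empty result set before tabulating
  if (length(dates) == 0L) {
    return(data.frame(year = character(0), n = integer(0),
                      stringsAsFactors = FALSE))
  }
  years <- format(as.Date(dates, format = "%Y%m%d"), "%Y")
  counts <- table(years)
  data.frame(year = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

nyt <- count_by_year(dat)
print(nyt)
#   year n
# 1 2011 2
# 2 2012 3
```

Running the same helper on the dates from a second source and merging the two data frames by year would give the NYT/WSJ layout sketched in the question.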

Converting geo coordinates from degree to decimal

I want to convert my geographic coordinates from degrees and decimal minutes to decimal degrees. My data are as follows:
lat long
105252 30°25.264 9°01.331
105253 30°39.237 8°10.811
105255 31°37.760 8°06.040
105258 31°41.190 8°06.557
105259 31°41.229 8°06.622
105260 31°38.891 8°06.281
I have this code but I cannot see why it does not work:
convert<-function(coord){
tmp1=strsplit(coord,"°")
tmp2=strsplit(tmp1[[1]][2],"\\.")
dec=c(as.numeric(tmp1[[1]][1]),as.numeric(tmp2[[1]]))
return(dec[1]+dec[2]/60+dec[3]/3600)
}
don_convert=don1
for(i in 1:nrow(don1)){don_convert[i,2]=convert(as.character(don1[i,2])); don_convert[i,3]=convert(as.character(don1[i,3]))}
The convert function works, but the loop that is supposed to apply it to every row does not.
Any suggestion is appreciated.
Use the measurements package from CRAN which has a unit conversion function already so you don't need to make your own:
x = read.table(text = "
lat long
105252 30°25.264 9°01.331
105253 30°39.237 8°10.811
105255 31°37.760 8°06.040
105258 31°41.190 8°06.557
105259 31°41.229 8°06.622
105260 31°38.891 8°06.281",
header = TRUE, stringsAsFactors = FALSE)
Once your data.frame is set up then:
# change the degree symbol to a space
x$lat = gsub('°', ' ', x$lat)
x$long = gsub('°', ' ', x$long)
# convert from decimal minutes to decimal degrees
x$lat = measurements::conv_unit(x$lat, from = 'deg_dec_min', to = 'dec_deg')
x$long = measurements::conv_unit(x$long, from = 'deg_dec_min', to = 'dec_deg')
Resulting in the end product:
lat long
105252 30.4210666666667 9.02218333333333
105253 30.65395 8.18018333333333
105255 31.6293333333333 8.10066666666667
105258 31.6865 8.10928333333333
105259 31.68715 8.11036666666667
105260 31.6481833333333 8.10468333333333
Try using the char2dms function in the sp library. It has other functions that will additionally do decimal conversion.
library("sp")
?char2dms
A bit of vectorization and matrix manipulation will make your function much simpler:
x <- read.table(text="
lat long
105252 30°25.264 9°01.331
105253 30°39.237 8°10.811
105255 31°37.760 8°06.040
105258 31°41.190 8°06.557
105259 31°41.229 8°06.622
105260 31°38.891 8°06.281",
header=TRUE, stringsAsFactors=FALSE)
x
The function itself makes use of:
strsplit() with the regex pattern "[°\\.]" - this does the string split in one step
sapply to loop over the vector
Try this:
convert<-function(x){
z <- sapply((strsplit(x, "[°\\.]")), as.numeric)
z[1, ] + z[2, ]/60 + z[3, ]/3600
}
Try it:
convert(x$long)
[1] 9.108611 8.391944 8.111111 8.254722 8.272778 8.178056
Disclaimer: I didn't check your math. Use at your own discretion.
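On the math: the sample data look like degrees and decimal minutes, so the digits after the point are a fraction of a minute rather than seconds. Under that assumption the conversion is just degrees + minutes/60, which agrees with the conv_unit output shown earlier. A base R sketch (dm_to_dd is a made-up name; the degree sign is written as "\u00b0" to avoid encoding surprises):

```r
# Convert "DD°MM.mmm" strings (degrees and decimal minutes) to decimal degrees
dm_to_dd <- function(x) {
  parts <- strsplit(x, "\u00b0", fixed = TRUE)
  deg <- as.numeric(vapply(parts, `[`, character(1), 1))
  mins <- as.numeric(vapply(parts, `[`, character(1), 2))
  deg + mins / 60
}

res <- dm_to_dd(c("30\u00b025.264", "9\u00b001.331"))
print(res)
# [1] 30.421067  9.022183
```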
Thanks to @Gord Stephen and @CephBirk for their answers. They sure helped me out.
I thought I'd mention that measurements::conv_unit doesn't handle "E/W" or "N/S" entries; it requires positive/negative degrees.
My coordinates come as character strings such as "1 1 1W" and first need to be converted to "-1 1 1".
I thought I'd share my solution for that.
df <- c("1 1 1E", "1 1 1W", "2 2 2N","2 2 2S")
measurements::conv_unit(df, from = 'deg_min_sec', to = 'dec_deg')
[1] "1.01694444444444" NA NA NA
Warning message:
In split(as.numeric(unlist(strsplit(x, " "))) * c(3600, 60, 1), :
NAs introduced by coercion
library(stringr) # provides str_extract, str_sub and str_length
ewns <- ifelse( str_extract(df,"\\(?[EWNS,.]+\\)?") %in% c("E","N"),"+","-")
dms <- str_sub(df,1,str_length(df)-1)
df2 <- paste0(ewns,dms)
df_dec <- measurements::conv_unit(df2,
from = 'deg_min_sec',
to = 'dec_deg')
df_dec
[1] "1.01694444444444" "-1.01694444444444" "2.03388888888889" "-2.03388888888889"
as.numeric(df_dec)
[1] 1.016944 -1.016944 2.033889 -2.033889
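For comparison, the same hemisphere handling can be sketched in base R without stringr or measurements. This assumes space-separated "D M S" strings with a single trailing E, W, N or S, as in the example above; dms_to_dd is a made-up name. Applying the sign to the final value rather than to the degree field also covers coordinates whose degree part is 0.

```r
# Convert "D M S<hemisphere>" strings to signed decimal degrees
dms_to_dd <- function(x) {
  hemi <- substring(x, nchar(x))              # last character: E, W, N or S
  body <- substring(x, 1, nchar(x) - 1)       # "D M S" without the letter
  parts <- strsplit(body, " ", fixed = TRUE)
  dd <- vapply(parts,
               function(p) sum(as.numeric(p) / c(1, 60, 3600)),
               numeric(1))
  ifelse(hemi %in% c("W", "S"), -dd, dd)      # west and south are negative
}

out <- dms_to_dd(c("1 1 1E", "1 1 1W", "2 2 2N", "2 2 2S"))
print(out)
# [1]  1.016944 -1.016944  2.033889 -2.033889
```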
Have a look at the command degree in the package OSMscale.
As Jim Lewis commented before, it seems you are using floating-point minutes. In that case you only concatenate two elements in
dec=c(as.numeric(tmp1[[1]][1]),as.numeric(tmp2[[1]]))
My coordinates have degrees, minutes and seconds in the form 43°21'8.02", which as.character() returns as "43°21'8.02\"", so I updated your function to
convert<-function(coord){
tmp1=strsplit(coord,"°")
tmp2=strsplit(tmp1[[1]][2],"'")
tmp3=strsplit(tmp2[[1]][2],"\"")
dec=c(as.numeric(tmp1[[1]][1]),as.numeric(tmp2[[1]][1]),as.numeric(tmp3[[1]]))
# the result is named dd rather than c so the base function c() is not masked
dd<-abs(dec[1])+dec[2]/60+dec[3]/3600
dd<-ifelse(dec[1]<0,-dd,dd)
return(dd)
}
This adds handling for negative coordinates and works great for me. I still don't understand why the char2dms function in the sp package didn't work for me.
Thanks
Another less elegant option using substring instead of strsplit. This will only work if all your positions have the same number of digits. For negative co-ordinates just multiply by -1 for the correct decimal degree.
x$LatDD<-(as.numeric(substring(x$lat, 1,2))
+ (as.numeric(substring(x$lat, 4,9))/60))
x$LongDD<-(as.numeric(substring(x$long, 1,1))
+ (as.numeric(substring(x$long, 3,8))/60))
