Sum aggregation in elastic search

I have data (df) in this format. I need to convert the timestamp column (tweetCreatedAt) into a date-time object so that I can manipulate the data further.
tweetCreatedAt comment_text
1 2014-05-17T00:00:49.000Z #truthout: India Elects Hard-Right Hindu
2 2014-05-17T00:00:49.000Z Narendra Modi is welcome to visit US !
Any help?
I have tried the following
df[,1] <- lapply(df[,1],function(x) as.POSIXct(x, '%Y-%m-%dT%H:%M:%S'))
But now I'm getting the dates only and not the actual time.

Not sure if this is the problem, but it's a possible one.
As I mentioned in my comment, the elements of a column can be single values or lists, depending on the process that generated the dataset.
Check this example:
# simplified example
dt = read.table(text = "tweetCreatedAt comment_text
1 2014-05-17T00:00:49.000Z #truthout
2 2014-05-19T00:00:49.000Z Narendra", header=T)
dt$tweetCreatedAt = as.character(dt$tweetCreatedAt)
# data set looks like
dt
# tweetCreatedAt comment_text
# 1 2014-05-17T00:00:49.000Z #truthout
# 2 2014-05-19T00:00:49.000Z Narendra
as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')
# [1] "2014-05-17 00:00:49 BST" "2014-05-19 00:00:49 BST"
# let's manually change this element to a list
dt$tweetCreatedAt[2] = list(c("2014-05-19T00:00:49.000Z","2014-05-20T00:00:49.000Z"))
# data set now looks like this
dt
# tweetCreatedAt comment_text
# 1 2014-05-17T00:00:49.000Z #truthout
# 2 2014-05-19T00:00:49.000Z, 2014-05-20T00:00:49.000Z Narendra
as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')
# Error in as.POSIXct.default(dt$tweetCreatedAt, format = "%Y-%m-%dT%H:%M:%S") :
# do not know how to convert 'dt$tweetCreatedAt' to class “POSIXct”
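One way to check for and work around list elements is sketched below. The data here is made up for illustration, and keeping only the first timestamp of each element is an assumption about what is wanted. The sketch also passes the pattern by name as format: the second positional argument of as.POSIXct is tz, so the pattern in the question was likely never used as a format, which may be why only the dates came back.

```r
# Hypothetical list column: the second element holds two timestamps
tweetCreatedAt <- list(
  "2014-05-17T00:00:49.000Z",
  c("2014-05-19T00:00:49.000Z", "2014-05-20T00:00:49.000Z")
)

# Elements with length > 1 are the ones that break as.POSIXct
lengths(tweetCreatedAt)
# [1] 1 2

# Assumption: keep only the first timestamp of each element
first_stamp <- vapply(tweetCreatedAt, function(x) as.character(x[1]), character(1))

# Name the format argument explicitly and fix the time zone to UTC
parsed <- as.POSIXct(first_stamp, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2014-05-17 00:00:49" "2014-05-19 00:00:49"
```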

Related

Is there a function in R to get closest next quarter end date for a given date?

I am trying to find the closest next quarter-end date for a given date in R.
For example, if the input is "2022-02-23", the output should be "2022-03-31"
and if the input is "2022-03-07", the output should be "2022-06-30".
If the input is "2021-12-15", the output should be "2022-03-31".
Is there any function in R for this?

lubridate::quarter with argument type = "date_last" will get you most of the way there. From the comments, it looks like you want to jump to the following quarter if the date is in the last month of a quarter; we can achieve this by adding a month to each date before passing to quarter. We can add months safely using the %m+% operator.
library(lubridate)
dates_in <- ymd(c("2022-02-23", "2022-03-07", "2021-12-15"))
dates_out <- quarter(dates_in %m+% months(1), type = "date_last")
dates_out
# "2022-03-31" "2022-06-30" "2022-03-31"
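The same "shift forward one month, then take the quarter end" idea can also be sketched in base R, for cases where lubridate is not available. The helper name next_quarter_end is hypothetical and not part of any package.

```r
# Hypothetical helper: next quarter end, jumping to the following quarter
# when the date is already in the last month of its quarter.
next_quarter_end <- function(d) {
  lt <- as.POSIXlt(as.Date(d))
  shifted <- lt$mon + 1L                   # 0-based month after adding one month (0..12)
  yr <- lt$year + 1900L + shifted %/% 12L  # roll December over into the next year
  shifted <- shifted %% 12L
  nxt <- (shifted %/% 3L + 1L) * 3L        # 0-based first month of the following quarter
  yr <- yr + nxt %/% 12L
  nxt <- nxt %% 12L
  # First day of the following quarter, minus one day
  as.Date(sprintf("%d-%02d-01", yr, nxt + 1L)) - 1
}

next_quarter_end(c("2022-02-23", "2022-03-07", "2021-12-15"))
# [1] "2022-03-31" "2022-06-30" "2022-03-31"
```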

Here is a function built on lubridate's quarter function. It returns the last day of the quarter a date falls in, without jumping to the following quarter:
last_day_in_quarter <- function(d){
require(lubridate)
last_month_in_quarter <- ymd(paste(year(d),quarter(d)*3,1))
return(last_month_in_quarter %m+% months(1) - 1)
}
last_day_in_quarter(ymd("2021-12-15")) #"2021-12-31"
last_day_in_quarter(ymd("2022-02-15")) #"2022-03-31"
last_day_in_quarter(ymd("2021-05-15")) #"2021-06-30"
last_day_in_quarter(ymd("2021-07-15")) #"2021-09-30"

I think these kinds of problems become immensely easier to understand if you work with a true year-quarter-day type. There is one of these in the clock package (I am the author).
library(clock)
x <- date_parse(c("2022-02-23", "2022-03-07", "2021-12-15"))
x
#> [1] "2022-02-23" "2022-03-07" "2021-12-15"
# What quarter are we in now?
yqd <- as_year_quarter_day(x)
yqd
#> <year_quarter_day<January><day>[3]>
#> [1] "2022-Q1-54" "2022-Q1-66" "2021-Q4-76"
# Is the current month the same as the end-of-quarter month?
# (if so, we are going to shift forward by 1 quarter).
shift <- get_month(x) == get_month(as.Date(set_day(yqd, "last")))
shift
#> [1] FALSE TRUE TRUE
# Shift by 1 quarter where applicable
yqd[shift] <- yqd[shift] + duration_quarters(1)
yqd
#> <year_quarter_day<January><day>[3]>
#> [1] "2022-Q1-54" "2022-Q2-66" "2022-Q1-76"
# Set day to end of quarter
yqd <- set_day(yqd, "last")
yqd
#> <year_quarter_day<January><day>[3]>
#> [1] "2022-Q1-90" "2022-Q2-91" "2022-Q1-90"
# Now convert back to Date
as.Date(yqd)
#> [1] "2022-03-31" "2022-06-30" "2022-03-31"

How to read quarterly data with R?

I'm trying to fit a Bayesian VAR, but I can't even get my data into shape. I get the series from https://sdw.ecb.europa.eu/, but since many of them are quarterly I have trouble merging my variables: I'm unable to convert, for example, "2020-Q1" from character to Date with as.Date.
I used the sub function to get "2020-1", for example, and then tried as.Date(, format="%Y-%q"), but it doesn't work, so I'm stuck.
textData <- "yearQuarter,Amount
2019-Q1,1000
2019-Q2,2000
2019-Q3,3000"
df <- read.csv(text=textData,header = TRUE,stringsAsFactors = FALSE)
as.Date(df$yearQuarter,format="%Y-%q")
...which produces:
> as.Date(df$yearQuarter,format="%Y-%q")
[1] NA NA NA
Thank you for your help !

library(lubridate)
d = yq("2020-Q1")
d
# [1] "2020-01-01"
year(d)
# [1] 2020
quarter(d)
# [1] 1
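If lubridate is not an option, the same conversion can be sketched in base R by splitting the string into year and quarter. This assumes every value follows the "YYYY-Qn" pattern from the question; mapping each quarter to its first day is a choice made here for illustration.

```r
year_quarter <- c("2019-Q1", "2019-Q2", "2019-Q3")

# Pull out the year and the quarter number
yr  <- as.integer(substr(year_quarter, 1, 4))
qtr <- as.integer(sub(".*Q", "", year_quarter))

# First month of each quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
first_month <- (qtr - 1L) * 3L + 1L

quarter_start <- as.Date(sprintf("%d-%02d-01", yr, first_month))
quarter_start
# [1] "2019-01-01" "2019-04-01" "2019-07-01"
```

Once both data sets carry a proper Date column like this, they can be merged on it.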

Convert character column to a list within the data frame

When I read the csv file into df, SoftwareOwner is a character column
> df
Software SoftwareOwner
<chr> <chr>
1 I-DEAS Siemens
2 TeamViewer Autodesk, TeamViewer, Siemens
3 Inventor PTC, Google, SpaceClaim, Bricys
4 AutoCAD Autodesk
I want to make SoftwareOwner a list within this data frame so I tried the simple solution
> df$SoftwareOwner <- as.list(df$SoftwareOwner)
But all this did was make each entry in the column a list with one entry
> df$SoftwareOwner[2]
[[1]]
[1] "Autodesk, TeamViewer, Siemens"
I've tried adding parameters like sep = "," and all.names = TRUE to as.list but neither worked. Is there any way to access just Autodesk or TeamViewer or Siemens when calling something like what I have just above?

Might I recommend making Siemens, Autodesk, Teamviewer, etc. their own columns and coding a 1 or 0 to indicate ownership? In my experience this is a far more flexible approach.
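A minimal sketch of that indicator-column idea, using the data from the question. The names owners_list, all_owners, indicators and wide are made up for illustration.

```r
df <- data.frame(
  Software = c("I-DEAS", "TeamViewer", "Inventor", "AutoCAD"),
  SoftwareOwner = c("Siemens", "Autodesk, TeamViewer, Siemens",
                    "PTC, Google, SpaceClaim, Bricys", "Autodesk"),
  stringsAsFactors = FALSE
)

# Split each entry on commas and trim the surrounding spaces
owners_list <- lapply(strsplit(df$SoftwareOwner, ","), trimws)
all_owners <- sort(unique(unlist(owners_list)))

# One row per software, one 0/1 column per owner
indicators <- t(vapply(
  owners_list,
  function(o) as.integer(all_owners %in% o),
  integer(length(all_owners))
))
colnames(indicators) <- all_owners

wide <- cbind(df["Software"], indicators)
wide$Siemens
# [1] 1 1 0 0
```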

A possible solution:
# recreate your data.frame
df <- read.csv(text=
"Software;SoftwareOwner
I-DEAS;Siemens
TeamViewer;Autodesk, TeamViewer, Siemens
Inventor;PTC, Google, SpaceClaim, Bricys
AutoCAD;Autodesk",sep=";")
df$SoftwareOwner <- lapply(strsplit(as.character(df$SoftwareOwner),split=','),trimws)
# > df$SoftwareOwner
# [[1]]
# [1] "Siemens"
#
# [[2]]
# [1] "Autodesk" "TeamViewer" "Siemens"
#
# [[3]]
# [1] "PTC" "Google" "SpaceClaim" "Bricys"
#
# [[4]]
# [1] "Autodesk"
# > df$SoftwareOwner[[2]][3]
# [1] "Siemens"
# > df$SoftwareOwner[[3]][2]
# [1] "Google"

Why does R add an "x" when renaming raster stack layers

I have a raster stack/brick in R containing 84 layers and I am trying to name them according to year and month from 199911 to 200610 (November 1999 to October 2006). However for some reason R keeps adding an "X" onto the beginning of any names I give my layers.
Does anyone know why this is happening and how to fix it? Here are some of the ways I've tried:
# Import raster brick
rast <- brick("rast.tif")
names(rast)[1:3]
[1] "MonthlyRainfall.1" "MonthlyRainfall.2" "MonthlyRainfall.3"
## Method 1
names(rast) <- paste0(rep(1999:2006, each=12), 1:12)[11:94]
names(rast)[1:3]
[1] "X199911" "X199912" "X20001"
## Method 2
# Create a vector of dates
dates <- format(seq(as.Date('1999/11/1'), as.Date('2006/10/1'), by='month'), '%Y%m')
dates[1:3]
[1] "199911" "199912" "200001"
# Set names
rast <- setNames(rast, dates)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"
## Method 3
names(rast) <- paste0("", dates)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"
## Method 4
substr(names(rast), 2, 7)[1:3]
[1] "199911" "199912" "200001"
names(rast) <- substr(names(rast), 2, 7)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"
To some extent I have been able to work around the problem by adding "X" to the beginning of some of my other data, but now it's reached the point where I can't do that any more. Any help would be greatly appreciated!

R won't allow a layer (or column) name to begin with a numeral, so it prepends an "X" to make the name syntactically valid. This is the same rule that make.names() applies.
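The renaming rule can be reproduced with base R's make.names(), with no raster object involved. A possible workaround, sketched below, is to choose a prefix yourself so the names are already valid and nothing gets prepended. The "rain_" prefix is an arbitrary choice for illustration, and whether the raster package leaves such names untouched is an assumption worth checking on your own data.

```r
dates <- format(seq(as.Date("1999-11-01"), as.Date("2006-10-01"), by = "month"), "%Y%m")
length(dates)
# [1] 84

# Names starting with a digit are not syntactically valid, so "X" is prepended
make.names(dates[1:3])
# [1] "X199911" "X199912" "X200001"

# Names that start with a letter pass through unchanged
layer_names <- paste0("rain_", dates)
identical(make.names(layer_names), layer_names)
# [1] TRUE

# Strip the prefix again whenever the bare year-month is needed
sub("^rain_", "", layer_names[1:3])
# [1] "199911" "199912" "200001"
```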

How do I change the index in a csv file to a proper time format?

I have a CSV file of 1000 daily prices
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index is from 1:1700
But I need to specify a start date and an end date this way:
The start date is, let's say, 25th January 2009
and the last (1700th) value corresponds to 14th May 2013
So far, this is as close as I've gotten:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks.
UPDATE:
I managed to create a separate object with dates, as suggested in the answers, and plotted it, but the y-axis looks weird, as shown in the screenshot

Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)

R stores Date-classed objects as the offset in days from "1970-01-01", but the as.Date.numeric function needs an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You don't need to add the extension .numeric, since R automatically dispatches to that method if you call the generic, as.Date, with a numeric argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
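A short sketch of that dispatch, applied to the full 1:1700 index from the question. Note that 1700 consecutive calendar days starting on 2009-01-25 end later than the 14th May 2013 mentioned in the question, so the index may not map to consecutive calendar days; that is worth checking against the source data.

```r
idx <- 1:1700  # the 1..1700 index from the file

# The generic picks as.Date.numeric because idx is numeric
dates <- as.Date(idx, origin = "2009-01-24")
identical(dates, as.Date.numeric(idx, origin = "2009-01-24"))
# [1] TRUE

# First and last date when every index step is one calendar day
range(dates)
# [1] "2009-01-25" "2013-09-20"
```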