Integers change their values when generating a time series from a dataframe in R - r

I have a list with dataframes inside it like this:
x = data.frame("city" = c("Madrid","Madrid","Madrid","Madrid"),
"date" = c('2018-11-01','2018-11-02','2018-11-03','2018-11-04'),
"visits" = c(100,200,80,38), "temp"=c(20,10,17,16))
list_of_cities= split(x, x$city) #In my original df there are a lot of cities
Then, to create a time series object (ts), I do the following:
library(dplyr) # provides select()
madrid_data = select(list_of_cities[['Madrid']],date,visits,temp)
madrid = ts(madrid_data[,2:3], start = c(2018,305), frequency = 365)
In this example, the problem I have does not arise. However, with my original dataframe I get this:
How could I solve it? Thank you very much in advance

The problem comes from the "integer64" type. Convert the integer64 column to numeric before building the time series, and the values are preserved:
x$visits = as.numeric(x$visits)
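When a data frame has several such columns, the same conversion can be applied to all of them at once. The following is a minimal sketch, not the original poster's code: the helper name `fix_integer64` is hypothetical, and it assumes the integer64 columns come from the bit64 package (the demo at the end only runs if that package is installed).

```r
# Hypothetical helper: convert every integer64 column of a data frame to numeric.
fix_integer64 <- function(df) {
  # Flag the columns whose class includes "integer64"
  is_i64 <- vapply(df, inherits, logical(1), what = "integer64")
  for (nm in names(df)[is_i64]) {
    df[[nm]] <- as.numeric(df[[nm]])
  }
  df
}

# A data frame without integer64 columns is returned unchanged
plain <- data.frame(visits = c(100, 200, 80, 38), temp = c(20, 10, 17, 16))
plain_fixed <- fix_integer64(plain)

# Optional demo, assuming bit64 is available
if (requireNamespace("bit64", quietly = TRUE)) {
  x64 <- data.frame(visits = bit64::as.integer64(c(100, 200, 80, 38)),
                    temp = c(20, 10, 17, 16))
  x64 <- fix_integer64(x64)
  madrid <- ts(x64[, c("visits", "temp")], start = c(2018, 305), frequency = 365)
}
```

With a list of data frames such as `list_of_cities`, the helper could be applied to each element with `lapply(list_of_cities, fix_integer64)` before calling `ts()`.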

Related

Calculating the standard deviation and average return for each year in R

I started learning R three days ago, so please bear with me. If you see any flaws in my code or calculations, please call them out.
I have tried this, but I get an error message every time:
table.AnnualizedReturns(Apple.Monthly.Returns[, 2:3, drop = FALSE], scale = 12,
Rf = 0, geometric = TRUE, digits = 4)
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
As you can clearly see I have no clue what I am doing.
This is every line of code I have written so far:
Dates <- Data_Task2$`Names Date`[1801:2270]
as.numeric(Dates)
Dates <- ymd(Dates)
Monthly.Return <- Data_Task2$Returns[1801:2270]
Monthly.Return <- as.numeric(Monthly.Return)
Apple.Monthly.Returns <- data.frame(Dates, Monthly.Return)
Log.return = log(Monthly.Return + 1)
Apple.Monthly.Returns$Log.return = log(Apple.Monthly.Returns$Monthly.Return + 1)
You should check out the Tidyverse and specifically dplyr (https://dplyr.tidyverse.org/).
This gets you to a good starting point:
https://www.r-bloggers.com/2014/03/using-r-quickly-calculating-summary-statistics-with-dplyr/
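As a rough illustration of the per-year summary being asked for, here is a minimal sketch in base R rather than dplyr. The data are simulated stand-ins, and the names `returns` and `annual` are assumptions made for the example; only the column names `Dates` and `Monthly.Return` follow the question.

```r
# Simulated stand-in for Apple.Monthly.Returns: 24 monthly returns over two years
set.seed(42)
returns <- data.frame(
  Dates = seq(as.Date("2018-01-01"), by = "month", length.out = 24),
  Monthly.Return = rnorm(24, mean = 0.01, sd = 0.05)
)

# Extract the calendar year from each date
returns$Year <- format(returns$Dates, "%Y")

# Average return and standard deviation of the monthly returns, per year
annual <- data.frame(
  Year = sort(unique(returns$Year)),
  mean_return = as.numeric(tapply(returns$Monthly.Return, returns$Year, mean)),
  sd_return = as.numeric(tapply(returns$Monthly.Return, returns$Year, sd))
)
annual
```

The same grouping can be written with dplyr's `group_by()` and `summarise()`, which is the route the linked post describes.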

Finding maximum or minimum date value for each individual

I have a dataframe in a wide format in R, denoting different visit dates for each individual (visitdate1, visitdate2, visitdate3, etc.). I'm trying to find the latest date for each individual and save it as a new column, but this doesn't seem to be working.
I checked the class of the dataframe and each visitdate is already recognized as a Date, so I don't know why the code is not working.
This is the code I tried:
df1$latestdate <- pmax(as_date(df1$visitdate1), as_date(df1$visitdate2),
as_date(df1$visitdate3))
The error I'm getting is the following:
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
The problem is that I'm asking R to find the maximum date value per row, not to convert any date (as it's already a date).
However, even when I leave as_date out of the code, I get this error:
replacement has 0 rows, data has 120.
Any insight that might help? Thanks in advance! Btw, I'm new to R. :)
Below I provide an example, kind of guessing what your data looks like. pmax may not be the best thing for this.
DATES = seq(as.Date('2011-01-01'),as.Date('2017-01-01'),"months")
df = data.frame(id=1:10,
visitdate1 = sample(DATES,10),
visitdate2 = sample(DATES,10),
visitdate3 = sample(DATES,10)
)
#set columns to find row Max
COLUMNS = c("visitdate1","visitdate2","visitdate3")
# apply() converts the Date columns to character, so convert the result back to Date
df$latestdate = as.Date(apply(df[,COLUMNS],1,max))
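For comparison, here is a minimal sketch of the `pmax` route, which keeps the Date class without going through `apply()`. The data are made up for the example. Both errors in the question are what R typically reports when `df1$visitdate1` and the other columns evaluate to NULL (for instance because of a misspelled column name), so the sketch checks the names first; that diagnosis is a guess, since the original data are not shown.

```r
# Made-up data in the same wide layout as the question
df <- data.frame(
  id = 1:3,
  visitdate1 = as.Date(c("2015-01-01", "2016-06-01", "2011-03-01")),
  visitdate2 = as.Date(c("2016-02-01", "2012-05-01", "2013-09-01")),
  visitdate3 = as.Date(c("2014-07-01", "2013-04-01", "2016-12-01"))
)

date_cols <- c("visitdate1", "visitdate2", "visitdate3")

# Fail early if any column name is wrong; df$wrongname would silently be NULL
stopifnot(all(date_cols %in% names(df)))

# Row-wise maximum across the date columns; the result is still of class Date
df$latestdate <- do.call(pmax, c(unname(as.list(df[date_cols])), na.rm = TRUE))
df
```

Replacing `pmax` with `pmin` gives the earliest date per individual instead.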

Rollapply in for loop

So, what I am doing is creating a heatmap for x and y coordinates. But I would like to do this for every 30 minute interval. For example the first heatmap will be created using the data from "00:00:00" to "00:30:00", then the next from "00:01:00" to "00:31:00".
What I need help with is writing a for loop that can extract these rows from a larger database and then spit out the heatmap for each bracket of data. I have been told that zoo::rollapply could be useful in the process but am not sure how it works.
The database has three columns x, y, and indiv.times. x and y are the coordinate systems and indiv.times is a character variable which contains the times in the format "13:04:46" for example.
for (i in ???) {
kde <- kde2d(x, y)
plot_ly(z = kde$z, type = "heatmap")
}
This is the code to create the heatmap so I really just need a way to extract the 30 minute intervals.
Any help would be greatly appreciated.
Here is a sample of the database:
structure(list(x = c(224.7666, 223.3886, 131.7025, 345.333),
y = c(60.7657, 85.73872, 77.35342, 26.24607), indiv.times = Sys.time() +
cumsum(60*sample(20, size = 4, replace = TRUE))), class = "data.frame", row.names = c(NA, -4L))
In case anyone else is interested: I created an index i that holds all the window start times from "00:00:00" up to "24:00:00". Then, inside the for loop, you just need to extract the rows of the data frame that fall within 30 minutes (1800 seconds) of the start time, i.e. df[time < i + 1800 & time > i,]. Make sure your times are stored as a time or numeric type and not just as strings. You can then do any further processing inside the for loop using the extracted data frame.
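The windowing described above can be sketched in base R as follows. The sample rows, the helper `to_seconds`, and the names `window`, `step`, `starts` and `windows` are all assumptions made for this example. It also uses a half-open interval (`>=` start, `<` end) rather than the strict inequalities in the answer, so a point exactly on a window start is included.

```r
# Made-up sample in the shape described in the question
df <- data.frame(
  x = c(224.7666, 223.3886, 131.7025, 345.333),
  y = c(60.7657, 85.73872, 77.35342, 26.24607),
  indiv.times = c("00:05:10", "00:20:00", "00:31:30", "01:10:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}
df$secs <- to_seconds(df$indiv.times)

window <- 1800  # window length: 30 minutes, in seconds
step <- 60      # each window starts one minute after the previous one
starts <- seq(0, 24 * 3600 - window, by = step)

# One data frame per window; empty windows give zero-row data frames
windows <- vector("list", length(starts))
for (k in seq_along(starts)) {
  s <- starts[k]
  windows[[k]] <- df[df$secs >= s & df$secs < s + window, ]
  # The heatmap for this window would be built here,
  # e.g. from kde2d(windows[[k]]$x, windows[[k]]$y)
}
```

Windows with too few points for a density estimate would need to be skipped before plotting.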

Changing the name of a dataframe with an = sign in it

My question is regarding changing the name of a dataframe that I imported using the quantmod package. I ran the following lines,
library(quantmod)
data <- getSymbols("GBP=x", from = "2013-01-01", to = "2017-06-01", src="yahoo")
Which then saved the data as GBP=x
I now want to change the name of this dataframe to something called "GBP".
I keep getting values and not a dataframe.
GBP GBP=x
When I run GBP <- as.data.frame('GBP=x') I just get a dataframe with the value of GBP=x - 1 observation of 1 variable.
Any help is much appreciated
(Alternatively, if you can suggest a way to download FX data with quantmod and store it under a more convenient name, that would also do the trick.)
If I understand the documentation correctly,
data <- getSymbols("GBP=x", from = "2013-01-01", to = "2017-06-01", src="yahoo",auto.assign=FALSE)
will result in the FX data being stored in data.
Also, in case you have trouble finding the ` key, it's on the top left of most keyboards. It's used generally in R to enclose troublesome characters.
You need to wrap the name in backticks (`):
GBP = `GBP=X`
# remove the original dataframe from your workspace
rm(`GBP=X`)
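The backtick approach can be tried without downloading anything. In this minimal sketch, `assign()` creates a stand-in object under a non-syntactic name; the name `GBP=X` and the `rate` column are placeholders for whatever getSymbols actually created in your workspace. It also shows why `as.data.frame('GBP=x')` returned a single value: a quoted name is just a character string, not a reference to the object.

```r
# Stand-in for the object created by getSymbols (placeholder data)
assign("GBP=X", data.frame(rate = c(1.55, 1.56, 1.54)))

# A quoted name is only a string, so this builds a 1 x 1 data frame
wrong <- as.data.frame("GBP=X")

# Backticks refer to the object itself
GBP <- `GBP=X`

# get() does the same when the name is held in a character string
GBP_again <- get("GBP=X")

# Remove the original object from the workspace
rm(`GBP=X`)
```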

Force xts() object to ts()

Data: DOWNLOAD .TXT
Code:
data = read.table("DistrBdaily1yrs.txt", header = TRUE, sep = "", dec = ",")
data$DATE = as.Date(as.character(data$DATE),format="%Y%m%d")
dataXts = xts(data$QUANTITY,data$DATE, frequency = 6)
tseries = ts(dataXts, start = start(dataXts), end = end(dataXts), frequency = 6)
What I'm trying to do is convert the xts object dataXts to a ts object with the correct start and end dates, in order to use the decompose function. In this case start = start(dataXts) and end = end(dataXts) give me the right start and end dates, but tseries doesn't recognize the data column in dataXts and treats everything as data.
How can I fix this?
I am not sure I was able to "FORCE" xts to ts, but I got the decompose part to work:
library("data.table")
library("xts") # provides xts() and reclass()
# I was unable to read-in using read.table() for some reason.... used fread() as it is much faster
data <- fread("DistrBdaily1yrs.txt", header = TRUE, sep="\t")
# Set column names to the ones I saw on dropbox, as i was unable to read-in header for some reason!
colnames(data) <- c("DATE", "QUANTITY")
# Keep as-is
data$DATE = as.Date(as.character(data$DATE),format="%Y%m%d")
dataXts = xts(data$QUANTITY,data$DATE, frequency = 6)
# Not sure what the "QUANTITY" Column means but it must be turned into "numeric"
# You can see this post on how to do it if the following is unsatisfactory:
# http://stackoverflow.com/questions/3605807/how-to-convert-numbers-with-comma-inside-from-character-to-numeric-in-r
a<-as.numeric(gsub(",",".",dataXts))
dataXts <- reclass(a, match.to=dataXts); colnames(dataXts)<- "QUANTITY"
# Now convert it to timeSeries
timeseries <- ts(dataXts,frequency=6)
# decompose
decompose(timeseries)
Also, when I convert xts to ts I assume that it will use the first and last dates to construct the ts, which is why I left out start = start(dataXts), end = end(dataXts) in the ts() function. Also see ?ts: you cannot pass Dates as the start or end arguments. Instead, each must be:
Either a single number or a vector of two integers, which specify a natural time unit and a (1-based) number of samples into the time unit.
You can always convert back to xts using reclass:
# for example: Say you only want the trend
reclass(decompose(timeseries)$trend,match.to=dataXts)
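The point quoted from ?ts, that start takes numbers rather than Dates, can be illustrated with base R alone. This is a minimal sketch on simulated data: the values in `quantity` are made up, and `start = c(1, 1)` means "period 1, observation 1" under the assumption of six observations per period, matching the frequency used above.

```r
# Simulated stand-in for the QUANTITY column: 8 periods of 6 observations each
set.seed(1)
quantity <- rep(c(10, 12, 15, 11, 9, 20), times = 8) + rnorm(48)

# start is a pair of integers (period, position within the period), not a Date
tseries <- ts(quantity, start = c(1, 1), frequency = 6)

# decompose() needs at least two full periods, which this series has
parts <- decompose(tseries)
names(parts)
```

With the real data, the plain numeric vector would be taken from the xts object first, for example with `as.numeric()` as in the answer above.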

Resources