Creating Netcdf files issue - r

I have created NetCDF files in R before, but I am now hitting an error while creating one and I don't know how to handle it. I have searched for the error, but I am not sure what causes it. Since my data set is too long to post, here is a smaller sample that shows its structure:
#data.frame with the date and the values
dat <-dput(y.or[1:10,])
structure(list(date = structure(c(852073200, 852159600, 852246000,
852332400, 852418800, 852505200, 852591600, 852678000, 852764400,
852850800), class = c("POSIXct", "POSIXt"), tzone = ""), dymax = c(79.125,
75.375, 78, 72.375, 76.375, 76.571, 76.125, 82.75, 86.125, 86
)), .Names = c("date", "dymax"), row.names = c("1997-01-01.01",
"1997-01-01.02", "1997-01-01.03", "1997-01-01.04", "1997-01-01.05",
"1997-01-01.06", "1997-01-01.07", "1997-01-01.08", "1997-01-01.09",
"1997-01-01.10"), class = "data.frame")
#****Creating Netcdf files********
#One lat and lon, and 5478 days (14 years)
missval <- -999
dimX <- dim.def.ncdf( "longitude", "degrees_east",10)
dimY <- dim.def.ncdf( "latitude", "degrees_north", 50)
dimT <- dim.def.ncdf("time",as.Date(dates[1]),as.numeric(dates))
#Def.variable
var <- var.def.ncdf(name="max8hO3","ppb",list(dimX,dimY,dimT), missval=missval, longname="max8hO3",prec="double")
#creating the file
fil <- create.ncdf("fileout.nc",var)
Then, before I can put the variable into the file, I get:
Error in nc$var[[nc$varid2Rindex[varid]]] :
attempt to select less than one element
I am sure I am missing something, but I don't know what. Any ideas?
I really appreciate some help, thanks!
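One thing worth checking, although I can't confirm it is what triggers this particular error: the second argument of dim.def.ncdf is the units, which is expected to be a character string, while the code above passes a Date object (as.Date(dates[1])). A minimal base R sketch of building a units string and matching time values, assuming daily data; the dates vector here is a hypothetical stand-in for the real one, and time_units and time_vals are names I made up:

```r
# Hypothetical stand-in for the question's `dates`: ten consecutive days.
dates <- seq(as.Date("1997-01-01"), by = "day", length.out = 10)

# Units as a character string in the usual "<unit> since <origin>" form.
time_units <- paste("days since", format(dates[1], "%Y-%m-%d"))

# Dimension values as day offsets from that origin.
time_vals <- as.numeric(dates - dates[1])

# These would then take the place of the second and third arguments, e.g.
# dimT <- dim.def.ncdf("time", time_units, time_vals)
```

The last line is left as a comment because it needs the ncdf package; the rest runs in plain R.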

Related

Prediction on time series analysis using ARIMA in R

I am new to programming and am attempting to create a prediction model for multiple articles.
Unfortunately, using Excel or similar software is not possible for this task, so I have installed RStudio to solve this problem. My goal is to make an 18-month prediction for each article in my dataset using an ARIMA model.
However, I am currently facing an issue with the format of my data frame. Specifically, I am unsure of how my CSV should be structured to be read by my code.
I have attached an image of my current dataset in CSV format: https://i.stack.imgur.com/AQJx1.png
Here is my dput(sales_data):
structure(list(X.Article.1.Article.2.Article.3 = c("janv-19;42;49;55", "f\xe9vr-19;56;58;38", "mars-19;55;59;76")), class = "data.frame", row.names = c(NA, -3L))
Here is the code I have put together so far with the help of blogs and websites:
library(forecast)
library(reshape2)
sales_data <- read.csv("sales_data.csv", header = TRUE)
sales_data_long <- reshape2::melt(sales_data, id.vars = "Code Article")
for(i in 1:nrow(sales_data_long)) {
sales_data_article <- subset(sales_data_long, sales_data_long$`Code Article` == sales_data_long[i,"Code Article"])
sales_ts <- ts(sales_data_article$value, start = c(2010,6), frequency = 12)
arima_fit <- auto
arima_forecast <- forecast(arima_fit, h = 18)
print(arima_forecast)
print("Article: ", Code article[i])
}
With this code, RStudio gives me the following error: "Error: id variables not found in data: Code Article"
Currently, I am not interested in generating any plots or outputs. My main focus is on identifying the appropriate format for my data.
Do I need to modify my CSV file and separate each column using "," or ";"? Or, can I keep my data in its current format and make adjustments in the code instead?
Added the dput output as per jrcalabrese's request.
I swapped reshape2 for its replacement, tidyr, and used pivot_longer.
This no longer gives the error that reshape2::melt was raising.
The exact structure of the CSV doesn't matter much. Your structure was fine.
Hope this helps! :-)
library(tidyr)
sales_data <- structure(list(var1 = c("Article 1", "Article 2", "Article 3"),
`janv-19` = c(42, 56, 55),
`fev-19` = c(49, 58, 59),
`mars-19` = c(55, 38, 76)),
row.names = c(NA, 3L), class = "data.frame")
sales_data_long <- sales_data |> pivot_longer(!var1,
names_to = "month",
values_to = "count")
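The dput in the question shows a single column holding strings like "janv-19;42;49;55", which suggests the file is semicolon-delimited and was read with the default comma separator. A minimal base R sketch of reading it with sep = ";" and reshaping to long format, assuming an inline copy of the three rows from the question. The accented month is written as plain fevr-19 here, the first header cell is assumed to be empty, and the names month, article, count and article1_ts are my own:

```r
# Hypothetical inline copy of the CSV from the question; the real file
# would be read with read.csv("sales_data.csv", sep = ";") instead.
csv_text <- ";Article 1;Article 2;Article 3
janv-19;42;49;55
fevr-19;56;58;38
mars-19;55;59;76"

# sep = ";" splits the semicolon-delimited fields into separate columns.
sales_data <- read.csv(text = csv_text, sep = ";", header = TRUE)
names(sales_data)[1] <- "month"  # name the first column by position

# Long format in base R: one row per month/article pair.
sales_long <- data.frame(
  month   = rep(sales_data$month, times = ncol(sales_data) - 1),
  article = rep(names(sales_data)[-1], each = nrow(sales_data)),
  count   = unlist(sales_data[-1], use.names = FALSE)
)

# Monthly series for the first article, starting January 2019.
article1_ts <- ts(sales_data[[2]], start = c(2019, 1), frequency = 12)
```

With one column per article, a series like article1_ts can be passed to the forecasting step for each article in turn, so the long format is optional for this task.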

R plotting annual data and "January" repeated at end of graph

I'm fairly new to R and am trying to plot some expenditure data. I read the data in from Excel and then do some manipulation on the dates:
data <- read.csv("Spending2019.csv", header = T)
#converts time so R can use the dates
strdate <- strptime(data$DATE,"%m/%d/%Y")
newdate <- cbind(data,strdate)
finaldata <- newdate[order(strdate),]
This probably isn't the most efficient, but it gets me there :)
Here are the relevant columns of the first four rows of my finaldata data frame:
dput(droplevels(finaldata[1:4,c(5,7)]))
structure(list(AMOUNT = c(25.13, 14.96, 43.22, 18.43), strdate = structure(c(1546578000,
1546750800, 1547010000, 1547010000), class = c("POSIXct", "POSIXt"
), tzone = "")), row.names = c(NA, 4L), class = "data.frame")
The full data set has 146 rows, and the dates range from 1/4/2019 to 12/30/2019.
I then plot the data:
plot(finaldata$strdate,finaldata$AMOUNT, xlab = "Month", ylab = "Amount Spent")
and I get this plot
This is fine for me getting started, EXCEPT why is JAN repeated at the far right end? I have tried various forms of xlim and can't seem to get it to go away.
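A likely explanation, offered as an assumption since I can't see the plot: by default the x axis extends slightly past the last data point, so with data ending on 12/30/2019 the automatic monthly ticks can include 1 January 2020, which is labelled "Jan" again. A minimal sketch that suppresses the default axis and draws one tick per month of 2019 only. The finaldata frame below is a hypothetical stand-in with random amounts, and ticks is a name I made up:

```r
# Hypothetical stand-in for finaldata: 146 dates across 2019, random amounts.
set.seed(1)
finaldata <- data.frame(
  strdate = seq(as.POSIXct("2019-01-04", tz = "UTC"),
                as.POSIXct("2019-12-30", tz = "UTC"), length.out = 146),
  AMOUNT  = round(runif(146, 5, 60), 2)
)

# Draw the points without the default x axis.
plot(finaldata$strdate, finaldata$AMOUNT, xaxt = "n",
     xlab = "Month", ylab = "Amount Spent")

# One tick at the start of each month of 2019, so January 2020 gets no label.
ticks <- seq(as.POSIXct("2019-01-01", tz = "UTC"),
             as.POSIXct("2019-12-01", tz = "UTC"), by = "month")
axis.POSIXct(1, at = ticks, format = "%b")
```

Changing the format to "%b %Y" would be another option if the repeated label is acceptable as long as the year is visible.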

Importing excel file with read_excel function: Date columns is not correctly imported

I used the following code to import an Excel file in RStudio:
(nms <- names(read_excel("myexcelfile.xlsx")))
(ct <- ifelse(grepl("^Date", nms), "text", "numeric"))
read_excel("myexcelfile.xlsx", col_types = ct)[-c(6:495),-c(3:71)]
The result is the data frame below:
structure(list(Data = c("41731", "41730", "41729", "41726", "41725"
), ABEV3 = c(15.2, 14.9, 15.22, 15.15, 15.18)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
The first column should be the dates (Brazilian Format: Day/Month/Year).
How can I fix this?
Try this:
(nms <- names(read_excel("myexcelfile.xlsx")))
(ct <- ifelse(grepl("^Date", nms), "date", "numeric"))
df <- read_excel("myexcelfile.xlsx", col_types = ct)[-c(6:495),-c(3:71)]
df$Date <- format(as.Date(as.character(df$Date)), "%d-%m-%Y")
Instead of importing the column as text, import it as a date and then format it as shown above. (I assume that you store the imported sheet in a data frame named df and that the column is called Date; adjust as needed.)
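If the column has already come in as text, values like "41731" look like Excel serial day numbers, and those can be converted in base R. A minimal sketch, assuming the workbook uses Excel's default 1900 date system (origin 1899-12-30 in R terms); serial, dates and formatted are names I made up:

```r
# Serial numbers copied from the Data column in the question's dput.
serial <- c("41731", "41730", "41729", "41726", "41725")

# Excel's 1900 date system corresponds to day counts from 1899-12-30.
dates <- as.Date(as.numeric(serial), origin = "1899-12-30")

# Brazilian day/month/year display format.
formatted <- format(dates, "%d/%m/%Y")
print(formatted)
```

Here the first value comes out as 02/04/2014. Keeping dates as a Date column and formatting only for display avoids sorting problems that come with day-first strings.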

Filling gaps of time data with zero-values

In my data (https://pastebin.com/CernhBCg) I have irregular timestamps and a corresponding value. In addition to the irregularity, there are large gaps for which I have no value in my data. I know, however, that the value is zero during those gaps, and I would like to fill the gaps with rows where value = 0. How can I do this?
Data
> dput(head(hub2_select,10))
structure(list(time = structure(c(1492033212.648, 1492033212.659,
1492033212.68, 1492033212.691, 1492033212.702, 1492033212.724,
1492033212.735, 1492033212.757, 1492033212.768, 1492033212.779
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), value = c(3,
28, 246, 297, 704, 798, 1439, 1606, 1583, 1572)), .Names = c("time",
"value"), row.names = c(NA, 10L), class = "data.frame")
Please take the file I provided to see the data and read it into R with
library(readr)
df <- read_csv("data.csv", col_types = list(time = col_datetime(), value = col_double()))
Solutions
For one thing, the values to the left and right of a gap are usually 0 or 1, so that might help. I thought I'd use a rolling join, but from what I understand by now, this does not seem to be the way to go.
What works is
library(dplyr)
library(lubridate)
threshold_time = dseconds(2)
time_prev = df$time[1]
addrows = data.frame()
for (i in seq(2, nrow(df),1)){
time_current <- df$time[i]
if ((time_current - time_prev) > threshold_time){
time_add <- seq(time_prev, time_current, dseconds(0.1))
addrows = bind_rows(addrows, data.frame(time=time_add, value=rep(0, length(time_add))))
}
time_prev <- time_current
}
addrows$type <- 'filled'
df$type <- 'orig'
df_new <- bind_rows(df, addrows)
library(ggplot2)
ggplot(df_new, aes(time,value,color=type)) + geom_point()
But this solution is neither elegant nor efficient (though I did not test its efficiency).
Honestly, I haven't tried it yet (I had to switch to Python for other reasons, solved it there, and never got around to trying it out), but I am pretty sure https://cran.r-project.org/web/packages/padr/vignettes/padr.html would have been the answer. I just wanted to write this here for other readers with the same question.
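For readers who want to stay in base R, the loop above can also be written without growing a data frame row by row. A minimal sketch on hypothetical toy data with whole-second timestamps. It uses a 1-second step where the loop uses 0.1, it places the zero rows strictly inside each gap (the loop also includes the gap's end points), and it assumes the step is less than half the threshold. All names except df, time, value and type are my own:

```r
# Hypothetical toy data: readings one second apart with one 8-second gap.
t0 <- as.POSIXct("2017-04-12 21:40:12", tz = "UTC")
df <- data.frame(time = t0 + c(0, 1, 2, 10, 11),
                 value = c(3, 28, 1, 0, 704))

threshold <- 2  # seconds; longer gaps get filled (same as in the loop)
step <- 1       # spacing of inserted rows in seconds (the loop uses 0.1)

# Find every row that is followed by a gap longer than the threshold.
t_num <- as.numeric(df$time)
gap_idx <- which(diff(t_num) > threshold)

# Timestamps strictly inside each gap, so existing rows are not duplicated.
fill_num <- as.numeric(unlist(lapply(gap_idx, function(i) {
  seq(t_num[i] + step, t_num[i + 1] - step, by = step)
})))

filled <- data.frame(
  time  = as.POSIXct(fill_num, origin = "1970-01-01", tz = "UTC"),
  value = rep(0, length(fill_num)),
  type  = rep("filled", length(fill_num))
)
df$type <- "orig"

# Combine and restore chronological order.
df_new <- rbind(df, filled)
df_new <- df_new[order(df_new$time), ]
```

The result has the same columns as df_new in the loop version, so the ggplot call above should work on it unchanged.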

R Error: index is not in increasing order

NOTE: PROBLEM RESOLVED IN THE COMMENTS BELOW
I'm getting the following error when trying to turn a data.frame into xts, following the answer found here.
Error in .xts(DA[, 3:6], index = as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", :
index is not in increasing order
I've not been able to find much on this error or how to resolve it, so any help towards that would be greatly appreciated.
The data is daily S&P 500 in a comma delimited format with the following columns: "Date" "Time" "Open" "High" "Low" "Close".
Below is the code:
DA <- read.csv("SNP.csv", header = TRUE, stringsAsFactors = FALSE)
DAINDEX <- paste(DA$Date, DA$Time, sep = " ")
Data.hist <- .xts(DA[,3:6], index = as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", tzone = "GMT"))
As requested, some lines of the data
structure(list(Date = c("5/20/2016", "5/19/2016", "5/18/2016",
"5/17/2016", "5/16/2016", "5/13/2016"), Time = c("0:00:00", "0:00:00",
"0:00:00", "0:00:00", "0:00:00", "0:00:00"), Open = c(2041.880005,
2044.209961, 2044.380005, 2065.040039, 2046.530029, 2062.5),
High = c(2058.350098, 2044.209961, 2060.610107, 2065.689941,
2071.879883, 2066.790039), Low = c(2041.880005, 2025.910034,
2034.48999, 2040.819946, 2046.530029, 2043.130005), Close = c(2052.320068,
2040.040039, 2047.630005, 2047.209961, 2066.659912, 2046.609985
)), .Names = c("Date", "Time", "Open", "High", "Low", "Close"
), row.names = c(NA, 6L), class = "data.frame")
The above is the output of dput(head(DA))
The easiest thing to do is use the regular xts constructor instead of .xts. It will check if the index is sorted correctly, and sort the index and data, if necessary.
Data.hist <- xts(DA[,3:6], as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", tz = "GMT"))
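To see why the error appears, or to sort the data yourself before building the object, the index can be checked in base R. A minimal sketch using the first three rows of the dput above (Close column only); idx, DA_sorted and idx_sorted are names I made up:

```r
# First three rows from the question's dput, newest date first.
DA <- data.frame(
  Date  = c("5/20/2016", "5/19/2016", "5/18/2016"),
  Time  = c("0:00:00", "0:00:00", "0:00:00"),
  Close = c(2052.320068, 2040.040039, 2047.630005),
  stringsAsFactors = FALSE
)

# Parse the combined date and time strings into POSIXct.
idx <- as.POSIXct(paste(DA$Date, DA$Time),
                  format = "%m/%d/%Y %H:%M:%S", tz = "GMT")

# TRUE here: the file is in descending order, which is what .xts rejects.
print(is.unsorted(as.numeric(idx)))

# Reorder the rows and the index so that time increases.
DA_sorted  <- DA[order(idx), ]
idx_sorted <- sort(idx)
```

After this step the rows and the index are in increasing order, which is the property the error message complains about.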
