What I am doing is creating a heatmap from x and y coordinates, but I would like to do this for every 30-minute interval. For example, the first heatmap will be created using the data from "00:00:00" to "00:30:00", the next from "00:01:00" to "00:31:00", and so on.
What I need help with is writing a for loop that can extract these rows from a larger database and then spit out a heatmap for each bracket of data. I have been told that zoo::rollapply could be useful for this, but I am not sure how it works.
The database has three columns: x, y, and indiv.times. x and y are the coordinates, and indiv.times is a character variable containing the times in a format like "13:04:46".
for (i in ???) {
  kde <- kde2d(x, y)
  plot_ly(z = kde$z, type = "heatmap")
}
This is the code to create the heatmap, so I really just need a way to extract the 30-minute intervals.
Any help would be greatly appreciated.
Here is a sample of the database:
structure(list(x = c(224.7666, 223.3886, 131.7025, 345.333),
    y = c(60.7657, 85.73872, 77.35342, 26.24607),
    indiv.times = Sys.time() + cumsum(60 * sample(20, size = 4, replace = TRUE))),
    class = "data.frame", row.names = c(NA, -4L))
So if anyone else is interested: I created an index i containing all the times from "00:00:00" to "24:00:00". Then, inside the for loop, you just need to extract the rows of the data frame with df[time > i & time < i + 1800, ]. Make sure your times are in a time format (e.g. POSIXct) and not just strings. Then you can perform any adjustments in the for loop using the newly extracted data frame.
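A minimal sketch of that loop, assuming the times have been parsed to POSIXct and using MASS::kde2d as in the question (the toy data, the 60-second step between windows, and the `starts`/`heatmaps` names are all illustrative):

```r
library(MASS)  # for kde2d

# toy data: a random walk sampled once per minute for two hours
set.seed(1)
df <- data.frame(
  x = cumsum(rnorm(120)),
  y = cumsum(rnorm(120)),
  indiv.times = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 60 * (0:119)
)

window <- 1800  # 30 minutes in seconds
starts <- seq(min(df$indiv.times), max(df$indiv.times) - window, by = 60)

heatmaps <- list()
for (i in seq_along(starts)) {
  chunk <- df[df$indiv.times >= starts[i] & df$indiv.times < starts[i] + window, ]
  if (nrow(chunk) < 2) next  # kde2d needs more than one point
  heatmaps[[i]] <- kde2d(chunk$x, chunk$y)
  # plot_ly(z = heatmaps[[i]]$z, type = "heatmap")  # as in the question
}
```

Each iteration subsets one sliding 30-minute bracket and estimates its density; the plotly call from the question can then be dropped into the loop body.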
I started learning R three days ago, so please bear with me. If you see any flaws in my code or calculations, please call them out.
I have tried this, but get an error message every time:
table.AnnualizedReturns(Apple.Monthly.Returns[, 2:3, drop = FALSE], scale = 12,
Rf = 0, geometric = TRUE, digits = 4)
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
As you can clearly see, I have no clue what I am doing.
This is every line of code I have written thus far:
Dates <- Data_Task2$`Names Date`[1801:2270]
as.numeric(Dates)
Dates <- ymd(Dates)
Monthly.Return <- Data_Task2$Returns[1801:2270]
Monthly.Return <- as.numeric(Monthly.Return)
Apple.Monthly.Returns <- data.frame(Dates, Monthly.Return)
Log.return = log(Monthly.Return + 1)
Apple.Monthly.Returns$Log.return = log(Apple.Monthly.Returns$Monthly.Return + 1)
You should check out the Tidyverse and specifically dplyr (https://dplyr.tidyverse.org/).
This gets you to a good starting point:
https://www.r-bloggers.com/2014/03/using-r-quickly-calculating-summary-statistics-with-dplyr/
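A minimal sketch of the kind of dplyr summary the linked post describes (the data frame, column names, and statistics below are made up for illustration; the annualization table itself still needs date rownames, as the error message says):

```r
library(dplyr)

# illustrative monthly returns over two years
set.seed(1)
returns <- data.frame(
  year   = rep(2019:2020, each = 12),
  return = rnorm(24, mean = 0.01, sd = 0.05)
)

summary_tbl <- returns %>%
  group_by(year) %>%
  summarise(
    mean_return = mean(return),
    volatility  = sd(return),
    n_months    = n()
  )
```

group_by() plus summarise() is the core pattern: every statistic you list is computed once per group.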
I have a list with dataframes inside it like this:
x = data.frame("city" = c("Madrid","Madrid","Madrid","Madrid"),
"date" = c('2018-11-01','2018-11-02','2018-11-03','2018-11-04'),
"visits" = c(100,200,80,38), "temp"=c(20,10,17,16))
list_of_cities= split(x, x$city) #In my original df there are a lot of cities
Then, to create a time series object (ts), I follow the next process:
madrid_data = select(list_of_cities[['Madrid']],date,visits,temp)
madrid = ts(madrid_data[,2:3], start = c(2018,305), frequency = 365)
In this example, the problem I have does not arise. However, with my original dataframe I get an error (screenshot not reproduced here).
How could I solve it? Thank you very much in advance
The problem comes from the type integer64. You need to change integer64 to numeric, and that solves everything:
x$visits = as.numeric(x$visits)
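For the general case with many cities and columns, a small helper can convert every integer64 column at once (the helper name is made up; inherits() tests for the class that the bit64 package assigns):

```r
# Convert any integer64 columns of a data frame to plain numeric
to_numeric64 <- function(df) {
  is64 <- vapply(df, function(col) inherits(col, "integer64"), logical(1))
  df[is64] <- lapply(df[is64], as.numeric)
  df
}

x <- data.frame(visits = c(100, 200, 80, 38), temp = c(20, 10, 17, 16))
x <- to_numeric64(x)  # a no-op here; converts in place when integer64 columns exist
```

Running this on the full data frame before split() means every per-city data frame is already safe to pass to ts().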
Please help me, as I am new to R and to programming.
I am trying to write a loop that reads the data 1000 rows at a time and creates a data set in R.
Following is my attempt:
for(i in 0:nl){
  df[i] = fread('RM.csv', skip = 1000*i, nrows = 1000,
                col.names = colnames(read.csv('RM.csv', nrow = 1, header = TRUE)))
}
where nl is an integer equal to the number of rows in 'RM.csv'.
What I am trying to do is create a function that reads 1000 rows, skips the next 1000 rows, reads the following 1000, and so on, terminating once it reaches nl, the length of the original data.
It is not mandatory to use only this approach.
You can try reading the entire file into a single data frame and then subsetting off the rows you don't want:
df <- read.csv('RM.csv', header = TRUE)
y <- seq_len(nrow(df))                        # 1-based row indices
seq.keep <- y[((y - 1) %/% 1000) %% 2 == 0]   # keep rows 1-1000, 2001-3000, ...
df.keep <- df[seq.keep, ]
You can inspect that the sequence generated keeps rows
1-1000
2001-3000
4001-5000
etc.
Because the sequence is built from nrow(df), it automatically matches the actual size of the data frame.
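As a quick runnable check of the keep/skip pattern in 1-based R indexing (no file needed; the row count `n` is illustrative):

```r
# pretend the file has 5000 data rows
n <- 5000
y <- seq_len(n)

# keep the 1st, 3rd, 5th, ... blocks of 1000 rows
keep <- y[((y - 1) %/% 1000) %% 2 == 0]

head(keep, 3)  # first kept rows: 1 2 3
max(keep)      # 5000, since rows 4001-5000 fall in a kept block
```

The `(y - 1) %/% 1000` term numbers the 1000-row blocks from zero, and `%% 2 == 0` selects every other block.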
If you need to continue with your current approach, then try reading in only every other 1000 lines, e.g.
sq <- seq(from=0, to=nl, by=2)
names <- colnames(read.csv('RM.csv', nrow=1, header=TRUE))
for(i in sq) {
df_i <- fread('RM.csv', skip=1000*i, nrows=1000, col.names=names)
# process this chunk and move on
}
I am trying to create a NetCDF from a .csv file. I have read several tutorials here and other places and still have some doubts.
I have a table according to this:
lat,long,time,rh,temp
41,-109,6,1,1
40,-107,18,2,2
39,-105,6,3,3
41,-103,18,4,4
40,-109,6,5,2
39,-107,18,6,4
I create the NetCDF using the ncdf4 package in R.
xvals <- data$lon
yvals <- data$lat
nx <- length(xvals)
ny <- length(yvals)
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
mv <- -999 #missing value to use
var_temp <- ncvar_def("temperatura", "celsius", list(lon1, lat2, time), longname="Temp. da superfície", mv)
var_rh <- ncvar_def("humidade", "%", list(lon1, lat2, time), longname = "humidade relativa", mv )
ncnew <- nc_create(filename, list(var_temp, var_rh))
ncvar_put(ncnew, var_temp, dadostemp, start=c(1,1,1), count=c(nx,ny,nt))
When I follow this procedure, it states that the nc file expects 3 times the number of data points that I have.
I understand why, one matrix for each dimension, since I stated that the variables are according to the Longitude, Latitude and Time.
So, how would I import this kind of data, where I already have one Lon, Lat, Time and other variables for each data acquisition?
Could someone shed some light?
PS: The data used here is not my real data, just some example I was using for the tutorials.
I think there is more than one problem in your code. Step by step:
Create dimensions
In an nc file, dimensions don't work as key-value pairs; a dimension is just a vector of values defining what each position in a variable's array means.
This means you should create your dimensions like this:
xvals <- unique(data$lon)
xvals <- xvals[order(xvals)]
yvals <- unique(data$lat)
yvals <- yvals[order(yvals)]
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
time_d <- ncdim_def("time","h",unique(time))
Where I work, we use unlimited dimensions as mere indexes, while a 1-d variable with the same name as the dimension holds the values. I'm not sure how unlimited dimensions work in R, and since you don't ask about it, I'll leave this out. :-)
Define variables
mv <- -999 #missing value to use
var_temp <- ncvar_def("temperatura", "celsius",
list(lon1, lat2, time_d),
longname="Temp. da superfície", mv)
var_rh <- ncvar_def("humidade", "%",
list(lon1, lat2, time_d),
longname = "humidade relativa", mv )
Add data
Create an nc file: ncnew <- nc_create(f, list(var_temp, var_rh))
When adding values, the object holding the data is flattened to a 1-d array, and a sequential write starts at the position specified by start. The number of elements written along each dimension is controlled by count. If you have data like this:
long, lat, time, t
1, 1, 1, 1
2, 1, 1, 2
1, 2, 1, 3
2, 2, 1, 4
The command ncvar_put(ncnew, var_temp,data$t,count=c(2,2,1)) would give you what you (probably) expect.
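A tiny base-R illustration of that column-major flattening order, using the t values from the table above (dimension sizes match the 2x2x1 example):

```r
# t values ordered with longitude varying fastest, as in the table
t_vals <- c(1, 2, 3, 4)

# reconstructing the array the way ncvar_put interprets the data:
# first dimension (lon) varies fastest, then lat, then time
m <- array(t_vals, dim = c(2, 2, 1))

m[1, 1, 1]  # lon 1, lat 1 -> 1
m[2, 1, 1]  # lon 2, lat 1 -> 2
m[1, 2, 1]  # lon 1, lat 2 -> 3
```

This is why the row order of the input table must follow the dimension order of the variable definition.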
For your data, the first step is to create the indexes for the dimensions:
data$idx_lon <- match(data$long,xvals)
data$idx_lat <- match(data$lat,yvals)
data$idx_time <- match(data$time,unique(time))
Then create an array with dimensions matching the variable definition (lon, lat, time):
m <- array(mv, dim = c(length(xvals), length(yvals), length(unique(time))))
Then fill the array with your values:
for(i in 1:NROW(data)){
  m[data$idx_lon[i], data$idx_lat[i], data$idx_time[i]] <- data$temp[i]
}
If speed is a concern, you could calculate the linear index in a vectorised way and use it for the value assignment.
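A sketch of that vectorised assignment, using the same idx_* columns as above but with toy sizes and made-up data; the index arithmetic assumes R's column-major layout:

```r
# toy sizes standing in for length(xvals), length(yvals), number of time steps
nx <- 3; ny <- 2; nt <- 2
set.seed(42)
data <- data.frame(
  idx_lon  = sample(nx, 8, replace = TRUE),
  idx_lat  = sample(ny, 8, replace = TRUE),
  idx_time = sample(nt, 8, replace = TRUE),
  temp     = rnorm(8)
)

m <- array(-999, dim = c(nx, ny, nt))
# column-major linear index: lon varies fastest, then lat, then time
lin <- data$idx_lon + (data$idx_lat - 1) * nx + (data$idx_time - 1) * nx * ny
m[lin] <- data$temp
```

Base R's matrix indexing, m[cbind(data$idx_lon, data$idx_lat, data$idx_time)] <- data$temp, does the same without computing lin by hand.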
Write the data
ncvar_put(ncnew, var_temp,m)
Note that you don't need start and count.
Finally, close the nc file to write the data to disk: nc_close(ncnew)
Optionally, I recommend the ncdump console command to check your file.
Edit
Regarding your question whether to write a complete array or use start and count: I believe both methods work reliably. Which one to prefer depends on your data and your personal preferences.
I think the method of building an array, adding the values, and then writing it as a whole is easier to understand. As for which is more efficient, it depends on the data. If your data is big and has many NA values, using multiple writes with start and count could be faster. If NAs are rare, creating one matrix and doing a single write would be faster. If your data is so big that creating an extra array would exceed your available memory, you have to combine both methods.
I am organizing weather data into netCDF files in R. Everything goes fine until I try to populate the netcdf variables with data, because it is asking me to specify only one dimension for two-dimensional variables.
library(ncdf)
These are the dimension tags for the variables. Each variable uses the Threshold dimension and one of the other two dimensions.
th <- dim.def.ncdf("Threshold", "level", c(5,6,7,8,9,10,50,75,100))
rt <- dim.def.ncdf("RainMinimum", "cm", c(5, 10, 25))
wt <- dim.def.ncdf("WindMinimum", "m/s", c(18, 30, 50))
The variables are created in a loop, and there are a lot of them, so for the sake of easy understanding, in my example I'll only populate the list of variables with one variable.
vars <- list()
v1 <- var.def.ncdf("ARMM_rain", "percent", list(th, rt), -1, prec="double")
vars[[length(vars)+1]] <- v1
ncdata <- create.ncdf("composite.nc", vars)
I use another loop to extract data from different data files into a 9x3 data frame named subframe while iterating through the variables of the netcdf file with varindex. For the sake of reproducibility, I'll give a quick initialization of these values.
varindex <- 1
subframe <- data.frame(matrix(nrow=9, ncol=3, rep(.01, 27)))
The desired outcome from there is to populate each ncdf variable with the contents of subframe. The code to do so is:
for(x in 1:9) {
  for(y in 1:3) {
    value <- ifelse(is.na(subframe[x,y]), -1, subframe[x,y])
    put.var.ncdf(ncdata, varindex, value, start=c(x,y), count=1)
  }
}
The error message is:
Error in put.var.ncdf(ncdata, varindex, value, start = c(x, y), count = 1) :
'start' should specify 1 dims but actually specifies 2
tl;dr: I have defined two-dimensional variables using ncdf in R, I am trying to write data to them, but I am getting an error message because R believes they are single-dimensional variables instead.
Anyone know how to fix this error?