I have 19 weather stations with approximately 15 years of daily rainfall data. I want to interpolate the rainfall for a certain location. I have written a function for the interpolation as below:
Then I wrote a for loop to run the interpolation for each day of the time series. However, my stations' records are not consistent: they include NA values. For example, station 1 covers [2000-2014], but station 2 covers [2007-2009], station 3 [2004-2012], etc.
For this reason I want to add an if statement: if the number of available stations is more than 3, interpolate the data ....
My code is given below.
The problem is that my code only interpolates the data for the period when all stations have data (the period without NA).
How can I fix this?
Thanks
# read the data: a matrix with 19 columns and 6500 rows
# example
prec <- matrix(rnorm(123500), nrow = 6500, ncol = 19)
# statlist is the data frame containing the station coordinates
# Distt is the function calculating the distances from the stations to my central point
Distt <- function(statlist) {
pointx =661967
pointy =277271
coordx=statlist$X
coordy=statlist$Y
dist = sqrt((coordx-pointx)^2+(coordy-pointy)^2)
distance <<- dist
}
# the loop calculates the distances for all stations
for (i in 1:19) {
Distt(statlist)
}
# interpolation function
idwfunc <- function(prec,distance) {
weight = 1/distance
interp = sum(prec*weight)/sum(weight)
interp <<- interp
}
# calculate how many stations have data on each day
val_per_day <- apply(prec, 1, function (x) sum(!is.na(x)))
# this is the number-of-stations plot
# count how many days have more than 3 measurements
ii<-which(val_per_day > 3)[2]
# output
output <- as.data.frame(matrix(NA, nrow=nrow(prec), ncol=1, dimnames=list(1:nrow(prec),c("interpolated_prec")) ) )
# this is the for loop to interpolate the data for each day
for (iii in seq(1, length(val_per_day), by=1)){
if (as.numeric(val_per_day[iii])>2) {
output[iii,1]<-idwfunc(prec[iii,],distance)
}else{
output[iii,1]<- NA
}
}
This for loop interpolates the data only for the period without any NA values. However, I need a loop that interpolates whenever 3 or more stations have data on a given day.
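One possible fix, offered as a minimal sketch rather than a tested solution: the sum() in idwfunc returns NA as soon as a single station is NA on that day, so the missing stations (and their weights) have to be dropped inside the function. The helper name idw_na, its min_stations argument, and the dummy rainfall and distance values below are all assumptions made for illustration.

```r
# Hypothetical helper: inverse-distance weighting that ignores NA stations.
idw_na <- function(prec_day, distance, min_stations = 3) {
  ok <- !is.na(prec_day)                  # stations reporting on this day
  if (sum(ok) < min_stations) return(NA_real_)
  w <- 1 / distance[ok]                   # weights of the reporting stations only
  sum(prec_day[ok] * w) / sum(w)
}

# Dummy data (assumed): 5 days, 4 stations, with gaps
prec_demo <- rbind(
  c(1, 2, 3, 4),
  c(NA, 2, 3, 4),
  c(NA, NA, 3, 4),
  c(NA, NA, NA, NA),
  c(5, 5, 5, NA)
)
dist_demo <- c(10, 20, 20, 40)

# One interpolated value per day, NA where fewer than 3 stations report
interp_demo <- apply(prec_demo, 1, idw_na, distance = dist_demo)
```

Because apply() already walks over the rows, the explicit for loop and the val_per_day test are no longer needed with this approach; the threshold lives in min_stations.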
Related
I'm brand new to R. I've only been coding with it for a month or so.
What I am trying to do is generate and assign x and y coordinates for a growing/spreading population at each time step using a for loop.
The problem I'm having is storing the returned x and y outputs from the for loop. Key issue is that the number of returns increases with each iteration and I would like them stored at each time step. For example, at T0 there is one return and at T20 there are 496.
I have tried creating a lattice, arrays, data frames, etc. Everything I have found online hasn't helped.
set.seed(29807)
Adults <- allYears[3,] # Adult tree population yrs 1:20
Time <- 21
dist <- abs(rnorm(Distance, mean = 51.5, sd = 63)) # dispersal distribution
# Distance refers to the data table with 29807 distances.
# CREATE A FOR LOOP WHERE YOU DRAW X NUMBER OF DISTANCES EQUIVALENT TO THE NUM OF ADULTS AT THAT TIME STEP
for (i in 1:Time) {
N <- sample(dist, size = round(Adults[i]), replace = FALSE)
theta <- runif(N,0,2*pi) # distance is assigned direction
x <- cos(theta)*N # assign variate a polar coordinate x
y <- sin(theta)*N # assign variate a polar coordinate y
}
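Since the number of points changes at every time step, a list with one element per step is probably the simplest container. The following is only a sketch: the Adults vector and the pool of distances are small made-up stand-ins for the objects in the question, and the name coords is an assumption.

```r
set.seed(1)
# Assumed stand-ins for the objects in the question
Adults <- c(1, 3, 6, 10)                          # population size per time step
dist   <- abs(rnorm(1000, mean = 51.5, sd = 63))  # pool of dispersal distances
Time   <- length(Adults)

coords <- vector("list", Time)                    # one slot per time step
for (i in seq_len(Time)) {
  n     <- round(Adults[i])
  d     <- sample(dist, size = n, replace = FALSE)
  theta <- runif(n, 0, 2 * pi)                    # random direction per individual
  coords[[i]] <- data.frame(step = i,
                            x = cos(theta) * d,
                            y = sin(theta) * d)
}

# Optional: stack everything into one long data frame
all_coords <- do.call(rbind, coords)
```

coords[[i]] then holds the x and y values of step i, however many there are, and all_coords keeps the step number alongside each point.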
I have a time series of 10-daily raster data and I need to calculate the cumulative sums over time steps -5 to +5 around the pixel values (= time 0) of a reference raster.
I am able to calculate the yearly cumulative sum (called ‘sum_YY’ below):
# step 1: create the 10-daily raster time series
require(raster)
idx <- seq(as.Date("2010/1/1"), as.Date("2011/12/31"), '10 days')
r <- raster(ncol=5, nrow=5)
s <- stack(lapply(1:length(idx), function(x) setValues(r, values = sample(x = c(-5:5),size = ncell(r), replace = T))))
s <- setZ(s, idx)
# step 2: calculate the cumulative sum by year
Y <- unique(format(as.Date(getZ(s), format = "X%Y.%m.%d"), format = "%Y"))
sum_YY <- stack()
for (i in Y) {
dates <- format(getZ(s), "%Y") == i
t <- calc(subset(s, which(dates)), cumsum)
sum_YY <- stack(sum_YY,t)
}
But this is not the output I am looking for.
I need to use cumsum according to the reference raster 'refer'.
# step 3: create the reference raster
refer <- raster(ncol=5, nrow=5)
refer <- setValues(refer,sample(x = c(15:25),size = ncell(refer), replace = T))
The ‘refer’ raster pixel values indicate the timestep ‘X’ and I want to calculate the cumulative sums from the ‘s stack’ timesteps X-5 to X+5 for each of the years
i.e. if ‘refer’ pixel value X = 20 I need to calculate the cumulative sums from the ‘s stack’ layers from 15 to 25 for each of the years. The same approach for the other pixels in 'refer'
I have no idea how to introduce the reference raster in the calculation, any help??
Thanks a million!!!
I can't think of a clean approach. But here is one way to do it:
rs <- stack(refer, s)
a <- calc(rs, fun=function(x) sum(x[(x[1]-4):(x[1]+6)]))
stackSelect was designed for problems like this, but I now realize that it needs more flexibility. It would be nice to be able to select multiple layers at once and to apply a function. But you could do this (probably inefficient):
x <- calc(refer, function(x) (x-5):(x+5), forceapply=T)
y <- stackSelect(s, x)
z <- sum(y)
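The offsets -4 and +6 in the first approach come from the layout of stack(refer, s): the reference value is element 1 of each cell vector, so layer k of s sits at position k + 1. A base R sketch on a single made-up pixel (the names px and window_sum are assumptions) makes this easy to check:

```r
# One pixel: the reference value X followed by 30 layer values (dummy data)
X      <- 20
layers <- 1:30                      # layer k holds the value k, for easy checking
px     <- c(X, layers)              # same layout as one cell of stack(refer, s)

# Layer k sits at position k + 1, so layers X-5..X+5 are positions X-4..X+6
window_sum <- function(x) sum(x[(x[1] - 4):(x[1] + 6)])

window_sum(px)                      # equals sum(15:25)
```

Replacing sum with cumsum inside window_sum would return the 11 running totals per pixel instead of a single value.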
I wish to apply a set of pre-written functions to subsets of data in a data frame that progressively increase in size. In this example, the pre-written functions calculate 1) the distance between each consecutive pair of locations in a series of data points, 2) the total distance of the series of data points (sum of step 1), 3) the straight line distance between the start and end location of the series of data points and 4) the ratio between the straight line distance (step3) and the total distance (step 2). I wish to know how to apply these steps (and consequently similar functions) to sub-groups of increasing size within a data frame. Below are some example data and the pre-written functions.
Example data:
> dput(df)
structure(list(latitude = c(52.640715, 52.940366, 53.267749,
53.512608, 53.53215, 53.536443), longitude = c(3.305727, 3.103194,
2.973257, 2.966621, 3.013587, 3.002674)), .Names = c("latitude",
"longitude"), class = "data.frame", row.names = c(NA, -6L))
Latitude Longitude
1 52.64072 3.305727
2 52.94037 3.103194
3 53.26775 2.973257
4 53.51261 2.966621
5 53.53215 3.013587
6 53.53644 3.002674
Pre-written functions:
# Step 1: To calculate the distance between a pair of locations
pairdist = sapply(2:nrow(df), function(x) with(df, trackDistance(longitude[x-1], latitude[x-1], longitude[x], latitude[x], longlat=TRUE)))
# Step 2: To sum the total distance between all locations
totdist = sum(pairdist)
# Step 3: To calculate the distance between the first and end location
straight = trackDistance(df[1,2], df[1,1], df[nrow(df),2], df[nrow(df),1], longlat=TRUE)
# Step 4: To calculate the ratio between the straightline distance & total distance
distrat = straight/totdist
I would like to apply the functions firstly to a sub-group of only the first two rows (i.e. rows 1-2), then to a subgroup of the first three rows (rows 1-3), then four rows…and so on…until I get to the end of the data frame (in the example this would be a sub-group containing rows 1-6, but it would be nice to know how to apply this to any data frame).
Desired output:
Subgroup Totdist Straight Ratio
1 36.017 36.017 1.000
2 73.455 73.230 0.997
3 100.694 99.600 0.989
4 104.492 101.060 0.967
5 105.360 101.672 0.965
I have attempted to do this with no success and at the moment this is beyond my ability. Any advice would be very much appreciated!
There's a lot of optimization that can be done.
trackDistance() is vectorized, so you don't need sapply() for that.
To get a vectorized way of calculating the total distance, use cumsum().
You only need to calculate the pairwise distances once. Recalculating them every time you look at a different subset is a waste of resources. So try to think in terms of the complete data frame when constructing your functions.
To get everything in one function that outputs the desired data frame, you can do something along these lines:
myFun <- function(x){
# This is just to make typing easier in the rest of the function
lat <- x[["Latitude"]]
lon <- x[["Longitude"]]
nr <- nrow(x)
pairdist <-trackDistance(lon[-nr],lat[-nr],
lon[-1],lat[-1],
longlat=TRUE)
totdist <- cumsum(pairdist)
straight <- trackDistance(rep(lon[1],nr-1),
rep(lat[1],nr-1),
lon[-1],lat[-1],
longlat=TRUE)
ratio <- straight/totdist
data.frame(totdist,straight,ratio)
}
Proof of concept:
> myFun(df)
totdist straight ratio
1 36.01777 36.01777 1.0000000
2 73.45542 73.22986 0.9969293
3 100.69421 99.60013 0.9891346
4 104.49261 101.06023 0.9671519
5 105.35956 101.67203 0.9650005
Note that you can add extra arguments to define the latitude and longitude columns. And watch your capitalization: in your question you use Latitude in the printed data frame, but latitude (lowercase l) in your code.
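For readers without the package that provides trackDistance(), the same cumsum() idea can be sketched in base R. The haversine_km function below is a hypothetical stand-in that assumes a spherical Earth with a radius of 6371 km, so its numbers may differ slightly from those of trackDistance(); the column names follow the dput() output in the question.

```r
# Hypothetical stand-in for trackDistance(): great-circle distance in km
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

df <- data.frame(
  latitude  = c(52.640715, 52.940366, 53.267749, 53.512608, 53.53215, 53.536443),
  longitude = c(3.305727, 3.103194, 2.973257, 2.966621, 3.013587, 3.002674)
)

nr  <- nrow(df)
lat <- df$latitude
lon <- df$longitude

pairdist <- haversine_km(lon[-nr], lat[-nr], lon[-1], lat[-1])  # consecutive steps
totdist  <- cumsum(pairdist)                                     # rows 1-2, 1-3, ...
straight <- haversine_km(lon[1], lat[1], lon[-1], lat[-1])       # start to each end point
res <- data.frame(subgroup = seq_len(nr - 1), totdist, straight,
                  ratio = straight / totdist)
```

Each row of res corresponds to one growing subgroup (rows 1-2, rows 1-3, and so on), which is the shape of the desired output in the question.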
I have been looking for a way to select the frequency of values within a range in R from a sequence of matrices of sea surface temperature (sst) by month. These data are arranged as (sst, 80*40*172), i.e., (sst, longitude, latitude, month) for the north Atlantic. I am interested in those frequencies because I use them to calculate the surface area of a range in sst for every month, say between the 4 °C and 12 °C isotherms. I have used R frequently for statistical analyses of time-series and spatial data, but I am not used to programming, so my approach is probably not the most efficient. I have succeeded in extracting frequencies for sst values of, say, >12 °C for all latitudes in one month:
My data come from the IRI/LDEO climate research web site:
dat <- read.table("c:/Temp/[Y+X]datatable.tsv", header=FALSE, sep="\t")
A 5x5 sample matrix of the first month
Nov 1981 30.5000 31.5000 32.5000 33.5000
9.50000 21.7906 21.9431 22.1324 22.1662
8.50000 21.7267 21.8573 21.9981 21.8757
7.50000 21.6644 21.7781 21.8960 21.7393
6.50000 21.5989 21.7025 21.8044 21.6304
I basically skip the longitude row labels; I also intend to add the date later. First I extract the first month as a matrix (t) and apply two custom functions: SA, the surface area of one latitude strip, and MSA, the surface area of all latitude strips for the month.
ro=2:81
co=2:41
t <- as.matrix(dat[ro,co])
Then
SA <- function (lat,tmu){
l <- c(t[,lat]>=tmu,0)
la <- as.data.frame(l)
x <- la[,1]
n <- length(x)
sau <- array(0,n)
x. <- lat
for (i in 1:n) sau[i] <- (x[i]*111.320)*(cos(x.*(3.1415/180))*111.320)
s <- as.matrix(sum(sau))
}
MSA <- function(tmu){
m <-1:40
su <- array(0,0)
for (i in 1:40) su[i] <-SA(m[i],tmu)
ms <- as.data.frame(su)
sa <- as.data.frame(colSums (ms))
return(sa)
}
Function SA gives the surface area of one latitude (lat) strip and MSA the area of all strips for the month, for a temperature upper limit (tmu).
Matrix (s) from function SA has the surface area of sst > tmu (say 12 °C) for latitude (lat) (say at 30°N), and matrix (sa) from function MSA has the sum over all the latitudes (from 30°N to 70°N). This does the job for one month, and I can repeat the functions 12 times to obtain a year or 172 times for the 16 years, this way:
I define a starting latitude (lat), then a step of 81 longitude cells (st) to reach the next month, and the desired temperature:
lat= 30.5
st=81
t=12
Then calculate the total surface area for each month;
SA1 <- {
i=0*st
t <- as.matrix(dat[ro+i,co])
SA(lat,t)
MSA(12)
}
SA2 <- {
i=1*st
t <- as.matrix(dat[ro+i,co])
SA(30.5,t)
MSA(12)
}
My question is whether it is possible to create a loop or a function that iterates through all 172 months, so that I can skip repeating SA1, SA2, ..., SA172. Thanks in advance.
If I understand your question, your code reduces to this
#create some dummy data
lat <- 30:33
lon <- 6:9
sst <- array(rnorm(length(lat) * length(lon) * 12, mean = 21, sd = 4),
dim = c(length(lat), length(lon), 12), dimnames = list(lat, lon, 1:12))
#the actual code
SA <- function (test, tmu){
# rows of test are latitudes, so count the cells per latitude strip with rowSums()
rowSums(test >= tmu) * 111.320 ^ 2 *
cos(as.numeric(rownames(test)) * (pi/180))
}
apply(sst, 3, SA, tmu = 21)
If this is not correct, please make your question more clear and provide a reproducible input and output dataset.
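If the data stay in the stacked table form described in the question (one block of rows per month), the month offset can also be looped over directly. The following is a sketch on a small made-up table: the layout (one header row of latitudes per block, longitude labels in the first column, latitudes in the columns), the sizes, and the function name month_area are all assumptions.

```r
set.seed(42)
n_lon <- 4; n_lat <- 3; n_months <- 3
lat <- c(30.5, 31.5, 32.5)
lon <- c(9.5, 8.5, 7.5, 6.5)

# Assumed layout: per month one header row (latitudes) plus n_lon data rows,
# with the longitude labels in the first column
one_month <- function() {
  rbind(c(NA, lat),
        cbind(lon, matrix(rnorm(n_lon * n_lat, mean = 12, sd = 4), n_lon, n_lat)))
}
dat <- do.call(rbind, replicate(n_months, one_month(), simplify = FALSE))

ro <- 2:(n_lon + 1)   # data rows within a block
co <- 2:(n_lat + 1)   # data columns
st <- n_lon + 1       # rows per monthly block

# Area (km^2, 1-degree cells) with sst >= tmu in one month; columns are latitudes
month_area <- function(k, tmu) {
  block <- dat[ro + (k - 1) * st, co, drop = FALSE]
  sum(colSums(block >= tmu) * 111.320^2 * cos(lat * pi / 180))
}

areas <- sapply(seq_len(n_months), month_area, tmu = 12)
```

With the real table, n_months would be 172 and st would be 81, and areas would hold one total per month in place of SA1, SA2, and so on.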
I have two data sets, A and B, and I want to find the correlation between them and plot a contour map.
A is just a simple vector with 230 stream flow values.
B is a more complicated sea surface temperature (SST) data set over a series of dates. On each date, the SST is a matrix of 360 rows * 180 columns of recorded temperatures.
The vector A (230 values) is:
Houlgrave_flow_1981_2000 = window(Houlgrave_flow_average, start = as.Date("1981-11-15"),end = as.Date("2000-12-15"))
Houlgrave_SF_1981_2000 = coredata(Houlgrave_flow_1981_2000)
The dimensions of array B are shown below, and I only use time steps 1 to 230.
> dim(ssta_sst)
[1] 360 180 362
My idea for finding correlation is below.
z_correlation = cor(Houlgrave_SF_SST_1981_2000,ssta_sst[c(181:360, 1:180),,i])
I tried it with i=1. However, it doesn't work. The error message says:
"Error in cor(Houlgrave_SF_SST_1981_2000, ssta_sst[c(181:360, 1:180), , :
incompatible dimensions.".
Also, this is my contour map code,
require(maps)
par(ask=TRUE)
for (i in 1:230) {
maps::map(database="world", fill=TRUE, col="light blue")
maps::map.axes()
contour(x=lon_sst, y=lat_sst, z=cor(Houlgrave_SF_1981_2000,ssta_sst[c(181:360, 1:180),,i]), zlim=c(-3,3), add=TRUE)
title(paste("Year: ", year_sst[i], ", Month: ", month_sst[i]))
}
I think I just need to modify z in the contour code. Is it necessary to redefine each of A's values as a 360*180 data matrix?
If I understand the problem correctly, you have a time series, i.e., a vector whose index can be interpreted as time, and a 3-dimensional array, whose indices can be interpreted as time and position.
# Sample data
n <- 230
m <- 100
dates <- seq.Date( from=Sys.Date(), length=n, by="day" )
flow <- rnorm(n)
names(flow) <- as.character(dates)
temperatures <- array( rlnorm(n*m*m), dim=c(n,m,m) )
dimnames( temperatures ) <- list(
time = as.character( dates ),
longitude = NULL,
latitude = NULL
)
For each position, you can compute the correlation between your "flow" time series and the "temperature" time series (u, in the code below) for that position, using apply.
correlations <- apply(
temperatures,
2:3,
function (u) cor(u, flow)
)
image(correlations)
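In the question the array has time as its third dimension (360 x 180 x 362), and only the first 230 time steps match the flow series. Under that assumption the same apply() call works with margins 1:2 after subsetting the time axis. The sizes and the names sst and n_use below are small made-up stand-ins:

```r
set.seed(7)
n_lon <- 6; n_lat <- 4; n_time <- 12; n_use <- 10   # small assumed sizes
flow <- rnorm(n_use)                                 # stands in for the flow vector
sst  <- array(rnorm(n_lon * n_lat * n_time), dim = c(n_lon, n_lat, n_time))

# Keep the time steps that match the flow series, then correlate cell by cell
correlations <- apply(sst[, , seq_len(n_use)], 1:2, function(u) cor(u, flow))

dim(correlations)   # n_lon x n_lat, ready for contour(x, y, z = correlations)
```

This yields a single correlation map rather than one map per date, so the contour() call would sit outside any loop over i; and since correlations lie between -1 and 1, a zlim of c(-1, 1) would be enough.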