Loop or function for sequence of matrices of sst data - r

I have been looking for a way to select, in R, the frequency of values within a range from a sequence of monthly matrices of sea surface temperature (sst). The data are arranged as an 80*40*172 array of sst, i.e. (longitude, latitude, month), for the North Atlantic. I am interested in those frequencies because I use them to calculate the surface area covered by a given sst range in every month, say between the 4°C and 12°C isotherms. I have used R frequently for statistical analyses of time series and spatial data, but I am not used to programming, so my approach is probably not the most efficient. I have succeeded in extracting frequencies for sst values of, say, >12°C for all latitudes in one month:
My data come from the IRI/LDEO climate research web site:
dat <- read.table("c:/Temp/[Y+X]datatable.tsv", header=FALSE, sep="\t")
A 5x5 sample matrix of the first month
Nov 1981 30.5000 31.5000 32.5000 33.5000
9.50000 21.7906 21.9431 22.1324 22.1662
8.50000 21.7267 21.8573 21.9981 21.8757
7.50000 21.6644 21.7781 21.8960 21.7393
6.50000 21.5989 21.7025 21.8044 21.6304
I basically skip the longitude and row labels; I also intend to add the date later. First I extract the first month as a matrix (t) and apply two custom functions: SA for the surface area of one latitude strip and MSA for the surface area of all latitude strips in the month.
ro <- 2:81   # data rows of the first monthly block (skip the header row)
co <- 2:41   # the 40 latitude columns (skip the row-label column)
t  <- as.matrix(dat[ro, co])
Then
SA <- function(lat, tmu) {
  l   <- c(t[, lat] >= tmu, 0)   # cells in this latitude strip at or above tmu
  la  <- as.data.frame(l)
  x   <- la[, 1]
  n   <- length(x)
  sau <- array(0, n)
  x.  <- lat
  for (i in 1:n) sau[i] <- (x[i] * 111.320) * (cos(x. * (3.1415 / 180)) * 111.320)
  s <- as.matrix(sum(sau))
}
MSA <- function(tmu) {
  m  <- 1:40
  su <- array(0, 0)
  for (i in 1:40) su[i] <- SA(m[i], tmu)
  ms <- as.data.frame(su)
  sa <- as.data.frame(colSums(ms))
  return(sa)
}
SA gives the surface area of one latitude strip (lat) and MSA the surface area of all strips for the month, for a temperature threshold (tmu).
The matrix (s) from function SA holds the surface area with sst > tmu (say 12°C) for one latitude (lat) (say at 30°N), and the data frame (sa) from function MSA holds the sum over all latitudes (from 30°N to 70°N). This does the job for one month, and I can repeat the functions 12 times to obtain a year, or 172 times for the 16 years, this way:
I define a starting latitude (lat), then a step of 81 longitude rows (st) to extract the next monthly block, and the desired temperature:
lat= 30.5
st=81
t=12
Then calculate the total surface area for each month;
SA1 <- {
  i <- 0 * st
  t <- as.matrix(dat[ro + i, co])
  SA(lat, t)
  MSA(12)
}
SA2 <- {
  i <- 1 * st
  t <- as.matrix(dat[ro + i, co])
  SA(30.5, t)
  MSA(12)
}
My question is whether it is possible to create a loop or a function that iterates through all the months, that is 172 times, and so avoids repeating SA1, SA2, ..., SA172. Thanks in advance.

If I understand your question, your code reduces to this:
# create some dummy data
lat <- 30:33
lon <- 6:9
sst <- array(rnorm(length(lat) * length(lon) * 12, mean = 21, sd = 4),
             dim = c(length(lat), length(lon), 12),
             dimnames = list(lat, lon, 1:12))
# the actual code: area of each latitude strip with sst at or above tmu
SA <- function(test, tmu) {
  rowSums(test >= tmu) * 111.320 ^ 2 *
    cos(as.numeric(rownames(test)) * (pi / 180))
}
apply(sst, 3, SA, tmu = 21)
If this is not correct, please make your question more clear and provide a reproducible input and output dataset.
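To get one total per month rather than one value per latitude strip, the apply() result can be collapsed with colSums(); and if the real data stay in the stacked table from the question (172 blocks of st = 81 rows, with the 40 latitude columns in co), a plain sapply() over the blocks avoids writing SA1 ... SA172 by hand. The following is only a sketch built on the objects defined above: lats is an assumed vector of the 40 strip latitudes (30.5 to 69.5, taken from the question's column labels), and 12 °C is the threshold from the question.
# one total per month from the apply() call above
monthly_total <- colSums(apply(sst, 3, SA, tmu = 12))
# sketch for the question's stacked table: one total per monthly block
lats <- seq(30.5, by = 1, length.out = 40)            # assumed strip latitudes
monthly_area <- sapply(0:171, function(m) {
  block <- as.matrix(dat[ro + m * st, co])            # matrix for month m + 1
  sum(colSums(block >= 12) * 111.320^2 * cos(lats * pi / 180))
})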

Related

Why is my interpolation function skipping the days with NA values?

I have 19 weather stations with approximately 15 years of daily rainfall data. I want to interpolate the rainfall at a certain location. I have written a function for the interpolation as below.
Then I have written a for loop to make the interpolation for each day of the time series. However, my stations' records are not consistent: they include NA values. For example, station 1 covers 2000-2014, but station 2 only 2007-2009, station 3 2004-2012, etc.
For this reason I want to add an if statement: if the number of available stations is more than 3, interpolate the data ...
My code is given below.
The problem is that my code interpolates the data only for the period when all stations have data (the period without NA).
How can I fix this?
Thanks
read the data (a matrix with 19 columns and 6500 rows); example:
prec <- matrix(rnorm(123500), nrow = 6500, ncol = 19)
statlist is the data frame with the station coordinates
Distt is the function calculating the distances from the stations to my central point
Distt <- function(statlist) {
  pointx <- 661967
  pointy <- 277271
  coordx <- statlist$X
  coordy <- statlist$Y
  dist <- sqrt((coordx - pointx)^2 + (coordy - pointy)^2)
  distance <<- dist
}
the loop calculates the distances for all stations
for (i in 1:19) {
  Distt(statlist)
}
interpolation function
idwfunc <- function(prec, distance) {
  weight <- 1 / distance
  interp <- sum(prec * weight) / sum(weight)
  interp <<- interp
}
calculate how many stations have data on each day
val_per_day <- apply(prec, 1, function (x) sum(!is.na(x)))
(plot of the number of stations per day)
count how many days have more than 3 measurements
ii <- which(val_per_day > 3)[2]
output
output <- as.data.frame(matrix(NA, nrow = nrow(prec), ncol = 1,
                               dimnames = list(1:nrow(prec), c("interpolated_prec"))))
this is the for loop to interpolate the data for each day
for (iii in seq(1, length(val_per_day), by = 1)) {
  if (as.numeric(val_per_day[iii]) > 2) {
    output[iii, 1] <- idwfunc(prec[iii, ], distance)
  } else {
    output[iii, 1] <- NA
  }
}
This for loop interpolates the data only for the period without any NA values. However, I need it to interpolate whenever 3 or more stations have data on a given day.
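For what it is worth, one way the NA handling could look is sketched below. This is only an illustration, not the original code: idw_na is a made-up name, and it assumes prec, distance and output exist as defined above. It drops the NA stations inside the weighting and returns NA when fewer than 3 stations report.
idw_na <- function(prec_day, distance) {
  ok <- !is.na(prec_day)                 # stations reporting on this day
  if (sum(ok) < 3) return(NA_real_)      # fewer than 3 stations: no estimate
  w <- 1 / distance[ok]
  sum(prec_day[ok] * w) / sum(w)
}
output$interpolated_prec <- apply(prec, 1, idw_na, distance = distance)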

Create centroid tag within a radius for longitude-latitude data in R

I have longitude-latitude data (220000 observations, duplicates included) and I want to create two new columns giving the number of points within a 5 km and a 10 km radius.
My problem is the vector size: the algorithm does not run for data of this size.
Here is my code to create the first column, Radius_5km.
long <- runif(n=220000,min=-153,max=174)
lat <- runif(n=220000, min=-42,max=67)
df <- data.frame(long,lat)
library(geosphere)
dfgesphe <- cbind(df,
  Radius_5km = rowSums(distm(df[, c("long", "lat")],
                             fun = distHaversine) / 1000 <= 5))  # number of points within 5 km
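The full distance matrix for 220000 points needs roughly 220000 * 220000 * 8 bytes (hundreds of GB), which is why this fails. Below is a minimal sketch of one workaround, assuming the geosphere package and the df built above; count_within and its chunk argument are made-up names for this illustration. The idea is to process the points in blocks so only a chunk-by-220000 slice of the matrix exists at any time.
library(geosphere)
count_within <- function(coords, radius_m, chunk = 100) {
  n <- nrow(coords)
  out <- integer(n)
  for (start in seq(1, n, by = chunk)) {
    idx <- start:min(start + chunk - 1, n)
    # distances (in metres) from this block of points to all points
    d <- distm(coords[idx, , drop = FALSE], coords, fun = distHaversine)
    out[idx] <- rowSums(d <= radius_m)   # the point itself is counted, as above
  }
  out
}
pts <- as.matrix(df[, c("long", "lat")])
df$Radius_5km  <- count_within(pts, 5000)    # slow, but avoids the huge matrix
df$Radius_10km <- count_within(pts, 10000)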

Storing results from a for loop where the number of returns increases each iteration

I'm brand new to R. I've only been coding with it for a month or so.
What I am trying to do is generate and assign x and y coordinates for a growing/spreading population at each time step using a for loop.
The problem I'm having is storing the returned x and y outputs from the for loop. The key issue is that the number of returns increases with each iteration, and I would like them stored at each time step. For example, at T0 there is one return and at T20 there are 496.
I have tried creating a lattice, arrays, data frames, etc. Nothing I have found online has helped.
set.seed(29807)
Adults <- allYears[3,] # Adult tree population yrs 1:20
Time <- 21
dist <- abs(rnorm(Distance, mean = 51.5, sd = 63)) # dispersal distribution
# Distance refers to the data table with 29807 distances.
# CREATE A FOR LOOP WHERE YOU DRAW X NUMBER OF DISTANCES EQUIVALENT TO THE NUM OF ADULTS AT THAT TIME STEP
for (i in 1:Time) {
  N <- sample(dist, size = round(Adults[i]), replace = FALSE)
  theta <- runif(N, 0, 2*pi)  # distance is assigned direction
  x <- cos(theta)*N           # assign variate a polar coordinate x
  y <- sin(theta)*N           # assign variate a polar coordinate y
}
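A minimal sketch of one way to keep every time step's output, assuming Adults, dist and Time are built as in the snippet above (coords_by_step is a made-up name): a list can hold a different number of x/y pairs at each step, so the growing output does not have to fit a fixed-size array.
coords_by_step <- vector("list", Time)
for (i in 1:Time) {
  N <- sample(dist, size = round(Adults[i]), replace = FALSE)
  theta <- runif(length(N), 0, 2 * pi)   # one angle per sampled distance
  coords_by_step[[i]] <- data.frame(x = cos(theta) * N,
                                    y = sin(theta) * N)
}
# e.g. coords_by_step[[20]] holds all x/y pairs generated at time step 20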

r raster cumulative sum calculation according to pre-defined time steps

I have a time series of 10-day raster data (one layer every 10 days) and I need to calculate the cumulative sum over the time steps from -5 to +5 around the pixel values (= time 0) of a reference raster.
I am able to calculate the yearly cumulative sum (here below called ‘sum_YY’).
step 1: create the 10-day raster time series
require(raster)
idx <- seq(as.Date("2010/1/1"), as.Date("2011/12/31"), '10 days')
r <- raster(ncol = 5, nrow = 5)
s <- stack(lapply(1:length(idx), function(x)
  setValues(r, values = sample(x = c(-5:5), size = ncell(r), replace = TRUE))))
s <- setZ(s, idx)
step 2: calculate the cumulative sum by year
Y <- unique(format(as.Date(getZ(s), format = "X%Y.%m.%d"), format = "%Y"))
sum_YY <- stack()
for (i in Y) {
  dates <- format(getZ(s), "%Y") == i
  t <- calc(subset(s, which(dates)), cumsum)
  sum_YY <- stack(sum_YY, t)
}
But this is not the output I am looking for.
I need to use cumsum according to the reference raster 'refer'.
step 3: create the reference raster
refer <- raster(ncol=5, nrow=5)
refer <- setValues(refer, sample(x = c(15:25), size = ncell(refer), replace = TRUE))
The ‘refer’ raster pixel values indicate the time step X, and I want to calculate, for each year, the cumulative sum of the ‘s’ stack from time step X-5 to X+5.
That is, if a ‘refer’ pixel value is X = 20, I need the cumulative sum of layers 15 to 25 of the ‘s’ stack for each year; the same approach applies to the other pixels in ‘refer’.
I have no idea how to introduce the reference raster into the calculation; any help??
Thanks a million!!!
I can't think of a clean approach, but here is one way to do it:
rs <- stack(refer, s)
# x[1] is the refer value; the s layers start at x[2], hence the shift by one
a <- calc(rs, fun = function(x) sum(x[(x[1] - 4):(x[1] + 6)]))
stackSelect was designed for problems like this, but I now realize that it needs more flexibility. It would be nice to be able to select multiple layers at once and to apply a function. But you could do this (probably inefficiently):
x <- calc(refer, function(x) (x - 5):(x + 5), forceapply = TRUE)
y <- stackSelect(s, x)
z <- sum(y)

XYZ data to 2D plot in R

I need to make a 2D plot of distance travelled versus my value at that point ("intensity").
My data is formatted as:
lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917
and it continues like this for 282 rows. I was looking at a few packages that calculate the distance between longitude (lon) and latitude (lat) points (such as geosphere), but I couldn't understand how to get my data into the format they want. I know the total distance travelled should be 4.01538 degrees, evenly spaced out over the 282 points, but I don't know how I could make a column in R with this in mind.
dfrm$dist <- cumsum(c(0, with(dfrm, sqrt((lat[-1] - lat[-nrow(dfrm)])^2 +
                                         (lon[-1] - lon[-nrow(dfrm)])^2))))
with(dfrm, plot(dist, intensity, type="b"))
Or choose a more "geographic" distance measure with the lagged column values. But given the increments, I doubt the error from using a naive distance measure can be that much.
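A sketch of that more "geographic" variant, assuming the geosphere package the question mentions and the same dfrm columns used above (dist_km is a made-up column name): great-circle distances between consecutive rows, accumulated along the track.
library(geosphere)
p <- as.matrix(dfrm[, c("lon", "lat")])
# haversine distance between each point and the next, in km, accumulated
dfrm$dist_km <- cumsum(c(0, distHaversine(p[-nrow(p), ], p[-1, ]) / 1000))
with(dfrm, plot(dist_km, intensity, type = "b",
                xlab = "Distance (km)", ylab = "Intensity"))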
From here I found some packages to calculate distance between coordinates. Assuming your data is called dtf and using the RSEIS package:
dtf <- data.frame(rbind(c(-85.01478,37.99030,-68.3167),
c(-85.00752,37.97601,-68.0247),c(-85.00027,37.96172,-67.9565),
c(-84.99302,37.94743,-67.8917)))
names(dtf) <- c('lon','lat','int')
library(RSEIS)
travelint <- function(i, data){
  ddeg <- GreatDist(data$lon[i], data$lat[i], data$lon[i+1], data$lat[i+1])$ddeg
  dint <- data$int[i+1] - data$int[i]
  return(list(ddeg, dint))
}
out <- sapply(1:(nrow(dtf)-1), data = dtf, travelint)
out <- data.frame(matrix(as.numeric(out), ncol = 2, byrow = TRUE))
out$X1 <- cumsum(out$X1)
This will take your data, calculate the distance traveled between points and the intensity change between them. After that it can be plotted like this:
library(ggplot2)
ggplot(out, aes(X1, X2)) + geom_line() +
  labs(x = "Distance (Degrees)", y = "Intensity Change")
If instead you want the increasing intensity itself, you can use cumsum again to get the cumulative change in intensity and then add it to the first intensity:
out2 <- out
out2 <- rbind(c(0,0),out2)
out2$X2 <- cumsum(out2$X2) + dtf$int[1]
ggplot(out2, aes(X1, X2)) + geom_line() +
  labs(x = "Distance (Degrees)", y = "Intensity")
As mentioned by DWin, you can use a naive measure or a geographic distance measure. Here I am using the gdist function from the Imap package, which calculates great-circle distance.
library(Imap)
library(lattice)
#Dummy data
longlat <- read.table(text="lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917", header=TRUE)
ll <- lapply(seq(nrow(longlat) - 1), function(x){
  start <- longlat[x, ]
  end   <- longlat[x + 1, ]
  cbind(distance  = gdist(start$lon, start$lat, end$lon, end$lat, units = "m"),
        intensity = end$intensity - start$intensity)
})
dd <- as.data.frame(do.call(rbind, ll))
xyplot(intensity ~ distance, dd, type = c('p', 'l'), pch = 20, cex = 2)
