How to change the number of columns of raster data in R?

I have two rasters, both with the same resolution and origin, and both with 3600 columns.
I use r <- rotate(r) on one of them to change the longitudes from 0–360 to -180–180, but after this the number of columns of that raster increases from 3600 to 3601. I need to do a calculation involving both rasters, so their column counts must both be 3600.
I expect to get a raster with 3600 columns.
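In case it helps, here is a minimal sketch of two common fixes, assuming r1 is the rotated raster (3601 columns) and r2 is the reference raster (3600 columns); the object names and the choice between cropping and resampling are placeholders, not a confirmed solution:

library(raster)

## Option 1: crop the rotated raster to the reference extent; this drops the
## duplicated edge column if the two grids otherwise line up
r1_fixed <- crop(r1, extent(r2))

## Option 2: if the grids are still slightly misaligned, resample onto the
## reference grid so both rasters end up with identical dimensions
if (!compareRaster(r1_fixed, r2, stopiffalse = FALSE)) {
  r1_fixed <- resample(r1, r2, method = "bilinear")
}
ncol(r1_fixed)   # should now be 3600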

Related

R - filtering negative and positive spikes in data

I am plotting data that consists of intervals that are more or less constant, plus spikes that arise because the data is a quotient of two parameters. The very large and very small quotients aren't relevant for my purpose, so I have been looking for a way to filter them out. The dataset contains 40k+ values, so I cannot remove the extreme quotients manually.
Is there a function that can trim/filter out the very large/small quotients?
You can use the filter() function from dplyr. This can create a new dataframe without outliers that you can then plot. For example:
library(dplyr)
no_spikes <- filter(original_df, x > -100 & x < 100)
This would create a new dataframe, no_spikes, that only contains observations where the variable x is between the values -100 and 100.
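If fixed cutoffs like -100 and 100 feel too arbitrary, a quantile-based threshold is one possible variant (a sketch only; the 1%/99% limits and the column name x are illustrative):

library(dplyr)

limits    <- quantile(original_df$x, probs = c(0.01, 0.99), na.rm = TRUE)
no_spikes <- filter(original_df, x > limits[1], x < limits[2])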

R keras LSTM input shape

There are lots of answers on how to reshape data for Keras LSTM, but they are all about Python, not R.
Array transformation for KerasR LSTM in R
This answer shows the transformation method, but I still have a question: what if the number of features is 2?
This is my data.
It is temperature data with 290 rows and 122 columns. Each column is a time series for one station, and each row is the maximum temperature for one day. I want to predict the next day's maximum temperature using historical data, so the number of features is 122, but I do not know what the samples and timesteps should be.
Based on my experience, you should reshape the data into a 3D array whose dimensions are (samples, timesteps, features).
Originally I have an input matrix, X, with n columns (features) and r rows (observations, days). I apply a time lag of m periods to each column of the matrix, so now I have n separate matrices (one per feature), each with the same r rows but with m columns, corresponding to the number of time lags I implemented. I stack all of these individual matrices along the third dimension, so I now have an array with r rows, m columns, and a depth of n.
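As a rough illustration of that reshaping in R (a sketch only; the lag length m and all object names are placeholders, and the random matrix stands in for the real temperature data):

r <- 290; n <- 122; m <- 7                      # days, stations (features), timesteps
X <- matrix(rnorm(r * n), nrow = r, ncol = n)   # placeholder for the temperature matrix

n_samples <- r - m                              # one sample per rolling window of m days
X_array <- array(NA_real_, dim = c(n_samples, m, n))
for (i in seq_len(n_samples)) {
  # rows i .. i+m-1 form the lag window for sample i
  X_array[i, , ] <- X[i:(i + m - 1), ]
}
y <- X[(m + 1):r, ]                             # next-day targets, one row per sample
dim(X_array)                                    # 283 7 122 -> (samples, timesteps, features)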
I actually had my own question about this and I will be posting my own query soon on either StackOverflow or CrossValidated.

extracting pixel values above a value per polygon in R

I have a shapefile containing 38 polygons, i.e. the 38 states of a country. The shapefile is overlaid on a raster. I need to extract/reclassify the pixels above a certain value, specific to each polygon.
For example, I need to extract the raster pixels > 120 for state/polygon 1, pixels > 189 for polygon 2, and so on, with the resulting raster containing the extracted pixels as value 1 and everything else as NoData. Hence, it seems like I need to extract first and then reclassify.
I have the values for extraction saved as a data frame with a column of names matching the names of the states, which are stored in the attribute "Name" of the shapefile.
Any suggestion on how I could go about this?
Should I extract the raster for each state into 38 separate rasters, then reclassify() each, and then mosaic them back into one raster, i.e. the whole country?
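For what it's worth, one possible loop-and-mosaic sketch with the raster package, assuming a RasterLayer r, a SpatialPolygonsDataFrame states with a "Name" attribute, and a data frame thresh with columns Name and value (all object names are illustrative, not from the question):

library(raster)

out <- vector("list", nrow(states))
for (i in seq_len(nrow(states))) {
  poly   <- states[i, ]
  cutoff <- thresh$value[thresh$Name == poly$Name]
  rc     <- mask(crop(r, poly), poly)                         # clip the raster to this state
  rc     <- calc(rc, function(x) ifelse(x > cutoff, 1, NA))   # 1 above the threshold, NA otherwise
  out[[i]] <- rc
}
## mosaic the 38 per-state rasters back into one country-wide raster
country <- do.call(mosaic, c(out, fun = max))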

Sampling according to a distribution from a large vector in R

I have a large vector of 11 billion values. The distribution of the data is not known, so I would like to sample 500k data points based on the existing probabilities/distribution. In R there is a limit of 2^31 - 1 on the number of values that can be loaded into a vector, which is why I plan to do the sampling manually.
Some information about the data: the data is just integers, and many of them are repeated multiple times.
large.vec <- c(1, 2, 3, 4, 1, 1, 8, 7, 4, 1, ..., 216280)
To create the probabilities of 500k samples across the distribution I will first create the probability sequence.
prob.vec <- seq(0, 1, length.out = 500000)
Next, convert these probabilities to position in the original sequence.
position.vec <- prob.vec * 11034432564
The reason I created the position vector is so that I can pick the data point at each specific position after I order the population data.
Now I count the occurrences of each integer value in the population, create a data frame with the integer values and their counts, and also create the interval each of these values covers:
integer.values       counts  lw.interval  up.interval
             0  300,000,034            0  300,000,034
             1  169,345,364  300,000,034  469,345,398
             2  450,555,321  469,345,399  919,900,719
...
Now, using the position vector, I identify which interval each position falls into and take the integer value associated with that interval.
This way I believe I have a sample of the population. I got a large chunk of the idea from this reference:
Calculate quantiles for large data.
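A compact sketch of this interval lookup with findInterval(), assuming the per-integer counts are already tallied in a data frame counts.df with columns integer.values and counts (names illustrative):

N <- sum(as.numeric(counts.df$counts))          # total population size, ~11 billion
prob.vec     <- seq(0, 1, length.out = 500000)
position.vec <- pmax(1, ceiling(prob.vec * N))  # positions in the ordered population

## cumulative upper bound of each integer's interval in the ordered population
upper <- cumsum(as.numeric(counts.df$counts))

## findInterval() locates every position in one vectorised pass, so there is no
## need to loop over all intervals for each position
idx        <- findInterval(position.vec, upper, left.open = TRUE) + 1
sample.vec <- counts.df$integer.values[idx]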
I wanted to know if there is a better approach, or whether this approach could reasonably, albeit crudely, give me a good sample of the population.
This process does take a considerable amount of time, as the position vector has to be checked against all possible intervals in the data frame. To speed it up I have parallelised it using RHIPE.
I understand that I will be able to do this only because the data can be ordered.
I am not trying to randomly sample here; I am trying to "sample" the data while keeping the underlying distribution intact, mainly to reduce 11 billion values to 500k.

Counting species occurrence in a grid

I have about 500,000 points in R of occurrence data of a migratory bird species throughout the US.
I am attempting to overlay a grid on these points and then count the number of occurrences in each grid cell. Once the counts have been tallied, I want to reference them to a grid-cell ID.
In R, I've used the over() function to get just the points within the range map, which is a shapefile.
#Read in occurrence data
library(sp)     # coordinates(), proj4string(), over()
library(rgdal)  # readOGR()
data=read.csv("data.csv", header=TRUE)
coordinates(data)=c("LONGITUDE","LATITUDE")
#Get shapefile of the species' range map
range=readOGR(".",layer="data")
proj4string(data)=proj4string(range)
#Get points within the range map
inside.range=!is.na(over(data,as(range,"SpatialPolygons")))
The above worked exactly as I hoped, but it does not address my current problem: how to deal with points that are of type SpatialPointsDataFrame and a grid that is a raster. Would you recommend polygonizing the raster grid and using the same method I indicated above, or would another process be more efficient?
First of all, your R code doesn't work as written. I would suggest copy-pasting it into a clean session, and if it errors out for you as well, correcting syntax errors or including add-on libraries until it runs.
That said, I assume that you are supposed to end up with a data.frame of two-dimensional numeric coordinates. So, for the purposes of binning and counting them, any such data will do, so I took the liberty of simulating such a dataset. Please correct me if this doesn't capture a relevant aspect of your data.
## Skip this line if you are the OP, and substitute the real data instead.
data<-data.frame(LATITUDE=runif(100,1,100),LONGITUDE=runif(100,1,100));
## Add the latitudes and longitudes between which each observation is located
## You can substitute any number of breaks you want. Or, a vector of fixed cutpoints
## LATgrid and LONgrid are going to be factors. With ugly level names.
data$LATgrid<-cut(data$LATITUDE,breaks=10,include.lowest=T);
data$LONgrid<-cut(data$LONGITUDE,breaks=10,include.lowest=T);
## Create a single factor that gives the lat,long of each observation.
data$IDgrid<-with(data,interaction(LATgrid,LONgrid));
## Now, create another factor based on the above one, with shorter IDs and no empty levels
data$IDNgrid<-factor(data$IDgrid);
levels(data$IDNgrid)<-seq_along(levels(data$IDNgrid));
## If you want total grid-cell count repeated for each observation falling into that grid cell, do this:
data$count<- ave(data$LATITUDE,data$IDNgrid,FUN=length);
## You could have also used data$LONGITUDE, doesn't matter in this case
## If you want just a table of counts at each grid-cell, do this:
aggregate(data$LATITUDE,data[,c('LATgrid','LONgrid','IDNgrid')],FUN=length);
## I included the LATgrid and LONgrid vectors so there would be some
## sort of descriptive reference accompanying the anonymous numbers in IDNgrid,
## but only IDNgrid is actually necessary
## If you want a really minimalist table, you could do this:
table(data$IDNgrid);
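As a side note on the raster-grid part of the question, rasterize() from the raster package can also count points per cell directly. A sketch, assuming data is the SpatialPointsDataFrame from the question and grid is the RasterLayer used as the grid (the use of fun = "count" here is my suggestion, not from the original answer):

library(raster)

## count the number of points falling in each raster cell
counts <- rasterize(coordinates(data), grid, fun = "count")

## pair each non-empty cell ID with its count
v <- values(counts)
count.table <- data.frame(cell = which(!is.na(v)), count = v[!is.na(v)])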
