Graph file for the R-INLA package

I have some areal data (disease mapping) to model.
I want to predict the number of cases of a disease, considering spatio-temporal effects using R-INLA.
My data has these columns:
city code; year; month; latitude; longitude; number of cases.
I know that in R-INLA, with the poly2nb and nb2INLA functions, I can create a graph file and later use this file as the "graph" argument in the formula.
The problem is: I don't know how to make this graph file. I can get polygons of the cities I'll be working with (the cities are Brazilian, and the geobr package seems very useful for this). However, I don't know how to effectively join this polygon information with my data and build the graph file from it (bearing in mind that my data is also temporal, so I have one row for each city, month and year triplet).
Can someone help me?
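A minimal sketch of the usual workflow, assuming your city codes match geobr's code_muni column (object and column names here like dat and city_code are placeholders). Note that the graph file only encodes spatial adjacency between cities, so the month/year dimension of your data does not enter at this step:

library(geobr)
library(spdep)
library(INLA)

# 1. Get the municipality polygons (e.g. all municipalities of one state)
polys <- read_municipality(code_muni = "SP", year = 2020)

# 2. Build the neighbourhood list from the polygons
nb <- poly2nb(polys)

# 3. Write it in the format INLA expects and read it back as a graph
nb2INLA("map.adj", nb)
g <- inla.read.graph("map.adj")

# 4. Give your data an area index matching the polygon order
#    (the same index repeats across all month/year rows of a city)
dat$idarea <- match(dat$city_code, polys$code_muni)

# 5. Use the graph in the formula, e.g. for a BYM2 spatial effect
# formula <- cases ~ f(idarea, model = "bym2", graph = g) + ...

The temporal part is then handled by separate terms in the formula (for example an f() term on a month or year index), not by the graph file itself.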

Related

How to eliminate no-neighbour data when doing spatial clustering

Hello wonderful people!
I'm trying to run a cluster analysis on some data that I've mapped on a choropleth map. It's the % participation rates per constituency.
I'm trying to run a Moran's I test, but I can't get the data into the required list format. I keep getting the error: "Error in nb2listw(nb) : Empty neighbour sets found".
I assume this is because some constituencies (like the Isle of Wight) have no neighbours. I can't find online which constituencies are islands or have no neighbours, and wondered if there is a speedier way to solve this issue in R, rather than googling all 573 England and Wales constituencies.
Can you help?
Ideas: I thought that maybe I could create "fake" polygons to surround all constituencies with no value so that, at the very least, they could be listed. Or maybe there is a way of searching for which constituencies have no neighbours and then removing them? Both of these I'm unsure how to do.
My goal: I want to get a few spatial clusters where the participation rates are similar and then extract that data so I can compare it to a regression model I have. If you know of another way to do this, other than the above, please let me know.
I've tried: new_dataframe <- filter(election_merged_sf, !is.na(nb)), but this doesn't actually remove any objects. I assume this is because it is testing whether there are numeric neighbours, when it needs to be done spatially.
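For reference, a hedged sketch using spdep, assuming election_merged_sf is an sf object of constituency polygons (the participation column name is a placeholder). card() counts each polygon's neighbours, so islands are the entries with zero:

library(spdep)

nb <- poly2nb(election_merged_sf)

# Find the polygons with no neighbours (islands such as the Isle of Wight)
islands <- which(card(nb) == 0)

# Option 1: drop them and rebuild the neighbour list
no_islands <- election_merged_sf[-islands, ]
lw <- nb2listw(poly2nb(no_islands))

# Option 2: keep them and allow empty neighbour sets explicitly
lw_all <- nb2listw(nb, zero.policy = TRUE)
moran.test(election_merged_sf$participation, lw_all, zero.policy = TRUE)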

Is there an R function/package for determining WWF biomes from latlong coordinates?

Very new here, hi, postgraduate student who is tearing their hair out.
I have inherited a dataset of secondary data collected from research papers on species populations and their genetic diversity, and have been adding more data to this sheet in preparation for some analyses. Part of the analysis will involve subsetting the data by biome type to compare the biomes, so I've been cleaning up the sheet and trying to add this information to the data. I have lat/long coordinates for each population (in decimal degrees), and it appears that the person working on this before me was able to use these to determine the biome for each point, specifically following the Olson et al. (2001)/WWF 14-biome categorisation, but at this point I'll take anything.
However, I have no idea how this was achieved and truly can't find anything to help. After googling just about every combination of "r package biomes WWF latitude longitude species assign derive convert" that you can think of, the only packages I have located are non-functional in my version of RStudio (e.g. biomeara, ggbiome), leaving me with no idea whether they'd even work, and every other page I have dragged up seems to already have biome data included with its dataset. Other research papers describe assigning biomes based on lat/long coordinates but give no steps on how to actually achieve this. Is it possible in R? Am I losing my mind? Does anyone know of a way to do this, whether in R or not, that preferably doesn't take forever, as I have over 8000 populations to assess? Many thanks!
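One common approach, sketched here under the assumption that you have downloaded the WWF Terrestrial Ecoregions of the World shapefile (its attribute table carries the Olson et al. 2001 biome code in a BIOME column) and that your sheet, here called my_data, has lon/lat columns in decimal degrees (file and column names may differ):

library(sf)

ecoregions <- st_read("wwf_terr_ecos.shp")  # path to the downloaded shapefile

# Turn the coordinate table into spatial points (WGS84)
pts <- st_as_sf(my_data, coords = c("lon", "lat"), crs = 4326)
pts <- st_transform(pts, st_crs(ecoregions))

# Point-in-polygon join: each point inherits the biome code of the
# ecoregion polygon it falls inside
pts_biome <- st_join(pts, ecoregions["BIOME"])

This scales fine to 8000+ points; points falling outside every polygon (e.g. at sea) come back as NA.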

Splitting a mutation plot by cohort

I am exploring some R packages for creating oncoplots that represent different mutation types.
I am using the maftools (oncoplot function) and GenVisR (waterfall function) packages.
I would like to split the resulting plot into two parts based on clinical information (like gender: male and female).
I want to reach a representation like this one, but I can't find the correct parameter.
Could someone help me? Is there a command inside the function, or outside it (like facet_wrap), to split the cohort based on clinical information? It would be very useful for me.
Thank you in advance
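As far as I know, oncoplot has no built-in facet parameter, but a common workaround in maftools is to subset the MAF object by the clinical variable and draw the cohorts side by side with coOncoplot. A hedged sketch, assuming a MAF object laml whose clinical data has a gender column (object and column names are placeholders):

library(maftools)

clin <- getClinicalData(laml)
male_ids   <- clin$Tumor_Sample_Barcode[clin$gender == "male"]
female_ids <- clin$Tumor_Sample_Barcode[clin$gender == "female"]

maf_male   <- subsetMaf(laml, tsb = male_ids)
maf_female <- subsetMaf(laml, tsb = female_ids)

# One oncoplot per cohort, drawn side by side
coOncoplot(m1 = maf_male, m2 = maf_female,
           m1Name = "Male", m2Name = "Female")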

Training and simulating a spatstat ppm using multiple datasets

Disclaimer: I'm very new to spatstat and spatial point modeling in general... please excuse my naivete.
I have recently tried using spatstat to fit and simulate spatial point patterns related to weather phenomena, where the point pattern represents a set of eye-witness reports (for example, reports of hail occurrence) and the observation window and covariate are based on some meteorological parameter (e.g. the window is the area where moisture is at least X, and the moisture variable is additionally passed as a covariate when training the model).
library(spatstat)

moistureVar  <- im(moisture)               # moisture field as a pixel image
moistureMask <- owin(mask = moisture > X)  # window: the high-moisture region
obsPPP  <- ppp(x = obsX, y = obsY, window = moistureMask)  # observed reports
myModel <- ppm(obsPPP ~ moistureVar)       # intensity as a function of moisture
### then simulate
mySim <- simulate(myModel, nsim = 10)
My questions are the following:
Is it possible (or, more importantly, even valid) to take a ppm trained on one day, with a specific moisture variable and mask, and apply it to another day with a different moisture field and mask? I had considered using the update function to switch out the window and covariate fields of the trained model, but haven't actually tried it yet. If the answer is yes, it's a little unclear to me how to actually do this programmatically.
Is it possible to do an online update of the ppm with additional data? For example, train the model on data from different days (each with their own window and covariate) iteratively (similar to how many machine learning models are trained, using blocks of training data). For example, let's say I have 10 years of daily data which I'd like to use to train the model, and another 10 years of moisture variables over which I'd like to simulate point patterns. Again, I considered the update function here as well, but it was unclear whether the new model would be based ONLY on the new data, or on a combination of the original and new data.
Please let me know if I'm going the completely wrong direction with this. References and resources appreciated.
If you have fitted a model using ppm and you update it by specifying new data and/or new covariates, then the new data replace the old data; the updated model's parameters are determined using only the new data that you gave when you called update.
The syntax for the update command is described in the online help for update.ppm (the method for the generic update for an object of class ppm).
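For illustration, a hedged sketch of what that replacement looks like with the objects from the question (the day-2 names are hypothetical); update.ppm re-evaluates the original model call, so reassigning the covariate name before updating swaps in the new day's moisture field:

moistureVar <- im(moistureDay2)             # new covariate under the old name
day2Model   <- update(myModel, obsPPPday2)  # refit using ONLY day 2's data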
It seems that what you really want to do is to fit a point process model to many replicate datasets, each dataset consisting of a predictor moistureVar and a point pattern obsPPP. In that case, you should use the function mppm which fits a point process model to replicated data.
To do this, first make a list A containing the moisture regions for each day, and another list B containing the hail report location patterns for each day. That is, A[[1]] is the moisture region for day 1, and B[[1]] is the point pattern of hail report locations for day 1, and so on. Then do
h <- hyperframe(moistureVar=A, obsPPP=B)
m <- mppm(obsPPP ~ moistureVar, data=h)
This will fit a single point process model to the full set of data.
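For concreteness, a sketch of building those lists from daily data (dailyMoisture and dailyReports are hypothetical names for a list of moisture matrices and a list of report-coordinate data frames):

# A[[i]]: high-moisture window for day i; B[[i]]: that day's report pattern
A <- lapply(dailyMoisture, function(m) owin(mask = m > X))
B <- mapply(function(rep, w) ppp(x = rep$x, y = rep$y, window = w),
            dailyReports, A, SIMPLIFY = FALSE)

h <- hyperframe(moistureVar = A, obsPPP = B)
m <- mppm(obsPPP ~ moistureVar, data = h)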
Finally, may I point out that the model
obsPPP ~ moistureVar
is very simple, because moistureVar is a binary predictor. The model will simply say that the intensity of hail reports takes one value inside the high-moisture region, and another value outside that region. As an alternative, you could consider using the moisture content (e.g. humidity) itself as a predictor variable.
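A sketch of that continuous alternative, reusing the lists from above: store the moisture fields as pixel images rather than binary windows, so the fitted intensity varies smoothly with moisture:

Aim <- lapply(dailyMoisture, im)   # moisture fields as images, not windows
h2  <- hyperframe(moistureVar = Aim, obsPPP = B)
m2  <- mppm(obsPPP ~ moistureVar, data = h2)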
See Chapters 9 and 16 of the spatstat book for more detail.

What is the best way to store ancillary data with a 2D time series object in R?

I am currently trying to move from Matlab to R.
I have 2D measurements consisting of irradiance over time and wavelength, together with quality flags and uncertainty/error estimates.
In Matlab I extended the timeseries object to store both the wavelength array and the auxiliary data.
What is the best way in R to store this data?
Ideally I would like this data to be stored together such that e.g. window(...) keeps all data synchronized.
So far I have looked at the different time series classes like ts, zoo, etc. and some spatio-temporal classes. However, none of them lets me attach auxiliary data to observations or gives me a secondary axis.
Not totally sure what you want, but here is a simple tutorial mentioning R's "ts" and "zoo" time series classes:
http://faculty.washington.edu/ezivot/econ424/Working%20with%20Time%20Series%20Data%20in%20R.pdf
and here is a more comprehensive outline of many more classes (see the Time Series Classes section):
http://cran.r-project.org/web/views/TimeSeries.html
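If zoo fits, one pattern that keeps everything synchronized is to hold irradiance, flags and uncertainties as parallel zoo matrices sharing one time index, with wavelengths as column names; window() then subsets them all consistently. A minimal sketch with made-up dimensions:

library(zoo)

times <- as.POSIXct("2024-01-01", tz = "UTC") + 3600 * (0:99)  # hourly index
wl    <- seq(300, 800, by = 50)                                # wavelengths (nm)

irr   <- zoo(matrix(rnorm(100 * length(wl)), nrow = 100), order.by = times)
flags <- zoo(matrix(0L, nrow = 100, ncol = length(wl)), order.by = times)
colnames(irr) <- colnames(flags) <- paste0("wl", wl)

spectra <- list(irradiance = irr, flags = flags, wavelength = wl)

# Subset every time-indexed component over the same window
sub <- lapply(spectra[c("irradiance", "flags")], window,
              start = times[10], end = times[20])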
