Compare Survey Results Across Regions - r

I have results from a survey of nurse practitioners asking to what degree (Likert scale, values from 1-5) they feel certain barriers prevent them from practicing adequately (e.g. time constraints, location restrictions, etc.). They were also asked to indicate where in the state they practice (fill in a bubble). I was wondering if there is a way to code a picture of a U.S. state (say Texas) and superimpose the survey results onto the map by region?
For example: say one nurse indicated a 1 for feeling time constrained, and she was from the Southern region of Texas. Then, out of a sample of say 100, I would like to show that the 1% who responded with a value of 1 came from the Southern region, and have that appear on a map of Texas. Does that make sense?
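A minimal sketch of one possible approach, assuming the responses sit in a hypothetical data frame called survey with columns region and time_constraint, and using the maps and ggplot2 packages to draw the Texas outline (the region label coordinates below are made up for illustration):

library(ggplot2)
library(dplyr)
library(maps)

# hypothetical survey data: one row per respondent
survey <- data.frame(
  region = c("South", "North", "East", "West"),
  time_constraint = c(1, 3, 5, 2)
)

# percentage of all respondents who answered 1, by region
summary_by_region <- survey %>%
  group_by(region) %>%
  summarise(pct_value_1 = 100 * sum(time_constraint == 1) / nrow(survey))

# rough label positions for each region (made-up coordinates)
labels <- data.frame(
  region = c("South", "North", "East", "West"),
  long   = c(-98.5, -99.5, -95.5, -102.5),
  lat    = c(27.5, 33.5, 31.5, 31.5)
) %>%
  left_join(summary_by_region, by = "region")

texas <- map_data("state", region = "texas")

ggplot() +
  geom_polygon(data = texas, aes(long, lat, group = group),
               fill = "grey90", colour = "black") +
  geom_text(data = labels,
            aes(long, lat, label = paste0(region, ": ", pct_value_1, "%"))) +
  coord_quickmap()

If you have actual region boundaries (e.g. a shapefile of public health regions), you could fill each region by its percentage instead of printing labels.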

Related

How to simulate river level rise in R

I need to make a simulation to see what areas would be affected if the sea level rises by X meters. Could anyone give me tips on where to start? I've searched for tools embedded in the Google Maps API but didn't find anything suitable.
The idea is to create a function such as this:
isAffected <- function(coordinate, metersRised) {
  # return TRUE if it is affected, FALSE otherwise
}
Thanks in advance!
My first reaction is that I can't see any quick, straightforward solution with off-the-shelf R libraries/data sets on top of which to build a function like that. My second is to wonder whether you'd like to model it yourself, rely on already-developed products, or something in the middle. The most rigorous option would be applying a hydrodynamic model; the other bookend is sampling someone else's grid of anticipated results.
Just for context: for river levels affected by sea level rise near the coast, you may want to consider variable river stages if they vary quite a bit. If the rivers are running high due to recent storms or snowmelt, that will worsen the flooding caused by sea level rise alone. So maybe you could assume a limited number of river heights (say rainy season - high, dry season - low). Tides complicate things too, as do storms and storm surge - basically above-average ocean heights due to temporarily very low pressure. An example worst-case scenario combining those three components: how much of a city or regional coastline (say New Orleans, or part of the Australian coast) would be flooded during a storm surge, at high tide, with the local river very full from spring snowmelt, and 5 feet of extra sea level added? So there is a lot of data to consider - e.g. you may want some sort of x,y,z data for those river height assumptions. Lots of cities have inundation maps from which you can get those river stage elevations. The bigger the sea level rise assumption, the less the rivers might matter: a huge sea level rise scenario could easily inundate the whole city as it is today, no matter how high the river is, with the mouth of the river moving miles inland.
Simplifying things, I'd say the most important data will be the digital elevation model (DEM), probably a raster file of x,y,z coordinates, with z being the key piece - the elevation of the pixel at each x,y location above some datum. Higher-resolution DEMs will give much more detailed and realistic inundation. Processed LiDAR data is maybe ideal - very high resolution data that someone else has produced - raw LiDAR data is a burden. There's at least some here for New Zealand - http://opentopo.sdsc.edu/datasets - but I'm not sure of good warehouses for data outside the US.
A basic workflow might be: decide what hydraulic components you'll consider and how many scenarios. E.g., you'll ignore tides by using an average sea level, have just two sea level rise scenarios, and assume the river is always at __ feet, or maybe __ ft and __ ft. Download or build the DEM, then add your river heights to it (not trivial, but searching GIS StackExchange is a good start). That gives you a reference baseline elevation to combine the sea water with. A sea level rise assumption, say 10 feet, is then incorporated into another DEM. One approach is raster-math centric: subtract one from the other, and the result will show the newly inundated areas. Once you've done the raster math, you could have a binary x,y grid of flooded / not flooded and apply that final search function - is this x,y a 1 or a 0? - but by far the trickiest part is everything before that. There may be more straightforward or simplified approaches, but the system is so dynamic that the sky is the limit for how complicated your model can be. Here's more information on the river component, which might help you visualize the river starting points to which you'll add your sea water scenario(s): https://www.usgs.gov/mission-areas/water-resources/science/flood-inundation-mapping-science?qt-science_center_objects=0#qt-science_center_objects
The raster package might be a good start: it will read in downloaded raster/grid files, like .tif, and also perform the raster math you'd need - adding/subtracting same-size rasters. Or, forgetting all this processing, maybe you could just read in pre-processed rasters of such scenarios done by others and then do your search on them. There are probably a good number for particular sea level rises, but it gets much trickier if you want to assume both sea level and river elevation scenarios.
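A minimal sketch of the raster-math idea, assuming you already have a DEM on disk (the file name dem.tif and the 10-metre water level are placeholders, and this ignores hydraulic connectivity - it just flags every cell at or below the assumed water level):

library(raster)

# hypothetical inputs: a DEM GeoTIFF and an assumed total water level
dem <- raster("dem.tif")     # elevation above some datum, in metres
waterLevel <- 10             # assumed sea level rise (+ river stage, surge, ...)

# cells at or below the assumed water level are treated as flooded
flooded <- dem <= waterLevel # RasterLayer of 1 (flooded) / 0 (dry)

# the function from the question, answered against that precomputed grid
isAffected <- function(coordinate, floodedRaster) {
  # coordinate is c(x, y) in the same CRS as the raster
  val <- extract(floodedRaster, matrix(coordinate, ncol = 2))
  !is.na(val) && val == 1
}

# e.g. isAffected(c(1748905, 5428920), flooded)  # coordinates in the DEM's CRS

Note the connectivity caveat: a low-lying inland basin with no path to the sea would still be flagged, so a real analysis would also need something like a region-grow from the coastline.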

Finding a quantity of anything between two points in space

I'm currently working towards a 3D model of this, but I thought I would start with 2D. Basically, I have a grid of longitude and latitude with NO2 concentrations across it. What I want to produce, at least for now, is a total amount of Nitrogen Dioxide between two points. Like so:
[image: 2DGrid]
Basically, these two points are at different lats and lons and, as I stated, I want to find the amount of something between them. The tricky thing to me is that the model data I'm working with is gridded, so I need to be able to account for the amount of something along the line at the lats and lons where that line cuts through the grid.
Another approach, and maybe a better one for my purposes, could be visualized like this: [image: 3DGrid]
Ultimately, I'd like to be able to create a program (in any language, honestly) that could find the amount of "something" between two points in a 3D grid. If you would like specifics: the bottom altitude is the surface and the top of the grid is the top of the atmosphere. The bottom point is a measurement device looking at the sun at a certain time of day (and therefore having a certain zenith and azimuth angle). I want to find the NO2 between that measurement device and the "top of the atmosphere", which in my grid is just the top altitude level (of which there are 25).
I'm rather new to coding, Stack Exchange, and even the subject matter I'm working with, so the sparse code I've made might create more clutter than simply asking the question and seeing what methods/code you might suggest.
Hopefully my question is beneficial!
Best,
Taylor
To traverse all touched cells, you can use the Amanatides-Woo algorithm. It is suitable both for the 2D and the 3D case.
Implementation clues
To account for the quantity contributed by every cell, you can apply some model. For example, calculate the path length inside the cell (as the difference between the entry and exit coordinates) and divide by a normalizing factor to get the cell weight (for example, CellSize*sqrt(3) for the 3D case, i.e. the diagonal length).
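A sketch of that traversal in 2D, written in R to match the rest of the thread (the concentration matrix conc, the cell size, and the assumption that the segment stays inside the grid are all mine; bounds checking is omitted):

# Amanatides-Woo style grid walk: accumulate concentration * path length
# for every cell that the segment from p0 to p1 passes through.
line_integral <- function(p0, p1, conc, cellSize = 1) {
  d <- p1 - p0
  len <- sqrt(sum(d^2))
  dir <- d / len

  # current cell indices (1-based); cell (ix, iy) spans
  # [(ix-1)*cellSize, ix*cellSize) in x, similarly in y
  ix <- floor(p0[1] / cellSize) + 1
  iy <- floor(p0[2] / cellSize) + 1

  stepX <- ifelse(dir[1] >= 0, 1, -1)
  stepY <- ifelse(dir[2] >= 0, 1, -1)

  # parametric distance (along the segment) to the next cell boundary
  nextX <- ifelse(dir[1] >= 0, ix * cellSize, (ix - 1) * cellSize)
  nextY <- ifelse(dir[2] >= 0, iy * cellSize, (iy - 1) * cellSize)
  tMaxX <- ifelse(dir[1] != 0, (nextX - p0[1]) / dir[1], Inf)
  tMaxY <- ifelse(dir[2] != 0, (nextY - p0[2]) / dir[2], Inf)
  tDeltaX <- ifelse(dir[1] != 0, cellSize / abs(dir[1]), Inf)
  tDeltaY <- ifelse(dir[2] != 0, cellSize / abs(dir[2]), Inf)

  total <- 0
  t <- 0
  while (t < len) {
    tNext <- min(tMaxX, tMaxY, len)
    total <- total + (tNext - t) * conc[ix, iy]  # path length in this cell
    if (tNext >= len) break
    if (tMaxX < tMaxY) {
      ix <- ix + stepX; tMaxX <- tMaxX + tDeltaX
    } else {
      iy <- iy + stepY; tMaxY <- tMaxY + tDeltaY
    }
    t <- tNext
  }
  total
}

# e.g. conc <- matrix(runif(100), nrow = 10)   # 10 x 10 grid, cell size 1
#      line_integral(c(0.5, 0.5), c(9.5, 7.3), conc)

The 3D version just adds a z index and a third tMax/tDelta pair.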

Merge neighbouring areas together, if tm_fill for one area is too small

I have made a map of the zip codes of a town. The fill is the ratio of cases to non-cases. But some zip codes have very few observations in total, so outliers distort the map.
Is there a way to merge the polygons and data of two neighboring areas based on their n automatically?
And if that is not possible, how can I merge rows of my sf/df without losing ID?
And I guess the simplest option would be just to set those zip codes to NA.
Depends on what you mean by "automatically". Here's a simple algorithm.
repeat:
Find the region with the smallest population.
If that's more than your threshold, stop.
Find that region's neighbours and pick one (at random, or the one with the smallest population).
Merge that neighbour with that region.
Finding neighbours and merging can all be done with either the sf package or the sp package and friends (like spdep and rgeos).
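A rough sketch of that loop with sf and dplyr (the column names zip, population and cases are assumptions, the merged region keeps a combined ID, and the smallest-population neighbour is the one chosen):

library(sf)
library(dplyr)

# zips: an sf data frame with columns zip, population, cases (assumed names)
merge_small_regions <- function(zips, threshold) {
  repeat {
    i <- which.min(zips$population)
    if (zips$population[i] >= threshold) break

    # neighbours = polygons sharing a boundary with region i
    nb <- st_touches(zips)[[i]]
    if (length(nb) == 0) break            # isolated polygon, give up

    # pick the neighbour with the smallest population
    j <- nb[which.min(zips$population[nb])]

    # union the two geometries and add up the counts, keeping a combined ID
    merged <- zips[c(i, j), ] %>%
      summarise(zip        = paste(zip, collapse = "+"),
                population = sum(population),
                cases      = sum(cases))

    zips <- rbind(zips[-c(i, j), ], merged)
  }
  zips
}

summarise() on an sf object unions the geometries for you, and st_touches() supplies the adjacency, so the whole thing stays inside sf without needing rgeos.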
Equally, this can be considered a clustering algorithm using a distance metric based on adjacency. You could do a full hierarchical clustering and then cut the tree at a point such that all clusters had N>threshold.
Now, whether this is a good idea statistically is another question, and depends on what your goal is. If you are worried about whether an underlying risk is, say, > 0.5, and you are getting positives "by chance" because you have a population of 3 with 2 positives (a small sample), then you need to model your data and work out the probability of exceeding 0.5 given the data. Then map that, which will take the small sample size into account.

Feature engineering of X,Y coordinates in neighborhoods of San Francisco

I am participating in a starter Kaggle competition (Crimes in San Francisco) in which I want to predict the category of a crime using a bunch of predictor variables, including the X and Y coordinates of the crime. As I doubt the predictive power of the raw coordinates, I want to transform these variables into something more relevant to the crime category.
So I am thinking that if I had the neighbourhood of San Francisco in which the crime took place, it would be more informative than the actual coordinates of the crime. I can find the neighbourhoods online, but of course I can't simply use the borders of each neighbourhood to classify the corresponding crime, because their shapes are not rectangular or anything like that.
Does anyone have any idea about how I could solve this one?
Thanks guys
Well, that's interesting, AntoniosK, and it's getting close to what I want to accomplish. The problem is that information like "south-east and 2 km from the city center" can lead to more than one neighborhood.
I am still thinking that the partition of the city into neighborhoods is valuable, because the socio-economic and structural differences between them (there is a reason why the neighborhoods of each city are separated as such, right?) can lead to a higher probability for a certain category of crime and a lower one for another.
That said, your idea made me think of using the south-east etc. mapping and then using the angle of the segment (point to city center) with the x-axis to map the point to the appropriate neighborhood. I am on it right now. Thanks
After some time on the problem I found that the procedure I want to perform is called "reverse geocoding". It also turns out that there are some APIs for this. The best, in my opinion, is the revgeocode() function in the ggmap package (backed by Google). This one, though, has a query limit per day (2,500 queries) unless you pay for more.
The one that I turned to instead is the geonames package and its GNneighbourhood function, which turns coordinates into neighbourhoods. It is free, though I have experienced some errors (keep in mind that this one only works for US and Canada cities).
revgeocode function - ggmap package
GNneighbourhood function - geonames package
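A minimal sketch of the geonames route (you need a free geonames account; the username below is a placeholder, and the exact columns returned may vary):

library(geonames)
options(geonamesUsername = "your_username")   # placeholder account name

# GNneighbourhood takes latitude then longitude (US/Canada cities only)
hood <- GNneighbourhood(lat = 37.7749, lng = -122.4194)
hood$name   # e.g. the neighbourhood name for that point in San Francisco

# applied to a crimes data frame with columns Y (latitude) and X (longitude):
# crimes$neighbourhood <- mapply(function(lat, lng) GNneighbourhood(lat, lng)$name,
#                                crimes$Y, crimes$X)

Looping over a web service for every row is slow and rate-limited, so for a whole Kaggle data set it may be worth downloading the neighbourhood polygons once and doing the assignment locally instead.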

Mapping logical units

Most maps (maybe all of them) are just pictures made up of points, lines, fills and text. They don't incorporate any retrievable knowledge about the logical divisions they portray.
So if I want to combine a database of information about some logical entity, a well-defined neighborhood for example, with an accurate map of that neighborhood, I have to figure out how to render the neighborhood's map such that it's seamless when combined with the maps of adjacent neighborhoods.
I can brute-force it by going through the database of lat-long points that group together to represent streets and similar features, and adding points where they pass through the logical boundaries of the neighborhood.
But I also have to do that for any other logical information I want to represent, e.g. school catchment areas, voting precincts, and so forth.
My question is: does anyone know of some software already written, C/C++ would be my favorite, that handles this kind of interpolation? Or even a paper that discusses how to do it more elegantly than by brute force?
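For what it's worth, that clipping (inserting the crossing points and keeping only the piece inside a boundary) is what the GEOS library (C++) does, and the sf package exposes it from R; a tiny sketch with made-up coordinates:

library(sf)

# a made-up street as a line, and a made-up neighbourhood boundary
street <- st_linestring(rbind(c(0, 0), c(10, 5)))
neighbourhood <- st_polygon(list(rbind(c(2, -1), c(8, -1), c(8, 6),
                                       c(2, 6), c(2, -1))))

# keep only the part of the street inside the neighbourhood;
# the intersection points on the boundary are inserted for you
clipped <- st_intersection(st_sfc(street), st_sfc(neighbourhood))
st_coordinates(clipped)

If you want to stay in C/C++ directly, GEOS (and GDAL/OGR on top of it) provides the same intersection operations.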
