I'm working on a dataset of location characteristics that I separate into 7 clusters.
I have 7 characteristic clusters, each containing between 20 and 2000 locations.
Each cluster has a score (between 0 and 1).
My goal is to do a regional clustering on each characteristic cluster, then to show these regional clusters on a map, colored by the score of the characteristic cluster.
Example: I have 1000 locations. Each location has a temperature. I detect 3 clusters within the data.
I separate these 1000 locations into 3 clusters:
1st cluster: 500 locations and a score (= mean temperature of all its locations) of 0 degrees
2nd cluster: 300 locations and a score of -10 degrees
3rd cluster: 200 locations and a score of +10 degrees
That part is already done.
I want to show these 3 clusters on a map BUT also detect regional tendencies.
For now, I have used folium to plot each characteristic (= temperature) cluster individually and let the library detect regional behavior.
But I couldn't find a way to plot all the clusters on the same map (my code only detects when multiple locations are in the same area and creates a regional cluster).
I don't really know if folium is a good library to continue with and, if not, which library to use and how.
Do you have any ideas or advice?
Thanks
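In case it helps, here is a minimal sketch of the same idea in R with the leaflet package (both folium and leaflet wrap the Leaflet.js marker-cluster plugin, so the behavior is comparable). All data below is made up: each location already carries its characteristic-cluster id and that cluster's score.

library(leaflet)

# made-up data: one row per location, assigned to one of 3
# characteristic clusters, each cluster carrying a single score
set.seed(1)
locs <- data.frame(
  lat     = runif(300, 45, 47),
  lon     = runif(300, 2, 5),
  cluster = sample(1:3, 300, replace = TRUE)
)
locs$score <- c(0, -10, 10)[locs$cluster]

pal <- colorNumeric("RdBu", domain = locs$score)

# one addCircleMarkers() call per characteristic cluster: each call
# creates its own marker-cluster group, so the regional grouping is
# computed separately per cluster but drawn on the same map
m <- leaflet() %>% addTiles()
for (k in unique(locs$cluster)) {
  sub <- locs[locs$cluster == k, ]
  m <- m %>% addCircleMarkers(
    data = sub, lng = ~lon, lat = ~lat,
    color = pal(sub$score[1]),
    clusterOptions = markerClusterOptions()
  )
}
m %>% addLegend(pal = pal, values = locs$score, title = "Cluster score")

Note that the aggregated cluster bubbles are styled by the underlying plugin; coloring those bubbles by score would need a custom iconCreateFunction (in folium as well).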
In Australia we have a test for students called NAPLAN.
The results are provided in a sort of band graph mixed with a box-and-whisker plot.
Does anyone know what they are called?
They are good because they show the total range, where the student falls in the band, what the national average is, and what the student's class average is.
Essentially 4 data points on 1 graph.
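For what it's worth, that layout (a shaded range band plus markers for the student, the class average, and the national average) is close to what is often called a bullet graph, and can be sketched in R with ggplot2. All scores below are invented:

library(ggplot2)

# invented scores on a notional 0-10 band scale
df <- data.frame(lo = 2, hi = 9,              # total range of results
                 student = 6.5,               # where this student falls
                 class_avg = 5.8,
                 national_avg = 5.2)

ggplot(df) +
  geom_rect(aes(xmin = lo, xmax = hi, ymin = 0.4, ymax = 0.6),
            fill = "grey80") +                            # range band
  geom_point(aes(x = student, y = 0.5), size = 4) +       # the student
  geom_vline(aes(xintercept = class_avg), linetype = "dashed") +
  geom_vline(aes(xintercept = national_avg), linetype = "dotted") +
  labs(x = "Band score", y = NULL) +
  theme_minimal() +
  theme(axis.text.y = element_blank())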
I am using the googleway package in R to calculate distances between two points. I had 2 variables containing the household latitude and longitude, and 2 other variables containing the latitude and longitude of the corresponding government cooperative store in the locality of the household, so one could easily specify the origin and destination coordinates.
But now I want to calculate the distance of a household from all the other households in its locality. There are 12 localities in total, and each locality has about 50 households (with some variation). So for each household, I need to calculate 49 distances (excluding itself). Can someone help me with how to do it?
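A hedged sketch of one way to do it with googleway's google_distance(), looping over the households within each locality. The data frame, column names, and API key below are placeholders, and the exact shape of the parsed response may vary with your googleway version:

library(googleway)

# placeholder data: one row per household with its locality id
set.seed(1)
households <- data.frame(
  locality = rep(1:2, each = 4),
  lat = runif(8, 28, 29),
  lon = runif(8, 76, 77)
)
key <- "YOUR_API_KEY"

dist_within_locality <- function(df) {
  n <- nrow(df)
  lapply(seq_len(n), function(i) {
    res <- google_distance(
      origins      = list(c(df$lat[i], df$lon[i])),
      destinations = lapply(seq_len(n)[-i],
                            function(j) c(df$lat[j], df$lon[j])),
      key = key
    )
    # distances (in metres) from household i to the other n-1 households
    res$rows$elements[[1]]$distance$value
  })
}

results <- lapply(split(households, households$locality),
                  dist_within_locality)

Note that the Distance Matrix API caps the number of destinations per request (25 at the time of writing), so with ~49 households per locality the destination list would need to be sent in chunks. If straight-line rather than road distance is enough, geosphere::distm() on each locality's coordinate matrix avoids the API entirely.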
I have a number of locations (latitudes and longitudes) in my data. If each location is a member of an ellipse, I want to find the centre (in Cartesian coordinates) of the ellipse. The data looks like:
longitude latitude
location 1 -118.8267 33.73430
location 2 -115.9665 33.25514
location 3 -117.2978 34.18589
location 4 -117.2962 34.18449
location 5 -117.1625 34.00642
Please note that I do not have any information regarding the major and minor axis lengths of the ellipses.
I want to know: is there any way in the R programming language to find the Cartesian coordinates of the centres of the ellipses the locations belong to? And how can I find the major and minor axis lengths as well?
**EDIT**:
In my question, each location will be a member of a "different" ellipse. Now, an ellipse consists of at least 3 points. In my analysis, I will first classify the locations into several different groups according to some model criteria. Then there will be several different "ellipses". Each location point will be a member of one of these classified ellipses.
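If the "ellipse" of each group is taken to be the covariance (error) ellipse of its member points (an assumption, since no axis lengths are given), then the centre is simply the centroid of the points and the axes come from an eigen-decomposition of their covariance matrix. A minimal sketch on the sample coordinates above; for axis lengths in metres you would first project the lon/lat values to a planar system such as a local UTM zone:

pts <- data.frame(
  longitude = c(-118.8267, -115.9665, -117.2978, -117.2962, -117.1625),
  latitude  = c(33.73430, 33.25514, 34.18589, 34.18449, 34.00642)
)

ctr <- colMeans(pts)          # centre of the ellipse = centroid of the points

ev <- eigen(cov(pts))         # eigenvectors give the axis directions
semi_axes <- sqrt(ev$values)  # semi-axis lengths, up to a confidence scaling

The sqrt(eigenvalue) lengths are only defined up to a chosen confidence level; for a 95% ellipse, multiply them by sqrt(qchisq(0.95, df = 2)).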
I am trying to find the relationship between the number of people that come to a certain region and the number of accommodations, shops, restaurants, and leisure places in that region. I know the total number of people who visit a certain region, but I don't know whether they visit for accommodation, to shop, etc.
So I have plotted the number of restaurants, etc., in each region against the number of people in that region. Here is the graph. Here is some of the data I'm trying to analyze.
Thus, the general shape of these points is a parabola rotated 90 degrees. I am not very familiar with R and cannot figure out how to find this equation, or whether it is even possible.
My goal is to get coefficients for each parameter (i.e. accommodation, restaurants, etc.) so I can conclude something like "if we add 10 restaurants, an increase of x people should come to the region."
Here is a snippet of some code I've tried without success:
linez <- nls(People ~ sqrt(Accommodation / a), data = fourth, start = c(a = 1), trace = TRUE)
s <- seq(min(fourth$Accommodation), max(fourth$Accommodation), length.out = 100)  # grid for the fitted curve
lines(s, predict(linez, newdata = list(Accommodation = s)), col = "red")
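Since the sideways-parabola model People = sqrt(Accommodation / a) is equivalent to People^2 = Accommodation / a, a no-intercept linear fit on the squared response gives a quick estimate of a (and a good starting value for nls). A sketch with simulated data standing in for the 'fourth' data frame:

# simulated stand-in for your 'fourth' data frame
set.seed(42)
fourth <- data.frame(Accommodation = runif(50, 0, 100))
fourth$People <- sqrt(fourth$Accommodation / 0.02) + rnorm(50, sd = 5)

# People^2 = (1/a) * Accommodation, so the slope estimates 1/a
fit <- lm(I(People^2) ~ 0 + Accommodation, data = fourth)
a_hat <- 1 / coef(fit)[["Accommodation"]]
a_hat  # should recover roughly the simulated a = 0.02

Squaring the response changes the error structure, so treat this as a starting point and keep the nls fit for the final coefficients.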
I have a file with 52,000 points distributed in Brazil and a map of forest remnants (in polygon format).
What I want to do is calculate the distance from each point to each forest fragment that is within a buffer of, for example, 500m. So, if I have 3 fragments within a buffer of 500m, I want to have all three distances (Euclidean) calculated from the centroid (focal point) to these fragments.
At the end I would like to take the mean distance from each focal point to their respective fragments.
I tried the function gWithinDistance, from the package "rgeos", like below:
near_frag_500 <- gWithinDistance (points, veg_natural, 500, byid=T)
where the argument "points" is my focal points and "veg_natural" my forest remnant polygons. The number 500 refers to the 500m buffer within which I want to calculate the distances. However, the output of this function is a matrix of TRUE or FALSE values: TRUE for those polygons which fall within the 500m buffer and FALSE for those which fall outside it. It doesn't give me the actual values of the distances calculated. I guess what I am looking for is an equivalent of the "Generate Near Table" function in ArcGIS.
I would really appreciate it if someone could help me with that! I also have my forest remnant polygons as a raster, if there is any solution using a raster file.
I have made a simple test set with 7 points and 8 polygons. Everything has to be projected to a Cartesian system in metres, so not lat-long. Use a local UTM zone if nothing else.
I compute the distance matrix from points to polygons:
> dmat = gDistance(points, veg_natural,byid=TRUE)
Then mask out anything over 500, and compute the row means:
> dmat[dmat>500]=NA
> apply(dmat, 1, mean, na.rm=TRUE)
0 1 2 3 4 5 6 7
331.5823 262.7129 380.2073 187.2068 111.9961 NaN 224.6962 360.7995
and that is the mean of the distances from each point to the features within 500m. Note the NaN for point 5, which is because it is not within 500m of any polygon feature.
If this matrix is too big for your case with 52,000 points (and ?? polygons?) then just do it for 1000 points at a time in a loop or whatever your computer can cope with. I think mine would fall over with 52,000.
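A rough sketch of that chunking, reusing the calls above (points and veg_natural as before; the matrix orientation follows the earlier gDistance call):

chunk_size <- 1000
ids <- seq_len(length(points))
chunk_means <- lapply(split(ids, ceiling(ids / chunk_size)), function(idx) {
  dmat <- gDistance(points[idx, ], veg_natural, byid = TRUE)
  dmat[dmat > 500] <- NA                 # drop anything beyond the buffer
  apply(dmat, 1, mean, na.rm = TRUE)     # per-point means for this chunk
})
mean_dist <- unlist(chunk_means)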
If you want to know which of the polygons are the ones within 500m of each point, then something like:
> apply(dmat,1, function(r){which(!is.na(r))})
$`0`
5 6
5 6
$`1`
4 5 7
4 5 7
shows my first point (labelled 0) is near to polygons 5 and 6.