Why is SpatialPointsDataFrame rounding my Y coordinates? (rounds UTM 5262636 to 5262640)

I am trying to estimate home range size from telemetry data using adehabitatHR. Every time I create a SpatialPointsDataFrame, it rounds the y-coordinate of my UTM locations. I have tried saving the UTMs as integers and as numbers, removing NAs before importing the .csv, removing NAs after importing the .csv, importing as a .txt file (in case there's a bug in Excel), etc.
Here is the console output:
> data<-read.csv("C:/Workspace/URAM/Data/HomeRange/NEW_TelemetryLocs_AllBears_asof_9Sept19_noNAs.csv", header=T)
>
> summary(data)
Alias Order Sex AnimalID X Y
Brandy : 2839 Min. : 1 F:15546 102-16 : 2839 Min. :397286 Min. :5236180
Bernie : 2674 1st Qu.: 306 M:11650 06-16 : 2674 1st Qu.:406966 1st Qu.:5251887
Eve : 2646 Median : 635 01-18 : 2646 Median :413166 Median :5258742
Deedee : 2606 Mean :1239 17-17 : 2606 Mean :412579 Mean :5257164
Buddha : 2346 3rd Qu.:2018 04-17 : 2346 3rd Qu.:418419 3rd Qu.:5262669
Bailey : 1192 Max. :5583 12-17 : 1192 Max. :432690 Max. :5291985
(Other):12893 (Other):12893
> head(data)
Alias Order Sex AnimalID X Y
1 Calliope 128 F 19-22 432690.3 5262636
2 Calliope 191 F 19-22 432522.3 5262409
3 Calliope 127 F 19-22 432491.0 5263274
4 Calliope 189 F 19-22 432466.3 5262413
5 Calliope 190 F 19-22 432376.1 5262121
6 Calliope 202 F 19-22 432262.3 5264390
> dim(data)
[1] 27196 6
> str(data)
'data.frame': 27196 obs. of 6 variables:
$ Alias : Factor w/ 26 levels "Artemis","Bailey",..: 9 9 9 9 9 9 9 9 9 9 ...
$ Order : int 128 191 127 189 190 202 201 188 129 422 ...
$ Sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ AnimalID: Factor w/ 26 levels "01-18","01-19",..: 26 26 26 26 26 26 26 26 26 26 ...
$ X : num 432690 432522 432491 432466 432376 ...
$ Y : num 5262636 5262409 5263274 5262413 5262121 ...
> test.sp<-SpatialPointsDataFrame(data[,5:6],data = data, coords.nrs = 5:6,
+ match.ID = TRUE,
+ proj4string=CRS("+proj=utm +zone=10+datum=NAD83+ellps=GRS80"))
> head(test.sp)
coordinates Alias Order Sex AnimalID X Y
1 (432690, 5262640) Calliope 128 F 19-22 432690.3 5262636
2 (432522, 5262410) Calliope 191 F 19-22 432522.3 5262409
3 (432491, 5263270) Calliope 127 F 19-22 432491.0 5263274
4 (432466, 5262410) Calliope 189 F 19-22 432466.3 5262413
5 (432376, 5262120) Calliope 190 F 19-22 432376.1 5262121
6 (432262, 5264390) Calliope 202 F 19-22 432262.3 5264390
Coordinate Reference System (CRS) arguments: +proj=utm +zone=10+datum=NAD83+ellps=GRS80
>
If I enter 6 lines of data manually in R, it seems to work:
> Alias<-c("Brandy","Brandy","Brandy","Brandy","Brandy","Brandy")
> Sex<-c("F","F","F","F","F","F")
> Order<-c(5,6,7,8,9,10)
> X<-c(409483,409481,409442,409438,409443,409576)
> Y<-c(5263356,5263356,5263335,5263340,5263342,5263685)
> test2<-data.frame(Alias,Sex,Order,X,Y)
> head(test2)
Alias Sex Order X Y
1 Brandy F 5 409483 5263356
2 Brandy F 6 409481 5263356
3 Brandy F 7 409442 5263335
4 Brandy F 8 409438 5263340
5 Brandy F 9 409443 5263342
6 Brandy F 10 409576 5263685
> test2.sp<-SpatialPointsDataFrame(test2[,4:5],data = test2, coords.nrs = 4:5,
+ match.ID = TRUE,
+ proj4string=CRS("+proj=utm +zone=10+datum=NAD83+ellps=GRS80"))
>
> test2.sp
coordinates Alias Sex Order X Y
1 (409483, 5263356) Brandy F 5 409483 5263356
2 (409481, 5263356) Brandy F 6 409481 5263356
3 (409442, 5263335) Brandy F 7 409442 5263335
4 (409438, 5263340) Brandy F 8 409438 5263340
5 (409443, 5263342) Brandy F 9 409443 5263342
6 (409576, 5263685) Brandy F 10 409576 5263685
However, if I use the coordinates() command to create the SpatialPointsDataFrame from the manually entered data, the Y coordinate is rounded:
> coordinates(test2)<-c("X","Y")
> head(test2)
coordinates Alias Sex Order
1 (409483, 5263360) Brandy F 5
2 (409481, 5263360) Brandy F 6
3 (409442, 5263340) Brandy F 7
4 (409438, 5263340) Brandy F 8
5 (409443, 5263340) Brandy F 9
6 (409576, 5263680) Brandy F 10
Coordinate Reference System (CRS) arguments: NA
Any help or suggestions would be greatly appreciated.

The rounding takes place in the printing of the object, not in the internal representation or subsequent computations.
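One way to check this is to compare what is displayed with what is stored. The sketch below uses base R's signif() as a stand-in for display rounding (it is not sp's actual print code), and pulls the stored values back out with coordinates() only if sp is installed. The two points are taken from the head(data) output in the question.

```r
# Stand-in for display rounding: six significant digits, as in the output above.
y <- c(5262636, 5262409, 5263274)
shown <- signif(y, 6)      # what a rounded display would show
print(shown)               # displays 5262640 5262410 5263270
print(y, digits = 10)      # the stored values are unchanged

# With sp installed, the stored coordinates can be inspected directly.
if (requireNamespace("sp", quietly = TRUE)) {
  pts <- data.frame(X = c(432690.3, 432522.3), Y = c(5262636, 5262409))
  sp::coordinates(pts) <- c("X", "Y")
  xy <- sp::coordinates(pts)   # plain numeric matrix of the stored values
  print(xy, digits = 10)
}
```

If the matrix returned by coordinates() shows the full values, anything computed from the object uses those values, not the rounded display.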

Related

Measuring distance between centroids R

I want to create a matrix of the distance (in metres) between the centroids of every country in the world. Country names or country IDs should be included in the matrix.
The matrix is based on a shapefile of the world downloaded here: http://gadm.org/version2
Here is some rough info on the shapefile I'm using (I'm using shapefile@data$UN as my ID):
> str(shapefile@data)
'data.frame': 174 obs. of 11 variables:
$ FIPS : Factor w/ 243 levels "AA","AC","AE",..: 5 6 7 8 10 12 13
$ ISO2 : Factor w/ 246 levels "AD","AE","AF",..: 61 17 6 7 9 11 14
$ ISO3 : Factor w/ 246 levels "ABW","AFG","AGO",..: 64 18 6 11 3 10
$ UN : int 12 31 8 51 24 32 36 48 50 84 ...
$ NAME : Factor w/ 246 levels "Afghanistan",..: 3 15 2 11 6 10 13
$ AREA : int 238174 8260 2740 2820 124670 273669 768230 71 13017
$ POP2005 : int 32854159 8352021 3153731 3017661 16095214 38747148
$ REGION : int 2 142 150 142 2 19 9 142 142 19 ...
$ SUBREGION: int 15 145 39 145 17 5 53 145 34 13 ...
$ LON : num 2.63 47.4 20.07 44.56 17.54 ...
$ LAT : num 28.2 40.4 41.1 40.5 -12.3 ...
I tried this:
library(rgdal) # provides readOGR()
library(rgeos) # provides gCentroid()
shapefile <- readOGR("./Map/Shapefiles/World/World Map", layer = "TM_WORLD_BORDERS-0.3") # Read in world shapefile
row.names(shapefile) <- as.character(shapefile@data$UN)
centroids <- gCentroid(shapefile, byid = TRUE, id = as.character(shapefile@data$UN)) # create centroids
dist_matrix <- as.data.frame(geosphere::distm(centroids))
The result looks something like this:
V1 V2 V3 V4
1 0.0 4296620.6 2145659.7 4077948.2
2 4296620.6 0.0 2309537.4 219442.4
3 2145659.7 2309537.4 0.0 2094277.3
4 4077948.2 219442.4 2094277.3 0.0
1) Instead of the first column (1,2,3,4) and row (V1, V2, V3, V4) I would like to have country IDs (shapefile@data$UN) or names (shapefile@data$NAME). How does that work?
2) I'm not sure what unit the returned values are in. Is it metres, kilometres, etc.?
3) Is geosphere::distm preferable to geosphere::distGeo in this instance?
1.
This should add the column and row names to your matrix, just as you did when setting the row names of shapefile:
crnames <- as.character(shapefile@data$UN)
colnames(dist_matrix)<- crnames
rownames(dist_matrix)<- crnames
2.
The default distance function in distm is distHaversine, which takes a radius (of the earth) argument in metres, so the output should be in metres as well.
3.
Look at the documentation for distGeo and distHaversine and decide the level of accuracy you want in your results. To look at the docs in R itself just enter ?distGeo.
edit: answer to q1 may be wrong since the matrix data may be aggregated, looking at alternatives
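To make points 1 and 2 concrete without the shapefile, here is a base-R sketch. The haversine() helper is a hypothetical illustration, not geosphere's code, and the three centroids are simply the first LON/LAT/UN values from the str() output above. It shows that the distance comes out in whatever unit the radius is given in, and that the names line up as long as the ID vector is in the same order as the points.

```r
# Hypothetical helper: great-circle distance, in the units of r (metres here).
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# First three rows of the str() output; id stands in for shapefile@data$UN.
cent <- data.frame(id  = c("12", "31", "8"),
                   lon = c(2.63, 47.40, 20.07),
                   lat = c(28.2, 40.4, 41.1))

# Pairwise distances between all centroids, in metres.
idx <- seq_len(nrow(cent))
dist_matrix <- outer(idx, idx, function(i, j)
  haversine(cent$lon[i], cent$lat[i], cent$lon[j], cent$lat[j]))

# Row and column names come from the same ID vector, in the same order.
dimnames(dist_matrix) <- list(cent$id, cent$id)

round(dist_matrix / 1000)  # kilometres, for readability
```

Keeping the result as a matrix (rather than wrapping it in as.data.frame) preserves both sets of names and allows lookups such as dist_matrix["12", "31"].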

Observations becoming NA when ordering levels of factors in R with ordered()

I have a longitudinal data frame p that contains 5 variables and looks like this:
> head(p)
date.1 County.x providers beds price
1 Jan/2011 essex 258 5545 251593.4
2 Jan/2011 greater manchester 108 3259 152987.7
3 Jan/2011 kent 301 7191 231985.7
4 Jan/2011 tyne and wear 103 2649 143196.6
5 Jan/2011 west midlands 262 6819 149323.9
6 Jan/2012 essex 2 27 231398.5
The structure of my variables is the following:
'data.frame': 259 obs. of 5 variables:
$ date.1 : Factor w/ 66 levels "Apr/2011","Apr/2012",..: 23 23 23 23 23 24 24 24 25 25 ...
$ County.x : Factor w/ 73 levels "avon","bedfordshire",..: 22 24 32 65 67 22 32 67 22 32 ...
$ providers: int 258 108 301 103 262 2 9 2 1 1 ...
$ beds : int 5545 3259 7191 2649 6819 27 185 24 70 13 ...
$ price : num 251593 152988 231986 143197 149324 ...
I want to order date.1 chronologically. Before applying ordered(), this variable does not contain any NA observations.
> summary(is.na(p$date.1))
Mode FALSE NA's
logical 259 0
However, once I apply my function for ordering the levels corresponding to date.1:
p$date.1 = with(p, ordered(date.1, levels = c("Jun/2010", "Jul/2010",
"Aug/2010", "Sep/2010", "Oct/2010", "Nov/2010", "Dec/2010", "Jan/2011", "Feb/2011",
"Mar/2011","Apr/2011", "May/2011", "Jun/2011", "Jul/2011", "Aug/2011", "Sep/2011",
"Oct/2011", "Nov/2011", "Dec/2011" ,"Jan/2012", "Feb/2012" ,"Mar/2012" ,"Apr/2012",
"May/2012", "Jun/2012", "Jul/2012", "Aug/2012", "Sep/2012", "Oct/2012", "Nov/2012",
"Dec/2012", "Jan/2013", "Feb/2013", "Mar/2013", "Apr/2013", "May/2013",
"Jun/2013", "Jul/2013", "Aug/2013", "Sep/2013", "Oct/2013", "Nov/2013",
"Dec/2013", "Jan/2014",
"Feb/2014", "Mar/2014", "Apr/2014", "May/2014", "Jun/2014", "Jul/2014" ,"Aug/2014",
"Sep/2014", "Oct/2014", "Nov/2014", "Dec/2014", "Jan/2015", "Feb/2015", "Mar/2015",
"Apr/2015","May/2015", "Jun/2015" ,"Jul/2015" ,"Aug/2015", "Sep/2015", "Oct/2015",
"Nov/2015")))
It seems I lose some observations.
> summary(is.na(p$date.1))
Mode FALSE TRUE NA's
logical 250 9 0
Has anyone come across this problem when using ordered()? Alternatively, is there any other way to order my observations chronologically?
It is possible that some values of p$date.1 don't match any of the levels. Try using this ord.mon as the levels.
ord.mon <- do.call(paste, c(expand.grid(month.abb, 2010:2015), sep = "/"))
Then you can run this to see whether any values of date.1 are missing from ord.mon (FALSE marks a mismatch).
p$date.1 %in% ord.mon
Lastly, you can also sort the data frame after converting the date.1 column to Date (note that you have to prepend a day of the month first, since a Date needs one):
p <- p[order(as.Date(paste0("01/", p$date.1), "%d/%b/%Y")), ]
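As a small illustration of the mismatch check, here is a sketch on made-up dates (the vector date.1 below is hypothetical, not the asker's data). Any value that is not spelled exactly like one of the levels silently becomes NA in ordered(), and setdiff() lists the offenders.

```r
# Toy dates: two of them do not match the "Mon/YYYY" pattern exactly.
date.1 <- factor(c("Jan/2011", "Sept/2011", "Jun/2010", "jan/2012", "Dec/2010"))

# All month/year labels in chronological order (month varies fastest).
ord.mon <- do.call(paste, c(expand.grid(month.abb, 2010:2015), sep = "/"))

# Values that would silently become NA in ordered().
bad <- setdiff(as.character(date.1), ord.mon)
print(bad)

f <- ordered(date.1, levels = ord.mon)
n_missing <- sum(is.na(f))   # number of unmatched observations
sorted <- sort(f)            # matched values, in chronological order
print(sorted)
```

Typical culprits are stray whitespace, different capitalisation, or a non-standard month abbreviation, all of which show up in bad.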

How to make a 3D Mesh in RGL with shade3d or wire3d using tmesh3d in R

I have some data which I have collected.
It consists of Vertices and then Triangles which I have made using a meshing software.
I am able to use R with
trimesh(triangles, vertices)
to make a nice mesh plot.
But I can't figure out how to use RGL to make an interactive plot, and I can't work out how to colour the faces of the mesh based on another value in the data frame.
Here are the vertices in a data frame; x, y, z are the coordinates of the nodes/points (nn):
'data.frame': 23796 obs. of 7 variables:
$ nn : int 0 1 2 3 4 5 6 7 8 9 ...
$ x : num 39.5 70.8 49 83.5 -16 ...
$ y : num 28.2 -2.97 -25.67 -9.1 -39.75 ...
$ z: num 160 158 109 121 188 ...
$ uni: num 3.87 6.64 5.02 4.48 1.91 ...
$ bi : num 0.749 0.784 1.045 0.935 0.733 ...
nn x y z uni bi
0 39.527 28.202 160.219 3.86942 0.74871
1 70.804 -2.966 157.578 6.64361 0.78373
2 48.982 -25.674 109.022 5.02491 1.0451
3 83.514 -9.096 120.988 4.47977 0.9348
4 -16.04 -39.749 188.467 1.90873 0.73286
5 74.526 -3.096 174.347 8.4263 0.70594
6 54.93 -56.347 151.496 7.53334 2.17128
7 56.936 -20.131 186.177 7.16118 1.44875
8 -14.627 -47.1 162.185 2.13939 0.70887
9 38.207 -59.201 147.993 5.83457 4.32971
10 50.645 -32.04 110.418 5.3741 1.14543
The triangles for the vertices are
'data.frame': 47602 obs. of 7 variables:
$ X : int 3435 3161 18424 13600 1564 21598 21283 1171 51 9331 ...
$ Y : int 19658 17204 17467 19721 10099 19018 11341 2723 15729 5851 ...
$ Z : int 2764 9466 16955 2669 10091 21205 18399 20833 15865 9106 ...
X Y Z
3435 19658 2764
3161 17204 9466
18424 17467 16955
13600 19721 2669
1564 10099 10091
21598 19018 21205
21283 11341 18399
1171 2723 20833
51 15729 15865
9331 5851 9106
310 3513 9121
5651 11928 15468
8594 2295 6852
22725 22636 11114
I need to make this into a mesh as I can in trimesh, but with RGL, and I need to colour the faces of the mesh based on a scale of uni, where <0.5 is red, 0.5-1.5 is orange and >1.5 is green.
It looks something like this in trimesh, but how do I do it in RGL for R, with colouring based on the value of uni in the first data table?
Here is an example, starting with two dataframes.
> library(rgl)
> vertices
x y z
1 1 -1 1
2 1 -1 -1
3 1 1 -1
4 1 1 1
5 -1 -1 1
6 -1 -1 -1
7 -1 1 -1
8 -1 1 1
> triangles
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12
1 5 5 1 1 2 2 6 6 8 8 1 1
2 1 4 2 3 6 7 5 8 4 3 6 5
3 4 8 3 4 7 3 8 7 3 7 2 6
You need matrices to deal with tmesh3d. A row of 1's must be added to the table of vertices.
verts <- rbind(t(as.matrix(vertices)),1)
trgls <- as.matrix(triangles)
tmesh <- tmesh3d(verts, trgls)
Now you can plot the mesh:
wire3d(tmesh)
About colors, you have to associate one color to each triangle:
tmesh$material <- list(color=rainbow(ncol(trgls)))
wire3d(tmesh)
shade3d(tmesh)
UPDATE 2019-03-09
The newest version of rgl (0.100.18) allows different interpretations of the material colors.
You can assign a color to each face:
vertices <- as.matrix(vertices)
triangles <- as.matrix(triangles)
mesh1 <- tmesh3d(
vertices = t(vertices),
indices = triangles,
homogeneous = FALSE,
material = list(color = rainbow(ncol(triangles)))
)
shade3d(mesh1, meshColor = "faces")
or assign a color to each vertex:
mesh2 <- tmesh3d(
vertices = t(vertices),
indices = triangles,
homogeneous = FALSE,
material = list(color = rainbow(nrow(vertices)))
)
shade3d(mesh2, meshColor = "vertices")
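For the colouring asked about in the question, the thresholds on uni can be turned into a colour vector with cut(). The sketch below is base R only and uses made-up uni values and a made-up, 1-based triangle table. One colour per vertex would go with meshColor = "vertices" and one per triangle with meshColor = "faces", as in the calls above. Averaging uni over a triangle's three corners is an assumption about what a per-face value should mean. Note also that the nn column in the question starts at 0, so its triangle indices may need + 1 before use; that is a guess from the printed data.

```r
# Made-up per-vertex values standing in for the uni column.
uni <- c(3.87, 6.64, 5.02, 0.2, 0.4, 0.3, 1.0)

# Map values to colours: [-Inf, 0.5) red, [0.5, 1.5) orange, [1.5, Inf) green.
uni_colour <- function(u) {
  as.character(cut(u, breaks = c(-Inf, 0.5, 1.5, Inf),
                   labels = c("red", "orange", "green"), right = FALSE))
}

vertex_col <- uni_colour(uni)   # one colour per vertex

# Hypothetical triangles: one column per triangle, 1-based vertex indices.
triangles <- cbind(c(1, 2, 3), c(4, 5, 6), c(5, 6, 7))

# Mean uni of the three corners of each triangle, then one colour per triangle.
face_uni <- colMeans(matrix(uni[as.vector(triangles)], nrow = 3))
face_col <- uni_colour(face_uni)

print(vertex_col)
print(face_col)
```

Either vector can then replace the rainbow(...) call in the material list, provided its length matches the number of vertices or triangles respectively.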

Combining factor levels in R 3.2.1

In previous versions of R I could combine factor levels that didn't have a "significant" threshold of volume using the following little function:
whittle = function(data, cutoff_val){
#convert to a data frame
tab = as.data.frame.table(table(data))
#returns vector of indices where value is below cutoff_val
idx = which(tab$Freq < cutoff_val)
levels(data)[idx] = "Other"
return(data)
}
This takes in a factor vector, looks for levels that don't appear "often enough" and combines all of those levels into one "Other" factor level. An example of this is as follows:
> sort(table(data$State))
05 27 35 40 54 84 9 AP AU BE BI DI G GP GU GZ HN HR JA JM KE KU L LD LI MH NA
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
OU P PL RM SR TB TP TW U VD VI VS WS X ZH 47 BL BS DL M MB NB RP TU 11 DU KA
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3
BW ND NS WY AK SD 13 QC 01 BC MT AB HE ID J NO LN NM ON NE VT UT IA MS AO AR ME
4 4 4 4 5 5 6 6 7 7 7 8 8 8 9 10 11 17 23 26 26 30 31 31 38 40 44
OR KS HI NV WI OK KY IN WV AL CO WA MN NH MO SC LA TN AZ IL NC MI GA OH ** CT DE
45 47 48 57 57 64 106 108 112 113 120 125 131 131 135 138 198 200 233 492 511 579 645 646 840 873 1432
RI DC TX MA FL VA MD CA NJ PA NY
1782 2513 6992 7027 10527 11016 11836 12221 15485 16359 34045
Now when I use whittle it returns me the following message:
> delete = whittle(data$State, 1000)
Warning message:
In `levels<-`(`*tmp*`, value = c("Other", "Other", "Other", "Other", :
duplicated levels in factors are deprecated
How can I modify my function so that it has the same effect but doesn't use these "deprecated" factor levels? Converting to a character, tabling, and then converting to the character "Other"?
I've always found it easiest (less typing and less headache) to convert to character and back for these sorts of operations. Keeping with your as.data.frame.table and using replace to do the replacement of the low-frequency levels:
whittle <- function(data, cutoff_val) {
tab = as.data.frame.table(table(data))
factor(replace(as.character(data), data %in% tab$data[tab$Freq < cutoff_val], "Other"))
}
Testing on some sample data:
state <- factor(c("MD", "MD", "MD", "VA", "TX"))
whittle(state, 2)
# [1] MD MD MD Other Other
# Levels: MD Other
I think this version should work. The levels<- function allows you to collapse levels by assigning a list (see ?levels).
whittle <- function(data, cutoff_val){
tab <- table(data)
shouldmerge <- tab < cutoff_val
tokeep <- names(tab)[!shouldmerge]
tomerge <- names(tab)[shouldmerge]
nv <- c(as.list(setNames(tokeep,tokeep)), list("Other"=tomerge))
levels(data)<-nv
return(data)
}
And we test it with
set.seed(15)
x<-factor(c(sample(letters[1:10], 100, replace=T), sample(letters[11:13], 10, replace=T)))
table(x)
# x
# a b c d e f g h i j k l m
# 5 11 8 8 7 5 13 14 14 15 2 3 5
y <- whittle(x, 9)
table(y)
# y
# b g h i j Other
# 11 13 14 14 15 43
It's worth adding to this answer that the new forcats package contains the fct_lump() function which is dedicated to this.
Using @MrFlick's data:
set.seed(15) # same seed as in the answer above
x <- factor(c(sample(letters[1:10], 100, replace=T),
sample(letters[11:13], 10, replace=T)))
library(forcats)
library(magrittr) ## for %>% ; could also load dplyr
fct_lump(x, n=5) %>% table
# b g h i j Other
#11 13 14 14 15 43
The n argument specifies the number of most common values to preserve.
Here's another way of doing it by replacing all the items below the threshold with the first and then renaming that level to Other.
whittle <- function(x, thresh) {
belowThresh <- names(which(table(x) < thresh))
x[x %in% belowThresh] <- belowThresh[1]
levels(x)[levels(x) == belowThresh[1]] <- "Other"
factor(x)
}

can't draw the grouped value above stacked bar plot in ggplot2

I have a ggplot2 question. The code below draws the stacked bar plot correctly, without any values above the bars:
p=ggplot(data=essnn)
p+geom_bar(binwidth=0.5,stat="identity")+ #
aes(x=reorder(classname,-amount,sum), y=amount, label=amount, fill = sort(year))+
theme()
I want to add the total amount for each class (summed over years) above each bar, and here is my code:
+geom_text(aes(x=classes,y=total,label=total), data=essnnta, fill=NULL, size=3)
But an error message appears:
Error in fill = year, can not find object "year"
That's my problem: why can the object "year" be found when I draw the stacked bar plot without the totals, but not once I add the geom_text layer with the totals?
> str(essnn)
'data.frame': 48619 obs. of 15 variables:
$ id : int 2006051337 2006051337 2006051337 2006051337 2006051337 2006051337 2004070648 2006031360 2006031360 2004070062 ...
$ gender : Factor w/ 3 levels "","F","M": 3 3 3 3 3 3 3 3 3 3 ...
$ age : num 30 30 30 30 30 30 38 43 43 37 ...
$ class : Factor w/ 92 levels "100ab","100aa",..: 18 18 18 18 18 18 18 18 18 18 ...
$ classname: Factor w/ 1136 levels "cad"," Office2010",..: 111 111 111 111 111 111 116 107 107 107 ...
$ grade : num 7 5 6 8 3 4 1 4 3 2 ...
$ year : Factor w/ 6 levels "98","99","100",..: 3 3 3 3 2 2 4 5 5 3 ...
$ ses : num 212 210 211 213 207 208 217 221 220 210 ...
$ date : int 1010421 1001115 1010214 1010701 1000411 1000627 1020424 1030304 1021121 1001108 ...
$ money : num 5800 5800 5800 5800 5200 5200 3000 0 5500 5500 ...
$ discount : num 1160 1160 1160 1160 1040 1040 600 0 275 550 ...
$ amount : num 4640 4640 4640 4640 4160 ...
$ idc : Factor w/ 7 levels "在校生","校友",..: 2 2 2 2 2 2 2 7 7 7 ...
$ mdy : Date, format: "2012-04-21" "2011-11-15" "2012-02-14" "2012-07-01" ...
$ day : num 1123 1281 1190 1052 1499 ...
> str(essnnta)
'data.frame': 10 obs. of 2 variables:
$ classes: Factor w/ 10 levels "JD","JF",..: 1 7 8 4 6 10 3 5 2 9
$ total : num 55603526 43708950 43555010 35649129 33214372 ...
Your problem might be that your x-axes are not the same in the two data frames, so ggplot does not know which value corresponds to which stack. I am not sure about this, as I don't understand the way you define your x axis in the original barplot. I also find it a bit strange to define the aes outside of the ggplot function or the geom_bar, but that might just be because I am used to a different kind of syntax.
All in all I find it difficult to help you as you do not provide any reproducible example.
Here is a small bit of data, and a plot that sort of works. If you supplement your question with your data (or a subset of it), see if this works. You may also want to position the label at the top of each bar.
library(ggplot2)
essnn <- data.frame(year = c(98,99,100,101,102),
classname = c("a", "b", "c", "d", "e"),
amount = c(1e6, 2e6,3e6,4e6,5e6))
essnnta <- data.frame(total = c(10, 20, 30, 40, 50))
# binwidth removed from geom_bar: it is not needed with stat = "identity"
ggplot(data=essnn, aes(x=reorder(classname,-amount, sum), y=amount, fill = year)) +
geom_bar(stat="identity", position = "stack") +
geom_text(aes(x=essnn$classname, y=essnnta$total, label=essnnta$total), size=3) # not "classes"
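On the original error: aesthetics set at the plot level (in the question, fill = sort(year) added with + aes(...)) are inherited by every layer, so geom_text() looks for year in essnnta, which has no such column. Here is a minimal sketch of one way around it, assuming ggplot2 is installed and using made-up data with the question's column names: compute one total per class, then give the text layer its own data and aesthetics with inherit.aes = FALSE.

```r
# Made-up data with the question's column names.
essnn <- data.frame(
  classname = c("a", "a", "b", "b", "c"),
  year      = factor(c("98", "99", "98", "99", "99")),
  amount    = c(100, 150, 80, 40, 60)
)

# One total per class, for the label above each stacked bar.
essnnta <- aggregate(amount ~ classname, data = essnn, FUN = sum)
names(essnnta) <- c("classes", "total")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(essnn, aes(x = reorder(classname, -amount, sum),
                         y = amount, fill = year)) +
    geom_bar(stat = "identity") +
    # Own data and aesthetics; nothing (such as fill) is inherited.
    geom_text(data = essnnta, inherit.aes = FALSE,
              aes(x = classes, y = total, label = total),
              vjust = -0.5, size = 3)
  print(p)
}
```

An alternative with the same effect is to move fill = year into geom_bar's own aes(), so that the text layer never sees it.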
