Wrong values occur when converting points from UTM to WGS84 in R - r

I am using Stanislav's method from an earlier question on this forum about "converting latitude and longitude points to UTM". I reversed the function so that it converts points from UTM to WGS84:
library(sp); library(rgdal)
#Function
UTMToLongLat<-function(x,y,zone){
xy <- data.frame(ID = 1:length(x), X = x, Y = y)
coordinates(xy) <- c("X", "Y")
proj4string(xy) <- CRS(paste("+proj=utm +zone=",zone," +ellps=WGS84",sep=''))
res <- spTransform(xy, CRS("+proj=longlat +datum=WGS84"))
return(as.data.frame(res))
}
I tried the example from the earlier question mentioned above:
x2 <- c(-48636.65, 1109577); y2 <- c(213372.05, 5546301)
The expected result is (118, 10), (119, 50) in WGS84. Colin's example is in UTM zone 51, so I call:
done2 <- UTMToLongLat(x2,y2,51)
However, it produced (118.0729, 1.92326), (131.4686, 49.75866).
What is wrong? Also, how can I control the number of decimal digits in the output?

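First, you mixed up the coordinates: your x2 and y2 each hold one point's (x, y) pair, whereas the function expects all x values in one vector and all y values in the other. It should be: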
x <- c(-48636.65, 213372.05)
y <- c(1109577, 5546301)
Inside the function, these are combined into the following data frame:
> data.frame(ID = 1:length(x), X = x, Y = y)
# ID X Y
# 1 1 -48636.65 1109577
# 2 2 213372.05 5546301
Then run your function again:
> UTMToLongLat(x, y, 51)
# ID X Y
# 1 1 118 9.999997
# 2 2 119 50.000001
To control the number of decimal places, use round() (its digits argument sets how many decimals to keep; the default is 0):
> round(UTMToLongLat(x, y, 51))
# ID X Y
# 1 1 118 10
# 2 2 119 50
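The mix-up can be reproduced without any spatial packages. This is a minimal base R sketch, assuming the question's x2 and y2 each hold one point's (easting, northing) pair; the names pts, x, y and demo are only illustrative, and the demo values are made up to show rounding:

```r
# Vectors as written in the question: each one is a single point (easting, northing)
x2 <- c(-48636.65, 1109577)
y2 <- c(213372.05, 5546301)

# Stack the points as rows, then read the columns:
# column 1 holds all eastings, column 2 holds all northings
pts <- rbind(x2, y2)
x <- unname(pts[, 1])  # -48636.65 213372.05
y <- unname(pts[, 2])  # 1109577 5546301

# round() takes a digits argument and works on all-numeric data frames
# (illustrative values, not output of the transformation)
demo <- data.frame(X = c(118.0000004, 119.0000002), Y = c(9.999997, 50.000001))
round(demo, digits = 3)
```

Passing x and y built this way to the function gives the vectors used in the answer above.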

Related

Calculating the distance between two latitude and longitude geocoordinates in an R data frame

I would like to calculate the distance between two latitude and longitude locations in a dataframe.
df <- tibble("lat1"=c(0,1,2),"lon1"=c(0,1,2),"lat2"=c(90,91,92),"lon2"=c(90,91,92))
df %>%
dplyr::mutate(distance_trip=mapply(FUN = distHaversine,c(lat1, lon1),c(lat2, lon2)))
The error that I am getting is:
Error: Problem with `mutate()` column `distance_trip`.
ℹ `distance_trip = mapply(FUN = distHaversine, c(lat1, lon1), c(lat2, lon2))`.
x Wrong length for a vector, should be 2
Run `rlang::last_error()` to see where the error occurred.
I am not sure why I cannot apply the function to the data frame.

A few things are going on here:
geosphere::distHaversine is vectorized (matricized?), so if we give it matrix input we don't need mapply to process one row at a time. This will be much faster.
c() just sticks things together: c(lat1, lon1) will be 0, 1, 2, 0, 1, 2, one after the other. We need cbind to make a matrix. (I see what you were going for with mapply, but you are passing it the long c(lat1, lon1) vector. Instead you would need to pass lat1 and lon1 in separately and c() the individual items inside FUN. The matrix approach is still better.)
Despite the colloquial usage of "lat, lon", almost all functions expect "lon, lat", because longitude plays the role of "x" and latitude the role of "y", and the (x, y) convention holds.
The maximum latitude is 90, but your lat2 has values 91 and 92, which will cause an error if not addressed.
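To see the difference between c() and cbind() described in the second point, here is a small base R sketch using the question's first two columns (the names flat and pairs are only illustrative):

```r
lat1 <- c(0, 1, 2)
lon1 <- c(0, 1, 2)

# c() concatenates everything into one long vector of length 6
flat <- c(lat1, lon1)
length(flat)

# cbind() builds a 3 x 2 matrix with one (lon, lat) pair per row,
# which is the shape a matrix-aware distance function can work with
pairs <- cbind(lon1, lat1)
dim(pairs)
```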
Fixing all of these, we can use:
library(dplyr)      # also provides tibble() and %>%
library(geosphere)
df <- tibble(
  "lat1"=c(0,1,2),
  "lon1"=c(0,1,2),
  "lat2"=4:6, ## valid latitudes
  "lon2"=c(90,91,92)
)
df %>%
  mutate(distance_trip = geosphere::distHaversine(
    cbind(lon1, lat1), cbind(lon2, lat2)
  ))
# # A tibble: 3 × 5
# lat1 lon1 lat2 lon2 distance_trip
# <dbl> <dbl> <int> <dbl> <dbl>
# 1 0 0 4 90 10018754.
# 2 1 1 5 91 10009053.
# 3 2 2 6 92 9995487.
This is how to make it work with mapply--it's a bit more awkward to write and will be slower too :(
df %>%
mutate(distance_trip = mapply(
FUN = function(x1, y1, x2, y2) {
distHaversine(c(x1, y1), c(x2, y2))
},
x1 = lon1, y1 = lat1, x2 = lon2, y2 = lat2
))
## same result as above
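For intuition about what a haversine distance involves, here is a base R sketch of the formula. It is not geosphere's implementation: the function name haversine_m is hypothetical, and the radius of 6378137 m is an assumption chosen to match what I understand to be distHaversine's default.

```r
# Great-circle distance in metres between (lon1, lat1) and (lon2, lat2),
# all given in degrees. Vectorized over its arguments.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# First row of the example above: (lon 0, lat 0) to (lon 90, lat 4).
# The two points are a quarter of a great circle apart, so the
# distance is r * pi / 2, about 10018754 m.
haversine_m(0, 0, 90, 4)
```

Because the arithmetic is vectorized, whole columns can be passed at once, which is the same reason the matrix form of distHaversine needs no mapply.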

How to make calculations across two different lists of dataframes?

I have two lists of data frames, such that data is a list of 47 data frames, where each data frame has columns [coords, x, y, liklihood, x.1, x.2, liklihood.1, etc.] and dataA is a list of 47 data frames each of the same length as those in data, but with fewer columns [coords, x, y] that represent different coordinates.
I want to create a third list, or add a column to each data frame in one of the lists, that will contain the distance calculation from pointDistance(p1, p2) where p1 is the x and y columns of each data frame in list data, and p2 is the x and y columns of each data frame in list dataA.
I am trying to keep the dataframes in lists rather than having 47*2 individual data frames in my global environment.
Minimal Reproducible Example:
coords <- rnorm(10)
x <- rnorm(10)
y <- rnorm(10)
liklihood <- rnorm(10)
x.1 <- rnorm(10)
y.1 <- rnorm(10)
day1 <- data.frame(coords,x,y,liklihood,x.1,y.1)
coords <- rnorm(10)
x <- rnorm(10)
y <- rnorm(10)
liklihood <- rnorm(10)
x.1 <- rnorm(10)
y.1 <- rnorm(10)
day2 <- data.frame(coords,x,y,liklihood,x.1,y.1)
data <- list(day1,day2)
coords <- rnorm(10)
x <- rnorm(10)
y <- rnorm(10)
liklihood <- rnorm(10)
day1 <- data.frame(coords,x,y,liklihood)
coords <- rnorm(10)
x <- rnorm(10)
y <- rnorm(10)
liklihood <- rnorm(10)
day2 <- data.frame(coords,x,y,liklihood)
dataA <- list(day1,day2)

You can use mapply in base R to do this.
First, write a function that, given a pair of data frames from your two lists (such as data[[1]] and dataA[[1]]), returns a single correct data frame:
library(raster)
append_distances <- function(df1, df2)
{
df1$distance <- pointDistance(cbind(df1$x, df1$y), cbind(df2$x, df2$y), lonlat = FALSE)
return(df1)
}
Now we just pass this function and your two lists to mapply:
data <- mapply(append_distances, data, dataA, SIMPLIFY = FALSE)
Now each data frame in data has a distance column added:
data
#> [[1]]
#> coords x y liklihood x.1 y.1 distance
#> 1 0.4761741 0.7913819 0.11597299 -0.6159504 -0.17626836 -0.8649915 2.1378779
#> 2 0.2608518 0.4389639 -1.44510285 -0.5452702 -2.31927588 -0.5114613 3.0321765
#> 3 2.1098629 0.3457442 1.59630572 -0.3205454 0.25760236 1.6791924 0.4150714
#> 4 0.5937334 -0.2043505 0.23667944 -0.2480409 -0.52856599 -0.4263619 1.6662791
#> 5 0.2819461 -1.9768319 0.68344331 -0.4975349 -0.08315893 0.9271072 2.3841079
#> 6 0.5779044 -0.5706433 0.89377684 -1.0084165 -0.83697268 0.9928353 0.6818632
#> 7 0.1410554 -0.6133513 0.25957971 -0.1781339 -0.77489990 -0.7191718 0.8303696
#> 8 -1.1769578 0.9203776 -0.06258728 -0.8991639 -0.38907408 -0.8388408 0.5028145
#> 9 -0.1388739 -0.8279408 1.15568431 -0.3312423 1.17269754 -1.4530041 1.6042288
#> 10 -0.3755364 0.6285803 0.52453490 0.7323463 -0.49051839 -0.1949171 0.6205714
#>
#> [[2]]
#> coords x y liklihood x.1 y.1 distance
#> 1 2.2158425 0.16430566 -0.5721804 -0.7523029 0.2866881 -2.027529031 0.4418775
#> 2 1.5753250 -0.67190607 -0.1140359 -0.3125333 -0.5361148 0.153228235 1.7182954
#> 3 0.8558108 1.19404509 -1.5834463 0.3858246 0.4475970 0.460910344 1.6229581
#> 4 0.8027824 0.76579023 -0.5938679 0.5592208 0.5883806 0.231569460 3.3608275
#> 5 -1.1487244 0.01013471 0.6855049 0.7148735 -2.2822053 1.918921619 2.3790501
#> 6 0.1014336 0.73941541 -0.4487482 0.1758588 0.8579709 0.029777437 1.8923570
#> 7 -0.8238857 0.67911991 -0.9140873 -0.6887611 -1.0709704 -0.009789701 1.4694983
#> 8 -0.1553338 0.78560221 -0.8218460 -0.5537232 0.7295692 0.744225760 2.4279377
#> 9 -0.6297834 0.09747354 0.2048211 -1.0849396 -0.2201589 0.173386536 0.8638957
#> 10 -0.4616377 -0.51116686 0.3204535 -0.5285903 1.0053890 -0.534173400 1.0715881
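The same pairwise pattern can be sketched without the raster package. The example below assumes planar coordinates, so it uses a plain Euclidean distance in place of pointDistance(..., lonlat = FALSE), and it uses Map(), which behaves like mapply(..., SIMPLIFY = FALSE). The names list_a, list_b and add_euclid are hypothetical, and the data are tiny made-up values so the result is easy to check by hand:

```r
# Two lists of data frames with matching lengths and matching row counts
list_a <- list(
  data.frame(x = c(0, 3), y = c(0, 4)),
  data.frame(x = 1, y = 1)
)
list_b <- list(
  data.frame(x = c(0, 0), y = c(0, 0)),
  data.frame(x = 4, y = 5)
)

# Works on ONE pair of data frames; Map() applies it pair by pair
add_euclid <- function(df1, df2) {
  df1$distance <- sqrt((df1$x - df2$x)^2 + (df1$y - df2$y)^2)
  df1
}

result <- Map(add_euclid, list_a, list_b)
result[[1]]$distance  # 0 5
result[[2]]$distance  # 5
```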

Randomise across columns for half a dataset

I have a data set for MMA bouts.
The structure currently is
Fighter 1, Fighter 2, Winner
x y x
x y x
x y x
x y x
x y x
My problem is that Fighter 1 is always the Winner, so my model will learn that fighter 1 always wins.
I need to be able to randomly swap Fighter 1 and Fighter 2 for half the data set in order to have the winner represented equally.
Ideally I would have this:
Fighter 1, Fighter 2, Winner
x y x
y x x
x y y
y x x
x y y
Is there a way to randomise across columns without messing up the order of the rows?

I'm assuming your xs and ys are arbitrary placeholders. I'll further assume that the Winner column must stay the same and that you just need the winner not always to be in the first column.
Sample data:
set.seed(42)
x <- data.frame(
F1 = sample(letters, size = 5),
F2 = sample(LETTERS, size = 5),
stringsAsFactors = FALSE
)
x$W <- x$F1
x
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 g D g
# 4 t P t
# 5 o W o
Choose some rows to change, randomly:
(ind <- sample(nrow(x), size = ceiling(nrow(x)/2)))
# [1] 3 5 4
This means that we expect rows 3-5 to change.
Now the random changes:
within(x, { tmp <- F1[ind]; F1[ind] = F2[ind]; F2[ind] = tmp; rm(tmp); })
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 D g g
# 4 P t t
# 5 W o o
Rows 1-2 still show the F1 as the Winner, and rows 3-5 show F2 as the Winner.
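The swap can also be written as a single indexed assignment. This is a sketch on made-up data (the data frame fights and the index swap are hypothetical names), with a check that the winner ends up split evenly between the two columns:

```r
fights <- data.frame(
  F1 = c("a", "b", "c", "d"),
  F2 = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)
fights$W <- fights$F1  # the winner always starts out in F1

# Pick half of the rows at random
set.seed(1)
swap <- sample(nrow(fights), size = nrow(fights) %/% 2)

# Columns are assigned by position, so this exchanges F1 and F2
# for the chosen rows only; row order and W are untouched
fights[swap, c("F1", "F2")] <- fights[swap, c("F2", "F1")]

# The winner now appears in F2 for exactly the swapped rows
table(winner_in_F1 = fights$W == fights$F1)
```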
I also found that this code worked:
matches_clean[, c("fighter1", "fighter2")] <- lapply(matches_clean[, c("fighter1", "fighter2")], as.character)
changeInd <- !!((match(matches_clean$fighter1, levels(as.factor(matches_clean$fighter1))) -
match(matches_clean$fighter2, levels(as.factor(matches_clean$fighter2)))) %% 2)
matches_clean[changeInd, c("fighter1", "fighter2")] <- matches_clean[changeInd, c("fighter2", "fighter1")]

number elements in a vector with constraints

Given x and y I wish to create the desired.result below:
x <- 1:10
y <- c(2:4,6:7,8:9)
desired.result <- c(1,2,2,2,3,4,4,5,5,6)
where, in effect, each sequence in y is replaced in x by the first element of that sequence, and then the elements of the new x are numbered.
The intermediate step for x would be:
x.intermediate <- c(1,2,2,2,5,6,6,8,8,10)
Below is code that does this. However, the code is not general and is overly complex:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y1 <- rep(min(unlist(y[1])), length(unlist(y[1])))
y2 <- rep(min(unlist(y[2])), length(unlist(y[2])))
y3 <- rep(min(unlist(y[3])), length(unlist(y[3])))
new.x <- x
new.x[unlist(y[1])] <- y1
new.x[unlist(y[2])] <- y2
new.x[unlist(y[3])] <- y3
rep(unique.x, rle(new.x)$lengths)
[1] 1 2 2 2 3 4 4 5 5 6
Below is my attempt to generalize the code. However, I am stuck on the second lapply.
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y2 <- lapply(y, function(i) rep(min(i), length(i)))
new.x <- x
lapply(y2, function(i) new.x[i[1]:(i[1]-1+length(i))] = i)
rep(unique.x, rle(new.x)$lengths)
Thank you for any advice. I suspect there is a much simpler solution I am overlooking. I prefer a solution in base R.

A solution like this should work:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
x[unlist(y)] <- rep(sapply(y, '[', 1), lengths(y))
runs <- rle(x)$lengths
rep(seq_along(runs), runs)
## [1] 1 2 2 2 3 4 4 5 5 6
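The pieces of that one-liner can be inspected separately, and the same numbering can also be reached with cumsum(). The sketch below is an alternative, not the answerer's method; it assumes x is sorted and each group in y is a contiguous run, and the names firsts, run_lengths, followers and result are illustrative:

```r
x <- 1:10
y <- list(2:4, 6:7, 8:9)

# Building blocks of the rep()-based solution
firsts <- sapply(y, `[`, 1)   # first element of each group: 2 6 8
run_lengths <- lengths(y)     # size of each group: 3 2 2
rep(firsts, run_lengths)      # 2 2 2 6 6 8 8

# Alternative: every element that is NOT the first of its group
# should reuse the previous number, so count only the others
followers <- unlist(lapply(y, function(i) i[-1]))  # 3 4 7 9
result <- cumsum(!(x %in% followers))
result
# 1 2 2 2 3 4 4 5 5 6
```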

How to combine two vectors into a data frame

I have two vectors like this
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
I'd like to output the dataframe like this:
> print(df)
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
How can I do this?

While this does not answer the question asked, it answers a related question that many people have had:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
df <- data.frame(x,y)
names(df) <- c(x_name,y_name)
print(df)
cond rating
1 1 100
2 2 200
3 3 300
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
require(reshape2)
df <- melt(data.frame(x,y))
colnames(df) <- c(x_name, y_name)
print(df)
UPDATE (2017-02-07):
In answer to @cdaringe's comment: there are multiple possible solutions, one of which is below.
library(dplyr)
library(magrittr)
x <- c(1, 2, 3)
y <- c(100, 200, 300)
z <- c(1, 2, 3, 4, 5)
x_name <- "cond"
y_name <- "rating"
# Helper function to create data.frame for the chunk of the data
prepare <- function(name, value, xname = x_name, yname = y_name) {
  # tibble() replaces the deprecated data_frame()
  tibble(rep(name, length(value)), value) %>%
    set_colnames(c(xname, yname))
}
bind_rows(
prepare("x", x),
prepare("y", y),
prepare("z", z)
)
This should do the trick and produce the data frame you asked for, using only base R:
df <- data.frame(cond=c(rep("x", times=length(x)),
rep("y", times=length(y))),
rating=c(x, y))
df
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
However, from your initial description, I'd say that this is perhaps a more likely use case:
df2 <- data.frame(x, y)
colnames(df2) <- c(x_name, y_name)
df2
cond rating
1 1 100
2 2 200
3 3 300
[edit: moved parentheses in example 1]
You can use the expand.grid() function.
x <-c(1,2,3)
y <-c(100,200,300)
expand.grid(cond=x,rating=y)
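Note that expand.grid() returns every combination of its arguments rather than stacking them, so it may not match the six-row layout in the question. A quick check (the name grid is illustrative):

```r
x <- c(1, 2, 3)
y <- c(100, 200, 300)

# One row per (cond, rating) combination: 3 * 3 = 9 rows
grid <- expand.grid(cond = x, rating = y)
nrow(grid)
head(grid, 3)
```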
Here's a simple function. It generates a data frame and automatically uses the names of the vectors as values for the first column.
myfunc <- function(a, b, names = NULL) {
setNames(data.frame(c(rep(deparse(substitute(a)), length(a)),
rep(deparse(substitute(b)), length(b))), c(a, b)), names)
}
An example:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
myfunc(x, y, c(x_name, y_name))
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
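Another base R route to the same long layout is utils::stack(), which turns a named list into a two-column data frame. This is a sketch, not one of the answers above; the intermediate name long is illustrative, and note that the label column comes back as a factor:

```r
x <- c(1, 2, 3)
y <- c(100, 200, 300)
x_name <- "cond"
y_name <- "rating"

# stack() returns columns "values" and "ind" (the list names, as a factor)
long <- stack(list(x = x, y = y))

# Reorder so the label comes first, then apply the requested names
df <- setNames(long[, c("ind", "values")], c(x_name, y_name))
df
```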
df = data.frame(cond=c(rep("x",3),rep("y",3)),rating=c(x,y))
An alternative simplification of the answer by https://stackoverflow.com/users/1969435/gx1sptdtda above:
cond <-c(1,2,3)
rating <-c(100,200,300)
df <- data.frame(cond, rating)
df
cond rating
1 1 100
2 2 200
3 3 300
