Connecting two sets of coordinates to create lines using sf/mapview

Connecting two sets of coordinates to create lines using sf/mapview - r

I have a dataset where a bird captured in one location (Blong, Blat) then encountered again in another (Elong, Elat). These coordinates are in a lat/long format, and I'd like to connect the capture and encounter locations with a line and plot them in mapview.
In the data below, each row is an individual bird with its capture/encounter coordinates, and the flyway that it belongs to (which I would like to use to color the lines in mapview.
dat <- structure(list(Blong = c(-75.58333, -76.08333, -81.08333, -94.25,
-75.41667, -99.41667, -77.41667, -116.08333, -89.58333, -77.58333
), Blat = c(37.58333, 40.58333, 42.75, 41.91667, 38.25, 28.25,
38.91667, 43.58333, 44.25, 38.91667), Elong = c(-65.91667, -75.75,
-80.58333, -95.41667, -73.58333, -89.41667, -77.58333, -116.41667,
-96.41667, -77.41667), Elat = c(45.91667, 40.58333, 42.75, 29.75,
45.58333, 48.25, 38.75, 43.58333, 34.08333, 38.91667), Flyway = structure(c(2L,
2L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 2L), .Label = c("Central", "Eastern",
"West"), class = "factor")), .Names = c("Blong", "Blat", "Elong",
"Elat", "Flyway"), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
A look at the data:
# A tibble: 10 x 5
Blong Blat Elong Elat Flyway
<dbl> <dbl> <dbl> <dbl> <fct>
1 -75.6 37.6 -65.9 45.9 Eastern
2 -76.1 40.6 -75.8 40.6 Eastern
3 -81.1 42.8 -80.6 42.8 Eastern
4 -94.2 41.9 -95.4 29.8 Central
5 -75.4 38.2 -73.6 45.6 Eastern
6 -99.4 28.2 -89.4 48.2 Eastern
7 -77.4 38.9 -77.6 38.8 Eastern
8 -116. 43.6 -116. 43.6 West
9 -89.6 44.2 -96.4 34.1 Eastern
10 -77.6 38.9 -77.4 38.9 Eastern
I've looked a few examples, but haven't found one that looks quite like my data set.

The tricky thing is to create a valid LINESTRING object from the coordinate pairs in wide format. sf expects linestring coordinates in rows of a matrix. Here's a way that works. The sfc column of a sf object is a list so here we use lapply to loop over the rows of the data you provided.
library(sf)
library(mapview)
b = dat[, c("Blong", "Blat")]
names(b) = c("long", "lat")
e = dat[, c("Elong", "Elat")]
names(e) = c("long", "lat")
dat$geometry = do.call(
"c",
lapply(seq(nrow(b)), function(i) {
st_sfc(
st_linestring(
as.matrix(
rbind(b[i, ], e[i, ])
)
),
crs = 4326
)
}))
dat_sf = st_as_sf(dat)
mapview(dat_sf, zcol = "Flyway")

Related

How to calculate mean of column, then paste mean value as row value in another data frame in R?

I have 36 data frames that each contain columns titled "lon", "lat", and "bottom_temp". Each different data frame represents data from a year between 1980 and 2015. I have a seperate dataframe called "month3_avg_box" that contains two columns: "year" and "avg_bottom_temp". The year column of the "month3_avg_box" data frame contains one row for each year between 1980-2015. I would like to find the average value of each "bottom_temp" column in each of the 36 data frames I have, and place each mean in the corresponding row of the new "month3_avg_box" data frame I have. I will write a mini example of what I'd like:
1980_df:
lon lat bottom_temp
-75.61 39.1 11.6
-75.60 39.1 11.5
-75.59 39.1 11.6
-75.58 39.1 11.7
(mean of bottom_temp column for 1980_df = 11.6)
1981_df:
lon lat bottom_temp
-75.57 39.1 11.9
-75.56 39.1 11.9
-75.55 39.1 12.0
-75.54 39.1 11.8
(mean of bottom_temp column for 1981_df = 11.9)
1982_df:
lon lat bottom_temp
-75.57 39.1 11.6
-75.56 39.1 11.7
-75.55 39.1 11.9
-75.54 39.1 11.2
(mean of bottom_temp column for 1982_df = 11.6)
Now, I'd like to take these averages and put them into my "month3_avg_box" data frame so it looks like:
month3_avg_box:
Year Avg_bottom_temp
1980 11.6
1981 11.9
1982 11.6
Does this make sense? How can I do this?

We may get the datasets in a list, bind the datasets, create a 'Year' column from the named list, do a group by mean
library(dplyr)
library(stringr)
lst(`1980_df`, `1981_df`, `1982_df`) %>%
bind_rows(.id = 'Year') %>%
group_by(Year = str_remove(Year, '_df')) %>%
summarise(Avg_bottom_temp = mean(bottom_temp))
-output
# A tibble: 3 × 2
Year Avg_bottom_temp
<chr> <dbl>
1 1980 11.6
2 1981 11.9
3 1982 11.6
data
`1980_df` <- structure(list(lon = c(-75.61, -75.6, -75.59, -75.58), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.6, 11.5, 11.6, 11.7)), class = "data.frame", row.names = c(NA,
-4L))
`1981_df` <- structure(list(lon = c(-75.57, -75.56, -75.55, -75.54), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.9, 11.9, 12, 11.8)), class = "data.frame", row.names = c(NA,
-4L))
`1982_df` <- structure(list(lon = c(-75.57, -75.56, -75.55, -75.54), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.6, 11.7, 11.9, 11.2)), class = "data.frame", row.names = c(NA,
-4L))

Summarise across each column by grouping their names

I want to calculate the weighted variance using the weights provided in the dataset, while group for the countries and cities, however the function returns NAs:
library(Hmisc) #for the 'wtd.var' function
weather_winter.std<-weather_winter %>%
group_by(country, capital_city) %>%
summarise(across(starts_with("winter"),wtd.var))
The provided output from the console (when in long format):
# A tibble: 35 x 3
# Groups: country [35]
country capital_city winter
<chr> <chr> <dbl>
1 ALBANIA Tirane NA
2 AUSTRIA Vienna NA
3 BELGIUM Brussels NA
4 BULGARIA Sofia NA
5 CROATIA Zagreb NA
6 CYPRUS Nicosia NA
7 CZECHIA Prague NA
8 DENMARK Copenhagen NA
9 ESTONIA Tallinn NA
10 FINLAND Helsinki NA
# … with 25 more rows
This is the code that I used to get the data from a wide format into a long format:
weather_winter <- weather_winter %>% pivot_longer(-c(31:33))
weather_winter$name <- NULL
names(weather_winter)[4] <- "winter"
Some example data:
structure(list(`dec-wet_2011` = c(12.6199998855591, 12.6099996566772,
14.75, 11.6899995803833, 18.2899990081787), `dec-wet_2012` = c(13.6300001144409,
14.2199993133545, 14.2299995422363, 16.1000003814697, 18.0299987792969
), `dec-wet_2013` = c(4.67999982833862, 5.17000007629395, 4.86999988555908,
7.56999969482422, 5.96000003814697), `dec-wet_2014` = c(14.2999992370605,
14.4799995422363, 13.9799995422363, 15.1499996185303, 16.1599998474121
), `dec-wet_2015` = c(0.429999977350235, 0.329999983310699, 1.92999994754791,
3.30999994277954, 7.42999982833862), `dec-wet_2016` = c(1.75,
1.29999995231628, 3.25999999046326, 6.60999965667725, 8.67999935150146
), `dec-wet_2017` = c(13.3400001525879, 13.3499994277954, 15.960000038147,
10.6599998474121, 14.4699993133545), `dec-wet_2018` = c(12.210000038147,
12.4399995803833, 11.1799993515015, 10.75, 18.6299991607666),
`dec-wet_2019` = c(12.7199993133545, 13.3800001144409, 13.9899997711182,
10.5299997329712, 12.3099994659424), `dec-wet_2020` = c(15.539999961853,
16.5200004577637, 11.1799993515015, 14.7299995422363, 13.5499992370605
), `jan-wet_2011` = c(8.01999950408936, 7.83999967575073,
10.2199993133545, 13.8899993896484, 14.5299997329712), `jan-wet_2012` = c(11.5999994277954,
11.1300001144409, 12.5500001907349, 10.1700000762939, 22.6199989318848
), `jan-wet_2013` = c(17.5, 17.4099998474121, 15.5599994659424,
13.3199996948242, 20.9099998474121), `jan-wet_2014` = c(12.5099992752075,
12.2299995422363, 15.210000038147, 9.73999977111816, 9.63000011444092
), `jan-wet_2015` = c(17.6900005340576, 16.9799995422363,
11.75, 9.9399995803833, 19), `jan-wet_2016` = c(15.6099996566772,
15.5, 14.5099992752075, 10.3899993896484, 18.4499988555908
), `jan-wet_2017` = c(9.17000007629395, 9.61999988555908,
9.30999946594238, 15.8499994277954, 11.210000038147), `jan-wet_2018` = c(8.55999946594238,
9.10999965667725, 13.2599992752075, 9.85999965667725, 15.8899993896484
), `jan-wet_2019` = c(17.0699996948242, 16.8699989318848,
14.5699996948242, 19.0100002288818, 19.4699993133545), `jan-wet_2020` = c(6.75999975204468,
6.25999975204468, 6.00999975204468, 5.35999965667725, 8.15999984741211
), `feb-wet_2011` = c(9.1899995803833, 8.63999938964844,
6.21999979019165, 9.82999992370605, 4.67999982833862), `feb-wet_2012` = c(12.2699995040894,
11.6899995803833, 8.27999973297119, 14.9399995803833, 13.0499992370605
), `feb-wet_2013` = c(15.3599996566772, 15.9099998474121,
17.0599994659424, 13.3599996566772, 16.75), `feb-wet_2014` = c(10.1999998092651,
11.1399993896484, 13.8599996566772, 10.7399997711182, 7.35999965667725
), `feb-wet_2015` = c(11.9200000762939, 12.2699995040894,
8.01000022888184, 14.5299997329712, 5.71999979019165), `feb-wet_2016` = c(14.6999998092651,
14.7799997329712, 16.7899990081787, 4.90000009536743, 19.3500003814697
), `feb-wet_2017` = c(8.98999977111816, 9.17999935150146,
11.7699995040894, 6.3899998664856, 13.9899997711182), `feb-wet_2018` = c(16.75,
16.8599987030029, 12.0599994659424, 16.1900005340576, 8.51000022888184
), `feb-wet_2019` = c(7.58999967575073, 7.26999998092651,
8.21000003814697, 7.57999992370605, 8.81999969482422), `feb-wet_2020` = c(10.6399993896484,
10.4399995803833, 13.4399995803833, 8.53999996185303, 19.939998626709
), country = c("SERBIA", "SERBIA", "SLOVENIA", "GREECE",
"CZECHIA"), capital_city = c("Belgrade", "Belgrade", "Ljubljana",
"Athens", "Prague"), weight = c(20.25, 19.75, 14.25, 23.75,
14.25)), row.names = c(76L, 75L, 83L, 16L, 5L), class = "data.frame")

Your code seems to provide the right answer, now there's more data:
# Groups: country [4]
country capital_city winter
<chr> <chr> <dbl>
1 CZECHIA Prague 27.2
2 GREECE Athens 14.6
3 SERBIA Belgrade 19.1
4 SLOVENIA Ljubljana 16.3
Is this what you were looking for?
I took the liberty of streamlining your code:
weather_winter <- weather_winter %>%
pivot_longer(-c(31:33), values_to = "winter") %>%
select(-name)
weather_winter.std <- weather_winter %>%
group_by(country, capital_city) %>%
summarise(winter = wtd.var(winter))
With only one "winter" column, there's no need for the across().
Finally, you are not using the weights. If these are needed, then change the last line to:
summarise(winter = wtd.var(winter, weights = weight))
To give:
# A tibble: 4 x 3
# Groups: country [4]
country capital_city winter
<chr> <chr> <dbl>
1 CZECHIA Prague 26.3
2 GREECE Athens 14.2
3 SERBIA Belgrade 18.8
4 SLOVENIA Ljubljana 15.8

Merge two dataframes: specifically merge a selection of columns based on two conditions?

I have two datasets on the same 2 patients. With the second dataset I want to add new information to the first, but I can't seem to get the code right.
My first (incomplete) dataset has a patient ID, measurement time (either T0 or FU1), year of birth, date of the CT scan, and two outcomes (legs_mass and total_mass):
library(tidyverse)
library(dplyr)
library(magrittr)
library(lubridate)
df1 <- structure(list(ID = c(115, 115, 370, 370), time = structure(c(1L,
6L, 1L, 6L), .Label = c("T0", "T1M0", "T1M6", "T1M12", "T2M0",
"FU1"), class = "factor"), year_of_birth = c(1970, 1970, 1961,
1961), date_ct = structure(c(16651, 17842, 16651, 18535), class = "Date"),
legs_mass = c(9.1, NA, NA, NA), total_mass = c(14.5, NA,
NA, NA)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
# Which gives the following dataframe
df1
# A tibble: 4 x 6
ID time year_of_birth date_ct legs_mass total_mass
<dbl> <fct> <dbl> <date> <dbl> <dbl>
1 115 T0 1970 2015-08-04 9.1 14.5
2 115 FU1 1970 2018-11-07 NA NA
3 370 T0 1961 2015-08-04 NA NA
4 370 FU1 1961 2020-09-30 NA NA
The second dataset adds to the legs_mass and total_mass columns:
df2 <- structure(list(ID = c(115, 370), date_ct = structure(c(17842,
18535), class = "Date"), ctscan_label = c("PXE115_CT_20181107_xxxxx-3.tif",
"PXE370_CT_20200930_xxxxx-403.tif"), legs_mass = c(956.1, 21.3
), total_mass = c(1015.9, 21.3)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
# Which gives the following dataframe:
df2
# A tibble: 2 x 5
ID date_ct ctscan_label legs_mass total_mass
<dbl> <date> <chr> <dbl> <dbl>
1 115 2018-11-07 PXE115_CT_20181107_xxxxx-3.tif 956. 1016.
2 370 2020-09-30 PXE370_CT_20200930_xxxxx-403.tif 21.3 21.3
What I am trying to do, is...
Add the legs_mass and total_mass column values from df2 to df1, based on ID number and date_ct.
Add the new columns of df2 (the one that is not in df1; ctscan_label) to df1, also based on the date of the ct and patient ID.
So that the final dataset df3 looks as follows:
df3 <- structure(list(ID = c(115, 115, 370, 370), time = structure(c(1L,
6L, 1L, 6L), .Label = c("T0", "T1M0", "T1M6", "T1M12", "T2M0",
"FU1"), class = "factor"), year_of_birth = c(1970, 1970, 1961,
1961), date_ct = structure(c(16651, 17842, 16651, 18535), class = "Date"),
legs_mass = c(9.1, 956.1, NA, 21.3), total_mass = c(14.5,
1015.9, NA, 21.3)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
# Corresponding to the following tibble:
# A tibble: 4 x 6
ID time year_of_birth date_ct legs_mass total_mass
<dbl> <fct> <dbl> <date> <dbl> <dbl>
1 115 T0 1970 2015-08-04 9.1 14.5
2 115 FU1 1970 2018-11-07 956. 1016.
3 370 T0 1961 2015-08-04 NA NA
4 370 FU1 1961 2020-09-30 21.3 21.3
I have tried the merge function and rbind from baseR, and bind_rows from dplyr but can't seem to get it right.
Any help?

You can join the two datasets and use coalesce to keep one non-NA value from the two datasets.
library(dplyr)
left_join(df1, df2, by = c("ID", "date_ct")) %>%
mutate(leg_mass = coalesce(legs_mass.x , legs_mass.y),
total_mass = coalesce(total_mass.x, total_mass.y)) %>%
select(-matches('\\.x|\\.y'), -ctscan_label)
# ID time year_of_birth date_ct leg_mass total_mass
# <dbl> <fct> <dbl> <date> <dbl> <dbl>
#1 115 T0 1970 2015-08-04 9.1 14.5
#2 115 FU1 1970 2018-11-07 956. 1016.
#3 370 T0 1961 2015-08-04 NA NA
#4 370 FU1 1961 2020-09-30 21.3 21.3

We can use data.table methods
library(data.table)
setDT(df1)[setDT(df2), c("legs_mass", "total_mass") :=
.(fcoalesce(legs_mass, i.legs_mass),
fcoalesce(total_mass, i.total_mass)), on = .(ID, date_ct)]
-output
df1
ID time year_of_birth date_ct legs_mass total_mass
1: 115 T0 1970 2015-08-04 9.1 14.5
2: 115 FU1 1970 2018-11-07 956.1 1015.9
3: 370 T0 1961 2015-08-04 NA NA
4: 370 FU1 1961 2020-09-30 21.3 21.3

R: create a data.frame or data.table from a list and unpacked list of lists

I have a list and a list of lists and would like to create a data.frame or data.table.
Here is the list:
head(stadte_namen)
[1] "Berlin" "Hamburg" "München"
and a list of lists
> head(result)
[[1]]
min max
x 13.22886 13.54886
y 52.35704 52.67704
[[2]]
min max
x 9.840654 10.16065
y 53.390341 53.71034
[[3]]
min max
x 11.36078 11.72291
y 48.06162 48.24812
How could I create a data.frame or a data.table with the following structure?
name xmin ymin xmax ymax
Berlin 13.22886 52.35704 13.54886 52.67704
Hamburg 9.840654 53.390341 10.16065 53.71034
München 11.36078 48.06162 11.72291 48.24812
...
Here is the data:
stadte_namen<-c("Berlin", "Hamburg", "München", "Köln", "Frankfurt am Main",
"Stuttgart")
result<-list(structure(c(13.2288599, 52.3570365, 13.5488599, 52.6770365
), .Dim = c(2L, 2L), .Dimnames = list(c("x", "y"), c("min", "max"
))), structure(c(9.840654, 53.390341, 10.160654, 53.710341), .Dim = c(2L,
2L), .Dimnames = list(c("x", "y"), c("min", "max"))), structure(c(11.360777,
48.0616244, 11.7229083, 48.2481162), .Dim = c(2L, 2L), .Dimnames = list(
c("x", "y"), c("min", "max"))), structure(c(6.7725303, 50.8304399,
7.162028, 51.0849743), .Dim = c(2L, 2L), .Dimnames = list(c("x",
"y"), c("min", "max"))), structure(c(8.4727933, 50.0155435, 8.8004716,
50.2271408), .Dim = c(2L, 2L), .Dimnames = list(c("x", "y"),
c("min", "max"))), structure(c(9.0386007, 48.6920188, 9.3160228,
48.8663994), .Dim = c(2L, 2L), .Dimnames = list(c("x", "y"),
c("min", "max"))))

You can also try:
l <- result
df <- data.frame(t(sapply(l,c)))
colnames(df) <- c("minX", "minY", "maxX", "maxY"); df
df$stadte_namen <- c("Berlin", "Hamburg", "München", "Köln", "Frankfurt am Main",
"Stuttgart");df
Answer:
minX minY maxX maxY stadte_namen
1 13.228860 52.35704 13.548860 52.67704 Berlin
2 9.840654 53.39034 10.160654 53.71034 Hamburg
3 11.360777 48.06162 11.722908 48.24812 München
4 6.772530 50.83044 7.162028 51.08497 Köln
5 8.472793 50.01554 8.800472 50.22714 Frankfurt am Main
6 9.038601 48.69202 9.316023 48.86640 Stuttgart

With lapply and purrr:
library(dplyr)
library(purrr)
data <- lapply(result, function(x) c(xmin = x[1,1],
xmax = x[1,2],
ymin = x[2,1],
ymax = x[2,2])) %>%
purrr::map_dfr(~.x)
data$stadte_namen <- stadte_namen
# A tibble: 6 x 5
xmin xmax ymin ymax stadte_namen
<dbl> <dbl> <dbl> <dbl> <chr>
1 13.2 13.5 52.4 52.7 Berlin
2 9.84 10.2 53.4 53.7 Hamburg
3 11.4 11.7 48.1 48.2 München
4 6.77 7.16 50.8 51.1 Köln
5 8.47 8.80 50.0 50.2 Frankfurt am Main
6 9.04 9.32 48.7 48.9 Stuttgart

Assign stadte_namen as names to result and bind the dataframe together in one dataframe. You can get the data in wide format using pivot_wider.
library(tidyverse)
map_df(setNames(result, stadte_namen), ~.x %>%
as.data.frame %>%
rownames_to_column('row'), .id = 'name') %>%
pivot_wider(names_from = row, values_from = c(min, max))
# name min_x min_y max_x max_y
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Berlin 13.2 52.4 13.5 52.7
#2 Hamburg 9.84 53.4 10.2 53.7
#3 München 11.4 48.1 11.7 48.2
#4 Köln 6.77 50.8 7.16 51.1
#5 Frankfurt am Main 8.47 50.0 8.80 50.2
#6 Stuttgart 9.04 48.7 9.32 48.9

Converting values to special character in summary

I have created a summary table like below
Name Sales
AS 71.5%
DY 88.4%
VH 44.6%
MY 86.9%
HU 42.3%
TT 67.2%
BG 0.0%
SA 85.3%
now I want to replace the occurrence of 0.0 to "-"
I have tried
tab[,2] <- paste0(tab[,2],"%")
tab[,2] <- replace(tab[,2],tab[,2]<0,"-")
but its converting all values like 8.0 and 7.0 to "-"
do we have any other sollution
the output should be like
Name Sales
AS 71.5%
DY 88.4%
BG -
so the whole function is like this, have three columns of os sales for each person

You can try this:
#Data
df <- structure(list(Name = structure(c(1L, 3L, 8L, 5L, 4L, 7L, 2L,
6L), .Label = c("AS", "BG", "DY", "HU", "MY", "SA", "TT", "VH"
), class = "factor"), Sales = c(71.5, 88.4, 44.6, 86.9, 42.3,
67.2, 0, 85.3)), class = "data.frame", row.names = c(NA, -8L))
#Code
index <- which(df$Sales==0)
df$Sales[index] <- '-'
Name Sales
1 AS 71.5
2 DY 88.4
3 VH 44.6
4 MY 86.9
5 HU 42.3
6 TT 67.2
7 BG -
8 SA 85.3
Update with new data
New data has been provided:
df2 <- structure(list(Name = c("AS", "DY", "VH", "MY", "HU", "TT", "BG",
"SA"), Sales = c("71.5%", "88.4%", "44.6%", "86.9%", "42.3%",
"67.2%", "0.0%", "85.3%")), class = "data.frame", row.names = c(NA,
-8L))
df2$Sales2 <- gsub("0.0%","-",df2$Sales,fixed=T)
Name Sales Sales2
1 AS 71.5% 71.5%
2 DY 88.4% 88.4%
3 VH 44.6% 44.6%
4 MY 86.9% 86.9%
5 HU 42.3% 42.3%
6 TT 67.2% 67.2%
7 BG 0.0% -
8 SA 85.3% 85.3%
Update with variable
Using first data df:
df$tab <- paste0(df$Sales,'%')
df$tab <- ifelse(nchar(df$tab)==2,gsub("0%","-",df$tab,fixed=T),df$tab)
Name Sales tab
1 AS 71.5 71.5%
2 DY 88.4 88.4%
3 VH 44.6 44.6%
4 MY 86.9 86.9%
5 HU 42.3 42.3%
6 TT 67.2 67.2%
7 BG 0.0 -
8 SA 85.3 85.3%

Try this:
tab$Sales <- replace(tab$Sales, which(tab$Sales == 0), "-")
I'd also recommend looking into dplyr's mutate.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Connecting two sets of coordinates to create lines using sf/mapview - r

Related

How to calculate mean of column, then paste mean value as row value in another data frame in R?

Summarise across each column by grouping their names

Merge two dataframes: specifically merge a selection of columns based on two conditions?

R: create a data.frame or data.table from a list and unpacked list of lists

Converting values to special character in summary

Categories

Resources