Connecting two sets of coordinates to create lines using sf/mapview - r

I have a dataset where a bird captured in one location (Blong, Blat) then encountered again in another (Elong, Elat). These coordinates are in a lat/long format, and I'd like to connect the capture and encounter locations with a line and plot them in mapview.
In the data below, each row is an individual bird with its capture/encounter coordinates, and the flyway that it belongs to (which I would like to use to color the lines in mapview.
dat <- structure(list(Blong = c(-75.58333, -76.08333, -81.08333, -94.25,
-75.41667, -99.41667, -77.41667, -116.08333, -89.58333, -77.58333
), Blat = c(37.58333, 40.58333, 42.75, 41.91667, 38.25, 28.25,
38.91667, 43.58333, 44.25, 38.91667), Elong = c(-65.91667, -75.75,
-80.58333, -95.41667, -73.58333, -89.41667, -77.58333, -116.41667,
-96.41667, -77.41667), Elat = c(45.91667, 40.58333, 42.75, 29.75,
45.58333, 48.25, 38.75, 43.58333, 34.08333, 38.91667), Flyway = structure(c(2L,
2L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 2L), .Label = c("Central", "Eastern",
"West"), class = "factor")), .Names = c("Blong", "Blat", "Elong",
"Elat", "Flyway"), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
A look at the data:
# A tibble: 10 x 5
Blong Blat Elong Elat Flyway
<dbl> <dbl> <dbl> <dbl> <fct>
1 -75.6 37.6 -65.9 45.9 Eastern
2 -76.1 40.6 -75.8 40.6 Eastern
3 -81.1 42.8 -80.6 42.8 Eastern
4 -94.2 41.9 -95.4 29.8 Central
5 -75.4 38.2 -73.6 45.6 Eastern
6 -99.4 28.2 -89.4 48.2 Eastern
7 -77.4 38.9 -77.6 38.8 Eastern
8 -116. 43.6 -116. 43.6 West
9 -89.6 44.2 -96.4 34.1 Eastern
10 -77.6 38.9 -77.4 38.9 Eastern
I've looked a few examples, but haven't found one that looks quite like my data set.

The tricky thing is to create a valid LINESTRING object from the coordinate pairs in wide format. sf expects linestring coordinates in rows of a matrix. Here's a way that works. The sfc column of a sf object is a list so here we use lapply to loop over the rows of the data you provided.
library(sf)
library(mapview)
b = dat[, c("Blong", "Blat")]
names(b) = c("long", "lat")
e = dat[, c("Elong", "Elat")]
names(e) = c("long", "lat")
dat$geometry = do.call(
"c",
lapply(seq(nrow(b)), function(i) {
st_sfc(
st_linestring(
as.matrix(
rbind(b[i, ], e[i, ])
)
),
crs = 4326
)
}))
dat_sf = st_as_sf(dat)
mapview(dat_sf, zcol = "Flyway")

Related

How to calculate mean of column, then paste mean value as row value in another data frame in R?

I have 36 data frames that each contain columns titled "lon", "lat", and "bottom_temp". Each different data frame represents data from a year between 1980 and 2015. I have a seperate dataframe called "month3_avg_box" that contains two columns: "year" and "avg_bottom_temp". The year column of the "month3_avg_box" data frame contains one row for each year between 1980-2015. I would like to find the average value of each "bottom_temp" column in each of the 36 data frames I have, and place each mean in the corresponding row of the new "month3_avg_box" data frame I have. I will write a mini example of what I'd like:
1980_df:
lon lat bottom_temp
-75.61 39.1 11.6
-75.60 39.1 11.5
-75.59 39.1 11.6
-75.58 39.1 11.7
(mean of bottom_temp column for 1980_df = 11.6)
1981_df:
lon lat bottom_temp
-75.57 39.1 11.9
-75.56 39.1 11.9
-75.55 39.1 12.0
-75.54 39.1 11.8
(mean of bottom_temp column for 1981_df = 11.9)
1982_df:
lon lat bottom_temp
-75.57 39.1 11.6
-75.56 39.1 11.7
-75.55 39.1 11.9
-75.54 39.1 11.2
(mean of bottom_temp column for 1982_df = 11.6)
Now, I'd like to take these averages and put them into my "month3_avg_box" data frame so it looks like:
month3_avg_box:
Year Avg_bottom_temp
1980 11.6
1981 11.9
1982 11.6
Does this make sense? How can I do this?
We may get the datasets in a list, bind the datasets, create a 'Year' column from the named list, do a group by mean
library(dplyr)
library(stringr)
lst(`1980_df`, `1981_df`, `1982_df`) %>%
bind_rows(.id = 'Year') %>%
group_by(Year = str_remove(Year, '_df')) %>%
summarise(Avg_bottom_temp = mean(bottom_temp))
-output
# A tibble: 3 × 2
Year Avg_bottom_temp
<chr> <dbl>
1 1980 11.6
2 1981 11.9
3 1982 11.6
data
`1980_df` <- structure(list(lon = c(-75.61, -75.6, -75.59, -75.58), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.6, 11.5, 11.6, 11.7)), class = "data.frame", row.names = c(NA,
-4L))
`1981_df` <- structure(list(lon = c(-75.57, -75.56, -75.55, -75.54), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.9, 11.9, 12, 11.8)), class = "data.frame", row.names = c(NA,
-4L))
`1982_df` <- structure(list(lon = c(-75.57, -75.56, -75.55, -75.54), lat = c(39.1,
39.1, 39.1, 39.1), bottom_temp = c(11.6, 11.7, 11.9, 11.2)), class = "data.frame", row.names = c(NA,
-4L))

Summarise across each column by grouping their names

I want to calculate the weighted variance using the weights provided in the dataset, while group for the countries and cities, however the function returns NAs:
library(Hmisc) #for the 'wtd.var' function
weather_winter.std<-weather_winter %>%
group_by(country, capital_city) %>%
summarise(across(starts_with("winter"),wtd.var))
The provided output from the console (when in long format):
# A tibble: 35 x 3
# Groups: country [35]
country capital_city winter
<chr> <chr> <dbl>
1 ALBANIA Tirane NA
2 AUSTRIA Vienna NA
3 BELGIUM Brussels NA
4 BULGARIA Sofia NA
5 CROATIA Zagreb NA
6 CYPRUS Nicosia NA
7 CZECHIA Prague NA
8 DENMARK Copenhagen NA
9 ESTONIA Tallinn NA
10 FINLAND Helsinki NA
# … with 25 more rows
This is the code that I used to get the data from a wide format into a long format:
weather_winter <- weather_winter %>% pivot_longer(-c(31:33))
weather_winter$name <- NULL
names(weather_winter)[4] <- "winter"
Some example data:
structure(list(`dec-wet_2011` = c(12.6199998855591, 12.6099996566772,
14.75, 11.6899995803833, 18.2899990081787), `dec-wet_2012` = c(13.6300001144409,
14.2199993133545, 14.2299995422363, 16.1000003814697, 18.0299987792969
), `dec-wet_2013` = c(4.67999982833862, 5.17000007629395, 4.86999988555908,
7.56999969482422, 5.96000003814697), `dec-wet_2014` = c(14.2999992370605,
14.4799995422363, 13.9799995422363, 15.1499996185303, 16.1599998474121
), `dec-wet_2015` = c(0.429999977350235, 0.329999983310699, 1.92999994754791,
3.30999994277954, 7.42999982833862), `dec-wet_2016` = c(1.75,
1.29999995231628, 3.25999999046326, 6.60999965667725, 8.67999935150146
), `dec-wet_2017` = c(13.3400001525879, 13.3499994277954, 15.960000038147,
10.6599998474121, 14.4699993133545), `dec-wet_2018` = c(12.210000038147,
12.4399995803833, 11.1799993515015, 10.75, 18.6299991607666),
`dec-wet_2019` = c(12.7199993133545, 13.3800001144409, 13.9899997711182,
10.5299997329712, 12.3099994659424), `dec-wet_2020` = c(15.539999961853,
16.5200004577637, 11.1799993515015, 14.7299995422363, 13.5499992370605
), `jan-wet_2011` = c(8.01999950408936, 7.83999967575073,
10.2199993133545, 13.8899993896484, 14.5299997329712), `jan-wet_2012` = c(11.5999994277954,
11.1300001144409, 12.5500001907349, 10.1700000762939, 22.6199989318848
), `jan-wet_2013` = c(17.5, 17.4099998474121, 15.5599994659424,
13.3199996948242, 20.9099998474121), `jan-wet_2014` = c(12.5099992752075,
12.2299995422363, 15.210000038147, 9.73999977111816, 9.63000011444092
), `jan-wet_2015` = c(17.6900005340576, 16.9799995422363,
11.75, 9.9399995803833, 19), `jan-wet_2016` = c(15.6099996566772,
15.5, 14.5099992752075, 10.3899993896484, 18.4499988555908
), `jan-wet_2017` = c(9.17000007629395, 9.61999988555908,
9.30999946594238, 15.8499994277954, 11.210000038147), `jan-wet_2018` = c(8.55999946594238,
9.10999965667725, 13.2599992752075, 9.85999965667725, 15.8899993896484
), `jan-wet_2019` = c(17.0699996948242, 16.8699989318848,
14.5699996948242, 19.0100002288818, 19.4699993133545), `jan-wet_2020` = c(6.75999975204468,
6.25999975204468, 6.00999975204468, 5.35999965667725, 8.15999984741211
), `feb-wet_2011` = c(9.1899995803833, 8.63999938964844,
6.21999979019165, 9.82999992370605, 4.67999982833862), `feb-wet_2012` = c(12.2699995040894,
11.6899995803833, 8.27999973297119, 14.9399995803833, 13.0499992370605
), `feb-wet_2013` = c(15.3599996566772, 15.9099998474121,
17.0599994659424, 13.3599996566772, 16.75), `feb-wet_2014` = c(10.1999998092651,
11.1399993896484, 13.8599996566772, 10.7399997711182, 7.35999965667725
), `feb-wet_2015` = c(11.9200000762939, 12.2699995040894,
8.01000022888184, 14.5299997329712, 5.71999979019165), `feb-wet_2016` = c(14.6999998092651,
14.7799997329712, 16.7899990081787, 4.90000009536743, 19.3500003814697
), `feb-wet_2017` = c(8.98999977111816, 9.17999935150146,
11.7699995040894, 6.3899998664856, 13.9899997711182), `feb-wet_2018` = c(16.75,
16.8599987030029, 12.0599994659424, 16.1900005340576, 8.51000022888184
), `feb-wet_2019` = c(7.58999967575073, 7.26999998092651,
8.21000003814697, 7.57999992370605, 8.81999969482422), `feb-wet_2020` = c(10.6399993896484,
10.4399995803833, 13.4399995803833, 8.53999996185303, 19.939998626709
), country = c("SERBIA", "SERBIA", "SLOVENIA", "GREECE",
"CZECHIA"), capital_city = c("Belgrade", "Belgrade", "Ljubljana",
"Athens", "Prague"), weight = c(20.25, 19.75, 14.25, 23.75,
14.25)), row.names = c(76L, 75L, 83L, 16L, 5L), class = "data.frame")
Your code seems to provide the right answer, now there's more data:
# Groups: country [4]
country capital_city winter
<chr> <chr> <dbl>
1 CZECHIA Prague 27.2
2 GREECE Athens 14.6
3 SERBIA Belgrade 19.1
4 SLOVENIA Ljubljana 16.3
Is this what you were looking for?
I took the liberty of streamlining your code:
weather_winter <- weather_winter %>%
pivot_longer(-c(31:33), values_to = "winter") %>%
select(-name)
weather_winter.std <- weather_winter %>%
group_by(country, capital_city) %>%
summarise(winter = wtd.var(winter))
With only one "winter" column, there's no need for the across().
Finally, you are not using the weights. If these are needed, then change the last line to:
summarise(winter = wtd.var(winter, weights = weight))
To give:
# A tibble: 4 x 3
# Groups: country [4]
country capital_city winter
<chr> <chr> <dbl>
1 CZECHIA Prague 26.3
2 GREECE Athens 14.2
3 SERBIA Belgrade 18.8
4 SLOVENIA Ljubljana 15.8

Merge two dataframes: specifically merge a selection of columns based on two conditions?

I have two datasets on the same 2 patients. With the second dataset I want to add new information to the first, but I can't seem to get the code right.
My first (incomplete) dataset has a patient ID, measurement time (either T0 or FU1), year of birth, date of the CT scan, and two outcomes (legs_mass and total_mass):
library(tidyverse)
library(dplyr)
library(magrittr)
library(lubridate)
df1 <- structure(list(ID = c(115, 115, 370, 370), time = structure(c(1L,
6L, 1L, 6L), .Label = c("T0", "T1M0", "T1M6", "T1M12", "T2M0",
"FU1"), class = "factor"), year_of_birth = c(1970, 1970, 1961,
1961), date_ct = structure(c(16651, 17842, 16651, 18535), class = "Date"),
legs_mass = c(9.1, NA, NA, NA), total_mass = c(14.5, NA,
NA, NA)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
# Which gives the following dataframe
df1
# A tibble: 4 x 6
ID time year_of_birth date_ct legs_mass total_mass
<dbl> <fct> <dbl> <date> <dbl> <dbl>
1 115 T0 1970 2015-08-04 9.1 14.5
2 115 FU1 1970 2018-11-07 NA NA
3 370 T0 1961 2015-08-04 NA NA
4 370 FU1 1961 2020-09-30 NA NA
The second dataset adds to the legs_mass and total_mass columns:
df2 <- structure(list(ID = c(115, 370), date_ct = structure(c(17842,
18535), class = "Date"), ctscan_label = c("PXE115_CT_20181107_xxxxx-3.tif",
"PXE370_CT_20200930_xxxxx-403.tif"), legs_mass = c(956.1, 21.3
), total_mass = c(1015.9, 21.3)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
# Which gives the following dataframe:
df2
# A tibble: 2 x 5
ID date_ct ctscan_label legs_mass total_mass
<dbl> <date> <chr> <dbl> <dbl>
1 115 2018-11-07 PXE115_CT_20181107_xxxxx-3.tif 956. 1016.
2 370 2020-09-30 PXE370_CT_20200930_xxxxx-403.tif 21.3 21.3
What I am trying to do, is...
Add the legs_mass and total_mass column values from df2 to df1, based on ID number and date_ct.
Add the new columns of df2 (the one that is not in df1; ctscan_label) to df1, also based on the date of the ct and patient ID.
So that the final dataset df3 looks as follows:
df3 <- structure(list(ID = c(115, 115, 370, 370), time = structure(c(1L,
6L, 1L, 6L), .Label = c("T0", "T1M0", "T1M6", "T1M12", "T2M0",
"FU1"), class = "factor"), year_of_birth = c(1970, 1970, 1961,
1961), date_ct = structure(c(16651, 17842, 16651, 18535), class = "Date"),
legs_mass = c(9.1, 956.1, NA, 21.3), total_mass = c(14.5,
1015.9, NA, 21.3)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
# Corresponding to the following tibble:
# A tibble: 4 x 6
ID time year_of_birth date_ct legs_mass total_mass
<dbl> <fct> <dbl> <date> <dbl> <dbl>
1 115 T0 1970 2015-08-04 9.1 14.5
2 115 FU1 1970 2018-11-07 956. 1016.
3 370 T0 1961 2015-08-04 NA NA
4 370 FU1 1961 2020-09-30 21.3 21.3
I have tried the merge function and rbind from baseR, and bind_rows from dplyr but can't seem to get it right.
Any help?
You can join the two datasets and use coalesce to keep one non-NA value from the two datasets.
library(dplyr)
left_join(df1, df2, by = c("ID", "date_ct")) %>%
mutate(leg_mass = coalesce(legs_mass.x , legs_mass.y),
total_mass = coalesce(total_mass.x, total_mass.y)) %>%
select(-matches('\\.x|\\.y'), -ctscan_label)
# ID time year_of_birth date_ct leg_mass total_mass
# <dbl> <fct> <dbl> <date> <dbl> <dbl>
#1 115 T0 1970 2015-08-04 9.1 14.5
#2 115 FU1 1970 2018-11-07 956. 1016.
#3 370 T0 1961 2015-08-04 NA NA
#4 370 FU1 1961 2020-09-30 21.3 21.3
We can use data.table methods
library(data.table)
setDT(df1)[setDT(df2), c("legs_mass", "total_mass") :=
.(fcoalesce(legs_mass, i.legs_mass),
fcoalesce(total_mass, i.total_mass)), on = .(ID, date_ct)]
-output
df1
ID time year_of_birth date_ct legs_mass total_mass
1: 115 T0 1970 2015-08-04 9.1 14.5
2: 115 FU1 1970 2018-11-07 956.1 1015.9
3: 370 T0 1961 2015-08-04 NA NA
4: 370 FU1 1961 2020-09-30 21.3 21.3

R: create a data.frame or data.table from a list and unpacked list of lists

I have a list and a list of lists and would like to create a data.frame or data.table.
Here is the list:
head(stadte_namen)
[1] "Berlin" "Hamburg" "München"
and a list of lists
> head(result)
[[1]]
min max
x 13.22886 13.54886
y 52.35704 52.67704
[[2]]
min max
x 9.840654 10.16065
y 53.390341 53.71034
[[3]]
min max
x 11.36078 11.72291
y 48.06162 48.24812
How could I create a data.frame or a data.table with the following structure?
name xmin ymin xmax ymax
Berlin 13.22886 52.35704 13.54886 52.67704
Hamburg 9.840654 53.390341 10.16065 53.71034
München 11.36078 48.06162 11.72291 48.24812
...
Here is the data:
stadte_namen<-c("Berlin", "Hamburg", "München", "Köln", "Frankfurt am Main",
"Stuttgart")
result<-list(structure(c(13.2288599, 52.3570365, 13.5488599, 52.6770365
), .Dim = c(2L, 2L), .Dimnames = list(c("x", "y"), c("min", "max"
))), structure(c(9.840654, 53.390341, 10.160654, 53.710341), .Dim = c(2L,
2L), .Dimnames = list(c("x", "y"), c("min", "max"))), structure(c(11.360777,
48.0616244, 11.7229083, 48.2481162), .Dim = c(2L, 2L), .Dimnames = list(
c("x", "y"), c("min", "max"))), structure(c(6.7725303, 50.8304399,
7.162028, 51.0849743), .Dim = c(2L, 2L), .Dimnames = list(c("x",
"y"), c("min", "max"))), structure(c(8.4727933, 50.0155435, 8.8004716,
50.2271408), .Dim = c(2L, 2L), .Dimnames = list(c("x", "y"),
c("min", "max"))), structure(c(9.0386007, 48.6920188, 9.3160228,
48.8663994), .Dim = c(2L, 2L), .Dimnames = list(c("x", "y"),
c("min", "max"))))
You can also try:
l <- result
df <- data.frame(t(sapply(l,c)))
colnames(df) <- c("minX", "minY", "maxX", "maxY"); df
df$stadte_namen <- c("Berlin", "Hamburg", "München", "Köln", "Frankfurt am Main",
"Stuttgart");df
Answer:
minX minY maxX maxY stadte_namen
1 13.228860 52.35704 13.548860 52.67704 Berlin
2 9.840654 53.39034 10.160654 53.71034 Hamburg
3 11.360777 48.06162 11.722908 48.24812 München
4 6.772530 50.83044 7.162028 51.08497 Köln
5 8.472793 50.01554 8.800472 50.22714 Frankfurt am Main
6 9.038601 48.69202 9.316023 48.86640 Stuttgart
With lapply and purrr:
library(dplyr)
library(purrr)
data <- lapply(result, function(x) c(xmin = x[1,1],
xmax = x[1,2],
ymin = x[2,1],
ymax = x[2,2])) %>%
purrr::map_dfr(~.x)
data$stadte_namen <- stadte_namen
# A tibble: 6 x 5
xmin xmax ymin ymax stadte_namen
<dbl> <dbl> <dbl> <dbl> <chr>
1 13.2 13.5 52.4 52.7 Berlin
2 9.84 10.2 53.4 53.7 Hamburg
3 11.4 11.7 48.1 48.2 München
4 6.77 7.16 50.8 51.1 Köln
5 8.47 8.80 50.0 50.2 Frankfurt am Main
6 9.04 9.32 48.7 48.9 Stuttgart
Assign stadte_namen as names to result and bind the dataframe together in one dataframe. You can get the data in wide format using pivot_wider.
library(tidyverse)
map_df(setNames(result, stadte_namen), ~.x %>%
as.data.frame %>%
rownames_to_column('row'), .id = 'name') %>%
pivot_wider(names_from = row, values_from = c(min, max))
# name min_x min_y max_x max_y
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Berlin 13.2 52.4 13.5 52.7
#2 Hamburg 9.84 53.4 10.2 53.7
#3 München 11.4 48.1 11.7 48.2
#4 Köln 6.77 50.8 7.16 51.1
#5 Frankfurt am Main 8.47 50.0 8.80 50.2
#6 Stuttgart 9.04 48.7 9.32 48.9

Converting values to special character in summary

I have created a summary table like below
Name Sales
AS 71.5%
DY 88.4%
VH 44.6%
MY 86.9%
HU 42.3%
TT 67.2%
BG 0.0%
SA 85.3%
now I want to replace the occurrence of 0.0 to "-"
I have tried
tab[,2] <- paste0(tab[,2],"%")
tab[,2] <- replace(tab[,2],tab[,2]<0,"-")
but its converting all values like 8.0 and 7.0 to "-"
do we have any other sollution
the output should be like
Name Sales
AS 71.5%
DY 88.4%
BG -
so the whole function is like this, have three columns of os sales for each person
You can try this:
#Data
df <- structure(list(Name = structure(c(1L, 3L, 8L, 5L, 4L, 7L, 2L,
6L), .Label = c("AS", "BG", "DY", "HU", "MY", "SA", "TT", "VH"
), class = "factor"), Sales = c(71.5, 88.4, 44.6, 86.9, 42.3,
67.2, 0, 85.3)), class = "data.frame", row.names = c(NA, -8L))
#Code
index <- which(df$Sales==0)
df$Sales[index] <- '-'
Name Sales
1 AS 71.5
2 DY 88.4
3 VH 44.6
4 MY 86.9
5 HU 42.3
6 TT 67.2
7 BG -
8 SA 85.3
Update with new data
New data has been provided:
df2 <- structure(list(Name = c("AS", "DY", "VH", "MY", "HU", "TT", "BG",
"SA"), Sales = c("71.5%", "88.4%", "44.6%", "86.9%", "42.3%",
"67.2%", "0.0%", "85.3%")), class = "data.frame", row.names = c(NA,
-8L))
df2$Sales2 <- gsub("0.0%","-",df2$Sales,fixed=T)
Name Sales Sales2
1 AS 71.5% 71.5%
2 DY 88.4% 88.4%
3 VH 44.6% 44.6%
4 MY 86.9% 86.9%
5 HU 42.3% 42.3%
6 TT 67.2% 67.2%
7 BG 0.0% -
8 SA 85.3% 85.3%
Update with variable
Using first data df:
df$tab <- paste0(df$Sales,'%')
df$tab <- ifelse(nchar(df$tab)==2,gsub("0%","-",df$tab,fixed=T),df$tab)
Name Sales tab
1 AS 71.5 71.5%
2 DY 88.4 88.4%
3 VH 44.6 44.6%
4 MY 86.9 86.9%
5 HU 42.3 42.3%
6 TT 67.2 67.2%
7 BG 0.0 -
8 SA 85.3 85.3%
Try this:
tab$Sales <- replace(tab$Sales, which(tab$Sales == 0), "-")
I'd also recommend looking into dplyr's mutate.

Resources