Convert to coordinates over a list of data frames - r

I am trying to convert point values to coordinates using the sp package to perform operations similar to this question. I have a list of data frames (hundreds in the full data set, 2 short ones here).
> dput(df)
list(structure(list(group = c(22, 43, 43, 36, 9, 20, 35, 18,
32, 2), mean_x_m = c(-2578373.61904762, -2082265, -1853701.875,
-2615961.89189189, -1538829.07815509, -1753235.6200847, -1690679.5,
-1694763.64583333, -1700343.15217391, -1416060), mean_y_m = c(3242738.76190476,
2563892.5, 1945883.125, 3130074.86486486, 1373724.65001039, 1468737.97186933,
2123413.5, 1442167.01388889, 2144261.73913043, 1352573.33333333
)), .Names = c("group", "mean_x_m", "mean_y_m"), row.names = c(72L,
140L, 142L, 121L, 27L, 66L, 114L, 60L, 105L, 5L), class = "data.frame"),
structure(list(group = c(12, 12, 47, 30, 39, 34, 47, 22,
10, 1), mean_x_m = c(-1830635.68663753, -2891058.33333333,
-1637448.59886202, -1974773.67400716, -1571853.24324324,
-2723090.33333333, -2704594.92760618, -2240863.49122807,
-1940748.88253242, -2176724.69924812), mean_y_m = c(2324222.49926225,
3261997.5, 2057096.55049787, 2411733.29933653, 1447883.78378379,
3406879.26666667, 3291053.77606178, 2788255.49473684, 2176919.6882151,
2920168.77443609)), .Names = c("group", "mean_x_m", "mean_y_m"
), row.names = c(67L, 68L, 243L, 155L, 202L, 173L, 244L,
114L, 61L, 3L), class = "data.frame"))
I can pull one data frame out at a time and convert to a SpatialPointsDataFrame without issue.
df1 = df[[1]]
coordinates(df1) = ~mean_x_m+mean_y_m
My problem is I can't get this to iterate over the entire list using a function, or even get the function to work for a single dataframe.
c = function(f){coordinates(f) = ~mean_x_m+mean_y_m}
df2 = c(df1)
c(df1)
df3 = lapply(df,c)
Would a for loop work better? I'm still learning about working with lists of data frames and matrices so any help on apply or for in this context would be appreciated. Thank you.

This is how you can use lapply:
fc <- function(f){coordinates(f) = ~mean_x_m + mean_y_m; f}
lapply(df, fc)
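The problem was that your function did not return the modified object. Its last expression is an assignment, whose value is the formula on the right-hand side, so f has to be returned explicitly. It is also safer not to name the function c, because that masks base R's c().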
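A minimal base-R sketch of that behaviour, using `names<-` as a stand-in for `coordinates<-` (the helper names `rename_bad` and `rename_good` are hypothetical). A function that ends with a replacement call returns the right-hand side of the assignment rather than the modified object:

```r
# Toy data frame; the column names are arbitrary.
d <- data.frame(u = 1:2, v = 3:4)

# Ends with a replacement call: the function's value is the right-hand side.
rename_bad <- function(f) { names(f) <- c("a", "b") }

# Returns the modified copy explicitly.
rename_good <- function(f) { names(f) <- c("a", "b"); f }

bad <- rename_bad(d)
good <- rename_good(d)

class(bad)    # a character vector, not a data frame
class(good)   # a data frame with the new names
names(d)      # unchanged: each function modified its own copy
```

Because functions work on copies, the result of lapply() also has to be assigned to a name if the converted list is to be kept.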
To make a single object:
x <- lapply(1:length(df), function(i) cbind(id=i, df[[i]]))
x <- do.call(rbind, x)
coordinates(x) <- ~mean_x_m+mean_y_m

If your dataframes have a consistent structure, it would be better to put them all into one dataframe.
library(dplyr)
library(sp)
result =
df %>%
bind_rows(.id = "list_number") %>%
as.data.frame %>%
`coordinates<-`(~mean_x_m+mean_y_m)

If you are working with geographic data, I find it easiest to store it in the SpatialPoints and SpatialPointsDataFrame classes. To convert all elements of a list of data frames that share the same column headings, you could adapt this code:
library(sp)
# toy dataset X
X<-list(
x1 = data.frame(group =c("a","b","c"), X = c(-110.1,-110.2,-110), Y = c(44,44.2,44.3)),
x2 = data.frame(group =c("a","b","c"), X = c(-110.1,-110.2,-110), Y = c(44,44.2,44.3)))
# write a function based on the structure of your dfs
spdf_fxn<-function(df){
SpatialPointsDataFrame(coords= cbind(df$X,df$Y), data= data.frame(group = df$group),
proj4string=CRS("+proj=longlat +datum=WGS84"))
}
#apply this function over the list
Out_List<-lapply(X,spdf_fxn)
Write a function that converts the generic data frame structure to a SpatialPointsDataFrame, with group as the data attached to each point, then apply that function to the list. Note that you will have to adapt the column names and use the appropriate proj4string (in this example it is longitude and latitude in WGS 84).

Related

How to compute distance.matrix for the spatialRF::rf_spatial function

I am using the package spatialRF in R to perform a regression task. In the example provided with the package, they have precomputed the distance.matrix and they use it in the function spatialRF::rf. Here is an example:
library(spatialRF)
#loading training data
data(block.data)
#names of the response variable and the predictors
dependent.variable.name <- "ntl"
predictor.variable.names <- colnames(block.data)[2:4]
#coordinates of the cases
xy <- block.data[, c("x", "y")]
#distance matrix
distance.matrix <- dist(subset(block.data, select = -c(x, y)))
#random seed for reproducibility
random.seed <- 1
model.non.spatial <- spatialRF::rf(
data = block.data,
dependent.variable.name = dependent.variable.name,
predictor.variable.names = predictor.variable.names,
distance.matrix = distance.matrix,
distance.thresholds = 0,
xy = xy,
seed = random.seed,
verbose = FALSE)
When running the spatialRF::rf function I am getting this error: Error in `diag<-`(`*tmp*`, value = NA) : only matrix diagonals can be replaced
My dataset:
block.data = structure(list(ntl = c(11.4058170318604, 13.7000455856323, 16.0420398712158,
17.4475727081299, 26.263370513916, 30.658130645752, 19.8927211761475,
20.917688369751, 23.7149887084961, 25.2641334533691), pop = c(111.031448364258,
145.096557617188, 166.351989746094, 193.804962158203, 331.787200927734,
382.979248046875, 237.971466064453, 276.575958251953, 334.015289306641,
345.376617431641), tirs = c(35.392936706543, 34.4172630310059,
33.7765464782715, 35.3224639892578, 40.4262886047363, 39.6619148254395,
38.6306076049805, 36.752326965332, 37.2010040283203, 36.1100578308105
), agbh = c(1.15364360809326, 0.177780777215958, 0.580717206001282,
0.647109687328339, 3.84336423873901, 5.6310133934021, 2.10894227027893,
3.9533429145813, 2.7016019821167, 4.36041164398193), lc = c(40L,
40L, 40L, 126L, 50L, 50L, 50L, 50L, 40L, 50L)), class = "data.frame", row.names = c(NA,
-10L))
For reference, in the example I linked, the distance matrix and the dataset the authors use are the same.
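One possible cause, offered as an assumption rather than a confirmed diagnosis: dist() returns an object of class "dist", not a matrix, and `diag<-` only accepts matrices. The base-R sketch below uses made-up coordinates (`xy_demo` is hypothetical, since the posted data has no x and y columns) to show the difference and the as.matrix() conversion:

```r
# Hypothetical coordinates, for illustration only.
xy_demo <- data.frame(x = c(0, 3, 6), y = c(0, 4, 8))

d <- dist(xy_demo)      # compact lower-triangle object of class "dist"
is.matrix(d)            # FALSE

# Replacing the diagonal of a "dist" object fails; capture the message.
err <- tryCatch({ diag(d) <- NA; NULL },
                error = function(e) conditionMessage(e))
err

m <- as.matrix(d)       # full symmetric 3 x 3 matrix
is.matrix(m)            # TRUE

# Setting the diagonal works on the matrix form.
diag(m) <- NA
m
```

Whether spatialRF::rf then accepts the result also depends on the distances being computed from the case coordinates, which this sketch does not verify.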

R: replace column values with string

I have a set of BAM files within the chr16_bam directory and a sgseq_sam.txt file.
I want to replace the file_bam column values with the full path where the BAM files are stored.
My code hasn't been able to achieve that.
bamPath = "C:/Users/User/Downloads/chr16_bam/"
samFile <- read.delim("C:/Users/User/Downloads/sgseq_sam.txt", header=T)
for (i in samFile[,2]) {
p <- gsub(i, bamPath, samFile)
}
> dput(samFile)
structure(list(sample_name = c("N60", "N11", "T132", "T114"),
file_bam = c("60.bam", "11.bam", "132.bam", "114.bam"), paired_end = c(TRUE,
TRUE, TRUE, TRUE), read_length = c(75L, 75L, 75L, 75L), frag_length = c(1075L,
1466L, 946L, 1154L), lib_size = c(2589976L, 5153522L, 4429912L,
3131400L)), class = "data.frame", row.names = c(NA, -4L))
library(tidyverse)
samFile <- samFile %>%
mutate(file_bam = paste0(bamPath, file_bam))
Or, alternately, in base R:
samFile$file_bam <- paste0(bamPath, samFile$file_bam)
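As a variation, and only a sketch: file.path() combined with basename() builds the same paths while staying safe to re-run, because any directory already attached to file_bam is dropped first. The trailing slash is stripped from bamPath so the separator is not doubled; `sam_demo` is a hypothetical two-row stand-in for samFile.

```r
bamPath <- "C:/Users/User/Downloads/chr16_bam/"

# Hypothetical stand-in for samFile with just two rows.
sam_demo <- data.frame(sample_name = c("N60", "N11"),
                       file_bam = c("60.bam", "11.bam"),
                       stringsAsFactors = FALSE)

bam_dir <- sub("/+$", "", bamPath)   # drop trailing slash(es)

# basename() strips any existing directory part, so running this
# twice gives the same result as running it once.
sam_demo$file_bam <- file.path(bam_dir, basename(sam_demo$file_bam))
sam_demo$file_bam <- file.path(bam_dir, basename(sam_demo$file_bam))

sam_demo$file_bam
```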

Grouped barplots in R using csv

I have a 3 column csv file like this
x,y1,y2
100,50,10
200,10,20
300,15,5
I want to make a barplot in R with the first column's values on the x axis and the second and third columns' values as grouped bars for the corresponding x. I hope that is clear. Can someone please help me with this? My data is huge, so I have to import the csv file and can't enter all the data by hand. I found relevant posts, but none addressed exactly this.
Thank you
Use the following code:
library(tidyverse)
df %>% pivot_longer(names_to = "y", values_to = "value", -x) %>%
ggplot(aes(x,value, fill=y))+geom_col(position = "dodge")
Data
df = structure(list(x = c(100L, 200L, 300L), y1 = c(50L, 10L, 15L),
y2 = c(10L, 20L, 5L)), class = "data.frame", row.names = c(NA,
-3L))
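Since the question starts from a CSV file, a base-R sketch of the same plot may also help. It assumes the file has the header shown in the question; the `text =` argument stands in for a real file path, such as a hypothetical read.csv("data.csv").

```r
# Inline CSV for illustration; replace text = ... with your file path.
dat <- read.csv(text = "x,y1,y2
100,50,10
200,10,20
300,15,5")

# barplot() wants one row per series and one column per x value.
heights <- t(as.matrix(dat[, c("y1", "y2")]))
colnames(heights) <- dat$x

# beside = TRUE draws the series side by side instead of stacked.
barplot(heights, beside = TRUE, legend.text = rownames(heights),
        xlab = "x", ylab = "value")
```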

Optimizing geom_point on top of expanded geom_raster background

I'm trying to visualize how a neural network separates simple two-dimensional points into 2 classes. I use geom_point to denote the training points and geom_raster to denote how the neural network separates the 2D space. Here are the functions and some of the data points plotted.
library(tidyverse)
library(neuralnet)
data2 <- structure(list(X1 = c(152, 178, 19, 101, 145, 184), x = c(32.4083268723916,
84.5016641449183, 114.483315175202, 51.914560098842, 79.6402378017537,
82.6861507166177), y = c(18.339864264708, 83.42093185056, 63.2843023451388,
55.7215069333086, 42.6517407153766, 86.5805756277405), label = structure(c(2L,
1L, 1L, 2L, 2L, 1L), .Label = c("1", "2"), class = "factor")), row.names = c(152L,
178L, 19L, 101L, 145L, 184L), class = "data.frame")
nn.model <- neuralnet(label~x+y, data2, hidden=4, linear.output=FALSE)
background <- expand_grid(x=seq(-40,120,0.1), y=seq(0,100,0.1))
background$label <- predict(nn.model, background) %>% apply(1, which.max)
ggplot()+geom_raster(data=background, aes(x, y, fill=label))+geom_point(data=data2, aes(x, y, color=label))+scale_color_manual(values=c("white","red"))
In the original dataset, the points lie in x range (-40, 120) and y range (0, 100); therefore the background expands accordingly. This approach, of course, takes some time because R will need to have the neural network predict some 1600 x 1000 points and then render them on the geom_raster layer.
My question: is there a way to optimize this or do it another way in ggplot (or in another package, if this problem is solved well there), since rastering the whole background with geom_raster is a brute-force approach?
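One option, offered as a sketch rather than a definitive answer: a coarser background grid cuts the number of predictions sharply, and geom_raster() simply draws larger tiles. The arithmetic below only counts grid points for the 0.1 step used in the question versus a hypothetical step of 1 (`grid_size` is an illustrative helper); whether the coarser class boundary looks acceptable is a judgment call.

```r
# Number of grid points for a given step over the ranges in the question.
grid_size <- function(step) {
  length(seq(-40, 120, by = step)) * length(seq(0, 100, by = step))
}

n_fine <- grid_size(0.1)   # step used in the question
n_coarse <- grid_size(1)   # hypothetical coarser step

n_fine
n_coarse
n_fine / n_coarse          # roughly a 100-fold reduction
```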

Multiple markers on same coordinate

When plotting markers on an interactive world map with the R package leaflet, data points with exactly the same coordinates overlap each other.
See the example below:
library(leaflet)
Data <- structure(list(Name = structure(1:3, .Label = c("M1", "M2", "M3"), class = "factor"), Latitude = c(52L, 52L, 51L), Longitude = c(50L, 50L, 50L), Altitude = c(97L, 97L, 108L)), .Names = c("Name", "Latitude", "Longitude", "Altitude"), class = "data.frame", row.names = c(NA, -3L))
leaflet(data = Data) %>%
addProviderTiles("Esri.WorldImagery", options = providerTileOptions(noWrap = TRUE)) %>%
addMarkers(~Longitude, ~Latitude, popup = ~as.character(paste(sep = "",
"<b>",Name,"</b>","<br/>", "Altitude: ",Altitude)))
It is possible to show all coordinates with the cluster option, but this is far from my goal. I don't want clusters, and the overlapping markers are only shown when fully zoomed in. When fully zoomed in, the background map turns grey ("Map data not yet available"). The spider view of the overlapping markers is what I want, but not when fully zoomed in.
See example below:
leaflet(data = Data) %>%
addProviderTiles("Esri.WorldImagery", options = providerTileOptions(noWrap = TRUE)) %>%
addMarkers(~Longitude, ~Latitude, popup = ~as.character(paste(sep = "",
"<b>",Name,"</b>","<br/>", "Altitude: ",Altitude)), clusterOptions = markerClusterOptions())
I found some material about the solution I want, but I don't know how to implement it with the R leaflet package.
https://github.com/jawj/OverlappingMarkerSpiderfier-Leaflet
If there are other approaches to handling overlapping markers, feel free to answer (for example, showing the info for multiple markers in one popup).
You could jitter() your coordinates slightly:
library(mapview)
library(sp)
Data <- structure(list(Name = structure(1:3, .Label = c("M1", "M2", "M3"),
class = "factor"),
Latitude = c(52L, 52L, 51L),
Longitude = c(50L, 50L, 50L),
Altitude = c(97L, 97L, 108L)),
.Names = c("Name", "Latitude", "Longitude", "Altitude"),
class = "data.frame", row.names = c(NA, -3L))
Data$lat <- jitter(Data$Latitude, factor = 0.0001)
Data$lon <- jitter(Data$Longitude, factor = 0.0001)
coordinates(Data) <- ~ lon + lat
proj4string(Data) <- "+init=epsg:4326"
mapview(Data)
This way you still need to zoom in for the markers to separate. How far you need to zoom in depends on the factor argument of jitter().
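To get a feel for how small the displacement is, here is a base-R sketch that jitters the same coordinates and reports the largest shift. The seed is arbitrary and only there for reproducibility.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

lat <- c(52, 52, 51)
lon <- c(50, 50, 50)

lat_j <- jitter(lat, factor = 0.0001)
lon_j <- jitter(lon, factor = 0.0001)

# Largest displacement, in degrees, for each coordinate.
max(abs(lat_j - lat))
max(abs(lon_j - lon))

# The two formerly identical points no longer coincide.
separated <- lat_j[1] != lat_j[2] || lon_j[1] != lon_j[2]
separated
```

Raising factor spreads the markers further apart, at the cost of positional accuracy.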
Note that I am using library(mapview) in the example for simplicity.
Following up on my comment, here's a somewhat more modern solution (circa 2020) that takes advantage of some newer packages designed to make our lives easier (tidyverse and sf). I use sf::st_jitter as well as mapview, as @TimSalabim does. Finally, I chose a slightly larger jitter factor so you wouldn't have to zoom in quite so far to see the effect:
library(tidyverse) # provides tibble() and %>%
library(mapview)
library(sf)
Data <- tibble(Name = c("M1", "M2", "M3"),
Latitude = c(52L, 52L, 51L),
Longitude = c(50L, 50L, 50L),
Altitude = c(97L, 97L, 108L))
Data %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326) %>%
st_jitter(factor = 0.001) %>%
mapview
