How to compute distance.matrix for the spatialRF::rf_spatial function - r

I am using the package spatialRF in R to perform a regression task. In the example provided by the package, the authors have precomputed the distance.matrix and pass it to the function spatialRF::rf. Here is an example:
library(spatialRF)
#loading training data
data(block.data)
#names of the response variable and the predictors
dependent.variable.name <- "ntl"
predictor.variable.names <- colnames(block.data)[2:4]
#coordinates of the cases
xy <- block.data[, c("x", "y")]
#distance matrix
distance.matrix <- dist(subset(block.data, select = -c(x, y)))
#random seed for reproducibility
random.seed <- 1
model.non.spatial <- spatialRF::rf(
data = block.data,
dependent.variable.name = dependent.variable.name,
predictor.variable.names = predictor.variable.names,
distance.matrix = distance.matrix,
distance.thresholds = 0,
xy = xy,
seed = random.seed,
verbose = FALSE)
When running the spatialRF::rf function I am getting this error: Error in diag<-(tmp, value = NA): only matrix diagonals can be replaced
My dataset:
block.data = structure(list(ntl = c(11.4058170318604, 13.7000455856323, 16.0420398712158,
17.4475727081299, 26.263370513916, 30.658130645752, 19.8927211761475,
20.917688369751, 23.7149887084961, 25.2641334533691), pop = c(111.031448364258,
145.096557617188, 166.351989746094, 193.804962158203, 331.787200927734,
382.979248046875, 237.971466064453, 276.575958251953, 334.015289306641,
345.376617431641), tirs = c(35.392936706543, 34.4172630310059,
33.7765464782715, 35.3224639892578, 40.4262886047363, 39.6619148254395,
38.6306076049805, 36.752326965332, 37.2010040283203, 36.1100578308105
), agbh = c(1.15364360809326, 0.177780777215958, 0.580717206001282,
0.647109687328339, 3.84336423873901, 5.6310133934021, 2.10894227027893,
3.9533429145813, 2.7016019821167, 4.36041164398193), lc = c(40L,
40L, 40L, 126L, 50L, 50L, 50L, 50L, 40L, 50L)), class = "data.frame", row.names = c(NA,
-10L))
For reference, in the example in the link I provided, the authors compute the distance matrix from the same dataset they use for the model.
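A likely cause of the error, offered as a sketch rather than a confirmed fix: dist() returns a compact object of class "dist", not a square matrix, and replacing a diagonal only works on a matrix. Wrapping the result in as.matrix() gives the n x n form. The posted dataset has no x and y columns, so the coordinates below (xy_toy) are made up for illustration; the package example appears to use distances between case coordinates, which is an assumption here.

```r
# Hypothetical coordinates for illustration; replace with your real x/y columns.
xy_toy <- data.frame(x = c(0, 3, 0, 3), y = c(0, 0, 4, 4))

# dist() returns a compact "dist" object (lower triangle only), not a matrix.
d <- dist(xy_toy)
class(d)      # "dist"
is.matrix(d)  # FALSE, so diag(d) <- NA would fail

# as.matrix() expands it into a symmetric n x n matrix with a zero diagonal.
distance.matrix <- as.matrix(d)
dim(distance.matrix)

# Diagonal replacement now works, because the object has two dimensions.
diag(distance.matrix) <- NA
```

The number of rows in the matrix should match the number of rows in the data passed to the model.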

Related

R: replace column values with string

I have a set of BAM files within the chr16_bam directory and a sgseq_sam.txt file.
I want to replace the file_bam column values with the full path where the BAM files are stored.
My code hasn't been able to achieve that.
bamPath = "C:/Users/User/Downloads/chr16_bam/"
samFile <- read.delim("C:/Users/User/Downloads/sgseq_sam.txt", header=T)
for (i in samFile[,2]) {
p <- gsub(i, bamPath, samFile)
}
> dput(samFile)
structure(list(sample_name = c("N60", "N11", "T132", "T114"),
file_bam = c("60.bam", "11.bam", "132.bam", "114.bam"), paired_end = c(TRUE,
TRUE, TRUE, TRUE), read_length = c(75L, 75L, 75L, 75L), frag_length = c(1075L,
1466L, 946L, 1154L), lib_size = c(2589976L, 5153522L, 4429912L,
3131400L)), class = "data.frame", row.names = c(NA, -4L))
library(tidyverse)
# prepend the directory to every file name in the file_bam column
samFile <- samFile %>%
mutate(file_bam = paste0(bamPath, file_bam))
Or, alternately, in base R:
samFile$file_bam <- paste0(bamPath, samFile$file_bam)
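As a small illustration of why the loop in the question did not work: gsub() replaces the matched file name with the directory instead of prefixing it, and p is overwritten on every iteration. The sketch below uses a two-row stand-in for samFile and shows both behaviours.

```r
bamPath <- "C:/Users/User/Downloads/chr16_bam/"

# Two-row stand-in for the samFile data frame from the question.
samFile <- data.frame(sample_name = c("N60", "N11"),
                      file_bam = c("60.bam", "11.bam"),
                      stringsAsFactors = FALSE)

# gsub() substitutes the match, so the file name is lost.
replaced <- gsub("60.bam", bamPath, "60.bam")
replaced

# paste0() is vectorised: the directory is prepended to each file name.
samFile$file_bam <- paste0(bamPath, samFile$file_bam)
samFile$file_bam
```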

Creating new column in one dataframe based on column from another dataframe

I have a dataframe as follows:
dput(head(modellingdata, n = 5))
structure(list(X = 1:5, heading = c(2, 0.5, 2, 1.5, 2), StartFrame = c(27L,
28L, 24L, 31L, 35L), StartYaw = c(0.0719580421911421, 0.0595571128205128,
0.0645337707459207, 0.0717132524475524, 0.066818187062937), FirstSteeringTime = c(0.433389999999999,
0.449999999999989, 0.383199999999988, 0.499899999999997, 0.566800000000001
), pNum = c(1L, 1L, 1L, 1L, 1L), EarlyResponses = c(FALSE, FALSE,
FALSE, FALSE, FALSE), PeakFrame = c(33L, 34L, 32L, 38L, 46L),
PeakYaw = c(0.201025641025641, 0.140734297249417, 0.187890472913753,
0.154032698135198, 0.23129368951049), PeakSteeringTime = c(0.533459999999998,
0.550099999999986, 0.516700000000014, 0.616600000000005,
0.750100000000003), heading_radians = c(0.0349065850398866,
0.00872664625997165, 0.0349065850398866, 0.0261799387799149,
0.0349065850398866), error_rate = c(2.86537083478438, 11.459301348013,
2.86537083478438, 3.82015500141104, 2.86537083478438), error_growth = c(0.34899496702501,
0.0872653549837393, 0.34899496702501, 0.261769483078731,
0.34899496702501)), row.names = c(NA, 5L), class = "data.frame")
Each row of my df is a trial. Overall, I have 3037 rows (trials). pNum denotes the participant number - I have 19 participants overall.
I also have a dataframe of intercepts for each participant:
dput(head(heading_intercept, n = 19))
c(0.432448612242496, 0.446371667203615, 0.420854119185846, 0.366763485495426,
0.355619586392715, 0.381658477093055, 0.512552445721875, 0.317210665852951,
0.358345666677048, 0.421441965798511, 0.477135103908373, 0.325512003640487,
0.5542144068862, 0.454182438162137, 0.333993738757344, 0.424179318544432,
0.272486598058728, 0.37014581658542, 0.397112817663261)
What I want to do is create a new column "intercept" in my modellingdata dataframe. If pNum is 1, I want to select the first intercept in heading_intercept and input that value for every row where pNum is 1. When pNum is 2, I want to input the second intercept value into every row where pNum is 2. And so on...
I have tried this:
for (i in c(1:19)){
if (modellingdata$pNum == i){
modellingdata$intercept <- c(heading_intercept[i])
}
}
However this just inputs the first heading_intercept value for every row and every pNum. Does anybody have any ideas? Any help is appreciated!
modellingdata$intercept <- heading_intercept[modellingdata$pNum]
Or with minimum modification of your current loop:
modellingdata$intercept <- 0L
for (i in c(1:19)){
rows <- modellingdata$pNum == i
if (any(rows)) {
modellingdata$intercept[rows] <- heading_intercept[i]
}
}
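To see why the one-line answer works, here is a toy sketch with made-up values (three participants instead of 19). Indexing a vector by pNum returns one intercept per row. The original loop failed because if() does not test each row: before R 4.2 it used only the first element of the condition (with a warning), and newer versions raise an error. The assignment inside the loop also overwrote the whole column.

```r
# Toy values: one intercept per participant (hypothetical numbers).
heading_intercept <- c(0.43, 0.45, 0.42)

# Toy trials: pNum says which participant each row belongs to.
modellingdata <- data.frame(pNum = c(1L, 1L, 2L, 3L, 3L))

# Vector indexing repeats each intercept once per matching row.
modellingdata$intercept <- heading_intercept[modellingdata$pNum]
modellingdata
```

This relies on pNum running from 1 to the number of intercepts; if the participant IDs were arbitrary, a lookup via match() or a named vector would be needed instead.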

Flow map(Travel Path) Using Lat and Long in R

I am trying to plot a flow map (for Singapore). I have Entry (Lat, Long) and Exit (Lat, Long) coordinates. I am trying to map the flow from entry to exit on a map of Singapore.
structure(list(token_id = c(1.12374e+19, 1.12374e+19, 1.81313e+19,
1.85075e+19, 1.30752e+19, 1.30752e+19, 1.32828e+19, 1.70088e+19,
1.70088e+19, 1.70088e+19, 1.05536e+19, 1.44818e+19, 1.44736e+19,
1.44736e+19, 1.44736e+19, 1.44736e+19, 1.89909e+19, 1.15795e+19,
1.15795e+19, 1.15795e+19, 1.70234e+19, 1.70234e+19, 1.44062e+19,
1.21512e+19, 1.21512e+19, 1.95909e+19, 1.95909e+19, 1.50179e+19,
1.50179e+19, 1.24174e+19, 1.36445e+19, 1.98549e+19, 1.92068e+19,
1.18468e+19, 1.18468e+19, 1.92409e+19, 1.92409e+19, 1.21387e+19,
1.9162e+19, 1.9162e+19, 1.40385e+19, 1.40385e+19, 1.32996e+19,
1.32996e+19, 1.69103e+19, 1.69103e+19, 1.57387e+19, 1.40552e+19,
1.40552e+19, 1.00302e+19), Entry_Station_Lat = c(1.31509, 1.33261,
1.28425, 1.31812, 1.33858, 1.29287, 1.39692, 1.37773, 1.33858,
1.33322, 1.28179, 1.30036, 1.43697, 1.39752, 1.27637, 1.39752,
1.41747, 1.35733, 1.28405, 1.37773, 1.35898, 1.42948, 1.32774,
1.42948, 1.349, 1.36017, 1.34971, 1.38451, 1.31509, 1.31509,
1.37002, 1.34971, 1.31231, 1.39169, 1.31812, 1.44909, 1.29341,
1.41747, 1.33759, 1.44062, 1.31509, 1.38451, 1.29461, 1.32388,
1.41747, 1.27614, 1.39752, 1.39449, 1.33261, 1.31231), Entry_Station_Long = c(103.76525,
103.84718, 103.84329, 103.89308, 103.70611, 103.8526, 103.90902,
103.76339, 103.70611, 103.74217, 103.859, 103.85563, 103.7865,
103.74745, 103.84596, 103.74745, 103.83298, 103.9884, 103.85152,
103.76339, 103.75191, 103.83505, 103.67828, 103.83505, 103.74956,
103.88504, 103.87326, 103.74437, 103.76525, 103.76525, 103.84955,
103.87326, 103.83793, 103.89548, 103.89308, 103.82004, 103.78479,
103.83298, 103.69742, 103.80098, 103.76525, 103.74437, 103.80605,
103.93002, 103.83298, 103.79156, 103.74745, 103.90051, 103.84718,
103.83793), Exit_Station_Lat = structure(c(48L, 34L, 118L, 60L,
14L, 54L, 10L, 49L, 49L, 74L, 71L, 65L, 102L, 5L, 102L, 119L,
116L, 10L, 13L, 88L, 117L, 66L, 40L, 62L, 117L, 37L, 67L, 34L,
85L, 44L, 102L, 44L, 115L, 29L, 92L, 17L, 121L, 70L, 120L, 52L,
85L, 34L, 42L, 11L, 4L, 115L, 62L, 48L, 92L, 14L), .Label = c("1.27082",
"1.27091", "1.27236", "1.27614", "1.27637", "1.27646", "1.27935",
"1.28221", "1.28247", "1.28405", "1.28621", "1.28819", "1.28932",
"1.29287", "1.29309", "1.29338", "1.29341", "1.29461", "1.29694",
"1.29959", "1.29974", "1.30034", "1.30252", "1.30287", "1.30392",
"1.30394", "1.30619", "1.30736", "1.30842", "1.31139", "1.3115",
"1.31167", "1.31188", "1.31509", "1.31654", "1.31756", "1.31913",
"1.31977", "1.32008", "1.3205", "1.32104", "1.32388", "1.32573",
"1.32725", "1.32774", "1.33119", "1.33155", "1.33261", "1.33322",
"1.33474", "1.33554", "1.33759", "1.33764", "1.33858", "1.33921",
"1.34037", "1.34225", "1.34293", "1.3432", "1.34426", "1.34857",
"1.349", "1.34905", "1.35158", "1.35733", "1.35898", "1.36017",
"1.3625", "1.36849", "1.37002", "1.37121", "1.37304", "1.37666",
"1.37775", "1.3786", "1.37862", "1.38001", "1.38029", "1.3803",
"1.38178", "1.38269", "1.38295", "1.38399", "1.38423", "1.38451",
"1.38671", "1.38672", "1.38777", "1.38814", "1.3894", "1.39147",
"1.39169", "1.39189", "1.39208", "1.39389", "1.39449", "1.39452",
"1.39628", "1.39692", "1.39717", "1.39732", "1.39752", "1.39821",
"1.39928", "1.39962", "1.4023", "1.40455", "1.40511", "1.40524",
"1.40843", "1.40961", "1.41184", "1.41588", "1.41685", "1.41747",
"1.42526", "1.42948", "1.43256", "1.43697", "1.44062", "1.44909"
), class = "factor"), Exit_Station_Long = structure(c(59L, 19L,
27L, 4L, 65L, 3L, 63L, 6L, 6L, 21L, 93L, 121L, 9L, 56L, 9L, 32L,
16L, 63L, 44L, 23L, 50L, 12L, 54L, 11L, 50L, 71L, 87L, 19L, 7L,
118L, 9L, 118L, 49L, 90L, 96L, 31L, 45L, 61L, 38L, 2L, 7L, 19L,
117L, 47L, 34L, 49L, 11L, 59L, 96L, 65L), .Label = c("103.67828",
"103.69742", "103.70611", "103.72092", "103.73274", "103.74217",
"103.74437", "103.74529", "103.74745", "103.74905", "103.74956",
"103.75191", "103.7537", "103.75803", "103.76011", "103.76215",
"103.76237", "103.76449", "103.76525", "103.76648", "103.76667",
"103.76893", "103.7696", "103.77082", "103.77145", "103.77266",
"103.774", "103.77866", "103.78185", "103.78425", "103.78479",
"103.7865", "103.78744", "103.79156", "103.79631", "103.79654",
"103.79836", "103.80098", "103.803", "103.80605", "103.80745",
"103.80781", "103.80978", "103.81703", "103.82004", "103.82592",
"103.82695", "103.83216", "103.83298", "103.83505", "103.83918",
"103.83953", "103.83974", "103.84387", "103.84496", "103.84596",
"103.84673", "103.84674", "103.84718", "103.84823", "103.84955",
"103.85092", "103.85152", "103.85226", "103.8526", "103.85267",
"103.85436", "103.85446", "103.85452", "103.86088", "103.86149",
"103.86275", "103.86291", "103.86395", "103.86405", "103.86896",
"103.87087", "103.87135", "103.87534", "103.87563", "103.8763",
"103.87971", "103.88003", "103.88126", "103.88243", "103.88296",
"103.88504", "103.8858", "103.88816", "103.8886", "103.88934",
"103.89054", "103.89237", "103.89313", "103.8938", "103.89548",
"103.89719", "103.89723", "103.89854", "103.9003", "103.90051",
"103.90208", "103.90214", "103.9031", "103.90484", "103.90537",
"103.90597", "103.90599", "103.90663", "103.9086", "103.90902",
"103.9126", "103.9127", "103.91296", "103.91616", "103.9165",
"103.93002", "103.94638", "103.94929", "103.95337", "103.9884"
), class = "factor")), .Names = c("token_id", "Entry_Station_Lat",
"Entry_Station_Long", "Exit_Station_Lat", "Exit_Station_Long"
), row.names = c(10807L, 10808L, 10810L, 10815L, 10817L, 10818L,
10819L, 10820L, 10823L, 10824L, 10826L, 10827L, 10829L, 10831L,
10832L, 10833L, 10834L, 10835L, 10836L, 10838L, 10840L, 10841L,
10843L, 10847L, 10850L, 10852L, 10854L, 10855L, 10859L, 10861L,
10869L, 10872L, 10883L, 10886L, 10891L, 10895L, 10896L, 10897L,
10900L, 10902L, 10903L, 10906L, 10910L, 10911L, 10912L, 10913L,
10915L, 10920L, 10921L, 10924L), class = "data.frame")
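I am trying to get something like this: Map Flow
Just realized that the original solution using geom_path was more complicated than necessary. geom_segment works without changing the data (the code below uses geom_curve, which takes the same x, y, xend and yend aesthetics, to draw curved lines):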
require(ggplot2)
require(ggmap)
basemap <- get_map("Singapore",
source = "stamen",
maptype = "toner",
zoom = 11)
g = ggplot(a)
map = ggmap(basemap, base_layer = g)
map = map + coord_cartesian() +
geom_curve(size = 1.3,
aes(x=as.numeric(Entry_Station_Long),
y=as.numeric(Entry_Station_Lat),
xend=as.numeric(as.character(Exit_Station_Long)),
yend=as.numeric(as.character(Exit_Station_Lat)),
color=as.factor(token_id)))
map
This solution leverages the question "Draw curved lines in ggmap, geom_curve not working" to implement curved lines on a map.
ggmap is used for simplicity; for more ambitious projects I would recommend leaflet.
Below is the solution using a long data format with some prior data wrangling. It also uses straight lines instead of the curves above.
library(dplyr)
a %>%
mutate(path = row_number()) -> a
origin = select(a,token_id,Entry_Station_Lat,Entry_Station_Long,path)
origin$type = "origin"
dest = select(a,token_id,Exit_Station_Lat,Exit_Station_Long,path)
dest$type = "dest"
colnames(origin) = c("id","lat","long","path","type")
colnames(dest) = c("id","lat","long","path","type")
complete = rbind(origin,dest)
complete %>% arrange(path,type) -> complete
require(ggmap)
basemap <- get_map("Singapore",
source = "stamen",
maptype = "toner",
zoom = 11)
g = ggplot(complete, aes(x=as.numeric(long),
y=as.numeric(lat)))
map = ggmap(basemap, base_layer = g)
map + geom_path(aes(color = as.factor(id)),
size = 1.1)
If you want to plot it on an actual Google Map, and recreate the style of your linked map, you can use my googleway package, which uses Google's Maps API. You need an API key to use their maps.
library(googleway)
df$Exit_Station_Lat <- as.numeric(as.character(df$Exit_Station_Lat))
df$Exit_Station_Long <- as.numeric(as.character(df$Exit_Station_Long))
df$polyline <- apply(df, 1, function(x) {
lat <- c(x['Entry_Station_Lat'], x['Exit_Station_Lat'])
lon <- c(x['Entry_Station_Long'], x['Exit_Station_Long'])
encode_pl(lat = lat, lon = lon)
})
mapKey <- 'your_api_key'
style <- '[ { "stylers": [{ "visibility": "simplified"}]},{"stylers": [{"color": "#131314"}]},{"featureType": "water","stylers": [{"color": "#131313"},{"lightness": 7}]},{"elementType": "labels.text.fill","stylers": [{"visibility": "on"},{"lightness": 25}]}]'
google_map(key = mapKey, style = style) %>%
add_polylines(data = df,
polyline = "polyline",
mouse_over_group = "Entry_Station_Lat",
stroke_weight = 0.7,
stroke_opacity = 0.5,
stroke_colour = "#ccffff")
Note, to recreate the map using flight data, see the example given in ?add_polylines
You can also show other types of routes, for example, driving between the locations by using Google's Directions API to encode the driving routes.
apiKey <- 'your_api_key' ## placeholder: a key with the Directions API enabled
df$drivingRoute <- lst_directions <- apply(df, 1, function(x){
orig <- as.numeric(c(x['Entry_Station_Lat'], x['Entry_Station_Long']))
dest <- as.numeric(c(x['Exit_Station_Lat'], x['Exit_Station_Long']))
dir <- google_directions(origin = orig, destination = dest, key = apiKey)
dir$routes$overview_polyline$points
})
google_map(key = mapKey, style = style) %>%
add_polylines(data = df,
polyline = "drivingRoute",
mouse_over_group = "Entry_Station_Lat",
stroke_weight = 0.7,
stroke_opacity = 0.5,
stroke_colour = "#ccffff")
Alternative answer using leaflet and geosphere
#get Packages
require(leaflet)
require(geosphere)
#format data
a$Entry_Station_Long = as.numeric(as.character(a$Entry_Station_Long))
a$Entry_Station_Lat = as.numeric(as.character(a$Entry_Station_Lat))
a$Exit_Station_Long = as.numeric(as.character(a$Exit_Station_Long))
a$Exit_Station_Lat = as.numeric(as.character(a$Exit_Station_Lat))
a$id = as.factor(as.numeric(as.factor(a$token_id)))
#create some colors
factpal <- colorFactor(heat.colors(30), a$id)
#create a list of interpolated paths
pathList = NULL
for(i in 1:nrow(a))
{
tmp = gcIntermediate(c(a$Entry_Station_Long[i],
a$Entry_Station_Lat[i]),
c(a$Exit_Station_Long[i],
a$Exit_Station_Lat[i]),n = 25,
addStartEnd=TRUE)
tmp = data.frame(tmp)
tmp$id = a[i,]$id
tmp$color = factpal(a[i,]$id)
pathList = c(pathList,list(tmp))
}
#create empty base leaflet object
leaflet() %>% addTiles() -> lf
#add each entry of pathlist to the leaflet object
for (path in pathList)
{
lf %>% addPolylines(data = path,
lng = ~lon,
lat = ~lat,
color = ~color) -> lf
}
#show output
lf
Note that as I mentioned before there is no way of geosphering the paths in such a small locality - the great circles are effectively straight lines. If you want the rounded edges for sake of aesthetics you may have to use the geom_curve way described in my other answer.
I've also written the mapdeck library to make visualisations like this more appealing*
library(mapdeck)
set_token("MAPBOX_TOKEN") ## set your mapbox token here
df$Exit_Station_Lat <- as.numeric(as.character(df$Exit_Station_Lat))
df$Exit_Station_Long <- as.numeric(as.character(df$Exit_Station_Long))
mapdeck(
style = mapdeck_style('dark')
, location = c(104, 1)
, zoom = 8
, pitch = 45
) %>%
add_arc(
data = df
, origin = c("Entry_Station_Long", "Entry_Station_Lat")
, destination = c("Exit_Station_Long", "Exit_Station_Lat")
, layer_id = 'arcs'
, stroke_from_opacity = 100
, stroke_to_opacity = 100
, stroke_width = 3
, stroke_from = "#ccffff"
, stroke_to = "#ccffff"
)
*subjectively speaking
I would like to leave an alternative approach for you. What you can do is to restructure your data. Right now you have two columns for entry stations and another two for exit stations. You can create one column for long, and another for lat, by combining these columns. The trick is to use rbind() and c().
Let's have a look at this simple example.
x <- c(1, 3, 5)
y <- c(2, 4, 6)
c(rbind(x, y))
#[1] 1 2 3 4 5 6
Imagine x is long for entry stations and y for exit stations. 1 is the longitude of a starting point. 2 is the longitude where the first journey ended. As far as I can see from your sample data, it seems that 3 is identical to 2. You could remove duplicated data points for each token_id. If you have a large set of data, perhaps this is something you want to consider. Back to the main point, you can create a column with longitude in the sequence you want with the combination of the two functions. Since you said you have date information, make sure you order the data by date. Then, the sequence of each journey appears in the right order in tmp. You want to do this with latitude as well.
Now we look into your sample data. It seems that Exit_Station_Lat and Exit_Station_Long are factors. The first operation is to convert them to numeric. Then, you apply the method above and create a data frame. I called your data mydf.
library(dplyr)
library(ggplot2)
library(ggalt)
library(ggthemes)
library(raster)
mydf %>%
mutate_at(vars(Exit_Station_Lat: Exit_Station_Long),
funs(as.numeric(as.character(.)))) -> mydf
group_by(mydf, token_id) %>%
do(data.frame(long = c(rbind(.$Entry_Station_Long,.$Exit_Station_Long)),
lat = c(rbind(.$Entry_Station_Lat, .$Exit_Station_Lat))
)
) -> tmp
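The factor-to-numeric conversion in the code above goes through as.character() for a reason. A minimal sketch with toy values shows the difference: as.numeric() on a factor returns the internal level codes, not the numbers printed as labels.

```r
# Toy factor holding latitudes as labels, as in the Exit_Station_Lat column.
f <- factor(c("1.33261", "1.27614", "1.33261"))

# Direct conversion gives the integer level codes (levels are sorted labels).
codes <- as.numeric(f)

# Converting to character first recovers the actual coordinate values.
values <- as.numeric(as.character(f))

codes
values
```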
Now let's get map data from GADM. You can download the data using the raster package.
getData(name = "GADM", country = "singapore", level = 0) %>%
fortify -> singapore
Finally, you draw a map. The key thing is to use group in aes in geom_path(). I hope this will let you move forward.
ggplot() +
geom_cartogram(data = singapore,
aes(x = long, y = lat, map_id = id),
map = singapore) +
geom_path(data = tmp,
aes(x = long, y = lat, group = token_id,
color = as.character(token_id)),
show.legend = FALSE) +
theme_map()

Convert to coordinates over a list of data frames

I am trying to convert point values to coordinates using the sp package to perform operations similar to this question. I have a list of data frames (hundreds in the full data set, 2 short ones here).
> dput(df)
list(structure(list(group = c(22, 43, 43, 36, 9, 20, 35, 18,
32, 2), mean_x_m = c(-2578373.61904762, -2082265, -1853701.875,
-2615961.89189189, -1538829.07815509, -1753235.6200847, -1690679.5,
-1694763.64583333, -1700343.15217391, -1416060), mean_y_m = c(3242738.76190476,
2563892.5, 1945883.125, 3130074.86486486, 1373724.65001039, 1468737.97186933,
2123413.5, 1442167.01388889, 2144261.73913043, 1352573.33333333
)), .Names = c("group", "mean_x_m", "mean_y_m"), row.names = c(72L,
140L, 142L, 121L, 27L, 66L, 114L, 60L, 105L, 5L), class = "data.frame"),
structure(list(group = c(12, 12, 47, 30, 39, 34, 47, 22,
10, 1), mean_x_m = c(-1830635.68663753, -2891058.33333333,
-1637448.59886202, -1974773.67400716, -1571853.24324324,
-2723090.33333333, -2704594.92760618, -2240863.49122807,
-1940748.88253242, -2176724.69924812), mean_y_m = c(2324222.49926225,
3261997.5, 2057096.55049787, 2411733.29933653, 1447883.78378379,
3406879.26666667, 3291053.77606178, 2788255.49473684, 2176919.6882151,
2920168.77443609)), .Names = c("group", "mean_x_m", "mean_y_m"
), row.names = c(67L, 68L, 243L, 155L, 202L, 173L, 244L,
114L, 61L, 3L), class = "data.frame"))
I can pull one data frame out at a time and convert to a SpatialPointsDataFrame without issue.
df1 = df[[1]]
coordinates(df1) = ~mean_x_m+mean_y_m
My problem is I can't get this to iterate over the entire list using a function, or even get the function to work for a single dataframe.
c = function(f){coordinates(f) = ~mean_x_m+mean_y_m}
df2 = c(df1)
c(df1)
df3 = lapply(df,c)
Would a for loop work better? I'm still learning about working with lists of data frames and matrices so any help on apply or for in this context would be appreciated. Thank you.
This is how you can use lapply:
fc <- function(f){coordinates(f) = ~mean_x_m + mean_y_m; f}
lapply(df, fc)
The problem was that your function did not return anything.
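The same behaviour can be seen without sp. In the sketch below, names<- stands in for coordinates<- (both are replacement functions), and the function names are made up for illustration. A function whose last expression is a replacement assignment returns the assigned value, not the modified object, so the object has to be returned explicitly.

```r
# Last expression is a replacement assignment: the value returned
# (invisibly) is the right-hand side, not the modified vector.
no_return <- function(v) { names(v) <- c("a", "b") }

# Returning the object explicitly gives back the modified vector.
with_return <- function(v) { names(v) <- c("a", "b"); v }

r1 <- no_return(1:2)
r2 <- with_return(1:2)

r1  # the assigned names
r2  # the named vector
```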
To make a single object:
x <- lapply(1:length(df), function(i) cbind(id=i, df[[i]]))
x <- do.call(rbind, x)
coordinates(x) <- ~mean_x_m+mean_y_m
If your dataframes have a consistent structure, it would be better to put them all into one dataframe.
library(dplyr)
library(sp)
result =
df %>%
bind_rows(.id = "list_number") %>%
as.data.frame %>%
`coordinates<-`(~mean_x_m+mean_y_m)
If you are working with geographic data, I find it easiest to use the SpatialPoints and SpatialPointsDataFrame classes to store data. To convert all elements of a list containing dataframes with the same column headings, you could adapt this code:
library(sp)
# toy dataset X
X<-list(
x1 = data.frame(group =c("a","b","c"), X = c(-110.1,-110.2,-110), Y = c(44,44.2,44.3)),
x2 = data.frame(group =c("a","b","c"), X = c(-110.1,-110.2,-110), Y = c(44,44.2,44.3)))
# write a function based on the structure of your dfs
spdf_fxn<-function(df){
SpatialPointsDataFrame(coords= cbind(df$X,df$Y), data= data.frame(group = df$group),
proj4string=CRS("+proj=longlat +datum=WGS84"))
}
#apply this function over the list
Out_List<-lapply(X,spdf_fxn)
Write a function to convert the generic dataframe structure to a SpatialPointsDataFrame, with group as the data appended to each point, then apply that function to the list. Note that you will have to adapt the column names and use the appropriate proj4string (in this example it is longitude and latitude in WGS 84).

Inverse probability weights in r

I'm trying to apply inverse probability weights to a regression, but lm() only uses analytic weights. This is part of a replication I'm working on where the original author is using pweight in Stata, but I'm trying to replicate it in R. The analytic weights produce lower standard errors, which changes which of my variables are significant.
I've tried looking at the survey package, but am not sure how to prepare a survey object for use with svyglm(). Is this the approach I want, or is there an easier way to apply inverse probability weights?
dput :
data <- structure(list(lexptot = c(9.1595012302023, 9.86330744180814,
8.92372556833205, 8.58202430280175, 10.1133857229336), progvillm = c(1L,
1L, 1L, 1L, 0L), sexhead = c(1L, 1L, 0L, 1L, 1L), agehead = c(79L,
43L, 52L, 48L, 35L), weight = c(1.04273509979248, 1.01139605045319,
1.01139605045319, 1.01139605045319, 0.76305216550827)), .Names = c("lexptot",
"progvillm", "sexhead", "agehead", "weight"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
Linear Model (using analytic weights)
prog.lm <- lm(lexptot ~ progvillm + sexhead + agehead, data = data, weight = weight)
summary(prog.lm)
Alright, so I figured it out and thought I would update the post in case others were trying to figure it out. It's actually pretty straightforward.
library(survey)
# give every row its own id so each observation is its own sampling unit
data$X <- 1:nrow(data)
des1 <- svydesign(id = ~X, weights = ~weight, data = data)
prog.lm <- svyglm(lexptot ~ progvillm + sexhead + agehead, design=des1)
summary(prog.lm)
Standard errors are now correct.
