I would like to plot a shapefile loaded using read.shp from the fastshp package. However, the read.shp function returns a list of lists rather than a data.frame, and I'm unsure which part of the list I need to extract to obtain a correctly formatted data.frame object. This exact question has already been asked on Stack Overflow, but the solution no longer seems to work (it was posted more than 7 years ago). Any help is much appreciated.
remotes::install_github("s-u/fastshp") #fastshp not on CRAN
library(ggplot2); library(fastshp); library(magrittr) # magrittr (or dplyr) provides %>%
temp <- tempfile()
temp2 <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip",temp)
unzip(zipfile = temp, exdir = temp2)
shp <- list.files(temp2, pattern = ".shp$",full.names=TRUE) %>% read.shp(.)
shp is a list of lists containing a plethora of information. I tried the following solution from the SO post mentioned earlier, but to no avail:
shp.list <- sapply(shp, FUN = function(x) Polygon(cbind(lon = x$x, lat = x$y))) #throws an error here cbind(lon = x$x, lat = x$y) returns NULL
shp.poly <- Polygons(shp.list, "area")
shp.df <- fortify(shp.poly, region = "area")
I also tried the following:
shp.list <- sapply(shp, FUN = function(x) do.call(cbind, x[c("id","x","y")])) #returns NULL value here...
shp.df <- as.data.frame(do.call(rbind, shp.list))
Updated: Still no luck but closer:
file_shp <- list.files(temp2, pattern = ".shp$", full.names = TRUE) %>%
  read.shp(., format = c("table"))
ggplot() +
  geom_polygon(data = file_shp, aes(x = x, y = y, group = part),
               colour = "black", fill = NA)
Looks like the projection is off. I'm not sure how to order the data so it maps correctly, and I'm also not sure how to read in the CRS data. I tried the following to no avail:
file_prj<-list.files(temp2, pattern = ".prj$",full.names=TRUE) %>%
proj4string(.)
I tried to use the census data you have in your script. However, RStudio somehow kept crashing when I applied read.shp() to the polygon data. Hence, I decided to use the example from the help page of read.shp(), which is also census data. I hope you do not mind. It took some time to figure out how to draw a map from an object of class shp. Let me explain what I went through step by step.
This part is from the help page. I am basically reading the shapefile shipped with the package and importing it as a shp object.
# Census 2010 TIGER/Line(TM) state shapefile
library(fastshp)
fn <- system.file("shp", "tl_2010_us_state10.shp.xz", package="fastshp")
s <- read.shp(xzfile(fn, "rb"))
Let's check what this object, s, looks like. It contains 52 lists. In each list, there are six vectors. id is a unique integer representing a state. x is longitude and y is latitude. The nasty part was parts. In the example below, there is only one number, which means there is only one polygon in this state. But some other lists (states) have multiple numbers. These numbers are indices which indicate where new polygons begin in the data.
#> str(s)
#List of 52
# $ :List of 6
# ..$ id : int 1
# ..$ type : int 5
# ..$ box : num [1:4] -111 41 -104 45
# ..$ parts: int 0
# ..$ x : num [1:9145] -109 -109 -109 -109 -109 ...
# ..$ y : num [1:9145] 45 45 45 45 45 ...
Here is the one for Alaska. As you can see, there are some numbers in parts. These numbers indicate where new polygon data begin. Alaska has many small islands, hence they needed to mark the different polygons in the data with this information. We will come back to this later when we create data frames.
#List of 6
# $ id : int 18
# $ type : int 5
# $ box : num [1:4] -179.2 51.2 179.9 71.4
# $ parts: int [1:50] 0 52 88 127 175 207 244 306 341 375 ...
# $ x : num [1:14033] 177 177 177 177 177 ...
# $ y : num [1:14033] 52.1 52.1 52.1 52.1 52.1 ...
What we need is the following. For each list, we need to extract the longitude (i.e., x), the latitude (i.e., y), and the id in order to create a data frame for one state. In addition, we need to use parts so that we can mark all polygons with unique IDs: we create a new group variable that contains a unique ID value for each polygon. I used findInterval(), which takes the parts indices and creates this group variable. One tricky part was that we need to use left.open = TRUE in findInterval() to get the grouping right (this gave me a hard time figuring out what was going on; a small standalone illustration follows). After the illustration, the map_dfr() call below handles the job I just described.
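To see why left.open = TRUE matters, here is a tiny standalone illustration with a made-up parts vector (the numbers are hypothetical, not taken from the census data):
# Suppose a state has 12 coordinate rows and parts = c(0, 5, 9),
# i.e. its polygons start at rows 1, 6 and 10 (parts holds 0-based start indices).
parts <- c(0, 5, 9)
findInterval(1:12, parts)
# [1] 1 1 1 1 2 2 2 2 3 3 3 3   <- row 5 is wrongly put into the second polygon
findInterval(1:12, parts, left.open = TRUE)
# [1] 1 1 1 1 1 2 2 2 2 3 3 3   <- rows 1-5, 6-9 and 10-12 form the three polygons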
library(tidyverse)
map_dfr(.x = s,
        .f = function(mylist){
          temp <- data.frame(id = mylist$id,
                             lon = mylist$x,
                             lat = mylist$y)
          ind <- mylist$parts
          out <- mutate(temp,
                        subgroup = findInterval(x = 1:n(), vec = ind, left.open = TRUE),
                        group = paste(id, subgroup, sep = "_"))
          return(out)
        }) -> test
Once we have test, we have another job. Some longitude values for Alaska remain positive (e.g., 179.85) because they lie just east of the antimeridian. As long as we have numbers like this, ggplot2 draws funny long lines, which you can see even in your example. What we need is to convert these positive numbers to negative ones so that ggplot2 can draw a proper map.
mutate(test,
       lon = if_else(lon > 0, lon * -1, lon)) -> out
By this time, out looks like this.
id lon lat subgroup group
1 1 -108.6213 45.00028 1 1_1
2 1 -108.6197 45.00028 1 1_1
3 1 -108.6150 45.00031 1 1_1
4 1 -108.6134 45.00032 1 1_1
5 1 -108.6133 45.00032 1 1_1
6 1 -108.6130 45.00032 1 1_1
Now we are ready to draw a map.
ggplot() +
  geom_polygon(data = out, aes(x = lon, y = lat, group = group))
Related
There are five polygons for five different cities (see the attached file in the link, it's called bound.shp). I also have a point file "points.csv" with longitude and latitude, where for each point I know the proportion of people belonging to group m and group h.
I am trying to calculate the spatial segregation proposed by Reardon and O’Sullivan, “Measures of Spatial Segregation”
There is a package called "seg" which should allow us to do it. I am trying to do it but so far no success.
Here is the link to the example file: LINK. After downloading the "example". This is what I do:
setwd("~/example")
library(seg)
library(sf)
bound <- st_read("bound.shp")
points <- st_read("points.csv", options=c("X_POSSIBLE_NAMES=x","Y_POSSIBLE_NAMES=y"))
#I apply the following formula
seg::spseg(bound, points[ ,c(group_m, group_h)] , smoothing = "kernel", sigma = bandwidth)
Error: 'x' must be a numeric matrix with two columns
Can someone help me solve this issue? Or is there an alternate method which I can use?
Thanks a lot.
I don't know exactly what the spseg function does, but looking at the spseg documentation in the seg package:
The first argument, x, should be a data frame or an object of class Spatial.
The second argument, data, should be a matrix or data frame.
Working through the Examples for spseg, it should be noted that data needs to have the same number of rows as there are ids in the Spatial object. In your sample, the ids are the cities, each of which has its own polygon.
First, let's examine the bound data:
setwd("~/example")
library(seg)
library(sf)
#For the fortify function
library(ggplot2)
bound <- st_read("bound.shp")
bound <- as_Spatial(bound)
class(bound)
"SpatialPolygonsDataFrame"
attr(,"package")
"sp"
tail(fortify(bound))
Regions defined for each Polygons
long lat order hole piece id group
5379 83.99410 27.17326 972 FALSE 1 5 5.1
5380 83.99583 27.17339 973 FALSE 1 5 5.1
5381 83.99705 27.17430 974 FALSE 1 5 5.1
5382 83.99792 27.17552 975 FALSE 1 5 5.1
5383 83.99810 27.17690 976 FALSE 1 5 5.1
5384 83.99812 27.17700 977 FALSE 1 5 5.1
So you have 5 ids in your SpatialPolygonsDataFrame. Now let's read points.csv with the read.csv function, since the data needs to be in matrix form for the spseg function.
points <- read.csv("c://Users/cemozen/Downloads/example/points.csv")
tail(points)
group_m group_h x y
950 4.95 78.49000 84.32887 26.81203
951 5.30 86.22167 84.27448 26.76932
952 8.68 77.85333 84.33353 26.80942
953 7.75 82.34000 84.35270 26.82850
954 7.75 82.34000 84.35270 26.82850
955 7.75 82.34000 84.35270 26.82850
The documentation and the example within it make clear that the number of rows of the points data (which carries the two attributes group_m and group_h in our case) should equal the number of ids (i.e. the cities). Perhaps you should aggregate, for example by taking the mean (or any other statistic) of the points within each city, so that you end up with a single value per polygon. A rough sketch of that aggregation is shown below.
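A rough sketch of that aggregation with sf and dplyr (assuming the polygons carry a city identifier column, here called "city", and that the point coordinates are in the same CRS as the polygons; the column and object names are illustrative):
library(sf)
library(dplyr)
bound_sf <- st_read("bound.shp")   # sf version of the polygons
pts_sf <- st_as_sf(points, coords = c("x", "y"), crs = st_crs(bound_sf))
# attach the polygon id to each point, then average the two group shares per polygon
poly_means <- st_join(pts_sf, bound_sf) %>%
  st_drop_geometry() %>%
  group_by(city) %>%
  summarise(group_m = mean(group_m), group_h = mean(group_h))
# poly_means now has one row per polygon, matching the 5 ids in bound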
On the other hand, I would just like to show that the function works properly when fed a matrix that has 5 rows and 2 groups.
sample_spseg <- spseg(bound, as.matrix(points[1:5,c("group_m", "group_h")]))
print(sample_spseg)
Reardon and O'Sullivan's spatial segregation measures
Dissimilarity (D) : 0.0209283
Relative diversity (R): -0.008781
Information theory (H): -0.0066197
Exposure/Isolation (P):
group_m group_h
group_m 0.07577679 0.9242232
group_h 0.07516285 0.9248372
--
The exposure/isolation matrix should be read horizontally.
Read 'help(spseg)' for more details.
First: I do not have experience with the seg package and its functions.
What I read from your question is that you want to apply the spseg function to the points within each area?
If so, here is a possible approach:
library(sf)
library(tidyverse)
library(seg)
library(mapview) # for quick viewing only
# read polygons, make valid to avoid problems later on
areas <- st_read("./temp/example/bound.shp") %>%
  sf::st_make_valid()
# read points and convert to sf object
points <- read.csv("./temp/example/points.csv") %>%
  sf::st_as_sf(coords = c("x", "y"), crs = 4326) %>%
  # spatial join to attach the city of each point (st_join uses st_intersects by default)
  sf::st_join(areas)
# what do we have so far??
mapview::mapview(points, zcol = "city")
# get the coordinates back into a data.frame
mydata <- cbind(points, st_coordinates(points))
# drop the geometry, we do not need it anymore
st_geometry(mydata) <- NULL
# looks like...
head(mydata)
# group_m group_h city X Y
# 1 8.02 84.51 2 84.02780 27.31180
# 2 8.02 84.51 2 84.02780 27.31180
# 3 8.02 84.51 2 84.02780 27.31180
# 4 5.01 84.96 2 84.04308 27.27651
# 5 5.01 84.96 2 84.04622 27.27152
# 6 5.01 84.96 2 84.04622 27.27152
# Split to a list by city
L <- split(mydata, mydata$city)
# loop over the list and perform the spseg function
final <- lapply(L, function(i) spseg(x = i[, 4:5], data = i[, 1:2]))
# test for the first city
final[[1]]
# Reardon and O'Sullivan's spatial segregation measures
#
# Dissimilarity (D) : 0.0063
# Relative diversity (R): -0.0088
# Information theory (H): -0.0067
# Exposure/Isolation (P):
# group_m group_h
# group_m 0.1160976 0.8839024
# group_h 0.1157357 0.8842643
# --
# The exposure/isolation matrix should be read horizontally.
# Read 'help(spseg)' for more details.
spplot(final[[1]], main = "Equal")
I have a tricky issue.
I am trying to visualise some data for a sort of 'pretty' metadata explorer. It's basic point data in the following format:
> print(tempdata[1:5, ])
Station Lat_dec Long_dec Surface_T
1 247 50.33445 -2.240283 15.19
2 245 50.58483 -2.535217 14.11
3 239 50.16883 -2.509250 15.41
4 225 50.32848 -2.765967 15.34
5 229 50.63900 -2.964800 14.09
I can use the Lat, Long and Temp to create the following voronoi polygons, and a simple box to clip them so they don't extend forever.
# Creating Stations
stations <- st_as_sf(df,
                     coords = c("Long_dec", "Lat_dec"))
# Create voronoi/thiessen polygons
v <- stations %>%
  st_union() %>%
  st_voronoi() %>%
  st_collection_extract()
# Creating boundary box
box <- st_bbox(stations) %>%
  st_as_sfc()
# Clipping voronoi to boundary box
hmm <- st_crop(v, box)
This produces the following surface polygon:
> str(hmm)
sfc_POLYGON of length 107; first list element: List of 1
$ : num [1:7, 1:2] -7.23 -6.94 -6.95 -7.04 -7.24 ...
- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
Plotted as:
leaflet() %>%
  addPolygons(data = hmm) %>%
  addProviderTiles(providers$Esri.WorldTerrain)
What I am trying to do is colour the surface polygons by temperature, e.g. hotter being redder, etc. I have tried it every which way and typically R crashes.
I think it's something to do with the surface polygons not having any info, such as a station or polygon ID number, that links back to the original data.
I am stumped, any help would be awesome!!
packages:
library(sf)
library(dplyr)
library(rgdal)
library(leaflet)
update:
> tempdata[1:10, ]
Station Lat_dec Long_dec Surface_T
1 247 50.33445 -2.240283 15.19
2 245 50.58483 -2.535217 14.11
3 239 50.16883 -2.509250 15.41
4 225 50.32848 -2.765967 15.34
5 229 50.63900 -2.964800 14.09
6 227 50.33757 -3.303217 15.12
7 217 50.16657 -3.563817 15.13
8 207 49.66683 -3.556550 15.04
9 213 50.16512 -3.824667 14.97
10 219 49.83707 -3.815483 14.78
stations <- st_as_sf(tempdata,
                     coords = c("Long_dec", "Lat_dec"))
test <- st_sample(stations,
                  size = as.numeric(count(tempdata)))
join <- st_sf("temp" = stations$Surface_T, geometry = test)
That was a new one for me, too - never worked with Voronois before. But the problem is indeed that your stations dataframe loses all its features with st_union().
Just adding them back doesn't seem viable, since the order of the polygons is not the same as the order of the original points. A spatial join might therefore be a good workaround.
Using my own sample data:
library(sf)
library(leaflet)
#will work with any polygon
samplepoints_sf <- st_sample(bw_polygon, size = 2000, type = "random", crs = st_crs(4326))[1:500]
# although coordinates are longitude/latitude, st_intersects assumes that they are planar
#create an sf-object like your example
bw_sf <- st_sf("some_variable" = sample(1:50, 500, replace = TRUE), geometry = samplepoints_sf)
#create the voronoi diagram, "some_variable" gets lost.
v <- bw_sf %>%
  st_union() %>%
  st_voronoi() %>%
  st_collection_extract()
#do a spatial join with the original bw_sf data frame to get the data back
v_poly <- st_cast(v) %>%
  st_intersection(bw_polygon) %>%
  st_sf() %>%
  st_join(bw_sf, join = st_contains)
#create a palette (many ways to do this step)
colors <- colorFactor(
  palette = c("green", "yellow", "red"),
  domain = v_poly$some_variable
)
#create the leaflet
leaflet(v_poly) %>% addTiles() %>%
  addPolygons(fillColor = colors(v_poly$some_variable),
              fillOpacity = 0.7,
              weight = 1,
              popup = paste("<strong> some variable: </strong>", v_poly$some_variable))
So, hope this works for you.
Excel allows you to switch rows and columns in its Chart functionality.
I am trying to replicate this in R. My data, shown below, has production for each company in rows. I am unable to figure out how to show Month-1, Month-2, etc. on the x-axis, with a series for each company in the same graph. Any help appreciated.
Data:
tibble::tribble(
  ~Company.Name,  ~`Month-1`,  ~`Month-2`,  ~`Month-3`,  ~`Month-4`,
       "Comp-1", 945.5438986, 1081.417009, 976.7388701,  864.309703,
       "Comp-2",    16448.87,    13913.19,    12005.28,    10605.32,
       "Comp-3", 346.9689321, 398.2297592, 549.1282647, 550.4207169,
       "Comp-4", 748.8806367,  949.463941, 1018.877481, 932.3773791
)
I'm going to skip the part where you want to transpose, and infer that your purpose for that was solely to help with plotting. The part I'm focusing on here is "show the Month-1, Month-2 etc in x-axis, and the series for each company in the same graph".
This is doable in base graphics, but I highly recommend using ggplot2 (or plotly or similar), due to its ease of dealing with dimensional plots like this. The "grammar of graphics" (which both tend to implement) really prefers data like this be in a "long" format, so part of what I'll do is convert to this format.
First, some data:
set.seed(2)
months <- paste0("Month", 1:30)
companies <- paste0("Comp", 1:5)
m <- matrix(abs(rnorm(length(months) * length(companies), sd = 1e3)),
            nrow = length(companies))
d <- cbind.data.frame(
  Company = companies,
  m,
  stringsAsFactors = FALSE
)
colnames(d)[-1] <- months
str(d)
# 'data.frame': 5 obs. of 31 variables:
# $ Company: chr "Comp1" "Comp2" "Comp3" "Comp4" ...
# $ Month1 : num 896.9 184.8 1587.8 1130.4 80.3
# $ Month2 : num 132 708 240 1984 139
# $ Month3 : num 418 982 393 1040 1782
# $ Month4 : num 2311.1 878.6 35.8 1012.8 432.3
# (truncated)
Reshaping can be done with multiple libraries, including base R; here are two techniques (with a base R sketch after them):
library(data.table)
d2 <- melt(as.data.table(d), id = 1, variable.name = "Month", value.name = "Cost")
d2[,Month := as.integer(gsub("[^0-9]", "", Month)),]
d2
# Company Month Cost
# 1: Comp1 1 896.91455
# 2: Comp2 1 184.84918
# 3: Comp3 1 1587.84533
# 4: Comp4 1 1130.37567
# 5: Comp5 1 80.25176
# ---
# 146: Comp1 30 653.67306
# 147: Comp2 30 657.10598
# 148: Comp3 30 549.90924
# 149: Comp4 30 806.72936
# 150: Comp5 30 997.37972
library(dplyr)
# library(tidyr)
d2 <- tbl_df(d) %>%
  tidyr::gather(Month, Cost, -Company) %>%
  mutate(Month = as.integer(gsub("[^0-9]", "", Month)))
I also integerized the Month, since it made sense for an ordinal variable. This isn't strictly necessary; the plot would just treat the months as discrete values.
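For reference, the same wide-to-long reshape can also be done in base R with stats::reshape; this is just a sketch using the d and months objects created above, not part of the original two techniques:
d2 <- reshape(d,
              direction = "long",
              idvar     = "Company",
              varying   = list(months),     # the Month1..Month30 columns
              v.names   = "Cost",
              timevar   = "Month",
              times     = seq_along(months))
head(d2[order(d2$Company, d2$Month), ])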
The plot is anti-climactically simple:
library(ggplot2)
ggplot(d2, aes(Month, Cost, group = Company)) +
  geom_line(aes(color = Company))
Bottom line: I don't think you need to worry about transposing your data: doing so has many complications that can just confuse things. Reshaping is a good thing (in my opinion), but with this kind of data is fast enough that if your data is stored in the wide format, you can re-transform it without too much difficulty. (If you are thinking about putting this in a database, however, I'd strongly recommend you re-think "wide", your db schema will be challenging if you keep it.)
I'm working with a fixed effects regression model using plm.
The model looks like this:
FE.model <-plm(fml, data = data.reg2,
index=c('Site.ID','date.hour'), # cross section ID and time series ID
model='within', #coefficients are fixed
effect='individual')
summary(FE.model)
"fml" is a formula I defined previously. I have many independent variables, so this made it more efficient.
What I want to do is get my fitted values (my yhats) and join them to my base dataset; data.reg2
I was able to get the fitted values using this code:
Fe.model.fitted <- FE.model$model[[1]] - FE.model$residuals
However, this only gives me a one-column vector of fitted values; I have no way of joining it to my base dataset.
Alternatively, I've tried something like this:
Fe.model.fitted <- cbind(data.reg2, resid=resid(FE.model), fitted=fitted(FE.model))
However, I get this error with that:
Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class ""pseries"" to a data.frame
Are there any other ways to get my fitted values in my base dataset? Or can someone explain the error I'm getting and maybe a way to fix it?
I should note that I don't want to manually compute the yhats based on my betas. I have way too many independent variables for that option and my defined formula (fml) may change so that option would not be efficient.
Many thanks!!
Merging plm fitted values back into the original dataset requires some intermediate steps -- plm drops any rows with missing data, and as far as I can tell, a plm object does not contain the index info. The order of the data is not preserved -- see what Giovanni Millo, one of plm's authors, commented in this thread:
"...the input order is not always preserved: observations are always reordered by (individual, time) internally, so that the output you get is ordered accordingly..."
The steps in short:
1. Get the fitted values from the estimated plm object. It is a single vector, but the entries are named; the names correspond to positions in the index.
2. Get the index, using the index() function. It can return both the individual and time indices. Note the index may contain more rows than the fitted values, in case rows were removed for missing data. (It is also possible to generate an index directly from the original data, but I did not see a promise that the original order of the data is preserved in what plm returns.)
3. Merge into the original data, looking up the id and time values from the index.
Sample code is provided below. Kind of long but I've tried to comment. The code is not optimized, my intention was to list the steps explicitly. Also, I am using data.tables rather than data.frames.
library(data.table); library(plm)
### Generate dummy data. This way we know the "true" coefficients
set.seed(100)
n <- 500 # Run with more data if you want to get closer to the "true" coefficients
DT <- data.table(CJ(id = c("a","b","c","d","e"), time = c(1:(n / 5))))
DT[, x1 := rnorm(n)]
DT[, x2 := rnorm(n)]
DT[, y := x1 + 2 * x2 + rnorm(n) / 10]
setkey(DT, id, time)
# # Make it an unbalanced panel & put in some NAs
DT <- DT[!(id == "a" & time == 4)]
DT[.("a", 3), x2 := as.numeric(NA)]
DT[.("d", 2), x2 := as.numeric(NA)]
str(DT)
### Run the model -- both individual and time effects; "within" model
summary(PLM <- plm(data = DT, index = c("id", "time"), formula = y ~ x1 + x2, model = "within", effect = "twoways", na.action = "na.omit"))
### Merge the fitted values back into the data.table DT
# Note that PLM$model$y is shorter than the data, i.e. the row(s) with NA have been dropped
cat("\nRows omitted (due to NA): ", nrow(DT) - length(PLM$model$y))
# Since the objects returned by plm() do not contain the index, need to generate it from the data
# The object returned by plm(), i.e. PLM$model$y, has names that point to the place in the index
# Note: The index can also be done as INDEX <- DT[, j = .(id, time)], but use the longer way with index() in case plm does not preserve the order
INDEX <- data.table(index(x = pdata.frame(x = DT, index = c("id", "time")), which = NULL)) # which = NULL extracts both the individual and time indexes
INDEX[, id := as.character(id)]
INDEX[, time := as.integer(time)] # it is returned as a factor, convert back to integer to match the variable type in DT
# Generate the fitted values as the difference between the y values and the residuals
if (all(names(PLM$residuals) == names(PLM$model$y))) { # this should not be needed, but just in case...
  FIT <- data.table(
    index = as.integer(names(PLM$model$y)), # this index corresponds to the position in INDEX, from where we get the "id" and "time" below
    fit.plm = as.numeric(PLM$model$y) - as.numeric(PLM$residuals)
  )
}
FIT[, id := INDEX[index]$id]
FIT[, time := INDEX[index]$time]
# Now FIT has both the id and time variables, can match it back into the original dataset (i.e. we have the missing data accounted for)
DT <- merge(x = DT, y = FIT[, j = .(id, time, fit.plm)], by = c("id", "time"), all = TRUE) # Need all = TRUE, or some data from DT will be dropped!
I have a simplified method. The main problem here is twofold:
1) pdata.frames sort your input alphabetically by name, then year. This can be addressed by sorting your data frame first before running plm.
2) rows with NA in variables included in the formula are dropped. I handle this problem by creating a second formula that includes my id and time variables, and then using model.frame to extract the data used in the regression (it excludes NAs but now also includes id and time).
library(plm)
set.seed(100)
n <- 10 # Run with more data if you want to get closer to the "true" coefficients
DT <- data.frame(id = c("a","c","b","d","e"), time = c(1:(n / 5)), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
DT$Y = DT$x2 + 2 * DT$x3 + rnorm(n) / 10 # make Y a function of x2 and x3
DT$x3[3]=NA # add an NA to show this works with missing data
DT
# now can add drop.index = F, but note that DT is now sorted by order(id,time)
pdata.frame(DT,index=c('id','time'),drop.index = F)
# order DT to match pdata.frame that will be used for plm
DT=DT[order(DT$id,DT$time),]
# formulas
formulas =Y~x1+x2+x3
formulas_dataframe = Y~x1+x2+x3 +id+time # add id and time for model.frame
# estimate
random <- plm(formulas, data=DT, index=c("id", "time"), model="random",na.action = 'na.omit')
summary(random)
# merge predictions and model.frame
fitted = data.frame(fitted = random$model[[1]] - random$residuals)
model_data = cbind(as.data.frame(as.matrix(random$model)),fitted) # this isn't really needed but shows that input and model.frame are same
model_data = cbind(model_data,na.omit(model.frame(formulas_dataframe,DT)))
model_data
I wrote a function (predict.out.plm) to do out of sample predictions after estimating First Differences or Fixed-Effects models with plm.
The function also adds the predicted values back at the indices of the original data. This is done by using the row names saved within the plm object (attributes(plmobject)$index) and the row names within the model.matrix.
For more details, see the function posted here:
https://stackoverflow.com/a/44185441/2409896
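A rough sketch of the idea (not the linked function itself): line up the fitted values with the (individual, time) index that plm keeps for the rows it actually used, then merge that back into the original data. Here the index() accessor is used rather than digging into the attributes directly; FE.model and data.reg2 are the objects from the question above.
library(plm)
# fitted values for the rows plm actually used (response minus residuals)
fit <- as.numeric(FE.model$model[[1]]) - as.numeric(FE.model$residuals)
# index() returns the (Site.ID, date.hour) pairs for those same rows
fitted_df <- cbind(index(FE.model), fitted = fit)
# merge back into the base dataset by the id and time columns
# (note: index() returns factors, so you may need to convert types to match data.reg2)
data.reg2.fit <- merge(data.reg2, fitted_df,
                       by = c("Site.ID", "date.hour"), all.x = TRUE)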
It's been a while since this post, but I believe the easiest way to do this now would be:
Fe.model.fitted <- cbind(FE.model$model,
                         resid = FE.model$residuals,
                         fitted = plm:::fitted_exp.plm(FE.model))
The function fitted_exp.plm is not exported by the plm package but we can use the ::: to extract it.
The residuals are the deviations of the model from the value on the LHS of the formula .... which you have not shown to us. There is a fitted.panelmodel function in the 'plm' package, but it appears to expect that there will be a fitted value, which the plm function does not return by default, nor is it documented to do so, nor is there a way that I can see to make it cough one up.
library(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"))
summary(zz) # the example on the plm page:
> str(fitted(zz))
NULL
> names(zz$model)
[1] "log(gsp)" "log(pcap)" "log(pc)" "log(emp)" "unemp"
> Produc[ , c("Yvar", "Fitted")] <- cbind( zz$model[ ,"log(gsp)", drop=FALSE], zz$residuals)
> str(Produc)
'data.frame': 816 obs. of 12 variables:
$ state : Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
$ year : int 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 ...
$ pcap : num 15033 15502 15972 16406 16763 ...
$ hwy : num 7326 7526 7765 7908 8026 ...
$ water : num 1656 1721 1765 1742 1735 ...
$ util : num 6051 6255 6442 6756 7002 ...
$ pc : num 35794 37300 38670 40084 42057 ...
$ gsp : int 28418 29375 31303 33430 33749 33604 35764 37463 39964 40979 ...
$ emp : num 1010 1022 1072 1136 1170 ...
$ unemp : num 4.7 5.2 4.7 3.9 5.5 7.7 6.8 7.4 6.3 7.1 ...
$ Yvar :Classes 'pseries', 'pseries', 'integer' atomic [1:816] 10.3 10.3 10.4 10.4 10.4 ...
.. ..- attr(*, "index")='data.frame': 816 obs. of 2 variables:
.. .. ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Fitted: num -0.04656 -0.03064 -0.01645 -0.00873 -0.02708 ...
I currently have a shapefile of the UK and have plotted the population of species in different regions of the UK. So far I have just plotted 3 levels of species population and coloured them red=high, orange=med, green=low. But what I would like is a gradient plot instead of being bounded by just 3 colours.
So far I have a table called Count that has the regions as the column names and the count of species for each region below. My lowest count is 0 and my highest is around 2500, and the regions in Count match the regions in my shapefile. I have a function that determines what is high, med or low based on levels you input yourself:
High<-colnames(Count)[which(Count>'input value here')]
and then these are plotted onto the shapefile like this:
plot(ukmap[(ukmap$Region %in% High),],col='red',add=T)
Unfortunately I can't really install any packages. I was thinking of using colorRamp, but I'm not really sure what to do.
EDIT: my data looks something like this
Wales Midlands North Scotland South East South West
1 551 32 124 1 49 28
3 23 99 291 152 164 107
4 1 7 17 11 21 14
7 192 32 12 0 1 9
9 98 97 5 1 21 0
The first column is just a number that represents the species. Currently I have a function that plots the count onto a UK shapefile, but based on boundaries of high, med and low. The data above is not attached to my shapefile. I then loop through each line (species) of my data set and plot a new map for each line (species).
All right, I'll bite. I'm not going to use base R because plot is too hard for me to understand, so instead we will be using ggplot2.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
require(ggplot2)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# use the NAME_2 field (representing counties) to create data frame
uk.map <- fortify(uk, region = "NAME_2")
# create fake count data...
uk.map$count <- round(runif(nrow(uk.map), 0, 2500), 0)
# quick visual check
ggplot(uk.map, aes(x = long, y = lat, group = group, fill = count)) +
geom_polygon(colour = "black", size = 0.5, aes(group = group)) +
theme()
This generates the output below, which may be similar to what you need.
Note that we don't explicitly specify the gradient in this case - we just leave it up to ggplot. If you wish to specify those details it is possible but more involved. If you go down that route you should create another column in uk.map that allocates each count to one of (say) 10 bins using the cut function (a rough sketch of this follows the str() output below). The uk.map data frame looks like this:
> str(uk.map)
'data.frame': 427339 obs. of 8 variables:
$ long : num -2.05 -2.05 -2.05 -2.05 -2.05 ...
$ lat : num 57.2 57.2 57.2 57.2 57.2 ...
$ order: int 1 2 3 4 5 6 7 8 9 10 ...
$ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ piece: Factor w/ 234 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ group: Factor w/ 1136 levels "Aberdeen.1","Aberdeenshire.1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ id : chr "Aberdeen" "Aberdeen" "Aberdeen" "Aberdeen" ...
$ count: num 1549 1375 433 427 1282 ...
>
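For example, a rough sketch of that binning (the count_bin column name is illustrative):
# allocate each count to one of 10 bins and let ggplot colour by the bin
uk.map$count_bin <- cut(uk.map$count, breaks = seq(0, 2500, by = 250),
                        include.lowest = TRUE)
ggplot(uk.map, aes(x = long, y = lat, group = group, fill = count_bin)) +
  geom_polygon(colour = "black", size = 0.5) +
  scale_fill_brewer(palette = "RdYlGn", direction = -1)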
OK, here is an alternative solution that doesn't use ggplot (I will leave the ggplot solution for reference). This code is simple but it should be enough to give you some ideas as to how you can adapt it to your own data.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# make some fake data to plot
uk@data$count <- round(runif(nrow(uk@data), 0, 2500), 0)
uk@data$count <- as.numeric(uk@data$count)
# and plot it
plot(uk, col = gray(uk@data$count/2500))
The result of the code is the following plot.
EDIT: following a request to include a legend, I have tweaked the code a little, but in all honesty I don't understand base R's legend function well enough to get something of production quality, and I have no wish to research it further. (Incidentally, hat tip to this question for ideas.) A look at the plot beneath the code suggests that we need to reorder the legend colours etc., but I will leave that to the original poster as an exercise or to post as another question.
# UK shapefile found via http://www.gadm.org/download
uk.url <- "http://www.filefactory.com/file/s3dz3jt3vr/n/GBR_adm_zip"
# replace following with your working directory - no trailing slash
work.dir <- "C:/Temp/r.temp/gb_map"
# the full file path for storing file
file.loc <- paste0(work.dir, "/uk.zip")
download.file (uk.url, destfile = file.loc, mode = "wb")
unzip(file.loc, exdir = work.dir)
# open the shapefile
require(rgdal)
uk <- readOGR(work.dir, layer = "GBR_adm2")
# make some fake data to plot
uk@data$count <- as.numeric(round(runif(nrow(uk@data), 0, 2500), 0))
uk@data$bin <- cut(uk@data$count, seq(0, 2500, by = 250),
                   include.lowest = TRUE, dig.lab = 4)
# labels for the legend
lev = levels(uk@data$bin)
lev2 <- gsub("\\,", " to ", lev)
lev3 <- gsub("\\]$", "", lev2)
lev4 <- gsub("\\(|\\)", " ", lev3)
lev5 <- gsub("^\\[", " ", lev4)
my.levels <- lev5
# Create a function to generate a continuous color palette
rbPal <- colorRampPalette(c('red','blue'))
uk@data$Col <- rbPal(10)[as.numeric(cut(uk@data$count, seq(0, 2500, by = 250)))]
# Plot
plot(uk, col = uk@data$Col)
legend("topleft", fill = uk@data$Col, legend = my.levels, col = uk@data$Col)
Have you tried colorRampPalette?
Here is how you could try to build a gradient palette
gradient_color <- colorRampPalette(c("blue", "red"))
gradient_color(10)
[1] "#0000FF" "#1C00E2" "#3800C6" "#5500AA" "#71008D" "#8D0071" "#AA0055"
[8] "#C60038" "#E2001C" "#FF0000"
An example plot
plot(rep(1,10),col=gradient_color(10))
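To tie this back to the map, one possible way to apply the palette to the counts (illustrative only; region_counts is assumed to be a numeric vector of counts in the same order as the regions in ukmap):
# map each count into one of 10 bins and pick the matching gradient colour
n_bins <- 10
cols <- gradient_color(n_bins)[as.integer(cut(region_counts, breaks = n_bins))]
plot(ukmap, col = cols)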