Download, Plot map, and extract data in R

I downloaded monthly data from [NASA data][1] and saved it in .txt and .asc formats. I am trying to plot and extract the data from the ASCII file, but so far I have been unable to do so. I tried the following:
1.
infile <- "OMI/L3feb09.txt"
data <- as.matrix(read.table(infile, skip = 3, header = FALSE, sep = "\t"))
data[data == -9999] = NA
rr <- raster(data, crs = "+init=epsg:4326")
extent(rr) = c(179.375, 179.375+1.25*288, -59.5, -59.5+1*120)
I then tried to extract the data for Australia:
adm <- getData("GADM", country="AUS", level=1)
rr = mask(rr, adm)
plot(rr)
2.
library(rgdal)
r = raster("OMI/L3feb09.txt")
plot(r)
3.
library(raster)
r = raster("OMI/L3feb09.txt")
plot(r)
4. I also tried:
df1 <- read.table("OMI/L3feb09.txt", skip = 11, header = FALSE, sep = "\t")
I also tried the approaches from:
Stackoverflow link 1
Stackoverflow link 2
The problem is that there are strings in between the numbers in the file, such as "lat = -55.5".
I would appreciate any kind of help. Thank you.
[2]: https://stackoverflow.com/questions/42064943/opening-an-ascii-file-using-r

So, I downloaded one file and played around with it. It is not the best solution, but I hope it gives you an idea.
library(stringr)
# read data
data<-read.csv("L3_tropo_ozone_column_oct04",header = FALSE, skip = 3,sep = "")
# sep = "" will separate "lat = -59.5" into 3 rows, which makes it easier to remove.
# Also, each block of rows in the data frame bounded by two "lat" rows holds the data for the later "lat".
lat_index<-which(data[,1]=="lat")
# you need the last row that contains data, not the "lat" string
lat_index<-lat_index-1
#define an empty array for results.
result<-array(NA, dim = c(120,288),dimnames = list(lat=seq(-59.5,59.5,1),
lon=seq(-179.375,179.375,1.25)))
I assumed that the run of digits for each latitude splits evenly into groups of three characters, giving 288 values, which equals the number of longitude grid points. Correct me if I'm wrong.
# function to split a string into a vector in which each element has n letters/numbers
split_n_parts<-function(input_string,n){
# dissolve the string into single characters
input_string_1<-unlist(str_extract_all(input_string,boundary("character")))
output_string<-vector(length = length(input_string_1)/n)
for ( x in 1:length(output_string)){
# join the n characters that make up the x-th value (was hard-coded to 3)
output_string[x]<-paste0(input_string_1[((x-1)*n+1):(x*n)], collapse = "")
}
return(as.numeric(output_string))
}
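The same fixed-width split can be sketched without a loop in base R. The helper name split_fixed_width and the sample string below are made up for illustration; it assumes, as split_n_parts does, that the string length is an exact multiple of n.

```r
# Hypothetical helper (not part of the original answer): split a string of
# digits into consecutive fixed-width fields and convert each to a number.
split_fixed_width <- function(input_string, n = 3) {
  # Assumption: the string length is an exact multiple of the field width
  stopifnot(nchar(input_string) %% n == 0)
  starts <- seq(1, nchar(input_string), by = n)
  as.numeric(substring(input_string, starts, starts + n - 1))
}

# Made-up example: four three-digit fields
demo_values <- split_fixed_width("275301999042")
print(demo_values)
```

Because substring() is vectorised over its start and stop arguments, the whole row is split in one call rather than character by character.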
Here, the code loops over the latitudes, collects the rows belonging to each one, and writes the values into the result array:
# loop over the rows bounded by two lats, process them and assign to the array
for (i in 1:length(lat_index)){
if(i ==1){
for(j in 1:lat_index[i]){
if(j==1){
row_j<-paste0(data[j,])
}else{
row_j<-paste0(row_j,data[j,])
}
}
}else{
ii<-i-1
lower_limit<-lat_index[ii]+4
upper_limit<-lat_index[i]
for(j in lower_limit:upper_limit){
if(j==lower_limit){
row_j<-paste0(data[j,])
}else{
row_j<-paste0(row_j,data[j,])
}
}
}
result[i,]<-split_n_parts(row_j,3)
}
Here is the final array, plotted as an image:
#plot as image
image(result)
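Calling image() on the bare matrix draws it on a 0 to 1 scale with rows along the x-axis. A minimal sketch of passing the coordinates explicitly is shown here, using a synthetic matrix (demo) in place of result; the grid vectors mirror the dimnames defined earlier.

```r
# Synthetic stand-in for `result`: rows are latitudes, columns are longitudes
lat <- seq(-59.5, 59.5, by = 1)           # 120 grid rows
lon <- seq(-179.375, 179.375, by = 1.25)  # 288 grid columns
demo <- outer(lat, lon, function(la, lo) 300 * cos(la * pi / 180))

# image() expects z[i, j] to be the value at (x[i], y[j]),
# so the lat x lon matrix is transposed to lon x lat
image(x = lon, y = lat, z = t(demo), xlab = "Longitude", ylab = "Latitude")
```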
EDIT: To continue the solution and put the end-result:
# because the data are in Dobson units x 10
result<-result/10
# melt to a data frame
library(plyr)
result_df<-adply(result, c(1,2))
result_df$lat<-as.numeric(as.character(result_df$lat))
result_df$lon<-as.numeric(as.character(result_df$lon))
# plotting
library(maps)
library(ggplot2)
library(tidyverse)
world_map <- map_data("world")
#colors
jet.colors <-colorRampPalette(c("white", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))
ggplot() +
geom_raster(data=result_df,aes(fill=V1,x=lon,y=lat))+
geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
fill=NA, colour = "black")+
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))+
scale_fill_gradientn(colors = jet.colors(7))
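If you would rather not load plyr just for the melt step, a base R sketch of the same reshaping is given below. The tiny matrix m is made up; the column name V1 is chosen only to match what the ggplot call above expects.

```r
# Made-up 2 x 3 matrix with named dimnames, mimicking the layout of `result`
m <- matrix(1:6, nrow = 2,
            dimnames = list(lat = c(-59.5, -58.5),
                            lon = c(-179.375, -178.125, -176.875)))

# as.table() + as.data.frame() gives one row per cell (long format)
long_df <- as.data.frame(as.table(m), responseName = "V1",
                         stringsAsFactors = FALSE)

# dimnames are stored as text, so convert the coordinates back to numbers
long_df$lat <- as.numeric(long_df$lat)
long_df$lon <- as.numeric(long_df$lon)
print(long_df)
```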

Related

Why does spplot take so much time for multiple panels

I am plotting multiple shapefiles using spplot. Here is some data to reproduce the problem:
library(raster)
library(randomcoloR)
my.shp <- getData('GADM', country = 'BRA', level = 2)
my.shp$ID<- 1:nrow(my.shp)
My data consists of a variable X for 10 years, where each column is a year:
df <- matrix(sample(100:5000, 55040, replace = T), nrow = 5504, ncol = 10)
df <- data.frame(ID = 1:nrow(my.shp), df)
my.dat <- merge(my.shp, df, by = "ID")
variable.names <- paste0("X",1:10)
spplot(my.dat, rev(variable.names), col = NA, at = seq(from = 100, to = 5000, by = 500),
col.regions = distinctColorPalette(length(seq(from = 100, to = 5000, by = 500))),
main = list(label = "TEST"))
My problem is that this plot takes a very long time (around an hour) to render, and I was wondering whether there is something inherently wrong in the code that makes it so slow. My laptop has 32 GB of RAM.
Thanks

I haven't compared this plot to your spplot because I don't want to spend an hour waiting for it.
Instead I'm proposing to use library(mapdeck) to plot an interactive map, which takes a matter of seconds.
Two things to note
You need a Mapbox Access token
You need to convert the sp object to sf
library(raster)
my.shp <- getData('GADM', country = 'BRA', level = 2)
my.shp$ID <- 1:nrow(my.shp)
df <- matrix(sample(100:5000, 55040, replace = T), nrow = 5504, ncol = 10)
df <- data.frame(ID = 1:nrow(my.shp), df)
my.dat <- merge(my.shp, df, by = "ID")
library(sf)
sf <- sf::st_as_sf( my.dat )
library(mapdeck)
set_token( "YOUR_MAPBOX_TOKEN" )
mapdeck() %>%
add_sf(
data = sf
, fill_colour = "GID_2"
)

Are you willing/able to switch to sf instead of sp?
The sf plot function is considerably faster than spplot, although the layout differs a bit.
library(sf)
my.dat_sf <- st_as_sf(my.dat)
plot(my.dat_sf[rev(variable.names)], max.plot=10, breaks=c(seq(from = 100, to = 5000, by = 500),5000),
pal = distinctColorPalette(length(seq(from = 100, to = 5000, by = 500))),
main = "TEST", border=NA, key.pos=4)
Additionally, you could try to simplify the polygons with rmapshaper::ms_simplify() for Spatial* objects or sf::st_simplify() for SimpleFeatures, which lets you reduce the object size by quite a bit, depending on the given dTolerance. Plotting will therefore also be faster with simplified polygons.
The original SpatialPolygon:
format(object.size(my.dat_sf), units="Kb")
"25599.2 Kb"
and a simplified SimpleFeature:
dat_sf_simple <- st_transform(my.dat_sf, crs = 3035)
dat_sf_simple <- st_simplify(dat_sf_simple, dTolerance = 1000, preserveTopology = T)
dat_sf_simple <- st_transform(dat_sf_simple, crs = 4326)
format(object.size(dat_sf_simple), units="Kb")
"7864.2 Kb"
Plot the simplified SimpleFeature, which takes about 1 minute on my machine with 8GB RAM.
plot(dat_sf_simple[rev(variable.names)], max.plot=10, breaks=c(seq(from = 100, to = 5000, by = 500),5000),
pal = distinctColorPalette(length(seq(from = 100, to = 5000, by = 500))),
main = "TEST", border=NA, key.pos=4)
You could also try ggplot2, but I am fairly sure the sf plot will be the most performant solution.
library(ggplot2)
library(dplyr)
library(tidyr)
dat_sf_simple_gg <- dat_sf_simple %>%
dplyr::select(rev(variable.names), geometry) %>%
gather(VAR, SID, -geometry)
ggplot() +
geom_sf(data = dat_sf_simple_gg, aes(fill=SID)) +
facet_wrap(~VAR, ncol = 2)

Partial Row Labels Heatmap - R

I was wondering if anyone knows of a package that allows partial row labeling of heatmaps. I am currently using pheatmap() to construct my heatmaps, but I can use any package that has this functionality.
I have plots with many rows of differentially expressed genes and I would like to label a subset of them. There are two main things to consider (that I can think of):
The placement of the text annotation depends on the height of the row. If the rows are too narrow, then the text label will be ambiguous without some sort of pointer.
If multiple adjacent rows are significant (i.e. will be labelled), then these will need to be offset, and again, a pointer will be needed.
Below is an example of a partial solution that really only gets maybe halfway there, but I hope illustrates what I'd like to be able to do.
set.seed(1)
require(pheatmap)
require(RColorBrewer)
require(grid)
### Data to plot
data_mat <- matrix(sample(1:10000, 300), nrow = 50, ncol = 6)
rownames(data_mat) <- paste0("Gene", 1:50)
colnames(data_mat) <- c(paste0("A", 1:3), paste0("B", 1:3))
### Set how many genes to annotate
### TRUE - make enough labels that some overlap
### FALSE - no overlap
tooMany <- T
### Select a few genes to annotate
if (tooMany) {
sigGenes_v <- paste0("Gene", c(5,20,26,42,47,16,28))
newMain_v <- "Too Many Labels"
} else {
sigGenes_v <- paste0("Gene", c(5,20,26,42))
newMain_v <- "OK Labels"
}
### Make color list
colors_v <- brewer.pal(8, "Dark2")
colors_v <- colors_v[c(1:length(sigGenes_v), 8)]
names(colors_v) <- c(sigGenes_v, "No")
annColors_lsv <- list("Sig" = colors_v)
### Column Metadata
colMeta_df <- data.frame(Treatment = c(rep("A", 3), rep("B", 3)),
Replicate = c(rep(1:3, 2)),
stringsAsFactors = F,
row.names = colnames(data_mat))
### Row metadata
rowMeta_df <- data.frame(Sig = rep("No", 50),
stringsAsFactors = F,
row.names = rownames(data_mat))
for (gene_v in sigGenes_v) rowMeta_df[rownames(rowMeta_df) == gene_v, "Sig"] <- gene_v
### Heatmap
heat <- pheatmap(data_mat,
annotation_row = rowMeta_df,
annotation_col = colMeta_df,
annotation_colors = annColors_lsv,
cellwidth = 10,
main = "Original Heat")
### Get order of genes after clustering
genesInHeatOrder_v <- heat$tree_row$labels[heat$tree_row$order]
whichSigInHeatOrder_v <- which(genesInHeatOrder_v %in% sigGenes_v)
whichSigInHeatOrderLabels_v <- genesInHeatOrder_v[whichSigInHeatOrder_v]
sigY <- 1 - (0.02 * whichSigInHeatOrder_v)
### Change title
whichMainGrob_v <- which(heat$gtable$layout$name == "main")
heat$gtable$grobs[[whichMainGrob_v]] <- textGrob(label = newMain_v,
gp = gpar(fontsize = 16))
### Remove rows
whichRowGrob_v <- which(heat$gtable$layout$name == "row_names")
heat$gtable$grobs[[whichRowGrob_v]] <- textGrob(label = whichSigInHeatOrderLabels_v,
y = sigY,
vjust = 1)
grid.newpage()
grid.draw(heat)
Here are a few outputs:
original heatmap:
ok labels:
ok labels, with flags:
too many labels
too many labels, with flags
The "with flags" outputs are the desired final results.
I just saved these as images from the Rstudio plot viewer. I recognize that I could save them as pdfs and provide a larger file size to get rid of the label overlap, but then the individual cells would be larger than I want.

Based on your code, you seem fairly comfortable with gtables & grobs. A (relatively) straightforward way to achieve the look you want is to zoom in on the row label grob, & make some changes there:
replace unwanted labels with "";
evenly spread out labels within the available space;
add line segments joining the old and new label positions.
I wrote a wrapper function for this, which works as follows:
# heat refers to the original heatmap produced from the pheatmap() function
# kept.labels should be a vector of labels you wish to show
# repel.degree is a number in the range [0, 1], controlling how much the
# labels are spread out from one another
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 0)
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 0.5)
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 1)
Function (explanations in annotations):
add.flag <- function(pheatmap,
kept.labels,
repel.degree) {
# repel.degree = number within [0, 1], which controls how much
# space to allocate for repelling labels.
## repel.degree = 0: spread out labels over existing range of kept labels
## repel.degree = 1: spread out labels over the full y-axis
heatmap <- pheatmap$gtable
new.label <- heatmap$grobs[[which(heatmap$layout$name == "row_names")]]
# keep only labels in kept.labels, replace the rest with ""
new.label$label <- ifelse(new.label$label %in% kept.labels,
new.label$label, "")
# calculate evenly spaced out y-axis positions
repelled.y <- function(d, d.select, k = repel.degree){
# d = vector of distances for labels
# d.select = vector of T/F for which labels are significant
# recursive function to get current label positions
# (note the unit is "npc" for all components of each distance)
strip.npc <- function(dd){
if(!"unit.arithmetic" %in% class(dd)) {
return(as.numeric(dd))
}
d1 <- strip.npc(dd$arg1)
d2 <- strip.npc(dd$arg2)
fn <- dd$fname
return(lazyeval::lazy_eval(paste(d1, fn, d2)))
}
full.range <- sapply(seq_along(d), function(i) strip.npc(d[i]))
selected.range <- sapply(seq_along(d[d.select]), function(i) strip.npc(d[d.select][i]))
return(unit(seq(from = max(selected.range) + k*(max(full.range) - max(selected.range)),
to = min(selected.range) - k*(min(selected.range) - min(full.range)),
length.out = sum(d.select)),
"npc"))
}
new.y.positions <- repelled.y(new.label$y,
d.select = new.label$label != "")
new.flag <- segmentsGrob(x0 = new.label$x,
x1 = new.label$x + unit(0.15, "npc"),
y0 = new.label$y[new.label$label != ""],
y1 = new.y.positions)
# shift position for selected labels
new.label$x <- new.label$x + unit(0.2, "npc")
new.label$y[new.label$label != ""] <- new.y.positions
# add flag to heatmap
heatmap <- gtable::gtable_add_grob(x = heatmap,
grobs = new.flag,
t = 4,
l = 4
)
# replace label positions in heatmap
heatmap$grobs[[which(heatmap$layout$name == "row_names")]] <- new.label
# plot result
grid.newpage()
grid.draw(heatmap)
# return a copy of the heatmap invisibly
invisible(heatmap)
}
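To see what repel.degree does, the spacing rule inside repelled.y can be tried on plain numbers, without any grid units. This is only a sketch: the function name spread_positions and the made-up row positions are not part of the answer above.

```r
# Hypothetical, unit-free version of the spacing rule used in repelled.y
spread_positions <- function(full, selected, k) {
  sel <- full[selected]
  seq(from = max(sel) + k * (max(full) - max(sel)),
      to   = min(sel) - k * (min(sel) - min(full)),
      length.out = sum(selected))
}

# Made-up y positions for 50 rows (top to bottom) and four labelled rows
full <- seq(0.99, 0.01, length.out = 50)
selected <- seq_along(full) %in% c(20, 21, 22, 40)

p0 <- spread_positions(full, selected, k = 0)  # stays within the labelled rows' range
p1 <- spread_positions(full, selected, k = 1)  # uses the full y-axis
print(rbind(p0, p1))
```

With k = 0 the labels are spaced evenly between the highest and lowest labelled rows; with k = 1 they are spaced evenly between the top and bottom of the heatmap.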

Visualization issue while using Leaflet

I have two data frames as below:
PickUP <- data.frame(pickuplong = c(-73.93909 ,-73.94189 ,-73.93754,-73.91638,-73.92792 ,-73.88634), pickuplat =c(40.84408,40.83841,40.85311,40.84966,40.86284,40.85628))
Dropoff <- data.frame(pickuplong = c(-73.93351 ,-73.93909 ,-73.93909 ,-73.80747,-73.95722,-73.91880), pickuplat =c(40.76621,40.84408,40.85311,40.69951,40.68877,40.75917), Droplong =c(-73.91300,-73.96259 ,-73.94870,-73.93860,-73.93633, -73.90690), Droplat =c(40.77777,40.77488 ,40.78493,40.84463,40.75977,40.77013))
I am trying to find the pickup coordinates (longitude and latitude) in the pickup data frame that are repeated in the dropoff data frame. I have the code below, but it gives me an error:
library(sp)
library(rgdal)
library(leaflet)
library(mapview)
library(dplyr)
a <- semi_join(Dropoff , PickUP , by = c("pickuplong","pickuplat"))
a$ID <- 1:nrow(a)
Dropoff_p <- a[, c("ID", "Pickup_longitude", "Pickup_latitude")]
Dropoff_d <- a[, c("ID", "Dropoff_longitude", "Dropoff_latitude")]
coordinates(Dropoff_p) <- ~Pickup_longitude + Pickup_latitude
coordinates(Dropoff_d) <- ~Dropoff_longitude + Dropoff_latitude
proj4string(Dropoff_p) <- CRS("+init=epsg:4326")
proj4string(Dropoff_d) <- CRS("+init=epsg:4326")
map_p <- mapview(Dropoff_p, color = "red")
map_d <- mapview(Dropoff_d, color = "blue")
map_p + map_d
My error is:
Error in `$<-.data.frame`(`*tmp*`, "ID", value = c(1L, 0L)) :
  replacement has 2 rows, data has 0
Error during wrapup: cannot open the connection

When subsetting the data frame, you have to use the same column names that the data frame actually has. I changed the column names used for Dropoff_p, Dropoff_d, coordinates(Dropoff_p), and coordinates(Dropoff_d), and then your script works.
In addition, the mapview package has just been updated. If you want, you can update mapview to version 2.0.1. You can also add col.regions = "red" and col.regions = "blue", because under the new version the color argument seems to change only the outline of a point. To change the fill color, use col.regions.
library(sp)
library(rgdal)
library(leaflet)
library(mapview)
library(dplyr)
a <- semi_join(Dropoff , PickUP , by = c("pickuplong","pickuplat"))
a$ID <- 1:nrow(a)
Dropoff_p <- a[, c("ID", "pickuplong", "pickuplat")]
Dropoff_d <- a[, c("ID", "Droplong", "Droplat")]
coordinates(Dropoff_p) <- ~pickuplong + pickuplat
coordinates(Dropoff_d) <- ~Droplong + Droplat
proj4string(Dropoff_p) <- CRS("+init=epsg:4326")
proj4string(Dropoff_d) <- CRS("+init=epsg:4326")
map_p <- mapview(Dropoff_p, color = "red", col.regions = "red")
map_d <- mapview(Dropoff_d, color = "blue", col.regions = "blue")
map_p + map_d
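The value = c(1L, 0L) and "data has 0" parts of the error message are consistent with the join returning zero rows, in which case 1:nrow(a) counts down from 1 to 0 instead of being empty. A small sketch of that behaviour, using a made-up empty data frame in place of a:

```r
# Made-up empty result, standing in for a semi_join() that matched nothing
empty <- data.frame(pickuplong = numeric(0), pickuplat = numeric(0))

bad_ids  <- 1:nrow(empty)         # 1:0 counts down, giving two values
good_ids <- seq_len(nrow(empty))  # seq_len(0) is a zero-length vector

print(bad_ids)
print(length(good_ids))

# Assigning the zero-length vector works, whereas empty$ID <- bad_ids would fail
empty$ID <- good_ids
```

Using seq_len(nrow(a)) and checking nrow(a) > 0 before mapping makes the empty case explicit.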

Saving multiple pdf plots r

I have made a loop for making multiple plots, but I have no way of saving them. My code looks like this:
#----------------------------------------------------------------------------------------#
# RING data: Mikkel
#----------------------------------------------------------------------------------------#
# Set working directory
setwd()
#### Read data & Converting factors ####
dat <- read.table("Complete RING.txt", header =TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$Fly <- as.factor(dat$Fly)
dat$Temp <- as.factor(dat$Temp)
str(dat)
datSUM <- summaryBy(X0.5_sec+X1_sec+X1.5_sec+X2_sec+X2.5_sec+X3_sec~Vial_nr+Concentration+Sex+Line+Vial+Temp,data=dat, FUN=sum)
fl<-levels(datSUM$Line)
colors = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
meltet <- melt(datSUM, id=c("Concentration","Sex","Line","Vial", "Temp", "Vial_nr"))
levels(meltet$variable) <- c('0,5 sec', '1 sec', '1,5 sec', '2 sec', '2,5 sec', '3 sec')
meltet20 <- subset(meltet, Line=="20")
meltet20$variable <- as.factor(meltet20$variable)
AllConcentrations <- levels(meltet20$Concentration)
for (i in AllConcentrations) {
meltet.i <- meltet20[meltet20$Concentration ==i,]
quartz()
print(dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above buttom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2))) }
I have tried the quartz.save function, but that just overwrites the files. I'm using a Mac, if that makes any difference.

When I want to save multiple plots in a loop I tend to do something like...
for(i in AllConcentrations){
meltet.i <- meltet20[meltet20$Concentration ==i,]
pdf(paste("my_filename", i, ".pdf", sep = ""))
# lattice plots must be print()ed explicitly inside a loop, otherwise the PDF stays empty
print(dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above buttom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2)))
dev.off()
}
This will create a PDF file for every level in AllConcentrations and save it in your working directory. It pastes together my_filename, the current level i, and .pdf to make each file name unique. Of course, you will want to adjust height and width in the pdf() call.
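As a self-contained sketch of the same pattern, the example below uses base graphics and made-up levels (demo_levels) so it runs without the original data. It writes to tempdir(), and the file names are hypothetical. It also shows a second option: one multi-page PDF instead of one file per level.

```r
# Made-up stand-ins for the concentration levels
demo_levels <- c("0", "5", "10")
out_dir <- tempdir()  # write somewhere harmless for the demo

# Option 1: one file per level, with the level embedded in the file name
files <- file.path(out_dir, sprintf("line20_conc_%s.pdf", demo_levels))
for (k in seq_along(demo_levels)) {
  pdf(files[k], width = 7, height = 5)
  plot(1:10, (1:10) + as.numeric(demo_levels[k]),
       main = paste("Line 20 concentration", demo_levels[k]))
  dev.off()  # closing the device is what actually writes the file
}

# Option 2: a single multi-page PDF, one page per level
combined <- file.path(out_dir, "line20_all_concentrations.pdf")
pdf(combined, width = 7, height = 5)
for (lv in demo_levels) {
  plot(1:10, (1:10) + as.numeric(lv), main = paste("Line 20 concentration", lv))
}
dev.off()
```

With lattice or ggplot2 objects, wrap the plotting call in print() inside the loop.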

Add a label to map at each leg start

I'm plotting the legs of a route on a ggmap. It works okay so far. I've been trying to add a label containing the order (n from the loop) of each leg.
I've tried adding + geom_text to the geom_leg() call, but I get the error:
Error in geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat), :
non-numeric argument to binary operator
I'd appreciate any help adding a label to indicate the leg.
Data :
structure(c("53.193418", "53.1905138631287", "53.186744", "53.189836",
"53.1884117", "53.1902965", "53.1940384", "53.1934748", "53.1894004",
"53.1916771", "-2.881248", "-2.89043889005541", "-2.890165",
"-2.893896", "-2.88802", "-2.8919373", "-2.8972299", "-2.8814698",
"-2.8886692", "-2.8846099"), .Dim = c(10L, 2L))
Function :
create.map<-function(lst){
library("ggmap")
cncat<-c(paste(lst[,1],lst[,2],sep=","))
df2<-data.frame(cncat)
leg <-function(start, dest, order){
r<- route(from=start,to=dest,mode = c("walking"),structure = c("legs"))
c<- geom_leg(aes(x = startLon, y = startLat,xend = endLon, yend = endLat),
alpha = 2/4, size = 2, data = r,colour = 'blue')+
geom_text(aes(label = order), size = 3)
return (c)
}
a<-qmap('Chester, UK', zoom = 15, maptype = 'road')
for (n in 1:9){
l<-leg(as.character(df2[n,1]), as.character(df2[n+1,1]),n)
a<-a+l
}
a
}

Is this close? (Note: this calls your list of points way.points).
way.points <- as.data.frame(way.points,stringsAsFactors=FALSE)
library(ggmap)
rte.from <- apply(way.points[-nrow(way.points),],1,paste,collapse=",")
rte.to <- apply(way.points[-1,],1,paste,collapse=",")
rte <- do.call(rbind,
mapply(route, rte.from, rte.to, SIMPLIFY=FALSE,
MoreArgs=list(mode="walking",structure="legs")))
coords <- rbind(as.matrix(rte[,7:8]),as.matrix(rte[nrow(rte),9:10]))
coords <- as.data.frame(coords)
ggm <- qmap('Chester, UK', zoom = 15, maptype = 'road')
ggm +
geom_path(data=coords,aes(x=startLon,y=startLat),color="blue",size=2)+
geom_point(data=way.points,aes(x=as.numeric(V2),y=as.numeric(V1)),
size=10,color="yellow")+
geom_text(data=way.points,
aes(x=as.numeric(V2),y=as.numeric(V1), label=seq_along(V1)))
So this assembles vectors of from and to coordinates using apply(...), then uses mapply(...) to call route(...) with both vectors, returning the overall list of coordinates in a data frame rte. Because the coordinates are stored as, e.g., $startLat and $endLat, we form a coords data frame by appending the final $endLon and $endLat to rte to get the very last leg of the route. Then we use geom_path(...) to draw the path in one step. Finally, we use geom_text(...) with x and y values from the original way.points data frame, and we use geom_point(...) just to make them stand out a bit.
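The from/to pairing can be inspected on its own, without calling route(). The sketch below uses only the first three way points from the question; the legs data frame is added purely for illustration.

```r
# First three way points from the question, as character lat/lon columns
way.points <- data.frame(
  V1 = c("53.193418", "53.1905138631287", "53.186744"),
  V2 = c("-2.881248", "-2.89043889005541", "-2.890165"),
  stringsAsFactors = FALSE
)

# Every point except the last is a start; every point except the first is an end
rte.from <- apply(way.points[-nrow(way.points), ], 1, paste, collapse = ",")
rte.to   <- apply(way.points[-1, ], 1, paste, collapse = ",")

# Illustrative summary: one row per leg, numbered in order
legs <- data.frame(leg = seq_along(rte.from),
                   from = unname(rte.from),
                   to = unname(rte.to),
                   stringsAsFactors = FALSE)
print(legs)
```

N way points give N - 1 legs, and the end of each leg is the start of the next.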

Here's a bare bones solution. I just added the labels to the finished ggmap object a. If you replace the line
a
with
lst2 <- data.frame(cbind(lst, leg = as.character(1:10)))
names(lst2) <- c("lat", "lon", "leg")
a <- a + geom_text(data=lst2,aes(x=lon,y=lat,label=leg),size=5, vjust = 0, hjust = -0.5)
return(a)
in your create.map function, you should get (roughly) the desired result. I might have reversed the lat and lon variables, and you probably want to tweak the size, location, etc. Hope this helps!
