R: How to create multiple maps (rworldmap) using apply?

I want to create multiple maps (similar to this example) using the apply family. Here is a small sample of my data (the full data frame is about 200 rows x 150 columns). UN and ISO3 are the country codes used by rworldmap:
df <- structure(list(BLUE.fruits = c(12803543,
3745797, 19947613, 0, 130, 4), BLUE.nuts = c(21563867, 533665,
171984, 0, 0, 0), BLUE.veggies = c(92690, 188940, 34910, 0, 0,
577), GREEN.fruits = c(3389314, 15773576, 8942278, 0, 814, 87538
), GREEN.nuts = c(6399474, 1640804, 464688, 0, 0, 0), GREEN.veggies = c(15508,
174504, 149581, 0, 0, 6190), UN = structure(c(4L, 5L, 1L, 6L,
2L, 3L), .Label = c("12", "24", "28", "4", "8", "n/a"), class = "factor"),
ISO3 = structure(c(1L, 3L, 6L, 4L, 2L, 5L), .Label = c("AFG",
"AGO", "ALB", "ASM", "ATG", "DZA"), class = "factor")), .Names = c("BLUE.fruits", "BLUE.nuts", "BLUE.veggies", "GREEN.fruits", "GREEN.nuts",
"GREEN.veggies", "UN", "ISO3"), row.names = c(97L, 150L, 159L,
167L, 184L, 191L), class = "data.frame")
and the code I used before to plot one single map:
library(rworldmap)
mapDevice('x11')
spdf <- joinCountryData2Map(df, joinCode="ISO3", nameJoinColumn="ISO3")
mapWF <- mapCountryData(spdf, nameColumnToPlot="BLUE.nuts",
catMethod="quantiles")
Note: in mapCountryData() I used the name of a single column (in this case "BLUE.nuts"). My question is: is there a way to apply this mapping code to the different columns, creating six different maps? Either in one multi-panel figure using layout() or, even better, as six separate plots that are saved under their column names. Ideas? Thanks a lot in advance.

You are close.
Add this to save one plot per column.
# put the names of the columns to plot in a vector
col_names <- names(df)[1:6]
lapply(col_names, function(x) {
  # open a pdf device named after the column
  pdf(paste0(x, '.pdf'))
  # plot the map
  mapCountryData(spdf, nameColumnToPlot = x)
  # close the pdf device
  dev.off()
})
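The same pattern can be wrapped so that the graphics device is closed even if a plot call fails. This is only a minimal sketch under stated assumptions: save_column_plots and plot_fun are hypothetical names introduced here, and a base-R barplot stands in for mapCountryData() so that the example runs without rworldmap.

```r
# Hypothetical helper: writes one PDF per column, named after the column.
save_column_plots <- function(data, cols, plot_fun, out_dir = tempdir()) {
  paths <- vapply(cols, function(col) {
    path <- file.path(out_dir, paste0(col, ".pdf"))
    pdf(path)            # open a PDF device for this column
    on.exit(dev.off())   # close it even if plot_fun() throws an error
    plot_fun(data, col)
    path
  }, character(1))
  invisible(paths)
}

# Stand-in data and plotting function (assumptions, not the original map code)
demo <- data.frame(BLUE.fruits = c(3, 1, 2), BLUE.nuts = c(5, 0, 4))
paths <- save_column_plots(
  demo, names(demo),
  function(d, col) barplot(d[[col]], main = col)
)
```

With rworldmap, plot_fun would presumably become function(d, col) mapCountryData(d, nameColumnToPlot = col), called on spdf with the column names from col_names.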

Related

ggraph edges are connecting wrong?

I am working on a hierarchical edge bundling plot where each edge's color, transparency and thickness should vary with the pvalue column of my connect data frame. However, the color, transparency and thickness of the edges in the plot I generated don't always map to the values in that column. For example, subgroup1 and subgroup4 should have the strongest, thickest connection (their pvalue is on the order of E-280), but they don't; instead the connection between subgroup3 and subgroup4 looks strongest.
This data generates a reproducible example:
> dput(vertices)
structure(list(name = structure(c(3L, 1L, 2L, 4L, 5L, 6L, 7L), .Label = c("gp1",
"gp2", "origin", "subgroup1", "subgroup2", "subgroup3", "subgroup4"
), class = "factor"), id = c(NA, NA, NA, 1L, 2L, 3L, 4L), angle = c(NA,
NA, NA, 0, -90, 0, -90), hjust = c(NA, NA, NA, 1, 1, 1, 1)), row.names = c(NA,
-7L), class = "data.frame")
> dput(hierarchy)
structure(list(from = structure(c(3L, 3L, 1L, 1L, 2L, 2L), .Label = c("gp1",
"gp2", "origin"), class = "factor"), to = structure(1:6, .Label = c("gp1",
"gp2", "subgroup1", "subgroup2", "subgroup3", "subgroup4"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
> dput(connect)
structure(list(from = structure(c(1L, 1L, 2L, 3L, 1L, 2L, 3L,
1L), .Label = c("subgroup1", "subgroup2", "subgroup3"), class = "factor"),
to = structure(c(1L, 2L, 2L, 1L, 3L, 3L, 3L, 3L), .Label = c("subgroup2",
"subgroup3", "subgroup4"), class = "factor"), pvalue = c(1.68e-204,
1.59e-121, 9.32e-73, 9.32e-73, 1.59e-21, 9.32e-50, 9.32e-40,
9.32e-280)), class = "data.frame", row.names = c(NA, -8L))
and this is the code I used to make this example plot:
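library(igraph)  # graph_from_data_frame()
library(ggraph)  # ggraph(), geom_conn_bundle(), get_con()
from <- match( connect$from, vertices$name)
to <- match( connect$to, vertices$name)
col <- connect$pvalue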
#Let's add information concerning the label we are going to add: angle, horizontal adjustment and potential flip
#calculate the ANGLE of the labels
vertices$id <- NA
myleaves <- which(is.na( match(vertices$name, hierarchy$from) ))
nleaves <- length(myleaves)
vertices$id[ myleaves ] <- seq(1:nleaves)
vertices$angle <- 90 - 360 * vertices$id / nleaves
# calculate the alignment of labels: right or left
# If I am on the left part of the plot, my labels have currently an angle < -90
vertices$hjust <- ifelse( vertices$id < 41, 1, 0)
# flip the angle by 180 degrees where needed to keep the labels readable
vertices$angle <- ifelse(vertices$angle < -90, vertices$angle+180, vertices$angle)
mygraph <- graph_from_data_frame( hierarchy, vertices=vertices )
ggraph(mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_node_point(aes(filter = leaf, x = x*1.05, y = y*1.05), size = 2, alpha = 0.8) +
  geom_conn_bundle(data = get_con(from = from, to = to, col = col), aes(colour = col, alpha = col, width = col)) +
  geom_node_text(aes(x = x*1.1, y = y*1.1, filter = leaf, label = name, angle = angle, hjust = hjust), size = 3.5, alpha = 0.6) +
  scale_edge_color_continuous(trans = "log", low = "red", high = "yellow") +
  scale_edge_alpha_continuous(trans = "log", range = c(1, 0.1)) +
  scale_edge_width_continuous(trans = "log", range = c(4, 1)) +
  theme_void()
I think there is wrong mapping somewhere but I can't figure out where. Thank you so much for your input!
I believe there is a bug in this library. Rearranging the input data by the column of choice (pvalue in my case) in an ascending order helped but did not solve the issue.
connect_new <- arrange(connect, pvalue)
I then found the solution in a GitHub issue submitted by another user. The subgroups within each group need to be ordered alphabetically in the hierarchy and vertices data frames. In addition, the rows of the connect data frame need to follow the same subgroup order as the hierarchy and vertices data frames. Thanks to zhuxr11.
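One way to apply the reordering described above is sketched below in base R. It only illustrates the row ordering of connect relative to vertices, using the names from the question's dput() output. connect_sorted is a name introduced here, and whether this ordering is sufficient for a given ggraph version is an assumption, not something verified.

```r
# Vertex names in the order used by the vertices data frame (from the question)
vertices <- data.frame(
  name = c("origin", "gp1", "gp2",
           "subgroup1", "subgroup2", "subgroup3", "subgroup4"),
  stringsAsFactors = FALSE
)

# Connections, rebuilt from the question's dput() output
connect <- data.frame(
  from = c("subgroup1", "subgroup1", "subgroup2", "subgroup3",
           "subgroup1", "subgroup2", "subgroup3", "subgroup1"),
  to = c("subgroup2", "subgroup3", "subgroup3", "subgroup2",
         "subgroup4", "subgroup4", "subgroup4", "subgroup4"),
  pvalue = c(1.68e-204, 1.59e-121, 9.32e-73, 9.32e-73,
             1.59e-21, 9.32e-50, 9.32e-40, 9.32e-280),
  stringsAsFactors = FALSE
)

# Order the connections by the position of their endpoints in vertices
from_pos <- match(connect$from, vertices$name)
to_pos <- match(connect$to, vertices$name)
connect_sorted <- connect[order(from_pos, to_pos), ]

# Recompute the vectors passed to get_con() from the reordered data frame
from <- match(connect_sorted$from, vertices$name)
to <- match(connect_sorted$to, vertices$name)
col <- connect_sorted$pvalue
```

The from, to and col vectors have to be rebuilt after the reordering, otherwise the p-values would still be paired with the old row order.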

Calculate moving geometric mean by water sampling station

I need to calculate the moving geometric mean of fecal coliform over time (at each value I want the geomean of that value and the previous 29 values), separately for each sampling station. When I download the data from our database the column headers are:
Station SampleDate FecalColiform
Depending on the growing area there are a few to over a dozen stations.
I tried to adapt some code that I found here:
#File: Fecal
Fecal <- group_by(Fecal, Station) %>%
arrange(SampleDate) %>%
mutate(logres = log10(ResultValue)) %>%
mutate(mgm = stats::filter(logres, rep(1/24, 24), sides =1))
This worked, but the problem is that I don't want the resulting log values. I want the regular geomean so that I can plot it and everyone can easily understand the values. I tried to work the geometric.mean function from the psych package in there, but I could not make that work.
There are resources for calculating a moving average, and code for calculating geometric mean and I have tried to combine several of them. I can't find an example for moving geometric mean.
Eventually I would like to graph all of geomeans by station similar to the example in the link above.
> dput(ByStationRGMData[1:10,])
structure(list(Station = c(114L, 114L, 114L, 114L, 114L, 114L,
114L, 114L, 114L, 114L), Classification = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c(" Approved ", " Conditionally Approved ",
" Prohibited "), class = "factor"), SampleDate = c(19890103L,
19890103L, 19890209L, 19890316L, 19890413L, 19890511L, 19890615L,
19890713L, 19890817L, 19890914L), SWTemp = c(NA, NA, 5L, 8L,
NA, 13L, 15L, 18L, NA, 18L), Salinity = c(NA, NA, 22L, 18L, NA,
26L, 22L, 24L, NA, 32L), FecalColiform = c(180, 49, 2, 17, 7.9,
1.8, 4.5, 11, 33, 1.8), RGM = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
)), .Names = c("Station", "Classification", "SampleDate", "SWTemp",
"Salinity", "FecalColiform", "RGM"), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), vars = list(
Station), drop = TRUE, indices = list(0:9), group_sizes = 10L, biggest_group_size = 10L, labels = structure(list(
Station = 114L), class = "data.frame", row.names = c(NA,
-1L), vars = list(Station), drop = TRUE, .Names = "Station"))
I would also like to add a moving 90th percentile to the dataframe and the graphs. I tried the following:
ByStationRGMData <- RawData %>%
group_by(Station) %>%
arrange(SampleDate) %>%
mutate(RGM = as.numeric(rollapply(FecalColiform, 30, geometric.mean, fill=NA, align="right"))) +
mutate(F90 = as.numeric(rollapply(FecalColiform, 30, quantile, p=0.90, fill=NA, align="right")))
This gives me the error:
Error in mutate_(.data, .dots = lazyeval::lazy_dots(...)) : argument ".data" is missing, with no default
I can't seem to figure out what I'm missing.
You can use rollapply from the zoo package (illustrated here using the built-in mtcars data frame). I've used a window of 3 values, but you can set that to 30 in your actual data. align="right" uses the current value and the n-1 previous values, where n is the window width:
library(psych)
library(dplyr)
library(zoo)
mtcars %>%
  mutate(mpgGM = rollapply(mpg, 3, geometric.mean, fill=NA, align="right"))
Include a grouping variable to get rolling geometric means separately for each group.
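A grouped version might look like the sketch below, again on mtcars with cyl standing in for Station. geo_mean and p90 are helper names introduced here (geo_mean replaces psych::geometric.mean to keep the dependencies down and assumes strictly positive values), and the window of 3 is only for illustration.

```r
library(dplyr)
library(zoo)

# Geometric mean on the original scale: exp of the mean of the logs
geo_mean <- function(x) exp(mean(log(x)))

# 90th percentile as a plain unnamed number
p90 <- function(x) unname(quantile(x, probs = 0.9))

rolled <- mtcars %>%
  group_by(cyl) %>%   # one set of rolling windows per group
  mutate(
    mpgGM  = rollapply(mpg, 3, geo_mean, fill = NA, align = "right"),
    mpgP90 = rollapply(mpg, 3, p90, fill = NA, align = "right")
  ) %>%
  ungroup()
```

Both rolling columns are created inside a pipeline that is chained with %>% throughout. The "argument .data is missing" error in the question most likely comes from joining the two mutate() calls with + instead of %>%, which calls the second mutate() without any data.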

Excel Dates and R?

I created a short data frame with random values to practice on before moving to larger data frames. It has the same variables as the real data, but far fewer rows.
The problem I'm having is that Excel writes dates with the month first, so R is confused and puts 10/1/2015 first, when it should come after the September dates.
What can I do so that R orders the dates correctly?
I also want to calculate, for example, the total amount of money (Data$TOTAL) that I made in one month.
What would be the script for that?
While I'm here, I could kill two birds with one stone. I know there is already an answer for this, but the answer I saw involves the directlabels package, which completely messes up the whole graphic.
What would you advise to prevent the labels from going over the plot margin?
DPUT()
dput(Data)
structure(list(JOB = structure(c(2L, 3L, 1L, 3L, 3L), .Label = c("JAGER",
"PLAY", "RUGBY"), class = "factor"), AGENCY = structure(c(1L,
1L, 2L, 1L, 1L), .Label = c("LONDON", "WILHEL"), class = "factor"),
DATE = structure(c(4L, 5L, 1L, 2L, 3L), .Label = c("10/1/2015",
"10/3/2015", "10/9/2015", "9/24/2015", "9/26/2015"), class = "factor"),
RATE = c(90L, 90L, 100L, 90L, 90L), HS = c(8L, 6L, 4L, 6L,
4L), TOTAL = c(720L, 540L, 400L, 540L, 360L)), .Names = c("JOB",
"AGENCY", "DATE", "RATE", "HS", "TOTAL"), class = "data.frame", row.names = c(NA,
-5L))
Here is how I went about what you're after:
rugger is the dataset I constructed from your dput()
plot(order(as.Date(rugger$DATE,"%m/%d/%Y")),rugger$TOTAL,xaxt="n",xlab="",ylab="Total")
labs <- as.Date(rugger$DATE,"%m/%d/%Y")
axis(side = 1,at = rugger$DATE,labels = rep("",5))
text(cex=1, x=order(as.Date(rugger$DATE,"%m/%d/%Y"))+0.1, y=min(rugger$TOTAL)-25, labs, xpd=TRUE, srt=45, pos=2)
The text() call gives you far more control over the labels; srt sets the rotation angle. I used order() to put the days in chronological order. It returns integer positions rather than dates, which is why the axis labels are drawn separately; the original DATE column appeared to be stored as a factor (I'm not positive on that, it's just what I'm seeing). Note that order() returns row positions, so it matches the chronological order here because the rows are already sorted by date.
If you don't want dots, check out the pch argument of plot().
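The two steps the question asks about, sorting by date and totalling per month, can be sketched in base R as follows. The data frame is rebuilt from the question's dput() output (only the columns needed here), and MONTH and monthly are names introduced for illustration.

```r
# Rebuild the relevant columns; DATE is a factor holding month/day/year text
Data <- data.frame(
  JOB = c("PLAY", "RUGBY", "JAGER", "RUGBY", "RUGBY"),
  DATE = c("9/24/2015", "9/26/2015", "10/1/2015", "10/3/2015", "10/9/2015"),
  TOTAL = c(720L, 540L, 400L, 540L, 360L),
  stringsAsFactors = TRUE
)

# Convert the text to real Date values, stating the month-first format
Data$DATE <- as.Date(as.character(Data$DATE), format = "%m/%d/%Y")

# Dates now sort chronologically instead of alphabetically
Data <- Data[order(Data$DATE), ]

# Label each row with its year and month, then sum TOTAL within each month
Data$MONTH <- format(Data$DATE, "%Y-%m")
monthly <- aggregate(TOTAL ~ MONTH, data = Data, FUN = sum)
```

Once DATE is a Date column, it can also be passed directly as the x variable of plot(), which places the points at their true positions in time.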

importing data from excel to R via psych::read.clipboard

I am trying to streamline a process in which I select and copy two columns from an Excel worksheet and import them into R, where I subset them further. Here is my issue:
The excel data has multiple sets of data in the same column. So for example: column 1 is [V,1,2,3,4,V,1,2,3,4] and column two is [A,2,4,6,10,A,3,6,9,12] where V and A are the column headers. I tried copying the two relevant columns, then running the following code in R:
testing<-read.clipboard(header=TRUE, sep=" ")
testinga<-testing[1:4,]
The resulting table looks fine, but when I plot it with ggplot
ggplot(testing, aes(V,A))+geom_point()
the resulting graph orders my data points by their first digit (i.e. the 10 is plotted as if it were a 1).
This is NOT an issue if I copy only the first data set and import it using read.clipboard.
What is going on here, and how do I get around it?
Edit:
# from dput()
testing <- structure(list(V = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), .Label = c("1", "2", "3", "4", "V"), class = "factor"), A = structure(c(3L, 5L, 6L, 1L, 8L, 4L, 6L, 7L, 2L), .Label = c("10", "12", "2", "3", "4", "6", "9", "A"), class = "factor")), .Names = c("V", "A"), class = "data.frame", row.names = c(NA, -9L))
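Your problem is that the columns of the full data.frame are read in as factors (not numerics) because they contain things other than numbers, namely the repeated column names. You just need to convert them back to numeric, going through character first so that you get the values rather than the factor codes.
testinga <- testing[1:4, ]
# assigning into testinga[] keeps the data frame; sapply() would return a matrix
testinga[] <- lapply(testinga, function(x) as.numeric(as.character(x)))
Then you should be able to plot testinga just fine.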
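If every stacked data set is needed rather than only the first, the repeated header rows can be used to label the sets before converting. This is a sketch in base R using the testing data from the question's dput() output; is_header, clean and the set column are names introduced here.

```r
# The pasted data: two stacked sets, with the header repeated in row 5
testing <- data.frame(
  V = factor(c("1", "2", "3", "4", "V", "1", "2", "3", "4")),
  A = factor(c("2", "4", "6", "10", "A", "3", "6", "9", "12"))
)

# Rows that merely repeat the column names
is_header <- testing$V == "V"

# Each repeated header starts a new set: 1 for the first block, 2 for the next
set_id <- cumsum(is_header) + 1

# Drop the header rows, then convert factor -> character -> numeric
clean <- testing[!is_header, ]
clean[] <- lapply(clean, function(x) as.numeric(as.character(x)))
clean$set <- set_id[!is_header]
```

The set column can then be used to subset one block (clean[clean$set == 1, ]) or to color or facet the points in ggplot.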

Nomogram plot using ggplot

I am trying to reproduce Naive Bayes nomogram as given in Nomograms for Visualization of Naive Bayesian Classifier by Mozina. It is a great visualization for looking at Bayes probabilities. I have been searching and trying various things, but no luck. (I am unable to put all the points on one row for a column.) I've computed probabilities and put them in a data frame called df
structure(list(.id = c("outlook", "outlook", "outlook", "windy",
"windy"), variablevalue = structure(c(1L, 2L, 3L, 5L, 6L), .Label = c("sunny",
"overcast", "rainy", "'All'", "FALSE", "TRUE"), class = "factor"),
prob = c(0.222222222222222, 0.444444444444444, 0.333333333333333,
0.666666666666667, 0.333333333333333)), .Names = c(".id",
"variablevalue", "prob"), row.names = c(1L, 3L, 5L, 11L, 13L), class = "data.frame")
Here's how the chart would look (this chart is all cut and paste):
Does this work?
ggplot(df, aes(prob,.id,label=variablevalue)) +
geom_text() +
xlim(c(0,1))
