Optimizing geom_point on top of expanded geom_raster background - r

I'm trying to visualize how a neural network separates simple 2-dimensional points into 2 classes. I use geom_point to show the training points and geom_raster to show how the neural network partitions the 2D space. Here are the functions and some of the data points plotted.
library(tidyverse)
library(neuralnet)
data2 <- structure(list(X1 = c(152, 178, 19, 101, 145, 184), x = c(32.4083268723916,
84.5016641449183, 114.483315175202, 51.914560098842, 79.6402378017537,
82.6861507166177), y = c(18.339864264708, 83.42093185056, 63.2843023451388,
55.7215069333086, 42.6517407153766, 86.5805756277405), label = structure(c(2L,
1L, 1L, 2L, 2L, 1L), .Label = c("1", "2"), class = "factor")), row.names = c(152L,
178L, 19L, 101L, 145L, 184L), class = "data.frame")
nn.model <- neuralnet(label~x+y, data2, hidden=4, linear.output=FALSE)
background <- expand_grid(x=seq(-40,120,0.1), y=seq(0,100,0.1))
background$label <- predict(nn.model, background) %>% apply(1, which.max)
ggplot()+geom_raster(data=background, aes(x, y, fill=label))+geom_point(data=data2, aes(x, y, color=label))+scale_color_manual(values=c("white","red"))
In the original dataset, the points lie in the x range (-40, 120) and the y range (0, 100), so the background grid covers the same ranges. This approach, of course, takes some time, because R needs the neural network to predict roughly 1600 x 1000 (about 1.6 million) points and then render them in the geom_raster layer.
My question: is there a way to optimize this, or to do it another way in ggplot (or in another package, if the problem is solved well there)? The current approach is brute force, rastering the whole background point by point.
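To put a number on the cost described above, here is a minimal base-R sketch that only counts grid points for two step sizes. The coarser step of 1 is an assumption chosen for illustration, not a recommendation from the original post; whether it is fine enough depends on how smooth the decision boundary is.

```r
# Grid used in the question: step 0.1 in both directions
fine_n <- length(seq(-40, 120, by = 0.1)) * length(seq(0, 100, by = 0.1))

# Hypothetical coarser grid: step 1 in both directions
coarse <- expand.grid(x = seq(-40, 120, by = 1), y = seq(0, 100, by = 1))

fine_n        # 1602601 points to predict and draw
nrow(coarse)  # 16261 points, about 1% of the original
```

Since geom_raster draws equally sized tiles, a coarser grid still covers the whole background; the boundary just looks blockier.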

Related

ggplot r: How to Highlight the Data from a Year [duplicate]

This question already has answers here:
geom_smooth on a subset of data
(3 answers)
Closed 3 years ago.
Data: Height was recorded daily.
I want to plot the height of my plants (Plant A1 - Z50)
in single plots, and I want to highlight the current year.
So I made a subset for each plant and a subset for the current year (2018).
Now I need a plot with the total record and the highlighted data from 2018.
dput(Plant)
structure(list(Name = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("Plant A1", "Plant B1", "Plant C1"), class = "factor"),
Date = structure(c(1L, 4L, 5L, 7L, 1L, 4L, 6L, 1L, 2L, 3L
), .Label = c(" 2001-01-01", " 2001-01-02", " 2001-01-03",
" 2002-01-01", " 2002-02-01", " 2019-01-01", " 2019-12-31"
), class = "factor"), Height_cm = c(91, 106.1, 107.4, 145.9,
169.1, 192.1, 217.4, 139.8, 140.3, 140.3)), .Names = c("Name",
"Date", "Height_cm"), class = "data.frame", row.names = c(NA,
-10L))
Plant_A1 <- filter(Plant, Name == "Plant A1")
Current_Year <- as.numeric("2018")
Plant_A1_Subset <- filter(Plant_A1, format(Plant_A1$Date, '%Y') == Current_Year)
ggplot(data=Plant_A1, aes(x=Date, y=Height_cm)) +
geom_point() +
geom_smooth(method="loess", level=0.95, span=1/2, color="red") +
labs(x="Data", y="Height cm")
Now I don't know how to put my new subset for 2018 (Plant_A1_Subset) into this graph.
As noted, this question is a duplicate, and the linked question has an answer.
That said, here's likely the most common way of handling your problem.
In ggplot2, every layer inherits the aesthetics passed to aes() in the ggplot() call, so those mappings apply to all later layers unless a layer overrides them. You can solve your problem by adding an extra aesthetic inside the aes() of geom_point. Below I've illustrated a simple way to achieve what you might be looking for.
Specify the aes argument in individual calls
The first method is likely the most intuitive. aes controls which variables are mapped to the plotted parameters. So if you want to colour certain points, one way is to give geom_point its own aes mapping, separate from geom_smooth.
library(ggplot2)
library(lubridate) #for month(), year(), day() functions
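current_year <- 2018
# Date is a factor with leading spaces in the dput output, so convert it first
Plant_A1$Date <- as.Date(trimws(as.character(Plant_A1$Date)))
ggplot(data = Plant_A1, aes(x = Date, y = Height_cm)) +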
#Note here, colour set in geom_point
geom_point(aes(col = ifelse(year(Date) == current_year, "Yes", "No"))) +
geom_smooth(method="loess", level=0.95,
span=1/2, color="red") +
labs(x="Data", y="Height cm",
col = "Current year?") #Specify legend title for colour
Note that I have used the inheritance of the aes argument here. Simply put, aes looks up names within data, and if it finds them, it uses them as variables. So there is no need to specify data$....
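The year comparison itself can be checked without any plotting. The following is a minimal base-R sketch using the Plant A1 rows from the question's dput output. Because that sample contains no 2018 rows, the value of highlight_year below is an assumption picked so that the subset is not empty.

```r
# Plant A1 rows from the question; Date is a factor with leading spaces
Plant_A1 <- data.frame(
  Date = factor(c(" 2001-01-01", " 2002-01-01", " 2002-02-01", " 2019-12-31")),
  Height_cm = c(91, 106.1, 107.4, 145.9)
)

# Convert the factor labels to real dates before extracting the year
Plant_A1$Date <- as.Date(trimws(as.character(Plant_A1$Date)))

# Assumption: 2019 instead of 2018, since the sample has no 2018 records
highlight_year <- 2019
Plant_A1$is_current <- as.integer(format(Plant_A1$Date, "%Y")) == highlight_year

# The subset the question asked about
Plant_A1_Subset <- Plant_A1[Plant_A1$is_current, ]
Plant_A1_Subset

# In ggplot2, such a subset can be given to an extra layer through its data
# argument, e.g. geom_point(data = Plant_A1_Subset, colour = "red")
```

Passing the subset to its own layer is the alternative to the ifelse() mapping shown above; both rely on the layers sharing the x and y aesthetics.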

Connect two points with a line in R

I have a problem connecting two points with the same y value. My dataset looks like this (I hope the formatting is ok):
attackerip,min,max
125.88.146.123,2016-03-29 17:38:17.949778,2016-03-30 07:28:47.912983
58.218.205.101,2016-04-05 15:53:20.69986,2016-05-12 17:32:08.583255
183.3.202.195,2016-04-05 15:58:27.862509,2016-04-15 18:15:13.117774
58.218.199.166,2016-04-05 16:09:34.448588,2016-04-24 06:02:12.237922
58.218.204.107,2016-04-05 16:57:17.624509,2016-05-31 00:52:44.007908
What I have so far is the following:
mydata = read.csv("timeline.csv", sep=',')
mydata$min <- strptime(as.character(mydata$min), format='%Y-%m-%d %H:%M:%S')
mydata$max <- strptime(as.character(mydata$max), format='%Y-%m-%d %H:%M:%S')
plot(mydata$min, mydata$attackerip, col="red")
points(mydata$max, mydata$attackerip, col="blue")
Which results in:
Now I want to connect the points with the same y-axis value, but I cannot get lines or abline to work. Thanks in advance!
EDIT: dput of data
dput(mydata)
structure(list(attackerip = structure(c(1L, 5L, 2L, 3L, 4L), .Label = c("125.88.146.123",
"183.3.202.195", "58.218.199.166", "58.218.204.107", "58.218.205.101"
), class = "factor"), min = structure(1:5, .Label = c("2016-03-29 17:38:17.949778",
"2016-04-05 15:53:20.69986", "2016-04-05 15:58:27.862509", "2016-04-05 16:09:34.448588",
"2016-04-05 16:57:17.624509"), class = "factor"), max = structure(c(1L,
4L, 2L, 3L, 5L), .Label = c("2016-03-30 07:28:47.912983", "2016-04-15 18:15:13.117774",
"2016-04-24 06:02:12.237922", "2016-05-12 17:32:08.583255", "2016-05-31 00:52:44.007908"
), class = "factor")), .Names = c("attackerip", "min", "max"), class = "data.frame", row.names = c(NA,
-5L))
Final Edit:
The reason plotting lines did not work was that min and max were stored as timestamps. Casting them to numeric values yielded the expected result. Thanks for your help, everyone.
The lines function should work just fine. However, you will need to call it for every pair (or set) of points that share the same y value. Here is a reproducible example:
# get sets of observations with the same y value
dupeVals <- unique(y[duplicated(y) | duplicated(y, fromLast=T)])
# put the corresponding indices into a list
dupesList <- lapply(dupeVals, function(i) which(y == i))
# scatter plot
plot(x, y)
# plot the lines using sapply
sapply(dupesList, function(i) lines(x[i], y[i]))
This returns
data
set.seed(1234)
x <- sort(5* runif(30))
y <- sample(25, 30, replace=T)
As it appears that you have two separate groups for which you would like to draw these lines, the algorithm would be as follows:
1. For each group (min and max, I believe), calculate the duplicate values of the y variable.
2. Put the indices of these duplicates into a dupesList (maybe dupesListMin and dupesListMax).
3. Plot the points.
4. Run one sapply function on each dupesList.
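For the data in the question, where every row already holds both endpoints of one line, a shorter route may be segments() from base graphics. This is only a sketch under some assumptions: three rows are copied from the question, the timestamps are parsed as POSIXct in UTC, and the IPs are placed at the integer positions 1, 2, 3 on the y axis.

```r
# Three rows copied from the question's data
mydata <- data.frame(
  attackerip = c("125.88.146.123", "58.218.205.101", "183.3.202.195"),
  min = c("2016-03-29 17:38:17.949778", "2016-04-05 15:53:20.69986",
          "2016-04-05 15:58:27.862509"),
  max = c("2016-03-30 07:28:47.912983", "2016-05-12 17:32:08.583255",
          "2016-04-15 18:15:13.117774"),
  stringsAsFactors = FALSE
)

# POSIXct stores each timestamp as a single number, unlike list-based POSIXlt
fmt <- "%Y-%m-%d %H:%M:%OS"
mydata$min <- as.POSIXct(mydata$min, format = fmt, tz = "UTC")
mydata$max <- as.POSIXct(mydata$max, format = fmt, tz = "UTC")

# One y position per IP
ypos <- seq_len(nrow(mydata))

plot(mydata$min, ypos, col = "red",
     xlim = range(c(mydata$min, mydata$max)),
     yaxt = "n", xlab = "time", ylab = "")
points(mydata$max, ypos, col = "blue")
axis(2, at = ypos, labels = mydata$attackerip, las = 1, cex.axis = 0.7)

# Connect first and last sighting of each IP with a horizontal line
segments(x0 = as.numeric(mydata$min), y0 = ypos,
         x1 = as.numeric(mydata$max), y1 = ypos)
```

The explicit as.numeric() calls mirror the fix described in the question's final edit.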

Lattice xyplot() Adding a different mean trend line to each panel?

I have a simple trellis scatterplot. Two panels - male/female. ID is a unique number for each participant. The var1 is a total test time. Mean.values is a vector of two numbers (the means for gender).
No point including a best fit line so what I want is to plot a trend line of the mean in each panel. The two panels have different means, say male = 1 minute, female = 2 minutes.
xyplot(var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values)
})
At the minute the graph is coming out so that both trendlines appear in each panel. I want only one trendline in each.
Does anyone have the way to do this?
I have tried a number of different ways, including the long code for the function addLine, which just doesn't work for me. I just want to identify which panel I'm looking at. I've looked at ?panel.number, but I'm not sure how it works, because I get an error saying that I don't have a current row (current.row(prefix)).
There must be a simple way of doing this?
[EDIT - Here's the actual data i'm using]
I've tried to simplify the DF
library(lattice)
dput(head(DF))
structure(list(ID = 1:6, Var1 = c(2333858, 4220644,
2941774, 2368496, 3165740, 3630300), mean = c(2412976, 2412976,
2412976, 2412976, 2412976, 2412976), Gender = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = c(NA, 6L), class = "data.frame")
dput(tail(DF))
structure(list(ID = 161:166, Var1= c(2825246, 3552170,
3688882, 2487760, 3849108, 3085342), mean = c(3689805, 3689805,
3689805, 3689805, 3689805, 3689805), Gender = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = 109:114, class = "data.frame")
plot i'm using:
xyplot((Var1/1000) ~ ID|Gender, data=DF,
group = Gender,scales=list(x=list(at=NULL)),
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values) })
causes 2 lines.
[EDIT - This is the code which includes the function Addline & is everywhere on all the posts and doesn't seem to work for me]
addLine <- function(a=NULL, b=NULL, v=NULL, h=NULL, ..., once=F) {
  tcL <- trellis.currentLayout()
  k <- 0
  for (i in 1:nrow(tcL))
    for (j in 1:ncol(tcL))
      if (tcL[i,j] > 0) {
        k <- k+1
        trellis.focus("panel", j, i, highlight = FALSE)
        if (once) panel.abline(a=a[k], b=b[k], v=v[k], h=h[k], ...)
        else panel.abline(a=a, b=b, v=v, h=h, ...)
        trellis.unfocus()
      }
}
then writing after the trellis plot (mean.values being a vector of two numbers, mean for female, mean for male)
addLine(v=(mean.values), once=TRUE)
Update - I managed to do it in ggplot2.
Make the ggplot using facet_wrap then -
hline.data <- data.frame(z = c(2413, 3690), Gender = c("Female","Male"))
This creates a DF of the two means and the Gender, 2x2 DF
myplot <- myplot + geom_hline(aes(yintercept = z), hline.data)
This adds the lines to the ggplot.
If you just want to plot the mean of the values you are already drawing on the plot, you can skip the mean.values variable and just do
xyplot(Var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(x,y,...) {
panel.xyplot(x,y,...)
panel.abline(h=mean(y))
}
)
With the sample data
DF<-data.frame(
ID=1:10,
Gender=rep(c("M","F"), each=5),
Var1=c(5,6,7,6,5,8,9,10,8,9)
)
this produces
I believe lattice has a specific panel function for this, panel.average().
Try replacing panel.abline(h=mean.values) with panel.average(...).
If that doesn't solve the problem, we might need more information; try using dput() on your data (e.g., dput(DF), or some representative subset).
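If the means are precomputed, as with mean.values in the question, another possibility is to pick the matching element inside the panel function with packet.number() from lattice. This is a sketch on the small sample data from the answer above. It assumes a single conditioning variable and that mean.values is ordered like the factor levels of Gender.

```r
library(lattice)

DF <- data.frame(
  ID = 1:10,
  Gender = factor(rep(c("M", "F"), each = 5)),  # levels sort as F, M
  Var1 = c(5, 6, 7, 6, 5, 8, 9, 10, 8, 9)
)

# tapply returns one mean per factor level, in level order (F, then M)
mean.values <- tapply(DF$Var1, DF$Gender, mean)

p <- xyplot(Var1 ~ ID | Gender, data = DF,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)
              # packet.number() is the index of the level shown in this panel
              panel.abline(h = mean.values[packet.number()])
            })
print(p)
```

With more than one conditioning variable the packet number runs over all combinations of levels, so the lookup vector would need one entry per combination.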

changing strip's color in lattice multipanel plot with 2 (or possibly more) factors

I've searched quite extensively through the forum and on the web, but I couldn't find anyone who has already presented my case, so here is the question.
My goal: how can I extend the example presented here to the case where I have more than one conditioning factor?
I've tried several ways to modify the which.panel variable of strip.default function, but I couldn't come out of my problem.
This is the code I'm using at the moment (with comments):
for (pkg in c("plyr", "lattice")) if (!require(pkg, character.only = TRUE)) install.packages(pkg)
require("plyr")
require("lattice")
# dataframe structure (8 obs. of 6 variables)
data2 <- structure(list(
COD = structure(c(1L, 1L, 1L, 1L, 2L, 2L,2L, 2L),
.Label = c("A", "B"), class = "factor"),
SPEC = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
.Label = c("15/25-(15/06)", "15/26-(22/06)"), class = "factor"),
DATE = structure(c(16589, 16590, 16589, 16590, 16589, 16590, 16589, 16590), class = "Date"),
PM.BDG = c(1111.25, 1111.25, 1141.29, 1141.29, 671.26, 671.26, 707.99, 707.99),
PM = c(1033.14, 1038.4, 1181.48, 1181.48, 616.39, 616.39, 641.55, 641.55),
DELTA.PM = c(-78.12, -72.85, 40.19, 40.19, -54.87, -54.87, -66.44, -66.44)),
.Names = c("COD", "SPEC", "DATE", "PM.BDG", "PM", "DELTA.PM"),
row.names = c(NA, 8L), class = "data.frame")
# create a dataframe with a vector of colors
# based on the value of DELTA.PM for the last
# date available for each combination of COD and SPEC.
# Each color will be used for a specific panel, and it will
# be forestgreen if DELTA.PM is higher than zero, red otherwise.
listaPM <- ddply(data2, .(COD,SPEC), summarize, ifelse(DELTA.PM[DATE=="2015-06-04"]<0, "red", "forestgreen"))
names(listaPM) <- c("COD","SPEC","COLOR")
# set a personalized strip, with bg color based on listaPM$COLOR
# and text based on listaPM$COD and listaPM$SPEC
myStripStylePM <- function(which.panel, factor.levels, ...) {
panel.rect(0, 0, 1, 1,
col = listaPM[which.panel,3],
border = 1)
panel.text(x = 0.5, y = 0.5,
font=2,
lab = paste(listaPM[which.panel,1],listaPM[which.panel,2], sep=" - "),
col = "white")}
# prepare a xyplot function to plot that will be used later with dlply.
# Here I want to plot the values of PM.BDG and PM over time (DATE),
# conditioning them on the SPEC (week) and COD (code) factors.
graficoPM <- function(df) {
xyplot (PM.BDG + PM ~ DATE | SPEC + COD,
data=df,
type=c("l","g"),
col=c("black", "red"),
abline=c(h=0,v=0),
strip = myStripStylePM
)}
# create a trellis object that has a list of plots,
# based on different COD (codes)
grafico.PM <- dlply(data2, .(data2$COD), graficoPM)
# graphic output, 1st row should be COD "A",
# 2nd row should be COD "B", each panel is a different SPEC (week)
par(mfrow=c(2,1))
print(grafico.PM[[1]], position=c(0,0.5,1,1), more=TRUE)
print(grafico.PM[[2]], position=c(0,0,1,0.5))
As you can see, the first row of plots is correct: the text of the first strip is "A" (1st COD), the weeks (SPEC) are shown, and the color represents whether PM is above or below PM.BDG on the last date of the plot.
In contrast, the 2nd row of plots just repeats the scheme of the first row (as can be seen from the fact that COD is always "A" and the 2nd strip's background color in the 2nd row is green, when the PM line in red is clearly well below the PM.BDG line in black).
Although I'd like to keep my code, I'm pretty sure my goal could be achieved with a different strategy. If you can find a better way to use my dataframe, I'll be happy to study the code and see if it works with my data.
The problem is matching up the current panel data with the listaPM data. Because you are doing different sub-setting in each of the calls, it's difficult to use which.panel to match up the data sets.
There is an undocumented feature that lets you get the conditioning variable names, which makes the matching more robust. Here's how you would use it in your case.
myStripStylePM <- function(which.panel, factor.levels, ...) {
cp <- dimnames(trellis.last.object())
ci <- arrayInd(packet.number(), .dim=sapply(cp, length))
cv <- mapply(function(a,b) a[b], cp, as.vector(ci))
idx<-which(apply(mapply(function(n, v) listaPM[, n] == v, names(cv), cv),1,all))
stopifnot(length(idx)==1)
panel.rect(0, 0, 1, 1,
col = listaPM[idx,3],
border = 1)
panel.text(x = 0.5, y = 0.5,
font=2,
lab = paste(listaPM[idx,1],listaPM[idx,2], sep=" - "),
col = "white")
}
When run with the rest of your code, it produces this plot
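The index arithmetic in the first lines of that strip function can be tried outside lattice. In this base-R sketch, cp is written out by hand instead of being taken from dimnames(trellis.last.object()), and the packet number 3 is an arbitrary value chosen for illustration.

```r
# Hand-written stand-in for dimnames(trellis.last.object()):
# one entry per conditioning variable, in the order used in the formula
cp <- list(SPEC = c("15/25-(15/06)", "15/26-(22/06)"),
           COD  = c("A", "B"))

# Hypothetical packet number; inside a strip function this would come
# from packet.number()
pn <- 3

# Convert the linear packet number into one index per conditioning variable
ci <- arrayInd(pn, .dim = sapply(cp, length))

# Look up the level name that each index refers to
cv <- mapply(function(a, b) a[b], cp, as.vector(ci))
cv
```

The first conditioning variable varies fastest, so packet 3 is the first SPEC level combined with the second COD level. The resulting named vector is what the answer compares against the columns of listaPM.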

Inverse probability weights in r

I'm trying to apply inverse probability weights to a regression, but lm() only uses analytic weights. This is part of a replication I'm working on: the original author uses pweight in Stata, and I'm trying to reproduce the result in R. The analytic weights give lower standard errors, which changes which of my variables come out as significant.
I've looked at the survey package, but I'm not sure how to prepare a survey object for use with svyglm(). Is this the approach I want, or is there an easier way to apply inverse probability weights?
dput :
data <- structure(list(lexptot = c(9.1595012302023, 9.86330744180814,
8.92372556833205, 8.58202430280175, 10.1133857229336), progvillm = c(1L,
1L, 1L, 1L, 0L), sexhead = c(1L, 1L, 0L, 1L, 1L), agehead = c(79L,
43L, 52L, 48L, 35L), weight = c(1.04273509979248, 1.01139605045319,
1.01139605045319, 1.01139605045319, 0.76305216550827)), .Names = c("lexptot",
"progvillm", "sexhead", "agehead", "weight"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
Linear Model (using analytic weights)
prog.lm <- lm(lexptot ~ progvillm + sexhead + agehead, data = data, weights = weight)
summary(prog.lm)
Alright, so I figured it out and thought I would update the post in case others are trying to figure it out too. It's actually pretty straightforward.
data$X <- 1:nrow(data)
des1 <- svydesign(id = ~X, weights = ~weight, data = data)
prog.lm <- svyglm(lexptot ~ progvillm + sexhead + agehead, design=des1)
summary(prog.lm)
Standard errors are now correct.
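For intuition about why the standard errors change while the coefficients do not, here is a rough base-R sketch of a weighted sandwich (heteroskedasticity-robust) variance on the five rows from the question. It assumes the usual n/(n - k) small-sample scaling and is not how the survey package computes its result, so the numbers need not match svyglm() exactly, and five rows are far too few for meaningful inference.

```r
# The five rows from the question
dat <- data.frame(
  lexptot = c(9.1595012302023, 9.86330744180814, 8.92372556833205,
              8.58202430280175, 10.1133857229336),
  progvillm = c(1, 1, 1, 1, 0),
  sexhead = c(1, 1, 0, 1, 1),
  agehead = c(79, 43, 52, 48, 35),
  weight = c(1.04273509979248, 1.01139605045319, 1.01139605045319,
             1.01139605045319, 0.76305216550827)
)

# Weighted least squares: same point estimates under either weight type
fit <- lm(lexptot ~ progvillm + sexhead + agehead, data = dat, weights = weight)

X <- model.matrix(fit)
w <- dat$weight
e <- residuals(fit)                    # raw residuals, y minus fitted values

bread <- solve(crossprod(X, w * X))    # (X'WX)^-1
meat  <- crossprod(X * (w * e))        # sum of w_i^2 e_i^2 x_i x_i'
n <- nrow(X)
k <- ncol(X)

# Sandwich variance with the assumed n/(n - k) scaling
V <- n / (n - k) * bread %*% meat %*% bread
robust_se <- sqrt(diag(V))

cbind(estimate = coef(fit), robust_se = robust_se)
```

The coefficient column is identical to summary(prog.lm); only the variance formula differs.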

Resources