I have a data frame with 4 variables A, B, C, D, and I need to produce a 3D surface plot with x = A, y = B, z = C. The plot should contain 3 surfaces, one for each value of D: D takes the values 0, 1 and -1, so I need 3 surfaces for the 3 different values of D.
I tried subsetting the data frame into 3 different data frames by the values of D and adding each surface to the plot_ly call, but it doesn't seem to work and I get a blank graph. I don't know if I am using the right plot function.
Below is my data frame d1
A B C D
734.5 2.28125 3.363312755 0
738 2.53125 3.395864326 0
727.25 2.484375 3.41183431 1
737 2.421875 3.380499188 1
727.25 2.3828125 3.39538442 1
933.25 4.6875 3.148660474 1
932.75 4.671875 3.155840809 1
934 4.671875 3.165391107 1
920.75 4.671875 3.194808475 1
913.25 4.671875 3.22907393 1
896.75 4.671875 3.287157844 1
880 4.671875 3.341203642 -1
866.75 4.59375 3.388017143 -1
714.5 3.296875 3.572828317 -1
730.75 3.296875 3.535364241 -1
734.75 3.296875 3.526142314 -1
713.25 3.7734375 3.653888449 -1
711.75 3.8203125 3.665152882 -1
711.75 3.8125 3.65967422 -1
714 3.796875 3.630867839 0
754.25 3.796875 3.560165628 0
715.25 3.78125 3.650415301 0
Below is my R code
library(plotly)
pd1 <- subset(d1, (D == 1))
nd1 <- subset(d1, (D == -1))
zd1 <- subset(d1, (D == 0))
p <- plot_ly(showscale = FALSE) %>%
  add_surface(x = pd1$A, y = pd1$B, z = pd1$C) %>%
  add_surface(x = nd1$A, y = nd1$B, z = nd1$C) %>%
  add_surface(x = zd1$A, y = zd1$B, z = zd1$C)
p
I think you have to pass a numeric matrix as the z argument to add_surface.
# convert each subset to a numeric matrix, since add_surface expects z as a matrix
pd1_ma <- as.matrix(pd1)
nd1_ma <- as.matrix(nd1)
zd1_ma <- as.matrix(zd1)
p <- plot_ly(showscale = FALSE) %>%
add_surface(z = ~pd1_ma) %>%
add_surface(z = ~nd1_ma, opacity = 0.98) %>%
add_surface(z = ~zd1_ma, opacity = 0.98)
p
That worked for me.
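If you want each surface to be a true function over an (A, B) grid rather than a matrix of raw columns, one option (a sketch, assuming the akima package is installed) is to interpolate each subset onto a regular grid first:
library(plotly)
library(akima)  # assumed available; provides interp()
# Interpolate the scattered (A, B, C) points of one subset onto a regular grid.
surf <- function(sub) interp(x = sub$A, y = sub$B, z = sub$C, duplicate = "mean")
s_pos <- surf(pd1)
s_neg <- surf(nd1)
s_zero <- surf(zd1)
# plotly expects z with rows = y and columns = x, hence the t().
p <- plot_ly(showscale = FALSE) %>%
  add_surface(x = s_pos$x, y = s_pos$y, z = t(s_pos$z)) %>%
  add_surface(x = s_neg$x, y = s_neg$y, z = t(s_neg$z), opacity = 0.98) %>%
  add_surface(x = s_zero$x, y = s_zero$y, z = t(s_zero$z), opacity = 0.98)
p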
I am trying to understand the difference between the exact and weights arguments of the extract function in the terra package. My ultimate goal is to compute a weighted average of a raster within a polygon.
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
v <- v[1:2,]
z <- rast(v, resolution=.1, names="test")
values(z) <- 1:ncell(z)
rf <- system.file("ex/elev.tif", package="terra")
x <- rast(rf)
head(terra::extract(x, v, weights = T, touches = T))
ID elevation weight
1 1 NA 0.08
2 1 NA 0.17
3 1 529 0.30
4 1 542 0.52
5 1 547 0.74
6 1 535 0.18
head(terra::extract(x, v, exact = T, touches = T))
ID elevation fraction
1 1 NA 0.03471857
2 1 NA 0.11553771
3 1 529 0.23902885
4 1 542 0.45706171
5 1 547 0.68120503
6 1 535 0.12033581
Why are the weight and the fraction different in the two cases? When I try to compute the weighted mean:
terra::extract(x, v, fun = mean, weights = T, touches = T, na.rm = T)
ID elevation
1 1 NaN
2 2 NaN
terra::extract(x, v, fun = mean, exact = T, touches = T, na.rm = T)
ID elevation
1 1 467.3792
2 2 334.6856
Why does the first give me NaN while the second gives me values? If I calculate it manually, I can see the weighted mean is what exact = T gives me, so what is the use of the weights argument?
library(dplyr)
test_df <- terra::extract(x, v, exact = T, touches = T, na.rm = T)
test_df %>%
  dplyr::group_by(ID) %>%
  dplyr::summarise(wt_mean = weighted.mean(elevation, fraction, na.rm = T))
ID wt_mean
1 467.
2 335.
In my version of "terra" the results are very similar.
library(terra)
#terra 1.6.53
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
v <- v[1:2,]
z <- rast(v, resolution=.1, names="test")
values(z) <- 1:ncell(z)
rf <- system.file("ex/elev.tif", package="terra")
x <- rast(rf)
terra::extract(x, v, fun = mean, weights = T, touches = T, na.rm = T)
# ID elevation
#1 1 467.3934
#2 2 334.6551
terra::extract(x, v, fun = mean, exact = T, touches = T, na.rm = T)
# ID elevation
#1 1 467.3792
#2 2 334.6856
The documentation says that with "weights", "the approximate fraction of each cell" is used whereas with "exact", "the exact fraction" is used.
The reason for having both is in part because the argument "weights" predates the argument "exact". "weights" was kept because it could be faster and close enough in most cases.
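To see how close the two schemes are, you can also reproduce the weighted means by hand from the raw outputs (a quick sketch using the weight and fraction columns shown above):
# weighted means from the approximate weights
w <- terra::extract(x, v, weights = TRUE, touches = TRUE)
sapply(split(w, w$ID), function(d) weighted.mean(d$elevation, d$weight, na.rm = TRUE))
# weighted means from the exact fractions
e <- terra::extract(x, v, exact = TRUE, touches = TRUE)
sapply(split(e, e$ID), function(d) weighted.mean(d$elevation, d$fraction, na.rm = TRUE))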
P.S., you can also do (this needs the sf package for st_as_sf):
library(sf)
exactextractr::exact_extract(x, st_as_sf(v), "mean")
#[1] 467.3792 334.6855
I'm working to implement an lpSolve solution to a hypothetical daily fantasy baseball optimization problem. I'm having trouble applying my last constraint:
position - exactly 3 outfielders (OF), 2 pitchers (P), and 1 of everything else
cost - total cost less than 200
team - max number of players from any one team is 6
team - minimum number of teams on a roster is 3
Say for example you have a dataframe of 1000 players with points, cost, position, and team and you're trying to maximize average points:
library(tidyverse)
library(lpSolve)
set.seed(123)
df <- tibble(avg_points = sample(5:45, 1000, replace = T),
             cost = sample(3:45, 1000, replace = T),
             position = sample(c("P","C","1B","2B","3B","SS","OF"), 1000, replace = T),
             team = sample(LETTERS, 1000, replace = T)) %>%
  mutate(id = row_number())
head(df)
# A tibble: 6 x 5
# avg_points cost position team id
# <int> <int> <chr> <chr> <int>
#1 17 13 2B Y 1
#2 39 45 1B P 2
#3 29 33 1B C 3
#4 38 31 2B V 4
#5 17 13 P A 5
#6 10 6 SS V 6
I've implemented the first 3 constraints with the following code, but I'm having trouble figuring out how to implement the minimum number of teams on a roster. I think I need to add additional variables to the model, but I'm not sure how to do that.
#set the objective function (what we want to maximize)
obj <- df$avg_points
# set the constraint rows.
con <- rbind(t(model.matrix(~ position + 0, df)),
             cost = df$cost,
             t(model.matrix(~ team + 0, df)))
#set the constraint values
rhs <- c(1,1,1,1,3,2,1, # 1. #exactly 3 outfielders 2 pitchers and 1 of everything else
200, # 2. at a cost less than 200
rep(6,26) # 3. max number from any team is 6
)
#set the direction of the constraints
dir <- c("=","=","=","=","=","=","=","<=",rep("<=",26))
result <- lp("max",obj,con,dir,rhs,all.bin = TRUE)
If it helps, I'm trying to replicate this paper (with minor tweaks), which has corresponding Julia code here.
This might be a solution for your problem.
This is the data I have used (identical to yours):
library(tidyverse)
library(lpSolve)
N <- 1000
set.seed(123)
df <- tibble(avg_points = sample(5:45,N, replace = T),
cost = sample(3:45,N, replace = T),
position = sample(c("P","C","1B","2B","3B","SS","OF"),N, replace = T),
team = sample(LETTERS,N, replace = T)) %>%
mutate(id = row_number())
You want to find x1...xn that maximise the objective function below:
x1 * avg_points1 + x2 * avg_points2 + ... + xn * avg_pointsn
With the way lpSolve works, you need to express the left-hand side of every constraint as a sum over x1...xn times the coefficient vector you provide.
Since you cannot express the number of teams with your current variables, you can introduce new ones (I will call them y1..yn_teams and z1..zn_teams):
# number of teams:
n_teams <- length(unique(df$team))
Your new objective function (the ys and zs will not influence the overall objective function, since their coefficients are set to 0):
obj <- c(df$avg_points, rep(0, 2 * n_teams))
The first 3 constraints are the same, but with added zero coefficients for the y and z columns:
c1 <- t(model.matrix(~ position + 0,df))
c1 <- cbind(c1,
matrix(0, ncol = 2 * n_teams, nrow = nrow(c1)))
c2 <- c(df$cost, rep(0, 2 * n_teams))
c3 <- t(model.matrix(~ team + 0, df))
c3 <- cbind(c3, matrix(0, ncol = 2 * n_teams, nrow = nrow(c3)))
Since you want at least 3 teams, you first use y to count the number of players per team: sum up all picked players of a team and subtract the corresponding y variable per team, which should be equal to 0. (diag() creates the identity matrix; we do not worry about z at this point):
# should be x1...xn - y1...n = 0
c4_1 <- cbind(t(model.matrix(~team + 0, df)), # x
-diag(n_teams), # y
matrix(0, ncol = n_teams, nrow = n_teams) # z
) # == 0
Since each y is now the number of players in a team, you can use this constraint to keep each z at most 1 (together with integrality, this makes z binary):
c4_2 <- cbind(t(model.matrix(~ team + 0, df)), # x1+...+xn ==
-diag(n_teams), # - (y1+...+yn )
diag(n_teams) # z binary
) # <= 1
This is the constraint that ensures that at least 3 teams are picked:
c4_3 <- c(rep(0, nrow(df) + n_teams), # x and y
rep(1, n_teams) # z >= 3
)
You need to make sure that z_t is set to 1 whenever team t has at least one player, i.e. y_t > 0 implies z_t = 1. You can use the big-M method to create such a constraint, which is:
y_t <= M * z_t
Or, in a more lpSolve-friendly version:
y_t - M * z_t <= 0
In this case you can use 6 as a value for M, because it is the largest value any y can take:
c4_4 <- cbind(matrix(0, nrow = n_teams, ncol = nrow(df)),
diag(n_teams),
-diag(n_teams) * 6)
This constraint is added to make sure all x are binary:
#all x binary
c5 <- cbind(diag(nrow(df)), # x
matrix(0, ncol = 2 * n_teams, nrow = nrow(df)) # y + z
)
Create the new constraint matrix:
con <- rbind(c1,
c2,
c3,
c4_1,
c4_2,
c4_3,
c4_4,
c5)
#set the constraint values
rhs <- c(1,1,1,1,3,2,1, # 1. #exactly 3 outfielders 2 pitchers and 1 of everything else
200, # 2. at a cost less than 200
rep(6, n_teams), # 3. max number from any team is 6
rep(0, n_teams), # c4_1
rep(1, n_teams), # c4_2
3, # c4_3,
rep(0, n_teams), #c4_4
rep(1, nrow(df))# c5 binary
)
#set the direction of the constraints
dir <- c(rep("==", 7), # c1
"<=", # c2
rep("<=", n_teams), # c3
rep('==', n_teams), # c4_1
rep('<=', n_teams), # c4_2
'>=', # c4_3
rep('<=', n_teams), # c4_4
rep('<=', nrow(df)) # c5
)
The lp call is almost the same, but I am using all.int instead of all.bin so that the y variables can count the players per team:
result <- lp("max",obj,con,dir,rhs,all.int = TRUE)
Success: the objective function is 450
roster <- df[result$solution[1:nrow(df)] == 1, ]
roster
# A tibble: 10 x 5
avg_points cost position team id
<int> <int> <chr> <chr> <int>
1 45 19 C I 24
2 45 5 P X 126
3 45 25 OF N 139
4 45 22 3B J 193
5 45 24 2B B 327
6 45 25 OF P 340
7 45 23 P Q 356
8 45 13 OF N 400
9 45 13 SS L 401
10 45 45 1B G 614
If you change your data to
N <- 1000
set.seed(123)
df <- tibble(avg_points = sample(5:45,N, replace = T),
cost = sample(3:45,N, replace = T),
position = sample(c("P","C","1B","2B","3B","SS","OF"),N, replace = T),
team = sample(c("A", "B"),N, replace = T)) %>%
mutate(id = row_number())
It will now be infeasible, because the number of teams in the data is less than 3.
Back on the original data, you can check that the z indicators line up with the teams on the roster:
sort(unique(df$team))[result$solution[1027:1052] == 1] # entries 1027:1052 of the solution are the z variables
[1] "B" "E" "I" "J" "N" "P" "Q" "X"
sort(unique(roster$team))
[1] "B" "E" "I" "J" "N" "P" "Q" "X"
I am trying to overlay two datasets of different lengths within ggplot.
Dataset 1: data frame r, where m is the date and V2 is a value ranging between -1 and +1:
> r
m V2
19991221 1
19910703 -0.396825397
19850326 0.916666667
19890328 -0.473053892
19610912 -0.75
20021106 -0.991525424
19940324 -1
19840522 -0.502145923
19780718 1
19811222 -0.447154472
19781017 0
19761108 -0.971014493
19791006 1
19891219 0.818181818
19851217 0.970149254
19980818 0.808219178
19940816 -0.985185185
19790814 -0.966666667
19990203 -0.882352941
19831220 1
19830114 -1
19980204 -0.991489362
19941115 -0.966101695
19860520 -0.986206897
19761019 -0.666666667
19900207 -0.983870968
19731010 0
19821221 -0.833333333
19770517 1
19800205 0.662337662
19760329 -0.545454545
19810224 -0.957446809
20000628 -0.989473684
19911105 -0.988571429
19960924 -0.483870968
19880816 1
19860923 1
20030506 -1
20031209 -1
19950201 -0.974025974
19790206 1
19811117 -0.989304813
19950822 -1
19860212 0.808219178
19730821 -0.463203463
Use these lines to generate r:
m <- gsub("-", "/", as.Date(as.character(fileloc$V1), "%Y%m%d"))
r <- cbind(m, fileloc[2])
colnames(r)
r
Dataset 2: the following defines the recession periods in the US:
library(quantmod)
getSymbols("USREC",src="FRED")
getSymbols("UNRATE", src="FRED")
unrate.df <- data.frame(date= index(UNRATE),UNRATE$UNRATE)
start <- index(USREC[which(diff(USREC$USREC)==1)])
end <- index(USREC[which(diff(USREC$USREC)==-1)-1])
recession.df <- data.frame(start = start, end = end[-1])
recession.df <- subset(recession.df, start >= min(unrate.df$date))
The resulting recession.df
> recession.df
start end
1 1948-12-01 1949-10-01
2 1953-08-01 1954-05-01
3 1957-09-01 1958-04-01
.....
11 2008-01-01 2009-06-01
Plotting:
I can generate separate scatter plots with the following:
ggplot(r, aes(x = V2, y = m, colour = V2)) +
  geom_point() + xlab(label = 'Tone Score') + ylab(label = 'Dates')
and a time series with a shaded region for recessions with:
ggplot()+
geom_line(data=unrate.df, aes(x=date, y=UNRATE)) +
geom_rect(data=recession.df,
aes(xmin=start,xmax=end, ymin=0,ymax=max(unrate.df$UNRATE)),
fill="red", alpha=0.2)
How do I combine these so that the scatter plot is overlaid on the time series?
Since the full dataset isn't provided in the question, I have generated some random data for dates between 1950/08/21 and 2017/12/21:
set.seed(123)
r <- data.frame(m = seq.Date(as.Date("2017/12/21"), as.Date("1950/08/21"),
length.out = 135),
V2 = rnorm(n = 135, mean = 0, sd = 0.5))
You can overlay multiple layers within a ggplot by giving each of the geom_* calls its own data and aes arguments.
ggplot() +
  geom_point(data = r, aes(x = m, y = V2, colour = V2)) +
  geom_line(data = unrate.df, aes(x = date, y = UNRATE)) +
  geom_rect(data = recession.df,
            aes(xmin = start, xmax = end, ymin = 0, ymax = max(unrate.df$UNRATE)),
            fill = "red", alpha = 0.2) +
  xlab(label = 'Date') + ylab(label = 'Tone Score')
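One caveat (an optional sketch, not part of the original answer): V2 lies in [-1, 1] while UNRATE is in percent, so the points will be squashed near zero. Rescaling V2 onto the unemployment axis and adding a secondary axis keeps both series readable:
top <- max(unrate.df$UNRATE)
ggplot() +
  geom_line(data = unrate.df, aes(x = date, y = UNRATE)) +
  # rescale V2 from [-1, 1] onto [0, top] so both series share one panel
  geom_point(data = r, aes(x = m, y = (V2 + 1) / 2 * top, colour = V2)) +
  geom_rect(data = recession.df,
            aes(xmin = start, xmax = end, ymin = 0, ymax = top),
            fill = "red", alpha = 0.2) +
  scale_y_continuous(name = "UNRATE",
                     sec.axis = sec_axis(~ . / top * 2 - 1, name = "Tone Score"))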
I have one ppp object in which each point represents a farm. There are 2 marks attached to it:
1) a multitype mark: disease status (0 = not diseased, 1 = diseased) => DS1
2) a numeric mark: number of diseased animals => ND1
To avoid confusing the two marks, I separated them into 2 ppp objects, one per mark:
sep_farm <- unstack.ppp(farm)
#Extract 'number of positive animals'from the sep_farm
ND2 <- sep_farm[["ND1"]]
#Extract 'disease status' from the sep_farm
DS2 <- sep_farm[["DS1"]]
I want to find the 1st-nearest diseased and non-diseased farm, so I use:
n1 <- nnwhich(DS2, k=1, by=marks(DS2))
The problem is that I also want to know the number of diseased animals at each 1st-nearest diseased farm.
How could I do that?
Fake data to test with:
library(spatstat)
n <- 10
set.seed(42)
ds <- sample(0:1, n, replace = TRUE)
nd <- rpois(n, 100) * ds
farm <- runifpoint(n)
marks(farm) <- data.frame(DS1 = factor(ds), ND1 = nd)
marks(farm)
#> DS1 ND1
#> 1 1 98
#> 2 1 115
#> 3 0 0
#> 4 1 120
#> 5 1 99
#> 6 1 113
#> 7 1 122
#> 8 0 0
#> 9 1 113
#> 10 1 109
Plot of the fake data, with the number of diseased animals given below each location:
plot(farm, which.marks = "DS1", cols = c("red", "blue"))
text(farm$x, farm$y, labels = nd, pos = 1, col = ifelse(ds==0, "red", "blue"))
Existing code from question:
sep_farm <- unstack.ppp(farm)
ND2 <- sep_farm[["ND1"]]
DS2 <- sep_farm[["DS1"]]
n1 <- nnwhich(DS2, k=1, by=marks(DS2))
Use the indices to extract the relevant marks:
ND_neigh <- marks(ND2)[n1[,2]]
Plot of the result, with the number of diseased animals at the nearest infected farm given above each location (own disease count still shown below):
plot(farm, which.marks = "DS1", cols = c("red", "blue"))
text(farm$x, farm$y, labels = nd, pos = 1)
text(farm$x, farm$y, labels = ND_neigh, pos = 3, col = "green")
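If you want to keep the neighbour's count around for later analysis, you can attach it back onto the original pattern as an extra mark (a sketch; the mark name ND_nearest is just an illustration):
marks(farm)$ND_nearest <- ND_neigh
head(marks(farm))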
I am plotting the joint density of two variables x and y using kde2d and persp.
How can I do the following:
1) Depending on the values of x and y, I want to colour the plot differently. I.e. for values x>.5 and y>.5 the color should be blue and for x<=.5 and y<=.5 the color should be red. Currently the whole graph is just blue (see code below).
2) How can I colour the area below the surface (i.e. the probability mass)? I would like to colour it also with blue and red respectively.
library(MASS)
i <- 1
x <- NULL
y <- NULL
while (i <= 30) {
  x1 <- rnorm(1, 0, 1)
  x2 <- rnorm(1, 0, 1)
  x <- c(x, x1)
  y <- c(y, x2)
  i <- i + 1
}
den3d <- kde2d(x, y)
persp(den3d,xlab="Var1", ylab="Var2", zlab="Density", shade = 0.1, col = "blue", box=TRUE)
Help is appreciated.
This answers the first question:
x <- rnorm(30)
y <- rnorm(30)
library(MASS)
den3d <- kde2d(x, y)
cols <- c("blue", "red")[(outer(na.omit(filter(den3d$x>0.5,c(1,1), sides=1)),
na.omit(filter(den3d$y>0.5,c(1,1), sides=1)),
"+")==4) + 1]
persp(den3d,xlab="Var1", ylab="Var2", zlab="Density",
shade = 0.1, col = cols, box=TRUE, phi=90)
Note that a facet is only coloured red if its x and y values are > 0.5 in the whole facet range.
I don't think you should do what you describe in the second question. Anyway, persp is not designed for that.
Edit: What is going on there?
den3d$x contains the x values for plotting. We need to check, for each facet, whether the x values at both ends are > 0.5. We can do this by calculating the sum of these values using stats::filter (see help("filter") for documentation):
filter(den3d$x>0.5,c(1,1), sides=1)
#[1] NA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2
Logical input values are automatically coerced to 0 and 1 by filter. A resulting value of 0 indicates that neither end is > 0.5, 1 that one end is, and 2 that both ends are. The NA needs to be removed.
We then do the same for den3d$y and calculate the outer sum, which can result in values between 0 and 4: 0 and 4 mean that none or all four borders of a facet are > 0.5, respectively, while 1 to 3 indicate that 1 to 3 borders are. Using ==, we set all values of 4 (the facets we want to colour "red") to TRUE and the others to FALSE. Since we need values of 1 for FALSE and 2 for TRUE in order to subset the colour vector, the logical values are (automatically) coerced to 0 and 1, and we add 1.
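A tiny illustration of the outer-sum step, with made-up facet-edge counts:
fx <- c(0, 0, 1, 2, 2)   # per-facet sums along x: how many facet ends are > 0.5
fy <- c(0, 1, 2)         # per-facet sums along y
outer(fx, fy, "+") == 4  # TRUE only where both x ends and both y ends are > 0.5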
That's it.