How to draw frequency polygons in R

I'm trying to plot a frequency polygon based on the following vector:
x <- c(48, 30, 35, 31, 21, 28, 34, 43, 36, 45, 41, 33, 47, 47, 30, 47, 44, 45, 32, 46, 47, 23, 30, 23, 49, 20, 24, 20, 40, 50)
And the sample command is:
plot(x, y, type = "b", main = "DoThi", sub = "X", xlab = "Tuoi", ylab = "TS")
The difficulty is that I can't figure out where the variable y comes from. Can anyone help me create a variable y so that the plot looks like the picture? Thanks

Picking up on @Berhard's comment that you may be looking for the frequency, i.e. the count of each value in the vector x, and adding trillion units of measure to the x-axis:
Alternatively, if you want plain text rather than scientific notation in the label, try xlab = "Tuoi [Trillions]".
x <- c(48, 30, 35, 31, 21, 28, 34, 43, 36, 45, 41, 33, 47, 47, 30, 47, 44, 45, 32, 46, 47, 23, 30, 23, 49, 20, 24, 20, 40, 50)
#frequency count for the x vector
df <- data.frame(table(x))
# create a dataframe which includes the complete integer sequence
# between the minimum and maximum values, to be merged with the
# original data. The merge creates `NAs` where the original data has
# missing `x` values. After the merge, `NAs` are substituted by 0.
df1 <- data.frame(x = min(x):max(x))
# add frequency of 0 for missing integer values within the x vector range
df <- merge(df1, df, all = TRUE)
df$Freq[is.na(df$Freq)] <- 0
plot(df, type = "l", main = "DoThi", xlab = expression(Tuoi~"["*x*10^{12}*"]"), ylab = "TS", col = "red")
Created on 2021-09-16 by the reprex package (v2.0.0)
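If x only ever holds whole numbers, as it does here, the same zero-filled counts can be built without the merge step by base R's tabulate(), which counts occurrences of 1, 2, ..., nbins. A minimal sketch under that assumption (ages and counts are illustrative names, not from the answer above):

```r
x <- c(48, 30, 35, 31, 21, 28, 34, 43, 36, 45, 41, 33, 47, 47, 30,
       47, 44, 45, 32, 46, 47, 23, 30, 23, 49, 20, 24, 20, 40, 50)

# every integer age between the smallest and largest observed value
ages <- min(x):max(x)

# tabulate() counts how often 1, 2, ..., nbins occurs, so shift x
# so that min(x) lands in bin 1; absent ages get a count of 0
counts <- tabulate(x - min(x) + 1, nbins = length(ages))

plot(ages, counts, type = "l", col = "red",
     main = "DoThi", xlab = "Tuoi", ylab = "TS")
```

This relies on x being integer-valued; for non-integer data the merge approach above (or binning) is the safer choice.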

A tidyverse approach
library(tidyverse)
x <- c(48, 30, 35, 31, 21, 28, 34, 43, 36, 45, 41, 33, 47, 47, 30, 47, 44, 45, 32, 46, 47, 23, 30, 23, 49, 20, 24, 20, 40, 50)
df <- tibble(x = x, id = seq_along(x))
df %>%
  ggplot(aes(id, x)) +
  geom_line(col = "red")
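The plot above draws the raw values of x against their position, not their frequencies. If a frequency polygon is what you are after, a tidyverse sketch in the spirit of the base-R answer could count the values with dplyr::count() and fill in the missing ages with tidyr::complete(); the column name Freq is an illustrative choice:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

x <- c(48, 30, 35, 31, 21, 28, 34, 43, 36, 45, 41, 33, 47, 47, 30,
       47, 44, 45, 32, 46, 47, 23, 30, 23, 49, 20, 24, 20, 40, 50)

freq <- tibble(x = x) %>%
  count(x, name = "Freq") %>%                           # one row per distinct age
  complete(x = full_seq(x, 1), fill = list(Freq = 0L))  # absent ages get 0

ggplot(freq, aes(x, Freq)) +
  geom_line(colour = "red") +
  labs(title = "DoThi", x = "Tuoi", y = "TS")
```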

I understand the question differently than @Peter. I understand y to be the counts of x, as in:
x <- c(48, 30, 35, 31, 21, 28, 34, 43, 36, 45, 41, 33, 47, 47, 30,
47, 44, 45, 32, 46, 47, 23, 30, 23, 49, 20, 24, 20, 40, 50)
x_coord <- sort(unique(x))
y_coord <- as.integer(table(x))
plot(x_coord, y_coord, type = "b", ylim = c(0,5))
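A textbook frequency polygon usually joins the midpoints of class intervals rather than individual values, with an empty class at each end so the line starts and ends at zero. If that is what the picture shows (an assumption, since the image is not available), hist() with plot = FALSE computes the class counts and midpoints; the class width of 5 and the limits 15 to 60 are illustrative choices:

```r
x <- c(48, 30, 35, 31, 21, 28, 34, 43, 36, 45, 41, 33, 47, 47, 30,
       47, 44, 45, 32, 46, 47, 23, 30, 23, 49, 20, 24, 20, 40, 50)

# classes of width 5, closed on the left: [15,20), [20,25), ..., [55,60];
# the first and last classes are empty, so the polygon is anchored at 0
h <- hist(x, breaks = seq(15, 60, by = 5), right = FALSE, plot = FALSE)

# join the class midpoints with their counts
plot(h$mids, h$counts, type = "b",
     main = "DoThi", xlab = "Tuoi", ylab = "TS")
```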

Related

Predict xgboost model onto raster stack yields error

I am using an xgboost model to predict onto a raster stack. I have successfully used the same approach with CART, xgb and Random Forest models:
library(raster)
library(xgboost)
# create a RasterStack or RasterBrick with a set of predictor layers
logo <- brick(system.file("external/rlogo.grd", package="raster"))
names(logo)
# known presence and absence points
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
# extract values for points
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
xgb <- xgboost(data = data.matrix(subset(v, select = -c(pa))), label = v$pa,
nrounds = 5)
raster::predict(model = xgb, logo)
But with xgboost I get the following error:
Error in xgb.DMatrix(newdata, missing = missing) :
xgb.DMatrix does not support construction from list
The problem is that predict.xgb.Booster does not accept a data.frame for the argument newdata (see ?predict.xgb.Booster). That is unexpected (most common predict.* methods take a data.frame), but we can work around it. I show how to do that below, using the "terra" package instead of the obsolete "raster" package (the solution is exactly the same for either package).
The example data
library(terra)
library(xgboost)
logo <- rast(system.file("ex/logo.tif", package="terra"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- extract(logo, xy[,2:3])
xgb <- xgboost(data = data.matrix(v), label=xy[,1], nrounds = 5)
The work-around is to write a prediction function that first coerces the data.frame of new data to a matrix. We can then pass that function to predict<SpatRaster> through its fun argument:
xgbpred <- function(model, data, ...) {
  predict(model, newdata = as.matrix(data), ...)
}
p <- predict(logo, model=xgb, fun=xgbpred)
plot(p)
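To see why the wrapper helps, here is a toy sketch that needs neither terra nor xgboost. The class matonly and its predict() method are hypothetical stand-ins for a model that, like predict.xgb.Booster as described above, rejects a data.frame; the data.frame cells mimics the block of cell values handed to fun (an assumption about its shape, not terra's exact internals):

```r
# hypothetical toy model whose predict() method only accepts a matrix
toy_model <- structure(list(coef = c(a = 2, b = -1)), class = "matonly")
predict.matonly <- function(object, newdata, ...) {
  if (!is.matrix(newdata)) stop("newdata must be a matrix")
  drop(newdata[, names(object$coef)] %*% object$coef)
}

# the wrapper from the answer: coerce the data.frame to a matrix first
xgbpred <- function(model, data, ...) {
  predict(model, newdata = as.matrix(data), ...)
}

# stand-in for a chunk of raster cell values, one column per layer
cells <- data.frame(a = c(1, 2, 3), b = c(0, 1, 4))

# calling predict() directly on the data.frame fails...
res_direct <- tryCatch(predict(toy_model, newdata = cells),
                       error = function(e) conditionMessage(e))
# ...while the wrapper coerces and succeeds
res_wrapped <- xgbpred(toy_model, cells)
```

The wrapper keeps the fun(model, data, ...) shape, which is why it can be passed to predict() unchanged.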

How to save plots in list as jpeg using lapply in R?

I have a list of 10 plots created from model_list with the code below; I stored these plots in the list var_list.
library(mixOmics)
var_list<-lapply(model_list, function(x) plotVar(x))
var_list thus contains 10 plots; here is the first element of the list:
> var_list[[1]]
x y Block names pch cex col font Overlap
TPI200 -0.6975577 -0.5582925 X TPI200 1 5 #388ECC 1 Correlation Circle Plots
TPI350 -0.8561514 -0.4101970 X TPI350 1 5 #388ECC 1 Correlation Circle Plots
TPI500 -0.9403552 -0.1074518 X TPI500 1 5 #388ECC 1 Correlation Circle Plots
TPI700 -0.9256605 0.3070954 X TPI700 1 5 #388ECC 1 Correlation Circle Plots
TPI900 -0.8697037 0.4699423 X TPI900 1 5 #388ECC 1 Correlation Circle Plots
I want to save each plot in this list as a JPEG (resulting in 10 different JPEG files). With the following code R creates 10 images, but they are all identical (only the first plot is created and duplicated for the rest).
lapply(1:length(model_list), function (x) {
jpeg(paste0(names(model_list)[x], ".jpg"))
lapply(model_list, function(x) plotVar(x))
dev.off()
})
I have seen similar questions, but I can't find the right solution to have a jpg for each plot for each dataframe in the list! How can I solve this? Many thanks in advance!
Via this link you can find the dput(model_list[[1]]).
Using the data you provided in a similar post, here is a possible solution. It is better to work with model_list directly, because once you transform it into var_list all the data become graphical elements. The code below recreates model_list from datalist; in your real problem you already have model_list, and it must include a name for each component of the list:
library(mixOmics)
#Data
datalist <- list(df1 = structure(list(OID = c(-1, -1, -1, -1, -1, -1), POINTID = c(1,
2, 3, 4, 5, 6), WETLAND = c("no wetl", "no wetl", "no wetl",
"wetl", "wetl", "wetl"), TPI200 = c(70, 37, 45, 46, 58, 56),
TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55, 35, 45,
51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46),
TPI2000 = c(53, 17, 53, 54, 54, 54), TPI3000 = c(47, 35,
47, 47, 47, 47), TPI4000 = c(49, 49, 49, 49, 49, 49), TPI5000 = c(63,
63, 63, 62, 62, 61), TPI2500 = c(48, 26, 48, 49, 49, 49)), row.names = c(NA,
6L), class = "data.frame"), df2 = structure(list(OID = c(-1,
-1, -1, -1, -1, -1), POINTID = c(1, 2, 3, 4, 5, 6), WETLAND = c("no wetl",
"no wetl", "no wetl", "wetl", "wetl", "wetl"), TPI200 = c(70,
37, 45, 46, 58, 56), TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55,
35, 45, 51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46), TPI2000 = c(53,
17, 53, 54, 54, 54), TPI3000 = c(47, 35, 47, 47, 47, 47), TPI4000 = c(49,
49, 49, 49, 49, 49), TPI5000 = c(63, 63, 63, 62, 62, 61), TPI2500 = c(48,
26, 48, 49, 49, 49)), row.names = c(NA, 6L), class = "data.frame"))
#Function
custom_splsda <- function(datalist, ncomp, keepX, ..., Xcols, Ycol){
Y <- datalist[[Ycol]]
X <- datalist[Xcols]
res <- splsda(X, Y, ncomp = ncomp, keepX = keepX, ...)
res
}
#Create model_list, you must have the object
model_list <- lapply(datalist, custom_splsda,
ncomp = 2, keepX = c(5, 5),
Xcols = 4:8, Ycol = "WETLAND")
Next, the loop for the plots:
#Loop
for (i in seq_along(model_list)) {
  jpeg(paste0(names(model_list)[i], ".jpg"))
  plotVar(model_list[[i]], title = names(model_list)[i])
  dev.off()
}
That will produce one JPEG file per model in your working directory, and the plots differ from one another (see their titles).
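If you prefer to keep lapply() as in the question, the same fix applies: each iteration must draw only its own element, whereas the question's inner lapply(model_list, ...) draws every model into every file. A minimal sketch with a hypothetical named list of data frames standing in for model_list and base plot() standing in for plotVar(); files go to tempdir() for the demo:

```r
# hypothetical stand-in for model_list: a named list of small data frames
data_list <- list(
  df1 = data.frame(a = 1:5, b = c(2, 4, 1, 5, 3)),
  df2 = data.frame(a = 1:5, b = c(5, 3, 4, 1, 2))
)
out_dir <- tempdir()  # temporary folder, only for this demo

files <- vapply(names(data_list), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".jpg"))
  jpeg(path)                        # open a fresh device for this element
  plot(data_list[[nm]], main = nm)  # draw this element only, not the whole list
  dev.off()                         # close the device so the file is written
  path
}, character(1))
```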

add NA for a value based on a condition, with tidyverse only, R [duplicate]

This question already has an answer here:
Recode/replace variables conditionally with R dyplyr?
(1 answer)
Closed 2 years ago.
I have an age variable with some very odd values, such as 1000 or 6666, which are obviously unusable for any analysis. I want to keep the plausible ages (0, 1, 2, 3, ..., 100) but replace anything above 100 with NA, using only the tidyverse. I looked into several functions, such as na_if(), but could not achieve what I want.
This is an example of my data. Look at row 64 (the value 6222) and you will see what I am talking about.
age_dput <- structure(list(Age = c(63, 19, 23, 28, 40, 31, 60, 26, 35, 44,
30, 47, 26, 45, 21, 38, 40, 28, 26, 40, 60, 33, 72, 40, 32, 32,
43, 24, 25, 39, 50, 22, 37, 53, 51, 42, 52, 29, 19, 42, 58, 61,
29, 26, 45, 29, 20, 26, 28, 43, 2, 42, 40, 33, 43, 53, 55, 27,
36, 41, 30, 54, 55, 6222, 21, 26, 38, 23, 48, 29, 44, 42, 35,
27, 28, 20, 59, 80, 35, 36, 24, 29, 34, 31, 25, 37, 30, 31, 48,
28, 30, 65, 45, 27, 39, 29, 34, 29, 76, 40)), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
row = c(2910L, 35958L), col = c("how_unwell", "how_unwell"
), expected = c("a double", "a double"), actual = c("How Unwell",
"How Unwell"), file = c("'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'",
"'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'"
)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)))
You can use replace() or if_else():
library(dplyr)
age_dput %>%
mutate(clean_age_replace = replace(Age, Age > 100, NA_real_),
clean_age_if_else = if_else(Age > 100, NA_real_, Age))
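Using na_if() (this works here only because exactly one value exceeds 100; na_if() needs its second argument to have length 1 or the same length as Age, so with several such values use replace() or if_else() instead):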
library(dplyr)
age_dput %>%
mutate(Age = na_if(Age, Age[Age > 100]))
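If negative values should also count as invalid (an assumption; the question only mentions values above 100), dplyr::between() expresses the valid range in one condition. A small sketch on made-up values rather than the question's data:

```r
library(dplyr)

# made-up ages: plausible values plus an outlier and a negative entry
ages <- tibble(Age = c(63, 19, 2, 6222, 100, 101, -5))

ages_clean <- ages %>%
  # keep ages within [0, 100] (both ends inclusive); everything else becomes NA
  mutate(Age = if_else(between(Age, 0, 100), Age, NA_real_))
```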

Random Forest class probabilities in separate raster layers

I'm using the randomForest package to classify a raster stack of different predictors. Classification works fine, but I also want to retrieve the class probabilities. With my code I only get a RasterLayer with the probability of the first class, but I'd like a RasterStack with the probabilities of each class in a separate layer.
PRED_train$response <- as.factor(PRED_train$response)
rf <- randomForest(response~., data = PRED_train, na.action = na.omit, confusion = T)
pred_RF <- raster::predict(PRED, rf)
beginCluster()
pred_RF <- clusterR(PRED, predict, args = list(rf,type="prob"))
endCluster()
The first place to look is ?raster::predict, which has an example showing how to do that. Here it is:
library(raster)
logo <- brick(system.file("external/rlogo.grd", package="raster"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
v$pa <- as.factor(v$pa)
library(randomForest)
rfmod <- randomForest(pa ~., data=v)
rp <- predict(logo, rfmod, type='prob', index=1:2)
spplot(rp)
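The reason index=1:2 matters is that predict() with type = 'prob' on a randomForest classifier returns a matrix with one column per class, and index chooses which of those columns become layers. A quick sketch outside the raster context, using the built-in iris data rather than the question's data, shows that shape:

```r
library(randomForest)

set.seed(1)  # randomForest is stochastic; fix the seed for reproducibility
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# type = "prob" returns one column of vote proportions per class
prob <- predict(rf, newdata = iris[c(1, 51, 101), ], type = "prob")
print(prob)
```

By the same logic, with k classes you would presumably pass index = 1:k to get one layer per class.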

Add vertical lines to time-series plot

I have the code below, which plots two time series. I'd like to add a vertical line every, say, 10 units on the x-axis to divide the plot into about 5 sections. Any tips are very much appreciated.
Code:
## Plot Forecast & Actual
ts.plot(ts(CompareDf$stuff1), ts(CompareDf$stuff2), col = 1:2, xlab = "Hour", ylab = "Minutes", main = "testVar")
legend("topleft", legend = c("Actual","Forecast"), col = 1:2, lty = 1)
Data:
dput(CompareDf)
structure(list(stuff1 = c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55), stuff2 = c(8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57)), .Names = c("stuff1",
"stuff2"), row.names = c(NA, -50L), class = "data.frame")
After plotting the time-series data, use abline() to draw the vertical lines:
abline(v = seq(10, 50, 10))
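A self-contained version, recreating CompareDf from the dput above (its columns are the runs 6 to 55 and 8 to 57) and deriving the line positions from the number of rows so they adapt if the series length changes; the dashed grey style is only a cosmetic choice:

```r
# same values as the dput in the question
CompareDf <- data.frame(stuff1 = 6:55, stuff2 = 8:57)

ts.plot(ts(CompareDf$stuff1), ts(CompareDf$stuff2), col = 1:2,
        xlab = "Hour", ylab = "Minutes", main = "testVar")
legend("topleft", legend = c("Actual", "Forecast"), col = 1:2, lty = 1)

# one vertical line every 10 time steps, up to the length of the series
breaks_at <- seq(10, nrow(CompareDf), by = 10)
abline(v = breaks_at, lty = 2, col = "grey50")
```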
