How to loop over multiple groups and create radar plots in R

I have the following dataframe:
group  Class  Maths  Science
Name1  7      74     78
Name2  7      80     91
Name3  6      69     80
I want to create a separate radar plot of the variables Maths and Science for each class using R. For example, for the above dataframe, two radar plots should be created, one for class 7 and one for class 6.
nrange <- 2
class <- c(7,6)
for (i in nrange){
plot <- ggradar::ggradar(df[i,2:3], values.radar = c(0, 50, 100), group.line.width = 1,
group.point.size = 2, legend.position = "bottom", plot.title=class[i])
}
plot
I am using the above code. However, it only creates the plot for the last row. Please help me with this issue.
Thanks a lot in advance!

You were almost there, but there were two little problems.
The for statement evaluated to for(i in 2) which means it is only using i=2. You can fix this by using for(i in 1:nrange)
You were overwriting plot each time through the loop. If you make plot a list and save each graph as a separate element in the list, then it should work.
mydat <- tibble::tribble(
~group, ~Class, ~Maths, ~Science,
"Name1", 7, 74, 78,
"Name2", 7, 80, 91,
"Name3", 6, 69, 80)
plots <- list()
nrange <- nrow(mydat)  # one plot per row
for (i in 1:nrange){
plots[[i]] <- ggradar::ggradar(mydat[i,2:4], values.radar = c(0, 50, 100),
grid.max = 100, group.line.width = 1,
group.point.size = 2, legend.position = "bottom", plot.title=mydat$Class[i])
}
plots
#> [[1]]
#>
#> [[2]]
#>
#> [[3]]
Created on 2023-02-03 by the reprex package (v2.0.1)
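The loop above draws one plot per row. If the goal is one radar plot per class, as the question asks, one possible sketch is to split the data by Class first and draw each piece separately. It assumes that ggradar uses the first column as the group label (which is why group is kept first and Class is dropped), and the names by_class and class_plots are made up for this illustration.

```r
mydat <- data.frame(
  group   = c("Name1", "Name2", "Name3"),
  Class   = c(7, 7, 6),
  Maths   = c(74, 80, 69),
  Science = c(78, 91, 80)
)

# One data frame per class; the list names are the class values ("6", "7").
by_class <- split(mydat[c("group", "Maths", "Science")], mydat$Class)

# Draw only if ggradar is installed, so the splitting step can be checked alone.
if (requireNamespace("ggradar", quietly = TRUE)) {
  class_plots <- lapply(names(by_class), function(cls) {
    ggradar::ggradar(by_class[[cls]],
                     values.radar = c(0, 50, 100),
                     grid.min = 0, grid.mid = 50, grid.max = 100,
                     group.line.width = 1, group.point.size = 2,
                     legend.position = "bottom",
                     plot.title = paste("Class", cls))
  })
  names(class_plots) <- names(by_class)
}
```

With this layout, class_plots[["7"]] would hold a single plot with Name1 and Name2 overlaid, and class_plots[["6"]] a plot with Name3 only.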
Putting Together with facet_wrap()
library(dplyr)
library(ggplot2)
mydat <- tibble::tribble(
~group, ~Class, ~Maths, ~Science,
"Name1", 7, 74, 78,
"Name2", 7, 80, 91,
"Name3", 6, 69, 80)
mydat <- mydat %>%
mutate(gp = paste(group, Class, sep=": ")) %>%
select(gp, Maths, Science)
ggradar::ggradar(mydat, values.radar = c(0, 50, 100),
grid.max = 100, group.line.width = 1,
group.point.size = 2, legend.position = "bottom") +
facet_wrap(~gp)
Created on 2023-02-06 by the reprex package (v2.0.1)

Related

Group_by not working, summarize() computing identical values?

I am using the data found here: https://www.kaggle.com/cdc/behavioral-risk-factor-surveillance-system. In RStudio, I have loaded the CSV file as BRFSS2015. I have created two new columns to compare people who have arthritis with people who do not (arth and no_arth). Grouping by these variables, I am now trying to find the mean and sd of their weights. The weight variable was generated from another variable in the dataset using this code: (weight = BRFSS2015$WEIGHT2). Below is the code I am trying to run for the mean and sd.
BRFSS2015%>%
group_by(arth,no_arth)%>%
summarize(mean_weight=mean(weight),
sd_weight=sd(weight))
I am getting output that says mean and sd for these two groups is identical. I doubt this is correct. Can someone check and tell me why this is happening? The numbers I am getting are:
arth: mean = 733.2044; sd= 2197.377
no_arth: mean= 733.2044; sd= 2197.377
Here is how I created the variables arth and no_arth:
a=BRFSS2015%>%
select(HAVARTH3)%>%
filter(HAVARTH3=="1")
b=BRFSS2015%>%
select(HAVARTH3)%>%
filter(HAVARTH3=="2")
as.data.frame(BRFSS2015)
arth=c(a)
no_arth=c(b)
BRFSS2015$arth <- c(arth, rep(NA, nrow(BRFSS2015)-length(arth)))
BRFSS2015$no_arth <- c(no_arth, rep(NA, nrow(BRFSS2015)-length(no_arth)))
as.tibble(BRFSS2015)
Before I started, I also removed NAs from weight using weight=na.omit(WEIGHT2)
Based on the info you provided, one can only guess what went wrong in your analysis. But here is working code using a snippet of the real data.
library(tidyverse)
BRFSS2015_minimal <- structure(list(HAVARTH3 = c(
1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2,
1, 1, 1, 1, 1, 1, 2, 1, 2
), WEIGHT2 = c(
280, 165, 158, 180, 142,
145, 148, 179, 84, 161, 175, 150, 9999, 140, 170, 128, 200, 178,
155, 163
)), row.names = c(NA, -20L), class = c(
"tbl_df", "tbl",
"data.frame"
))
BRFSS2015_minimal %>%
filter(!is.na(WEIGHT2), HAVARTH3 %in% 1:2) %>%
mutate(arth = HAVARTH3 == 1, no_arth = HAVARTH3 == 2,weight = WEIGHT2) %>%
group_by(arth, no_arth) %>%
summarize(
mean_weight = mean(weight),
sd_weight = sd(weight),
.groups = "drop"
)
#> # A tibble: 2 × 4
#> arth no_arth mean_weight sd_weight
#> <lgl> <lgl> <dbl> <dbl>
#> 1 FALSE TRUE 165 10.8
#> 2 TRUE FALSE 865 2629.
Code used to create dataset
BRFSS2015 <- readr::read_csv("2015.csv")
BRFSS2015_minimal <- dput(head(BRFSS2015[c("HAVARTH3", "WEIGHT2")], 20))
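Because arth and no_arth are complementary, a single grouping column carries the same information. The sketch below does the same summary in base R and additionally drops the 9999 entry, which inflates the mean and sd of the arthritis group in the output above. Treating values of 7777 and above as non-response codes is an assumption made here for illustration; check the BRFSS codebook before applying such a filter to the full data. The names brfss, clean and result are hypothetical.

```r
brfss <- data.frame(
  HAVARTH3 = c(1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2),
  WEIGHT2  = c(280, 165, 158, 180, 142, 145, 148, 179, 84, 161, 175,
               150, 9999, 140, 170, 128, 200, 178, 155, 163)
)

# Assumption: weights >= 7777 are codes, not measurements.
keep  <- !is.na(brfss$WEIGHT2) & brfss$WEIGHT2 < 7777 & brfss$HAVARTH3 %in% 1:2
clean <- brfss[keep, ]

# One label column instead of two complementary logical columns.
clean$arthritis <- ifelse(clean$HAVARTH3 == 1, "arth", "no_arth")

# Per-group mean and sd; tapply returns the groups in the same order both times.
mean_weight <- tapply(clean$WEIGHT2, clean$arthritis, mean)
sd_weight   <- tapply(clean$WEIGHT2, clean$arthritis, sd)

result <- data.frame(
  arthritis   = names(mean_weight),
  mean_weight = as.vector(mean_weight),
  sd_weight   = as.vector(sd_weight)
)
print(result)
```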

Add geom_vline in function with multiple density plots

I have the following
densityPlots <- lapply(numericCols, function(var_x){
p <- ggplot(df, aes_string(var_x)) + geom_density()
})
numericCols are the names of the columns that are numeric. I want to add the mean line, I have tried multiple things such as
densityPlots <- lapply(numericCols, function(var_x){
p <- ggplot(df, aes_string(var_x)) + geom_density() + geom_vline(aes(xintercept=mean(var_x)),
color="red", linetype="dashed", size=1)
})
The data
str(df)
tibble [9 × 4] (S3: tbl_df/tbl/data.frame)
$ A: num [1:9] 12 NA 34 45 56 67 78 89 100
$ B: num [1:9] 1 2 3 NA 5 6 7 8 9
$ C: num [1:9] 83 55 27 27 7 3 5 8 9
$ D: num [1:9] 6 2 NA 1 NA 3 4 5 6
dput(df)
structure(list(A = c(12, NA, 34, 45, 56, 67, 78, 89, 100), B = c(1,
2, 3, NA, 5, 6, 7, 8, 9), C = c(83, 55, 27, 27, 7, 3, 5, 8, 9
), D = c(6, 2, NA, 1, NA, 3, 4, 5, 6)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))
print(numericCols)
[1] "A" "B" "C"
But it does not work, it just ignores the geom_vline function. Does someone have a suggestion? Thanks :)!
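You should use mean(df[, var_x], na.rm=T) in geom_vline. Inside the function, var_x is only the column name (a string), so mean(var_x) returns NA and no line is drawn; you have to look the column up in df first. Note that df is converted to a plain data.frame below so that df[, var_x] returns a vector: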
library(ggplot2)
df <- structure(list(A = c(12, NA, 34, 45, 56, 67, 78, 89, 100), B = c(1,
2, 3, NA, 5, 6, 7, 8, 9), C = c(83, 55, 27, 27, 7, 3, 5, 8, 9
), D = c(6, 2, NA, 1, NA, 3, 4, 5, 6)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))
numericCols <- c("A","B","C")
df <- as.data.frame(df)
densityPlots <- lapply(numericCols, function(var_x) {
ggplot(df, aes_string(var_x)) + geom_density() +
geom_vline(aes(xintercept=mean(df[, var_x], na.rm=T)),
color="red", linetype="dashed", size=1)
})
gridExtra::grid.arrange(grobs=densityPlots)
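As a variation on the same idea, the means can be computed once up front and passed to geom_vline as plain numbers, which avoids evaluating mean() inside aes(). This is only a sketch: col_means is a name invented here, the .data[[var_x]] form is used in place of aes_string, and the plotting part runs only when ggplot2 is installed.

```r
df <- data.frame(
  A = c(12, NA, 34, 45, 56, 67, 78, 89, 100),
  B = c(1, 2, 3, NA, 5, 6, 7, 8, 9),
  C = c(83, 55, 27, 27, 7, 3, 5, 8, 9),
  D = c(6, 2, NA, 1, NA, 3, 4, 5, 6)
)
numericCols <- c("A", "B", "C")

# var_x is a column *name*, so fetch the column with [[ ]] before averaging.
col_means <- vapply(numericCols,
                    function(v) mean(df[[v]], na.rm = TRUE),
                    numeric(1))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  densityPlots <- lapply(numericCols, function(var_x) {
    ggplot(df, aes(x = .data[[var_x]])) +
      geom_density(na.rm = TRUE) +
      # A fixed number needs no aes(); it is the same for every row.
      geom_vline(xintercept = col_means[[var_x]],
                 colour = "red", linetype = "dashed")
  })
}
```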
Here is an approach somewhat different from what you tried in your question: it uses dplyr and tidyr to pivot the data into long format and then relies on ggplot's aesthetic mapping. Unfortunately, geom_vline doesn't summarize by group, so you have to pre-compute the means:
set.seed(3)
data <- data.frame(Category = paste0("Catagory",LETTERS[1:20]),
lapply(LETTERS[1:10],function(x){setNames(data.frame(runif(20,10,100)),x)}))
numericCols <- LETTERS[1:10]
library(dplyr)
library(tidyr)
library(ggplot2)
data.means <- data %>%
select(all_of(numericCols)) %>%
pivot_longer(everything(), names_to = "Variable", values_to = "var_x") %>%
group_by(Variable) %>%
summarize(Mean = mean(var_x))
data %>%
select(all_of(numericCols)) %>%
pivot_longer(everything(), names_to = "Variable", values_to = "var_x") %>%
ggplot(aes(x = var_x, color = Variable)) +
geom_density() +
geom_vline(data = data.means, aes(xintercept=Mean, color = Variable),
linetype="dashed", size=1)
Or you could combine with facet_wrap for multiple plots.
data %>%
select(all_of(numericCols)) %>%
pivot_longer(everything(), names_to = "Variable", values_to = "var_x") %>%
ggplot(aes(x = var_x)) +
facet_wrap(.~Variable) +
geom_density() +
geom_vline(data = data.means, aes(xintercept=Mean, color = Variable),
linetype="dashed", size=1)

how to multiply multiple df columns

I have a df with a number of columns.
I want to multiply each column by its own fixed constant.
I am looking for the best possible strategy to achieve this using purrr (I am still trying to get my head around lamp etc etc)
library(tidyverse)
library(lubridate)
df1 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
"2019-02-05")),
x = c(1, 2, 3, 4, 5),
y = c(2, 3, 4, 5, 6),
z = c(3, 4, 5, 6, 7)
)
The constants to multiply the columns x, y and z by are as follows:
c(10, 20, 30)
This is the output I expect:
data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
"2019-02-05")),
x = c(10, 20, 30, 40, 50),
y = c(40, 60, 80, 100, 120),
z = c(90, 120, 150, 180, 210)
)
We can use map2 from purrr (part of the tidyverse) to achieve this.
df1[2:4] <- map2(df1[2:4], c(10, 20, 30), ~.x * .y)
df1
# date x y z
# 1 2019-02-01 10 40 90
# 2 2019-02-02 20 60 120
# 3 2019-02-03 30 80 150
# 4 2019-02-04 40 100 180
# 5 2019-02-05 50 120 210
The base R equivalent is mapply.
df1[2:4] <- mapply(FUN = function(x, y) x * y, df1[2:4], c(10, 20, 30), SIMPLIFY = FALSE)
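Both versions above match constants to columns by position. A small variation, sketched here with base R's Map, is to name the constants after the columns so that the pairing no longer depends on column order. The vector name consts is made up for this example, and as.Date stands in for lubridate's ymd so the snippet needs no extra packages.

```r
df1 <- data.frame(
  date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-03",
                   "2019-02-04", "2019-02-05")),
  x = c(1, 2, 3, 4, 5),
  y = c(2, 3, 4, 5, 6),
  z = c(3, 4, 5, 6, 7)
)

# Each constant is named after the column it applies to.
consts <- c(x = 10, y = 20, z = 30)

# Map pairs column i with constant i and returns a list of new columns.
df1[names(consts)] <- Map(`*`, df1[names(consts)], consts)
df1
```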

Weighted mean calculation in R with missing values

Does anyone know if it is possible to calculate a weighted mean in R when values are missing, and when values are missing, the weights for the existing values are scaled upward proportionately?
To convey this clearly, I created a hypothetical scenario. This describes the root of the question, where the scalar needs to be adjusted for each row, depending on which values are missing.
Image: Weighted Mean Calculation
File: Weighted Mean Calculation in Excel
Using weighted.mean from the base stats package with the argument na.rm = TRUE should get you the result you need. Here is a tidyverse way this could be done:
library(tidyverse)
scores <- tribble(
~student, ~test1, ~test2, ~test3,
"Mark", 90, 91, 92,
"Mike", NA, 79, 98,
"Nick", 81, NA, 83)
weights <- tribble(
~test, ~weight,
"test1", 0.2,
"test2", 0.4,
"test3", 0.4)
scores %>%
gather(test, score, -student) %>%
left_join(weights, by = "test") %>%
group_by(student) %>%
summarise(result = weighted.mean(score, weight, na.rm = TRUE))
#> # A tibble: 3 x 2
#> student result
#> <chr> <dbl>
#> 1 Mark 91.20000
#> 2 Mike 88.50000
#> 3 Nick 82.33333
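To see that this is the proportional rescaling the question asks for, it may help to redo one row by hand. With na.rm = TRUE, weighted.mean drops the missing scores together with their weights and divides by the sum of the remaining weights. A short check for Mike's row, using the numbers from the example above (the variable names are only for illustration):

```r
scores  <- c(test1 = NA, test2 = 79, test3 = 98)
weights <- c(test1 = 0.2, test2 = 0.4, test3 = 0.4)

# Keep only the tests that were actually taken.
keep <- !is.na(scores)

# Remaining weights 0.4 and 0.4 are rescaled by dividing by their sum (0.8).
manual  <- sum(scores[keep] * weights[keep]) / sum(weights[keep])
builtin <- weighted.mean(scores, weights, na.rm = TRUE)

print(c(manual = manual, builtin = builtin))
```

Both values agree with the 88.5 shown for Mike in the output above.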
The best way to post an example dataset is to use dput(head(dat, 20)), where dat is the name of a dataset. Graphic images are a really bad choice for that.
DATA.
dat <-
structure(list(Test1 = c(90, NA, 81), Test2 = c(91, 79, NA),
Test3 = c(92, 98, 83)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")
w <-
structure(list(Test1 = c(18, NA, 27), Test2 = c(36.4, 39.5, NA
), Test3 = c(36.8, 49, 55.3)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")
CODE.
You can use the function weighted.mean from the base package stats together with sapply for this. Note that if your scores and weights are R objects of class matrix, you will not need unlist.
sapply(seq_len(nrow(dat)), function(i){
weighted.mean(unlist(dat[i,]), unlist(w[i, ]), na.rm = TRUE)
})

R: How Plot an Excel Table(Matrix) with R

I have a problem that I still haven't found out how to solve. I want to plot all the values MW1, MW2 and MW3 as a function of "DHT + Procymidone". How can I plot all these values in one graphic so that I get 3 different curves (in different colors and with different numbers, like curve 1, 2, ...)? And I want the labels of the x-values ("DHT + Procymidone") to be like -10, -9, ... , -4 instead of 1,00E-10, ...
DHT + Procymidone MW 1 MW 2 MW 3
1,00E-10 114,259526780335 111,022461066274 213,212408408682
1,00E-09 115,024187788314 111,083316791613 114,529425136628
1,00E-08 110,517449986348 107,867941606743 125,10230718665
1,00E-07 100,961311263444 98,4219995773135 116,045168653416
1,00E-06 71,2383604211297 73,539659636842 50,3213799775309
1,00E-05 20,3553333652104 36,1345771905088 15,42260866106
1,00E-04 4,06189509055904 18,1246447874679 10,1988107887318
I have shortened your data frame for convenience reasons, so here's an example:
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
library(tidyr)
library(ggplot2)
mydf <- gather(mydat, "grp", "MW", 2:4)
ggplot(mydf, aes(x = DHT_Procymidone, y = MW, colour = grp)) + geom_line()
This draws one line per MW column, each in its own colour.
To use ggplot, your data needs to be in long format. gather does this for you: it stacks columns MW1 to MW3 into a single MW column and stores the original column names in the new grp column. ggplot uses grp to tell the groups apart, i.e. to draw differently coloured lines.
If DHT + Procymidone is stored as a number, format(..., scientific = FALSE) only changes how it is printed: you get 0.0000000001, not -10.
However, if this data column is a character vector (you can coerce with as.character), this may work:
a <- "1,00E-10"
sub("1,00E", "", a, fixed = TRUE)
> [1] "-10"
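Another option, if the column arrives as text with decimal commas, is to convert it to numbers and take log10, which keeps the result numeric so it can go straight onto the x-axis. This is a sketch that assumes the values look like those in the question; raw, conc and exponent are illustrative names.

```r
raw <- c("1,00E-10", "1,00E-09", "1,00E-08", "1,00E-07",
         "1,00E-06", "1,00E-05", "1,00E-04")

# Swap the decimal comma for a point so as.numeric can parse the string.
conc <- as.numeric(sub(",", ".", raw, fixed = TRUE))

# log10 of the concentration is the exponent; round() removes floating-point noise.
exponent <- round(log10(conc))
print(exponent)
```

When the data is read from a file, the dec = "," argument of read.table (or read.csv2, which uses it by default) should make the first conversion step unnecessary.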
As an alternative to @Daniel's answer, here is one that doesn't rely on ggplot (thanks Daniel for providing the reproducible data).
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
plot(mydat[,2] ~ mydat[,1], typ = "l", ylim = c(0,220), xlim = c(-10,-2), xlab = "DHT Procymidone", ylab = "MW")
lines(mydat[,3] ~ mydat[,1], col = "blue")
lines(mydat[,4] ~ mydat[,1], col = "red")
legend(x = -4, y = 200, legend = c("MW1","MW2","MW3"), lty = 1, bty = "n", col = c("black","blue","red"))
To change axis labels see the text in xlab and ylab. To change axis limits see xlim and ylim.
