I am using the data found here: https://www.kaggle.com/cdc/behavioral-risk-factor-surveillance-system. In RStudio, I have loaded the CSV file as BRFSS2015. I have created two new columns, arth and no_arth, to compare people who have arthritis with people who do not. Grouping by these variables, I am now trying to find the mean and sd of their weights. The weight variable was generated from another variable in the dataset using this code: weight = BRFSS2015$WEIGHT2. Below is the code I am trying to run for the mean and sd.
BRFSS2015%>%
group_by(arth,no_arth)%>%
summarize(mean_weight=mean(weight),
sd_weight=sd(weight))
I am getting output that says the mean and sd for these two groups are identical. I doubt this is correct. Can someone check and tell me why this is happening? The numbers I am getting are:
arth: mean = 733.2044; sd= 2197.377
no_arth: mean= 733.2044; sd= 2197.377
Here is how I created the variables arth and no_arth:
a=BRFSS2015%>%
select(HAVARTH3)%>%
filter(HAVARTH3=="1")
b=BRFSS2015%>%
select(HAVARTH3)%>%
filter(HAVARTH3=="2")
as.data.frame(BRFSS2015)
arth=c(a)
no_arth=c(b)
BRFSS2015$arth <- c(arth, rep(NA, nrow(BRFSS2015)-length(arth)))
BRFSS2015$no_arth <- c(no_arth, rep(NA, nrow(BRFSS2015)-length(no_arth)))
as.tibble(BRFSS2015)
Before I started, I also removed NAs from weight using weight=na.omit(WEIGHT2)
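Based on the info you provided, one can only guess what went wrong in your analysis. A likely cause is that weight is a standalone vector rather than a column of BRFSS2015, so summarize() computes the mean and sd of the same full vector in every group. Here is working code using a snippet of the real data.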
library(tidyverse)
BRFSS2015_minimal <- structure(list(HAVARTH3 = c(
1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2,
1, 1, 1, 1, 1, 1, 2, 1, 2
), WEIGHT2 = c(
280, 165, 158, 180, 142,
145, 148, 179, 84, 161, 175, 150, 9999, 140, 170, 128, 200, 178,
155, 163
)), row.names = c(NA, -20L), class = c(
"tbl_df", "tbl",
"data.frame"
))
BRFSS2015_minimal %>%
filter(!is.na(WEIGHT2), HAVARTH3 %in% 1:2) %>%
mutate(arth = HAVARTH3 == 1, no_arth = HAVARTH3 == 2,weight = WEIGHT2) %>%
group_by(arth, no_arth) %>%
summarize(
mean_weight = mean(weight),
sd_weight = sd(weight),
.groups = "drop"
)
#> # A tibble: 2 × 4
#> arth no_arth mean_weight sd_weight
#> <lgl> <lgl> <dbl> <dbl>
#> 1 FALSE TRUE 165 10.8
#> 2 TRUE FALSE 865 2629.
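The mean of 865 for the arthritis group comes from the single 9999 in the snippet. As far as I recall from the BRFSS 2015 codebook, WEIGHT2 uses 7777 for "don't know", 9999 for "refused", and 9000 to 9998 for weights reported in kilograms, but please treat that coding as an assumption and check the codebook. Under that assumption, a minimal sketch that keeps only plausible pound values (the 50 to 999 range and the names brfss_min and weight_lb are my own choices) could look like this:

```r
library(dplyr)

# Same 20 rows as BRFSS2015_minimal above, rebuilt here so the block runs on its own
brfss_min <- tibble::tibble(
  HAVARTH3 = c(1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2),
  WEIGHT2 = c(280, 165, 158, 180, 142, 145, 148, 179, 84, 161,
              175, 150, 9999, 140, 170, 128, 200, 178, 155, 163)
)

weight_summary <- brfss_min %>%
  filter(HAVARTH3 %in% 1:2) %>%
  mutate(
    # One grouping column is enough; arth and no_arth carry the same information
    arthritis = if_else(HAVARTH3 == 1, "arth", "no_arth"),
    # Assumed coding: only 50-999 are weights in pounds, everything else becomes NA
    weight_lb = if_else(WEIGHT2 >= 50 & WEIGHT2 <= 999, WEIGHT2, NA_real_)
  ) %>%
  group_by(arthritis) %>%
  summarise(
    n = sum(!is.na(weight_lb)),
    mean_weight = mean(weight_lb, na.rm = TRUE),
    sd_weight = sd(weight_lb, na.rm = TRUE),
    .groups = "drop"
  )

weight_summary
```

With the 9999 treated as missing, both group means land in the 160s, which is far more plausible than 865.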
Code used to create dataset
BRFSS2015 <- readr::read_csv("2015.csv")
BRFSS2015_minimal <- dput(head(BRFSS2015[c("HAVARTH3", "WEIGHT2")], 20))
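One way to get identical statistics in every group is to summarise a vector that lives outside the data frame. The following toy example (the names toy and grp are made up for illustration) is only a sketch of that effect, not a diagnosis of the original script:

```r
library(dplyr)

toy <- tibble::tibble(
  grp = c("a", "a", "b", "b"),
  WEIGHT2 = c(100, 120, 200, 220)
)

# A standalone vector in the global environment, not a column of toy
weight <- toy$WEIGHT2

# summarise() finds no column called weight, falls back to the global vector,
# and therefore returns the overall mean for every group
bad <- toy %>%
  group_by(grp) %>%
  summarise(mean_weight = mean(weight))

# Creating the column inside the pipeline gives per-group means
good <- toy %>%
  mutate(weight = WEIGHT2) %>%
  group_by(grp) %>%
  summarise(mean_weight = mean(weight))

bad
good
```

In bad both rows show 160, while good shows 110 and 210.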
I have the following dataframe:
group   Class   Maths   Science
Name1   7       74      78
Name2   7       80      91
Name3   6       69      80
I want to create separate radar plots of the variables Maths and Science for each class using R. For example, for the above dataframe, two radar plots should be created, one for class 7 and one for class 6.
nrange <- 2
class <- c(7,6)
for (i in nrange){
plot <- ggradar::ggradar(df[i,2:3], values.radar = c(0, 50, 100), group.line.width = 1,
group.point.size = 2, legend.position = "bottom", plot.title=class[i])
}
plot
I am using the above code. However, it only creates the plot for the last row. Please help me with this issue.
Thanks a lot in advance!
You were almost there, but there were two little problems.
The for statement evaluated to for(i in 2) which means it is only using i=2. You can fix this by using for(i in 1:nrange)
You were overwriting plot each time through the loop. If you make plot a list and save each graph as a separate element in the list, then it should work.
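Both points can be checked without ggradar. This is just a small base R sketch, with strings standing in for the plots:

```r
nrange <- 2

# for (i in nrange) iterates over the single value 2
visited <- c()
for (i in nrange) {
  visited <- c(visited, i)
}
visited

# seq_len(nrange) iterates over 1, 2; a list keeps one result per iteration
results <- vector("list", nrange)
for (i in seq_len(nrange)) {
  results[[i]] <- paste("plot for row", i)  # placeholder for a ggradar() call
}
results
```

seq_len(nrange) is a slightly safer spelling of 1:nrange, because it yields an empty sequence when nrange is 0.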
mydat <- tibble::tribble(
~group, ~Class, ~Maths, ~Science,
"Name1", 7, 74, 78,
"Name2", 7, 80, 91,
"Name3", 6, 69, 80)
plots <- list()
nrange <- 2
class <- c(7,6)
for (i in 1:3){
plots[[i]] <- ggradar::ggradar(mydat[i,2:4], values.radar = c(0, 50, 100),
grid.max = 100, group.line.width = 1,
group.point.size = 2, legend.position = "bottom", plot.title=mydat$Class[i])
}
plots
#> [[1]]
#>
#> [[2]]
#>
#> [[3]]
Created on 2023-02-03 by the reprex package (v2.0.1)
Putting Together with facet_wrap()
library(dplyr)
library(ggplot2)
mydat <- tibble::tribble(
~group, ~Class, ~Maths, ~Science,
"Name1", 7, 74, 78,
"Name2", 7, 80, 91,
"Name3", 6, 69, 80)
mydat <- mydat %>%
mutate(gp = paste(group, Class, sep=": ")) %>%
select(gp, Maths, Science)
ggradar::ggradar(mydat, values.radar = c(0, 50, 100),
grid.max = 100, group.line.width = 1,
group.point.size = 2, legend.position = "bottom") +
facet_wrap(~gp)
Created on 2023-02-06 by the reprex package (v2.0.1)
I am trying to make a table like this -
The table contains several scenarios and risk_type.
The scenarios are basically filters. For example
0 - loan_age > 18
1 - interest_rate > 8%
2 - interest_rate > 18% AND referee == "MALE" AND new_LTV > 50
risk_type are columns in the original dataset like
A - flood risk
B - wildfire risk
C - foundation risk
What I want to do is to create a summary table of all these different risks for all the filters.
This is what the data looks like -
Damage and new LTV are functions of the risk score, and I want to filter for risk score > 4
Edit - The first 5 rows of the dummy dataframe.
structure(list(ID = c(1, 2, 3, 4, 5), LTV_value = c(43, 43, 32,
34, 35), loan_age = c(17, 65, 32, 33, 221), referee = c("MALE",
"FEMALE", "MALE", "MALE", "FEMALE"), interest_rate = c(0.02,
0.03, 0.05, 0.0633333333333333, 0.0783333333333333), value = c(70000,
80000, 90000, 1e+05, 45000), flood_risk_score = c(3, 4, 5, 0,
1), wildfire_risk_score = c(3, 4, 3, 3, 2), foundation_risk_score = c(5,
5, 2, 0, 1), flood_damage = c(21000, 32000, 45000, 0, 4500),
wildfire_damage = c(21000, 32000, 27000, 30000, 9000), foundation_damage = c(35000,
40000, 18000, 0, 4500), new_LTV_flood = c(40, 39, 27, 34,
34), new_LTV_wildfire = c(40, 39, 29, 31, 33), new_LTV_foundation = c(38,
38, 30, 34, 34)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
Till now I have tried these methods.
risk_list = c("flood_risk_score"
, "wildfire_risk_score"
, "foundation_risk_score")
for (i in risk_list){
table <- df %>%
filter(df[i] > 3) %>%
summarise(Count = n()
, mean = mean(value, na.rm = TRUE)
, LTV = mean(LTV))
# Using rbind() to append the output of one iteration to the dataframe
table_append= rbind(table_append, table)
}
This helps me get the values for all the risk scores, however, I have two issues here.
I am unable to filter according to a filter list
For the filter list, I tried this code, but I am unable to add it in a loop -
filters_list = list(which(df$interest > 8)
, which(df$loan_age > 18))
For LTV update, all of them have different new LTV
All of them need to be filtered for high LTV using their new LTV scores
risk_type_list = c("flood"
, "wildfire"
, "foundation")
for (i in list(paste0(risk_type_list,"_risk_level"))){
table <- df %>%
filter(df[paste0(i,"_risk_level")] > 3) %>%
summarise(Count = n())
#Using rbind() to append the output of one iteration to the dataframe
table_append = rbind(table_append, table)
}
In the end, I want to have code that will generate data from the given data by putting in required filters for all different risk types and also use their new LTV values.
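One possible direction, offered only as a sketch: store each scenario as an unevaluated expression and loop over every risk type and scenario pair. The helper summarise_one(), the generic column names risk_score and new_LTV, the decimal thresholds (0.08 for 8%, since the dummy interest rates are decimals), and the default cutoff of 3 taken from the loop above are all my assumptions, not part of the original code.

```r
library(dplyr)

# First rows of the dummy data from the question (only the columns used here)
df <- tibble::tibble(
  loan_age = c(17, 65, 32, 33, 221),
  referee = c("MALE", "FEMALE", "MALE", "MALE", "FEMALE"),
  interest_rate = c(0.02, 0.03, 0.05, 0.0633, 0.0783),
  value = c(70000, 80000, 90000, 1e5, 45000),
  flood_risk_score = c(3, 4, 5, 0, 1),
  wildfire_risk_score = c(3, 4, 3, 3, 2),
  foundation_risk_score = c(5, 5, 2, 0, 1),
  new_LTV_flood = c(40, 39, 27, 34, 34),
  new_LTV_wildfire = c(40, 39, 29, 31, 33),
  new_LTV_foundation = c(38, 38, 30, 34, 34)
)

# Scenarios as unevaluated expressions; new_LTV is a generic name that
# summarise_one() maps to the risk-specific column
scenarios <- list(
  "0" = rlang::expr(loan_age > 18),
  "1" = rlang::expr(interest_rate > 0.08),
  "2" = rlang::expr(interest_rate > 0.18 & referee == "MALE" & new_LTV > 50)
)
risk_types <- c("flood", "wildfire", "foundation")

# Hypothetical helper: one summary row for one risk type and one scenario
summarise_one <- function(data, risk, scenario_id, min_score = 3) {
  cond <- scenarios[[scenario_id]]
  data %>%
    mutate(
      risk_score = .data[[paste0(risk, "_risk_score")]],
      new_LTV = .data[[paste0("new_LTV_", risk)]]
    ) %>%
    filter(risk_score > min_score, !!cond) %>%
    summarise(
      count = n(),
      mean_value = mean(value),
      mean_new_LTV = mean(new_LTV)
    ) %>%
    mutate(risk_type = risk, scenario = scenario_id, .before = 1)
}

# Every combination of risk type and scenario, stacked into one table
grid <- expand.grid(risk = risk_types, scenario_id = names(scenarios),
                    stringsAsFactors = FALSE)
summary_table <- bind_rows(unname(Map(
  function(r, s) summarise_one(df, r, s),
  grid$risk, grid$scenario_id
)))

summary_table
```

Combinations that match no rows show a count of 0 and NaN means. Adding a scenario is then a matter of appending one expression to the list.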
I came up with the idea to represent stats on a chart like this. Example of the plot. And made it like this.
df_n <- df_normalized %>%
transmute(
Height_x = round(Height*cos_my(45), 2),
Height_y = round(Height*sin_my(45), 2),
Weight_x = round(Weight*cos_my(45*2), 2),
Weight_y = round(Weight*sin_my(45*2), 2),
Reach_x = round(Reach*cos_my(45*3), 2),
Reach_y = round(Reach*sin_my(45*3), 2),
SLpM_x = round(SLpM*cos_my(45*4), 2),
SLpM_y = round(SLpM*sin_my(45*4), 2),
Str_Def_x = round(`Str_Def %`*cos_my(45*5), 2),
Str_Def_y = round(`Str_Def %`*sin_my(45*5), 2),
TD_Avg_x = round(TD_Avg*cos_my(45*6), 2),
TD_Avg_y = round(TD_Avg*sin_my(45*6), 2),
TD_Acc_x = round(`TD_Acc %`*cos_my(45*7), 2),
TD_Acc_y = round(`TD_Acc %`*sin_my(45*7), 2),
Sub_Avg_x = round(Sub_Avg*cos_my(45*8), 2),
Sub_Avg_y = round(Sub_Avg*sin_my(45*8), 2))
Now I want to do this in a smarter way, so I created a data frame with the same number of rows, empty_df, and in a for loop I try to add one computed column to it with mutate() on every iteration. For example, I want to multiply the 1st column by cos(30), the 2nd by cos(30*2), and so on.
But...
It only keeps the last column, because on every iteration the new column gets the same literal name 'column'.
I want each new column to be named after the value of the variable column, which is built with paste0().
reprex_df <- structure(list(Height = c(190, 180, 183, 196, 185),
Weight = c(120, 77, 93, 120, 84),
Reach = c(193, 180, 188, 203, 193),
SLpM = c(2.45, 3.8, 2.05, 7.09, 3.17),
`Str_Def %` = c(58, 56, 55, 34, 44),
TD_Avg = c(1.23, 0.33, 0.64, 0.91, 0),
`TD_Acc %` = c(24, 50, 20, 66, 0),
Sub_Avg = c(0.2, 0, 0, 0, 0)), row.names = c(NA, -5L),
class = c("tbl_df", "tbl", "data.frame"))
temp <- apply(reprex_df[,1], function(x) x*cos(60), MARGIN = 2)
temp
empty_df <- data.frame(first_column = replicate(length(temp),1))
for (x in 1:8) {
temp <- apply(df[,x], function(x) round(x*cos((360/8)*x),2), MARGIN = 2)
column <- paste0("Column_",x)
empty_df <- mutate(empty_df, column = temp)
}
Later I want to make it a function where I can pass data frame and receive data frame with X, and Y coordinates.
So, how should I make it?
Perhaps this helps
library(purrr)
library(stringr)
nm1 <- names(reprex_df)
nm_cos <- str_c(names(reprex_df), "_x")
nm_sin <- str_c(names(reprex_df), "_y")
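# cos() and sin() expect radians, so the angle in degrees is converted first
reprex_df[nm_cos] <- map2(reprex_df[nm1], seq_along(nm1),
~ round(.x * cos(45 * .y * pi / 180), 2))
reprex_df[nm_sin] <- map2(reprex_df[nm1], seq_along(nm1),
~ round(.x * sin(45 * .y * pi / 180), 2))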
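If you prefer to keep the for loop, the naming problem can be handled with glue-style names on the left of := inside mutate(). Below is a rough sketch of a reusable function; the name to_xy() and its step_deg argument are hypothetical, and I am assuming the angles are meant in degrees:

```r
library(dplyr)

# Hypothetical helper: returns only the *_x and *_y coordinate columns
to_xy <- function(data, step_deg = 360 / ncol(data)) {
  out <- data
  for (i in seq_along(data)) {
    nm <- names(data)[i]
    angle <- i * step_deg * pi / 180  # degrees to radians
    out <- mutate(out,
      "{nm}_x" := round(data[[i]] * cos(angle), 2),
      "{nm}_y" := round(data[[i]] * sin(angle), 2)
    )
  }
  # Drop the original columns, keep the computed coordinates
  select(out, -all_of(names(data)))
}

toy <- tibble::tibble(Height = c(1, 2), Weight = c(2, 4))
xy <- to_xy(toy, step_deg = 90)
xy
```

Writing mutate(empty_df, column = temp) always creates a column literally called column, whereas "{nm}_x" := ... substitutes the value of nm into the name.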
I have a dataset called carcom that looks like this
carcom <- data.frame(household = c(173, 256, 256, 319, 319, 319, 422, 422, 422, 422), individuals= c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4))
Here individuals is "1" for the father, "2" for the mother, and "3" or "4" for a child. I would like to get two new columns. The first should give the number of children in each household, if there are any. The second should be the sum of weights per household, where the father gets "1", the mother "0.5", and each child "0.3". My new dataset should look like this
newcarcom <- data.frame(household = c(173, 256, 319, 422), child = c(0, 0, 1, 2), weight = c(1, 1.5, 1.8, 2.1))
I have been trying to find a solution for days. I would appreciate it if someone could help me. Thanks
We can count number of individuals with value 3 and 4 in each household. To calculate weight we change the value for 1:4 to their corresponding weight values using recode and then take sum.
library(dplyr)
newcarcom <- carcom %>%
group_by(household) %>%
summarise(child = sum(individuals %in% 3:4),
weight = sum(recode(individuals,`1` = 1, `2` = 0.5, .default = 0.3)))
# household child weight
# <dbl> <int> <dbl>
#1 173 0 1
#2 256 0 1.5
#3 319 1 1.8
#4 422 2 2.1
Base R version suggested by @markus
newcarcom <- do.call(data.frame, aggregate(individuals ~ household, carcom, function(x)
c(child = sum(x %in% 3:4), weight = sum(replace(y <- x^-1, y < 0.5, 0.3)))))
An option with data.table
library(data.table)
library(dplyr) # recode() comes from dplyr
setDT(carcom)[, .(child = sum(individuals %in% 3:4),
weight = sum(recode(individuals,`1` = 1, `2` = 0.5, .default = 0.3))), household]
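For comparison, here is a base R sketch that spells out the role-to-weight mapping as a named lookup vector. The names role_weight, child and weight are my own, and the approach is just one of several that would work:

```r
carcom <- data.frame(
  household = c(173, 256, 256, 319, 319, 319, 422, 422, 422, 422),
  individuals = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4)
)

# Lookup vector: father = 1, mother = 0.5, each child = 0.3
role_weight <- c("1" = 1, "2" = 0.5, "3" = 0.3, "4" = 0.3)
w <- unname(role_weight[as.character(carcom$individuals)])

# Per-household totals; tapply() orders the results by household id
child <- tapply(carcom$individuals %in% 3:4, carcom$household, sum)
weight <- tapply(w, carcom$household, sum)

newcarcom <- data.frame(
  household = as.numeric(names(child)),
  child = as.vector(child),
  weight = as.vector(weight)
)
newcarcom
```

Adding a new role later only means adding one entry to role_weight.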
Does anyone know if it is possible to calculate a weighted mean in R when values are missing, and when values are missing, the weights for the existing values are scaled upward proportionately?
To convey this clearly, I created a hypothetical scenario. This describes the root of the question, where the scalar needs to be adjusted for each row, depending on which values are missing.
Image: Weighted Mean Calculation
File: Weighted Mean Calculation in Excel
Using weighted.mean from the base stats package with the argument na.rm = TRUE should get you the result you need. Here is a tidyverse way this could be done:
library(tidyverse)
scores <- tribble(
~student, ~test1, ~test2, ~test3,
"Mark", 90, 91, 92,
"Mike", NA, 79, 98,
"Nick", 81, NA, 83)
weights <- tribble(
~test, ~weight,
"test1", 0.2,
"test2", 0.4,
"test3", 0.4)
scores %>%
gather(test, score, -student) %>%
left_join(weights, by = "test") %>%
group_by(student) %>%
summarise(result = weighted.mean(score, weight, na.rm = TRUE))
#> # A tibble: 3 x 2
#> student result
#> <chr> <dbl>
#> 1 Mark 91.20000
#> 2 Mike 88.50000
#> 3 Nick 82.33333
The best way to post an example dataset is to use dput(head(dat, 20)), where dat is the name of a dataset. Graphic images are a really bad choice for that.
DATA.
dat <-
structure(list(Test1 = c(90, NA, 81), Test2 = c(91, 79, NA),
Test3 = c(92, 98, 83)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")
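# w holds the weight of each test for each student (0.2, 0.4, 0.4);
# weighted.mean() with na.rm = TRUE drops the weights of missing scores
w <-
structure(list(Test1 = c(0.2, 0.2, 0.2), Test2 = c(0.4, 0.4, 0.4
), Test3 = c(0.4, 0.4, 0.4)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")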
CODE.
You can use the function weighted.mean from the base package stats together with sapply for this. Note that if your datasets of scores and weights are R objects of class matrix, you will not need unlist.
sapply(seq_len(nrow(dat)), function(i){
weighted.mean(unlist(dat[i,]), unlist(w[i, ]), na.rm = TRUE)
})
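To see that na.rm = TRUE really does scale the remaining weights up proportionately, here is a small check for one student with a missing first test, using the 0.2 / 0.4 / 0.4 weights from the example above (the variable names are mine):

```r
scores <- c(test1 = NA, test2 = 79, test3 = 98)
weights <- c(test1 = 0.2, test2 = 0.4, test3 = 0.4)

# Manual version: drop the missing test, rescale the rest to sum to 1
present <- !is.na(scores)
rescaled <- weights[present] / sum(weights[present])
manual <- sum(scores[present] * rescaled)

# Built-in version: weighted.mean() divides by the sum of the remaining weights
builtin <- weighted.mean(scores, weights, na.rm = TRUE)

c(manual = manual, builtin = builtin)
```

Both values come out as 88.5, since the two remaining weights of 0.4 each become 0.5 after rescaling.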