Reshaping a set of variables obtained through summarize into organized table - r

I have a problem that I have spent a lot of time trying to solve.
Through the code below I obtained a tibble with 1 row and 22 columns:
median_income = survey_median(VD5008, na.rm = TRUE),
sd_income = survey_sd(VD5008, na.rm = TRUE),
mean_age = survey_mean(V2009, na.rm = TRUE),
median_age = survey_median(V2009, na.rm = TRUE),
sd_age = survey_sd(V2009, na.rm = TRUE),
mean_study = survey_mean(VD3005, na.rm = TRUE),
median_study = survey_median(VD3005, na.rm = TRUE),
sd_study = survey_sd(VD3005, na.rm = TRUE),
mean_hmembers = survey_mean(n_household_members, na.rm = TRUE),
median_hmembers = survey_median(n_household_members, na.rm = TRUE),
sd_hmembers = survey_sd(n_household_members, na.rm = TRUE),
number_observations = survey_total(na.rm = TRUE)
) %>%
mutate_if(is.numeric, round, 2)
Output of code
What I want is to transform that tibble into this:
Table that I want
I tried some tidyr tools, but without success.
Is this possible with tidyr? I would appreciate it if someone could help me with the code to transform this table.
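One way to approach this with tidyr is sketched below. It assumes the columns follow a stat_variable naming pattern, as in the summarise call above, and that the target layout is one row per variable with one column per statistic; the target image is not visible here, so that layout is a guess. The tibble `wide` and its values are made up for illustration. Names that do not split into exactly two parts on the underscore (for example number_observations, or the standard-error columns such as mean_income_se that srvyr typically adds) would need to be dropped or renamed first.

```r
library(tidyr)
library(tibble)

# Hypothetical one-row summary that mimics the stat_variable column names
wide <- tibble(
  mean_income = 1500.5, median_income = 1200, sd_income = 800.25,
  mean_age = 35.2, median_age = 33, sd_age = 12.1
)

# ".value" tells pivot_longer that the first name piece (mean/median/sd)
# becomes a column, while the second piece goes into "variable"
long <- pivot_longer(
  wide,
  cols = everything(),
  names_to = c(".value", "variable"),
  names_sep = "_"
)

long
```

The result has one row each for income and age, with mean, median and sd as columns.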

Related

Different values for weighted median

I am getting three different weighted median values when I use the following code/functions:
data %>%
summarise(
matrixStats::weightedMedian(income,
w = wgt,
na.rm = TRUE),
spatstat::weighted.median(income,
w = wgt,
na.rm = FALSE),
spatstat::weighted.quantile(income,
w = wgt,
na.rm = TRUE),
DescTools::Median(income_4,
weights = wgt,
na.rm = TRUE),
)
Does anyone know why this might be? I don't see a clear answer in the documentation of each of these functions.
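One possible reason, not confirmed for these particular packages, is that "weighted median" has more than one common definition. The base R sketch below compares two conventions on made-up data; the function names wmed_step and wmed_interp are hypothetical and are not the implementations used by matrixStats, spatstat or DescTools. Note also that the calls above do not all use the same na.rm setting or the same column (income versus income_4), which could by itself produce different results.

```r
# Made-up data: four values with equal weights
x <- c(1, 2, 3, 4)
w <- c(1, 1, 1, 1)

# Convention A: smallest value whose cumulative weight share reaches 50%
wmed_step <- function(x, w) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)
  x[which(cw >= 0.5)[1]]
}

# Convention B: place each value at the midpoint of its weight interval
# and interpolate linearly to the 50% point
wmed_interp <- function(x, w) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  mid <- (cumsum(w) - w / 2) / sum(w)
  approx(mid, x, xout = 0.5, rule = 2)$y
}

wmed_step(x, w)    # 2
wmed_interp(x, w)  # 2.5
```

With equal weights, convention B matches the ordinary median of 2.5, while convention A returns an observed value.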

Using summarize across with multiple functions when there are missing values

If I want to get the mean and sum of all the numeric columns using the mtcars data set, I would use the following code:
mtcars %>%
group_by(gear) %>%
summarise(across(where(is.numeric), list(mean = mean, sum = sum)))
But if I have missing values in some of the columns, how do I take that into account? Here is a reproducible example:
test.df1 <- data.frame("Year" = sample(2018:2020, 20, replace = TRUE),
"Firm" = head(LETTERS, 5),
"Exporter"= sample(c("Yes", "No"), 20, replace = TRUE),
"Revenue" = sample(100:200, 20, replace = TRUE),
stringsAsFactors = FALSE)
test.df1 <- rbind(test.df1,
data.frame("Year" = c(2018, 2018),
"Firm" = c("Y", "Z"),
"Exporter" = c("Yes", "No"),
"Revenue" = c(NA, NA)))
test.df1 <- test.df1 %>% mutate(Profit = Revenue - sample(20:30, 22, replace = TRUE ))
test.df_summarized <- test.df1 %>% group_by(Firm) %>% summarize(across(where(is.numeric)), list(mean = mean, sum = sum)))
If I were to summarize each variable separately, I could use the following:
test.df1 %>% group_by(Firm) %>% summarize(Revenue_mean = mean(Revenue, na.rm = TRUE),
Profit_mean = mean(Profit, na.rm = TRUE))
But I am trying to figure out how I can adapt the code I wrote above for mtcars to the example data set I have provided here.
Because your functions all have an na.rm argument, you can pass it along through the ... argument of across():
test.df1 %>% summarize(across(where(is.numeric), list(mean = mean, sum = sum), na.rm = TRUE))
# Year_mean Year_sum Revenue_mean Revenue_sum Profit_mean Profit_sum
# 1 2019.045 44419 162.35 3247 138.25 2765
(I left out the group_by because it's not specified properly in your code and the example is still well-illustrated without it. Also make sure that your functions are inside across().)
Just for the record, you could also do it like this (and this works when the different functions have different arguments)
test.df1 %>%
summarise(across(where(is.numeric),
list(
mean = ~ mean(.x, na.rm = TRUE),
sum = ~ sum(.x, na.rm = TRUE))
)
)
# Year_mean Year_sum Revenue_mean Revenue_sum Profit_mean Profit_sum
# 1 2019.045 44419 144.05 2881 119.3 2386
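For completeness, the grouped version that the question asked about can be sketched on a small made-up data frame (`df` and its values are hypothetical). It uses the lambda form, which is also the safer choice on recent dplyr releases, since passing extra arguments through the ... of across() is deprecated as of dplyr 1.1.0.

```r
library(dplyr)

# Hypothetical data with missing values in both numeric columns
df <- data.frame(
  Firm = c("A", "A", "B", "B"),
  Revenue = c(100, NA, 150, 170),
  Profit = c(80, NA, 120, 150)
)

out <- df %>%
  group_by(Firm) %>%
  summarise(across(
    where(is.numeric),
    list(
      mean = ~ mean(.x, na.rm = TRUE),  # NA values are dropped per group
      sum  = ~ sum(.x, na.rm = TRUE)
    )
  ))

out
```

The grouping column Firm is left out of where(is.numeric) automatically, so only Revenue and Profit are summarised.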

unused argument when doing descriptive stats on R

I keep getting an "unused argument" error for the by argument. Do I need to install another package? I already have dplyr, plyr, tidyr, data.table and pacman. Any help is appreciated, thanks.
DHB<- TA[, .(mean= mean(sum_tbret, na.rm = TRUE),
sd= sd(sum_tbret, na.rm = TRUE),
var= var(sum_tbret, na.rm=TRUE),
median= as.double(median(sum_tbret, na.rm = TRUE)), ####Median has problems with data.table so need to tell it to convert to double
lq= quantile(sum_tbret, 0.25, na.rm = TRUE),
uq= quantile(sum_tbret, 0.75, na.rm = TRUE)),
by = "dhb2015"]
Error in `[.data.frame`(TA, , .(mean = mean(sum_tbret, na.rm = TRUE), :
unused argument (by = "lb2018")
Based on the error, TA is still a data.frame. We can convert it to a data.table with setDT, which converts in place (or with as.data.table), and then the data.table method will work:
library(data.table)
setDT(TA)[, .(mean= mean(sum_tbret, na.rm = TRUE),
sd= sd(sum_tbret, na.rm = TRUE),
var= var(sum_tbret, na.rm=TRUE),
median= as.double(median(sum_tbret, na.rm = TRUE)),
lq= quantile(sum_tbret, 0.25, na.rm = TRUE),
uq= quantile(sum_tbret, 0.75, na.rm = TRUE)),
by = dhb2015]
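The effect of the conversion can be checked on a small made-up table. In this sketch `TA_demo`, `grp` and `val` are hypothetical names, not the asker's data.

```r
library(data.table)

# Hypothetical stand-in for TA: a plain data.frame
TA_demo <- data.frame(grp = c("a", "a", "b"), val = c(1, 3, 10))
class(TA_demo)  # "data.frame", so `[` does not understand `by`

# setDT() converts the object in place, without making a copy
setDT(TA_demo)
is.data.table(TA_demo)  # TRUE

# The grouped summary now dispatches to the data.table method
res <- TA_demo[, .(mean = mean(val), n = .N), by = grp]
res
```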

creating frequency table in R

I have extracted some summary statistics from R:
group_by(starters, starters$Programme, starters$Gender ) %>% summarise(
count = n(),
# mean = mean(Total_testscore, na.rm = TRUE),
# sd = sd(Total_testscore, na.rm = TRUE),
percentage = (n()/238)*100)
group_by(starters, starters$Programme ) %>% summarise(
count = n(),
mean = mean(Total_testscore, na.rm = TRUE),
sd = sd(Total_testscore, na.rm = TRUE), percentage = (n()/238)*100)
and would like to get a table that looks like this:
I am using xtable to export my output to LaTeX for all my other results. For xtable, all my results have to be in one table. How can I combine the two outputs in order to get a table like the one pictured?
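Since the pictured table is not visible here, the sketch below assumes one plausible layout: the per-gender rows and the per-programme rows stacked into a single data frame, with the programme totals labelled "All" in the Gender column. The data in `starters_demo` are made up, and the "All" label is an assumption.

```r
library(dplyr)

# Hypothetical stand-in for the starters data
starters_demo <- data.frame(
  Programme = c("X", "X", "X", "Y"),
  Gender = c("F", "M", "F", "M"),
  Total_testscore = c(10, 20, 30, 40)
)
n_total <- nrow(starters_demo)

# Counts and percentages per programme and gender
by_gender <- starters_demo %>%
  group_by(Programme, Gender) %>%
  summarise(count = n(), percentage = n() / n_total * 100, .groups = "drop")

# Counts, mean and sd per programme, labelled as the "All" row
by_programme <- starters_demo %>%
  group_by(Programme) %>%
  summarise(
    count = n(),
    mean = mean(Total_testscore, na.rm = TRUE),
    sd = sd(Total_testscore, na.rm = TRUE),
    percentage = n() / n_total * 100
  ) %>%
  mutate(Gender = "All")

# bind_rows() fills columns missing from one input with NA
combined <- bind_rows(by_gender, by_programme) %>%
  arrange(Programme, Gender)

combined
```

The single `combined` data frame can then be passed to xtable like any other table.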

KnitR removes digits after the decimal point

I am having trouble getting Knit results to show digits after the decimal point.
When I run this code as a chunk:
cereclosure %>%
group_by(Outcome.of.2nd.Closure) %>%
summarise(
min = min(age.at.c2, na.rm = TRUE),
q1 = quantile(age.at.c2, 0.25, na.rm = TRUE),
median = median(age.at.c2, na.rm = TRUE),
q3 = quantile(age.at.c2, 0.75, na.rm = TRUE),
max = max(age.at.c2, na.rm = TRUE),
mean = mean(age.at.c2, na.rm = TRUE),
st.dev = sd(age.at.c2, na.rm = TRUE)
)
As an example, I get:
Outcome.of.2nd.Closure min
Failure 217.3772
Success 177.4907
That is the outcome I want: I want to see the digits after the decimal point. But when I knit the whole document, my output looks like this:
Outcome.of.2nd.Closure st.dev
Failure 217.
Success 177.
So there are no digits after the decimal place.
I'm on R v 3.5.1 and tidyverse 1.2.1.
Help is most appreciated.
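The trailing dot in 217. is typical of tibble printing, which by default shows only about three significant digits, so a likely cause (assumed here, not confirmed) is that the knitted document prints the summarise result as a tibble. Two possible workarounds are sketched below on a made-up tibble `tb` that reuses the two values from the example above.

```r
library(tibble)

# Hypothetical stand-in for the summarise() result
tb <- tibble(
  Outcome = c("Failure", "Success"),
  st.dev = c(217.3772, 177.4907)
)

# Option 1: print as a plain data.frame, which follows getOption("digits")
print(as.data.frame(tb))

# Option 2: raise the number of significant digits used for tibble printing
options(pillar.sigfig = 7)
print(tb)
```

Option 1 only changes how this one result is shown, while option 2 affects every tibble printed later in the session.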
