Different values for weighted median

I am getting three different weighted median values when I use the following code/functions:
data %>%
  summarise(
    matrixStats::weightedMedian(income,
                                w = wgt,
                                na.rm = TRUE),
    spatstat::weighted.median(income,
                              w = wgt,
                              na.rm = FALSE),
    spatstat::weighted.quantile(income,
                                w = wgt,
                                na.rm = TRUE),
    DescTools::Median(income_4,
                      weights = wgt,
                      na.rm = TRUE),
  )
Does anyone know why this might be? I don't see a clear answer in the documentation of each of these functions.
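Two things are visible in the code itself: the calls do not all receive the same input (one uses na.rm = FALSE and one uses income_4 rather than income), and weighted.quantile is called without a probs argument. Beyond that, "weighted median" has more than one common definition, and implementations can differ in how they interpolate. The sketch below uses base R only and does not reproduce the internals of any of the packages above. The helper names wmedian_step and wmedian_interp and the toy data are hypothetical, chosen only to show that two reasonable definitions can disagree on the same data.

```r
# Hypothetical helpers, base R only; NOT the internals of any package.

# Definition A: the smallest x whose cumulative weight share reaches 0.5.
wmedian_step <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  x <- x[ok]; w <- w[ok]
  o <- order(x)
  x <- x[o]; w <- w[o]
  cw <- cumsum(w) / sum(w)
  x[which(cw >= 0.5)[1]]
}

# Definition B: linear interpolation of x against the cumulative
# weight share, evaluated at 0.5.
wmedian_interp <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  x <- x[ok]; w <- w[ok]
  o <- order(x)
  x <- x[o]; w <- w[o]
  cw <- cumsum(w) / sum(w)
  approx(cw, x, xout = 0.5, rule = 2)$y
}

# Made-up data for illustration.
income <- c(10, 20, 30, 40)
wgt    <- c(1, 1, 1, 2)

wmedian_step(income, wgt)    # 30
wmedian_interp(income, wgt)  # 25
```

If the package results still differ after the inputs and na.rm settings are made identical, comparing each function's documented interpolation and tie-handling rules would be the next place to look.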

Related

Reshaping a set of variables obtained through summarize into organized table

I have a problem that I have spent a lot of time trying to solve.
Through the code below I obtained a tibble with 1 row and 22 columns:
median_income = survey_median(VD5008, na.rm = TRUE),
sd_income = survey_sd(VD5008, na.rm = TRUE),
mean_age = survey_mean(V2009, na.rm = TRUE),
median_age = survey_median(V2009, na.rm = TRUE),
sd_age = survey_sd(V2009, na.rm = TRUE),
mean_study = survey_mean(VD3005, na.rm = TRUE),
median_study = survey_median(VD3005, na.rm = TRUE),
sd_study = survey_sd(VD3005, na.rm = TRUE),
mean_hmembers = survey_mean(n_household_members, na.rm = TRUE),
median_hmembers = survey_median(n_household_members, na.rm = TRUE),
sd_hmembers = survey_sd(n_household_members, na.rm = TRUE),
number_observations = survey_total(na.rm = TRUE)
) %>%
mutate_if(is.numeric, round, 2)
[Image: output of the code]
What I want is to transform that tibble into a table like this:
[Image: the table that I want]
I tried some tidyr tools, but without success.
Is this possible with tidyr? I would appreciate it if someone could help me with the code to transform this table.
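Since the picture of the desired table is not available, the following is only a sketch under an assumed target layout: one row per variable and one column per statistic. It uses tidyr::pivot_longer with the special ".value" name, splitting column names of the form stat_variable at the underscore. The toy values are made up.

```r
library(tidyr)

# Toy one-row summary; the values and the target layout are assumptions.
wide <- data.frame(
  mean_income = 1500.25, median_income = 1200, sd_income = 900.5,
  mean_age    = 38.2,    median_age    = 36,   sd_age    = 15.1
)

# ".value" turns the first name piece (mean/median/sd) into columns,
# while the second piece (income/age) becomes the "variable" column.
long <- pivot_longer(
  wide,
  cols      = everything(),
  names_to  = c(".value", "variable"),
  names_sep = "_"
)
long
```

Column names that contain more than one underscore (for example standard-error columns ending in _se, if the tibble has them) will not split cleanly this way. They would need to be dropped first or handled with names_pattern instead of names_sep.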

unused argument when doing descriptive stats on R

I keep getting the error "unused argument" for the by argument in my call. Do I need to install another package? I already have dplyr, plyr, tidyr, data.table and pacman. I need help, thanks.
DHB<- TA[, .(mean= mean(sum_tbret, na.rm = TRUE),
sd= sd(sum_tbret, na.rm = TRUE),
var= var(sum_tbret, na.rm=TRUE),
median= as.double(median(sum_tbret, na.rm = TRUE)), ####Median has problems with data.table so need to tell it to convert to double
lq= quantile(sum_tbret, 0.25, na.rm = TRUE),
uq= quantile(sum_tbret, 0.75, na.rm = TRUE)),
by = "dhb2015"]
Error in `[.data.frame`(TA, , .(mean = mean(sum_tbret, na.rm = TRUE), :
unused argument (by = "lb2018")
Based on the error, TA is still a data.frame. Convert it to a data.table with setDT, which converts in place (or use as.data.table), and then the data.table method will work:
library(data.table)
setDT(TA)[, .(mean = mean(sum_tbret, na.rm = TRUE),
              sd = sd(sum_tbret, na.rm = TRUE),
              var = var(sum_tbret, na.rm = TRUE),
              median = as.double(median(sum_tbret, na.rm = TRUE)),
              lq = quantile(sum_tbret, 0.25, na.rm = TRUE),
              uq = quantile(sum_tbret, 0.75, na.rm = TRUE)),
          by = dhb2015]
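A quick way to confirm this diagnosis is to check the class before and after the conversion. The sketch below uses a made-up stand-in for TA; only the column names come from the question.

```r
library(data.table)

# Toy stand-in for TA; the values are invented for illustration.
TA <- data.frame(dhb2015   = c("A", "A", "B", "B"),
                 sum_tbret = c(1, 3, 10, NA))

is.data.table(TA)  # FALSE: `[` dispatches to the data.frame method
setDT(TA)          # converts by reference, without copying
is.data.table(TA)  # TRUE: `[` now accepts j expressions and by =

DHB <- TA[, .(mean   = mean(sum_tbret, na.rm = TRUE),
              median = as.double(median(sum_tbret, na.rm = TRUE))),
          by = dhb2015]
DHB
```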

How do I create a summary statistic for multiple years and variables?

R newbie here.
I am working on a project for which I need to combine multiple years of data into a single summary statistic for each column. For example, I have five years' worth of data that need to be averaged, with several columns for different variables.
The example provided in ModernDive works:
summary_monthly_temp <- weather %>%
group_by(month) %>%
summarize(mean = mean(temp, na.rm = TRUE),
std_dev = sd(temp, na.rm = TRUE)
)
summary_monthly_temp
Then I modified it to fit my needs:
summarysummary<- filename%>%
group_by(country) %>%
summarize(mean = mean(gdp, na.rm = TRUE),
std_dev = sd(gdp, na.rm = TRUE)
)
But within the summarize function, I need to summarize a few more variables such as population (getting the mean population) and total gdp.
What is the best way to do this?
I tried something like this but it is not working:
summary<- filename%>%
group_by(country) %>%
summarize(mean = mean(gdp, na.rm = TRUE),
std_dev = sd(gdp, na.rm = TRUE))%>%
summarize(mean = mean(pop, na.rm = TRUE),
std_dev = sd(pop, na.rm = TRUE))%>%
I think I know why...piping one function into the other...
Thanks for your input!
First and foremost, you don't usually need to save the result of a summarize call, because its main use is to print a summary of your data to the console.
Now looking at your code, I see an issue:
filename %>%
  group_by(country) %>%
  summarize(
    mean = mean(gdp, na.rm = TRUE),
    std_dev = sd(gdp, na.rm = TRUE)
  )
The problem seems to be the object called "filename": you need to import it explicitly as an R object in your workspace.
This guide should help you import data from local files:
https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf
Now, regarding the usage of summarize: as your example shows, you can have multiple outputs. Let's assume your data frame has a variable named "pop":
actually_a_dataframe %>%
  group_by(country) %>%
  summarize(
    mean_gdp = mean(gdp, na.rm = TRUE),
    std_dev_gdp = sd(gdp, na.rm = TRUE),
    mean_pop = mean(pop, na.rm = TRUE),
    std_dev_pop = sd(pop, na.rm = TRUE)
  )
This would produce a mean and a standard deviation for both gdp and pop, for each country.
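When the list of variables grows, writing each statistic out by hand gets repetitive. As an alternative sketch, assuming dplyr 1.0.0 or later, across() applies the same set of functions to several columns at once, and further summaries such as a total can sit alongside it in the same summarize call. The data frame econ and its values are made up; only the column names gdp and pop follow the question.

```r
library(dplyr)

# Made-up data for illustration.
econ <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2001, 2002, 2001, 2002),
  gdp     = c(100, 120, 50, NA),
  pop     = c(10, 12, 5, 7)
)

by_country <- econ %>%
  group_by(country) %>%
  summarize(
    # Creates gdp_mean, gdp_sd, pop_mean and pop_sd.
    across(c(gdp, pop),
           list(mean = ~ mean(.x, na.rm = TRUE),
                sd   = ~ sd(.x, na.rm = TRUE))),
    # An extra statistic computed from the original gdp column.
    total_gdp = sum(gdp, na.rm = TRUE)
  )
by_country
```

Note that the standard deviation is NA for a group with only one non-missing value, as for gdp in country B here.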

creating frequency table in R

I have extracted some summary statistics from R:
group_by(starters, starters$Programme, starters$Gender ) %>% summarise(
count = n(),
# mean = mean(Total_testscore, na.rm = TRUE),
# sd = sd(Total_testscore, na.rm = TRUE),
percentage = (n()/238)*100)
group_by(starters, starters$Programme ) %>% summarise(
count = n(),
mean = mean(Total_testscore, na.rm = TRUE),
sd = sd(Total_testscore, na.rm = TRUE), percentage = (n()/238)*100)
and would like to get a table that looks like this:
I am using xtable to export my output to LaTeX for all my other results. For xtable, all my results have to be in one table. How can I combine the two outputs in order to get a table like the one pictured?
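The pictured table is not available, so the following is only a sketch under an assumed layout: the per-gender rows and a per-programme "All" row stacked in one data frame. It computes the same columns in both summaries, labels the programme-level rows, and combines them with dplyr::bind_rows. The toy data, the "All" label, and the use of nrow(starters) in place of the hard-coded 238 are assumptions.

```r
library(dplyr)

# Made-up data; only the column names come from the question.
starters <- data.frame(
  Programme       = c("X", "X", "X", "Y"),
  Gender          = c("F", "M", "M", "F"),
  Total_testscore = c(10, 20, 30, 40)
)
n_total <- nrow(starters)

by_gender <- starters %>%
  group_by(Programme, Gender) %>%
  summarise(count = n(),
            mean = mean(Total_testscore, na.rm = TRUE),
            sd = sd(Total_testscore, na.rm = TRUE),
            percentage = n() / n_total * 100,
            .groups = "drop")

by_programme <- starters %>%
  group_by(Programme) %>%
  summarise(count = n(),
            mean = mean(Total_testscore, na.rm = TRUE),
            sd = sd(Total_testscore, na.rm = TRUE),
            percentage = n() / n_total * 100,
            .groups = "drop") %>%
  mutate(Gender = "All")  # label for the programme-level rows

# Stack both summaries; bind_rows matches columns by name.
combined <- bind_rows(by_gender, by_programme) %>%
  arrange(Programme, Gender)
combined
```

The single data frame combined can then be passed to xtable in the same way as the other results.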

KnitR removes digits after the decimal point

I am having trouble getting Knit results to show digits after the decimal point.
When I run this code as a chunk:
cereclosure %>%
group_by(Outcome.of.2nd.Closure) %>%
summarise(
min = min(age.at.c2, na.rm = TRUE),
q1 = quantile(age.at.c2, 0.25, na.rm = TRUE),
median = median(age.at.c2, na.rm = TRUE),
q3 = quantile(age.at.c2, 0.75, na.rm = TRUE),
max = max(age.at.c2, na.rm = TRUE),
mean = mean(age.at.c2, na.rm = TRUE),
st.dev = sd(age.at.c2, na.rm = TRUE)
)
As an example, I get:
Outcome.of.2nd.Closure    min
Failure                   217.3772
Success                   177.4907
This is the output I want: I want to see the digits after the decimal point. But when I knit the whole document, my output looks like this:
Outcome.of.2nd.Closure    st.dev
Failure                   217.
Success                   177.
So there are no digits after the decimal place.
I'm on R 3.5.1 and tidyverse 1.2.1.
Help is most appreciated.
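A likely cause, stated here as an assumption rather than a confirmed diagnosis, is that the knitted document prints the result as a tibble, and tibble printing shows only about three significant digits by default. The sketch below shows three ways to get more digits, using a made-up tibble res in place of the real summary.

```r
library(tibble)

# Toy stand-in for the summarised result; the values are from the question.
res <- tibble(Outcome = c("Failure", "Success"),
              min     = c(217.3772, 177.4907))

# Option 1: print as a plain data.frame, which follows options(digits).
print(as.data.frame(res))

# Option 2: raise the number of significant digits used for tibbles.
options(pillar.sigfig = 7)
print(res)

# Option 3: render a table with an explicit number of decimal places.
knitr::kable(res, digits = 4)
```

Option 3 is usually the most convenient in a knitted report, because it also gives a formatted table rather than console-style output.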
