Creating a frequency table in R

I have extracted some summary statistics from R:
group_by(starters, Programme, Gender) %>%
  summarise(
    count = n(),
    # mean = mean(Total_testscore, na.rm = TRUE),
    # sd = sd(Total_testscore, na.rm = TRUE),
    percentage = (n() / 238) * 100
  )
group_by(starters, Programme) %>%
  summarise(
    count = n(),
    mean = mean(Total_testscore, na.rm = TRUE),
    sd = sd(Total_testscore, na.rm = TRUE),
    percentage = (n() / 238) * 100
  )
and would like to get a table that looks like this: [image not available]
I am using xtable to export all my other results to LaTeX, and for xtable all my results have to be in one table. How can I combine the two outputs to get a table like the one pictured?
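One possible way to get both summaries into a single data frame is to give the per-programme summary a placeholder Gender value and stack it on top of the per-gender summary. The sketch below is only an illustration: the `starters` data is invented, the "All" label and the row order are assumptions (the target table is not shown here), and `n_total` stands in for the hard-coded 238.

```r
library(dplyr)

# Hypothetical stand-in for `starters` (invented values, not the asker's data)
starters <- data.frame(
  Programme = c("A", "A", "A", "B", "B", "B"),
  Gender = c("F", "M", "F", "M", "M", "F"),
  Total_testscore = c(10, 12, 14, 8, NA, 12)
)
n_total <- nrow(starters)  # assumption: replaces the hard-coded 238

# Counts and percentages per programme and gender
by_gender <- starters %>%
  group_by(Programme, Gender) %>%
  summarise(count = n(), percentage = n() / n_total * 100, .groups = "drop")

# Counts, percentages, mean and sd per programme, labelled as "All" genders
by_programme <- starters %>%
  group_by(Programme) %>%
  summarise(
    count = n(),
    mean = mean(Total_testscore, na.rm = TRUE),
    sd = sd(Total_testscore, na.rm = TRUE),
    percentage = n() / n_total * 100,
    .groups = "drop"
  ) %>%
  mutate(Gender = "All")

# bind_rows() matches columns by name; mean and sd are NA for the gender rows
combined <- bind_rows(by_programme, by_gender) %>%
  arrange(Programme, Gender != "All", Gender)
combined
```

The resulting `combined` data frame could then be passed to xtable like any other single table.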

Related

Different values for weighted median

I am getting three different weighted median values when I use the following code/functions:
data %>%
  summarise(
    matrixStats::weightedMedian(income,
                                w = wgt,
                                na.rm = TRUE),
    spatstat::weighted.median(income,
                              w = wgt,
                              na.rm = FALSE),
    spatstat::weighted.quantile(income,
                                w = wgt,
                                na.rm = TRUE),
    DescTools::Median(income_4,
                      weights = wgt,
                      na.rm = TRUE)
  )
Does anyone know why this might be? I don't see a clear answer in the documentation of each of these functions.

Reshaping a set of variables obtained through summarize into organized table

I have a big problem that I have spent a lot of time trying to solve.
Through the code below I obtained a tibble with 1 row and 22 columns:
median_income = survey_median(VD5008, na.rm = TRUE),
sd_income = survey_sd(VD5008, na.rm = TRUE),
mean_age = survey_mean(V2009, na.rm = TRUE),
median_age = survey_median(V2009, na.rm = TRUE),
sd_age = survey_sd(V2009, na.rm = TRUE),
mean_study = survey_mean(VD3005, na.rm = TRUE),
median_study = survey_median(VD3005, na.rm = TRUE),
sd_study = survey_sd(VD3005, na.rm = TRUE),
mean_hmembers = survey_mean(n_household_members, na.rm = TRUE),
median_hmembers = survey_median(n_household_members, na.rm = TRUE),
sd_hmembers = survey_sd(n_household_members, na.rm = TRUE),
number_observations = survey_total(na.rm = TRUE)
) %>%
mutate_if(is.numeric, round, 2)
[image: output of the code]
What I want is to transform that tibble into something like this:
[image: the table that I want]
I tried some tidyr tools, but without success.
Is this possible with tidyr? I would appreciate it if someone could help me with the code to transform this table.

Store multiple outputs in one table

I am trying to store different outputs in one table so I can perform further analysis on them. Below is my code, which I need to run four times (once for each company's stock). How can I store all the values from the four companies in one table?
tapply(Ford_R_ER, as.integer(gl(length(Ford_R_ER), 12, length(Ford_R_ER))), FUN = mean, na.rm = TRUE)
tapply(GE_R_ER, as.integer(gl(length(GE_R_ER), 12, length(GE_R_ER))), FUN = mean, na.rm = TRUE)
tapply(MICROSOFT_R_ER, as.integer(gl(length(MICROSOFT_R_ER), 12, length(MICROSOFT_R_ER))), FUN = mean, na.rm = TRUE)
tapply(ORACLE_R_ER, as.integer(gl(length(ORACLE_R_ER), 12, length(ORACLE_R_ER))), FUN = mean, na.rm = TRUE)
If there are multiple columns, use summarise with across: create a data.frame/tibble from the vectors (assuming they are all the same length), create the grouping column with gl, and summarise across the numeric columns to get the mean by group.
library(dplyr)
dat %>%
  # label each consecutive block of 12 rows as one group
  group_by(grp = as.integer(gl(n(), 12, n()))) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
Or using aggregate from base R:
aggregate(. ~ grp,
          data = transform(dat, grp = as.integer(gl(nrow(dat), 12, nrow(dat)))),
          mean, na.rm = TRUE, na.action = NULL)
In case we have different lengths for the vectors, create a function and reuse it
f1 <- function(vec, n = 12) {
  tapply(vec, as.integer(gl(length(vec), n, length(vec))),
         FUN = mean, na.rm = TRUE)
}
and then run the function either on a single vector or a list of vectors
f1(Ford_R_ER)
lapply(list(Ford_R_ER = Ford_R_ER, GE_R_ER = GE_R_ER,
            MICROSOFT_R_ER = MICROSOFT_R_ER, ORACLE_R_ER = ORACLE_R_ER), f1)
data
dat <- data.frame(Ford_R_ER, GE_R_ER, MICROSOFT_R_ER, ORACLE_R_ER)
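To end up with one table rather than a list, the function approach above can be combined with sapply, which simplifies the results to a matrix when every vector yields the same number of block means. This is a minimal sketch in base R: the four vectors are simulated stand-ins (36 values each, an assumption), and `returns` and `yearly_means` are illustrative names.

```r
# Hypothetical monthly excess-return vectors (simulated, not the asker's data)
set.seed(1)
Ford_R_ER      <- rnorm(36)
GE_R_ER        <- rnorm(36)
MICROSOFT_R_ER <- rnorm(36)
ORACLE_R_ER    <- rnorm(36)

# Mean of each consecutive block of n observations
f1 <- function(vec, n = 12) {
  tapply(vec, as.integer(gl(length(vec), n, length(vec))),
         FUN = mean, na.rm = TRUE)
}

returns <- list(Ford_R_ER = Ford_R_ER, GE_R_ER = GE_R_ER,
                MICROSOFT_R_ER = MICROSOFT_R_ER, ORACLE_R_ER = ORACLE_R_ER)

# One row per 12-observation block, one column per company
yearly_means <- as.data.frame(sapply(returns, f1))
yearly_means
```

If the vectors had different lengths, sapply would return a list instead of a matrix, so this shortcut only applies when the block counts match.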

How do I create a summary statistic for multiple years and variables?

R newbie here.
I am working on a project for which I need to combine multiple years of data into a single summary statistic for each column. For example, I have five years' worth of data that needs to be averaged, with several columns for different variables.
The example provided in modern dive works:
summary_monthly_temp <- weather %>%
  group_by(month) %>%
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE)
  )
summary_monthly_temp
Then I modified it to fit my needs:
summary <- filename %>%
  group_by(country) %>%
  summarize(mean = mean(gdp, na.rm = TRUE),
            std_dev = sd(gdp, na.rm = TRUE)
  )
But within the summarize function, I need to summarize a few more variables such as population (getting the mean population) and total gdp.
What is the best way to do this?
I tried something like this but it is not working:
summary <- filename %>%
  group_by(country) %>%
  summarize(mean = mean(gdp, na.rm = TRUE),
            std_dev = sd(gdp, na.rm = TRUE)) %>%
  summarize(mean = mean(pop, na.rm = TRUE),
            std_dev = sd(pop, na.rm = TRUE)) %>%
I think I know why...piping one function into the other...
Thanks for your input!
First and foremost, you don't usually need to save data after applying a summarize function, because its main use is to generate a summary of your data as output on the console.
Now looking at your code, I see an issue:
filename %>%
  group_by(country) %>%
  summarize(
    mean = mean(gdp, na.rm = TRUE),
    std_dev = sd(gdp, na.rm = TRUE)
  )
The problem seems to be the object called "filename": you need to import the data explicitly as an R object in your workspace.
This guide should help you import data from local files:
https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf
Now, regarding the usage of summarize: as your example shows, you can have multiple outputs. Let's assume your data frame has a variable named "pop":
actually_a_dataframe %>%
  group_by(country) %>%
  summarize(
    mean_gdp = mean(gdp, na.rm = TRUE),
    std_dev_gdp = sd(gdp, na.rm = TRUE),
    mean_pop = mean(pop, na.rm = TRUE),
    std_dev_pop = sd(pop, na.rm = TRUE)
  )
This would produce a mean and a standard deviation for both gdp and pop, for each country.
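When the list of variables grows, the same result can be written more compactly with across (available in dplyr 1.0.0 and later). The following is a sketch on invented data: the `econ` data frame, its values, and the `total_gdp` column are assumptions for illustration, with `total_gdp` added because the question also mentions a total.

```r
library(dplyr)

# Hypothetical country-year data (values invented for illustration)
econ <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(2001:2003, times = 2),
  gdp = c(10, 12, 14, 20, NA, 24),
  pop = c(1, 2, 3, 4, 5, 6)
)

country_summary <- econ %>%
  group_by(country) %>%
  summarize(
    # apply both functions to both columns; names become mean_gdp, std_dev_gdp, ...
    across(
      c(gdp, pop),
      list(mean = ~ mean(.x, na.rm = TRUE), std_dev = ~ sd(.x, na.rm = TRUE)),
      .names = "{.fn}_{.col}"
    ),
    total_gdp = sum(gdp, na.rm = TRUE)
  )
country_summary
```

Everything is computed in a single summarize call, so the original columns are still available for each statistic, which is what chaining two summarize calls loses.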

KnitR removes digits after the decimal point

I am having trouble getting Knit results to show digits after the decimal point.
When I run this code as a chunk:
cereclosure %>%
  group_by(Outcome.of.2nd.Closure) %>%
  summarise(
    min = min(age.at.c2, na.rm = TRUE),
    q1 = quantile(age.at.c2, 0.25, na.rm = TRUE),
    median = median(age.at.c2, na.rm = TRUE),
    q3 = quantile(age.at.c2, 0.75, na.rm = TRUE),
    max = max(age.at.c2, na.rm = TRUE),
    mean = mean(age.at.c2, na.rm = TRUE),
    st.dev = sd(age.at.c2, na.rm = TRUE)
  )
As an example, I get:
Outcome.of.2nd.Closure min
Failure 217.3772
Success 177.4907
This is the output I want, that is, I want to see the digits after the decimal point. But when I knit the whole document, my output looks like this:
Outcome.of.2nd.Closure st.dev
Failure 217.
Success 177.
So there are no digits after the decimal place.
I'm on R v 3.5.1 and tidyverse 1.2.1.
Help is most appreciated.
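A likely explanation, assuming the knitted document prints the result with the default tibble print method: tibbles show only three significant digits by default, which would turn 217.3772 into "217.". Two possible workarounds are sketched below on invented data (the `cereclosure` stand-in and the name `age_summary` are assumptions): convert the summary to a plain data frame before printing, or raise the `pillar.sigfig` option.

```r
library(dplyr)

# Hypothetical stand-in for `cereclosure` (invented values)
cereclosure <- tibble(
  Outcome.of.2nd.Closure = c("Failure", "Failure", "Success", "Success"),
  age.at.c2 = c(200.1234, 234.6310, 170.2500, 184.7314)
)

age_summary <- cereclosure %>%
  group_by(Outcome.of.2nd.Closure) %>%
  summarise(
    mean = mean(age.at.c2, na.rm = TRUE),
    st.dev = sd(age.at.c2, na.rm = TRUE)
  )

# Option 1: print as a plain data frame, which uses base R's digits setting
print(as.data.frame(age_summary))

# Option 2: ask tibble printing for more significant digits
options(pillar.sigfig = 7)
print(age_summary)
```

In an R Markdown document, passing the summary to knitr::kable with a digits argument is another common way to control the number of decimals.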
