proportion within each factor using dplyr [duplicate] - r

This question already has answers here:
Relative frequencies / proportions with dplyr
(10 answers)
Closed 1 year ago.
I want to compute the proportion of value within each level of a factor (here, each country) using dplyr. The desired result is shown in desired$prop.
Thanks in advance :))
data <- data.frame(
  team = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  country = c("usa", "uk", "spain", "usa", "uk", "spain", "usa", "uk", "spain"),
  value = c(40, 20, 10, 50, 30, 35, 50, 60, 25)
)
desired <- data.frame(
  team = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  country = c("usa", "uk", "spain", "usa", "uk", "spain", "usa", "uk", "spain"),
  value = c(40, 20, 10, 50, 30, 35, 50, 60, 25),
  prop = c(0.285714286, 0.181818182, 0.142857143, 0.357142857, 0.272727273,
           0.5, 0.357142857, 0.545454545, 0.357142857)
)

# MrFlick is right. And also faster than I am.
library(dplyr)
# Group by country so that each value is divided by its country's total
df <- data %>%
  group_by(country) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()
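As a quick sanity check on the grouped approach above, here is a minimal sketch that recomputes the proportions, confirms they sum to 1 within each country, and compares them with a base R equivalent built on ave(). The names result, check and base_prop are only illustrative and do not appear in the original answer.

```r
library(dplyr)

# Same data as in the question
data <- data.frame(
  team = rep(c("a", "b", "c"), each = 3),
  country = rep(c("usa", "uk", "spain"), times = 3),
  value = c(40, 20, 10, 50, 30, 35, 50, 60, 25)
)

# Proportion of each value within its country
result <- data %>%
  group_by(country) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()

# The proportions within each country should add up to 1
check <- result %>%
  group_by(country) %>%
  summarise(total = sum(prop))

# Base R equivalent: ave() returns the per-country sum aligned to each row
base_prop <- data$value / ave(data$value, data$country, FUN = sum)
```

If the totals in check are not all 1, the grouping variable is probably not the one intended (for example team instead of country).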

Related

Transform a df into individual observations [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 7 months ago.
I want to transform a df from a "counting" format (one row per group, with a number of cases) to an "individual observations" format (one row per case).
Example:
df <- dplyr::tibble(
city = c("a", "a", "b", "b", "c", "c"),
sex = c(1,0,1,0,1,0),
age = c(1,2,1,2,1,2),
cases = c(2, 3, 1, 1, 1, 1))
Expected result
df <- dplyr::tibble(
city = c("a","a","a","a","a", "b", "b", "c", "c"),
sex = c(1,1,0,0,0,1,0,1,0),
age = c(1,1,2,2,2,1,2,1,2))
uncount() from tidyr can do that for you.
df |> tidyr::uncount(cases)
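For reference, a self-contained sketch of the uncount() call on the example data, alongside a base R version of the same idea (repeating row indices). The names long and long_base are illustrative only.

```r
library(tidyr)
library(tibble)

df <- tibble(
  city = c("a", "a", "b", "b", "c", "c"),
  sex = c(1, 0, 1, 0, 1, 0),
  age = c(1, 2, 1, 2, 1, 2),
  cases = c(2, 3, 1, 1, 1, 1)
)

# Each row is repeated `cases` times; the `cases` column is dropped by default
long <- uncount(df, cases)

# Base R sketch: repeat the row indices, then drop the count column
long_base <- df[rep(seq_len(nrow(df)), times = df$cases), names(df) != "cases"]
```

Both results have one row per case, so their row count equals sum(df$cases).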

Recoding several columns at once

I'm recoding values to letters with the following line of code, which works:
df_mean$COMMUNITY_mean <- cut(df_mean$COMMUNITY_mean, breaks=c(0, 10, 25, 50, 75, 90, Inf), labels=c("a", "b", "c", "d", "e", "f"))
To apply it to multiple columns, I tried:
names <- colnames(df_mean) #extract columns names to a list
names <- names[-c(1:10)]; #remove 10 first columns not interested in
for(i in 1:length(names)) {
df_mean <- cut(names[[i]], breaks=c(0, 10, 25, 50, 75, 90, Inf), labels=c("a", "b", "c", "d", "e", "f"))
}
But it fails with the following error:
Error in cut.default(names[[i]], breaks = c(0, 10, 25, 50, 75, 90, Inf), :
'x' must be numerical
Any suggestions?
Replace the first line with the following, which extracts only the names of the numeric columns in your data:
names <- colnames(df_mean[as.logical(lapply(df_mean, is.numeric))])
# remove this line ===> names <- colnames(df_mean)
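Note also that the loop in the question passes names[[i]], which is a column name (a character string), to cut(), and assigns the result to the whole of df_mean; that is what triggers the "'x' must be numerical" error. A minimal sketch of a loop that indexes the data frame by name instead is shown here, together with a dplyr variant. The small df_mean and the cols vector are hypothetical stand-ins for the real data, where the columns to recode would come from the names vector.

```r
library(dplyr)

# Hypothetical stand-in for df_mean: one id column plus two numeric columns
df_mean <- data.frame(
  id = c("x", "y", "z"),
  COMMUNITY_mean = c(5, 30, 95),
  OTHER_mean = c(12, 60, 80)
)

breaks <- c(0, 10, 25, 50, 75, 90, Inf)
labels <- c("a", "b", "c", "d", "e", "f")

# Columns to recode (in the question: the names left after dropping the first 10)
cols <- c("COMMUNITY_mean", "OTHER_mean")

# Loop version: df[[nm]] hands cut() the numeric column, not its name
recoded <- df_mean
for (nm in cols) {
  recoded[[nm]] <- cut(recoded[[nm]], breaks = breaks, labels = labels)
}

# dplyr version of the same recoding
recoded_dplyr <- df_mean %>%
  mutate(across(all_of(cols), ~ cut(.x, breaks = breaks, labels = labels)))
```

Working on a copy (recoded) keeps the original numeric columns available, which is convenient because cut() cannot be re-applied to columns that are already factors.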

Allocate ordinal values to numerical vector in R [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 2 years ago.
I have a set of data from children, recorded across a number of sessions. The number of sessions and age of each child in each session is different for each participant, so it looks something like this:
library(tibble)
mydf <- tribble(~subj, ~age,
"A", 16,
"A", 17,
"A", 19,
"B", 10,
"B", 11,
"B", 12,
"B", 13)
What I don't currently have in the data is a variable for Session number, and I'd like to add this to my dataframe. Basically I want to create a numeric variable that is ordinal from 1-n for each child, something like this:
mydf2 <- tribble(~subj, ~age, ~session,
  "A", 16, 1,
  "A", 17, 2,
  "A", 19, 3,
  "B", 10, 1,
  "B", 11, 2,
  "B", 12, 3,
  "B", 13, 4)
Ideally I'd like to do this with dplyr.
You simply need to group by subj and use row_number():
library(dplyr)
# row_number() restarts at 1 within each subj group
mydf %>%
  group_by(subj) %>%
  mutate(session = row_number())

Aggregate the data in R

I have a data set that is shown below:
library(tidyverse)
data <- tribble(
~category, ~product_id,
"A", 10,
"B", 20,
"C", 30,
"A", 10,
"A", 10,
"B", 20,
"C", 30,
"A", 10,
"A", 10,
"B", 20,
)
And now, I want to group it by the "category" variable, keep the "product_id" and add a new variable that counts the categories:
aggregated_data <- tribble(
  ~category, ~product_id, ~numberOfcategory,
  "A", 10, 5,
  "B", 20, 3,
  "C", 30, 2,
)
I already got the "numberOfcategory" with this code:
data %>%
group_by(category) %>%
tally(sort=TRUE)
But somehow I could not keep the product_id.
Could someone help me to get the dataframe (aggregated_data)? Thanks in advance.
You were close! Just also group by product_id as follows:
data %>%
group_by(category,product_id) %>%
tally(sort=TRUE)
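The tally() call above names the count column n. As a hedged alternative, dplyr's count() combines the grouping and counting steps and accepts a name argument, so the column can be called numberOfcategory as in the desired output. A minimal sketch on the question's data:

```r
library(dplyr)
library(tibble)

# Same data as in the question
data <- tibble(
  category = c("A", "B", "C", "A", "A", "B", "C", "A", "A", "B"),
  product_id = c(10, 20, 30, 10, 10, 20, 30, 10, 10, 20)
)

# count() groups by the listed columns and tallies rows in one step;
# name sets the count column's name, sort = TRUE orders by descending count
aggregated_data <- data %>%
  count(category, product_id, name = "numberOfcategory", sort = TRUE)
```

This keeps product_id only because each category maps to a single product_id here; a category with several product ids would produce one row per combination.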

R -ggplot two boxplots for separate columns

I am trying to draw boxplots of two separate variables on one graph.
I have a dataset with multiple variables, but I want to compare only two: income_husband and income_wife.
I have done it using boxplot(), but how can I do it using ggplot?
It would help if you had shared some data to work with, but I have put some sample data together. Group c is filtered out. Is this roughly what you are after?
library(tidyverse)
# Sample data: one grouping column and one numeric column
group <- c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c")
income <- c(100, 120, 110, 23, 34, 120, 45, 156, 65, 52, 65, 98)
data <- tibble(group, income)
data
# Keep only groups a and b
data2 <- data %>%
  filter(group %in% c("a", "b"))
# One boxplot per group
b <- ggplot(data2, aes(x = group, y = income))
b + geom_boxplot()
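If the two incomes are stored as separate columns, as the question describes, one common approach is to reshape the data to long format first so that the column name becomes the grouping variable. The sketch below assumes a data frame with the columns income_husband and income_wife from the question; the data frame name couples and all the values are invented for illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical data: column names follow the question, values are made up
couples <- data.frame(
  income_husband = c(100, 120, 110, 23, 34),
  income_wife = c(120, 45, 156, 65, 52)
)

# Stack the two columns into one "income" column, labelled by "spouse"
long <- pivot_longer(
  couples,
  cols = c(income_husband, income_wife),
  names_to = "spouse",
  values_to = "income"
)

# One boxplot per original column
p <- ggplot(long, aes(x = spouse, y = income)) +
  geom_boxplot()
p
```

Any other columns in the real dataset are carried along unchanged by pivot_longer(), so they do not need to be dropped beforehand.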
