Given the input and code below, using dplyr and groups, how can I produce the results shown in the output? I know how to sum columns in groups using dplyr, but in this case I need to count how many of each non-numeric grade occurred in each class.
**INPUT**
Class Student Grade
1 Jack C
1 Mary B
1 Mo B
1 Jane A
1 Tom C
2 Don C
2 Betsy B
2 Sue C
2 Tayna B
2 Kim C
**CODE**
# Create the dataframe
Class <- c(1,1,1,1,1,2,2,2,2,2)
Name <- c("Jack", "Mary", "Mo", "Jane", "Tom", "Don", "Betsy", "Sue", "Tayna", "Kim")
Grade <- c("C","B","B","A","C","C","B","C","B","C")
StudentGrades <- data.frame(Class, Name, Grade)
**OUTPUT**
Class Grade-A Grade-B Grade-C
1 1 2 2
2 0 2 3
We can use count() to get the frequency of each Class/Grade pair, then pivot_wider() to reshape from 'long' to 'wide' format, filling the missing combinations with 0:
library(dplyr)
library(tidyr)
library(stringr)
StudentGrades %>%
count(Class, Grade = str_c('Grade_', Grade)) %>%
pivot_wider(names_from = Grade, values_from = n, values_fill = list(n = 0))
# A tibble: 2 x 4
# Class Grade_A Grade_B Grade_C
# <dbl> <int> <int> <int>
#1 1 1 2 2
#2 2 0 2 3
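As a variation on the above, the prefix can be added at the reshaping step rather than inside count(). This is only a sketch: it assumes tidyr 1.1.0 or later (where values_fill accepts a single value), and the name `wide` is mine, not from the question.

```r
library(dplyr)
library(tidyr)

StudentGrades <- data.frame(
  Class = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Name  = c("Jack", "Mary", "Mo", "Jane", "Tom",
            "Don", "Betsy", "Sue", "Tayna", "Kim"),
  Grade = c("C", "B", "B", "A", "C", "C", "B", "C", "B", "C")
)

wide <- StudentGrades %>%
  count(Class, Grade) %>%            # one row per Class/Grade pair that occurs
  pivot_wider(
    names_from   = Grade,
    values_from  = n,
    names_prefix = "Grade_",         # build the column names here
    values_fill  = 0                 # assumption: tidyr >= 1.1.0
  )

wide
```

This leaves the Grade column untouched until the last step, which can be handy if you still need the raw grades for filtering.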
Or in base R
table(StudentGrades[c('Class', 'Grade')])
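Note that table() returns a contingency table, not a data frame. If a data frame shaped like the requested output is needed, something along these lines should work. It is a sketch only; `tab` and `wide` are names I made up.

```r
StudentGrades <- data.frame(
  Class = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Grade = c("C", "B", "B", "A", "C", "C", "B", "C", "B", "C")
)

# Two-way table: rows are classes, columns are grades
tab <- table(StudentGrades[c("Class", "Grade")])

# Keep the 2-D layout when converting (plain as.data.frame() would go long)
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("Grade_", names(wide))

# Move the class labels out of the row names into a proper column
wide <- cbind(Class = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL

wide
```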
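Here is a base R solution using split() and table(). Grade is converted to a factor with a fixed set of levels, so every class reports a count for every grade (including zeros) even when Grade is stored as character, which is the default since R 4.0:
dfout <- do.call(rbind, lapply(split(StudentGrades, StudentGrades$Class),
function(v) c(unique(v[1]),
table(factor(v$Grade, levels = sort(unique(StudentGrades$Grade)))))))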
such that
> dfout
Class A B C
1 1 1 2 2
2 2 0 2 3
I'd like to make a frequency count individually for multiple columns with same possible values. The idea is to keep all columns from original data table, just adding a new one for levels and aggregating.
Here is an example of input data:
foo <- data.table(a = c(1,3,2,3,3), b = c(2,3,3,1,1), c = c(3,1,2,3,2))
# a b c
#1: 1 2 3
#2: 3 3 1
#3: 2 3 2
#4: 3 1 3
#5: 3 1 2
And desired output:
data.table(levels = 1:3, a = c(1,1,3), b = c(2,1,2), c = c(1,2,2))
# levels a b c
#1: 1 1 2 1
#2: 2 1 1 2
#3: 3 3 2 2
Thanks for helping !
We may use
library(data.table)
dcast(melt(foo)[, .N, .(variable, levels = value)],
levels ~ variable, value.var = 'N')
Output:
Key: <levels>
levels a b c
<num> <int> <int> <int>
1: 1 1 2 1
2: 2 1 1 2
3: 3 3 2 2
Or using base R
table(stack(foo))
ind
values a b c
1 1 2 1
2 1 1 2
3 3 2 2
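To see why this works: stack() puts all columns into one long `values` column plus an `ind` column holding the source column name, and table() then cross-tabulates the two. A sketch that also turns the result into the requested layout (a plain data.frame is used here instead of a data.table, and `long`, `tab`, `out` are my own names):

```r
foo <- data.frame(a = c(1, 3, 2, 3, 3),
                  b = c(2, 3, 3, 1, 1),
                  c = c(3, 1, 2, 3, 2))

long <- stack(foo)   # two columns: values (the data) and ind (source column)
tab  <- table(long)  # rows = distinct values, columns = a, b, c

# Convert to a data frame with an explicit levels column
out <- as.data.frame.matrix(tab)
out <- cbind(levels = as.numeric(rownames(out)), out)
rownames(out) <- NULL

out
```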
You could also use recast from reshape2:
reshape2::recast(foo, value~variable)
# No id variables; using all as measure variables
# Aggregation function missing: defaulting to length
value a b c
1 1 1 2 1
2 2 1 1 2
3 3 3 2 2
or even
reshape2::recast(foo, value~variable, length)
Here is an option using purrr and dplyr from the tidyverse:
library(purrr)
library(dplyr)
foo %>%
imap(~ as.data.frame(table(.x, dnn = "levels"), responseName = .y)) %>%
reduce(left_join, by = "levels")
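One caveat with the join approach, as far as I can tell: left_join keeps only the levels found in the first column, so a value that never occurs in `a` would be dropped. A base R sketch that avoids this by counting every column against one shared set of levels (`lv`, `counts` and `out` are my own names):

```r
foo <- data.frame(a = c(1, 3, 2, 3, 3),
                  b = c(2, 3, 3, 1, 1),
                  c = c(3, 1, 2, 3, 2))

# All values that occur anywhere in the data
lv <- sort(unique(unlist(foo)))

# Count each column against the same levels, so absent values give 0
counts <- sapply(foo, function(x) table(factor(x, levels = lv)))

out <- data.frame(levels = lv, counts, row.names = NULL)
out
```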
Alternatively, you could use the pivot functions from tidyr:
library(dplyr)
library(tidyr)
foo %>%
pivot_longer(everything(),
values_to = "levels") %>%
count(name, levels) %>%
pivot_wider(id_cols = levels,
names_from = name,
values_from = n)
foo |>
melt() |>
dcast(value ~ variable, fun.aggregate = length)
# value a b c
# 1: 1 1 2 1
# 2: 2 1 1 2
# 3: 3 3 2 2
My dataframe looks like
library(reshape2)
df <- data.frame(Role = c("a","a","b", "b", "c", "c"), Men = c(1,0,3,1,2,4), Women = c(2,1,1,4,3,1))
df.melt <- melt(df)
I only have access to the version that looks like df.melt. How do I get it back into the df form?
Using dcast just gives me errors; I can't figure out its syntax.
We need a sequence column to identify the rows, as each Role is duplicated within every level of 'variable' in the melted data:
library(tidyr)
library(dplyr)
library(data.table)
df.melt %>%
mutate(rn = rowid(variable)) %>%
pivot_wider(names_from = variable, values_from = value) %>%
select(-rn)
# A tibble: 6 x 3
# Role Men Women
# <chr> <dbl> <dbl>
#1 a 1 2
#2 a 0 1
#3 b 3 1
#4 b 1 4
#5 c 2 3
#6 c 4 1
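To illustrate why the sequence column is needed, here is a base R sketch. The melted data is rebuilt by hand so the example is self-contained, and ave() with seq_along stands in for data.table's rowid(); `rn` follows the name used above.

```r
# Hand-built copy of df.melt from the question
df.melt <- data.frame(
  Role     = rep(c("a", "a", "b", "b", "c", "c"), times = 2),
  variable = rep(c("Men", "Women"), each = 6),
  value    = c(1, 0, 3, 1, 2, 4, 2, 1, 1, 4, 3, 1)
)

# Role + variable does not identify a row: ("a", "Men") occurs twice
anyDuplicated(df.melt[c("Role", "variable")])   # non-zero means duplicates

# Running index within each level of variable (base analogue of rowid())
df.melt$rn <- ave(seq_len(nrow(df.melt)), df.melt$variable, FUN = seq_along)

# rn + variable is unique, so it can serve as the key for widening
anyDuplicated(df.melt[c("rn", "variable")])     # 0 means no duplicates
```

Without such a key, the reshaping functions have several values competing for the same output cell and either aggregate them or return list columns.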
If efficiency matters, dcast from data.table is fast:
library(data.table)
dcast(setDT(df.melt), rowid(variable) + Role ~
variable, value.var = 'value')[, variable := NULL][]
# Role Men Women
#1: a 1 2
#2: a 0 1
#3: b 3 1
#4: b 1 4
#5: c 2 3
#6: c 4 1
Here is a base R option using unstack
cbind(
Role = df.melt[1:(nrow(df.melt) / length(unique(df.melt$variable))), 1],
unstack(rev(df.melt[-1]))
)
which gives
Role Men Women
1 a 1 2
2 a 0 1
3 b 3 1
4 b 1 4
5 c 2 3
6 c 4 1
Another option is using reshape
subset(
reshape(
transform(
df.melt,
id = ave(1:nrow(df.melt), Role, variable, FUN = seq_along)
),
direction = "wide",
idvar = c("Role", "id"),
timevar = "variable"
),
select = -id
)
which gives
Role value.Men value.Women
1 a 1 2
2 a 0 1
3 b 3 1
4 b 1 4
5 c 2 3
6 c 4 1
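reshape() prefixes the new columns with the name of the value column. If the original names Men and Women are wanted, the prefix can be stripped afterwards. A sketch, with the melted data rebuilt by hand and `wide` as my own name:

```r
df.melt <- data.frame(
  Role     = rep(c("a", "a", "b", "b", "c", "c"), times = 2),
  variable = rep(c("Men", "Women"), each = 6),
  value    = c(1, 0, 3, 1, 2, 4, 2, 1, 1, 4, 3, 1)
)

# Running index within each Role/variable pair, used as part of the row key
df.melt$id <- ave(seq_len(nrow(df.melt)), df.melt$Role, df.melt$variable,
                  FUN = seq_along)

wide <- reshape(df.melt, direction = "wide",
                idvar = c("Role", "id"), timevar = "variable")
wide$id <- NULL

# Drop the "value." prefix that reshape() adds
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL

wide
```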
I have a dataframe of two columns id and result, and I want to assign factor levels to result depending on id. So that for id "1", result c("a","b","c","d") will have factor levels 1,2,3,4.
For id "2", result c("22","23","24") will have factor levels 1,2,3.
id <- c(1,1,1,1,2,2,2)
result <- c("a","b","c","d","22","23","24")
I tried to group them by split, but they will be converted to a list instead of a data frame, which causes a length problem for modeling. Can you help please?
Though the question was closed as a duplicate by user @Ronak Shah, I don't believe it is the same question.
After numbering the rows by group, the new column must be coerced to class "factor".
library(dplyr)
id <- c(1,1,1,1,2,2,2)
result <- c("a","b","c","d","22","23","24")
df <- data.frame(id, result)
df %>%
group_by(id) %>%
mutate(fac = row_number()) %>%
ungroup() %>%
mutate(fac = factor(fac))
# A tibble: 7 x 3
# id result fac
# <dbl> <fct> <fct>
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 1 d 4
#5 2 22 1
#6 2 23 2
#7 2 24 3
Edit.
If there are repeated values in result, convert result to a factor within each group and take its integer codes with as.integer(), then coerce those numbers to factor.
id2 <- c(1,1,1,1,2,2,2,2)
result2 <- c("a","b","c","d","22", "22","23","24")
df2 <- data.frame(id = id2, result = result2)
df2 %>%
group_by(id) %>%
mutate(fac = as.integer(factor(result))) %>%
ungroup() %>%
mutate(fac = factor(fac))
# A tibble: 8 x 3
# id result fac
# <dbl> <fct> <fct>
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 1 d 4
#5 2 22 1
#6 2 22 1
#7 2 23 2
#8 2 24 3
After grouping by id, we can use match with unique to assign a number to each distinct result. Using @Rui Barradas' data frame df2:
library(dplyr)
df2 %>%
group_by(id) %>%
mutate(ans = match(result, unique(result))) %>%
ungroup %>%
mutate(ans = factor(ans))
# id result ans
# <dbl> <fct> <fct>
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 1 d 4
#5 2 22 1
#6 2 22 1
#7 2 23 2
#8 2 24 3
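The same idea can be sketched in base R with ave(), indexing by row position so the result stays integer. Note one difference between the two approaches: match() with unique() numbers the values in order of first appearance, while as.integer(factor()) numbers them in sorted order. The name `codes` is mine.

```r
id     <- c(1, 1, 1, 1, 2, 2, 2, 2)
result <- c("a", "b", "c", "d", "22", "22", "23", "24")
df2    <- data.frame(id, result)

# For each id group, number the distinct results in order of first appearance
codes <- ave(seq_along(df2$result), df2$id,
             FUN = function(i) match(df2$result[i], unique(df2$result[i])))

df2$fac <- factor(codes)
df2

# Order of appearance versus sorted order on a tiny example
x <- c("b", "a")
match(x, unique(x))    # first appearance: "b" is 1, "a" is 2
as.integer(factor(x))  # sorted: "a" is 1, "b" is 2
```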
I have a large data set that requires some converting but I am not sure what to do.
Let's say I have 2 participants in my study.
football_enjoyment <- c(5,3)
basketball_enjoyment <- c(5,5)
football_participation <- c(1,2)
basketball_participation <- c(1,3)
df<- data.frame(football_enjoyment,football_participation,
basketball_enjoyment,basketball_participation)
df$id <- seq.int(nrow(df))
df
## football_enjoyment football_participation basketball_enjoyment basketball_participation id
# 5 1 5 1 1
# 3 2 5 3 2
I want it to be like this
sports <- c("football","football", "basketball","basketball")
enjoyment_score <- c(5,3,5,5)
participation_score <- c(1,2,1,3)
id <- c(1,2)
df2 <- data.frame(sports, enjoyment_score,participation_score, id)
df2
## sports enjoyment_score participation_score id
# football 5 1 1
# football 3 2 2
# basketball 5 1 1
# basketball 5 3 2
I am stuck with the structure and the column/row names are just for demonstration purpose.
With tidyverse you could do:
library(tidyverse)
library(reshape2)
df %>% gather("variable", "value", - id) %>%
separate(variable, into = c("sports", "variable"), sep = "_") %>%
dcast(id + sports ~ variable) %>% arrange(desc(sports))
# id sports enjoyment participation
#1 1 football 5 1
#2 2 football 3 2
#3 1 basketball 5 1
#4 2 basketball 5 3
Or, in base you could do:
df2 <- reshape(df, varying = c("football_enjoyment", "football_participation", "basketball_enjoyment", "basketball_participation"),
direction = "long",
idvar = "id",
sep = "_",
timevar = "sports",
times = c("football", "basketball"), v.names = c('enjoyment', 'participation'))
rownames(df2) <- NULL
# id sports enjoyment participation
#1 1 football 5 1
#2 2 football 3 2
#3 1 basketball 5 1
#4 2 basketball 5 3
tidyr 1.0.0 has a pivot_longer function that can do this:
library(tidyr)
football_enjoyment <- c(5,3)
basketball_enjoyment <- c(5,5)
football_participation <- c(1,2)
basketball_participation <- c(1,3)
df<- data.frame(football_enjoyment,football_participation,
basketball_enjoyment,basketball_participation)
df$id <- seq.int(nrow(df))
df
#> football_enjoyment football_participation basketball_enjoyment
#> 1 5 1 5
#> 2 3 2 5
#> basketball_participation id
#> 1 1 1
#> 2 3 2
df %>% pivot_longer(-id, names_to = c("sports",".value"), names_sep = "_")
#> # A tibble: 4 x 4
#> id sports enjoyment participation
#> <int> <chr> <dbl> <dbl>
#> 1 1 football 5 1
#> 2 1 basketball 5 1
#> 3 2 football 3 2
#> 4 2 basketball 5 3
Created on 2019-09-20 by the reprex package (v0.3.0)
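For anyone wondering what the ".value" entry does: the part of each column name before the underscore becomes a value in sports, and the part after it becomes an output column. A hand-rolled base R sketch of the same reshaping, for illustration only (`sports` and `long` are my own names, and the rows come out grouped by sport rather than by id):

```r
df <- data.frame(
  football_enjoyment       = c(5, 3),
  football_participation   = c(1, 2),
  basketball_enjoyment     = c(5, 5),
  basketball_participation = c(1, 3)
)
df$id <- seq_len(nrow(df))

sports <- c("football", "basketball")

# Build one block of rows per sport, then stack the blocks
long <- do.call(rbind, lapply(sports, function(s) {
  data.frame(
    id            = df$id,
    sports        = s,
    enjoyment     = df[[paste0(s, "_enjoyment")]],
    participation = df[[paste0(s, "_participation")]]
  )
}))

long
```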
I have the following data frame in R:
Company Education Health
A NA 1
A 1 2
A 1 NA
I want the count of each level (1, 2, NA) in each column, in the following format:
Company Education_1 Education_NA Health_1 Health_2 Health_NA
A 2 1 1 1 1
How can I do it in R?
You can do the following:
library(tidyverse)
df %>%
gather(k, v, -Company) %>%
unite(tmp, k, v, sep = "_") %>%
count(Company, tmp) %>%
spread(tmp, n)
## A tibble: 1 x 6
# Company Education_1 Education_NA Health_1 Health_2 Health_NA
# <fct> <int> <int> <int> <int> <int>
#1 A 2 1 1 1 1
Sample data
df <- read.table(text =
" Company Education Health
A NA 1
A 1 2
A 1 NA ", header = T)
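If you would rather avoid the tidyverse dependency, the same steps (go long, paste column name and value into one key, count, go wide) can be sketched in base R. This relies on paste() turning NA into the string "NA"; `long`, `tab` and `out` are my own names.

```r
df <- read.table(text = "Company Education Health
A NA 1
A 1 2
A 1 NA", header = TRUE)

# Long format: values holds the data, ind holds the source column name
long <- stack(df[c("Education", "Health")])
long$Company <- rep(df$Company, times = 2)  # stack() drops Company, so re-add it

# Combined key such as "Education_1" or "Health_NA"
long$key <- paste(long$ind, long$values, sep = "_")

# Count keys per company and convert the table to a data frame
tab <- table(long$Company, long$key)
out <- data.frame(Company = rownames(tab), as.data.frame.matrix(tab),
                  row.names = NULL)

out
```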
With the reshape2 package this can be done in a single recast call. The example uses DF from the Note at the end, to which a company B has been added. The id.var and fun arguments can be omitted and the answer will be the same, but recast will then print a message saying it used those defaults.
library(reshape2)
recast(DF, Company ~ variable + value,
id.var = "Company", fun = length)
giving this data frame:
Company Education_1 Education_NA Health_1 Health_2 Health_NA
1 A 2 1 1 1 1
2 B 2 1 1 1 1
Note
Lines <- " Company Education Health
1 A NA 1
2 A 1 2
3 A 1 NA
4 B NA 1
5 B 1 2
6 B 1 NA"
DF <- read.table(text = Lines)
In plyr you can use a hack with ddply by transposing tables to get what appear to be new columns:
x <- data.frame(Company="A",Education=c(NA,1,1),Health=c(1,2,NA))
library(plyr)
ddply(x,.(Company),plyr::summarise,
Education=t(table(addNA(Education))),
Health=t(table(addNA(Health)))
)
Company Education.1 Education.NA Health.1 Health.2 Health.NA
1 A 2 1 1 1 1
However, they are not really columns, but table elements in the data.frame.
You can use a do.call(data.frame,y) construct to make them proper data frame columns, but you need more than one row for it to work.