Transform a df into individual observations [duplicate] - r

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 7 months ago.
I want to transform a df from a "counting" approach (number of cases) to an "individual observations" approach.
Example:
df <- dplyr::tibble(
city = c("a", "a", "b", "b", "c", "c"),
sex = c(1,0,1,0,1,0),
age = c(1,2,1,2,1,2),
cases = c(2, 3, 1, 1, 1, 1))
Expected result:
df <- dplyr::tibble(
city = c("a","a","a","a","a", "b", "b", "c", "c"),
sex = c(1,1,0,0,0,1,0,1,0),
age = c(1,1,2,2,2,1,2,1,2))

uncount() from tidyr can do that for you.
df |> tidyr::uncount(cases)
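If you would rather not depend on tidyr, the same expansion can be sketched in base R by repeating each row index as many times as its count. This is an alternative to uncount(), not what the answer above uses; the name `expanded` is only illustrative, and the data is rebuilt as a plain data.frame so the snippet runs on its own.

```r
# Same data as in the question, as a plain data.frame
df <- data.frame(
  city  = c("a", "a", "b", "b", "c", "c"),
  sex   = c(1, 0, 1, 0, 1, 0),
  age   = c(1, 2, 1, 2, 1, 2),
  cases = c(2, 3, 1, 1, 1, 1)
)

# Repeat row i df$cases[i] times and keep every column except the count
row_index <- rep(seq_len(nrow(df)), times = df$cases)
expanded  <- df[row_index, c("city", "sex", "age")]

# Indexing leaves row names like "1", "1.1"; reset them
rownames(expanded) <- NULL
```

The result has one row per case (9 rows here), in the same order as the expected result in the question.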

Related

proportion within each factor using dplyr [duplicate]

This question already has answers here:
Relative frequencies / proportions with dplyr
(10 answers)
Closed 1 year ago.
I want to compute the proportion of value within each factor level using dplyr. The desired result is in desired$prop.
Thanks in advance :))
data <- data.frame(
  team = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  country = c("usa", "uk", "spain", "usa", "uk", "spain", "usa", "uk", "spain"),
  value = c(40, 20, 10, 50, 30, 35, 50, 60, 25)
)
desired <- data.frame(
  team = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  country = c("usa", "uk", "spain", "usa", "uk", "spain", "usa", "uk", "spain"),
  value = c(40, 20, 10, 50, 30, 35, 50, 60, 25),
  prop = c(0.285714286, 0.181818182, 0.142857143,
           0.357142857, 0.272727273, 0.5,
           0.357142857, 0.545454545, 0.357142857)
)
@MrFlick is right. And also faster than I am.
library(dplyr)
df <- data %>%
group_by(country) %>%
mutate(prop = value/sum(value))
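Note that the desired values correspond to grouping by country: each country's proportions sum to 1 (for example 40 / (40 + 50 + 50) for team a in usa). As a cross-check, here is a minimal base R sketch of the same calculation using ave(); it is an alternative to the dplyr pipeline, not a replacement for it.

```r
data <- data.frame(
  team    = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  country = c("usa", "uk", "spain", "usa", "uk", "spain", "usa", "uk", "spain"),
  value   = c(40, 20, 10, 50, 30, 35, 50, 60, 25)
)

# ave() returns, for every row, the sum of value within that row's country
country_total <- ave(data$value, data$country, FUN = sum)

# Share of each row within its country
data$prop <- data$value / country_total
```

Grouping by team instead would only require swapping data$country for data$team in the ave() call.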

R: cumsum and group_by [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Calculate cumulative sum (cumsum) by group
(5 answers)
Closed 1 year ago.
I have the following sample data frame:
dates <- c("2021-01-01", "2021-01-03", "2021-01-06", "2021-01-02", "2021-01-04", "2021-01-06")
group <- c("A", "A", "A", "B", "B", "B")
values <- c(1, 5, 4, 2, 7, 3)
df <- data.frame(dates = as.Date(dates), group = group, values)
df
Can someone please tell me how I can compute a variable as the cumulated sum of values for each group (A and B) separately (+ chronologically)?
values_cumulated should be 1, 6, 10, 2, 9, 12
I was trying it with group_by() and mutate(values_cum = cumsum(values)) but couldn't get it to work.
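One way this could be approached, sketched here in base R so it runs without extra packages: sort by group and date, then take the running sum within each group with ave(). The dplyr version would follow the same idea (arrange() by group and dates, group_by(group), then mutate() with cumsum(values)). The column name values_cum is taken from the question.

```r
dates  <- c("2021-01-01", "2021-01-03", "2021-01-06",
            "2021-01-02", "2021-01-04", "2021-01-06")
group  <- c("A", "A", "A", "B", "B", "B")
values <- c(1, 5, 4, 2, 7, 3)
df <- data.frame(dates = as.Date(dates), group = group, values = values)

# Sort chronologically within each group so the running total follows the dates
df <- df[order(df$group, df$dates), ]

# cumsum() is applied separately to the values of each group
df$values_cum <- ave(df$values, df$group, FUN = cumsum)
```

For the sample data this gives 1, 6, 10 for group A and 2, 9, 12 for group B, matching the values requested in the question.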

Plotting based on occurrence in group

I would like to make a bar chart where each bar shows the proportion of groups in which a value occurs, rather than the usual percentage of rows. For a var to "count", it only needs to occur once in a group. For example, in this df, where id is the grouping variable:
df <-
tibble(id = c(rep(1, 3), rep(2, 3), rep(3, 3)),
vars = c("a", NA, "b", "c", "d", "e", "a", "a", "a"))
The bars would be:
a = 2/3 # since a occurs in 2 out of 3 groups
b = 1/3
c = 1/3
d = 1/3
e = 1/3
If I understand you correctly, a one-liner would suffice:
ggplot(distinct(df)) + geom_bar(aes(vars, after_stat(count) / n_distinct(df$id)))
Working answer:
library(dplyr)
library(ggplot2)

tibble(id = c(rep(1, 3), rep(2, 3), rep(3, 3)),
       vars = c("a", "a", "b", "c", "d", "e", "a", "a", "a")) %>%
  group_by(id) %>%
  distinct(vars) %>%                        # one row per (id, vars) pair
  ungroup() %>%
  add_count(vars) %>%                       # n = number of groups containing vars
  mutate(prop = n / n_distinct(id)) %>%
  distinct(vars, .keep_all = TRUE) %>%
  ggplot(aes(vars, prop)) +
  geom_col()
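To check the numbers independently of the plot, the proportions can also be computed directly. The following is a minimal base R sketch using the data from the question (including the NA, which is dropped here as an assumption about the intended behaviour); the names `pairs` and `prop` are only illustrative.

```r
df <- data.frame(
  id   = rep(1:3, each = 3),
  vars = c("a", NA, "b", "c", "d", "e", "a", "a", "a")
)

# One row per (id, vars) pair, ignoring missing values,
# so a value is counted at most once per group
pairs <- unique(df[!is.na(df$vars), ])

# Number of groups each value occurs in, divided by the number of groups
prop <- table(pairs$vars) / length(unique(df$id))
```

Here `prop` holds 2/3 for a and 1/3 for each of b, c, d and e, as listed in the question. Passing it to barplot() would give a quick base graphics version of the chart.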

Using R merge() to collect non matching IDs [duplicate]

This question already has answers here:
Find complement of a data frame (anti - join)
(7 answers)
Closed 3 years ago.
So I have these two dataframes:
id <- c(1, 2, 3, 4, 5, 6, 7, 8)
drug <- c("A", "B", "C", "D", "E", "F", "G", "H")
value <- c(100, 200, 300, 400, 500, 600, 700, 800)
df1 <- data.frame(id, drug, value)
id <- c(1, 2, 3, 4, 6, 8)
treatment <- c("C", "IC", "C", "IC", "C", "C")
value <- c(700, 800, 900, 100, 200, 900)
df2 <- data.frame(id, treatment, value)
I used merge() to combine the two datasets like this:
key = "id"
merge(df1,df2[key],by=key)
This worked, but I end up dropping some rows (because their ids do not match).
Is there a way I can see or collect the ids which were dropped as well?
My real dataset consists of hundreds of entries, so a way to find the dropped ids in R would be very useful.
library(dplyr)
> anti_join(df1, df2, by = "id")
  id drug value
1  5    E   500
2  7    G   700
Or if you just want the IDs:
> anti_join(df1, df2, by = "id")$id
[1] 5 7
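If dplyr is not available, the same information can be obtained in base R. This is a small sketch of an alternative, not part of the answer above; `dropped_ids` and `dropped_rows` are illustrative names.

```r
id    <- c(1, 2, 3, 4, 5, 6, 7, 8)
drug  <- c("A", "B", "C", "D", "E", "F", "G", "H")
value <- c(100, 200, 300, 400, 500, 600, 700, 800)
df1 <- data.frame(id, drug, value)

id        <- c(1, 2, 3, 4, 6, 8)
treatment <- c("C", "IC", "C", "IC", "C", "C")
value     <- c(700, 800, 900, 100, 200, 900)
df2 <- data.frame(id, treatment, value)

# ids present in df1 but absent from df2
dropped_ids <- setdiff(df1$id, df2$id)

# The full df1 rows that have no match in df2
dropped_rows <- df1[!df1$id %in% df2$id, ]
```

For the sample data, `dropped_ids` contains 5 and 7, the same ids that anti_join() reports.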

Using matplot in R whenever certain column changes

Sorry in advance because I am new at asking questions here and don't know how to input this table properly.
Say I have a data frame in R constructed like:
team = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
value = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m = cbind(team, value)
I want to create a plot that will give me 3 lines graphing the values for teams A, B, and C. I believe I can do this inputting the matrix m into matplot somehow, but I'm not sure how.
EDIT: I've gotten a lot closer to solving my problem. However I've realized that for some reason, with the code I have, "Value" is a list of 745 which matches the number of rows in my dataframe m. However when I unlist(Value) it turns into a numeric of length 894. Any ideas on why this would happen?
You can try something like this:
team = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
value = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m = cbind.data.frame(team, value)
library(ggplot2)
ggplot(m, aes(x=as.factor(1:nrow(m)), y=value, group=team, col=team)) +
geom_line(lwd=2) + xlab('index')
If you have the same number of ordered values for each team, you could use matplot to visualize them, but the data has to be converted to a wide matrix (one column per team) first:
m = cbind.data.frame(team, value, index = rep(1:3, 3))
m <- reshape(m, v.names = 'value', idvar = 'team', direction = 'wide', timevar = 'index')
matplot(t(m[, 2:4]), type = 'l', lty = 1)
legend('top', legend = m[, 1], lty = 1, col = 1:3)
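A side note on why the answer switches from cbind() to cbind.data.frame(): cbind() on a character vector and a numeric vector produces a character matrix, so the values can no longer be plotted as numbers. The sketch below shows that, and then uses unstack() as a shorter base R alternative to reshape() for building the wide layout; it assumes every team has the same number of values, and the names `m_chr` and `wide` are only illustrative.

```r
team  <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
value <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# cbind() builds a matrix, and a matrix holds a single type,
# so the numeric values are coerced to character
m_chr <- cbind(team, value)
is.character(m_chr)  # TRUE

# A data frame keeps value numeric
m <- data.frame(team, value)

# One column per team, one row per position within the team
wide <- unstack(m, value ~ team)
```

With the data in this shape, a call such as matplot(as.matrix(wide), type = "l", lty = 1) should draw one line per team.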
