How do I create new rows based on cell value? (R)

I have a dataframe df where:
Days Treatment A Treatment B Treatment C
0 5 1 1
1 0 2 3
2 1 1 0
For example, there were 5 individuals receiving Treatment A who survived 0 days and 1 who survived 2 days, etc. However, I would like each of those individuals to become their own row, with the cell value giving the number of days they survived:
Patient #   A   B   C
        1   0
        2   0
        3   0
        4   0
        5   0
        6   2
        7       0
        8       1
        9       1
       10       2
       11           0
       12           1
       13           1
       14           1
Let Patient # = an arbitrary value.
I am sorry if this is not descriptive enough, but I appreciate any and all help you have to offer! I have the dataset in Excel at the moment, but I can place it into R if that's easier.
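If it helps, reading the sheet into R is straightforward with readxl (a minimal sketch; the file name "treatments.xlsx" is an assumption, adjust the path/sheet to match your workbook):
library(readxl)
df <- as.data.frame(read_excel("treatments.xlsx"))  # assumed file name and sheet layout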

We can replicate the 'Days' values by the counts in each treatment column to get a list, create a matching list of sequential patient numbers with relist, use Map to construct a data.frame for each treatment, and finally combine them with bind_rows:
library(dplyr)
lst1 <- lapply(df[-1], function(x) rep(df$Days, x))
bind_rows(Map(function(x, y, z) setNames(data.frame(x, y), c("Patient", z)),
              relist(seq_along(unlist(lst1)), skeleton = lst1),
              lst1,
              sub("Treatment\\s+", "", names(lst1))))
-output
# Patient A B C
#1 1 0 NA NA
#2 2 0 NA NA
#3 3 0 NA NA
#4 4 0 NA NA
#5 5 0 NA NA
#6 6 2 NA NA
#7 7 NA 0 NA
#8 8 NA 1 NA
#9 9 NA 1 NA
#10 10 NA 2 NA
#11 11 NA NA 0
#12 12 NA NA 1
#13 13 NA NA 1
#14 14 NA NA 1
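For reference, the intermediate lst1 simply holds the survival days repeated by the counts in each treatment column (shown here for the sample data):
lst1
#$`Treatment A`
#[1] 0 0 0 0 0 2
#$`Treatment B`
#[1] 0 1 1 2
#$`Treatment C`
#[1] 0 1 1 1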
Or, another option is to reshape to 'long' format and then back to 'wide':
library(tidyr)
df %>%
pivot_longer(cols = -Days) %>%
separate(name, into = c('name1', 'name2')) %>%
group_by(name2) %>%
summarise(value = rep(Days, value), .groups = 'drop') %>%
mutate(Patient = row_number()) %>%
pivot_wider(names_from = name2, values_from = value)
-output
# A tibble: 14 x 4
# Patient A B C
# <int> <int> <int> <int>
# 1 1 0 NA NA
# 2 2 0 NA NA
# 3 3 0 NA NA
# 4 4 0 NA NA
# 5 5 0 NA NA
# 6 6 2 NA NA
# 7 7 NA 0 NA
# 8 8 NA 1 NA
# 9 9 NA 1 NA
#10 10 NA 2 NA
#11 11 NA NA 0
#12 12 NA NA 1
#13 13 NA NA 1
#14 14 NA NA 1
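If a long format (one row per patient) is acceptable, a shorter sketch with tidyr::uncount does the count-to-row replication directly (note the row order differs from the outputs above):
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(-Days, names_to = "Treatment", values_to = "n") %>%
  uncount(n) %>%                        # repeat each row 'n' times
  mutate(Patient = row_number()) %>%
  select(Patient, Treatment, Days)      # Treatment keeps the original column names ("Treatment A", ...)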
data
df <- structure(list(Days = 0:2, `Treatment A` = c(5L, 0L, 1L),
  `Treatment B` = c(1L, 2L, 1L), `Treatment C` = c(1L, 3L, 0L)),
  class = "data.frame", row.names = c(NA, -3L))

Related

Is there a way to group values in a column between data gaps in R?

I want to group my data into different chunks where the data is continuous. I'm trying to get the group column from dummy data like this:
a b group
<dbl> <dbl> <dbl>
1 1 1 1
2 2 2 1
3 3 3 1
4 4 NA NA
5 5 NA NA
6 6 NA NA
7 7 12 2
8 8 15 2
9 9 NA NA
10 10 25 3
I tried using
test %>% mutate(test = complete.cases(.)) %>%
group_by(group = cumsum(test == TRUE)) %>%
select(group, everything())
But it doesn't work as expected:
group a b test
<int> <dbl> <dbl> <lgl>
1 1 1 1 TRUE
2 2 2 2 TRUE
3 3 3 3 TRUE
4 3 4 NA FALSE
5 3 5 NA FALSE
6 3 6 NA FALSE
7 4 7 12 TRUE
8 5 8 15 TRUE
9 5 9 NA FALSE
10 6 10 25 TRUE
Any advice?
Using rle in base R -
transform(df, group1 = with(rle(!is.na(b)), rep(cumsum(values), lengths))) |>
transform(group1 = replace(group1, is.na(b), NA))
# a b group group1
#1 1 1 1 1
#2 2 2 1 1
#3 3 3 1 1
#4 4 NA NA NA
#5 5 NA NA NA
#6 6 NA NA NA
#7 7 12 2 2
#8 8 15 2 2
#9 9 NA NA NA
#10 10 25 3 3
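To see what rle is doing here, the run-length encoding of the complete/missing pattern in b is:
rle(!is.na(df$b))
#Run Length Encoding
#  lengths: int [1:5] 3 3 2 1 1
#  values : logi [1:5] TRUE FALSE TRUE FALSE TRUE
cumsum(values) turns each run into a group number (runs of FALSE just repeat the previous number), rep(..., lengths) expands that back to one value per row, and the second transform blanks out the rows where b is NA.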
A couple of approaches to consider if you wish to use dplyr for this.
First, you could look at the transition from non-complete cases to complete cases (using lag).
library(dplyr)
test %>%
mutate(test = complete.cases(.)) %>%
group_by(group = cumsum(test & !lag(test, default = F))) %>%
mutate(group = replace(group, !test, NA))
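For the sample data, a quick check of just the resulting group column (a sketch; the helper test column can be dropped afterwards with select(-test)):
test %>%
  mutate(test = complete.cases(.)) %>%
  mutate(group = cumsum(test & !lag(test, default = FALSE))) %>%
  mutate(group = replace(group, !test, NA)) %>%
  pull(group)
# [1]  1  1  1 NA NA NA  2  2 NA  3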
Alternatively, you could add row numbers to your data.frame, filter to keep only complete cases, group_by a counter built with cumsum over the gaps in the row numbers, and then join back to the original data.
test$rn <- seq.int(nrow(test))
test %>%
filter(complete.cases(.)) %>%
group_by(group = c(0, cumsum(diff(rn) > 1)) + 1) %>%
right_join(test) %>%
arrange(rn) %>%
dplyr::select(-rn)
Output
a b group
<int> <int> <dbl>
1 1 1 1
2 2 2 1
3 3 3 1
4 4 NA NA
5 5 NA NA
6 6 NA NA
7 7 12 2
8 8 15 2
9 9 NA NA
10 10 25 3
Using data.table, get rleid, then remove the group IDs for NA rows, then renumber the remaining groups with a factor-to-integer conversion:
library(data.table)
setDT(test)[, group1 := {
x <- complete.cases(test)
grp <- rleid(x)
grp[ !x ] <- NA
as.integer(factor(grp))
}]
# a b group group1
# 1: 1 1 1 1
# 2: 2 2 1 1
# 3: 3 3 1 1
# 4: 4 NA NA NA
# 5: 5 NA NA NA
# 6: 6 NA NA NA
# 7: 7 12 2 2
# 8: 8 15 2 2
# 9: 9 NA NA NA
# 10: 10 25 3 3
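data (an assumption reconstructed from the tables above, for reproducibility; group is the desired result, kept for comparison):
test <- data.frame(a = 1:10,
                   b = c(1, 2, 3, NA, NA, NA, 12, 15, NA, 25),
                   group = c(1, 1, 1, NA, NA, NA, 2, 2, NA, 3))
df <- test  # the base R and data.table answers refer to the same data as 'df'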

Trying to calculate row sums in a dataframe with NA values

I am trying to sum each row when at least one column has a value, but the code below is not working for me:
df=data.frame(
x3=c(2,NA,3,5,4,6,NA,NA,3,3),
x4=c(0,NA,NA,6,5,6,NA,0,4,2))
df$summ <- ifelse(is.na(c(df[,"x3"] & df[,"x4"])),NA,rowSums(df[,c("x3","x4")], na.rm=TRUE))
The output should have a summ column containing the row sum when at least one of x3/x4 has a value, and NA when both are NA.
An alternative solution:
library(data.table)
setDT(df)[!( is.na(x3) & is.na(x4)),summ:=rowSums(.SD, na.rm = T)]
You can do :
df <- transform(df, summ = ifelse(is.na(x3) & is.na(x4), NA,
rowSums(df, na.rm = TRUE)))
df
# x3 x4 summ
#1 2 0 2
#2 NA NA NA
#3 3 NA 3
#4 5 6 11
#5 4 5 9
#6 6 6 12
#7 NA NA NA
#8 NA 0 0
#9 3 4 7
#10 3 2 5
In general, for any number of columns:
cols <- c('x3', 'x4')
df <- transform(df, summ = ifelse(rowSums(is.na(df[cols])) == length(cols),
NA, rowSums(df, na.rm = TRUE)))
Try the code below with rowSums + replace
df$summ <- replace(rowSums(df, na.rm = TRUE), rowSums(is.na(df)) == 2, NA)
which gives
> df
x3 x4 summ
1 2 0 2
2 NA NA NA
3 3 NA 3
4 5 6 11
5 4 5 9
6 6 6 12
7 NA NA NA
8 NA 0 0
9 3 4 7
10 3 2 5
This is not much different from the answers already posted; however, it contains some useful functions:
library(dplyr)
df %>%
rowwise() %>%
mutate(Count = ifelse(all(is.na(cur_data())), NA,
sum(c_across(everything()), na.rm = TRUE)))
# A tibble: 10 x 3
# Rowwise:
x3 x4 Count
<dbl> <dbl> <dbl>
1 2 0 2
2 NA NA NA
3 3 NA 3
4 5 6 11
5 4 5 9
6 6 6 12
7 NA NA NA
8 NA 0 0
9 3 4 7
10 3 2 5
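Note that cur_data() is deprecated as of dplyr 1.1.0; a hedged sketch of the same idea with pick() as the replacement:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(Count = ifelse(all(is.na(pick(everything()))), NA,
                        sum(c_across(everything()), na.rm = TRUE))) %>%
  ungroup()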

dplyr conditional column if not null to calculate overall percent

Hello, a really simple question but I have just got stuck: how do I add a conditional column containing the number 1 where the completed column is not NA?
id completed
<chr> <chr>
1 abc123sdf 35929
2 124cv NA
3 125xvdf 36295
4 126v NA
5 127sdsd 43933
6 128dfgs NA
7 129vsd NA
8 130sdf NA
9 131sdf NA
10 123sdfd NA
I need this to calculate an overall percentage of completed entries across ids.
(Additional question: how can I do this in dplyr without using a helper column?)
Thanks
You can use is.na to check for NA values.
library(dplyr)
df %>% mutate(newcol = as.integer(!is.na(completed)))
# id completed newcol
#1 abc123sdf 35929 1
#2 124cv NA 0
#3 125xvdf 36295 1
#4 126v NA 0
#5 127sdsd 43933 1
#6 128dfgs NA 0
#7 129vsd NA 0
#8 130sdf NA 0
#9 131sdf NA 0
#10 123sdfd NA 0
library("dplyr")
df <- data.frame(id = 1:10,
completed = c(35929, NA, 36295, NA, 43933, NA, NA, NA, NA, NA))
df %>%
mutate(is_na = as.integer(!is.na(completed)))
#> id completed is_na
#> 1 1 35929 1
#> 2 2 NA 0
#> 3 3 36295 1
#> 4 4 NA 0
#> 5 5 43933 1
#> 6 6 NA 0
#> 7 7 NA 0
#> 8 8 NA 0
#> 9 9 NA 0
#> 10 10 NA 0
But you shouldn't need this extra column to calculate a percentage, you can just use na.rm:
df %>%
mutate(pct = completed / sum(completed, na.rm = TRUE))
#> id completed pct
#> 1 1 35929 0.3093141
#> 2 2 NA NA
#> 3 3 36295 0.3124650
#> 4 4 NA NA
#> 5 5 43933 0.3782209
#> 6 6 NA NA
#> 7 7 NA NA
#> 8 8 NA NA
#> 9 9 NA NA
#> 10 10 NA NA
We can also do
library(dplyr)
df %>%
mutate(newcol = +(!is.na(completed)))
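And for the overall percentage itself, no helper column is needed at all; a small sketch:
df %>%
  summarise(pct_completed = 100 * mean(!is.na(completed)))
#  pct_completed
#1            30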

Update multiple NA columns to 0

How do I efficiently update multiple columns from NA to 0? I don't want to update every NA in the data frame, only certain columns.
My current solution is below. Is there a better method?
dataframe$col1 = replace(dataframe$col1, is.na(dataframe$col1), 0)
dataframe$col2 = replace(dataframe$col2, is.na(dataframe$col2), 0)
dataframe$col3 = replace(dataframe$col3, is.na(dataframe$col3), 0)
I also tried the syntax below, but it is not working as expected; it does not replace NA with 0.
dataframe = dataframe %>% mutate(across(c('col1', 'col2', 'col3'), ~ replace(., all(is.na(.)), 0)))
Sample Data.
structure(list(col1 = c(63755.4062, 61131.3242,
61131.3242, 192055.25, 191429.9844, 190076.4688), col2 = c(18.8754,
14.6002, 14.6002, 24.0053, 24.4012, 25.3588), col3 = c(NA, NA, NA, 45.6442, 43.9821, 47.2581)), row.names = c(NA, 6L), class = "data.frame")
The following worked. Thanks @Matt and @Karthik.
dataframe = dataframe %>% mutate(across(c('col1', 'col2', 'col3'), ~ tidyr::replace_na(., 0)))
Karthik's solution still returns NA for your sample data because all(is.na(.)) is only TRUE when an entire column is NA, so columns with a mix of values and NAs are left untouched. Using replace_na from tidyr works:
library(tidyr)
dataframe %>% mutate(across(c('col1', 'col2', 'col3'), ~ replace_na(., 0)))
Which gives us:
col1 col2 col3
1 63755.41 18.8754 0.0000
2 61131.32 14.6002 0.0000
3 61131.32 14.6002 0.0000
4 192055.25 24.0053 45.6442
5 191429.98 24.4012 43.9821
6 190076.47 25.3588 47.2581
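As a side note, replace_na also accepts a named list, which handles several columns without across (a sketch):
library(tidyr)
dataframe %>% replace_na(list(col1 = 0, col2 = 0, col3 = 0))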
Does this work:
library(dplyr)
library(tibble)
df <- tibble(c1 = round(rnorm(10, 10,1)),
c2 = NA_real_,
c3 = round(rnorm(10, 10,1)),
c4 = NA_real_,
c5 = round(rnorm(10, 10,1)),
c6 = NA_real_)
df
# A tibble: 10 x 6
c1 c2 c3 c4 c5 c6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12 NA 11 NA 11 NA
2 9 NA 10 NA 10 NA
3 11 NA 11 NA 10 NA
4 11 NA 9 NA 10 NA
5 10 NA 9 NA 10 NA
6 9 NA 13 NA 12 NA
7 10 NA 10 NA 9 NA
8 10 NA 10 NA 9 NA
9 11 NA 11 NA 10 NA
10 10 NA 10 NA 10 NA
df %>% mutate(across(c3:c6, ~ replace(., all(is.na(.)), 0)))
# A tibble: 10 x 6
c1 c2 c3 c4 c5 c6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12 NA 11 0 11 0
2 9 NA 10 0 10 0
3 11 NA 11 0 10 0
4 11 NA 9 0 10 0
5 10 NA 9 0 10 0
6 9 NA 13 0 12 0
7 10 NA 10 0 9 0
8 10 NA 10 0 9 0
9 11 NA 11 0 10 0
10 10 NA 10 0 10 0
The quickest way IMO would be to subset the columns you want to edit as a new dataframe, edit all NAs in the subset to 0, then overwrite your original df's selected columns.
DFsubset <- DF[, 10:12]         # whichever columns
DFsubset[is.na(DFsubset)] <- 0
DF[, 10:12] <- DFsubset

Cross Tabulation in R Dataframe

I have a dataframe in R:
Subject T O E P Score
1 0 1 0 1 256
2 1 0 1 0 325
2 0 1 0 1 125
3 0 1 0 1 27
4 0 0 0 1 87
5 0 1 0 1 125
6 0 1 1 1 100
This is just a sample of the dataframe. In reality, I have a lot of lines for each of the subjects, but the subjects only range from 1 to 6.
For each Subject, the possible values are:
T : 0 or 1
O : 0 or 1
E : 0 or 1
P : 0 or 1
Score : Numeric value
I want to create a new dataframe with 6 lines (one for each subject) and the calculated MEAN score for each of these combinations:
T , O , E , P , TO , TE, TP, OE , OP , PE , TOP , TOE , POE , PET
The above will be the columns of the new dataframe.
The final output should look like this
Subject T O E P TO TE TP OE OP PE TOP TOE POE PET
1
2
3
4
5
6
For each of these row-by-column cells, the value is the MEAN SCORE.
I tried aggregate and table, but I can't seem to get what I want.
Sorry, I am new to R.
Thanks
I had to rebuild sample data to answer the question as I understood it; tell me if this works for you:
set.seed(2)
df <- data.frame(subject=sample(1:3,9,T),
T = sample(c(0,1),9,T),
O = sample(c(0,1),9,T),
E = sample(c(0,1),9,T),
P = sample(c(0,1),9,T),
score=round(rnorm(9,10,3)))
# subject T O E P score
# 1 1 1 0 0 1 12
# 2 3 1 0 1 0 9
# 3 2 0 1 0 1 13
# 4 1 1 0 0 0 3
# 5 3 0 1 0 1 14
# 6 3 0 0 1 0 13
# 7 1 1 0 1 0 17
# 8 3 1 0 1 0 12
# 9 2 0 0 1 1 14
cols1 <- c("T","O","E","P")
df$comb <- apply(df[cols1],1,function(x) paste(names(df[cols1])[as.logical(x)],collapse=""))
# subject T O E P score comb
# 1 1 1 0 0 1 12 TP
# 2 3 1 0 1 0 9 TE
# 3 2 0 1 0 1 13 OP
# 4 1 1 0 0 0 3 T
# 5 3 0 1 0 1 14 OP
# 6 3 0 0 1 0 13 E
# 7 1 1 0 1 0 17 TE
# 8 3 1 0 1 0 12 TE
# 9 2 0 0 1 1 14 EP
library(tidyverse)
df %>%
group_by(subject,comb) %>%
summarize(score=mean(score)) %>%
spread(comb,score) %>%
ungroup
# # A tibble: 3 x 7
# subject E EP OP T TE TP
# * <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 NA NA NA 3 17.0 12
# 2 2 NA 14 13 NA NA NA
# 3 3 13 NA 14 NA 10.5 NA
The second step in base R:
means <- aggregate(score ~ subject + comb,df,mean)
means2 <- reshape(means,timevar="comb",idvar="subject",direction="wide")
setNames(means2,c("subject",sort(unique(df$comb))))
# subject E EP OP T TE TP
# 1 3 13 NA 14 NA 10.5 NA
# 2 2 NA 14 13 NA NA NA
# 5 1 NA NA NA 3 17.0 12
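spread is superseded in current tidyr; a pivot_wider sketch of the same reshaping step, if you prefer:
library(dplyr)
library(tidyr)
df %>%
  group_by(subject, comb) %>%
  summarize(score = mean(score), .groups = "drop") %>%
  pivot_wider(names_from = comb, values_from = score)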
I'd do it like this:
# using your table data
df = read.table(text =
"Subject T O E P Score
1 0 1 0 1 256
2 1 0 1 0 325
2 0 1 0 1 125
3 0 1 0 1 27
4 0 0 0 1 87
5 0 1 0 1 125
6 0 1 1 1 100", stringsAsFactors = FALSE, header=TRUE)
# your desired column names
new_names <- c("T", "O", "E", "P", "TO", "TE", "TP", "OE",
"OP", "PE", "TOP", "TOE", "POE", "PET")
# assigning each of your scores to one of the desired column names
assign_comb <- function(dfrow) {
selection <- c("T", "O", "E", "P")[as.logical(dfrow[2:5])]
do.call(paste, as.list(c(selection, sep = "")))
}
df$comb <- apply(df, 1, assign_comb)
# aggregate all the means together
df_agg <- aggregate(df$Score ~ df$comb + df$Subject, FUN = mean)
# reshape the data to wide format
df_new <- reshape(df_agg, v.names = "df$Score", idvar = "df$Subject",
timevar = "df$comb", direction = "wide")
# clean up the column names to match your desired output
# any column names not found will be added as NA
colnames(df_new) <- gsub("df\\$|Score\\.", "", colnames(df_new))
df_new[, new_names[!new_names %in% colnames(df_new)]] <- NA
df_new <- df_new[, c("Subject", new_names)]
With the result:
> df_new
Subject T O E P TO TE TP OE OP PE TOP TOE POE PET
1 1 NA NA NA NA NA NA NA NA 256 NA NA NA NA NA
2 2 NA NA NA NA NA 325 NA NA 125 NA NA NA NA NA
4 3 NA NA NA NA NA NA NA NA 27 NA NA NA NA NA
5 4 NA NA NA 87 NA NA NA NA NA NA NA NA NA NA
6 5 NA NA NA NA NA NA NA NA 125 NA NA NA NA NA
7 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
