I am new to R and am trying to summarize a data frame with multiple functions, and I would like the results to appear in the same column rather than in separate columns for each function. For example, my data set looks something like this:
data =
A B
----
1 2
2 2
3 2
4 2
When I call summarize_all(data, c(min, max)), the data frame becomes:
A_fn1 B_fn1 A_fn2 B_fn2
1 2 4 2
How can I make it so that the result of the summarize_all becomes this:
A B
----
1 2
4 2
Thanks
Does this work:
library(dplyr)
bind_rows(apply(data,2,min),apply(data,2,max))
# A tibble: 2 x 2
A B
<dbl> <dbl>
1 1 2
2 4 2
Here is an option with transpose
library(dplyr)
library(tidyr)
pivot_longer(df1, cols = everything()) %>%
group_by(name) %>%
summarise(min = min(value), max = max(value)) %>%
data.table::transpose(., make.names = 'name')
A B
1 1 2
2 4 2
data
df1 <- structure(list(A = 1:4, B = c(2L, 2L, 2L, 2L)),
class = "data.frame", row.names = c(NA,
-4L))
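For completeness, here is a base R sketch that generalizes to any set of summary functions. The functions go in a named list (`fns` is just an illustrative name), and `sapply()` binds one row per function, so each original column keeps a single column in the result. This assumes all columns are numeric.

```r
# Example data from the question
data <- data.frame(A = 1:4, B = c(2L, 2L, 2L, 2L))

# Any named list of summary functions; each name becomes a row label
fns <- list(min = min, max = max)

# Inner sapply(): apply every function to one column -> c(min = ..., max = ...)
# Outer sapply(): repeat for each column and bind the results column-wise
res <- as.data.frame(sapply(data, function(x) sapply(fns, function(f) f(x))))
res
#     A B
# min 1 2
# max 4 2
```

Adding another statistic is then just a matter of extending `fns`, e.g. `list(min = min, max = max, median = median)`.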
I have a table that is somewhat like this:
var         RC
distance50  2
distance20  4
precMax     5
precMin     1
total_prec  8
travelTime  5
travelTime  2
I want to sum all similar type variables, resulting in something like this:
var   sum
dist  6
prec  14
trav  7
The first four letters are enough to tell the different types apart. I have tried and tried but have not figured it out. Could anyone please assist? I generally try to work with dplyr, so that would be preferred. The datasets are small (n < 100), so speed is not a concern.
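Base R solution (note that it drops rows whose name contains "total", so prec sums to 6 rather than the 14 in the expected output):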
aggregate(
RC ~ var,
data = transform(
with(df, df[!(grepl("total", var)),]),
var = gsub("^(\\w+)([A-Z0-9]\\w+$)", "\\1", var)
),
FUN = sum
)
Data:
df <- structure(list(var = c("distance50", "distance20", "precMax",
"precMin", "total_prec", "travelTime", "travelTime"), RC = c(2L,
4L, 5L, 1L, 8L, 5L, 2L)), class = "data.frame", row.names = c(NA,
-7L))
library(dplyr)
library(tidyr)
df %>%
separate(var, c("var", "b"), sep = "[_A-Z0-9]", extra = "merge") %>%
group_by(var = ifelse(b %in% var, b, var)) %>%
summarize(RC = sum(RC), .groups = "drop")
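separate() splits var into two columns at the first underscore (_), capital letter (A-Z) or digit (0-9); the matched character itself is dropped, and extra = "merge" keeps any remainder in the second column.
In the group_by() statement, if the value in the second column (b) also appears in the first column (var), as "prec" does for total_prec, that value becomes the group label; otherwise var is kept.
Lastly, sum RC by group.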
Output
var RC
<chr> <int>
1 distance 6
2 prec 14
3 travel 7
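Grouping on the first four characters with str_sub(), printing the intermediate results along the way:
library(dplyr)
library(stringr)
tibble(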
var=c('dista', 'distb', 'travelTime'),
rc=2:4) %>%
print() %>%
# A tibble: 3 x 2
# var rc
# <chr> <int>
#1 dista 2
#2 distb 3
#3 travelTime 4
group_by(var=str_sub(var, end=4)) %>%
print() %>%
# A tibble: 3 x 2
# Groups: var [2]
# var rc
# <chr> <int>
#1 dist 2
#2 dist 3
#3 trav 4
summarise(sum=sum(rc))
# A tibble: 2 x 2
# var sum
# <chr> <int>
#1 dist 5
#2 trav 4
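If the type names are known in advance, another base R option is to pull the type keyword out of each name with a regular expression instead of relying on the first four letters (which would put total_prec into its own "tota" group). This is a sketch under that assumption: `types` is an illustrative vector you would adapt, and every name must contain one of the keywords.

```r
df <- data.frame(
  var = c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime"),
  RC = c(2L, 4L, 5L, 1L, 8L, 5L, 2L)
)

# Assumed list of type keywords; regexpr() returns the first match in each
# name, so "total_prec" is assigned to "prec"
types <- c("dist", "prec", "trav")
df$type <- regmatches(df$var, regexpr(paste(types, collapse = "|"), df$var))

# Sum RC within each type
res <- aggregate(RC ~ type, data = df, FUN = sum)
res
#   type RC
# 1 dist  6
# 2 prec 14
# 3 trav  7
```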
I would like to use R to create an expanded_df from a template_df in which each row is repeated the number of times given in a separate column of template_df. An integer count should be appended to the ID column of expanded_df, indicating which repetition of that row it is.
I would like this count to start at 600 for each ID class.
E.g., template_df:
Initial_ID Count
a 2
b 3
c 1
d 4
expanded_df:
Expanded_ID
a-600
a-601
b-600
b-601
b-602
c-600
d-600
d-601
d-602
d-603
Anyone have any ideas? Thanks!
We can use uncount to expand the rows, then get the rowid of 'Initial_ID' (a running count within each ID), add 599 and paste it onto the ID:
library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
template_df %>%
uncount(Count) %>%
transmute(Expanded_ID = str_c(Initial_ID, 599 + rowid(Initial_ID), sep = '-'))
-output
Expanded_ID
1 a-600
2 a-601
3 b-600
4 b-601
5 b-602
6 c-600
7 d-600
8 d-601
9 d-602
10 d-603
Or using base R with rep and paste
data.frame(Expanded_ID = with(template_df, paste0(rep(Initial_ID, Count), "-",
599 + sequence(Count))))
-output
Expanded_ID
1 a-600
2 a-601
3 b-600
4 b-601
5 b-602
6 c-600
7 d-600
8 d-601
9 d-602
10 d-603
data
template_df <- structure(list(Initial_ID = c("a", "b", "c", "d"), Count = c(2L,
3L, 1L, 4L)), class = "data.frame", row.names = c(NA, -4L))
An alternative dplyr solution:
library(dplyr)
template_df %>%
group_by(Initial_ID) %>%
slice(rep(1:n(), each = Count)) %>%
mutate(row = 600 + row_number()-1) %>%
ungroup() %>%
transmute(Expanded_ID = paste(Initial_ID,row, sep = "-"))
Expanded_ID
<chr>
1 a-600
2 a-601
3 b-600
4 b-601
5 b-602
6 c-600
7 d-600
8 d-601
9 d-602
10 d-603
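A small base R variation that makes the starting value a parameter. `expand_ids()` is a hypothetical helper name, and `sequence()` with the `from` argument assumes R 4.0.0 or later.

```r
template_df <- data.frame(Initial_ID = c("a", "b", "c", "d"),
                          Count = c(2L, 3L, 1L, 4L))

# Hypothetical helper: repeat each ID `Count` times and append a counter
# that restarts at `start` for every ID (sequence(from =) needs R >= 4.0.0)
expand_ids <- function(df, start = 600L) {
  data.frame(Expanded_ID = paste(rep(df$Initial_ID, df$Count),
                                 sequence(df$Count, from = start),
                                 sep = "-"))
}

expanded_df <- expand_ids(template_df)
expanded_df$Expanded_ID
#  [1] "a-600" "a-601" "b-600" "b-601" "b-602" "c-600" "d-600" "d-601" "d-602" "d-603"
```

Calling `expand_ids(template_df, start = 1L)` would give the plain 1-based counter instead.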
I currently have a data frame with this structure:
ID-No cigsaday activity
1 NA 1
2 NA 1
1 5 NA
2 5 NA
I want to merge the rows that share the same ID number into a new data frame that should look like this:
ID-No cigsaday activity
1 5 1
2 5 1
The data frame includes both character and numeric columns; the rows should be matched on the participant ID in the first column, which occurs four times in the dataset.
Any help is appreciated!
A data.table option
> library(data.table)
> setDT(df)[, lapply(.SD, na.omit), ID_No]
ID_No cigsaday activity
1: 1 5 1
2: 2 5 1
Data
> dput(df)
structure(list(ID_No = c(1L, 2L, 1L, 2L), cigsaday = c(NA, NA,
5L, 5L), activity = c(1L, 1L, NA, NA)), class = "data.frame", row.names = c(NA,
-4L))
Many ways lead to Rome. For the sake of completeness, here are some other approaches which return the expected result for the given sample dataset. Your mileage may vary.
1. dplyr, na.omit()
library(dplyr)
df %>%
group_by(ID_No) %>%
summarise(across(everything(), na.omit))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 3
ID_No cigsaday activity
<int> <int> <int>
1 1 5 1
2 2 5 1
Note, this is a dplyr version of ThomasIsCoding's answer.
2. dplyr, reduce(), coalesce()
library(dplyr)
df %>%
group_by(ID_No) %>%
summarise(across(everything(), ~ purrr::reduce(.x, coalesce)))
3. data.table, fcoalesce()
library(data.table)
setDT(df)[, lapply(.SD, function(x) fcoalesce(as.list(x))), ID_No]
ID_No cigsaday activity
1: 1 5 1
2: 2 5 1
4. data.table, Reduce(), fcoalesce()
library(data.table)
setDT(df)[, lapply(.SD, Reduce, f = fcoalesce), ID_No]
A possible solution uses na.locf() from zoo, which replaces each NA with the most recent non-NA value:
library(dplyr)
library(zoo)
df %>%
group_by(ID_No) %>%
mutate_at(vars(-group_cols()), .funs = function(x) na.locf(x)) %>%
distinct(ID_No, cigsaday, activity, .keep_all = TRUE) %>%
ungroup()
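A base R sketch of the same idea, assuming each column has at most one non-NA value per ID (otherwise it keeps the first one). `first_non_na` is an illustrative helper name.

```r
df <- data.frame(ID_No = c(1L, 2L, 1L, 2L),
                 cigsaday = c(NA, NA, 5L, 5L),
                 activity = c(1L, 1L, NA, NA))

# First non-NA value of a column within an ID (NA if all values are missing)
first_non_na <- function(x) x[!is.na(x)][1]

# na.action = na.pass stops aggregate() from dropping rows that contain NAs
res <- aggregate(. ~ ID_No, data = df, FUN = first_non_na, na.action = na.pass)
res
#   ID_No cigsaday activity
# 1     1        5        1
# 2     2        5        1
```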
I have multiple columns that I need to combine into a single contingency table counting how often each number occurs.
Example of an ordinal data set:
df <- data.frame(ab = c(1,2,3,4,5),
ba = c(1,3,3,3,5))
ab ba
1 1
2 3
3 3
4 3
5 5
I would like to be able to return a contingency table showing like this:
1 2 3 4 5
2 1 4 1 2
I've tried examples posted here for similar issues, but I get sums returned instead of counts:
library(plyr)
colSums(rbind.fill(data.frame(t(unclass(df$ab))), data.frame(t(unclass(df$ba)))),
na.rm = T)
Any help is greatly appreciated
We unlist the data.frame into a vector and apply table in base R
table(unlist(df))
# 1 2 3 4 5
# 2 1 4 1 2
Or, with the tidyverse, reshape the data into 'long' format with pivot_longer and count the values:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = everything()) %>%
count(value)
data
df <- structure(list(ab = 1:5, ba = c(1L, 3L, 3L, 3L, 5L)),
class = "data.frame", row.names = c(NA,
-5L))
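Since the data are ordinal, you may want categories that never occur to show up with a count of zero. A base R sketch, assuming the scale runs from 1 to 5: convert to a factor with fixed levels before calling table(), either on all values at once or column by column.

```r
df <- data.frame(ab = c(1, 2, 3, 4, 5),
                 ba = c(1, 3, 3, 3, 5))

# Assumed ordinal scale; fixing the levels keeps zero-count categories
lv <- 1:5

# Counts over all columns combined
overall <- table(factor(unlist(df), levels = lv))
overall
# 1 2 3 4 5
# 2 1 4 1 2

# Counts per column: one column per original variable, one row per level
per_col <- sapply(df, function(x) table(factor(x, levels = lv)))
per_col
#   ab ba
# 1  1  1
# 2  1  0
# 3  1  3
# 4  1  0
# 5  1  1
```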
I have a data frame with object names and a list of statistical moments for that object, like this:
Object Mean IQR Skew
x 1 1 1
y 2 2 2
z 3 3 3
What I want is, for each row, to create columns for the statistical moments, prefixed with the object name, like so:
xMean xIQR xSkew yMean yIQR ySkew zMean zIQR zSkew
1 1 1 2 2 2 3 3 3
In essence, I need to collapse the data frame into a single row that lists all statistical moments on one line, as I'll have many rows like the final one but a fixed set of columns.
You could do:
df1$id <- 1
reshape(df1, idvar="id", timevar="Object", direction="wide")[-1]
# Mean.x IQR.x Skew.x Mean.y IQR.y Skew.y Mean.z IQR.z Skew.z
#1 1 1 1 2 2 2 3 3 3
Or using dcast and melt from reshape2 (this also relies on the id column added above):
library(reshape2)
dcast(melt(df1, id.var=c('id', 'Object')), id~..., value.var='value')[-1]
# x_Mean x_IQR x_Skew y_Mean y_IQR y_Skew z_Mean z_IQR z_Skew
#1 1 1 1 2 2 2 3 3 3
Or using dplyr and tidyr
library(dplyr)
library(tidyr)
df1 %>%
gather(Var, Val, Mean:Skew) %>%
unite(VarNew,Object, Var, sep="") %>%
spread(VarNew, Val) %>%
select(-id)
# xIQR xMean xSkew yIQR yMean ySkew zIQR zMean zSkew
#1 1 1 1 2 2 2 3 3 3
data
df1 <- structure(list(Object = c("x", "y", "z"), Mean = 1:3, IQR = 1:3,
Skew = 1:3), .Names = c("Object", "Mean", "IQR", "Skew"), class = "data.frame", row.names = c(NA,
-3L))
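Or maybe something like this, where df is the original data frame from the question (each = ncol(df) - 1 repeats each object name once per statistic column):
setNames(unlist(data.frame(t(df[-1]))), paste0(rep(df[, 1], each = ncol(df) - 1), names(df[, -1])))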
# xMean xIQR xSkew yMean yIQR ySkew zMean zIQR zSkew
# 1 1 1 2 2 2 3 3 3
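A base R sketch without reshaping: flatten the statistics matrix row by row and build the matching names with outer(). It assumes every column except Object is numeric.

```r
df <- data.frame(Object = c("x", "y", "z"),
                 Mean = 1:3, IQR = 1:3, Skew = 1:3)

stats <- as.matrix(df[-1])  # one row per object, one column per statistic

# t() then as.vector() reads the matrix row by row: all of x, then y, then z
vals <- as.vector(t(stats))
# outer() builds "xMean", "xIQR", ... in the same row-major order
names(vals) <- as.vector(t(outer(df$Object, colnames(stats), paste0)))

# One-row data frame
wide <- as.data.frame(as.list(vals))
wide
#   xMean xIQR xSkew yMean yIQR ySkew zMean zIQR zSkew
# 1     1    1     1     2    2     2     3    3     3
```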