Build rowSums in dplyr based on columns containing pattern in their names [duplicate] - r

This question already has answers here:
Sum across multiple columns with dplyr
(8 answers)
R, create a new column in a data frame that applies a function of all the columns with similar names
(3 answers)
Closed 4 years ago.
My data frame looks something like this:
USER  OBSERVATION  COUNT.1  COUNT.2  COUNT.3
A     1            0        1        1
A     2            1        1        2
A     3            3        0        0
With dplyr, I want to build a column that sums the values of the count variables for each row, selecting the count variables by name:
USER  OBSERVATION  COUNT.1  COUNT.2  COUNT.3  SUM
A     1            0        1        1        2
A     2            1        1        2        4
A     3            3        0        0        3
How do I do that?

As you asked for a dplyr solution, you can do:
library(dplyr)
df %>%
  mutate(SUM = rowSums(select(., starts_with("COUNT"))))
  USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
1    A           1       0       1       1   2
2    A           2       1       1       2   4
3    A           3       3       0       0   3
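For reference, here is a minimal self-contained sketch of the same idea. It assumes dplyr 1.0.0 or later, so that across() is available. The data frame is rebuilt from the table in the question, and the name result is only illustrative.

```r
library(dplyr)

# Data rebuilt from the question's example table
df <- data.frame(
  USER = c("A", "A", "A"),
  OBSERVATION = 1:3,
  COUNT.1 = c(0, 1, 3),
  COUNT.2 = c(1, 1, 0),
  COUNT.3 = c(1, 2, 0)
)

# across() selects the columns by name prefix; rowSums() adds them up per row
result <- df %>%
  mutate(SUM = rowSums(across(starts_with("COUNT"))))

result
```

starts_with("COUNT") picks the columns by prefix. Other tidyselect helpers such as contains() or matches() could be swapped in if the column names follow a different pattern.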

Related

Calculate difference between current row and first row within group [duplicate]

This question already has answers here:
subtract first or second value from each row [duplicate]
(2 answers)
Closed 3 days ago.
I would like to create a new column in my dataset that shows the difference in the values (column b in example dataset) between the current row and the first row within a group (column a in example dataset) in R. How would I go about doing this?
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(2, 4, 6, 8, 10, 12, 14, 16)
have <- data.frame(a, b)
> have
a  b
1  2
1  4
1  6
1  8
2 10
2 12
2 14
2 16
> want
a  b c
1  2 0
1  4 2
1  6 4
1  8 6
2 10 0
2 12 2
2 14 4
2 16 6
You can use first() to refer to the first value within each group:
library(dplyr)
have %>%
  group_by(a) %>%
  mutate(c = b - first(b)) %>%
  ungroup()
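If you would rather avoid a package dependency, a base R sketch using ave() might look like the following. This is an alternative to the dplyr answer, not a replacement for it, and it assumes the rows of each group are already in the order you want, since "first" here means first in row order.

```r
# Data from the question
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(2, 4, 6, 8, 10, 12, 14, 16)
have <- data.frame(a, b)

# ave() applies FUN within each group of `a` and returns a vector
# of the same length as `b`, here the group's first value repeated
have$c <- have$b - ave(have$b, have$a, FUN = function(x) x[1])

have
```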

How to add a row to data frame if there are different number of columns? [duplicate]

This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Efficient way to rbind data.frames with different columns
(4 answers)
Closed 7 months ago.
I have several data frames, each with a single row, and an empty data frame (let's call it 'total'). The data frames have different numbers of columns, some of which overlap. The total data frame has every possible column, so when I add a row from one of the others, its values should go into the matching columns, and any column missing from the added row should be filled with 0.
Example data frames:
df1:
A B C
1 2 5
df2:
B E
4 2
df3:
C E K J
3 2 5 7
Example of total dataframe:
A B C E K J
1 2 5 0 0 0
0 4 0 2 0 0
0 0 3 2 5 7
So, how can I do that? I've tried various binds and inserts, but they don't work: in some cases the added row changes the number of columns in the total data frame, and in others it just duplicates the previous row.
A possible solution:
library(tidyverse)
bind_rows(df1, df2, df3) %>%
  mutate(across(everything(), ~ replace_na(.x, 0)))
#> A B C E K J
#> 1 1 2 5 0 0 0
#> 2 0 4 0 2 0 0
#> 3 0 0 3 2 5 7
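The answer does not show how df1, df2 and df3 are defined. A reproducible sketch, assuming they are the three single-row data frames from the question's example (my reading of the flattened example, so treat the definitions as an assumption):

```r
library(dplyr)
library(tidyr)

# Assumed inputs, reconstructed from the question's example
df1 <- data.frame(A = 1, B = 2, C = 5)
df2 <- data.frame(B = 4, E = 2)
df3 <- data.frame(C = 3, E = 2, K = 5, J = 7)

# bind_rows() matches columns by name and fills the gaps with NA;
# replace_na() then turns those NAs into 0
total <- bind_rows(df1, df2, df3) %>%
  mutate(across(everything(), ~ replace_na(.x, 0)))

total
```

Because bind_rows() matches by column name rather than by position, the number of columns in each input does not matter.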

how to change my dataframe based on value of a column [duplicate]

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 3 years ago.
I have a data frame with two columns, as below, and I want to reshape it into a data frame with three columns:
df <- data.frame(key = c('a','a','a','b','b'), value = c(1,2,2,1,3))
I managed to do this in Python, but I have no idea how to do it in R.
The expected output should look like this:
  1 2 3
a 1 2 0
b 1 0 1
One option is dcast() from data.table:
library(data.table)
dcast(df, key ~ value, fun.aggregate = length)
# key 1 2 3
# 1 a 1 2 0
# 2 b 1 0 1
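A base R sketch that gets the same counts without any package, using table(). The name counts is only illustrative. Note that here the keys end up as row names rather than as a key column.

```r
# Data from the question
df <- data.frame(key = c("a", "a", "a", "b", "b"),
                 value = c(1, 2, 2, 1, 3))

# table() cross-tabulates key against value;
# as.data.frame.matrix() keeps the wide layout instead of melting it
counts <- as.data.frame.matrix(table(df$key, df$value))

counts
```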

How to create a new row that would show me the number of observations in a group in an unbalanced panel dataset in R? [duplicate]

This question already has answers here:
Create counter with multiple variables [duplicate]
(6 answers)
Closed 6 years ago.
I have a dataset that looks like this:
id time
1 1
1 2
2 5
2 3
3 2
3 7
3 8
I want to add another column that numbers the observations within each group (a running count per id):
id time label
1 1 1
1 2 2
2 5 1
2 3 2
3 2 1
3 7 2
3 8 3
We can use ave:
df1$label <- with(df1, ave(seq_along(id), id, FUN = seq_along))
Or with dplyr:
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(label = row_number())
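The question's wording ("how many observations there are in a group") could also be read as asking for the group size rather than a running counter. A small sketch showing both side by side, with df1 rebuilt from the question's table; the column name n_obs is hypothetical and only added for comparison:

```r
library(dplyr)

# Data rebuilt from the question's example
df1 <- data.frame(id = c(1, 1, 2, 2, 3, 3, 3),
                  time = c(1, 2, 5, 3, 2, 7, 8))

labelled <- df1 %>%
  group_by(id) %>%
  mutate(
    label = row_number(),  # running counter within each id
    n_obs = n()            # total number of rows for that id
  ) %>%
  ungroup()

labelled
```

The label column matches the expected output in the question. n_obs is what you would use if the total per group were wanted instead.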

R is it possible to get the output of table() using dcast? [duplicate]

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 4 years ago.
I have the following data frame:
id <- c(1, 2, 3, 4, 1, 1, 2, 3, 4, 4, 2, 2)
period <- c("first", "calib", "valid", "valid", "calib", "first", "valid", "valid", "calib", "first", "calib", "valid")
df <- data.frame(id, period)
Typing
table(df)
results in
   period
id  calib first valid
  1     1     2     0
  2     2     0     2
  3     0     0     2
  4     1     1     1
Is there any way to get the same result using 'dcast' and save it as a new data frame?
Yes, there is a way:
library(reshape2)
dcast(df, id ~ period, length)
Using period as value column: use value.var to override.
  id calib first valid
1  1     1     2     0
2  2     2     0     2
3  3     0     0     2
4  4     1     1     1
You can also write just dcast(df, id ~ period); length is then chosen as the default aggregation function. It looks like you tried to work this out in your other question. A longer solution without dcast would look like this:
df <- data.frame(unclass(table(df)))
df$ID <- rownames(df)
df
  calib first valid ID
1     1     2     0  1
2     2     0     2  2
3     0     0     2  3
4     1     1     1  4
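For completeness, a tidyverse sketch of the same reshaping, assuming dplyr and tidyr (1.0.0 or later, for pivot_wider()) are installed. This is an alternative to dcast, not what the answer above uses, and the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Data from the question
id <- c(1, 2, 3, 4, 1, 1, 2, 3, 4, 4, 2, 2)
period <- c("first", "calib", "valid", "valid", "calib", "first",
            "valid", "valid", "calib", "first", "calib", "valid")
df <- data.frame(id, period)

wide <- df %>%
  count(id, period) %>%              # one row per id/period pair, count in n
  pivot_wider(names_from = period,   # one column per period
              values_from = n,
              values_fill = 0)       # pairs that never occur become 0

wide
```

The result is a tibble; wrap it in as.data.frame() if a plain data frame is needed.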
