identifying unique values of a grouped variable [duplicate] - r

I am trying to count the number of unique date values across multiple visits. Here is sample data:
id date
1 2017-08-31
1 2017-08-31
1 2017-05-06
2 2015-09-01
2 2015-11-01
3 2010-12-02
3 2010-12-02
I want a df that shows how many unique dates there are per participant. Something like this:
id total_visit
1 2
2 2
3 1
I tried this code, but it's not doing what I want it to do.
library(tidyverse)
df1 <- df %>% group_by(id) %>% count(distinct(date))
Can someone please help?
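One way to get the table above, as a minimal sketch: the likely problem in the attempt is that distinct() is a data-frame verb, whereas n_distinct() is the dplyr function that counts unique values in a vector. The data frame below just rebuilds the sample data, and it assumes date is stored as a Date.

```r
library(dplyr)

# Rebuild the sample data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3),
  date = as.Date(c("2017-08-31", "2017-08-31", "2017-05-06",
                   "2015-09-01", "2015-11-01",
                   "2010-12-02", "2010-12-02"))
)

# One row per id, counting the distinct dates within each id
df1 <- df %>%
  group_by(id) %>%
  summarise(total_visit = n_distinct(date))

df1
```

For the sample data this should give 2, 2 and 1 visits for ids 1, 2 and 3.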

Related

How to sum a specific column of replicate rows in dataframe? [duplicate]

I have a data frame that contains many replicated rows. I would like to sum the last column across the replicated rows and remove the duplicates at the same time. Could anyone tell me how to do that?
The example is here:
name <- c("a","b","c","a","c")
position <- c(192,7,6,192,99)
score <- c(1,2,3,2,5)
df <- data.frame(name,position,score)
> df
name position score
1 a 192 1
2 b 7 2
3 c 6 3
4 a 192 2
5 c 99 5
# I would like to sum the scores when the first two columns are the same.
# The ideal result looks like this:
name position score
1 a 192 3
2 b 7 2
3 c 6 3
4 c 99 5
Sincerely thank you for the help.
Try this:
library(dplyr)
df %>%
  group_by(name, position) %>%
  summarise(score = sum(score, na.rm = TRUE))
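If you would rather not load dplyr, a base R alternative is aggregate() with a formula. This is only a sketch on the example data from the question, and the name res is made up for illustration. Note that the row order may differ from the "ideal result" shown in the question.

```r
# Example data from the question
name <- c("a", "b", "c", "a", "c")
position <- c(192, 7, 6, 192, 99)
score <- c(1, 2, 3, 2, 5)
df <- data.frame(name, position, score)

# Sum score within each (name, position) combination
res <- aggregate(score ~ name + position, data = df, FUN = sum)
res
```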

R data table rows subtraction [duplicate]

I have a data table with 117 objects (rows) and 51 variables (columns). I would like to subtract each row from the previous one and post the results in a new data table.
My data table is a time series of interest rates and I want to calculate the daily difference.
apply(dt, MARGIN = 2, diff)
calculates, for each column, the difference between each element and the previous one.
Try:
a <- data.frame(matrix(c(1,1,1,3,3,3,7,7,7), byrow = TRUE, nrow = 3))
apply(a, 2, diff)
Let's say you have this as example data:
df <- data.frame(date = as.Date(c("2019-01-03", "2019-01-04", "2019-01-05", "2019-01-06")), value = c(3,5,7,6))
date value
1 2019-01-03 3
2 2019-01-04 5
3 2019-01-05 7
4 2019-01-06 6
Then using dplyr from tidyverse you can do this:
library(tidyverse)
df2 <- df %>%
mutate(difference = lag(value, n=1L) - value)
date value difference
1 2019-01-03 3 NA
2 2019-01-04 5 -2
3 2019-01-05 7 -2
4 2019-01-06 6 1
... you'll just need to decide what to do with that first NA in row index 1.
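For a table with many rate columns, the same idea can be applied to every numeric column at once. This is a rough sketch, assuming a date column plus numeric rate columns. The names rates, r1, r2 and daily_diff are made up for illustration. It computes current minus previous (the opposite sign from the example above) and handles the leading NA by dropping the first row.

```r
library(dplyr)

# Hypothetical data: a date column and two interest-rate columns
rates <- data.frame(
  date = as.Date(c("2019-01-03", "2019-01-04", "2019-01-05")),
  r1 = c(1.0, 1.5, 1.25),
  r2 = c(2.0, 2.0, 2.5)
)

daily_diff <- rates %>%
  # Current value minus the previous row's value, for every numeric column
  mutate(across(where(is.numeric), ~ .x - dplyr::lag(.x))) %>%
  # The first row has no previous value, so drop it
  slice(-1)

daily_diff
```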

Creating sum of many columns from names in one column [duplicate]

I have a large data frame where I have one column (Phylum) that has repeated names and 253 other columns (each with a unique name) that have counts of the Phylum column. I would like to sum the counts within each column that correspond to each Phylum.
This is a simplified version of what my data look like:
Phylum sample1 sample2 sample3 ... sample253
1 P1 2 3 5 5
2 P1 2 2 10 2
3 P2 1 0 0 1
4 P3 10 12 3 1
5 P3 5 7 14 15
I have seen similar questions, but they are for fewer columns, where you can just list the names of the columns you want summed. I don't want to enter 253 unique column names.
I would like my results to look like this
Phylum sample1 sample2 sample3 ... sample253
1 P1 4 5 15 7
2 P2 1 0 0 1
3 P3 15 19 17 16
I would appreciate any help. Sorry for the format of the question, this is my first time asking for help on stackoverflow (rather than sleuthing).
If your starting file looks like this (test.csv):
Phylum,sample1,sample2,sample3,sample253
P1,2,3,5,5
P1,2,2,10,2
P2,1,0,0,1
P3,10,12,3,1
P3,5,7,14,15
Then you can use group_by and summarise_each from dplyr:
read_csv('test.csv') %>%
group_by(Phylum) %>%
summarise_each(funs(sum))
(I first loaded tidyverse with library(tidyverse).)
Note that, if you were trying to do this for one column you can simply use summarise:
read_csv('test.csv') %>%
group_by(Phylum) %>%
summarise(sum(sample1))
summarise_each is required to run that function (in the above, funs(sum)) on each column.
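Note that summarise_each() and funs() are deprecated in recent dplyr releases. A sketch of the same aggregation using across(), with the example data built inline instead of being read from test.csv (the names counts and totals are made up for illustration):

```r
library(dplyr)

# Same values as the test.csv example above
counts <- data.frame(
  Phylum = c("P1", "P1", "P2", "P3", "P3"),
  sample1 = c(2, 2, 1, 10, 5),
  sample2 = c(3, 2, 0, 12, 7),
  sample3 = c(5, 10, 0, 3, 14),
  sample253 = c(5, 2, 1, 1, 15)
)

totals <- counts %>%
  group_by(Phylum) %>%
  # everything() picks up all non-grouping columns, so no names need listing
  summarise(across(everything(), sum))

totals
```

With 253 sample columns the call stays the same, since the grouping column is excluded automatically.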

Row numbering by group and date [duplicate]

I have a question about numbering rows by group AND by one further condition. I know how to do this by group but not by adding one further condition.
Suppose I have the ID and the DATE and want to create NUM as shown in the table:
ID DATE NUM
1 20160103 1
1 20160104 1
1 20160104 2
1 20160105 1
1 20160105 2
1 20160105 3
1 20160106 1
2 20160103 1
2 20160103 2
2 20160105 1
Does anyone know how to do this?
We can use ave from base R:
df$NUM <- with(df, ave(ID, ID, DATE, FUN = seq_along))
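A dplyr version of the same idea, as a sketch: group by both columns and number the rows within each group with row_number(). The data frame below rebuilds the table from the question and assumes DATE is stored as a number.

```r
library(dplyr)

# Rebuild the ID / DATE columns from the question
df <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  DATE = c(20160103, 20160104, 20160104, 20160105, 20160105,
           20160105, 20160106, 20160103, 20160103, 20160105)
)

df <- df %>%
  group_by(ID, DATE) %>%
  # Number the rows within each ID/DATE combination, in their current order
  mutate(NUM = row_number()) %>%
  ungroup()

df
```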

R-How to count unique number of rows based on another column [duplicate]

I am working on a big set of customer data and trying to find the average number of stores a customer visits each month. In my data I have a unique identification number for each customer and the codes of the stores they visited. A sample of my data frame looks like this:
sitecode<-c(1000,1000,1001,1000)
productcode<-c('X','X','Y','X')
customercode<-c('A','B','A','C')
Date<-c('01/01/2016','02/01/2016','03/01/2016','04/01/2016')
data1<-data.frame(customercode,Date,productcode,sitecode)
Based on this, what I would like is a simple table for customers A, B and C with the number of unique stores each one visited, which is 2 for A and 1 for B and C. Can you help?
data1
# customercode Date productcode sitecode
# 1 A 01/01/2016 X 1000
# 2 B 02/01/2016 X 1000
# 3 A 03/01/2016 Y 1001
# 4 C 04/01/2016 X 1000
result=table(data1$customercode,data1$sitecode)
result
1000 1001
A 1 1
B 1 0
C 1 0
Hope this helped to some extent.
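The table above shows visits per store rather than the number of distinct stores. To get that count directly, one base R option is tapply() with length(unique(x)). This is a sketch on the sample data, and the name stores_per_customer is made up for illustration.

```r
# Sample data from the question
sitecode <- c(1000, 1000, 1001, 1000)
productcode <- c("X", "X", "Y", "X")
customercode <- c("A", "B", "A", "C")
Date <- c("01/01/2016", "02/01/2016", "03/01/2016", "04/01/2016")
data1 <- data.frame(customercode, Date, productcode, sitecode)

# For each customer, count how many different site codes appear
stores_per_customer <- tapply(
  data1$sitecode,
  data1$customercode,
  function(x) length(unique(x))
)

stores_per_customer
```

For the sample data this should give 2 for A and 1 for each of B and C.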
