R counter, counting frequency in a table [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Add column with order counts
(2 answers)
Closed 6 years ago.
I have following data set
id year
2 20332 2005
3 6383 2005
14 20332 2006
15 6806 2006
16 23100 2006
I would like to have an additional column, which counts the number of years the id variable is already available:
id year Counter
2 20332 2005 1
3 6383 2005 1
14 20332 2006 2
15 6806 2006 1
16 23100 2006 1
The dataset is currently not sorted by year. I thought about using mutate rather than writing a separate function.
Any ideas? Thanks!

We can use ave from base R:
# for each id, number its rows in order of appearance
df1$Counter <- with(df1, ave(id, id, FUN = seq_along))
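Since the question mentions mutate, here is a dplyr sketch of the same idea (an alternative, assuming the data frame is called df1 and dplyr is installed):
library(dplyr)
df1 %>%
  group_by(id) %>%                    # one group per id
  mutate(Counter = row_number()) %>%  # 1, 2, ... within each id
  ungroup()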

Count for every unique value in a column - R [duplicate]

This question already has answers here:
How to count the number of unique values by group? [duplicate]
(1 answer)
count number of rows in a data frame in R based on group [duplicate]
(8 answers)
Closed 2 years ago.
I have a dataframe that contains a column representing the 'Year' and another column that represents 'Type':
a Year Creams
1 2004 11
2 2004 12
3 2001 13
4 2004 14
5 2002 15
. .... ..
How do I count the occurrences of each year in the 'Year' column so that the result appears as:
a Year TypeCount
1 2004 3
2 2002 1
3 2001 1
The output can go into another data frame, I don't mind; I just need something suitable for making a graph at the end.
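One possible sketch uses base R's table(), assuming the data frame is called df and the year column is named Year:
# tabulate how many rows each Year has
year_counts <- as.data.frame(table(df$Year))
names(year_counts) <- c("Year", "TypeCount")
Note that Year comes back as a factor here; wrap it in as.numeric(as.character(...)) if a numeric year is needed for plotting.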

How to extract the value from multiple columns in a specific order [duplicate]

This question already has answers here:
Get Value of last non-empty column for each row [duplicate]
(3 answers)
Closed 4 years ago.
I have this dataset that contains variables from three previous years.
data <- read.table(text="
a 2015 2016 2017
1 100 100 100
2 1000 5 NA
3 10000 NA NA", header=TRUE)
I would like to create a new column in my data which contains the value from the most recent year, checking 2017 first, then 2016, then 2015.
output <- read.table(text="
a 2015 2016 2017 recent
1 100 100 100 100
2 1000 5 NA 5
3 10000 NA NA 10000", header=TRUE)
I know that I can use "if" statements to achieve this, but I am wondering if there is a quicker, simpler way to do it.
Thanks!
Here's a simple base R solution. This assumes that the years are sorted from left to right.
# for each row, keep the last non-NA value, i.e. the most recent year available
data$recent <- apply(data, 1, function(x) tail(na.omit(x), 1))
a X2015 X2016 X2017 recent
1 1 100 100 100 100
2 2 1000 5 NA 5
3 3 10000 NA NA 10000
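An alternative sketch with dplyr::coalesce(), assuming the columns were read in as X2015, X2016 and X2017 as shown above:
library(dplyr)
# coalesce() returns the first non-NA value, so list the columns newest first
data$recent <- coalesce(data$X2017, data$X2016, data$X2015)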

Aggregates by group and including counts across rows [duplicate]

This question already has answers here:
Apply several summary functions (sum, mean, etc.) on several variables by group in one call
(7 answers)
Closed 6 years ago.
I have this data frame:
YEAR NATION VOTE
2015 NOR 1
2015 USA 0
2015 CAN 1
2015 RUS 1
2014 USA 1
2014 USA 1
2014 USA 0
2014 NOR 1
2014 NOR 0
2014 CAN 1
...and it goes on and on with more years, nations and votes. VOTE is binary: yes (1) or no (0). I am trying to build an output table that aggregates by year and nation, and that also shows the total number of votes for each nation (the sum of all 1s and 0s, sumVOTES below) together with the number of 1s and the percentage of 1s, like the sketch below:
YEAR NATION VOTE-1 sumVOTES %-1s
2015 USA 8 17 47.1
2015 NOR 7 13 53.8
2015 CAN 3 11 27.3
2014 etc.
etc.
You have not provided your data.frame in a reproducible manner, but this should work:
library(data.table)
# assuming 'df' is your data.frame
setDT(df)[, .('VOTE-1' = sum(VOTE == 1),
              'sumVOTES' = .N,
              '%-1s' = 1e2 * sum(VOTE == 1) / .N),
          by = .(YEAR, NATION)]
setDT converts data.frame to data.table by reference.
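For comparison, a rough dplyr equivalent of the same aggregation (a sketch, assuming the data frame and column names are as above):
library(dplyr)
df %>%
  group_by(YEAR, NATION) %>%
  summarise(`VOTE-1` = sum(VOTE == 1),           # number of yes votes
            sumVOTES = n(),                      # total votes cast
            `%-1s` = 100 * sum(VOTE == 1) / n(),
            .groups = "drop")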

Create a local id for a combination of 2 columns [duplicate]

This question already has answers here:
R - add column that counts sequentially within groups but repeats for duplicates
(3 answers)
Closed 7 years ago.
I have a dataset I wish to process, and instead of processing it as a time series, I want to summarize the time behaviour. Here is the dataset:
business_id year
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2009
UsFtqoBl7naz8AVUBZMjQQ 2004
UsFtqoBl7naz8AVUBZMjQQ 2005
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2008
cE27W9VPgO88Qxe4ol6y_g 2010
I want to turn it into this:
business_id year yr_id
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2009 2
UsFtqoBl7naz8AVUBZMjQQ 2004 1
UsFtqoBl7naz8AVUBZMjQQ 2005 2
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2008 2
cE27W9VPgO88Qxe4ol6y_g 2010 3
In other words, I want the ID to follow the year sequentially but be local to each business_id, so that it resets whenever a new business_id is encountered.
Is this something that is easily achievable in R?
I found this other question on SO, and its answer effectively answers this question, so this should be marked as a duplicate.
https://stackoverflow.com/a/27896841/4858065
The way to achieve this is:
library(dplyr)
df %>% group_by(business_id) %>%
  mutate(yr_id = dense_rank(year))
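If dplyr is not available, a base R sketch of the same dense rank within groups (an alternative, not part of the linked answer):
# rank each year within its business_id; duplicate years share a rank
df$yr_id <- ave(df$year, df$business_id,
                FUN = function(x) match(x, sort(unique(x))))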

How to create a step-by-step cumulation of data? [duplicate]

This question already has answers here:
Calculating cumulative sum for each row
(6 answers)
Closed 7 years ago.
Probably my question is really dull, but I couldn't find an easy solution. We have a data.frame without the overall column. The overall column must contain the cumulative number of pies (in my case) eaten up to a certain time period. What is the easiest way to create it in R for any number of rows? Thanks!
  Year Pies eaten Pies eaten(overall)
1 1960          3                   3
2 1961          2                   5
3 1962          5                  10
4 1963          1                  11
5 1964          7                  18
6 1965          4                  22
We can use cumsum, which returns the running total of a numeric vector:
df1$Pies_eaten_Overall <- cumsum(df1$Pies_eaten)
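A minimal reproduction of the example above, assuming the eaten column is stored as Pies_eaten:
df1 <- data.frame(Year = 1960:1965, Pies_eaten = c(3, 2, 5, 1, 7, 4))
df1$Pies_eaten_Overall <- cumsum(df1$Pies_eaten)
df1$Pies_eaten_Overall
# 3 5 10 11 18 22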
