Join numbers together in R [duplicate]

This question already has answers here:
Paste multiple columns together
(11 answers)
Closed 7 months ago.
I'm a beginner to R. What I want to do is join numbers together. I made a data frame as follows:
data<-data.frame(year=c(2020,2021,2022),month=c(10,11,12))
My expected output is as follows:
data=data.frame(year=c(2020,2021,2022),month=c(10,11,12),year_month=c(202010,202111,202212))
year_month is the column joining year and month together.
How can I do this?

You could concatenate the columns using paste0 like this:
data<-data.frame(year=c(2020,2021,2022),month=c(10,11,12))
data$year_month <- do.call(paste0, data)
data
#> year month year_month
#> 1 2020 10 202010
#> 2 2021 11 202111
#> 3 2022 12 202212
Created on 2022-07-30 by the reprex package (v2.0.1)
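Two details of the answer above are worth noting. paste0() returns a character vector ("202010"), while the expected output looks numeric, and do.call(paste0, data) pastes every column of the data frame, so it only gives the right result while year and month are the only columns. A minimal sketch that names the columns explicitly and returns numbers follows; the zero-padding with sprintf() is an assumption for months 1 to 9, which the sample data does not contain.

```r
data <- data.frame(year = c(2020, 2021, 2022), month = c(10, 11, 12))

# Option 1: paste only the two named columns, then convert to a number.
# sprintf("%02d", ...) zero-pads single-digit months (3 -> "03").
data$year_month <- as.numeric(paste0(data$year, sprintf("%02d", as.integer(data$month))))

# Option 2: pure arithmetic, no string conversion at all.
data$year_month2 <- data$year * 100 + data$month

data
```

The arithmetic version avoids the padding issue entirely, because year * 100 + month always places the month in the last two digits.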

Related

Having aggregated data - wanna have data for each element [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 2 years ago.
Hi,
My aim is to make a histogram.
For that I need unaggregated data, but unfortunately I only have it in aggregated form.
My data:
tribble(~date,~groupsize,
"2020-09-01",3,
"2020-09-02",2,
"2020-09-03",1,
"2020-09-04",2)
I want to have:
tribble(~date,~n,
"2020-09-01",1,
"2020-09-01",1,
"2020-09-01",1,
"2020-09-02",1,
"2020-09-02",1,
"2020-09-03",1,
"2020-09-04",1,
"2020-09-04",1)
I think this is really simple, but I am at a loss. Sorry for that!
What can I do? I really like dplyr solutions :-)
Thank you!
Repeat each date according to its groupsize:
# assumes the aggregated tibble from the question is stored as `dat`
res <- data.frame(date = rep(dat$date, dat$groupsize), n = 1)
res
# date n
# 1 2020-09-01 1
# 2 2020-09-01 1
# 3 2020-09-01 1
# 4 2020-09-02 1
# 5 2020-09-02 1
# 6 2020-09-03 1
# 7 2020-09-04 1
# 8 2020-09-04 1
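Since the question asks for a dplyr-style solution, here is a minimal sketch using tidyr::uncount(), which repeats each row according to a weight column. It assumes the tibble, dplyr and tidyr packages are installed and that the data is stored as dat, as in the answer above.

```r
library(tibble)
library(dplyr)
library(tidyr)

dat <- tribble(~date, ~groupsize,
  "2020-09-01", 3,
  "2020-09-02", 2,
  "2020-09-03", 1,
  "2020-09-04", 2)

# uncount() repeats each row `groupsize` times and drops the groupsize column
res <- dat %>%
  uncount(groupsize) %>%
  mutate(n = 1)

res
```

For the histogram itself, expanding may not be necessary: ggplot2's geom_col() can plot the aggregated groupsize values directly as bar heights.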

How to combine data in rows into a new column and into a new data frame [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 2 years ago.
I have a data frame that has multiple entries on the same day with a TSS score.
athlete workoutday tss
1 Athlete_1 2020-03-20 30
2 Athlete_1 2020-03-20 21
3 Athlete_1 2020-03-20 64
I would like some help combining the tss scores into a new column of a new data frame, so that there is only one entry for each athlete.
For example:
athlete workoutday tss
1 Athlete_1 2020-03-20 115
2
3
Cheers
SELECT athlete, workoutday, SUM(tss) AS tss
FROM your_table
GROUP BY athlete, workoutday;
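Since the data lives in an R data frame, the same grouped sum can also be done without SQL. A minimal sketch, assuming the data frame is called df (a hypothetical name) and rebuilding the three sample rows from the question; workoutday is kept as character here for simplicity.

```r
library(dplyr)

df <- data.frame(
  athlete    = c("Athlete_1", "Athlete_1", "Athlete_1"),
  workoutday = c("2020-03-20", "2020-03-20", "2020-03-20"),
  tss        = c(30, 21, 64)
)

# Base R: one row per athlete and day, with tss summed inside each group
totals <- aggregate(tss ~ athlete + workoutday, data = df, FUN = sum)

# dplyr equivalent: group, then collapse each group to a single summed row
totals2 <- df %>%
  group_by(athlete, workoutday) %>%
  summarise(tss = sum(tss), .groups = "drop")

totals
```

Both versions produce a new data frame and leave df unchanged, which matches the request for the totals to go into a separate data frame.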

Combining rows from the same data frame [duplicate]

This question already has answers here:
Merge multiple variables in R
(6 answers)
How to implement coalesce efficiently in R
(9 answers)
Closed 3 years ago.
I am trying to write code that creates a new column combining two columns. The idea is to take the value from the other column wherever one of them is NA.
The new column will be "EventDate".
Here is a sample data frame:
Id SDate CDate EventDate
101 2013-03-27 NA 2013-03-27
101 2013-05-09 NA 2013-05-09
101 NA 2013-05-30 2013-05-30
101 NA 2013-07-26 2013-07-26
We can use coalesce() from dplyr, which returns the first non-missing value across its arguments:
library(tidyverse)
df1 %>%
mutate(EventDate = coalesce(SDate, CDate))
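Without dplyr, the same fill-from-the-other-column logic can be written with indexed assignment. A minimal sketch that rebuilds the sample data as df1 (matching the name in the answer above) with both date columns stored as Date:

```r
df1 <- data.frame(
  Id    = c(101, 101, 101, 101),
  SDate = as.Date(c("2013-03-27", "2013-05-09", NA, NA)),
  CDate = as.Date(c(NA, NA, "2013-05-30", "2013-07-26"))
)

# Base R equivalent of coalesce(SDate, CDate): start from SDate and fill
# its NAs from CDate. Indexed assignment keeps the Date class, whereas
# ifelse() would drop it and return plain numbers.
df1$EventDate <- df1$SDate
fill <- is.na(df1$EventDate)
df1$EventDate[fill] <- df1$CDate[fill]

df1
```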

Ntile and decile function depending on two columns in R [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 4 years ago.
I would like a new column with the ntile of column 2 ("mileage"), but computed separately within each value of column 1 ("year").
year mileage
<dbl> <dbl>
1 2011 7413
2 2011 10926
3 2011 7351
4 2011 11613
5 2012 8367
6 2010 25125
mydata$Ntile <- ntile(mydata$mileage, 10)
I know the easy-to-use function ntile, but I do not know how to make it depend on two columns. I would like the mileage ntiles to be calculated separately for each year (2010, 2011 and 2012) and stored in a new column "Ntile".
PS: I know there is not enough data to calculate Ntiles for 2011 and 2012, it is just an example.
I like the data.table approach:
library(data.table)
mydata <- as.data.table(mydata)
# ntile() comes from dplyr; by = year ranks mileage within each year separately
mydata[, Ntile := dplyr::ntile(mileage, 10), by = year]
Best!
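The same per-year ntiles can be computed with dplyr alone, where group_by() plays the role of data.table's by argument. A minimal sketch that rebuilds the six sample rows from the question as a plain data frame:

```r
library(dplyr)

mydata <- data.frame(
  year    = c(2011, 2011, 2011, 2011, 2012, 2010),
  mileage = c(7413, 10926, 7351, 11613, 8367, 25125)
)

# group_by(year) makes ntile() rank mileage within each year separately;
# ungroup() returns an ungrouped result for later operations
mydata <- mydata %>%
  group_by(year) %>%
  mutate(Ntile = ntile(mileage, 10)) %>%
  ungroup()

mydata
```

As the question notes, the tiny sample has fewer rows per year than the 10 requested buckets, so the resulting bucket numbers are only illustrative.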

How to sum a variable by group but do not aggregate the data frame in R? [duplicate]

This question already has answers here:
Count number of rows per group and add result to original data frame
(11 answers)
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 4 years ago.
Although I have found a lot of ways to calculate the sum of a variable by group, all of these approaches end up creating a new data set that collapses the duplicate cases.
To be more precise, if I have a data frame:
id year
1 2010
1 2015
1 2017
2 2011
2 2017
3 2015
and I want to count how many times each id appears across the different years, there are many ways (aggregate, tapply, dplyr, sqldf, etc.) that use a "group by" kind of functionality, and in the end they give something like:
id count
1 3
2 2
3 1
I haven't managed to find a way to calculate the same thing but keep my original data frame, in order to obtain:
id year count
1 2010 3
1 2015 3
1 2017 3
2 2011 2
2 2017 2
3 2015 1
and therefore do not collapse my duplicate cases.
Has somebody already figured this out?
Thank you in advance
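One way to keep every row is to use a function that returns one value per input row instead of one per group. A minimal sketch, assuming the data frame is called df (a hypothetical name; the question does not name it):

```r
library(dplyr)

df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  year = c(2010, 2015, 2017, 2011, 2017, 2015)
)

# Base R: ave() applies FUN within each id group and returns a vector as
# long as the input, so the counts line up with the original rows.
df$count <- ave(df$id, df$id, FUN = length)

# dplyr equivalent: add_count() appends the group size as a new column
# without collapsing any rows.
df2 <- df %>%
  select(id, year) %>%
  add_count(id, name = "count")

df
```

For a group sum rather than a count, the same pattern applies: ave() with FUN = sum, or group_by() followed by mutate() instead of summarise().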
