In a data.frame, I would like to add a column that identifies groups of consecutive days.
I think I need to start by converting my strings to date format...
Here's my example:
mydf <- data.frame(
var_name = c(rep("toto",6),rep("titi",5)),
date_collection = c("09/12/2022","10/12/2022","13/12/2022","16/12/2022","16/12/2022","17/12/2022",
"01/12/2022","03/11/2022","04/11/2022","05/11/2022","08/11/2022")
)
Expected output:
Convert to Date class, take the adjacent differences to create a logical vector (gap greater than one day), and take the cumulative sum:
library(dplyr)
library(lubridate)
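mydf %>%
# parse the dd/mm/yyyy strings, take the day gaps between consecutive rows,
# and start a new group whenever the gap is more than one day
mutate(id = cumsum(c(0, abs(diff(dmy(date_collection)))) > 1) + 1)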
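For readers without lubridate, the same adjacent-diff idea can be sketched in base R. This assumes the strings are always day/month/year (what dmy() parses), so as.Date() with an explicit format stands in for dmy():

```r
mydf <- data.frame(
  var_name = c(rep("toto", 6), rep("titi", 5)),
  date_collection = c("09/12/2022", "10/12/2022", "13/12/2022", "16/12/2022",
                      "16/12/2022", "17/12/2022", "01/12/2022", "03/11/2022",
                      "04/11/2022", "05/11/2022", "08/11/2022")
)

# Parse day/month/year strings into Date objects
d <- as.Date(mydf$date_collection, format = "%d/%m/%Y")

# Absolute gap in days between each row and the previous one
gap <- abs(as.numeric(diff(d)))

# A gap of more than one day starts a new group; the first row gets gap 0
mydf$id <- cumsum(c(0, gap) > 1) + 1
mydf$id
# → 1 1 2 3 3 3 4 5 5 5 6
```

As in the dplyr version, rows are not grouped by var_name, so a change of var_name alone does not start a new id, and repeated dates (gap 0) stay in the same group.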
I have two datasets (data1 and data2).
Data1 has, among many others, a column named B23333391.
Data2 has a column called id_number, where ID numbers are listed (e.g. 344444491).
I need to extract the last two digits (91) from the variable in data1 and match them against the last two digits of the ID numbers in data2's id_number column, since the last two digits identify an individual.
E.g.:
Data1:
columns: -> B23333391..... and so on
Data2:
columns: -> id_number
344444491
and so on....
How can this be done?
Thanks in advance!
Try this approach. You can use a dplyr pipeline to create an id variable in both data frames with substr(), using nchar() to locate the last two characters. After that you can merge with left_join(). Here is the code with simulated data similar to yours:
library(dplyr)
#Data
df1 <- data.frame(Var1 = c('B23333391'), Val1 = 1, stringsAsFactors = FALSE)
df2 <- data.frame(Varid = c('344444491'), Val2 = 1, stringsAsFactors = FALSE)
#Merge on the last two characters of each ID
dfnew <- df1 %>%
mutate(id = substr(Var1, nchar(Var1) - 1, nchar(Var1))) %>%
left_join(df2 %>% mutate(id = substr(Varid, nchar(Varid) - 1, nchar(Varid))),
by = "id")
Output:
Var1 Val1 id Varid Val2
1 B23333391 1 91 344444491 1
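A base R sketch of the same idea, using substring() (whose last argument defaults to the end of the string) and merge() with all.x = TRUE to mimic left_join(). The helper name last2 is hypothetical, introduced only for this example:

```r
# Simulated data, as in the dplyr example
df1 <- data.frame(Var1 = "B23333391", Val1 = 1, stringsAsFactors = FALSE)
df2 <- data.frame(Varid = "344444491", Val2 = 1, stringsAsFactors = FALSE)

# Hypothetical helper: last two characters of each string
last2 <- function(x) substring(x, nchar(x) - 1)

df1$id <- last2(df1$Var1)
df2$id <- last2(df2$Varid)

# all.x = TRUE keeps every row of df1, like left_join()
dfnew <- merge(df1, df2, by = "id", all.x = TRUE)
```

Unlike left_join(), merge() puts the key column first and, by default, sorts the result by it.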
There are three columns: website, Date ("%Y %m"), and click_tracking (TRUE/FALSE). I would like to add a variable giving, for each month, the number of websites with click_tracking = TRUE divided by the number of all websites in that month.
I thought the steps would be something like:
aggregate(sum(df$click_tracking = TRUE), by=list(Category=df$Date), FUN = sum)
as.data.frame(table(Date))
Then somehow loop through Date and divide the two variables above which would have been already grouped by Date. How can I achieve this? Many thanks!
If we are creating a column, group by 'Date' and get the sum of 'click_tracking' in mutate (assuming it is a logical TRUE/FALSE column):
library(dplyr)
df %>%
group_by(Date) %>%
mutate(countTRUE = sum(click_tracking))
If the column is a factor, convert it to logical with as.logical():
df %>%
group_by(Date) %>%
mutate(countTRUE = sum(as.logical(click_tracking)))
To create a summarised output instead:
df %>%
group_by(Date) %>%
summarise(countTRUE = sum(click_tracking))
In the OP's code, = is used instead of the comparison operator == in sum(df$click_tracking = TRUE); in any case, there is no need to compare a logical column with TRUE. In base R, the counts can be obtained with aggregate:
aggregate(cbind(click_tracking = as.logical(click_tracking)) ~ Date, data = df, FUN = sum)
To get the proportion of websites with click tracking (out of all websites) per month, take the mean of the logical column:
aggregate(data=df, click_tracking ~ Date, mean)
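Since the question asks for a new column holding the share of websites with click tracking per month, a sketch combining the grouped mutate() with mean() (the mean of a logical vector is the proportion of TRUE values) could look like this. The data frame is made up for illustration and assumes one row per website per month:

```r
library(dplyr)

# Made-up example data: one row per website per month
df <- data.frame(
  website = c("a", "b", "c", "a", "b"),
  Date = c("2020 01", "2020 01", "2020 01", "2020 02", "2020 02"),
  click_tracking = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

out <- df %>%
  group_by(Date) %>%
  mutate(
    n_tracking = sum(click_tracking),     # websites with tracking this month
    n_websites = n(),                     # all websites this month
    prop_tracking = mean(click_tracking)  # equals n_tracking / n_websites
  ) %>%
  ungroup()
```

If a website can appear more than once in a month, n() counts rows rather than distinct websites, so the data would need deduplicating first.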
I am trying to generate a new column with values derived from the original table. I would like to first calculate the average for each combination of hotel and date, then divide the original sales by these group averages.
Here is my code. I tried to calculate the group average using group_by and summarise from the dplyr package, but it did not give the expected results.
hotel = c(rep("Hilton",3), rep("Caesar",3))
date1 = c(rep('2018-01-01',2), '2018-01-02', rep('2018-01-01',3))
dba = c(2,0,1,3,2,1)
sales = c(3,5,7,5,2,3)
df = data.frame(cbind(hotel, date1, dba, sales))
df1 = df %>%
group_by(date1, hotel) %>%
dplyr::summarise(avg = mean(sales)) %>%
acast(., date1~hotel)
Any suggestion would be highly appreciated!
Instead of summarise, we can use mutate. After grouping by 'date1', 'hotel', divide the 'sales' by the mean of 'sales' to create a new column
library(tidyverse)
df %>%
group_by(date1, hotel) %>%
mutate(SalesDividedByMean = sales/mean(sales))
NOTE: cbind on vectors of different types produces a matrix, and a matrix can hold only a single type, so one character vector turns the whole data into character. Wrapping that in data.frame carries the change over, giving either factor columns (stringsAsFactors = TRUE was the default before R 4.0.0) or character columns.
data
df <- data.frame(hotel, date1, dba, sales)
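For comparison, a base R sketch of the same division using ave(), whose default FUN is mean and which returns each row's group mean aligned with the original rows. The typeof() line simply confirms the cbind pitfall described in the note:

```r
hotel <- c(rep("Hilton", 3), rep("Caesar", 3))
date1 <- c(rep("2018-01-01", 2), "2018-01-02", rep("2018-01-01", 3))
dba   <- c(2, 0, 1, 3, 2, 1)
sales <- c(3, 5, 7, 5, 2, 3)

# cbind() of character and numeric vectors yields a character matrix
typeof(cbind(hotel, sales))
# → "character"

df <- data.frame(hotel, date1, dba, sales)

# ave() returns the mean of sales within each date1/hotel group, row-aligned
df$SalesDividedByMean <- df$sales / ave(df$sales, df$date1, df$hotel)
df$SalesDividedByMean
# → 0.75 1.25 1.00 1.50 0.60 0.90
```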
I have a data frame where I had to convert all variables to the character class in order to use bind_rows(). Now I want to identify the columns that contain numbers and convert them back to numeric. There are 41 variables, so I don't want to mutate each of them separately.
Preferably the tidyverse way.
library(dplyr)
# tibble() replaces the deprecated data_frame()
tibble(number_var = as.character(rnorm(26)),
character_var = LETTERS)
You could use parse_guess() from the readr package:
library(dplyr)
library(readr)
df <- tibble(number_var = as.character(rnorm(26)),
character_var = LETTERS)
df %>%
mutate_all(parse_guess) # guess column type for each column
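mutate_all() still works but is superseded in dplyr 1.0.0 and later. A sketch of the same conversion with across(), assuming dplyr >= 1.0.0:

```r
library(dplyr)
library(readr)

set.seed(1)  # reproducible example data
df <- tibble(number_var = as.character(rnorm(26)),
             character_var = LETTERS)

# Apply readr::parse_guess() to every column; each column is guessed independently
out <- df %>%
  mutate(across(everything(), parse_guess))
```

In base R, type.convert(df, as.is = TRUE) should perform a similar per-column guess without readr.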
We have an arbitrary dataset, called df:
enter <- c("2017-01-01", "2018-02-02", "2018-03-03")
guest <- c("Foxtrot","Uniform","Charlie")  # three names so every vector has length 3
disposal <- c("2017-01-05", "2018-02-05", "2018-03-09")
rating <- c("60","50","50")
date <- c("2017-04-10", "2018-04-15", "2018-04-20")
clock <- c("16:02:00", "17:02:00", "18:02:00")
rolex <- c("20:10:00", "20:49:00", "17:44:00")
df <- data.frame(enter,guest,disposal,rating,date,clock,rolex, stringsAsFactors = F)
What I am trying to do is convert the columns enter, disposal, and date from character to Date using the dplyr package.
So, I came up with the following, simply by chaining it together:
library(dplyr)
library(chron)
# each %>% ends its line so R continues the expression onto the next line
df2 <- df %>% mutate(enter = as.Date(enter, format = "%Y-%m-%d")) %>%
mutate(disposal = as.Date(disposal, format = "%Y-%m-%d")) %>%
mutate(date = as.Date(date, format = "%Y-%m-%d"))
What I am after is this: which dplyr mutate function gets rid of the repeated chaining when there are many date-like columns with arbitrary names? I want to specify the columns by name and then apply as.Date to convert them from character to Date.
Some related questions whose solutions don't apply to this case:
1: convert column in data.frame to date
2: convert multiple columns to dates with lubridate and dplyr
3: change multiple character columns to date
For example, I tried the following, with no luck:
df2 <- df %>% mutate_at(data = df, each_of(c(enter, disposal, date)) = as.Date(format = "%Y-%m-%d"))
as given here: dplyr change many data types
As a bonus:
Notice the clock and rolex columns. The chron package converts them to the right format, i.e. times instead of character:
df2 <- df %>% mutate(clock = chron(times = clock)) %>% mutate(rolex = chron(times = rolex))
As suggested here:
convert character to time in r
Now, is the same solution available without all the chaining, especially when there is an arbitrary number of columns with different names?
You just need to tweak the arguments of mutate_at. Any additional arguments for as.Date (such as format) can be given as extra arguments to mutate_at, which passes them on to as.Date.
df2 <- df %>% mutate_at(vars(enter,disposal,date), as.Date, format="%Y-%m-%d")
The second part of your question has a similar solution.
df2 <- df2 %>% mutate_at(vars(clock, rolex), function(x) chron(times. = x))
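In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A sketch of the same date conversion, assuming the column names are held in a character vector (date_cols is a hypothetical name), which helps when the names are arbitrary or computed elsewhere:

```r
library(dplyr)

# Trimmed-down version of the example data (only a few of its columns)
df <- data.frame(
  enter    = c("2017-01-01", "2018-02-02", "2018-03-03"),
  disposal = c("2017-01-05", "2018-02-05", "2018-03-09"),
  date     = c("2017-04-10", "2018-04-15", "2018-04-20"),
  rating   = c("60", "50", "50"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of column names to convert
date_cols <- c("enter", "disposal", "date")

# all_of() selects the named columns; the lambda passes the format to as.Date
df2 <- df %>%
  mutate(across(all_of(date_cols), ~ as.Date(.x, format = "%Y-%m-%d")))
```

The chron conversion should follow the same pattern, e.g. mutate(across(c(clock, rolex), ~ chron(times. = .x))).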