Updating all columns of a table dependent on a single column - r

I have a table that consists only of columns of type Date. The data describes the shopping behavior of customers on a website. Each column holds the first time an event is triggered by a customer (NULL if the event never occurred). One of the columns is the purchase event.
I want to update the table so that, for each row, all the cells whose event did not happen within the 7 days prior to the purchase are replaced with NULL. I'm looking for some guidance in coding this. I've tried utilizing mutate_all() to no avail.

Assuming your data is called df, the date columns start with date_, and the purchase date is purchase_date, perhaps something like this?
mutate(
  df,
  across(
    starts_with("date_"),
    ~ if_else(purchase_date - . > 7 | . > purchase_date, as.Date(NA), .)
  )
)
Note that dplyr::if_else is used rather than base ifelse, because ifelse would strip the Date class from the result.
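As a quick, self-contained sanity check of this pattern, here is a hedged sketch on invented data (the column names date_signup and date_cart are made up for illustration; the rule applied is "blank any event outside the 7 days before the purchase"):

```r
library(dplyr)

df <- tibble(
  purchase_date = as.Date(c("2024-03-10", "2024-03-20")),
  date_signup   = as.Date(c("2024-03-08", "2024-02-01")),  # 2 days / 48 days before purchase
  date_cart     = as.Date(c("2024-03-01", "2024-03-19"))   # 9 days / 1 day before purchase
)

result <- mutate(
  df,
  across(
    starts_with("date_"),
    # NA out events more than 7 days before the purchase, or after it
    ~ if_else(purchase_date - . > 7 | . > purchase_date, as.Date(NA), .)
  )
)
```

In the first row date_cart (9 days before purchase) becomes NA while date_signup survives; in the second row it is the other way around.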

Related

Looking for an R function to divide data by date

I'm just 2 days into R, so I hope I can give enough info on my problem.
I have an Excel table on endothelial cell angiogenesis with technical repeats on 4 different dates. (But those dates are not in order and are in different weeks.)
My data looks like this (of course it's not only the 2nd of March):
I want to average the data on those 4 different days, so I can compare e.g. the "Nb Nodes" from day 1 to day 4.
The goal is a jitterplot containing the group, the investigated data point and the date.
I'm a medical student, so I don't really have any knowledge about this kind of stuff yet, but I'm trying to learn it. Hopefully I provided enough info!
Found the solution:
# Group by date and group
library(dplyr)
DateGroup <- group_by(Exclude0, Exp.Date, Group)
# Summarise the mean within every group and date
summarise(DateGroup, mymean = mean(`Nb meshes`))
I think the code below will work.
1. group_by() the dimension you want to summarise by.
2a. across() is a helper so that you don't need to type out each column manually; it lets us use tidy-select language to quickly reference every column whose name contains "Nb" (a pattern I noticed in your screenshot).
2b. The second argument of across() is the function (or formula) you want to apply to each column selected by the first argument.
2c. The optional .names argument of across() gives the new columns a naming convention.
Good luck on your R learning! It's a really great language and you made the right choice.
# df is your data frame
df %>%
  group_by(Exp.Date) %>%
  summarize(across(contains("Nb"), list(mean = mean), .names = "{.fn}_{.col}"))
# if you just want a single column then do this
df %>%
  group_by(Exp.Date) %>%
  summarize(mean_nb_nodes = mean(`Nb nodes`))
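A minimal, self-contained illustration of the across() pattern (the data frame and the values in it are invented to mimic the screenshot; only the column names Exp.Date, `Nb nodes` and `Nb meshes` are taken from the question):

```r
library(dplyr)

df <- tibble(
  Exp.Date    = c("02.03", "02.03", "09.03", "09.03"),
  `Nb nodes`  = c(10, 20, 30, 50),
  `Nb meshes` = c(1, 3, 5, 7)
)

# One row per date, with columns `mean_Nb nodes` and `mean_Nb meshes`
res <- df %>%
  group_by(Exp.Date) %>%
  summarize(across(contains("Nb"), list(mean = mean), .names = "{.fn}_{.col}"))
```

Passing the function as list(mean = mean) is what makes {.fn} resolve to the name "mean" in the output column names.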

How to add specific row into data.table?

I have a problem with a data.table in R. I have created a data table (approx. 1 row x 60 columns) using the sample() function (e.g. column 1 <- sample(data, 1), and so on). Each column is sampled over either two values (yes/no) or five values (a/b/c/d/e), so I ended up with a 1-row data table with 60 columns, where each column contains 'yes', 'no', or one of the values a/b/c/d/e. The problem is that I need all combinations. I have tried the expand.grid() function, but I got stuck at around 1 million rows, so I need more control. Is there any possibility to add another empty row to the existing data table and fill that row with the remaining possibilities, then add a third row, and so on? I mean: if there is a 'yes' value in the 1st row of the 1st column, there should be a 'no' value in the 2nd row, and so on. Please let me know if you have any idea which function I could use. I have spent many hours looking for an answer. Thanks a lot for your help.
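No answer is recorded above, but as a hedged sketch: data.table supports both appending specific rows with rbind() and building all combinations directly with CJ(), its analogue of expand.grid. The column names below are invented, and three columns stand in for the sixty described:

```r
library(data.table)

# A 1-row table like the one described, with three columns instead of sixty
dt <- data.table(q1 = "yes", q2 = "b", q3 = "no")

# Appending a specific row to an existing data.table
dt <- rbind(dt, data.table(q1 = "no", q2 = "a", q3 = "yes"))

# Building every combination directly (cross join over the value sets)
combos <- CJ(q1 = c("yes", "no"),
             q2 = c("a", "b", "c", "d", "e"),
             q3 = c("yes", "no"))
nrow(combos)  # 2 * 5 * 2 = 20 rows
```

Note that with 60 such columns the full set of combinations is astronomically large, so the row count, not the tool, is the real constraint.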

Subtract every element in two lists of timestamps from each other, and assign corresponding ID in that row based on results

I have 2 data frames of unequal lengths, each with a column of timestamps. I would like to return the corresponding ID contained in df2 to df1 as a new column if the time difference is less than 60 minutes, so that I know which ID in df2 (with a specific appointment time) is responsible for which entries in df1. Each ID should have 8 entries in df1.
To calculate the difference between each element in df1 and df2, I've tried
outer(df1$DataEntryTime, df2$ApptTime, '-')
and got a matrix of results.
What do I need to do next to build a conditional statement so it can return the ID# to df1 based on the results?
Many thanks!
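As a hedged sketch of the next step (the column names DataEntryTime, ApptTime and ID follow the question, but the data itself is invented): convert the outer() result to minutes, then scan each row of the matrix for a match under 60 minutes.

```r
# Invented example data: timestamps as POSIXct
df1 <- data.frame(DataEntryTime = as.POSIXct("2024-01-01 09:00:00", tz = "UTC") +
                    c(5, 30, 200) * 60)
df2 <- data.frame(ID = c(1, 2),
                  ApptTime = as.POSIXct("2024-01-01 09:00:00", tz = "UTC") +
                    c(0, 180) * 60)

# Pairwise absolute differences in minutes
# (rows = df1 entries, columns = df2 appointments)
diff_min <- abs(outer(as.numeric(df1$DataEntryTime),
                      as.numeric(df2$ApptTime), "-")) / 60

# For each df1 row, pick the first df2 appointment within 60 minutes (NA if none)
match_idx <- apply(diff_min, 1, function(d) {
  hit <- which(d < 60)
  if (length(hit) > 0) hit[1] else NA_integer_
})
df1$ID <- df2$ID[match_idx]
```

Converting to numeric seconds before outer() sidesteps the unit ambiguity of difftime objects; if several appointments fall within the window, this sketch keeps only the first.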

Create a dummy variable for the first time observation for a unique ID

I'm handling a panel data frame with a list of unique names and dates of recording. I want to create a dummy variable for the first time a name is recorded - i.e. a dummy that takes the value 1 the first time a name is recorded, and 0 for the second, third, ... time.
Any help would be appreciated.
Maybe you could use combine from the gdata package. It merges your data frames into one, keeping a column that identifies which one each row came from. After that you could do a count using dplyr, something like this:
newdf <- gdata::combine(df1, df2)
newdf %>%
  group_by(your_variable_of_interest) %>%
  mutate(count_index = row_number())
From that you will get the number of occurrences from 1 to n, and you will just have to replace every count_index > 1 by 0.
Would that work for you? Otherwise we will need a reproducible example.
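For the single-data-frame case described in the question, a hedged one-step dplyr sketch (the column names name and date are invented; substitute your identifier and date columns):

```r
library(dplyr)

df <- tibble(name = c("anna", "bob", "anna", "anna", "bob"),
             date = as.Date("2024-01-01") + c(0, 0, 1, 2, 3))

# Sort by date so "first" means earliest, then flag row 1 within each name
df <- df %>%
  arrange(date) %>%
  group_by(name) %>%
  mutate(first_record = as.integer(row_number() == 1)) %>%
  ungroup()
```

Each name gets exactly one row with first_record == 1 (its earliest date) and 0 everywhere else.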

Count and summarise ID and the Date of Purchase while creating a third column that reflects the amount of Purchased per one day and Customer

Good afternoon, dear community,
I am quite new to the R language, so forgive me if I am not yet precise or specific enough in my description of the problem.
I have a data frame which contains two columns, the first one being the ID and the second one being the date of purchase. However, some IDs appear more than once on the same date, and I would like to summarise by ID and date, with a third column (amount of purchases) reflecting the quantity of purchases.
ID and Purchase Date
Many thanks in advance.
There is an R package called dplyr that makes this kind of aggregation very easy. In your case you could summarise the data using a few lines of code.
library(dplyr)
results <- df %>%
  group_by(ID, Date) %>%
  summarise(numPurchases = n(),
            totalPurchases = sum(Quantity))
df would be your input data. Your results will have the ID and Date columns, as well as a new column that counts the number of sales per ID per Date (numPurchases) and a new column that shows the total quantity of purchases per ID per date (totalPurchases). Hope that helps.
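A self-contained sketch of the same aggregation on invented data (the Quantity column is assumed; if your frame really has only ID and Date, drop the totalPurchases line and keep n()):

```r
library(dplyr)

df <- tibble(ID = c("A", "A", "B", "A"),
             Date = as.Date(c("2024-05-01", "2024-05-01", "2024-05-01", "2024-05-02")),
             Quantity = c(2, 1, 3, 4))

results <- df %>%
  group_by(ID, Date) %>%
  summarise(numPurchases = n(),
            totalPurchases = sum(Quantity),
            .groups = "drop")
```

Customer A appears twice on 2024-05-01, so that pair collapses to one row with numPurchases = 2 and totalPurchases = 3.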
