R data table rows subtraction [duplicate] - r

This question already has answers here:
subtract value from previous row by group
(3 answers)
Closed 4 years ago.
I have a data table with 117 objects (rows) and 51 variables (columns). I would like to subtract each row from the previous one and store the results in a new data table.
My data table is a time series of interest rates, and I want to calculate the daily difference.

apply(dt, MARGIN = 2, diff)
would calculate, for each column, the difference between each element and the previous one.
Try:
a <- data.frame(matrix(c(1, 1, 1, 3, 3, 3, 7, 7, 7), byrow = TRUE, nrow = 3))
apply(a,2,diff)
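Since the question asks for the result as a new table, here is a minimal sketch of what comes back. apply() returns a matrix with one row fewer than the input, and diff() computes each row minus the previous one. The names d_mat and d_df are only illustrative.

```r
# Sample data from the answer above: three rows, three columns.
a <- data.frame(matrix(c(1, 1, 1, 3, 3, 3, 7, 7, 7), byrow = TRUE, nrow = 3))

# apply() returns a matrix with one row fewer than the input.
d_mat <- apply(a, 2, diff)

# lapply() over the columns keeps the result as a data frame instead.
d_df <- as.data.frame(lapply(a, diff))
print(d_df)
```

If the opposite sign is needed (previous row minus current row), negating the result with -d_df should be enough.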

Let's say you have this as example data:
df <- data.frame(date = as.Date(c("2019-01-03", "2019-01-04", "2019-01-05", "2019-01-06")), value = c(3,5,7,6))
date value
1 2019-01-03 3
2 2019-01-04 5
3 2019-01-05 7
4 2019-01-06 6
Then using dplyr from tidyverse you can do this:
library(tidyverse)
df2 <- df %>%
  mutate(difference = lag(value, n = 1L) - value)
date value difference
1 2019-01-03 3 NA
2 2019-01-04 5 -2
3 2019-01-05 7 -2
4 2019-01-06 6 1
... you'll just need to decide what to do with that first NA in row index 1.
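As a rough base R sketch of the two decisions involved: lag(value) - value above is previous minus current, so if the daily change is wanted as current minus previous, diff() gives that directly. Dropping the first row is just one way to handle the NA. The column name change is made up for illustration.

```r
df <- data.frame(
  date  = as.Date(c("2019-01-03", "2019-01-04", "2019-01-05", "2019-01-06")),
  value = c(3, 5, 7, 6)
)

# Current value minus previous value; the first row has no predecessor.
df$change <- c(NA, diff(df$value))

# One option for the leading NA: drop that row.
df_complete <- df[!is.na(df$change), ]
print(df_complete)
```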

Related

Is there a way to use for loops (or nested for loops) to use date information from one dataframe and find the corresponding date from another df?

The following is a rather general question. I have a data frame with certain individuals and a date on each row. Using a second, daily data frame, I would like to find information for the consecutive days starting from each individual's date. For example, if individual X was born on 01-01-2000 (1st df), I would like a function that finds 01-01-2000 in the daily data frame (2nd df), takes the mean over the first 3 days from birth (namely 01-01-2000 : 03-01-2000), and adds it as a new column to the 1st df. It's not important which mean; it could be weight, sunlight hours, or number of calls. This question may be a bit vague, so any help interpreting it would be appreciated.
name<-c("A","B","C","D")
dob<-c("01-01-2000","02-01-2000","03-01-2000","08-01-2000")
df1<-data.frame(name,dob)
name dob
1 A 01-01-2000
2 B 02-01-2000
3 C 03-01-2000
4 D 08-01-2000
date<- c("31-12-1999","01-01-2000","02-01-2000","03-01-2000","04-01-2000","05-01-2000","06-01-2000","07-01-2000","08-01-2000","09-01-2000","10-01-2000","11-01-2000")
calls<-c(0,0,1,2,2,2,0,0,1,4,2,3)
df2<-data.frame(date,calls)
date calls
1 31-12-1999 0
2 01-01-2000 0
3 02-01-2000 1
4 03-01-2000 2
5 04-01-2000 2
6 05-01-2000 2
7 06-01-2000 0
8 07-01-2000 0
9 08-01-2000 1
10 09-01-2000 4
11 10-01-2000 2
12 11-01-2000 3
What I would like is the following;
name dob mean.call
1 A 01-01-2000 1.00
2 B 02-01-2000 1.67
3 C 03-01-2000 2.00
4 D 08-01-2000 2.33
As the data frames are rather large, I would like to implement for loops.
I would calculate the means first using zoo's rollmean function and then join df2 and df1:
library(dplyr)
library(zoo)
df2 %>%
  add_row(calls = rep(0, 2)) %>%
  mutate(means = rollmean(calls, k = 3, align = "left", fill = NA),
         .keep = "unused") %>%
  right_join(df1, by = c("date" = "dob")) %>%
  select(name, date, means)
This returns
name date means
1 A 01-01-2000 1.000000
2 B 02-01-2000 1.666667
3 C 03-01-2000 2.000000
4 D 08-01-2000 2.333333
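Note: I added two dummy rows with zero calls to df2 so that a mean can be calculated for the last two entries. Since there is no specific rule for those values, I chose to do so. Keep in mind that the means for those last two dates are pulled toward zero by the dummy rows.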
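If the dummy rows are a concern, a different technique is to match on real dates. This is only a sketch in base R, not the rollmean approach above: it parses the strings as dates and averages whatever rows fall in the window. window_days is a hypothetical parameter name.

```r
df1 <- data.frame(
  name = c("A", "B", "C", "D"),
  dob  = c("01-01-2000", "02-01-2000", "03-01-2000", "08-01-2000")
)
df2 <- data.frame(
  date  = c("31-12-1999", "01-01-2000", "02-01-2000", "03-01-2000",
            "04-01-2000", "05-01-2000", "06-01-2000", "07-01-2000",
            "08-01-2000", "09-01-2000", "10-01-2000", "11-01-2000"),
  calls = c(0, 0, 1, 2, 2, 2, 0, 0, 1, 4, 2, 3)
)

# Parse the day-month-year strings into Date objects.
dob  <- as.Date(df1$dob,  format = "%d-%m-%Y")
date <- as.Date(df2$date, format = "%d-%m-%Y")

window_days <- 3  # hypothetical parameter: days counted from the date of birth

# For each date of birth, average the calls on that day and the two after it.
df1$mean.call <- sapply(seq_along(dob), function(i) {
  in_window <- date >= dob[i] & date < dob[i] + window_days
  mean(df2$calls[in_window])
})
print(df1)
```

A window that runs past the end of df2 is averaged over the days that exist rather than padded with zeros. Whether that is the right rule depends on the data.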

identifying unique values of a grouped variable [duplicate]

This question already has answers here:
How to count the number of unique values by group? [duplicate]
(1 answer)
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 2 years ago.
I am trying to count the number of unique date values across multiple visits. Here is sample data:
id date
1 2017-08-31
1 2017-08-31
1 2017-05-06
2 2015-09-01
2 2015-11-01
3 2010-12-02
3 2010-12-02
I want a df that shows how many unique dates there are per participant. Something like this:
id total_visit
1 2
2 2
3 1
I tried this code, but it's not doing what I want it to do.
library(tidyverse)
df1 <- df %>% group_by(id) %>% count(distinct(date))
Can someone please help?
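One likely reason the attempt fails is that distinct() is meant for data frames, not for a column inside count(). As a sketch of one way to get the table above in base R (the data frame is rebuilt from the sample, and visits is just an illustrative name):

```r
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  date = c("2017-08-31", "2017-08-31", "2017-05-06",
           "2015-09-01", "2015-11-01", "2010-12-02", "2010-12-02")
)

# Count the distinct dates within each id.
visits <- aggregate(date ~ id, data = df, FUN = function(x) length(unique(x)))
names(visits)[2] <- "total_visit"
print(visits)

# With dplyr loaded, the equivalent would be:
# df %>% group_by(id) %>% summarise(total_visit = n_distinct(date))
```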

How can I reshape my data frame to be used with cor() in R? [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I am fairly new to R and data wrangling.
I would like to calculate a correlation matrix using the cor()-function and for that I have to reshape my data frame. It consists of 1500 different articles (StockCodes) of an online store, with a sale history of 365 days per article (Quantity per Date per StockCode).
I need to reshape it in the form that the Dates become the new row names and the StockCodes become the new column names.
Consider this short example:
my_df <- data.frame(Stockcode = c("A","A", "B","B", "C", "C"), Quantity = c(1,5,3,2,1,4), Date = c("2010-12-01","2010-12-02","2010-12-01","2010-12-02","2010-12-01","2010-12-02") )
looking like this:
Stockcode Quantity Date
1 A 1 2010-12-01
2 A 5 2010-12-02
3 B 3 2010-12-01
4 B 2 2010-12-02
5 C 1 2010-12-01
6 C 4 2010-12-02
And I want it to be transformed into:
df_reshaped <- data.frame(A = c(1,5), B = c(3,2), C = c(1,4), row.names = c("2010-12-01","2010-12-02"))
looking like this:
A B C
2010-12-01 1 3 1
2010-12-02 5 2 4
I achieved this with a for-loop (and successfully calculated my correlation matrix), but the loop took "ages" to be executed (approx. 4 hours).
Is there a proper and faster way?
I would highly appreciate any help!
Here is a way using pivot_wider from tidyr:
library(dplyr)   # %>%
library(tidyr)   # pivot_wider()
library(tibble)  # column_to_rownames()
my_df %>%
  pivot_wider(names_from = Stockcode, values_from = Quantity) %>% ## pivot columns in wide format
  column_to_rownames(var = "Date") ## convert Date column to row names
# A B C
#2010-12-01 1 3 1
#2010-12-02 5 2 4
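For completeness, a minimal base R sketch of the same reshape followed by cor(), with no extra packages. It uses tapply(), which is a different technique from pivot_wider. It assumes one row per Date and Stockcode; duplicates would be summed here.

```r
my_df <- data.frame(
  Stockcode = c("A", "A", "B", "B", "C", "C"),
  Quantity  = c(1, 5, 3, 2, 1, 4),
  Date      = c("2010-12-01", "2010-12-02", "2010-12-01",
                "2010-12-02", "2010-12-01", "2010-12-02")
)

# Rows are dates, columns are stock codes; missing combinations become NA.
wide <- tapply(my_df$Quantity, list(my_df$Date, my_df$Stockcode), sum)

# Pairwise correlations between stock codes, ignoring missing days.
cor_mat <- cor(wide, use = "pairwise.complete.obs")
print(cor_mat)
```

With only two dates in the toy example every correlation is exactly 1 or -1, so it only checks the shape of the result.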

R time series with missing dates [duplicate]

This question already has answers here:
R ts with missing values
(4 answers)
Closed 4 years ago.
I have a vector of numbers and a corresponding vector of dates (monthly). Some months are missing so I would like to create a time series object that includes NA for the missing dates.
x = c(1,2,3,4)
dates = c('2000-01-01','2000-02-01','2000-04-01','2000-07-01')
Is there an easy way to get a time series object that goes from '2000-01-01' to '2000-07-01' and includes NAs for the missing dates?
You can use the padr package to do that:
library(dplyr)  # %>% and mutate()
library(padr)
df <- data.frame(x = c(1, 2, 3, 4),
                 dates = c('2000-01-01', '2000-02-01', '2000-04-01', '2000-07-01'))
df %>%
  mutate(dates = as.Date(dates)) %>%
  pad()
pad applied on the interval: month
x dates
1 1 2000-01-01
2 2 2000-02-01
3 NA 2000-03-01
4 3 2000-04-01
5 NA 2000-05-01
6 NA 2000-06-01
7 4 2000-07-01
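Since the question asks for a time series object, here is a rough base R sketch that needs no extra packages. It builds the full monthly sequence, left-joins the observations, and wraps the result in ts(). The start value c(2000, 1) is hard-coded for this example, and all_months and padded are illustrative names.

```r
x     <- c(1, 2, 3, 4)
dates <- as.Date(c("2000-01-01", "2000-02-01", "2000-04-01", "2000-07-01"))

# Every month between the first and last observation.
all_months <- data.frame(dates = seq(min(dates), max(dates), by = "month"))

# Left join: months without an observation get NA in x.
padded <- merge(all_months, data.frame(dates = dates, x = x), all.x = TRUE)

# Monthly ts object starting in January 2000 (hard-coded for this example).
x_ts <- ts(padded$x, start = c(2000, 1), frequency = 12)
print(x_ts)
```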

Selecting dates from two dataframes and creating a new dataframe in R [duplicate]

This question already has an answer here:
how to find dates that overlap from two different dataframes and subset
(1 answer)
Closed 4 years ago.
I would like to select the dates (in Date B) that are closest to Date A and then create a new dataframe with these matches. There can be multiple rows for each ID (i.e. multiple date combinations). I am using the dplyr and data.table packages.
dataframe A
ID DATE A
3 15/05/06
5 14/11/05
8 25/11/08
1 16/12/10
1 5/01/12
1 24/07/14
dataframe B
ID DATE B
3 12/12/05
3 17/04/06
5 25/07/05
5 26/09/05
5 1/12/05
8 12/09/08
8 13/11/08
8 23/12/08
8 31/03/09
1 26/11/10
1 12/08/11
1 12/11/11
1 14/03/14
1 8/08/14
Resultant dataframe:
ID DATE A DATE B
3 15/05/06 17/04/06
5 14/11/05 1/12/05
8 25/11/08 13/11/08
1 16/12/10 26/11/10
1 5/01/12 12/11/11
1 24/07/14 8/08/14
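An idea is to merge on ID, subtract the dates and keep the row with the smallest absolute difference for each ID and DATE_A, i.e.
d1 <- transform(merge(df1, df2, by = 'ID'),
                diff1 = abs(as.numeric(as.Date(DATE_A, format = '%d/%m/%y') - as.Date(DATE_B, format = '%d/%m/%y'))))
# split on ID and DATE_A so that an ID with several DATE_A values keeps one row per date
do.call(rbind, lapply(split(d1, list(d1$ID, d1$DATE_A), drop = TRUE), function(i) i[which.min(i$diff1), ]))
Note that the format has to be passed by name: the second positional argument of as.POSIXct is tz, not format, so as.POSIXct(DATE_A, '%d/%m/%y') does not parse the dates as intended. For the sample data this gives one row per ID and DATE_A pair, with the same DATE_B matches as the expected result above and diff1 (in days) of 28, 17, 12, 20, 54 and 15 for those six rows; the printed row order may differ.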
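To make this easier to try out, here is a self-contained sketch with the sample data typed in. The column names DATE_A and DATE_B follow the answer, while gap, groups and closest are made-up names. It names the format argument explicitly, uses the absolute gap in days, and groups on ID plus DATE_A so that ID 1 keeps all three of its dates.

```r
df1 <- data.frame(
  ID     = c(3, 5, 8, 1, 1, 1),
  DATE_A = c("15/05/06", "14/11/05", "25/11/08", "16/12/10", "5/01/12", "24/07/14")
)
df2 <- data.frame(
  ID     = c(3, 3, 5, 5, 5, 8, 8, 8, 8, 1, 1, 1, 1, 1),
  DATE_B = c("12/12/05", "17/04/06", "25/07/05", "26/09/05", "1/12/05",
             "12/09/08", "13/11/08", "23/12/08", "31/03/09",
             "26/11/10", "12/08/11", "12/11/11", "14/03/14", "8/08/14")
)

d1 <- merge(df1, df2, by = "ID")

# Name the format argument explicitly and take the absolute gap in days.
d1$gap <- abs(as.numeric(as.Date(d1$DATE_A, format = "%d/%m/%y") -
                         as.Date(d1$DATE_B, format = "%d/%m/%y")))

# Keep the closest DATE_B for every (ID, DATE_A) pair.
groups  <- split(d1, list(d1$ID, d1$DATE_A), drop = TRUE)
closest <- do.call(rbind, lapply(groups, function(g) g[which.min(g$gap), ]))
print(closest[, c("ID", "DATE_A", "DATE_B")])
```

With ties, which.min() keeps the first candidate, so a tie-breaking rule would have to be added if that matters.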
