instantaneous velocity - reference previous value - r

Using a very simple equation, how can I calculate the instantaneous velocity?
Vi = V0 + acceleration * time
This is very easy in MS Excel, where one can simply click on the previous cell, but how do we refer to the previous value in R?
acceleration <- c(1,2,3,4,5,4,3,2,1)
time <- rep(0.1,9)
df1 <- data.frame(acceleration, time)
df1$instant.vel <- df1$acceleration * df1$time + ....

Try using dplyr::lag:
library(dplyr)
df1 %>%
  mutate(V = (lag(acceleration, default = 0) * lag(time, default = 0)) + (acceleration * time))
acceleration time V
1 1 0.1 0.1
2 2 0.1 0.3
3 3 0.1 0.5
4 4 0.1 0.7
5 5 0.1 0.9
6 4 0.1 0.9
7 3 0.1 0.7
8 2 0.1 0.5
9 1 0.1 0.3
Or step by step:
df1 %>%
  mutate(V0 = (acceleration * time)) %>%
  mutate(V1 = V0 + (lag(acceleration, default = 0) * lag(time, default = 0)))
acceleration time V0 V1
1 1 0.1 0.1 0.1
2 2 0.1 0.2 0.3
3 3 0.1 0.3 0.5
4 4 0.1 0.4 0.7
5 5 0.1 0.5 0.9
6 4 0.1 0.4 0.9
7 3 0.1 0.3 0.7
8 2 0.1 0.2 0.5
9 1 0.1 0.1 0.3
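Note that the lag-based expressions above add only the previous row's acceleration * time term. If the intent is instead the recurrence Vi = V(i-1) + acceleration * time, carried down the column the way one would by referencing the previous cell in Excel, a running sum may be closer to what is wanted. A minimal base R sketch, assuming an initial velocity of 0 (the name v0 is introduced here for illustration and is not in the original post):

```r
acceleration <- c(1, 2, 3, 4, 5, 4, 3, 2, 1)
time <- rep(0.1, 9)
df1 <- data.frame(acceleration, time)

v0 <- 0  # assumed initial velocity (hypothetical, not given in the question)

# Each row adds its own acceleration * time to the velocity of the row above,
# which is exactly what a cumulative sum computes
df1$instant.vel <- v0 + cumsum(df1$acceleration * df1$time)
print(df1)
```

With this data the velocity rises on every row, because every acceleration value is positive.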

Related

Multiply values depending on values of certain columns

I have two data frames, df and cf. I want to multiply each value of A in df by the coefficient in cf selected by the values of B (row) and C (column) in df.
For example, the first row of df has A = 20, B = 4 and C = 2, so the correct coefficient is 0.3 and the result is 20 * 0.3 = 6.
Is there a simple way to do that in R?
Thanks in advance!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf
C
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6
One solution with apply:
# iterate over df's rows
apply(df, 1, function(x) {
  x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3
Or try this vectorized version, which indexes cf with a two-column matrix of (row, column) pairs:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
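To make the matrix-indexing idea more concrete, here is a self-contained sketch. It assumes cf is held as a plain numeric matrix whose rows correspond to B and whose columns correspond to C; the names idx and coef are introduced here for illustration only.

```r
df <- data.frame(A = c(20, 30, 35, 24, 43),
                 B = c(4, 4, 2, 3, 2),
                 C = c(2, 5, 2, 3, 1))

# Coefficient table: rows are values of B (1..4), columns are values of C (1..5)
cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
               0.1, 0.5, 0.3, 0.3, 0.4,
               0.9, 0.1, 0.6, 0.6, 0.8,
               0.7, 0.3, 0.7, 0.4, 0.6),
             nrow = 4, byrow = TRUE,
             dimnames = list(B = 1:4, C = 1:5))

# A two-column matrix used as an index selects one (row, column) cell per row
idx <- cbind(df$B, df$C)
df$coef <- cf[idx]
df$result <- df$A * df$coef
print(df)
```

Keeping the looked-up coefficient in its own column makes it easy to check which cell of cf was used for each row.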
A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header = TRUE, stringsAsFactors = FALSE)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
  mutate(val = f(B, C),
         result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.

Lagging variable by group does not work in dplyr

I'm desperately trying to lag a variable by group. I found this post that deals with essentially the same problem I'm facing, but the solution does not work for me, no idea why.
This is my problem:
library(dplyr)
df <- data.frame(monthvec = c(rep(1:2, 2), rep(3:5, 3)))
df <- df %>%
  arrange(monthvec) %>%
  mutate(growth = ifelse(monthvec == 1, 0.3,
                  ifelse(monthvec == 2, 0.5,
                  ifelse(monthvec == 3, 0.7,
                  ifelse(monthvec == 4, 0.1,
                  ifelse(monthvec == 5, 0.6, NA))))))
df %>%
  group_by(monthvec) %>%
  mutate(lag.growth = lag(growth, order_by = monthvec))
Source: local data frame [13 x 3]
Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 0.3
3 2 0.5 NA
4 2 0.5 0.5
5 3 0.7 NA
6 3 0.7 0.7
7 3 0.7 0.7
8 4 0.1 NA
9 4 0.1 0.1
10 4 0.1 0.1
11 5 0.6 NA
12 5 0.6 0.6
13 5 0.6 0.6
This is what I'd like it to be in the end:
df$lag.growth <- c(NA, NA, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,0.7,0.7, 0.1,0.1,0.1)
monthvec growth lag.growth
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
I believe that one problem is that my groups are not of equal length...
Thanks for helping out.
Here is an idea. We group by monthvec in order to get the number of rows (cnt) of each group. We ungroup and use the first value of cnt as the size of the lag. We regroup on monthvec and replace the values in each group with the first value of each group. Note that this relies on no group being smaller than the first one; otherwise the lag reaches back past the previous group.
library(dplyr)
df %>%
  group_by(monthvec) %>%
  mutate(cnt = n()) %>%
  ungroup() %>%
  mutate(lag.growth = lag(growth, first(cnt))) %>%
  group_by(monthvec) %>%
  mutate(lag.growth = first(lag.growth)) %>%
  select(-cnt)
which gives,
# A tibble: 13 x 3
# Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
You may join your original data with a dataframe with a shifted "monthvec".
left_join(df, df %>% mutate(monthvec = monthvec + 1) %>% unique(), by = "monthvec")
# monthvec growth.x growth.y
# 1 1 0.3 NA
# 2 1 0.3 NA
# 3 2 0.5 0.3
# 4 2 0.5 0.3
# 5 3 0.7 0.5
# 6 3 0.7 0.5
# 7 3 0.7 0.5
# 8 4 0.1 0.7
# 9 4 0.1 0.7
# 10 4 0.1 0.7
# 11 5 0.6 0.1
# 12 5 0.6 0.1
# 13 5 0.6 0.1
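For context, lag() after group_by(monthvec) shifts values within each month, which is why the original attempt returns the month's own growth rather than the previous month's. Another way to look at the problem is as a lookup: build a table with one row per month, shift growth there, then copy it back. The following is only a base R sketch of that idea; the names growth_values and months are introduced here for illustration, and it assumes growth is constant within a month, as in the example data.

```r
df <- data.frame(monthvec = sort(c(rep(1:2, 2), rep(3:5, 3))))
growth_values <- c(0.3, 0.5, 0.7, 0.1, 0.6)  # growth for months 1..5
df$growth <- growth_values[df$monthvec]

# One row per month, ordered by month
months <- unique(df[, c("monthvec", "growth")])
months <- months[order(months$monthvec), ]

# Shift growth down by one month; the first month has no predecessor
months$lag.growth <- c(NA, head(months$growth, -1))

# Copy the lagged value back onto every row of the matching month
df$lag.growth <- months$lag.growth[match(df$monthvec, months$monthvec)]
print(df)
```

Because the shift happens on the one-row-per-month table, unequal group sizes do not matter.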

Summing Values based on Hour and Month and Re-arranging Summed Time Series

I am trying to aggregate (sum) values across months and hours and re-arrange the summed data so that hour and month are on different "axes". I would like the hour to be column headers and the month to be row headers with summed values in each cell. Here's what I mean, through a dummy data example (obviously 12 months are present and 24 hours in the real data):
Month <- c(1,1,2,2,3,3,3,4,4,4,5,5,5,5,6,7,8,9,10,11,12)
Hour <- c(4,1,3,2,5,5,1,4,3,6,0,0,2,3,1,2,3,4,5,6,2)
Value <- c(0.1,0.4,0.02,0.1,0.1,0.2,0.02,0.01,0.01,0.02,0.1,0.3,0.2,0.1,0.2, 0.1,0.3,0.1,0.01,0.01,0.1)
z <- data.frame(Month, Hour, Value)
head(z)
Month Hour Value
1 4 0.10
1 1 0.40
2 3 0.02
2 2 0.10
3 5 0.10
3 5 0.20
My desired output, Hour = column headers (there will be 24 total, this just shows first 6 hours), Month = row headers (there will be 12 total)
z
0 1 2 3 4 5 6
1 0.3 0.2 0.1 0.7 0.1 1.1 0.7
2 0.1 0.1 0.8 1.7 0.2 0.1 0.6
3 0.2 0.7 0.1 0.4 2.1 1.3 0.1
4 0.1 0.2 0.2 0.1 3.1 0.1 0.7
5 0.7 0.8 1.2 0.2 0.4 0.1 0.2
6 0.5 0.2 3.0 0.8 0.2 5.1 1.2
7 0.5 0.2 3.0 0.8 0.2 5.1 1.2
8 0.5 0.2 3.0 0.8 0.2 5.1 1.2
9 0.5 0.2 3.0 0.8 0.2 5.1 1.2
10 0.5 0.2 3.0 0.8 0.2 5.1 1.2
11 0.5 0.2 3.0 0.8 0.2 5.1 1.2
12 0.5 0.2 3.0 0.8 0.2 5.1 1.2
We can use xtabs to create a contingency table that sums Value for each Month and Hour combination:
xtabs(Value ~ Month + Hour, data = z)
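As a fuller sketch of the xtabs approach: if some hours or months never occur in the data, they would be missing from the table. One possible way to keep all 24 hour columns and 12 month rows is to declare the factor levels up front. The names tab and wide below are illustrative, and the level ranges 1:12 and 0:23 are assumptions taken from the question's description.

```r
Month <- c(1,1,2,2,3,3,3,4,4,4,5,5,5,5,6,7,8,9,10,11,12)
Hour  <- c(4,1,3,2,5,5,1,4,3,6,0,0,2,3,1,2,3,4,5,6,2)
Value <- c(0.1,0.4,0.02,0.1,0.1,0.2,0.02,0.01,0.01,0.02,0.1,0.3,0.2,0.1,0.2,
           0.1,0.3,0.1,0.01,0.01,0.1)
z <- data.frame(Month, Hour, Value)

# Declare every possible level so that unobserved months/hours still appear
z$Month <- factor(z$Month, levels = 1:12)
z$Hour  <- factor(z$Hour, levels = 0:23)

# Sum Value within each Month x Hour cell (empty cells become 0)
tab <- xtabs(Value ~ Month + Hour, data = z)

# Convert to a plain data frame with months as rows and hours as columns
wide <- as.data.frame.matrix(tab)
print(wide[, 1:7])
```

Cells with no observations are reported as 0 rather than NA, which may or may not be what is wanted for the real data.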

How to plot average of multiple columns by factor variables

I am trying to plot what is essentially calculated average time-series data for a dependent variable with 2 independent variables. DV = pupil dilation (at multiple time points "T") in response to doing a motor task (IV_A) in combination with 3 different speech-in-noise signals (IV_B).
I would like to plot the average dilation across subjects at each time point (the mean of each T column), with separate lines for each condition.
So, the x axis would be T1 to T5 with a separate line for IV_A(=1):IV_B(=1),IV_A(=1):IV_B(=2),and IV_A(=1):IV_B(=3)
Depending how it looks, I might want the IV_A(=2) lines on a separate plot. But all in one graph would make for an easy visual comparison.
I'm wondering if I need to melt the data to make it extremely long (there are about 110 T columns), or if there is a way to accomplish what I want without restructuring the data frame.
The data look something like this:
Subject IV_A IV_B T1 T2 T3 T4 T5
1 1 1 0.2 0.3 0.5 0.6 0.3
1 1 2 0.3 0.2 0.3 0.4 0.4
1 1 3 0.2 0.4 0.5 0.2 0.3
1 2 1 0.3 0.2 0.3 0.4 0.4
1 2 2 0.2 0.3 0.5 0.6 0.3
1 2 3 0.2 0.4 0.5 0.2 0.3
2 1 1 0.2 0.3 0.5 0.6 0.3
2 1 2 0.3 0.2 0.3 0.4 0.4
2 1 3 0.2 0.4 0.5 0.2 0.3
2 2 1 0.3 0.2 0.3 0.4 0.4
2 2 2 0.2 0.3 0.5 0.6 0.3
2 2 3 0.2 0.4 0.5 0.2 0.3
3 1 1 0.2 0.3 0.5 0.6 0.3
3 1 2 0.3 0.2 0.3 0.4 0.4
3 1 3 0.2 0.4 0.5 0.2 0.3
3 2 1 0.3 0.2 0.3 0.4 0.4
3 2 2 0.2 0.3 0.5 0.6 0.3
3 2 3 0.2 0.4 0.5 0.2 0.3
Edit:
Unfortunately, I can't adapt @eipi10's code to my actual data frame, which looks as follows:
Subject Trk_Y.N NsCond X.3 X.2 X.1 X0 X1 X2 X3
1 N Pink 0.3 0.4 0.6 0.4 0.8 0.6 0.6
1 N Babble 0.3 0.4 0.6 0.4 0.8 0.6 0.6
1 N Loss 0.3 0.4 0.6 0.4 0.8 0.6 0.6
1 Y Pink 0.3 0.4 0.6 0.4 0.8 0.6 0.6
1 Y Babble 0.3 0.4 0.6 0.4 0.8 0.6 0.6
1 Y Loss 0.3 0.4 0.6 0.4 0.8 0.6 0.6
Trk_Y.N indicates whether the block included a secondary motor tracking task ("Y" for yes, "N" for no). NsCond is the type of noise the speech stimuli are presented in.
It's likely better to replace "Y" with "Tracking" and "N" with "No_Tracking".
I tried:
test_data[test_data$Trk_Y.N == "Y",]$Trk_Y.N = "Tracking"
But got a warning:
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = c("Tracking", "Tracking", :
invalid factor level, NA generated
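The warning above appears because Trk_Y.N is stored as a factor and "Tracking" is not one of its existing levels, so the assignment produces NA. One possible way around it is to relabel the levels rather than assign new strings. The sketch below uses a small made-up stand-in for test_data that reproduces only the columns needed:

```r
# Stand-in for the real data; only the relevant columns are reproduced
test_data <- data.frame(
  Subject = 1,
  Trk_Y.N = factor(c("N", "N", "N", "Y", "Y", "Y")),
  NsCond  = c("Pink", "Babble", "Loss", "Pink", "Babble", "Loss")
)

# Rebuild the factor, mapping each old level to its new label
test_data$Trk_Y.N <- factor(test_data$Trk_Y.N,
                            levels = c("N", "Y"),
                            labels = c("No_Tracking", "Tracking"))
print(test_data)
```

Converting the column with as.character() before assigning would be another option, at the cost of losing the factor structure.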
I may not have understood your data structure, so please let me know if this isn't what you had in mind:
library(reshape2)
library(ggplot2)
library(dplyr)
"Melt" data to long format. This will give us one observation for each Subject, IV and Time:
# Convert the two `IV` columns into a single column
df.m = df %>% mutate(IV = paste0("A",IV_A,":","B",IV_B)) %>% select(-IV_A,-IV_B)
# Melt to long format
df.m = melt(df.m, id.var=c("Subject","IV"), variable.name="Time", value.name="Pupil_Dilation")
head(df.m)
Subject IV Time Pupil_Dilation
1 1 A1:B1 T1 0.2
2 1 A1:B2 T1 0.3
3 1 A1:B3 T1 0.2
4 1 A2:B1 T1 0.3
5 1 A2:B2 T1 0.2
6 1 A2:B3 T1 0.2
Now we can plot a line giving the average value of Pupil_Dilation for each Time point for each level of IV, plus 95% confidence intervals. In your sample data, there's only a single measurement at each Time for each level of IV so no 95% confidence interval is included in the example graph below. However, if you have multiple measurements in your actual data, then you can use the code below to include the confidence interval:
pd = position_dodge(0.5)
ggplot(df.m, aes(Time, Pupil_Dilation, colour = IV, group = IV)) +
  # mean_cl_boot requires the Hmisc package to be installed
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.1, position = pd) +
  # ggplot2 >= 3.3.0 uses `fun`; older versions call this argument `fun.y`
  stat_summary(fun = mean, geom = "line", position = pd) +
  stat_summary(fun = mean, geom = "point", position = pd) +
  scale_y_continuous(limits = c(0, max(df.m$Pupil_Dilation))) +
  theme_bw()
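On the question of whether the data must be melted first: the per-condition means can also be computed directly on the wide data frame with base R's aggregate. This is only a sketch of that alternative. It rebuilds the sample data from the question (the same six rows repeated for three subjects), and the names one_subject, time_cols and means are introduced here for illustration.

```r
# The six condition rows from the question, identical for each subject
one_subject <- read.table(text = "
IV_A IV_B T1 T2 T3 T4 T5
1 1 0.2 0.3 0.5 0.6 0.3
1 2 0.3 0.2 0.3 0.4 0.4
1 3 0.2 0.4 0.5 0.2 0.3
2 1 0.3 0.2 0.3 0.4 0.4
2 2 0.2 0.3 0.5 0.6 0.3
2 3 0.2 0.4 0.5 0.2 0.3
", header = TRUE)
df <- cbind(Subject = rep(1:3, each = 6),
            one_subject[rep(1:6, times = 3), ])

# Pick out the time-point columns by name (T1, T2, ...)
time_cols <- grep("^T[0-9]+$", names(df), value = TRUE)

# Mean of every time column across subjects, per IV_A x IV_B combination
means <- aggregate(df[time_cols], by = df[c("IV_A", "IV_B")], FUN = mean)
print(means)
```

Each row of means is then one line of the desired plot. The long format is still the more natural input for ggplot2, so this mainly helps when only the averaged values are needed.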

Reshape matrix to data frame

I have an association matrix file that looks like this (4 rows and 3 columns).
test=read.table("test.csv", sep=",", header=T)
head(test)
LosAngeles SanDiego Seattle
1 2 3
A 1 0.1 0.2 0.2
B 2 0.2 0.4 0.2
C 3 0.3 0.5 0.3
D 4 0.2 0.5 0.1
What I want to do is reshape this matrix into a data frame. The result should look something like this (12 (= 4 * 3) rows and 3 columns):
RowNum ColumnNum Value
1 1 0.1
2 1 0.2
3 1 0.3
4 1 0.2
1 2 0.2
2 2 0.4
3 2 0.5
4 2 0.5
1 3 0.2
2 3 0.2
3 3 0.3
4 3 0.1
That is, if my matrix has 100 rows and 90 columns, I want to make a new data frame that contains 9000 (= 100 * 90) rows and 3 columns. I've tried to use the reshape package but I do not seem to be able to get it right. Any suggestions on how to solve this problem?
Use as.data.frame.table. It's the boss:
m <- matrix(data = c(0.1, 0.2, 0.2,
                     0.2, 0.4, 0.2,
                     0.3, 0.5, 0.3,
                     0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE,
            dimnames = list(row = 1:4, col = 1:3))
m
# col
# row 1 2 3
# 1 0.1 0.2 0.2
# 2 0.2 0.4 0.2
# 3 0.3 0.5 0.3
# 4 0.2 0.5 0.1
as.data.frame.table(m)
# row col Freq
# 1 1 1 0.1
# 2 2 1 0.2
# 3 3 1 0.3
# 4 4 1 0.2
# 5 1 2 0.2
# 6 2 2 0.4
# 7 3 2 0.5
# 8 4 2 0.5
# 9 1 3 0.2
# 10 2 3 0.2
# 11 3 3 0.3
# 12 4 3 0.1
This should do the trick:
test <- as.matrix(read.table(text="
1 2 3
1 0.1 0.2 0.2
2 0.2 0.4 0.2
3 0.3 0.5 0.3
4 0.2 0.5 0.1", header=TRUE))
data.frame(which(test == test, arr.ind = TRUE),
           Value = test[which(test == test)],
           row.names = NULL)
# row col Value
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.2
#5 1 2 0.2
#6 2 2 0.4
#7 3 2 0.5
#8 4 2 0.5
#9 1 3 0.2
#10 2 3 0.2
#11 3 3 0.3
#12 4 3 0.1
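A further base R variant, offered as a sketch rather than a replacement for the answers above, builds the three columns directly from row(), col() and the matrix values. Unlike the test == test trick it does not drop cells that are NA, and the index columns come out as integers rather than factors. The column names follow the question; the name long is introduced here for illustration.

```r
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE)

# row(m) and col(m) give the row/column index of every cell;
# as.vector() flattens each of them column by column, matching as.vector(m)
long <- data.frame(RowNum    = as.vector(row(m)),
                   ColumnNum = as.vector(col(m)),
                   Value     = as.vector(m))
print(long)
```

For a 100 x 90 matrix the same three lines would produce the 9000-row data frame described in the question.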
