Generate absolute time from time differences in a data frame in R

I have a data frame called "myseis" which contains acceleration data:
        X     Y     Z dt
1  -0.843 3.854 8.247  0
2  -0.598 3.795 8.110 20
3  -0.402 3.834 8.325 19
4  -0.353 3.883 8.414 20
5  -0.117 4.059 8.404 20
6   0.039 3.854 8.159 20
7   0.235 3.726 7.894 20
8   0.372 3.706 7.717 20
9   0.451 3.903 7.835 21
10  0.568 4.197 8.061 19
I want to add a 5th column which contains the absolute time. Something like:
       X     Y     Z dt Date/Time
1 -0.843 3.854 8.247  0 2018-08-20 10:00:00,000
2 -0.598 3.795 8.110 20 2018-08-20 10:00:00,020
3 -0.402 3.834 8.325 19 2018-08-20 10:00:00,039
4 -0.353 3.883 8.414 20 2018-08-20 10:00:00,059
etc.
So I want to add the running total of the "dt" column (milliseconds) to the start time, which is a POSIXct value.
I kind of figured out that I could do it with:
time <- c(starttime)
i <- 1
while (i < nrow(myseis)) {
  i <- i + 1
  time <- c(time, time[i - 1] + myseis[i, 4])
}
myseis <- data.frame(myseis,time)
On a small scale this checks out, but here nrow(myseis) is 247118, and growing the vector one element at a time like this would take forever.
Is there another way? I want to plot a subset of this data frame later.
Thanks
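A vectorized sketch (not from the original thread), assuming starttime is a single POSIXct value: cumsum() accumulates the millisecond offsets in one pass, and since POSIXct arithmetic works in seconds the offsets are divided by 1000.
# running total of dt (ms), converted to seconds, added to the start time
myseis$time <- starttime + cumsum(myseis$dt) / 1000
# options(digits.secs = 3) makes the milliseconds visible when printing
This avoids the quadratic cost of growing the time vector with c() on every iteration.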

Related

R: How to simply compare values of columns in 2 data frames

I am comparing two data frames, FU and FO. Here are short samples of what they look like:
"Model_ID" "FU_Lin_Period" "FU_Growth_rate"
2 0.72127 0.0093333
3 0.69281 0.015857
4 0.66735 0.021103
5 0.64414 0.024205
6 0.62288 0.026568
7 0.60318 0.027749
8 0.58472 0.028161
9 0.56734 0.028008
10 0.55085 0.027309
11 0.53522 0.026068
12 0.52029 0.024684
13 0.50603 0.022866
14 0.49237 0.020991
15 0.47928 0.018773
"Model_ID" "FO_Lin_Period" "FO_Growth_rate"
7 0.44398 0.008868
8 0.43114 0.01674
9 0.41896 0.023248
10 0.40728 0.028641
11 0.39615 0.032192
12 0.38543 0.03543
13 0.37517 0.03692
14 0.36525 0.038427
15 0.35573 0.038195
As you can tell, they do not share all the same Model_ID values.
Basically, what I want to do is go through every Model_ID in the two tables, compare whether FU's or FO's growth rate is larger for a given Model_ID, and...
if FU's is larger (or FU exists for the model number and FO does not), place the model number in a vector called selected_FU
if FO's is larger (or FO exists for the model number and FU does not), place the model number in a vector called selected_FO
Is there a way to do this without using loops?
A data.table alternative using similar logic to the tidyverse answer:
Replace NAs with -Inf, compare the two FU/FO_Growth_rate variables, flag which group had the larger value, and select the Model_ID values into the vectors requested.
library(data.table)
setDT(FU)
setDT(FO)
out <- merge(FU, FO, by = "Model_ID", all = TRUE)[,
  "gr_sel" := c("FO", "FU")[(nafill(FU_Growth_rate, fill = -Inf) >
                             nafill(FO_Growth_rate, fill = -Inf)) + 1],
]
selected_FU <- out[gr_sel == "FU", Model_ID]
selected_FO <- out[gr_sel == "FO", Model_ID]
Data used:
FU <- read.table(text="Model_ID FU_Lin_Period FU_Growth_rate\n2 0.72127 0.0093333\n3 0.69281 0.015857\n4 0.66735 0.021103\n5 0.64414 0.024205\n6 0.62288 0.026568\n7 0.60318 0.027749\n8 0.58472 0.028161\n9 0.56734 0.028008\n10 0.55085 0.027309\n11 0.53522 0.026068\n12 0.52029 0.024684\n13 0.50603 0.022866\n14 0.49237 0.020991\n15 0.47928 0.018773", header=TRUE)
FO <- read.table(text="Model_ID FO_Lin_Period FO_Growth_rate\n7 0.44398 0.008868\n8 0.43114 0.01674\n9 0.41896 0.023248\n10 0.40728 0.028641\n11 0.39615 0.032192\n12 0.38543 0.03543\n13 0.37517 0.03692\n14 0.36525 0.038427\n15 0.35573 0.038195", header=TRUE)
With dplyr, tidyr, and readr:
library(dplyr)
library(tidyr)
library(readr)
FU <- read_table2("test.FU.LINA.table")
FO <- read_table2("test.FO.LINA.table")
df_compared <- full_join(FU, FO, by = "model_id") %>%
  replace_na(list(fo_growth_rate = -1, fu_growth_rate = -1)) %>%
  mutate(select_fufo = if_else(fu_growth_rate >= fo_growth_rate,
                               true = "fu", false = "fo"))
df_compared
# A tibble: 6,166 x 6
model_id fu_lin_period fu_growth_rate fo_lin_period fo_growth_rate select_fufo
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 2 0.721 0.00933 NA -1 fu
2 3 0.693 0.0159 NA -1 fu
3 4 0.667 0.0211 NA -1 fu
4 5 0.644 0.0242 NA -1 fu
5 6 0.623 0.0266 NA -1 fu
6 7 0.603 0.0277 0.444 0.00887 fu
7 8 0.585 0.0282 0.431 0.0167 fu
8 9 0.567 0.0280 0.419 0.0232 fu
9 10 0.551 0.0273 0.407 0.0286 fo
10 11 0.535 0.0261 0.396 0.0322 fo
# ... with 6,156 more rows
selected_fu <- df_compared %>% filter(select_fufo == "fu") %>% .$model_id
selected_fo <- df_compared %>% filter(select_fufo == "fo") %>% .$model_id
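For completeness, a base-R sketch of the same logic (not from the thread), using merge() and treating a missing growth rate as -Inf so the comparison always picks the side that exists:
m <- merge(FU, FO, by = "Model_ID", all = TRUE)
fu <- ifelse(is.na(m$FU_Growth_rate), -Inf, m$FU_Growth_rate)
fo <- ifelse(is.na(m$FO_Growth_rate), -Inf, m$FO_Growth_rate)
selected_FU <- m$Model_ID[fu >= fo]  # ties go to FU here, matching the dplyr answer
selected_FO <- m$Model_ID[fo > fu]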

R is not ordering data correctly - skips e-values

I am trying to order data by the column weightFisher. However, it is almost as if R does not treat the e-values (scientific notation) as small numbers, because all of them are skipped when I try to order from smallest to greatest.
Code:
resultTable_bon <- GenTable(GOdata_bon,
weightFisher = resultFisher_bon,
weightKS = resultKS_bon,
topNodes = 15136,
ranksOf = 'weightFisher'
)
head(resultTable_bon)
#create Fisher ordered df
indF <- order(resultTable_bon$weightFisher)
resultTable_bonF <- resultTable_bon[indF, ]
What resultTable_bon looks like:
GO.ID Term Annotated Significant Expected Rank in weightFisher
1 GO:0019373 epoxygenase P450 pathway 19 13 1.12 1
2 GO:0097267 omega-hydroxylase P450 pathway 9 7 0.53 2
3 GO:0042738 exogenous drug catabolic process 10 7 0.59 3
weightFisher weightKS
1 1.9e-12 0.79744
2 7.9e-08 0.96752
3 2.5e-07 0.96336
what "ordered" resultTable_bonF looks like:
GO.ID Term Annotated Significant Expected Rank in weightFisher
17 GO:0014075 response to amine 33 7 1.95 17
18 GO:0034372 very-low-density lipoprotein particle re... 11 5 0.65 18
19 GO:0060710 chorio-allantoic fusion 6 4 0.35 19
weightFisher weightKS
17 0.00014 0.96387
18 0.00016 0.83624
19 0.00016 0.92286
As @bhas says, it appears to be working precisely as you want it to. Maybe it's the use of head() that's confusing you?
To put your mind at ease, try it with something simpler:
dtf <- data.frame(a=c(1, 8, 6, 2)^-10, b=c(7, 2, 1, 6))
dtf
# a b
# 1 1.000000e+00 7
# 2 9.313226e-10 2
# 3 1.653817e-08 1
# 4 9.765625e-04 6
dtf[order(dtf$a), ]
# a b
# 2 9.313226e-10 2
# 3 1.653817e-08 1
# 4 9.765625e-04 6
# 1 1.000000e+00 7
Try the following:
resultTable_bon$weightFisher <- as.numeric(resultTable_bon$weightFisher)
Then:
resultTable_bonF <- resultTable_bon[order(resultTable_bon$weightFisher), ]
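To see why the conversion matters, a quick sketch (assuming weightFisher came back from GenTable as character, which it often does): character vectors sort alphabetically, so scientific-notation values land in the wrong place.
x <- c("1.9e-12", "0.00016", "7.9e-08")
x[order(x)]              # "0.00016" "1.9e-12" "7.9e-08" -- alphabetical order
x[order(as.numeric(x))]  # "1.9e-12" "7.9e-08" "0.00016" -- numeric order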

Adding a new column after performing subtraction and division between two columns in R

I have a data set, dataset A:
ID bidding_price Sale_price
10 74.88 67.27
11 23.1 18.14
12 62.5 56.14
13 34.5 27.09
14 55.32 49.69
15 900 706.77
16 260.84 260.84
I would like to add a column diff by performing the following operation
diff = (bidding_price - Sale_price) / Sale_price
(the relative difference, as a proportion)
and the output should look like this:
ID bidding_price Sale_price diff
10 74.88 67.27 0.113126208
11 23.1 18.14 0.273428886
12 62.5 56.14 0.113288208
13 34.5 27.09 0.273532669
14 55.32 49.69 0.113302475
15 900 706.77 0.273398701
16 260.84 260.84 0.00
Any help on this is appreciated.
Use the awesome dplyr package
library(dplyr)
df <- data.frame("id" = c(1, 2, 3),
                 "bidding_price" = c(10, 11, 12),
                 "sale_price" = c(9, 10, 11))
df <- mutate(df, diff=(bidding_price - sale_price)/sale_price)
The output is
id bidding_price sale_price diff
1 1 10 9 0.11111111
2 2 11 10 0.10000000
3 3 12 11 0.09090909
Assuming your data frame is A, something like this would work for you:
A$diff <- (A$bidding_price - A$Sale_price) / A$Sale_price

How to prepare my data for a factorial repeated-measures analysis?

Currently, my data frame is in wide format and I want to do a factorial repeated-measures analysis with two between-subject factors (sex & org) and a within-subject factor (tasktype). Below I've illustrated how my data looks with a sample (the actual dataset has a lot more variables). The variables starting with '1_' and '2_' belong to measurements during task 1 and task 2 respectively. This means that 1_FD_H_org and 2_FD_H_org are the same measurement but for tasks 1 and 2 respectively.
id sex org task1 task2 1_FD_H_org 1_FD_H_text 2_FD_H_org 2_FD_H_text 1_apv 2_apv
2 F T Correct 2 69.97 68.9 116.12 296.02 10 27
6 M T Correct 2 53.08 107.91 73.73 333.15 16 21
7 M T Correct 2 13.82 30.9 31.8 78.07 4 9
8 M T Correct 2 42.96 50.01 88.81 302.07 4 24
9 F H Correct 3 60.35 102.9 39.81 96.6 15 10
10 F T Incorrect 3 78.61 80.42 55.16 117.57 20 17
I want to analyze whether there is a difference between the two tasks on e.g. FD_H_org for the different groups/conditions (sex & org).
How do I reshape my data so I can analyze it with a model like this?
ezANOVA(data=df, dv=.(FD_H_org), wid=.(id), between=.(sex, org), within=.(task))
I think that the correct format of my data should look like this:
id sex org task outcome FD_H_org FD_H_text apv
2 F T 1 Correct 69.97 68.9 10
2 F T 2 2 116.12 296.02 27
6 M T 1 Correct 53.08 107.91 16
6 M T 2 2 73.73 333.15 21
But I'm not sure. I tried to achieve this with the reshape2 package but couldn't figure out how to do it. Can anybody help?
I think you probably need to rebuild it by binding the two subsets of columns together with rbind(). The only issue here was that your outcomes implied different data types, so I forced them both to text:
require(plyr)
dt <- read.table(file = "dt.txt", header = TRUE, sep = " ")  # this was to bring in your data
newtab <- rbind(
  ddply(dt, .(id, sex, org), summarize, task = 1, outcome = as.character(task1),
        FD_H_org = X1_FD_H_org, FD_H_text = X1_FD_H_text, apv = X1_apv),
  ddply(dt, .(id, sex, org), summarize, task = 2, outcome = as.character(task2),
        FD_H_org = X2_FD_H_org, FD_H_text = X2_FD_H_text, apv = X2_apv)
)
newtab[order(newtab$id), ]
id sex org task outcome FD_H_org FD_H_text apv
1 2 F T 1 Correct 69.97 68.90 10
7 2 F T 2 2 116.12 296.02 27
2 6 M T 1 Correct 53.08 107.91 16
8 6 M T 2 2 73.73 333.15 21
3 7 M T 1 Correct 13.82 30.90 4
9 7 M T 2 2 31.80 78.07 9
4 8 M T 1 Correct 42.96 50.01 4
10 8 M T 2 2 88.81 302.07 24
5 9 F H 1 Correct 60.35 102.90 15
11 9 F H 2 3 39.81 96.60 10
6 10 F T 1 Incorrect 78.61 80.42 20
12 10 F T 2 3 55.16 117.57 17
EDIT - obviously you don't need plyr for this (and it may slow it down) unless you're doing further transformations. This is the code with no non-standard dependencies:
newcolnames <- c("id", "sex", "org", "task", "outcome", "FD_H_org", "FD_H_text", "apv")
# select the task-1 columns; column 3 is taken twice so the duplicate ("org.1")
# can be overwritten with the task number
t1 <- dt[, c(1, 2, 3, 3, 4, 6, 8, 10)]
t1$org.1 <- 1
colnames(t1) <- newcolnames
# same trick for the task-2 columns
t2 <- dt[, c(1, 2, 3, 3, 5, 7, 9, 11)]
t2$org.1 <- 2
t2$task2 <- as.character(t2$task2)  # match task1's text type before stacking
colnames(t2) <- newcolnames
newt <- rbind(t1, t2)
newt[order(newt$id), ]
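Since the question mentions reshape2, a sketch of that route as well (not from the thread, assuming read.table has prefixed the digit-leading names to X1_FD_H_org etc.): melt everything to long form, split the variable name into a task number and a measure, then dcast the measures back into columns.
library(reshape2)
m <- melt(dt, id.vars = c("id", "sex", "org", "task1", "task2"))
m$task <- substr(as.character(m$variable), 2, 2)     # "1" or "2" from "X1_..."
m$measure <- substring(as.character(m$variable), 4)  # "FD_H_org", "apv", ...
long <- dcast(m, id + sex + org + task1 + task2 + task ~ measure, value.var = "value")
long$outcome <- ifelse(long$task == "1", as.character(long$task1), as.character(long$task2))
long[, c("id", "sex", "org", "task", "outcome", "FD_H_org", "FD_H_text", "apv")]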

Using R to apply an equation to specific groups of data within a data set

I have a data set, and I would like to apply an equation to groups of my values. Specifically I would like to apply
sqrt(X^2+Y^2+Z^2)
to all values within a specific time and bin.
Looking at the data below, I would like to group my values by unique time (TS) and bin (Bin), and take the square root of the sum of squares of the X, Y and Z components for each group.
id D Bin value Month Day Year Hour Minute Second TS
1 X V1 -0.320 1 30 2012 13 59 50 2012-01-30 13:59:50
1 Y V1 -0.088 1 30 2012 13 59 50 2012-01-30 13:59:50
1 Z V1 0.171 1 30 2012 13 59 50 2012-01-30 13:59:50
1 X V2 0.368 1 30 2012 13 59 50 2012-01-30 13:59:50
1 Y V2 -0.104 1 30 2012 13 59 50 2012-01-30 13:59:50
1 Z V2 0.008 1 30 2012 13 59 50 2012-01-30 13:59:50
2 X V1 -0.052 1 30 2012 14 0 50 2012-01-30 14:00:50
2 Y V1 0.278 1 30 2012 14 0 50 2012-01-30 14:00:50
2 Z V1 -0.086 1 30 2012 14 0 50 2012-01-30 14:00:50
2 X V2 -0.214 1 30 2012 14 0 50 2012-01-30 14:00:50
2 Y V2 0.118 1 30 2012 14 0 50 2012-01-30 14:00:50
2 Z V2 -0.030 1 30 2012 14 0
So up first would be V1 at 13:59:50:
sqrt((-0.320)^2 + (-0.088)^2 + 0.171^2)
and then V2 at 13:59:50:
sqrt(0.368^2 + (-0.104)^2 + 0.008^2)
and so on
I had tried to use this formula (the data frame is called "V"):
V=aggregate(value~TS+variable,data=V,sqrt((if(V$D=="X")V$value^2)+(if(V$D=="Y")V$value^2))+(if(V$D=="Z")V$value^2))
But obviously that does not work. So does anyone have a better way to first index unique groups in a data set, and then apply an equation to each group?
Use the plyr and reshape (or reshape2) packages. (Really. If you're not using those packages, you'll be astounded how much better things go.) Briefly, you'll want to first cast() your data into a wide form, so that instead of columns named D and value, you have columns named X, Y and Z. From there, you can use any number of techniques. transform in base would work, although I like mutate in the plyr package a bit better:
V <- mutate(V, norm=sqrt(X^2+Y^2+Z^2))
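The cast step that answer assumes, sketched with reshape2 (dcast() is its equivalent of cast()):
library(reshape2)
library(plyr)
Vw <- dcast(V, TS + Bin ~ D, value.var = "value")  # one row per (TS, Bin), columns X, Y, Z
Vw <- mutate(Vw, norm = sqrt(X^2 + Y^2 + Z^2))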
Assuming you always have one X, one Y, and one Z for each combination of (TS, Bin), I would try this:
aggregate(value ~ TS + Bin, data = V, FUN = function(x)sqrt(sum(x^2)))
library("plyr")
ddply(V, .(TS, Bin), summarise, norm=sqrt(sum(value*value)))
If there is exactly one X, Y, and Z per TS/Bin combination.
