How to interpolate data within groups in R using seqtime?

I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu<-interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
The interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried changing the method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Please can anyone tell me what I am doing wrong and how to get around these errors?
Many thanks!

I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector parameter.
When meta is transposed and converted to the data frame meta.ts, all values become strings, which are then automatically converted to factors - this includes day.
To adjust, you can edit your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(unlist(meta.ts[1,])) # 'day' is now character; unlist the row, then convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
As a bonus, read this blog post for information on the stringsAsFactors parameter. Strings being automatically converted to factors is a common source of confusion.

Related

Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { : argument is of length zero

I am using warbleR in R to do some acoustic analyses. As freq_range couldn't detect all the bottom frequencies very well, I have created a data frame manually with all the right bottom frequencies, loaded it into R and turned it into a selection table. track_freq_contour, compare.methods and freq_DTW all work fine (although freq_DTW does give a warning message:
Warning message: In (0:(n - 1)) * f : NAs produced by integer overflow
However, if I run the cross_correlation function, I get the following error:
Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { :
argument is of length zero
I do not get this error with a selection table whose bottom and top frequencies were added with the freq_range function in R instead of manually. What could be the issue here? The two selection tables look similar (screenshots not reproduced here): one was partly made by R through freq_range, and the other has the bottom frequencies added manually (and contains more sound files than the first).
This is part of the code I use:
#Comparing methods for quantitative analysis of signal structure
compare.methods(X = stnew, flim = c(0.6,2.5), bp = c(0.6,2.5), methods = c("XCORR", "dfDTW"))
#Measure acoustic parameters with spectro_analysis
paramsnew <- spectro_analysis(stnew, bp = c(0.6,2), threshold = 20)
write.csv(paramsnew, "new_acoustic_parameters.csv", row.names = FALSE)
#Remove parameters derived from fundamental frequency
paramsnew <- paramsnew[, grep("fun|peakf", colnames(paramsnew), invert = TRUE)]
#Dynamic time warping
dm <- freq_DTW(stnew, length.out = 30, flim = c(0.6,2), bp = c(0.6,2), wl = 300, img = TRUE)
str(dm)
#Spectrographic cross-correlation
xcnew <- cross_correlation(stnew, wl = 300, na.rm = FALSE)
str(xcnew)
Any idea what I'm doing wrong?

Using an R function to hash values produces a repeating value across rows

I'm using the following query:
let
Source = {1..5},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Numbers"}, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Letters", each Character.FromNumber([Numbers] + 64)),
#"Run R script" = R.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)library(""digest"")#(lf)#(lf)dataset$SuffixedLetters <- paste(dataset$Letters, ""_suffix"")#(lf)dataset$HashedLetters <- digest(dataset$Letters, ""md5"", serialize = TRUE)#(lf)output<-dataset",[dataset=#"Added Custom"]),
output = #"Run R script"{[Name="output"]}[Value]
in
output
which leads to the resulting table (screenshot not reproduced here).
And here is the R script with better formatting:
# 'dataset' holds the input data for this script
library("digest")
dataset$SuffixedLetters <- paste(dataset$Letters, "_suffix")
dataset$HashedLetters <- digest(dataset$Letters, "md5", serialize = TRUE)
output<-dataset
The 'paste' function appears to iterate over rows, resolving with each row's input. But the 'digest' function appears to return only the first value, repeated across all rows.
I don't know why the behavior of the two functions would seem to operate differently. Can anyone advise how to get the 'HashedLetters' column to resolve using the values from each row instead of just the initial one?
Use:
dataset$HashedLetters <- sapply(dataset$Letters, digest, algo = "md5", serialize = TRUE)
digest works on a whole object at a time, not individual elements of a vector.
vec <- letters[1:3]
digest::digest(vec, algo="md5", serialize=TRUE)
# [1] "38ce1fe9e19a222505e693e8bdd8aeec"
sapply(vec, digest::digest, algo="md5", serialize=TRUE)
# a b c
# "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1"

Loop in R through variable names with values as endings and create new variables from the result

I have 24 variables called empl_1-empl_24 (e.g. empl_2, empl_3, ...).
I would like to write a loop in R that takes the values 1-24 and puts them in the respective places, so that the corresponding variables are either called or created with i = 1-24. The sample below shows what I would like to have within the loop (e.g. ye1-ye24, ipw_atet_1-ipw_atet_24, and so on):
ye1_ipw <- empl$empl_1[insample==1]
ipw_atet_1 <- treatweight(y=ye1_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_1
ipw_atet_1$se
ye2_ipw <- empl$empl_2[insample==1]
ipw_atet_2 <- treatweight(y=ye2_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_2
ipw_atet_2$se
ye3_ipw <- empl$empl_3[insample==1]
ipw_atet_3 <- treatweight(y=ye3_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_3
ipw_atet_3$se
Coming from a Stata environment, I tried:
for (i in seq_along(empl_list)){
ye[i]_ipw <- empl$empl_[i][insample==1]
ipw_atet_[i] <- treatweight(y=ye[i]_ipw, d=treat_ipw, x=x1_ipw, ATET=TRUE, trim=0.05, boot=2)
}
However, this does not work at all. Do you have any idea how to approach this problem by writing a nice loop? Thank you so much for your help =)
You can try with lapply:
result <- lapply(empl[paste0('empl_', 1:24)], function(x)
treatweight(y = x[insample==1], d = treat_ipw,
x = x1_ipw, ATET = TRUE, trim = 0.05, boot = 2))
result is a list storing the output for all 24 variables in the same object, which is easier to manage and process than having 24 separate vectors.

How to continuously send data from LabVIEW to R? (code help)

I am trying to bring real-time data from LabVIEW (vibration of a bearing and temperature) into an app written in R to create a control chart. It works for a while but eventually crashes with the following error message:
Error in aggregate.data.frame(B, list(rep(1:(nrow(B)%/%n + 1), each = n, :
no rows to aggregate
The process works as follows: LabVIEW takes the data and writes it out to two Excel files; those files are read by the R code and used to draw a control chart in R. The process succeeds for some time, and the failure moment is not always the same: sometimes the control chart will run for 6-7 min, other times it will crash within 2 min.
My suspicion is that the Excel files are not being updated fast enough, so the R code tries to read an Excel file while it is still empty.
Any suggestions would be great, thank you!
I have tried lowering the sample size taken per second; that did not work.
getwd()
setwd("C:/Users/johnd/Desktop/R Data")
while(1) {
A = fread("C:/Users/johnd/Desktop/R Data/a1.csv" , skip = 4 , header = FALSE , col.names = c("t1","B2","t2","AM","t3","M","t4","B1"))
t1 = A$t1
B2 = A$B2
t2 = A$t2
AM = A$AM
t3 = A$t3
M = A$M
t4 = A$t4
B1 = A$B1
B = fread("C:/Users/johnd/Desktop/R Data/b1.csv" , skip = 4 , header = FALSE , col.names = c("T1","small","T2","big"))
T1 = B$T1
small = B$small
T2 = B$T2
big = B$big
DJ1 = A[seq(1,nrow(A),1),c('t1','B2','AM','M','B1')]
DJ1
n = 16
DJ2 = aggregate(B,list(rep(1:(nrow(B)%/%n+1),each=n,len=nrow(B))),mean)[-1]
DJ2
#------------------------------------------------------------------------
DJ6 = cbind(DJ1[,'B1'],DJ2[,c('small','big')]) # creates matrix for these three indicators
DJ6
#--------------T2 Hand made---------------------------------------------------------------------
new_B1 = DJ6[,'B1']
new_small = DJ6[,'small'] ### decompose the DJ6 matrix into vectors for each indicator(temperature, big & small accelerometers)
new_big = DJ6[,'big']
new_B1
new_small
new_big
mean_B1 = as.numeric(colMeans(DJ6[,'B1']))
mean_small = as.numeric(colMeans(DJ6[,'small'])) ##decomposes into vectors of type numeric
mean_big = as.numeric(colMeans(DJ6[,'big']))
cov_inv = data.matrix(solve(cov(DJ6))) # obtain inverse covariance matrix
cov_inv
p = ncol(DJ6) # number of quality characteristics (p = 3), taken as the number of columns of the matrix
m = 64 # number of samples (10 seconds of data)
a_alpha = 0.99
f= qf(a_alpha , df1 = p,df2 = (m-p)) ### calculates the F-Statistic for our data
f
UCL = (p*(m+1)*(m-1)*(f))/(m*(m-p)) ###produces upper control limit
UCL
diff_B1 = new_B1-mean_B1
diff_small = new_small-mean_small
diff_big = new_big-mean_big
DJ7 = cbind(diff_B1, diff_small , diff_big) #produces matrix of difference between average and observations (x-(x-bar))
DJ7
# DJ8 = data.matrix(DJ7[1,])
# DJ8
DJ9 = data.matrix(DJ7) ### turns matrix into appropriate numeric form
DJ9
# T2.1.1 = DJ8 %*% cov_inv %*% t(DJ8)
# T2.1.1
# T2.1 = t(as.matrix(DJ9[1,])) %*% cov_inv %*% as.matrix(DJ9[1,])
# T2.1
#T2 <- NULL
for(i in 1:64){ #### creates vector of T^2 statistic
T2<- t(as.matrix(DJ9[i,])) %*% cov_inv %*% as.matrix(DJ9[i,]) # calculation of T^2 test statistic ## there is no calculation of x-double bar
write.table(T2,"C:/Users/johnd/Desktop/R Data/c1.csv",append=T,sep="," , col.names = FALSE)#
#
DJ12 <-fread("C:/Users/johnd/Desktop/R Data/c1.csv" , header = FALSE ) #
}
# DJ12
DJ12$V1 = 1:nrow(DJ12)
# plot(DJ12 , type='l')
p1 = nrow(DJ12)-m
p2 = nrow(DJ12)
plot(DJ12[p1:p2,], type ='o', ylim =c(0,15), ylab="T2 Chart" , xlab="Data points") ### plots last 640 points
# plot(DJ12[p1:p2,], type ='o' , ylim =c(0,15) , ylab="T2 Chart" , xlab="Data points")
abline(h=UCL , col="red") ## displays upper control limit
Sys.sleep(1)
}
The process succeeds for some time, and the failure moment is not always the same: sometimes the control chart will run for 6-7 min, other times it will crash within 2 min. My suspicion is that the Excel files are not being updated fast enough, so the R code tries to read an Excel file while it is still empty.
Your suspicion is correct.
With your current design, your R application can crash depending on how fast it runs relative to your LabVIEW application. This is called a race condition; you must eliminate race conditions from your code.
A quick and dirty solution
One simple solution to avoid the crash is to call NROW to check whether any data exists, and skip the aggregate call when there is no data available. This is described here: error message in r: no rows to aggregate
A more robust solution
A better solution is to use a communications protocol like TCP to stream data from LabVIEW to R, instead of using CSV files to transfer real-time data. For example, your R program could listen for data on a TCP socket; make it wait for data to be sent from LabVIEW before running your data processing code (a minimal sketch follows the links below).
Here is an example on using socketConnection in R: http://blog.corynissen.com/2013/05/using-r-to-communicate-via-socket.html
Here is an example on sending/receiving data over TCP in LabVIEW: http://www.ni.com/product-documentation/2710/en/

R HTS package: combinef and aggts not working with gts object

I'm trying to apply the combinef and aggts functions from the R hts package to a time series matrix in order to obtain an optimized set of forecasts across a hierarchy. I've run the same code every month without issue, and am now seeing errors after upgrading to hts package v4.5.
Reproducible example (I can share the data file offline if needed):
#Read in forecast data for all levels of hierarchy#
fcast<-read.csv("SampleHierarchyForecast.csv", header = TRUE, check.names = FALSE)
#Convert to time series#
fcast<-ts(fcast, start = as.numeric(2010.25) + (64)/12, end = as.numeric(2010.25) + (75)/12, f= 12)
#Create time series of only the bottom level of the hierarchy#
index<-c()
fcastBottom<-fcast
for (i in 1:length(fcastBottom [1,]))
{
if(nchar(colnames(fcastBottom)[i])!=28)
index[i]<-i
else
index[i]<-0
}
fcastBottom<-fcastBottom[,-index]
#Create grouped time series from the bottom level forecast #
GtsForecast <- gts(fcastBottom, characters = list(c(12,12), c(4)), gnames = c("Category", "Item", "Customer", "Category-Customer"))
#Use combinef function to optimally combine the full hierarchy forecast using the groups from the full hierarchy gts#
combo <- combinef(fcast, groups = GtsForecast$groups)
Warning message:
In mapply(rep, as.list(gnames), times, SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
traceback()
2: stop("Argument fcasts requires all the forecasts.")
1: combinef(fcast, groups = GtsForecast$groups)
There's a little bug when the combinef() function calls gts(). I've now fixed it on GitHub, so you can run your own code above without any trouble after updating to the development version.
Alternatively, if you don't want to install the newest version, you can tweak your code a bit:
combo <- combinef(fcast, groups = GtsForecast$groups, keep = "bottom")
combo <- ts(combo, start = as.numeric(2010.25) + (64)/12,
end = as.numeric(2010.25) + (75)/12, f = 12)
colnames(combo) <- colnames(fcastBottom)
newGtsForecast <- gts(combo, characters = list(c(12,12), c(4)),
gnames = c("Category", "Item", "Customer",
"Category-Customer"))
Aggregate <- aggts(newGtsForecast)
Hope it helps.
