interpolate data series with R

I am having trouble interpolating between two data series. The first column holds a reference time. The second column (timeP130) holds the times at which the P130 values in the third column were recorded. I want to interpolate new P130 values at the reference times (the results column).
Reference_time and timeP130 share the same first and last values, and both have variable step sizes, so there is no regular pattern.
Reference_time timeP130 P130 results
0.0001 0.0001 0.2194 0.2194
0.000694 0.003 0.25 0.22552
0.00138889 0.0035 0.26 0.23164
0.00208333 0.006 0.24 0.23776
0.00277778 0.009 0.245 0.24388
0.003 0.009 0.255 0.25
0.00416667 0.0125 0.27 ETC
0.00486111 0.015 0.21
0.00555556 0.018 0.20
0.00625 0.0208 0.2194
0.00694444 0.021 0.2194
0.00763889 0.0211 0.2194
0.00833333 0.0215 0.2194
0.00902778 0.022 0.2195
0.00972222 0.0327 0.2591
0.0104167 0.0433 0.3664
0.0111111 0.0839 0.4068
0.0118056 2.5 0.4087
0.0125 0.27
0.0141944
0.0158889
0.0165833
0.0182778
2.5 0.4087
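
One possible approach, sketched here on the assumption that linear interpolation between neighbouring points is what is wanted: base R's approx() evaluates the (timeP130, P130) series at the reference times. The vectors below hold only the first few rows of the data above, and the variable names are illustrative. ties = mean is passed explicitly because timeP130 contains a repeated value (0.009).

```r
# First rows of the question's data (illustrative subset, names are assumptions)
time_p130 <- c(0.0001, 0.003, 0.0035, 0.006, 0.009, 0.009, 0.0125)
p130      <- c(0.2194, 0.25, 0.26, 0.24, 0.245, 0.255, 0.27)
reference_time <- c(0.0001, 0.000694, 0.00138889, 0.00208333,
                    0.00277778, 0.003, 0.00416667)

# Linear interpolation of P130 at the reference times.
# ties = mean averages P130 where time_p130 repeats (0.009 here);
# rule = 1 returns NA for reference times outside the known range.
interp <- approx(x = time_p130, y = p130, xout = reference_time,
                 method = "linear", ties = mean, rule = 1)
results <- interp$y

print(data.frame(reference_time, results))
```

The values are interpolated along the time axis, so they can differ slightly from a hand calculation that spaces values evenly by row.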

Related

Binning by equal standard deviation R

I have a vector containing some data, in particular
> tau_3[p_3<3]
[1] 7.837 7.813 6.276 8.669 7.001 6.032 6.897 5.967 9.417 8.251 7.892 8.752 9.873 9.461 8.591 7.697 8.372 9.324 9.135 7.807
[21] 10.034 10.701 9.315 6.979 9.843 8.742 8.829 7.406 8.588 6.803 7.462 8.379 8.075 8.294 8.218
which has to be studied with respect to another set of datapoints
> p_3[p_3<3]
[1] 0.020 0.021 0.022 0.023 0.024 0.026 0.028 0.014 0.029 0.030 0.033 0.035 0.037 0.040 0.042 0.044 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085
[25] 0.090 0.100 0.110 0.120 0.130 0.150 0.160 0.190 0.200 0.230 0.240
I would like to divide the pressure data p_3 (the subset given above) into bins such that each bin has, more or less, the same standard deviation of the decay-time data tau_3 that it contains. Specifically, I need a vector containing the breaks for this binning.
I don't know of any package that can do this, and I've been scratching my head over it for hours. If you could suggest a solution, I would be very grateful.
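
This is not a solution to the search problem, only a possible building block: a small helper that reports the standard deviation of tau_3 within each pressure bin for a given vector of breaks, so that candidate breaks can be compared. The helper name sd_per_bin and the quantile-based starting breaks are assumptions made for illustration.

```r
# Data from the question
tau_3 <- c(7.837, 7.813, 6.276, 8.669, 7.001, 6.032, 6.897, 5.967, 9.417,
           8.251, 7.892, 8.752, 9.873, 9.461, 8.591, 7.697, 8.372, 9.324,
           9.135, 7.807, 10.034, 10.701, 9.315, 6.979, 9.843, 8.742, 8.829,
           7.406, 8.588, 6.803, 7.462, 8.379, 8.075, 8.294, 8.218)
p_3 <- c(0.020, 0.021, 0.022, 0.023, 0.024, 0.026, 0.028, 0.014, 0.029,
         0.030, 0.033, 0.035, 0.037, 0.040, 0.042, 0.044, 0.050, 0.055,
         0.060, 0.065, 0.070, 0.075, 0.080, 0.085, 0.090, 0.100, 0.110,
         0.120, 0.130, 0.150, 0.160, 0.190, 0.200, 0.230, 0.240)

# Hypothetical helper: SD of tau within each pressure bin defined by `breaks`
sd_per_bin <- function(p, tau, breaks) {
  bins <- cut(p, breaks = breaks, include.lowest = TRUE)
  tapply(tau, bins, sd)
}

# Starting guess (an assumption): four bins with roughly equal counts
breaks <- unname(quantile(p_3, probs = seq(0, 1, length.out = 5)))
sds <- sd_per_bin(p_3, tau_3, breaks)

print(sds)
# Ratio of largest to smallest SD; closer to 1 means more even bins
print(max(sds) / min(sds))
```

A search over candidate breaks could then keep the set whose ratio is closest to 1.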

Problem with grouped psych::alpha within dplyr::do/tidyverse and broom::tidy

I have survey data collected with the same questionnaire in different languages. I would like to write elegant dplyr/tidyverse code that computes the reliability for each language using psych::alpha. Suppose the data frame (df) contains a grouping column group_var and the item columns Q_1 to Q_6.
I want to calculate item and scale reliability for Q_1:Q_6 within each group defined by group_var. The code I wrote looks like this:
require(tidyverse)
require(psych)
require(broom)
df %>%
  select(group_var, Q_1:Q_6) %>%
  as.data.frame() %>%
  group_by(group_var) %>%
  do(tidy(psych::alpha(c(Q_1:Q_6))))
But when I run the code, I get an error message:
Error in psych::alpha(c(Q_1:Q_6)) :
object 'Q_1' not found
What is wrong with the code?
Thanks in advance.

I don't think tidy() works on the output of psych::alpha(). For example:
r4 <- sim.congeneric()
tidy(alpha(r4))
Error: No tidy method for objects of class psych
So tidy is out of the question. The best thing you can do is wrap the results up in a list column within a tibble:
library(dplyr)
library(tidyr)
library(purrr)
library(psych)
library(broom)
df = data.frame(group_var = sample(LETTERS[1:6], 100, replace = TRUE),
                matrix(sample(0:3, 900, replace = TRUE), nrow = 100))
colnames(df)[-1] = c(paste0("Q_", 1:6), paste0("V_", 23:25))
res = df %>%
  select(group_var, Q_1:Q_6) %>%
  nest(data = Q_1:Q_6) %>%
  mutate(alpha = map(data,
    ~alpha(.x, keys = c("Q_1", "Q_2", "Q_3", "Q_4", "Q_5", "Q_6"))
  ))
res$alpha[[1]]
Reliability analysis
Call: alpha(x = .x, keys = c("Q_1", "Q_2", "Q_3", "Q_4", "Q_5", "Q_6"))
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
-0.37 -0.3 0.13 -0.04 -0.23 0.6 1.6 0.36 0.039
lower alpha upper 95% confidence boundaries
-1.54 -0.37 0.81
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
Q_1- -0.38 -0.38221 -0.143 -0.05854 -0.27652 0.61 0.028 -0.080
Q_2- -0.21 -0.19042 0.173 -0.03305 -0.15996 0.54 0.048 0.066
Q_3- -0.38 -0.26988 0.096 -0.04439 -0.21252 0.61 0.053 0.046
Q_4- -0.54 -0.41760 -0.064 -0.06261 -0.29458 0.68 0.045 -0.016
Q_5- -0.35 -0.26006 0.154 -0.04305 -0.20639 0.60 0.058 0.059
Q_6- 0.03 -0.00088 0.107 -0.00018 -0.00088 0.42 0.024 -0.016
Item statistics
n raw.r std.r r.cor r.drop mean sd
Q_1- 13 0.42 0.45 0.552 -0.062 0.77 1.01
Q_2- 13 0.38 0.33 -0.073 -0.162 1.85 1.14
Q_3- 13 0.39 0.38 0.083 -0.058 1.92 0.95
Q_4- 13 0.45 0.47 0.416 0.050 1.62 0.87
Q_5- 13 0.33 0.38 -0.039 -0.073 2.08 0.86
Q_6- 13 0.21 0.18 -0.137 -0.309 1.38 1.12
Non missing response frequency for each item
0 1 2 3 miss
Q_1 0.08 0.15 0.23 0.54 0
Q_2 0.38 0.23 0.23 0.15 0
Q_3 0.31 0.38 0.23 0.08 0
Q_4 0.15 0.38 0.38 0.08 0
Q_5 0.38 0.31 0.31 0.00 0
Q_6 0.15 0.38 0.15 0.31 0
From a quick check, it seems tidystats might be able to do it, but I ran its example code and it doesn't seem to work. You can try it for yourself.
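
If only one number per group is needed, a base-R cross-check is possible without psych or broom. The sketch below swaps in a hand-written raw Cronbach's alpha, k/(k-1) * (1 - sum of item variances / variance of the total score), and applies it per group with split(). It is not psych::alpha, and the simulated data and the helper name cronbach_raw are assumptions for illustration.

```r
set.seed(42)
# Simulated survey data (assumption): one grouping column and six items
df <- data.frame(group_var = sample(LETTERS[1:3], 150, replace = TRUE),
                 matrix(sample(0:3, 150 * 6, replace = TRUE), nrow = 150))
names(df)[-1] <- paste0("Q_", 1:6)

# Hypothetical helper: raw Cronbach's alpha from the item covariance matrix.
# sum(diag(C)) is the sum of item variances, sum(C) the variance of the total.
cronbach_raw <- function(items) {
  k <- ncol(items)
  C <- cov(items)
  k / (k - 1) * (1 - sum(diag(C)) / sum(C))
}

# One alpha per group, collected into a plain data frame
by_group <- sapply(split(df[paste0("Q_", 1:6)], df$group_var), cronbach_raw)
result <- data.frame(group_var = names(by_group),
                     raw_alpha = unname(by_group))
print(result)
```

With random items like these the values hover around zero, much like the raw_alpha in the output above; real questionnaire data should behave differently.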

R principal() get eigenvalues of factors

I've done a PCA and the result looks something like this:
RC1 RC14 RC2 RC5 RC3 RC9 RC6 RC7 RC16 RC11 RC19 RC12 RC26 RC8 RC10 RC4 RC20 …
SS loadings 3.199 3.161 3.001 2.958 2.928 2.908 2.793 2.786 2.727 2.723 2.696 2.558 2.544 2.540 2.515 2.499 2.494 …
Proportion Var 0.005 0.005 0.005 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 …
Cumulative Var 0.005 0.010 0.014 0.019 0.023 0.027 0.032 0.036 0.040 0.044 0.048 0.052 0.056 0.060 0.063 0.067 0.071 …
As you can see the factors (RC1, RC14, etc.) aren't in the correct order.
To get the eigenvalues I can use fit$values, which gives a vector like this:
[1] 4.9880983 4.3804479 3.4831868 3.4637441 3.1826873 2.9171613 2.7109790 2.7069910 2.6505181 2.5475078 2.5339040
[12] 2.5167436 2.4434298 2.4023438 2.3648536 2.3065183 2.2927025 2.2779793 2.2523245 2.2436222 2.2073776 2.1823970
[23] 2.1626319 2.1487751 2.1274126 2.0963421 2.0918373 2.0728735 2.0603362 2.0470462 2.0355974 2.0202679 2.0170792
[34] 2.0013015 1.9891380 1.9874788 …
Now I want the eigenvalues of those factors. Because the factors are not ordered, how can I match the factors to their respective eigenvalues? I guess RC1 has an eigenvalue of 4.9880983, but does RC14 have an eigenvalue of 4.3804479 or 2.4023438?

You could install the factoextra package, which has a lot of great tools. It lists each eigenvalue beside the PC axis ID, so there won't be any confusion.
library(factoextra)
eig.val <- get_eigenvalue(fit)
eig.val[1:8,] # prints the first 8 axes
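
One thing worth checking before matching names to numbers: the RC labels suggest rotated components, and after a rotation the "SS loadings" of a component are no longer eigenvalues. A small base-R sketch on simulated data (an assumption, not the asker's data) shows that unrotated sums of squared loadings equal the eigenvalues, while a varimax rotation redistributes them and preserves only their total.

```r
set.seed(1)
# Simulated data with two correlated pairs of variables (assumption)
x <- matrix(rnorm(200 * 6), ncol = 6)
x[, 2] <- x[, 1] + rnorm(200, sd = 0.5)
x[, 4] <- x[, 3] + rnorm(200, sd = 0.5)

e <- eigen(cor(x))   # eigen-decomposition of the correlation matrix
k <- 3               # number of components kept

# Unrotated loadings: eigenvectors scaled by sqrt(eigenvalue)
unrot <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
ss_unrot <- colSums(unrot^2)            # equals the first k eigenvalues

# Varimax rotation from base R's stats package
rot <- unclass(varimax(unrot)$loadings)
ss_rot <- colSums(rot^2)                # "SS loadings" after rotation

print(rbind(eigenvalue = e$values[1:k], ss_unrot = ss_unrot, ss_rot = ss_rot))
```

So for rotated components, the SS loadings row is the per-component quantity to read off, and fit$values describes the unrotated solution.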

Convert column headers into new columns

My data frame consists of time series financial data from many public companies. I purposely set companies' weights as their column headers while cleaning the data, and I also calculated log returns for each of them in order to calculate weighted returns in the next step.
Here is an example. There are four companies: A, B, C and D, and their corresponding weights in the portfolio are 0.4, 0.3, 0.2 and 0.1 respectively. So the current data set looks like:
df1 <- data.frame(matrix(vector(),ncol=9, nrow = 4))
colnames(df1) <- c("Date","0.4","0.4.Log","0.3","0.3.Log","0.2","0.2.Log","0.1","0.1.Log")
df1[1,] <- c("2004-10-29","103.238","0","131.149","0","99.913","0","104.254","0")
df1[2,] <- c("2004-11-30","104.821","0.015","138.989","0.058","99.872","0.000","103.997","-0.002")
df1[3,] <- c("2004-12-31","105.141","0.003","137.266","-0.012","99.993","0.001","104.025","0.000")
df1[4,] <- c("2005-01-31","107.682","0.024","137.08","-0.001","99.782","-0.002","105.287","0.012")
df1
Date 0.4 0.4.Log 0.3 0.3.Log 0.2 0.2.Log 0.1 0.1.Log
1 2004-10-29 103.238 0 131.149 0 99.913 0 104.254 0
2 2004-11-30 104.821 0.015 138.989 0.058 99.872 0.000 103.997 -0.002
3 2004-12-31 105.141 0.003 137.266 -0.012 99.993 0.001 104.025 0.000
4 2005-01-31 107.682 0.024 137.08 -0.001 99.782 -0.002 105.287 0.012
I want to create new columns that contain company weights so that I can calculate weighted returns in my next step:
Date 0.4 0.4.W 0.4.Log 0.3 0.3.W 0.3.Log 0.2 0.2.W 0.2.Log 0.1 0.1.W 0.1.Log
1 2004-10-29 103.238 0.400 0.000 131.149 0.300 0.000 99.913 0.200 0.000 104.254 0.100 0.000
2 2004-11-30 104.821 0.400 0.015 138.989 0.300 0.058 99.872 0.200 0.000 103.997 0.100 -0.002
3 2004-12-31 105.141 0.400 0.003 137.266 0.300 -0.012 99.993 0.200 0.001 104.025 0.100 0.000
4 2005-01-31 107.682 0.400 0.024 137.080 0.300 -0.001 99.782 0.200 -0.002 105.287 0.100 0.012

We can try:
v1 <- grep("^[0-9.]+$", names(df1), value = TRUE)
df1[paste0(v1, ".W")] <- as.list(as.numeric(v1))
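
The new weight columns are appended at the end of the data frame. A possible follow-up, sketched below, reorders them next to their companies as in the desired output and computes the weighted return. The data frame is rebuilt here with numeric columns (the one in the question stores everything as character and would need as.numeric() first), and the column name weighted_return is an assumption.

```r
# Same data as the question, stored as numbers; check.names = FALSE keeps "0.4" etc.
df1 <- data.frame(
  Date      = c("2004-10-29", "2004-11-30", "2004-12-31", "2005-01-31"),
  `0.4`     = c(103.238, 104.821, 105.141, 107.682),
  `0.4.Log` = c(0, 0.015, 0.003, 0.024),
  `0.3`     = c(131.149, 138.989, 137.266, 137.08),
  `0.3.Log` = c(0, 0.058, -0.012, -0.001),
  `0.2`     = c(99.913, 99.872, 99.993, 99.782),
  `0.2.Log` = c(0, 0.000, 0.001, -0.002),
  `0.1`     = c(104.254, 103.997, 104.025, 105.287),
  `0.1.Log` = c(0, -0.002, 0.000, 0.012),
  check.names = FALSE
)

# Headers that are pure numbers are the weights
v1 <- grep("^[0-9.]+$", names(df1), value = TRUE)
df1[paste0(v1, ".W")] <- as.list(as.numeric(v1))

# Interleave: price, weight, log return for each company
ord <- c("Date", as.vector(rbind(v1, paste0(v1, ".W"), paste0(v1, ".Log"))))
df1 <- df1[ord]

# Weighted return per date: sum over companies of weight * log return
logs <- as.matrix(df1[paste0(v1, ".Log")])
df1$weighted_return <- as.vector(logs %*% as.numeric(v1))
print(df1)
```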

R Error: Data and vector are not the same length

I'm trying to plot bacterial growth rates in R using a premade script. Basically I am attempting to use a function to give me the steepest slope between a set of points. I'm using the following data frame "tmp":
> str(tmp)
'data.frame': 54 obs. of 10 variables:
$ Strain : Factor w/ 54 levels "11A023","11A045",..: 1 2 3 4 5 6 7 8 9 10 ...
$ 0 : num 0.048 0.05 0.047 0.053 0.051 0.051 0.041 0.05 0.049 0.045 ...
$ 21.5 : num 0.04 0.042 0.037 0.037 0.041 0.03 0.031 0.043 0.037 0.036 ...
$ 47.5 : num 0.027 0.041 0.032 0.035 0.034 0.026 0.02 0.042 0.034 0.03 ...
$ 71.5 : num 0.026 0.039 0.028 0.032 0.032 0.022 0.019 0.041 0.03 0.031 ...
$ 94.5 : num 0.025 0.037 0.027 0.026 0.03 0.017 0.015 0.037 0.028 0.024 ...
$ 117.8333333: num 0.023 0.031 0.026 0.035 0.029 0.017 0.017 0.034 0.027 0.022 ...
$ 144.5 : num 0.021 0.032 0.031 0.029 0.035 0.022 0.012 0.034 0.03 0.023 ...
$ 154.75 : num 0.022 0.032 0.031 0.033 0.042 0.026 0.016 0.041 0.036 0.025 ...
$ 194 : num 0.02 0.034 0.034 0.03 0.04 0.022 0.014 0.038 0.034 0.028 ...
And the following code:
tmp = read.csv("sorted_data.csv") #substitute your file name for 'sorted_data'
source("find_gr.R") #this command loads the script (find_gr) that contains the analysis functions (needs to be in the present working directory)
time <- seq(0,9.25) #edit as appropriate
#note that the growth rate output will be scaled by the time units you use here (per hour, per min, per century, etc.)
M = nrow(tmp)
N = ncol(tmp)
pdf("growth_rate_plots.pdf", paper="letter", width=7.5, height=10) #substitute your desired file name for 'growth_rate_plots'
growth.rates = NULL
for (i in 1:M) {
  print(i)
  gr <- findgr(tmp[i, 3:N], time, tmp[i, 2], int=12, r2=0.6) #3 in [i, 3:N] is the column number where the data starts;
  #2 in [i, 2] is the column containing the label you want on the plot;
  #int is number of points taken at one time as an interval to find the highest slope;
  #vary (i.e. lower) r2, i.e. rsquared as needed, blanks can be a problem here
  growth.rates <- rbind(growth.rates, gr)
}
dev.off()
When I run the code, I get the following error:
Error: Your data and time are not the same length.
Error in findgr(tmp[i, 3:N], time, tmp[i, 2], int = 12, r2 = 0.6) :
I believe this refers to the vector 'time' created. My dataframe is length 9 or 10 (not sure if I count $Strain in length). I have tried creating a time vector with varying lengths, but always get this error returned.
Is there anything I am doing wrong? What should I be looking for?
Many thanks for any help. I am a complete beginner at this.
**Scripts were obtained from https://www.princeton.edu/genomics/botstein/protocols/

If you open the script find_gr.R, the first lines say:
findgr = function(x, t, plottitle, int=15, r2.cutoff=0.6) {
  ...
  #are x and t the same length?
  if (length(x) != length(t)) {
    cat("Error: Your data and time are not the same length.\n")
    stop()
  }
The lengths of x and t have to be the same. Have a look at what you are passing in. You call:
gr <- findgr(tmp[i, 3:N], time, ....
So time should be:
time <- seq(0, length(tmp[1, 3:N])-1)
The -1 is there because the sequence starts from 0.
However, in my case (I generated some data) it produces some other errors. I hope this gives you a starting point.
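
A further option, sketched under the assumption that the layout is the one shown in str(tmp) (column 1 is the strain label, and columns 2 to N are readings whose headers are the sampling times): the time vector can be taken from the column names, which keeps its length tied to the data and preserves the real spacing. The toy data frame and the names label_col and data_cols are illustrative, and findgr itself is not called here.

```r
# Toy version of tmp with the layout from str(tmp): label first, then readings
tmp <- data.frame(
  Strain = c("11A023", "11A045"),
  `0`    = c(0.048, 0.050),
  `21.5` = c(0.040, 0.042),
  `47.5` = c(0.027, 0.041),
  check.names = FALSE
)

label_col <- 1             # column holding the plot label
data_cols <- 2:ncol(tmp)   # columns holding the readings

# Sampling times come straight from the headers of the data columns
time <- as.numeric(names(tmp)[data_cols])

for (i in seq_len(nrow(tmp))) {
  x <- tmp[i, data_cols]
  # Same check that find_gr.R performs before fitting
  stopifnot(length(x) == length(time))
  # The call would then be: findgr(x, time, tmp[i, label_col], ...)
}
print(time)
```

If read.csv() has prefixed the headers with X, stripping it first with sub("^X", "", names(tmp)[data_cols]) should recover the numbers. Going by the script's own comment, int is the number of points per interval, so int=12 may also need lowering when each strain has only nine readings.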