R Error: Data and vector are not the same length

I'm trying to plot bacterial growth rates in R using a premade script. Basically I am attempting to use a function to give me the steepest slope between a set of points. I'm using the following data frame "tmp":
> str(tmp)
'data.frame': 54 obs. of 10 variables:
$ Strain : Factor w/ 54 levels "11A023","11A045",..: 1 2 3 4 5 6 7 8 9 10 ...
$ 0 : num 0.048 0.05 0.047 0.053 0.051 0.051 0.041 0.05 0.049 0.045 ...
$ 21.5 : num 0.04 0.042 0.037 0.037 0.041 0.03 0.031 0.043 0.037 0.036 ...
$ 47.5 : num 0.027 0.041 0.032 0.035 0.034 0.026 0.02 0.042 0.034 0.03 ...
$ 71.5 : num 0.026 0.039 0.028 0.032 0.032 0.022 0.019 0.041 0.03 0.031 ...
$ 94.5 : num 0.025 0.037 0.027 0.026 0.03 0.017 0.015 0.037 0.028 0.024 ...
$ 117.8333333: num 0.023 0.031 0.026 0.035 0.029 0.017 0.017 0.034 0.027 0.022 ...
$ 144.5 : num 0.021 0.032 0.031 0.029 0.035 0.022 0.012 0.034 0.03 0.023 ...
$ 154.75 : num 0.022 0.032 0.031 0.033 0.042 0.026 0.016 0.041 0.036 0.025 ...
$ 194 : num 0.02 0.034 0.034 0.03 0.04 0.022 0.014 0.038 0.034 0.028 ...
And the following code:
tmp = read.csv("sorted_data.csv") #substitute your file name for 'sorted_data'
source("find_gr.R") #this command loads the script (find_gr) that contains the analysis functions (needs to be in the present working directory)
time <- seq(0,9.25) #edit as appropriate
#note that the growth rate output will be scaled by the time units you use here (per hour, per min, per century, etc.)
M = nrow(tmp)
N = ncol(tmp)
pdf("growth_rate_plots.pdf", paper="letter", width=7.5, height=10) #substitute your desired file name for 'growth_rate_plots'
growth.rates = NULL
for (i in 1:M) {
print(i)
gr <- findgr(tmp[i, 3:N], time, tmp[i, 2], int=12, r2=0.6) #3 in [i, 3:N] is the column number where the data starts;
#2 in [i, 2] is the column containing the label you want on the plot;
#int is number of points taken at one time as an interval to find the highest slope;
#vary (i.e. lower) r2, i.e. rsquared as needed, blanks can be a problem here
growth.rates <- rbind(growth.rates, gr)
}
dev.off()
When I run the code, I get the following error:
Error: Your data and time are not the same length.
Error in findgr(tmp[i, 3:N], time, tmp[i, 2], int = 12, r2 = 0.6) :
I believe this refers to the vector 'time' created. My dataframe is length 9 or 10 (not sure if I count $Strain in length). I have tried creating a time vector with varying lengths, but always get this error returned.
Is there anything I am doing wrong? What should I be looking for?
Much thanks for any help, I am a complete beginner at this.
**Scripts were obtained from https://www.princeton.edu/genomics/botstein/protocols/

If you open the script find_gr.R, the first lines read:
findgr = function(x, t, plottitle, int=15, r2.cutoff=0.6) {
...
#are x and t the same length?
if (length(x) != length(t)) {
cat("Error: Your data and time are not the same length.\n")
stop()
}
The lengths of x and t have to be the same. Look at what you are passing in:
gr <- findgr(tmp[i, 3:N], time, ....
Time should be:
time <- seq(0, length(tmp[i, 3:N])-1)
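The -1 is there because the sequence starts from 0. Note that this only makes the two lengths match: it yields the indices 0, 1, 2, ... rather than real sampling times.
In my case (I generated some data) it then produced some other errors, but I hope this gives you a starting point.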
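One possible way to keep the two lengths in sync, sketched here with made-up stand-in data, is to take the time points from the column headers instead of typing them by hand. This assumes the headers really are the sampling times, as the str(tmp) output suggests, and that the readings sit in columns 2 to N (Strain being column 1). The toy values and the names time_labels, data_cols and x are illustrative only.

```r
# Toy stand-in for tmp: one label column plus nine readings per strain (made-up values)
time_labels <- c("0", "21.5", "47.5", "71.5", "94.5", "117.8333333",
                 "144.5", "154.75", "194")
readings <- matrix(seq(0.05, 0.02, length.out = 2 * length(time_labels)),
                   nrow = 2, byrow = TRUE)
tmp <- data.frame(Strain = c("11A023", "11A045"), readings)
names(tmp)[-1] <- time_labels

data_cols <- 2:ncol(tmp)                 # assumption: readings start in column 2

# If read.csv() turned the headers into "X0", "X21.5", ..., strip that prefix first
time <- as.numeric(sub("^X", "", names(tmp)[data_cols]))

x <- as.numeric(tmp[1, data_cols])       # one strain's readings as a plain vector

# The same condition findgr() checks before doing any work
stopifnot(length(x) == length(time))
```

With only nine time points, int=12 asks for an interval longer than the series itself, so that argument may need lowering as well.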

Related

Binning by equal standard deviation R

I have a vector containing some data, in particular
tau_3[p_3<3]
[1] 7.837 7.813 6.276 8.669 7.001 6.032 6.897 5.967 9.417 8.251 7.892 8.752 9.873 9.461 8.591 7.697 8.372 9.324 9.135 7.807
[21] 10.034 10.701 9.315 6.979 9.843 8.742 8.829 7.406 8.588 6.803 7.462 8.379 8.075 8.294 8.218
which has to be studied with respect to another set of datapoints
>p_3[p_3<3]
[1] 0.020 0.021 0.022 0.023 0.024 0.026 0.028 0.014 0.029 0.030 0.033 0.035 0.037 0.040 0.042 0.044 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085
[25] 0.090 0.100 0.110 0.120 0.130 0.150 0.160 0.190 0.200 0.230 0.240
I would like to divide the pressure data p_3 (the subset given above) into bins such that each bin has, more or less, the same standard deviation of the decay time \tau_3 data that it contains. In particular, I need a vector containing the breaks for the binned data.
I don't know of any package that could do this and I've been scratching my head on how to do it for hours. If you could give me a solution I would be very grateful.

How to calculate normalised ratio indices in all possible band combinations and then correlation with environmental variables in R?

I want to calculate normalised ratio and simple ratio indices for all possible band combinations, correlate them with environmental variables in R, and then identify the combination giving the highest correlation. An implementation is available in the hsdar package in R, but it is very slow for a large dataset. Here is a small dataset:
Environmental_variable WV_400 WV_401 WV_402 WV_403 WV_404 WV_405
95.512 0.035 0.034 0.034 0.034 0.034 0.034
97.900 0.047 0.047 0.047 0.046 0.046 0.046
92.897 0.004 0.004 0.006 0.008 0.009 0.009
94.209 0.011 0.012 0.013 0.016 0.017 0.017
87.472 0.010 0.010 0.011 0.014 0.015 0.015
91.109 0.010 0.011 0.013 0.015 0.016 0.016
92.830 0.024 0.025 0.026 0.028 0.029 0.029
Here is example code from the hsdar package for reference:
library(hsdar)
data(spectral_data)
## Calculate normalised ratio indices in all possible combinations
nri_WV <- nri(spectral_data, recursive = TRUE)
## Build glm-models
glmnri <- glm.nri(nri_WV ~ chlorophyll, preddata = spectral_data)
## Return best 10 models
BM <- nri_best_performance(glmnri, n = 10, coefficient = "p.value")
Any help in the form of R code as a fast alternative to the hsdar package is highly appreciated.
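A minimal base-R sketch of one possible approach, using the small dataset above: build every band pair with combn(), compute the index for all pairs in one matrix operation, and correlate each column with the environmental variable. It assumes the normalised ratio index of bands i and j is (b_j - b_i) / (b_j + b_i); whether this matches hsdar's definition and sign convention exactly has not been checked here, and no speed comparison has been made. The names env, bands and pair_idx are illustrative.

```r
env <- c(95.512, 97.900, 92.897, 94.209, 87.472, 91.109, 92.830)
bands <- cbind(
  WV_400 = c(0.035, 0.047, 0.004, 0.011, 0.010, 0.010, 0.024),
  WV_401 = c(0.034, 0.047, 0.004, 0.012, 0.010, 0.011, 0.025),
  WV_402 = c(0.034, 0.047, 0.006, 0.013, 0.011, 0.013, 0.026),
  WV_403 = c(0.034, 0.046, 0.008, 0.016, 0.014, 0.015, 0.028),
  WV_404 = c(0.034, 0.046, 0.009, 0.017, 0.015, 0.016, 0.029),
  WV_405 = c(0.034, 0.046, 0.009, 0.017, 0.015, 0.016, 0.029)
)

pair_idx <- combn(ncol(bands), 2)            # 2 x 15 matrix: every unordered band pair
b1 <- bands[, pair_idx[1, ], drop = FALSE]   # first band of each pair
b2 <- bands[, pair_idx[2, ], drop = FALSE]   # second band of each pair

nri <- (b2 - b1) / (b2 + b1)                 # assumed NRI definition, all pairs at once
colnames(nri) <- paste(colnames(b2), colnames(b1), sep = "_vs_")

# Pearson correlation of every index with the environmental variable.
# Pairs of identical bands give a constant index, hence NA (warning suppressed).
r <- suppressWarnings(cor(nri, env))[, 1]

best <- names(which.max(abs(r)))             # which.max() skips the NA entries
print(best)
```

The simple ratio could be handled the same way by replacing the nri line with b2 / b1.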

Error in model.frame.default ..... : invalid type (list) for variable

I'm new to R and am getting the above error when running a neural network with the code below.
It works up until the "neuralnet" step, where the following error is displayed. I can't resolve it, and the solutions in other threads don't seem to cover the same problem (full output report, including data, below):
"Error in model.frame.default(formula.reverse, data) :
invalid type (list) for variable 'TrainingOutput.Y'"
The only thing I can see that is wrong (but don't have a solution for) is that the header of the first column is preceded by strange characters ("ï..") even though these are not in the CSV file, but I doubt that this has an effect.
Any suggestions?
Code being used:
install.packages('neuralnet') # Install neuralnet
library(neuralnet) # Load neuralnet
#Read Output Data from CSV
TrainingOutput.Y <- read.csv("C:\\data\\OutputData.csv", header = T)
#Read Input Data from CSV
TrainingInput.X <- read.csv("C:\\data\\InputData.csv", header = T)
# Join the columns and coerce to dataframe
head(TrainingInput.X)
head(TrainingOutput.Y)
TrainingSet.XY <- as.data.frame(cbind(TrainingInput.X, TrainingOutput.Y))
head(TrainingSet.XY)
# Train neural network
net.ILB <- neuralnet(TrainingOutput.Y ~ TrainingInput.X,
TrainingSet.XY,
hidden = 1,
threshold = 0.0001)
The full output report is:
library(neuralnet) # Load neuralnet
#Read Output Data from CSV
TrainingOutput.Y <- read.csv("C:\\data\\OutputData.csv", header = T)
#Read Input Data from CSV
TrainingInput.X <- read.csv("C:\\data\\InputData.csv", header = T)
# Join the columns and coerce to dataframe
head(TrainingInput.X)
# ï..Poot Scharnier Begrenzer Koppeling geleiders totalitems
# 1 0.114 0.036 0.036 0.016 0.016 0.443
# 2 0.025 0.009 0.009 0.008 0.008 0.193
# 3 0.000 0.016 0.016 0.008 0.008 0.123
# 4 0.050 0.017 0.017 0.001 0.001 0.359
# 5 0.070 0.006 0.006 0.004 0.004 0.268
# 6 0.004 0.008 0.008 0.002 0.002 0.061
head(TrainingOutput.Y)
# ï..Hours
# 1 0.66783333333
# 2 0.20643333333
# 3 0.22733566667
# 4 0.65986666667
# 5 0.16406666667
# 6 0.05576666667
TrainingSet.XY <- as.data.frame(cbind(TrainingInput.X, TrainingOutput.Y))
head(TrainingSet.XY)
# ï..Poot Scharnier Begrenzer Koppeling geleiders totalitems ï..Hours
# 1 0.114 0.036 0.036 0.016 0.016 0.443 0.66783333333
# 2 0.025 0.009 0.009 0.008 0.008 0.193 0.20643333333
# 3 0.000 0.016 0.016 0.008 0.008 0.123 0.22733566667
# 4 0.050 0.017 0.017 0.001 0.001 0.359 0.65986666667
# 5 0.070 0.006 0.006 0.004 0.004 0.268 0.16406666667
# 6 0.004 0.008 0.008 0.002 0.002 0.061 0.05576666667
# Train neural network
net.ILB <- neuralnet(TrainingOutput.Y ~ TrainingInput.X,
TrainingSet.XY,
hidden = 1,
threshold = 0.0001)
Error in model.frame.default(formula.reverse, data) :
invalid type (list) for variable 'TrainingOutput.Y'
You shouldn't be passing data.frames in a formula; the formula should name columns of the data argument. Also, you will want to look into where those weird characters in your variable names are coming from. That doesn't seem right. (Maybe your CSV has a byte-order mark? Not sure what the encoding might be.) You can "clean" the names with
names(TrainingInput.X)[1]<-"Poot"
names(TrainingOutput.Y)[1]<-"Hours"
and then, once TrainingSet.XY has been rebuilt from the renamed data frames, your neural net call should look like this
net.ILB <- neuralnet(Hours ~ Poot + Scharnier + Begrenzer + Koppeling + geleiders + totalitems,
TrainingSet.XY,
hidden = 1,
threshold = 0.0001)
This formula means we want to model Hours based on all the other columns in the TrainingSet.XY data.frame.
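As a hedged sketch of both points: if the odd "ï.." prefix does come from a UTF-8 byte-order mark, read.csv() can be told to skip it with fileEncoding = "UTF-8-BOM", and the formula can be assembled from column names with reformulate() instead of being typed out. The temporary file and its two values are made up for illustration, and the names csv_path, predictors and nn_formula are not from the original post.

```r
# A tiny CSV that starts with a UTF-8 byte-order mark, written to a temp file
csv_path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("Hours\n0.668\n0.206\n")),
         csv_path)

# fileEncoding = "UTF-8-BOM" makes R drop the mark instead of gluing it to the header
TrainingOutput.Y <- read.csv(csv_path, header = TRUE, fileEncoding = "UTF-8-BOM")
print(names(TrainingOutput.Y))

# Build "Hours ~ Poot + Scharnier + ..." from the column names
predictors <- c("Poot", "Scharnier", "Begrenzer", "Koppeling",
                "geleiders", "totalitems")
nn_formula <- reformulate(predictors, response = "Hours")
print(nn_formula)

# The formula object would then take the place of the hand-written one, e.g.
# net.ILB <- neuralnet(nn_formula, TrainingSet.XY, hidden = 1, threshold = 0.0001)
```

Building the formula this way keeps it in step with the data frame if predictor columns are added or removed later.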

interpolate data series with R

I am having trouble interpolating between two data series. The first column is a reference time. The second column holds the times at which the P130 values (third column) were recorded. I want to interpolate new values of P130 at the reference times.
The reference time and timeP130 share the same first and last values, and both advance in variable steps, so there is no pattern.
Reference_time timeP130 P130 results
0.0001 0.0001 0.2194 0.2194
0.000694 0.003 0.25 0.22552
0.00138889 0.0035 0.26 0.23164
0.00208333 0.006 0.24 0.23776
0.00277778 0.009 0.245 0.24388
0.003 0.009 0.255 0.25
0.00416667 0.0125 0.27 ETC
0.00486111 0.015 0.21
0.00555556 0.018 0.20
0.00625 0.0208 0.2194
0.00694444 0.021 0.2194
0.00763889 0.0211 0.2194
0.00833333 0.0215 0.2194
0.00902778 0.022 0.2195
0.00972222 0.0327 0.2591
0.0104167 0.0433 0.3664
0.0111111 0.0839 0.4068
0.0118056 2.5 0.4087
0.0125 0.27
0.0141944
0.0158889
0.0165833
0.0182778
2.5 0.4087
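One common way to do this kind of resampling is linear interpolation with base R's approx(). The sketch below uses only the first few rows of the table and assumes linear interpolation between neighbouring measurements is what is wanted. Its output is not guaranteed to reproduce the "results" column above, which may have been computed differently.

```r
# First few measured points from the table (times at which P130 was recorded)
timeP130 <- c(0.0001, 0.003, 0.0035, 0.006)
P130     <- c(0.2194, 0.25, 0.26, 0.24)

# Reference times at which new P130 values are wanted
reference_time <- c(0.0001, 0.000694, 0.00138889, 0.00208333, 0.00277778, 0.003)

# Linear interpolation of P130 at each reference time
interp <- approx(x = timeP130, y = P130, xout = reference_time)

print(data.frame(reference_time, P130_interp = interp$y))
```

In the full table the time 0.009 appears twice in timeP130; approx() resolves such duplicates through its ties argument (the mean of the tied values by default), so it is worth deciding which value should win before interpolating.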

find the index of max value in data frame and add the value

This is my data frame:
>head(dat)
geno P1 P2 P3 P4 dif
1 G1 0.015 0.007 0.026 0.951 0.001
2 G2 0.008 0.006 0.015 0.970 0.001
3 G3 0.009 0.006 0.017 0.968 0.000
4 G4 0.011 0.007 0.017 0.965 0.000
5 G5 0.013 0.005 0.021 0.961 0.000
6 G6 0.009 0.006 0.007 0.977 0.001
Here, I need to find the maximum in each row and add dat$dif to it.
When I used which.max(dat[,-1]), I got this error:
Error in which.max(dat[,-1]) :
(list) object cannot be coerced to type 'double'
A previous answer (by Scriven) gives most of it but as others have stated, it incorrectly includes the last column. Here is one method that works around it:
idx <- (! names(dat) %in% c('geno','dif'))
dat$dif + apply(dat[,idx], 1, max)
# 1 2 3 4 5 6
# 0.952 0.971 0.968 0.965 0.961 0.978
You can easily put the idx stuff directly into the dat[,...] subsetting, but I broke it out here for clarity.
idx can be defined by numerous things here, such as "all but the first and last columns": idx <- names(dat)[-c(1, ncol(dat))]; or "anything that looks like P#": idx <- grep('^P[0-9]+', names(dat)).
There's an app, eh function for that :-).
max.col finds the index of the maximum position for each row of a matrix. Take note, that as max.col expects a matrix (numeric values only) you have to exclude the “geno” column when applying this function.
sapply(1:6,function(x) dat[x,max.col(dat[,2:5])[x] +1]) + dat$dif
[1] 0.952 0.971 0.968 0.965 0.961 0.978
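The same max.col idea can also be written without the sapply loop by indexing a numeric matrix with (row, column) pairs. This is a sketch on the six rows shown in the question; the names m and row_max are illustrative. Converting the P columns to a matrix first also avoids the which.max error above, which arises because a data frame is a list rather than a numeric vector.

```r
dat <- data.frame(
  geno = paste0("G", 1:6),
  P1  = c(0.015, 0.008, 0.009, 0.011, 0.013, 0.009),
  P2  = c(0.007, 0.006, 0.006, 0.007, 0.005, 0.006),
  P3  = c(0.026, 0.015, 0.017, 0.017, 0.021, 0.007),
  P4  = c(0.951, 0.970, 0.968, 0.965, 0.961, 0.977),
  dif = c(0.001, 0.001, 0.000, 0.000, 0.000, 0.001)
)

# Numeric matrix of the P columns only (no geno, no dif)
m <- as.matrix(dat[, c("P1", "P2", "P3", "P4")])

# Column index of each row's maximum, then pick those cells with matrix indexing
row_max <- m[cbind(seq_len(nrow(m)), max.col(m, ties.method = "first"))]

dat$dif + row_max
# [1] 0.952 0.971 0.968 0.965 0.961 0.978
```

ties.method = "first" is set explicitly because max.col breaks ties at random by default.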
