I have fitted a nonlinear function to my data, and the fit seems to be working correctly:
#define a function
fncTtr <- function(n,d) (d/n)*((sqrt(1+2*(n/d))-1))
#fit
dFit <- nls(dData$ttr~fncTtr(dData$n,d),data=dData,start=list(d=25),trace=T)
summary(dFit)
plot(dData$ttr~dData$n,main="Fitted d value",pch=19,)
xl <- seq(min(dData$n),max(dData$n), (max(dData$n) - min(dData$n))/1000)
lines(xl,predict(dFit,newdata=xl),col="blue")
The plot of my observations comes out correctly, but I am having problems displaying the best-fit curve on it. I create the independent variable xl with 1000 values and want to compute the corresponding fitted values. When I call lines(), I get the error message:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
If I try to execute only the predict function:
a <-predict(dFit,newdata=xl)
str(a)
I can see that xl has 1000 components but "a" has only 16 components. Shouldn't I have the same number of values in a?
Data used:
    n    ttr      d
1  35 0.6951 27.739
2  36 0.6925 28.072
3  37 0.6905 28.507
4  38 0.6887 28.946
5  39 0.6790 28.003
6  40 0.6703 27.247
7  41 0.6566 25.735
8  42 0.6605 26.981
9  43 0.6567 27.016
10 44 0.6466 26.026
11 45 0.6531 27.667
12 46 0.6461 27.128
13 47 0.6336 25.751
14 48 0.6225 24.636
15 49 0.6214 24.992
16 50 0.6248 26.011
OK, I think I found the solution.
When calling predict.nls, what you pass to the newdata argument has to be a named list or data frame whose names match the predictor variables used in the original call to nls (here n). Because the original formula referred to dData$ttr and dData$n directly, predict() had nothing to substitute and simply evaluated the model on the 16 rows of dData again, which is why "a" had only 16 components.
# Here I replaced dData$ttr and dData$n with the bare column names ttr and n
dFit <- nls(ttr~fncTtr(n,d),data=dData,start=list(d=25),trace=T)
plot(dData$ttr~dData$n,main="Fitted d value",pch=19)
xl <- seq(min(dData$n),max(dData$n),length.out=1000)
a <- predict(dFit,newdata=list(n=xl))
length(a)==length(xl)
# [1] TRUE
lines(xl,a,col="blue")
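To see why the names matter, here is a minimal sketch that fits the same model both ways on the n and ttr columns from the question and compares how many values predict() returns. The object names fitDollar, fitBare and newN are made up for this illustration.

```r
# n and ttr columns copied from the question
dData <- data.frame(
  n   = 35:50,
  ttr = c(0.6951, 0.6925, 0.6905, 0.6887, 0.6790, 0.6703, 0.6566, 0.6605,
          0.6567, 0.6466, 0.6531, 0.6461, 0.6336, 0.6225, 0.6214, 0.6248)
)
fncTtr <- function(n, d) (d / n) * (sqrt(1 + 2 * (n / d)) - 1)

# Formula written with dData$...: predict() has no variable name to substitute
fitDollar <- nls(dData$ttr ~ fncTtr(dData$n, d), data = dData, start = list(d = 25))

# Formula written with bare column names: newdata is used
fitBare <- nls(ttr ~ fncTtr(n, d), data = dData, start = list(d = 25))

# Hypothetical prediction grid; the column must be called n
newN <- data.frame(n = seq(35, 50, length.out = 200))

nDollar <- length(predict(fitDollar, newdata = newN))
nBare   <- length(predict(fitBare,   newdata = newN))
```

With the dData$n formula the prediction is still made on the original rows, so nDollar equals the number of observations, while nBare equals the number of rows in newN.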
I am trying to run a function while sequentially adding sites (x + i) to a data frame, so that each step returns the statistic together with its confidence interval. For example, I want to fit a linear model, adding one more site at every iteration, to better understand how the additional data from each site influence the fit. However, I want each iteration to cover every possible combination of sites of that size, so that I can obtain a confidence interval for that iteration. In its current form, my code randomly samples one set of sites per iteration, not all possible sets for a given "x + i" iteration.
I know this particular issue could be addressed with the 'dredge' function. However, ideally I would set this up in a way so that I could easily [with some adjustment] replace the current linear model function with any other function (e.g., metaMDS, diversity).
I am sure there is a better way to perform this, but I am a relative newbie to these types of analyses. Any suggestions would be greatly appreciated!
Edit: I have been considering passing the below function through 'boot' although I haven't quite been able to get this loop to function in boot.
# data
library(dplyr) # assumed source of %>% (magrittr also provides it)
set.seed(45)
dat <- data.frame(site=rep(LETTERS[1:6],3),mean=sample(1:20,18),rich=sample(5:32,18))
model<-lm(mean~rich,dat) # the full model
summary(model)
my_vec <- character() # Create empty character vector
my_site <- character() # Create empty character vector
for(i in seq(from=1, to=6, by=1)){ # increase number of sites at each iteration
dat_seq<-dat %>% subset(site %in% sample(levels(as.factor(site)), i)) # subset data based on number of sites
model<-lm(mean~rich,dat_seq)
result<-summary(model)$r.squared
my_out<-result
my_vec<-c(my_vec,my_out)
my_site<-c(my_site,i)
lm_results<-data.frame(sync=my_vec, site_no = my_site)
}
Something like this might help? Here I generate every combination of sites in the dataset (the combs list), then lapply the model over the subset of the data corresponding to each combination. The lower and upper confidence limits for the slope of rich and the R^2 are returned.
x <- unique(dat$site)
combs <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
do.call(rbind, lapply( combs , function(x) {
dat2 = dat[dat$site %in% x,]
mod = lm(mean~rich, dat2)
data.frame(sites=paste(x, collapse=""),
lci=confint(mod)["rich",1],
uci=confint(mod)["rich",2],
r2=summary(mod)$r.squared)
})
)
sites lci uci r2
1 A -8.3174474 7.221600752 0.4453499992
2 B -5.5723683 5.818599482 0.0701472479
3 C -1.8397082 1.928749330 0.0826810176
4 D -3.5504781 2.253774792 0.8895987733
5 E -1.9782218 0.783889792 0.9679338880
6 F -0.3642690 0.202676480 0.9291569087
7 AB -1.0726850 0.631838143 0.1141900799
8 AC -1.0156746 0.486238667 0.1932050717
9 AD -1.3744991 0.089962986 0.5972134174
10 AE -1.3425429 0.359346030 0.3914262598
11 AF -1.2542336 1.094735972 0.0088070439
12 BC -0.3148719 0.536493520 0.1155061842
13 BD -0.8115027 0.263460008 0.3337377806
14 BE -1.0264258 0.376744253 0.2923566879
15 BF -1.1047222 0.961865064 0.0091250127
16 CD -0.9745928 0.341039802 0.3088694252
17 CE -0.9413738 0.549038074 0.1178103209
18 CF -0.8967742 1.165648399 0.0317149663
19 DE -0.8081655 -0.063530819 0.7253472880
20 DF -0.4928491 0.673804531 0.0443092831
21 EF -0.9565739 0.524655918 0.1407909531
22 ABC -0.5962015 0.353999681 0.0493374108
23 ABD -0.8365224 0.110852413 0.3191087122
24 ABE -0.8760695 0.210841908 0.2303024575
25 ABF -0.8266745 0.633602031 0.0137712837
26 ACD -0.9065180 0.066518021 0.3731538462
27 ACE -0.8472338 0.235549937 0.2031338155
28 ACF -0.7522162 0.720252734 0.0003762516
29 ADE -0.9661169 -0.041025998 0.4863258317
30 ADF -0.7657306 0.559208857 0.0190378530
31 AEF -0.8971295 0.489083497 0.0647322193
32 BCD -0.5771897 0.206912590 0.1511964736
33 BCE -0.5802808 0.341276672 0.0509875519
34 BCF -0.5806002 0.737926299 0.0112444750
35 BDE -0.6864459 0.004527069 0.4375645756
36 BDF -0.5930715 0.460544893 0.0124799554
37 BEF -0.8077064 0.411788016 0.0776553121
38 CDE -0.7399438 0.108174895 0.3071099077
39 CDF -0.5535068 0.623295610 0.0028013813
40 CEF -0.6905084 0.598692027 0.0040352416
41 DEF -0.5691343 0.342877359 0.0468583354
42 ABCD -0.6438371 0.095450002 0.2145588181
43 ABCE -0.6248798 0.195737009 0.1195408994
44 ABCF -0.5714679 0.519529413 0.0011238991
45 ABDE -0.7459710 -0.015192501 0.3500598278
46 ABDF -0.6397934 0.354865639 0.0391438801
47 ABEF -0.7297368 0.343203399 0.0605325928
48 ACDE -0.7739688 0.003126375 0.3281841191
49 ACDF -0.6236834 0.433241141 0.0158627591
50 ACEF -0.6696598 0.429949692 0.0230490498
51 ADEF -0.6839477 0.287476657 0.0763805047
52 BCDE -0.5735044 0.083072486 0.2169111169
53 BCDF -0.4853537 0.426339044 0.0020758928
54 BCEF -0.5621108 0.444630022 0.0067151679
55 BDEF -0.5715836 0.240391871 0.0762941714
56 CDEF -0.5364817 0.363030081 0.0181252387
57 ABCDE -0.6208064 0.020647714 0.2391257190
58 ABCDF -0.5292293 0.315066335 0.0225784375
59 ABCEF -0.5621816 0.333684980 0.0228222717
60 ABDEF -0.6093804 0.195345360 0.0867885013
61 ACDEF -0.5890752 0.262323665 0.0502230537
62 BCDEF -0.4898635 0.265972273 0.0305394982
63 ABCDEF -0.5239122 0.198342387 0.0539903463
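Since the question asks for something where the model can be swapped out, here is a rough sketch of the same idea with the statistic passed in as a function. The helper name stat_by_site_combos and the r2_fun example are made up for the illustration, and summarising the spread per number of sites with quantiles is only one possible choice, not a formal confidence interval.

```r
set.seed(45)
dat <- data.frame(site = rep(LETTERS[1:6], 3),
                  mean = sample(1:20, 18),
                  rich = sample(5:32, 18))

# Hypothetical helper: apply stat_fun to the rows of every site combination
stat_by_site_combos <- function(dat, stat_fun) {
  sites <- unique(dat$site)
  combs <- do.call(c, lapply(seq_along(sites), combn, x = sites, simplify = FALSE))
  data.frame(
    n_sites = lengths(combs),
    sites   = vapply(combs, paste, character(1), collapse = ""),
    stat    = vapply(combs, function(s) stat_fun(dat[dat$site %in% s, ]), numeric(1))
  )
}

# Any function that takes a data frame and returns one number can be plugged in
r2_fun <- function(d) summary(lm(mean ~ rich, data = d))$r.squared
res <- stat_by_site_combos(dat, r2_fun)

# Spread of the statistic for each number of sites (2.5%, 50%, 97.5% quantiles)
ranges <- aggregate(stat ~ n_sites, data = res,
                    FUN = function(v) quantile(v, c(0.025, 0.5, 0.975)))
```

To use a different statistic, only r2_fun needs to change, as long as it still returns a single numeric value per subset.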
I am trying to write a loop that runs compare_means (from the ggpubr package in R) on every column of a data frame and then keeps only the significant adjusted p-values, but it does not work.
Here is some code:
head(df3)
sampleID  Actio  Beta  Gammes  Traw  Cluster2
gut10     10     2.2   55      13    HIGH
gut12     20     44    67      12    HIGH
gut34     5.5    3     89      33    LOW
gut26     4      45    23      4     LOW
library(ggpubr)
data<-list()
for (i in 2:length(df3)){
data<-compare_means(df3[[i]] ~ Cluster2, data=df3, paired = FALSE,p.adjust.method="bonferroni",method = "wilcox.test")
}
Error: `df3[i]` must evaluate to column positions or names, not a list
I would like to collect all the information from the compare_means output into a single data frame.
Thanks a lot
Try this:
library(ggpubr)
data<-list()
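for (i in 2:(length(df3)-1)){
# select by name: c(i,"Cluster2") would coerce i to a character string
new<-df3[,c(names(df3)[i],"Cluster2")]
colnames(new)<-c("interest","Cluster2")
# store each result under its column name instead of overwriting data
data[[names(df3)[i]]]<-compare_means(interest ~ Cluster2, data=new, paired = FALSE,p.adjust.method="bonferroni",method = "wilcox.test")
}
# stack the per-column results, keeping the column name in "variable"
results<-dplyr::bind_rows(data,.id="variable")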
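For the "keep only the significant adjusted p-values" part, here is a sketch in base R. Note that it swaps compare_means for wilcox.test plus p.adjust, and that it applies the Bonferroni correction across the tested columns, which is an assumption about what is wanted. The toy df3 below is invented so the example runs on its own.

```r
set.seed(1)
# Invented toy data in the same shape as df3 from the question
df3 <- data.frame(
  sampleID = paste0("gut", 1:12),
  Actio    = c(rnorm(6, mean = 10), rnorm(6, mean = 0)),  # clearly differs between clusters
  Beta     = rnorm(12),
  Cluster2 = rep(c("HIGH", "LOW"), each = 6)
)

value_cols <- setdiff(names(df3), c("sampleID", "Cluster2"))

# One two-sample Wilcoxon test per measurement column
p_raw <- vapply(value_cols, function(col) {
  x <- df3[[col]]
  wilcox.test(x[df3$Cluster2 == "HIGH"], x[df3$Cluster2 == "LOW"])$p.value
}, numeric(1))

res <- data.frame(variable = value_cols,
                  p        = p_raw,
                  p.adj    = p.adjust(p_raw, method = "bonferroni"))

# Keep only the columns that are still significant after adjustment
significant <- res[res$p.adj < 0.05, ]
```

The same filtering step (subsetting on p.adj < 0.05) should carry over to a combined compare_means result, since that output also has a p.adj column.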
I use the caret package with a multi-layer perceptron.
My dataset has a labelled output value, which can be A, B or C. The input vector consists of 4 variables.
I use the following lines of code to calculate the class probabilities for each input value:
fit <- train(device~.,data=dataframetrain[1:100,], method="mlp",
trControl=trainControl(classProbs=TRUE))
(p=(predict(fit,newdata=dataframetest,type=("prob"))))
I thought that the class probabilities for each record must sum up to one. But I get the following:
rowSums(p)
# 1 2 3 4 5 6 7 8
# 1.015291 1.015265 1.015291 1.015291 1.015291 1.014933 1.015011 1.015291
# 9 10 11 12 13 14 15 16
# 1.014933 1.015206 1.015291 1.015291 1.015291 1.015224 1.015011 1.015291
Can anybody help? I don't know what I did wrong.
There's probably nothing wrong; it just seems that caret returns the values of the neurons in the output layer without converting them to probabilities (correct me if I'm wrong). When using the RSNNS::mlp function outside of caret, the rows of the predictions also don't sum to one.
Since all output neurons have the same activation function, the outputs can be converted to probabilities by dividing the predictions by the respective row sum; see this question.
This behavior seems to hold for method = "mlp" and method = "mlpWeightDecay", but with method = "nnet" the predictions do sum to one.
Example:
library(RSNNS)
data(iris)
#shuffle the vector
iris <- iris[sample(1:nrow(iris),length(1:nrow(iris))),1:ncol(iris)]
irisValues <- iris[,1:4]
irisTargets <- iris[,5]
irisTargetsDecoded <- decodeClassLabels(irisTargets)
iris2 <- splitForTrainingAndTest(irisValues, irisTargetsDecoded, ratio=0.15)
iris2 <- normTrainingAndTestSet(iris2)
set.seed(432)
model <- mlp(iris2$inputsTrain, iris2$targetsTrain,
size=5, learnFuncParams=c(0.1), maxit=50,
inputsTest=iris2$inputsTest, targetsTest=iris2$targetsTest)
predictions <- predict(model,iris2$inputsTest)
head(rowSums(predictions))
# 139 26 17 104 54 82
# 1.0227419 1.0770722 1.0642565 1.0764587 0.9952268 0.9988647
probs <- predictions / rowSums(predictions)
head(rowSums(probs))
# 139 26 17 104 54 82
# 1 1 1 1 1 1
# nnet example --------------------------------------
library(caret)
training <- sample(seq_along(irisTargets), size = 100, replace = F)
modelCaret <- train(y = irisTargets[training],
x = irisValues[training, ],
method = "nnet")
predictionsCaret <- predict(modelCaret,
newdata = irisValues[-training, ],
type = "prob")
head(rowSums(predictionsCaret))
# 122 100 89 134 30 86
# 1 1 1 1 1 1
I don't know how much flexibility the caret package offers in these choices, but the standard way to make a neural net produce outputs which sum to one is to use the softmax function as the activation function in the output layer.
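As a small illustration of the two normalisations mentioned in this thread, the sketch below applies both row-sum scaling and a softmax to a made-up matrix of output values. The numbers and the helper name softmax_rows are invented; in a real network the softmax would normally be the output layer's activation rather than a post-processing step, so the two results are not expected to match numerically.

```r
# Made-up output-layer values for three records and three classes
raw <- matrix(c(0.90, 0.08, 0.04,
                0.10, 0.85, 0.07,
                0.02, 0.05, 0.95),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("A", "B", "C")))

# Option 1: divide each row by its sum
normalised <- raw / rowSums(raw)

# Option 2: row-wise softmax (hypothetical helper)
softmax_rows <- function(m) {
  e <- exp(m - apply(m, 1, max))  # subtract each row's maximum for numerical stability
  e / rowSums(e)
}
soft <- softmax_rows(raw)

rowSums(normalised)
rowSums(soft)
```

Both versions give rows that sum to one and keep the same winning class per row, but the softmax changes the relative sizes of the values.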
I'm trying to obtain a surface plot from data frame AAA:
j a m p o f
13929 0.86739583 19 165.83 0.1588727 13.24444
13930 0.63166667 19 178.19 0.6105804 12.68333
13932 0.90212963 17 157.77 0.3345627 12.52222
13933 0.80152778 68 146.19 0.1219885 12.35000
13934 0.75784722 62 134.88 0.1531627 12.36667
13935 0.57763889 66 123.80 0.4093869 12.47500
13936 0.56201389 88 112.87 0.9095722 12.45833
13937 0.51680556 26 102.03 0.8494420 12.37500
13938 0.46093333 28 91.20 0.9153419 12.21111
13939 0.16645833 24 80.30 0.8309784 12.04444
13940 0.15451389 36 69.23 2.2847927 12.15556
13941 0.51347222 134 57.92 2.9551087 12.42500
13942 0.33763889 128 46.31 3.5784096 12.53333
13943 0.12937500 38 34.33 3.7371723 12.47778
13944 0.42760870 63 22.00 4.7831677 12.46667
13945 0.09962121 8 9.36 4.8281897 12.30000
13950 0.97901515 18 57.70 0.0000000 12.15833
13951 0.85333333 14 71.07 0.0000000 12.48333
13952 0.92811594 14 84.28 10.0444672 12.49167
13953 0.84812500 42 97.29 7.8020987 12.51667
My code:
require(fields)
fitx <- Tps( AAA[, 4:6], AAA$a)
out.p <- predict.surface(fitx, xy = c(4,5))
plot.surface(out.p, type="p")
However, it doesn't run through. Apparently, the grid is insufficient to represent the data and it's not able to get the predict.surface.
In the Tps function, your x matrix is AAA[, 4:6] and hence has three columns.
But in the predict.surface function, you specified xy = c(4,5). The values passed to the xy parameter are relative to the matrix stored in your fitx object. Since the matrix used for creating fitx with Tps has only three columns, you can't refer to a 4th and 5th column. Instead, columns 4 and 5 of your original data.frame AAA correspond to columns 1 and 2 in fitx.
You might wish to try:
library(fields)
fitx <- Tps(AAA[, 4:6], AAA$a)
out.p <- predict.surface(fitx, xy = c(1,2)) # Note the different argument passed to `xy`
plot.surface(out.p, type="p")
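If it helps, the index translation can be done by name rather than by counting. This is only a base R sketch on the first three rows of AAA from the question; the variables x, orig_cols and xy are made up for the illustration.

```r
# First three rows of AAA from the question
AAA <- data.frame(j = c(13929, 13930, 13932),
                  a = c(0.86739583, 0.63166667, 0.90212963),
                  m = c(19, 19, 17),
                  p = c(165.83, 178.19, 157.77),
                  o = c(0.1588727, 0.6105804, 0.3345627),
                  f = c(13.24444, 12.68333, 12.52222))

x <- AAA[, 4:6]          # the matrix handed to Tps: columns p, o, f
orig_cols <- c(4, 5)     # positions in AAA that should span the surface

# Positions of the same columns inside x
xy <- match(names(AAA)[orig_cols], names(x))
```

Here xy is the pair of positions relative to x, which is the form the xy argument expects.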
I have a dataset that looks like so:
x y
1 0.0000 0.4459183993
2 125.1128 0.4068805502
3 250.2257 0.3678521348
4 375.3385 0.3294434397
5 500.4513 0.2922601919
6 625.5642 0.2566381551
7 750.6770 0.2229130927
8 875.7898 0.1914207684
9 1000.9026 0.1624969456
10 1126.0155 0.1364773879
11 1251.1283 0.1136978589
12 1376.2411 0.0944717371
13 1501.3540 0.0786550515
14 1626.4668 0.0656763159
15 1751.5796 0.0549476349
16 1876.6925 0.0458811131
17 2001.8053 0.0378895151
18 2126.9181 0.0304416321
19 2252.0309 0.0231041362
20 2377.1438 0.0154535572
21 2502.2566 0.0070928195
22 2627.3694 -0.0020708606
23 2752.4823 -0.0119351534
24 2877.5951 -0.0223944877
25 3002.7079 -0.0332811155
26 3127.8208 -0.0442410358
27 3252.9336 -0.0548855203
...
Full data available here.
It's easier to see visually by plotting x and y with a zero intercept line:
ggplot(dat,aes(x,y)) + geom_line() + geom_hline(yintercept=0)
You can see the plot here (if you don't want to download the data and plot it yourself.)
I want to pick out 'patches' defined as the distance along x from when the line goes above zero on the y till it goes below zero. This will always happen at least once (since the line starts above zero), but can happen many times.
Picking out the first patch is easy.
patch1=dat[min(which(dat$y<=0.000001)),]
But how would I loop through and pick up subsequent patches?
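Here's a working approach on sample data. It finds where y changes from non-positive to positive and takes the distance along x between those crossings:
# sample data
df <- data.frame(x=1:10, y=rnorm(10))
# indices where "y" changes from non-positive to positive
idx <- which(c(FALSE, diff(df$y > 0) == 1))
# distance along "x" between successive upward crossings (the first is measured from x = 0)
patches <- diff(c(0, df[idx, "x"]))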
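If a patch is instead meant to run from the point where y rises above zero to the point where it next drops to zero or below, run-length encoding is one way to sketch it. The toy data are invented, and treating the first non-positive sample as the end of a patch is an assumption about the definition.

```r
# Invented toy series: positive, negative, positive, negative, positive, negative
dat <- data.frame(x = seq(0, 9),
                  y = c(0.4, 0.2, -0.1, -0.3, 0.1, 0.5, 0.2, -0.2, 0.3, -0.1))

runs   <- rle(dat$y > 0)             # consecutive runs above / not above zero
ends   <- cumsum(runs$lengths)       # last row of each run
starts <- ends - runs$lengths + 1    # first row of each run
pos    <- runs$values                # TRUE for runs above zero

# A patch ends at the first row after the positive run; if the series
# finishes while still above zero, the patch is cut off at the last row
stop_idx <- pmin(ends[pos] + 1, nrow(dat))

patches <- data.frame(start_x = dat$x[starts[pos]],
                      end_x   = dat$x[stop_idx])
patches$length <- patches$end_x - patches$start_x
```

Each row of patches is one stretch above zero, so there is no need for an explicit loop over the crossings.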