Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
set.seed(1234)
dataPartition <- sample(2,nrow(data),replace=TRUE,prob=c(0.7,0.3))
trainData <- data[dataPartition ==1,]
testData <- [dataPartition ==2,]
It partitions your data into two groups.
sample(2,nrow(data),replace=TRUE,prob=c(0.7,0.3))
This draws a vector as long as the number of rows in your data, made up of 1s and 2s drawn with probabilities 0.7 and 0.3 respectively.
trainData <- data[dataPartition ==1,]
testData <- data[dataPartition ==2,] ## Fixed: the original was missing `data` before the brackets
This simply divides your data into two parts so that (I presume) you can validate a model: fit it on trainData and check it on testData.
Here is a more detailed answer on why you would divide your data into training and test sets:
https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set
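As a minimal sketch of the split described above, the same code can be run on R's built-in iris data frame, which stands in here for the asker's data (an assumption, since the original data is not shown). Because each row is assigned to a group independently, the split is only approximately 70/30.

```r
set.seed(1234)

# Stand-in data: the built-in iris data frame (150 rows), not the asker's data
data <- iris

# One label (1 or 2) per row, drawn with probabilities 0.7 and 0.3
dataPartition <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))

trainData <- data[dataPartition == 1, ]
testData  <- data[dataPartition == 2, ]

# How many rows landed in each group
print(table(dataPartition))

# Every row ends up in exactly one of the two sets
print(nrow(trainData) + nrow(testData) == nrow(data))
```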
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I am trying to make a histogram of grades. Here are my variables.
> grade <- factor(c("A","A","A","B","A","A","A","A","B","A","C","B","B","B"))
> numberBook <- c(53,42,40,40,39,34,34,30,28,24,22,21,20,16)
But when I plot it, I get an error message.
> hist(numberBook~grade)
Error in hist.default(numberBook ~ grade) : 'x' must be numeric
What can I do?
I'm not sure why each grade appears several times, so I've guessed that you want a total for each of A, B and C. This may not be quite what you are after. I've recreated your data with rep, repeating each grade by the sum of its book counts (this could be wrong):
data <- c(rep("A",(53+42+40+39+34+34+30+24)), rep("B",(40+28+21+20+16)), rep("C",22))
Then I can plot the data using barplot:
barplot(prop.table(table(data)))
A barplot is probably what you want here, since hist expects a numeric vector rather than a formula with a factor.
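If the goal really is the total number of books per grade, a sketch that avoids rebuilding the data by hand is to sum numberBook within each grade using tapply and plot the result. Whether totals are what the asker wants is still a guess; the name totals and the axis labels are illustrative only.

```r
grade <- factor(c("A","A","A","B","A","A","A","A","B","A","C","B","B","B"))
numberBook <- c(53,42,40,40,39,34,34,30,28,24,22,21,20,16)

# Sum the book counts within each grade level
totals <- tapply(numberBook, grade, sum)
print(totals)

# One bar per grade, bar height = total number of books
barplot(totals, xlab = "Grade", ylab = "Total number of books")

# Unlike hist(), boxplot() accepts a formula, in case the
# distribution of counts within each grade is of interest instead
boxplot(numberBook ~ grade, xlab = "Grade", ylab = "Number of books")
```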
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have a number and a vector of numbers.
test <- 0.495
vector <- c(0.5715122, 2.2860487, 5.1436096, 9.1441949)
From this vector I need to pick the number that is closest to 0.495.
Please help.
If I've understood correctly, you want to extract the value from a vector that is closest to your test value.
vector[which.min(abs(vector - test))]
#[1] 0.5715122
If two different values could be closest, you could do this:
vector <- c(0.5715122, 2.2860487, 5.1436096, 9.1441949, 0.4184878)
tol <- sqrt(.Machine$double.eps)
vector[which(abs(vector - test) - min(abs(vector - test)) < tol)]
#[1] 0.5715122 0.4184878
tol is a tolerance that accounts for floating-point accuracy; it is usually chosen based on help(".Machine").
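The two steps above can be wrapped into one small helper. This is only a sketch: the function name closest_values and its argument names are made up for illustration, and it assumes x contains no missing values.

```r
# Return every element of x whose distance to target is within tol
# of the smallest distance (so ties are all returned)
closest_values <- function(x, target, tol = sqrt(.Machine$double.eps)) {
  d <- abs(x - target)
  x[which(d - min(d) < tol)]
}

test <- 0.495
vector <- c(0.5715122, 2.2860487, 5.1436096, 9.1441949, 0.4184878)

# Both 0.5715122 and 0.4184878 are equally far from 0.495
print(closest_values(vector, test))
```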
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I want to generate a plot looking like this:
Could you give me a hint on how to achieve that? I started with:
T1 <- c(23.2,34.5,76.3,65.8,12.6)
T2 <- c(15.6,12.4,21.8,20,5.2)
T3 <- c(15.6,12.4,21.8,20,5.2)
A <- gl(5,1,5,labels=c("Mähen","Wenden","Schwaden","Pressen","Abtransport"))
data <- cbind(T1,T2,T3)
rownames(data) <- levels(A)
barplot(T1,names.arg=levels(A))
barplot(T3,names.arg=levels(A))
#barplot(t(data),beside=F, ylim=c(0,100),legend.text=colnames(data),
barplot(t(data),beside=F, legend.text=colnames(data),
col=c("grey50","grey80"),ylab="Arbeitszeitbedarf [h/ha]")
This is roughly what you requested, apart from the values that you did not provide.
png('rplot2.png')
par(mar = c(5, 4, 4, 5))  # wider right margin than the default
data <- cbind('T1 - Grundzeit' = T1, 'T2 - Hilfszeit' = T2)
rownames(data) <- levels(A)
# t(data): one bar per work step, with T1 and T2 stacked inside it
barplot(t(data), beside = FALSE, legend.text = colnames(data),
        col = c("grey50", "grey80"), ylab = "Arbeitszeitbedarf [h/ha]",
        args.legend = list(inset = 4, x = 7, y = 70))
dev.off()
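The reason for the transpose is that barplot draws one bar per column of a matrix and stacks the rows inside it. Here is a small self-contained sketch of that, assuming only T1 and T2 are wanted. Printing each bar's total above it is an addition for illustration, not something the requested plot is known to contain, and the names heights, steps, totals and mids are made up.

```r
T1 <- c(23.2, 34.5, 76.3, 65.8, 12.6)
T2 <- c(15.6, 12.4, 21.8, 20, 5.2)
steps <- c("Mähen", "Wenden", "Schwaden", "Pressen", "Abtransport")

# Rows are the stacked segments, columns are the bars
heights <- rbind(T1, T2)
colnames(heights) <- steps

# Total height of each stacked bar
totals <- colSums(heights)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(heights, beside = FALSE, col = c("grey50", "grey80"),
                ylim = c(0, max(totals) * 1.15),
                legend.text = rownames(heights),
                ylab = "Arbeitszeitbedarf [h/ha]")

# Write each bar's total just above it
text(mids, totals, labels = totals, pos = 3)
```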
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Does anybody know of examples of how to run a paired t-test in Matlab/R/SAS or Python/Java on many columns (I have 1139 variables), either for all combinations or for selected pairs of columns in a loop?
Thank you.
MATLAB Solution:
If I understand correctly, you're just looking for a way to feed ttest two different columns from your input matrix each time. You can get all possible combinations of column pairs using nchoosek:
pairs = nchoosek(1:size(X, 2), 2);
Now you can iterate over these indices, each time invoking ttest with a different pair:
for idx = transpose(pairs)
    h = ttest(X(:, idx(1)), X(:, idx(2)));
    %// Do something with the result...
end
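Since the question also mentions R, a rough R counterpart of the loop above might look like the following. It is a sketch under assumptions: X is a hypothetical numeric matrix with one variable per column (random numbers here), and the names pairs, pvals and result are illustrative. combn plays the role of nchoosek, and t.test with paired = TRUE runs the paired test.

```r
set.seed(1)

# Hypothetical data: 10 observations of 5 variables
X <- matrix(rnorm(50), nrow = 10, ncol = 5)

# All column pairs, one pair per column of this 2-row matrix
pairs <- combn(ncol(X), 2)

# Paired t-test for each pair, keeping only the p-value
pvals <- apply(pairs, 2, function(idx) {
  t.test(X[, idx[1]], X[, idx[2]], paired = TRUE)$p.value
})

result <- data.frame(col1 = pairs[1, ], col2 = pairs[2, ], p.value = pvals)

# Optional: adjust for the large number of comparisons
result$p.adjusted <- p.adjust(result$p.value, method = "holm")

print(head(result))
```

With 1139 columns this produces choose(1139, 2) = 648091 pairs, so some form of multiple-comparison adjustment is worth considering.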
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a table with two variables. The data is from NMR, so when I plot it I get a spectrum. I found the peaks in the plot, but I need to know how to list the peak values and store them in a variable. Can anyone please help?
An easy implementation based on Brian Ripley's post at R-help:
peaks <- function(x, halfWindowSize) {
  windowSize <- halfWindowSize * 2 + 1
  windows <- embed(x, windowSize)
  localMaxima <- max.col(windows, "first") == halfWindowSize + 1
  return(c(rep(FALSE, halfWindowSize), localMaxima, rep(FALSE, halfWindowSize)))
}
Example:
x <- c(1,3,1,3,1)
peaks(x, 1)
## [1] FALSE TRUE FALSE TRUE FALSE
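To list the peak values and store them in a variable, the logical vector returned by peaks can be used to subset the data. The sketch below assumes the table is a data frame with two columns; the names spectrum, ppm and intensity are hypothetical, and the numbers are toy values rather than real NMR data.

```r
peaks <- function(x, halfWindowSize) {
  windowSize <- halfWindowSize * 2 + 1
  windows <- embed(x, windowSize)
  localMaxima <- max.col(windows, "first") == halfWindowSize + 1
  return(c(rep(FALSE, halfWindowSize), localMaxima, rep(FALSE, halfWindowSize)))
}

# Toy two-column table standing in for the NMR data
spectrum <- data.frame(ppm = c(0.5, 1.0, 1.5, 2.0, 2.5),
                       intensity = c(1, 3, 1, 3, 1))

# TRUE at rows where the intensity is a local maximum
isPeak <- peaks(spectrum$intensity, 1)

# Row numbers of the peaks, and the matching rows of the table
peakIndex <- which(isPeak)
peakTable <- spectrum[isPeak, ]

print(peakTable)
```

peakTable then holds both the position and the height of every detected peak, and peakTable$intensity gives the peak values alone.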