Generate vector (data) of a normal distribution with outliers? [closed] - r

In R, how do you generate a vector of data that contains outliers? Ideally the rest of the data would be approximately normally distributed.

@DWin is right that this depends on what you mean by "outlier". For the record, I use the same definition he does, so I would use (and have used) something like the code that he and @Ferdinand.kraft list. Others sometimes mean a datum more extreme than you would typically expect to find. That is tricky to define for a simulation study, but a common definition is a point more than 1.5 times the interquartile range below the 1st quartile or above the 3rd quartile. Here is a simple way to generate a sample that contains at least one such point (I'm sure there are more efficient ways):
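N <- 100  # sample size (assumed; N was not defined in the original)
flag <- 0
while (flag == 0) {
  X <- rnorm(N)
  # boxplot() reports points beyond the 1.5 * IQR whiskers in $out
  bp <- boxplot(X, plot = FALSE)
  if (length(bp$out) != 0) {
    flag <- 1  # stop once the sample contains at least one outlier
  }
}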
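For reference, the 1.5 times IQR rule can also be checked directly, without calling boxplot(). This is only a sketch: is_outlier and its argument k are names made up here, and quantile() computes the quartiles slightly differently from the hinges that boxplot() uses, so the two can disagree on borderline points.

```r
# Hypothetical helper: TRUE for points more than k * IQR below Q1 or above Q3.
is_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

set.seed(1)
x <- c(rnorm(100), 100, -100)  # 100 normal draws plus two planted extremes
which(is_outlier(x))           # indices of the flagged points
```

The two planted values sit at positions 101 and 102, so they should always be among the flagged indices; a few ordinary draws may be flagged as well.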

This really depends on the definition of "outlier":
c(rnorm(100), 100, -100) # an egregious example
plot(density( c( rnorm(90), rnorm(5, 1) ) ) ) # not as egregious
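A middle ground between those two examples is a contaminated normal sample, where a small fraction of points comes from a much wider distribution. The following is a rough sketch, not anyone's posted answer: rcontam, eps, and sd_out are hypothetical names, and the 5% contamination rate and standard deviation of 10 are arbitrary choices.

```r
# Hypothetical helper: n draws from N(0, 1), where each point is
# independently replaced, with probability eps, by a draw from N(0, sd_out^2).
rcontam <- function(n, eps = 0.05, sd_out = 10) {
  is_out <- runif(n) < eps                      # which points get contaminated
  x <- rnorm(n)                                 # the "clean" normal data
  x[is_out] <- rnorm(sum(is_out), sd = sd_out)  # overwrite with wide draws
  list(x = x, is_out = is_out)
}

set.seed(123)
sim <- rcontam(200)
sum(sim$is_out)   # how many points were drawn from the wide component
summary(sim$x)    # the range is stretched by the contaminated points
```

Returning is_out alongside x keeps track of which points were planted, which is handy when a simulation study needs to score an outlier detector.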


Trying to find the win/loss probability based on the previous result [closed]

I am new to data science and I came across this exercise that I can't figure out.
I have a data set containing binary data, which represents wins and losses for a team. I want to find the probability of a win or a loss given the result of the previous game.
Something like this:
            win   loss
prev win     ?     ?
prev loss    ?     ?
I am not asking for code here, though it would be helpful if you provided some. I just want to understand how to go about it.
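You can generate that table with prop.table() by comparing each result to the lagged (previous) result:
library(dplyr)
# Simulated results: 1 = win, 0 = loss
results <- data.frame(results = rbinom(100, 1, 0.5)) %>%
  mutate(prev_result = lag(results))
# Rows: previous result; columns: current result.
# margin = 1 makes each row sum to 1, giving P(current | previous).
prop.table(table(results$prev_result, results$results), margin = 1)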
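The same lag-and-tabulate idea can be sketched in base R, in case dplyr is not available. The variable names (res, prev, curr) are illustrative, and the data are simulated rather than taken from the question. Row-normalising with margin = 1 is what turns the counts into probabilities conditional on the previous game.

```r
set.seed(42)
res <- rbinom(100, 1, 0.5)  # simulated results: 1 = win, 0 = loss

prev <- head(res, -1)       # games 1 .. n-1 (the "previous" result)
curr <- tail(res, -1)       # games 2 .. n   (the "current" result)

tab <- table(prev = prev, curr = curr)  # counts of each (previous, current) pair
p <- prop.table(tab, margin = 1)        # each row sums to 1: P(curr | prev)
p
```

Reading the result: the row labelled 1 holds the estimated probabilities of a loss (column 0) and a win (column 1) after a win.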

I'm using set.seed() but getting different answers in each run [closed]

I thought that if I used set.seed() inside a function, the same seed would be used every time I ran that function and I would get the same pseudo-random output. Take the following example:
my_fun <- function(n, v1, v2){
set.seed = 42
return(runif(n, v1, v2))
}
my_fun(1,2,3)
#> [1] 2.078126
my_fun(1,2,3)
#> [1] 2.918556
my_fun(1,2,3)
#> [1] 2.189768
I was expecting to get the same result every time I ran that function with the same inputs. Can you explain why I don't?
set.seed() is a function that takes, as its argument, the value you want to seed the pseudorandom number generator (PRNG) with. The seed is the starting point for number generation. If you never set one, R creates a seed from the current time and the process ID; when you supply one, you fix the starting state and therefore every value generated after it. In your function, set.seed = 42 only assigns 42 to a local variable named set.seed; it never calls the function, so the generator is not reseeded.
So you need to call it like this to set your seed:
set.seed(42)
Here is another question with a good answer on what this function actually does: https://stats.stackexchange.com/questions/86285/random-number-set-seedn-in-r
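As a quick check, here is the function from the question with the assignment replaced by a call. Nothing else is changed.

```r
my_fun <- function(n, v1, v2) {
  set.seed(42)  # a call, not an assignment: this actually reseeds the generator
  runif(n, v1, v2)
}

a <- my_fun(1, 2, 3)
b <- my_fun(1, 2, 3)
identical(a, b)  # TRUE: both calls start from the same seed
```

One thing to be aware of: calling set.seed() inside a function also resets the random number stream for the rest of the session, so any random draws made after my_fun() returns will repeat as well.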

svm in R, train data set [closed]

This is more of a general question, but I have tagged it R since that is what I am using.
My training data set has 15,000 entries, of which I would like to use around 20 as the positive set for building the SVM. I wanted to use the remaining entries, resampled, as my negative set, but I wonder whether it would be better to draw a negative set of the same size (around 20), since otherwise the classes are highly imbalanced. Is there an easy way in R to pool the classifiers (ensemble-based) after 1000 rounds of resampling, perhaps even with the e1071 package?
Follow-up question: I would like to calculate a score for each prediction afterwards. Is it fine to just take the probabilities times 100?
Thanks
You can try a "class weight" approach, in which the smaller class gets a larger weight, so that misclassifying a positive-labelled example costs more.
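A minimal sketch of that idea with the e1071 package mentioned in the question, using its class.weights argument. The data here are simulated, the class sizes (500 negatives, 20 positives) and the feature names x1 and x2 are made up for illustration, and weighting by the inverse class ratio is just one common choice, not a recommendation from the original answer.

```r
library(e1071)

set.seed(1)
n_neg <- 500  # assumed size of the majority (negative) class
n_pos <- 20   # assumed size of the minority (positive) class

# Simulated two-feature data; the positives are shifted away from the negatives
d <- data.frame(
  x1 = c(rnorm(n_neg), rnorm(n_pos, mean = 2)),
  x2 = c(rnorm(n_neg), rnorm(n_pos, mean = 2)),
  y  = factor(c(rep("neg", n_neg), rep("pos", n_pos)))
)

# Weight each class inversely to its size so errors on "pos" cost more
w <- c(neg = 1, pos = n_neg / n_pos)

fit <- svm(y ~ x1 + x2, data = d, class.weights = w, probability = TRUE)

# Class probabilities are returned as an attribute of the predictions
pred <- predict(fit, d, probability = TRUE)
head(attr(pred, "probabilities"))
```

With weights like these the full negative set can be kept instead of being downsampled to 20 entries, although whether that beats repeated resampling would have to be checked on the real data.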

Mathematical induction proofs [closed]

For my theory of computation class, we are supposed to do some review/practice problems to work off the rust and make sure we are ready for the course. Some of the problems are induction proofs. I did this at one time, but apparently it has completely escaped me. I've watched a couple tutorials, but still can't do problem 'a'. If anyone can walk me through the first problem I'm pretty sure I could figure out the second one on my own. Any help would be appreciated!
First verify it holds for n = 1.
Then assume it is true for n = x (the sum of the first x squares) and try to compute the sum of the first x + 1 squares. You already know the result for the first x, so you just add the last square, (x + 1)^2, to that sum. From there it should be easy.
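As a worked illustration, assume problem 'a' is the standard identity 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6. The original problem statement is not reproduced here, so that is an assumption. Under it, the induction step described above would go roughly like this:

```latex
\begin{align*}
\sum_{k=1}^{x+1} k^2
  &= \frac{x(x+1)(2x+1)}{6} + (x+1)^2 \\
  &= \frac{(x+1)\bigl[x(2x+1) + 6(x+1)\bigr]}{6} \\
  &= \frac{(x+1)(2x^2 + 7x + 6)}{6} \\
  &= \frac{(x+1)(x+2)(2x+3)}{6}
\end{align*}
```

The last line is the assumed formula with n = x + 1, which completes the step. The base case n = 1 checks out as 1 = (1)(2)(3)/6.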
And you posted on the wrong site.

Finding i values where Y = a number [closed]

I'm a student just learning how to use R, and so far I've made a bit of progress. I'm stuck on a question that asks: for what values of i does Y equal 3?
The data set: c(3,5,2,3,5,4,4,2,3,5)
If I understand your question correctly, you want the indices i of the data set (in this case, a vector) Y such that Y[i] == 3?
Then you just need the which() function. For more information, read its help page, which you can open with ?which or help(which).
Now, some code:
# Your data
Y <- c(3, 5, 2, 3, 5, 4, 4, 2, 3, 5)
# Find the indices where Y is equal to 3
which(Y == 3)
#> [1] 1 4 9
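To see why this works, it can help to look at the intermediate logical vector that which() receives. This is just a small illustration of the same call; idx is a name chosen here.

```r
Y <- c(3, 5, 2, 3, 5, 4, 4, 2, 3, 5)

Y == 3                # logical vector: TRUE wherever the element equals 3
idx <- which(Y == 3)  # positions of the TRUE values
idx
Y[idx]                # the matching elements themselves (all equal to 3)
sum(Y == 3)           # how many matches there are
```

which() simply converts the TRUE positions into integer indices, so the comparison Y == 3 is doing the real work.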
And welcome to SO. This is a pretty common question for beginners, so next time, make sure you Google or search around for a solution to elementary problems such as these. Have a good day.
