How do I analyze movement between points in R? [closed] - r

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
So I have a lot of points, kind of like this:
animalid1;A;time
animalid1;B;time
animalid1;C;time
animalid2;A;time
animalid2;B;time
animalid2;A;time
animalid2;B;time
animalid2;C;time
animalid3;A;time
animalid3;B;time
animalid3;C;time
animalid3;B;time
animalid3;A;time
What I want to do is, first of all, make R understand that the points A, B, C are connected. Then I want to compare movements from A to C: how long they take, how many steps are used, and so on. So maybe I have a movement sequence like ABC in 20 animals, ABABC in 10 animals, and ABCBA in 5 animals, and I want some sort of statistical test to see whether the total time differs between these groups, and so on.
I bet this has been done before. But my Google skills are not good enough to find it.

Look at the msm package (msm stands for Multi-State Model). Given observations of states at different times, it will estimate transition probabilities and the average time spent in each state.
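As a minimal sketch (assuming the data live in a semicolon-separated file with columns animal id, location and a numeric time stamp; the file name, the column names and the initial Q matrix below are placeholders you would adapt):

library(msm)

d <- read.table("movements.csv", sep = ";",
                col.names = c("animal", "location", "time"),
                stringsAsFactors = FALSE)

# msm expects states coded as integers, ordered by time within each subject
d$state <- match(d$location, c("A", "B", "C"))
d <- d[order(d$animal, d$time), ]

# cross-tabulate observed transitions as a sanity check
statetable.msm(state, animal, data = d)

# crude guess at which instantaneous transitions are allowed (A <-> B <-> C)
Q <- rbind(c(0, 0.5, 0),
           c(0.5, 0, 0.5),
           c(0, 0.5, 0))

fit <- msm(state ~ time, subject = animal, data = d, qmatrix = Q)

sojourn.msm(fit)          # mean time spent in each state
pmatrix.msm(fit, t = 1)   # transition probabilities over one time unit

For comparing total time between sequence groups (ABC vs ABABC vs ABCBA), you could still summarise the time per animal and apply a standard test such as kruskal.test.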

Related

Interpolation and forecasting out of 2 values in R [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I have a vector of yearly population data from 1980 to 2020 with only two values (years 2000 and 2010) and I need to predict the missing data.
My first thought was to use na.approx to fill in the missing data between 2000 and 2010 and then use an ARIMA model. However, since the population is declining, its values would eventually become negative in the remote future, which is illogical.
My second thought was to take the difference of the logarithms of the two sample values, divide it by 10 (since there is a 10-year gap between the actual values), and use that as a yearly percentage change to predict the missing data.
However, I am new to R and statistics, so I am not sure whether this is the best way to get the predictions. Any ideas would be really appreciated.
Since the line that the two data points provide does not make intuitive sense, I would recommend just using the average of the two unless you can get additional data. If you are able to get either more yearly data, or even expected variation values, then you can do some additional analysis. But for now, you're kind of stuck.
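For what it's worth, a minimal sketch of the two ideas from the question (the population numbers here are invented for illustration; zoo's na.approx does the linear fill, and the log-difference growth rate handles the extrapolation while keeping values positive):

library(zoo)

years <- 1980:2020
pop <- setNames(rep(NA_real_, length(years)), years)
pop["2000"] <- 10500   # hypothetical known values
pop["2010"] <- 9800

# idea 1: linear interpolation between the two known points
# (na.approx only fills the gap; it does not extrapolate outside 2000-2010)
pop_lin <- na.approx(zoo(pop, order.by = years), na.rm = FALSE)

# idea 2: constant yearly growth rate from the difference of logs over 10 years,
# applied in both directions to fill and extrapolate the whole series
r <- (log(pop[["2010"]]) - log(pop[["2000"]])) / 10
pop_exp <- pop[["2000"]] * exp(r * (years - 2000))

cbind(linear = coredata(pop_lin), exponential = pop_exp)

The exponential version can never go negative, which addresses the concern about the declining trend.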

Constructing a transfer function [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
How would I go about creating the below transfer function?
It would take two parameters:
position of bulge in range (-1.0, +1.0)
sharpness of bump
That picture only demonstrates movement in the first parameter.
I can think of a few possible approaches:
figure out a formula
Bezier curves?
start with a few points and do some kind of chain-link physics simulation, where each link exerts a force on its neighbours, the end links are held low, and a particular link is held high
something like the above but starting out with a crude shape and filtering out high frequencies
However I can't see any simple way to set out on any of the above approaches.
Can anyone see a clean way to crack it?
Looks like a normal distribution with variable skew to me. I would look for something like that before I'd go for Bezier curves.
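A minimal sketch of that idea in R (the function name and exact parameterisation are made up for illustration; it is just a Gaussian-style bump whose centre and width map to your two parameters):

# position in (-1, 1) shifts the bump; larger sharpness gives a narrower peak
transfer <- function(x, position = 0, sharpness = 5) {
  exp(-sharpness * (x - position)^2)
}

x <- seq(-1, 1, length.out = 200)
plot(x, transfer(x, position = 0.3, sharpness = 10), type = "l",
     ylab = "output", main = "bump at x = 0.3")

Replacing the squared exponent with something asymmetric (for example a different sharpness on each side of the centre) would give the variable skew.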

svm in R, train data set [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
This is more of a general question, but since I am using R, hence the tags.
My training data set has 15,000 entries, of which around 20 I would like to use as the positive set for building the SVM. I wanted to use the remaining, resampled data as my negative set, but I was wondering whether it might be better to take roughly the same size (around 20) as the negative set, since otherwise it is highly imbalanced. Is there an easy approach to pooling the classifiers (ensemble-based) in R after 1000 rounds of resampling (perhaps with the e1071 package)?
Follow-up question: I would like to calculate a score for each prediction afterwards; is it fine to just take the probabilities times 100?
Thanks
You can try the "class weight" approach, in which the smaller class gets more weight, so that misclassifying the positively labelled class carries a higher cost.
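A minimal sketch of that idea with e1071::svm (the data are simulated here and the weights are placeholders; with ~20 positives out of 15,000 you would weight the positive class roughly in proportion to the imbalance):

library(e1071)

set.seed(1)
n <- 1000                               # smaller than 15,000, just for the demo
x <- matrix(rnorm(n * 5), ncol = 5)
y <- factor(c(rep("pos", 20), rep("neg", n - 20)))
x[y == "pos", ] <- x[y == "pos", ] + 2  # shift positives so they are learnable

w <- c(pos = (n - 20) / 20, neg = 1)    # heavier penalty for misclassifying "pos"

fit <- svm(x, y, kernel = "radial", class.weights = w, probability = TRUE)

pred <- predict(fit, x, probability = TRUE)
head(attr(pred, "probabilities"))       # per-class probabilities, usable as a score

For the ensemble idea, you could repeat the fit over resampled negative subsets and average the probability columns across the models, rather than multiplying a single model's probability by 100.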

R normalization with all samples, or just the part that i need? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I am using the edgeR and Limma packages to analyse a RNA-seq count data table.
I only need a subset of the data file, so my question is: do I need to normalize my data across all the samples, or is it better to subset my data first and then normalize?
Thank you.
Regards Lisanne
I think it depends on what you want to prove/show. If you also want to take into account your "dark counts", then you should normalize first, so that you also account for the percentage of cases in which your experiment fails. Here your total number of experiments (good and bad results) sums to one.
If you want to find out the distribution of your "good events", then you should first produce your subset of good samples and normalize afterwards. In this case your number of good events sums to one.
So once again, it depends on what you want to prove. As a physicist I would prefer the first method, since we do not remove bad data points.
Cheers TL
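For reference, a minimal sketch of the two orders of operation with edgeR ("counts" and "keep" are placeholders for your full count matrix and the names of the samples you actually need):

library(edgeR)

# option A: normalize across all samples, then subset
dge_all <- calcNormFactors(DGEList(counts = counts))
dge_a   <- dge_all[, keep]

# option B: subset first, then normalize within the subset only
dge_b <- calcNormFactors(DGEList(counts = counts[, keep]))

# the TMM normalization factors will generally differ between the two
cbind(all_then_subset  = dge_a$samples$norm.factors,
      subset_then_norm = dge_b$samples$norm.factors)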

Uniform Random Selection with Replacement [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 6 years ago.
Suppose you have a deck of 100 cards, with the numbers 1-100 on one side. You select a card, note the number, replace the card, shuffle, and repeat.
Question #1: How many cards (on average) must you select to have drawn the same card twice? Why?
Question #2: How many cards (on average) must you select to have drawn all of the cards at least once? Why?
(Thanks. This has to do with random music playlists and offering an option to avoid repeats when shuffling, as it were.)
Q1: This relates to the birthday paradox problem.
As you can see in the collision problem section (in the Wikipedia link above), your question maps onto it exactly.
Cast as a collision problem
The birthday problem can be generalized as follows: given n random integers drawn from a discrete uniform distribution with range [1,d], what is the probability p(n;d) that at least two numbers are the same? (d=365 gives the usual birthday problem.)
You have a range [1, 100] from which you select random cards. The probability of a collision (two selected cards being the same) after n draws is p(n; d) = 1 - d! / ((d - n)! * d^n).
Further down, the article gives the expected number of selections needed for the first repeat as Q(d) = 1 + the sum over k from 1 to d of d! / ((d - k)! * d^k).
Q(100) gives your answer.
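A quick way to evaluate that in R, with a simulation as a sanity check (the 10,000 replications below are an arbitrary choice):

d <- 100

# Q(d) = 1 + sum_{k=1}^{d} d! / ((d - k)! * d^k), computed on the log scale
Q <- 1 + sum(exp(lfactorial(d) - lfactorial(d - 1:d) - (1:d) * log(d)))
Q   # roughly 13.2 draws on average until the first repeat

# Monte Carlo check: draw cards with replacement until one repeats
draws_until_repeat <- function(d) {
  seen <- integer(0)
  repeat {
    card <- sample.int(d, 1)
    if (card %in% seen) return(length(seen) + 1)
    seen <- c(seen, card)
  }
}
mean(replicate(10000, draws_until_repeat(d)))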
