Report weight from row in R

I am a beginner in R who got this question:
One of the functions we will be using often is sample(). Read the help file for sample() using ?sample. Now take a random sample of size 1 from the numbers 13 to 24 and report back the weight of the mouse represented by that row. Make sure to type set.seed(1) to ensure that everybody gets the same answer.
I tried this:
set.seed(1)
i <- sample( 13:24, 1)
dat$Bodyweight[i]
And got the answer 25.34. But apparently, that's wrong. What am I doing wrong?!

You first need to select the column you want to sample from:
set.seed(1)
sample(dat$Bodyweight[13:24], 1)
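For a fixed seed, sampling a row index and sampling the weights directly pick the same position, so a mismatch with the expected answer may have another cause. One possibility, assuming the exercise was written for an R version older than 3.6.0, is that the default algorithm behind sample() changed in that release. The sketch below uses a made-up stand-in for dat, since the real data is not shown, and needs R 3.6.0 or later for the sample.kind argument:

```r
# Hypothetical stand-in for the mouse data; the real `dat` is not shown above.
dat <- data.frame(Bodyweight = seq(20, by = 0.5, length.out = 24))

# Approach from the question: sample a row index, then look up the weight.
set.seed(1)
i <- sample(13:24, 1)
by_index <- dat$Bodyweight[i]

# Approach from the answer: sample the weights of rows 13 to 24 directly.
set.seed(1)
by_value <- sample(dat$Bodyweight[13:24], 1)

# Both draw the same position, so the two results agree.
identical(by_index, by_value)

# Assumption: the expected answer was produced with R < 3.6.0.
# On R >= 3.6.0 this selects the old sampler (R warns about it, hence suppressWarnings).
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
i_old <- sample(13:24, 1)
dat$Bodyweight[i_old]
```

If the result with sample.kind = "Rounding" matches the expected answer on the real data, the R version was the cause rather than the indexing.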

Related

Getting a fixed number of zeroes randomly in a matrix in R

I have a matrix of size 2000x50. For the 50 places in each of the 2000 rows, I want exactly 6 of them to be 1 and the remaining 44 to be 0. The placement needs to be random within each row. I have tried the sample and rbinom functions, but neither seems to help. It is also possible that I am not using them correctly. All thoughts and inputs regarding this will be appreciated.
Thank you.
Edit: Initially I wanted those 6 numbers to be 1, but now I want them to be sampled randomly from a gamma distribution with shape and scale both equal to 4. How do I change the suggestions below to incorporate this? I am very new to R and these basic things seem to be troubling me. Thanks once again.
This will create the object you requested:
do.call("rbind", lapply(1:2000, function(x) sample(c(rep(1, 6), rep(0, 44)))))
Based on the approach by @Gki in the comments, you can generate a matrix via replicate + t, i.e.,
m <- t(replicate(2000,sample(rep(0:1, c(44,6)))))
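For the edited question (gamma-distributed values instead of ones), one possible adaptation of the replicate + t idea is to pick 6 random positions per row and fill them with draws from rgamma. This is only a sketch; the names n_rows, n_cols and n_nonzero are introduced here for illustration, and the seed is arbitrary:

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

n_rows    <- 2000
n_cols    <- 50
n_nonzero <- 6

m <- t(replicate(n_rows, {
  row <- numeric(n_cols)                    # start with all zeros
  pos <- sample(n_cols, n_nonzero)          # 6 distinct random positions
  row[pos] <- rgamma(n_nonzero, shape = 4, scale = 4)
  row
}))

dim(m)                  # rows x columns
table(rowSums(m != 0))  # number of non-zero entries per row
```

Each row is built independently, so the positions and the gamma values differ from row to row.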

How to extract top features by CATScore in r?

I am running a machine learning algorithm that uses the CAT score for feature selection, as follows:
library(sda)
train1<- data.matrix(train, rownames.force = NA)
ranking.LDA = sda.ranking(train1[,1:lengthvar], train1[,lengthtrain], diagonal=FALSE)
topfs<-which(ranking.LDA[,"score"] >2)
My question is how to get, for example, the top 20 features by CAT score. The only way I could extract features was by setting a threshold, but that gives a different number of features for each data set. What I want is to always get the top 20 (or any other fixed number of) features.
Thanks in advance for your valuable contribution.
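ranking.LDA is a matrix whose rows are already sorted by score, with the original feature indices in its "idx" column. The first 20 rows are therefore the top 20 features.
# As ranking.LDA is already sorted, extract the column names of the 20 highest-ranked predictors.
colnames(train1)[ranking.LDA[1:20, "idx"]]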

Bourdet Derivative in R with Smoothing Window

I am calculating pressure derivatives using algorithms from this PDF:
Derivative Algorithms
I have been able to implement the "two-points" and "three-consecutive-points" methods relatively easily using dplyr's lag/lead functions to offset the original columns forward and back one row.
The issue with those two methods is that there can be a ton of noise in the high-resolution data we use. This is why there is a third method, "three-smoothed-points", which is significantly more difficult to implement. There is a user-defined "window width", W, that is typically between 0 and 0.5. The algorithm chooses point_L and point_R as the first points such that ln(deltaP/deltaP_L) > W and ln(deltaP/deltaP_R) > W. Here is what I have so far:
#If necessary install DPLYR
#install.packages("dplyr")
library(dplyr)
#Create initial Data Frame
elapsedTime <- c(0.09583, 0.10833, 0.12083, 0.13333, 0.14583, 0.1680,
0.18383, 0.25583)
deltaP <- c(71.95, 80.68, 88.39, 97.12, 104.24, 108.34, 110.67, 122.29)
df <- data.frame(elapsedTime,deltaP)
#Shift the elapsedTime and deltaP columns forward and back one row
df$lagTime <- lag(df$elapsedTime,1)
df$leadTime <- lead(df$elapsedTime,1)
df$lagP <- lag(df$deltaP,1)
df$leadP <- lead(df$deltaP,1)
#Calculate the 2 and 3 point derivatives using nearest neighbors
df$TwoPtDer <- (df$leadP - df$lagP) / log(df$leadTime/df$lagTime)
df$ThreeConsDer <- ((df$deltaP-df$lagP)/(log(df$elapsedTime/df$lagTime)))*
((log(df$leadTime/df$elapsedTime))/(log(df$leadTime/df$lagTime))) +
((df$leadP-df$deltaP)/(log(df$leadTime/df$elapsedTime)))*
((log(df$elapsedTime/df$lagTime))/(log(df$leadTime/df$lagTime)))
#Calculate the window value for the current 1 row shift
df$lnDeltaT_left <- abs(log(df$elapsedTime/df$lagTime))
df$lnDeltaT_right <- abs(log(df$elapsedTime/df$leadTime))
Resulting Data Table
If you look at the picture linked above, you will see that, based on a W of 0.1, only row 2 meets this criterion for both the left and the right point. Just FYI, this data set is an extension of the data used in example 2.5 in the referenced PDF.
So, my ultimate question is this:
How can I choose the correct point_L and point_R such that they meet the above criteria? My initial thoughts are some kind of while loop, but being an inexperienced programmer, I am having trouble writing a loop that gets anywhere close to what I am shooting for.
Thank you for any suggestions you may have!
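One way to approach the point selection without an explicit while loop is to use which() to find, for each row, the nearest point on each side that lies more than W away. The sketch below assumes the window is measured on the natural log of elapsed time, as in the lnDeltaT_left and lnDeltaT_right columns above, and that elapsedTime is positive and strictly increasing. The function name bourdet_smoothed is made up for illustration, and the derivative formula is the same weighted one used for ThreeConsDer, applied to the selected points:

```r
elapsedTime <- c(0.09583, 0.10833, 0.12083, 0.13333, 0.14583, 0.1680,
                 0.18383, 0.25583)
deltaP <- c(71.95, 80.68, 88.39, 97.12, 104.24, 108.34, 110.67, 122.29)

# Hypothetical helper: three-point derivative with a smoothing window W.
bourdet_smoothed <- function(elapsed, p, W) {
  lnT <- log(elapsed)
  n   <- length(elapsed)
  der <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    left  <- which(lnT[i] - lnT > W)   # points more than W to the left
    right <- which(lnT - lnT[i] > W)   # points more than W to the right
    if (length(left) == 0 || length(right) == 0) next  # no valid point: leave NA
    L <- max(left)                     # nearest qualifying left point
    R <- min(right)                    # nearest qualifying right point
    dxL <- lnT[i] - lnT[L]
    dxR <- lnT[R] - lnT[i]
    der[i] <- ((p[i] - p[L]) / dxL) * (dxR / (dxL + dxR)) +
              ((p[R] - p[i]) / dxR) * (dxL / (dxL + dxR))
  }
  der
}

der_w <- bourdet_smoothed(elapsedTime, deltaP, W = 0.1)
der_0 <- bourdet_smoothed(elapsedTime, deltaP, W = 0)  # falls back to adjacent points
```

With W = 0 the nearest qualifying points are simply the adjacent rows, so der_0 should match the ThreeConsDer column, which is a convenient sanity check. Rows without a qualifying point on one side are left as NA.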

Divide column values within a vector

I'm not sure if my title properly expresses what I'm asking, but it should make sense once I'm done writing. Firstly, I just started learning R, so I am a newbie. I've been reading through tutorial series and PDFs I've found online.
I'm working on a data set and I created a data frame of just the year 2001 and the DAM value Bon. Here's a picture.
What I want to do now is create a matrix with 3 columns: Coho Adults, Coho Jacks, and a third column with the ratio of Coho Jacks to Adults. The ratio is the part I'm having trouble with.
If I do a line of code like this I get a normal output.
(cohoPassage <- matrix(fishPassage1995BON[c(5,6, 7)], ncol = 3))
The values are 259756, 6780, and 114934.
I'm figuring in order to get the ratio, I should divide column 5 and column 6's values. So basically 259756/6780 = 38.31
I've tried many things like:
(cohoPassage <- matrix(fishPassage1995BON[c(5,6, 5/6)], ncol = 3))
This just outputs the value of the fifth column instead of dividing for some reason
I've tried this:
matrix(fishPassage1995BON[c(5,6)],fishPassage1995BON[,5]/fishPassage1995BON[,6], ncol = 3)
Which gives me an incorrect output
I decided to break down the problem and divide the fifth and sixth columns separately and it gave the correct ratio.
If I create a matrix like this
matrix(fishPassage1995BON[,5]/fishPassage1995BON[,6])
It outputs the correct ratio of 38.31209. But when I try to combine everything, I just keep getting errors.
What can I do? Any help would be appreciated. Thank you.
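In c(5, 6, 5/6), the expression 5/6 is evaluated as the number 0.83 and then used as a column index, so no division of columns takes place. One possible approach is to compute the ratio first and then bind the three values together with cbind. The sketch below uses a made-up one-row stand-in for fishPassage1995BON: only the values in columns 5 to 7 come from the question, and all column names are assumptions. It divides column 5 by column 6, as in the question's own calculation:

```r
# Hypothetical stand-in; the real data frame is only shown as a picture.
fishPassage1995BON <- data.frame(
  V1 = NA, V2 = NA, V3 = NA, V4 = NA,
  V5 = 259756, V6 = 6780, V7 = 114934
)

adults <- fishPassage1995BON[[5]]   # column 5
jacks  <- fishPassage1995BON[[6]]   # column 6

# cbind builds the matrix column by column, so the ratio is just another column.
cohoPassage <- cbind(CohoAdults = adults,
                     CohoJacks  = jacks,
                     Ratio      = adults / jacks)
cohoPassage
```

Because the division is vectorised, the same code works unchanged if the data frame has more than one row.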

R: Sample into bins of predefined sizes (partition sample vector)

I'm working on a dataset that consists of ~10^6 values which are clustered into a variable number of bins. In the course of my analysis, I am trying to randomize my clustering while keeping the bin sizes constant. As a toy example (in pseudocode), this would look something like this:
data <- list(c(1,5,6,3), c(2,4,7,8), c(9), c(10,11,15), c(12,13,14));
sizes <- lapply(data, length);
for (rand in 1:no.of.randomizations) {
rand.data <- partition.sample(seq(1,15), partitions=sizes, replace=F)
}
So, I am looking for a function like "partition.sample" that will take a vector (like seq(1,15)) and randomly sample from it, returning a list with the data partitioned into the right bin sizes given already by "sizes".
I've been trying to write one such function myself, since the task seems to be not so hard. However, the partitioning of a vector into given bin sizes looks like it would be a lot faster and more efficient if done "under the hood", meaning probably not in native R. So I wonder whether I have just missed the name of the appropriate function, or whether someone could please point me to a smart solution that is around :-)
Your help & time are very much appreciated! :-)
Best,
Lymond
UPDATE:
By "no.of.randomizations" I mean the actual number of times I run through the whole "randomization loop". This will, later on, obviously include more steps than just the actual sampling.
Moreover, I would in addition be interested in a trick to do the above feat for sampling without replacement.
Thanks in advance, your help is very much appreciated!
Revised: This should be fairly efficient. Its complexity should be primarily in the permutation step:
# A single step:
x <- sample( unlist(data))
list( one=x[1:4], two=x[5:8], three=x[9], four=x[10:12], five=x[13:15])
As mentioned above, "no.of.randomizations" may have been the number of repeated applications of this process, in which case you may want to wrap replicate around that:
replic <- replicate(n=4, { x <- sample(unlist(data))
list( x[1:4], x[5:8], x[9], x[10:12], x[13:15]) } )
After some more thinking and googling, I have come up with a feasible solution. However, I am still not convinced that this is the fastest and most efficient way to go.
In principle, I can generate one long vector holding a unique permutation of "data" and then split it into a list of vectors of lengths "sizes" by supplying a factor argument to split. For this, I need an additional ID scheme for my different groups of "data", which I happen to have in my case.
It becomes clearer when viewed as code:
data <- list(c(1,5,6,3), c(2,4,7,8), c(9), c(10,11,15), c(12,13,14));
sizes <- lapply(data, length);
So far, everything as above
names <- c("set1", "set2", "set3", "set4", "set5");
In my case, I am lucky enough to have "names" already provided from the data. Otherwise, I would have to obtain them as (e.g.)
names <- seq(1, length(data));
This "names" vector can then be expanded by "sizes" using rep:
cut.by <- rep(names, times = sizes);
[1] 1 1 1 1 2 2 2 2 3 4 4 4 5
[14] 5 5
This new vector "cut.by" can then be provided as an argument to split()
rand.data <- split(sample(1:15, 15), cut.by)
$`1`
[1] 8 9 14 4
$`2`
[1] 10 2 15 13
$`3`
[1] 12
$`4`
[1] 11 3 5
$`5`
[1] 7 6 1
This does the job I was looking for alright. It samples from the background "1:15" and splits the result into vectors of lengths "sizes" through the vector "cut.by".
However, I am still not happy to have to go via an additional (possibly) long vector to indicate the split positions, such as "cut.by" in the code above. This definitely works, but for very long data vectors, it could become quite slow, I guess.
Thank you anyway for the answers and pointers provided! Your help is very much appreciated :-)
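The rep + split approach above can be wrapped into a small helper along the lines of the "partition.sample" from the question. This is only a sketch: the name partition_sample and its arguments are made up for illustration, and it samples without replacement by shuffling the whole vector once.

```r
# Hypothetical helper: shuffle x and cut it into bins of the given sizes.
partition_sample <- function(x, sizes) {
  stopifnot(sum(sizes) == length(x))
  shuffled <- x[sample.int(length(x))]           # random permutation of x
  groups <- factor(rep(seq_along(sizes), times = sizes),
                   levels = seq_along(sizes))    # keeps empty bins, if any
  unname(split(shuffled, groups))
}

data  <- list(c(1, 5, 6, 3), c(2, 4, 7, 8), c(9), c(10, 11, 15), c(12, 13, 14))
sizes <- lengths(data)   # integer vector of bin sizes

set.seed(1)  # arbitrary seed, only for reproducibility
rand.data <- partition_sample(seq(1, 15), sizes)

# Repeating the randomization, e.g. 4 times:
rand.list <- replicate(4, partition_sample(seq(1, 15), sizes), simplify = FALSE)
```

Both rep and split are vectorised, so the grouping vector costs one extra pass over the data. Whether that is fast enough for ~10^6 values is best checked by timing it on the real data.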
