for loop i+1 in R

I'm trying to iterate with a for loop through a matrix called XY with 50 rows and 100 columns (divided into 50 pairs of X and Y values descending alongside each other):
for (i in 1:50){
slope=atan(((XY[i+1,2]-XY[i,2])/(XY[i+1,1]-XY[i,1]))*100)
}
As you can see above (XY[i+1,2]-XY[i,2]), I'm trying to take the i-th y value, subtract it from the next one, and iterate through the entire list for each consecutive pair of descending values, then divide that by the corresponding x increment to get the slope and convert it into an angle using atan((...)*100).
Unfortunately, R keeps telling me that XY[i+1,2] is "out of bounds", and I'm pretty sure the brackets are balanced on each side of the equation.
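In the loop above, i runs up to 50, so on the last pass XY[i+1,2] asks for row 51 of a 50-row matrix: that is the "subscript out of bounds" error. A minimal sketch of two fixes, assuming (as the question's code does) that column 1 holds x and column 2 holds y; the toy XY matrix is hypothetical. The original loop also overwrites slope on every pass, so the sketch stores one value per consecutive pair. The question's formula, including the *100, is kept unchanged; note that atan() returns radians.

```r
# Hypothetical toy data standing in for the question's XY matrix:
# column 1 holds x, column 2 holds y (as in the question's indexing).
set.seed(1)
XY <- cbind(x = cumsum(runif(50, 1, 2)), y = cumsum(-runif(50, 0, 1)))

n <- nrow(XY)
slope <- numeric(n - 1)   # n rows give n - 1 consecutive pairs
for (i in seq_len(n - 1)) {
  # i stops at n - 1, so i + 1 never goes past the last row
  slope[i] <- atan(((XY[i + 1, 2] - XY[i, 2]) / (XY[i + 1, 1] - XY[i, 1])) * 100)
}

# Vectorised equivalent: diff() returns the n - 1 consecutive differences
slope_vec <- atan((diff(XY[, 2]) / diff(XY[, 1])) * 100)
```

diff() is usually the more idiomatic choice here: diff(XY[, 2]) is exactly XY[2:n, 2] - XY[1:(n-1), 2], so the off-by-one cannot happen.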

Related

How to save values in a vector using R

I am supposed to find the mean and standard deviation for each given sample size (N) using a for loop. I started writing the code below. I am required to save all the means into a vector "p". How do I save all the means into one vector?
sample.sizes =c(3,10,50,100,500,1000)
mean.sds = numeric(0)
for ( N in sample.sizes ){
x <- rnorm(3,mean=0,sd=1)
mean.sds[i]
}
mean(x)
Actually, you are doing several things wrong:
You use N as the loop variable, but you never use it inside the loop.
for (N in some_vector) means that N takes the values of the vector one by one. So N in sample.sizes will first be 3, then 10, then 50, and so on.
Where does i come into the picture? It is never defined in your code.
You calculate x on every iteration, but always from 3 values, because rnorm(3, ...) ignores N.
So x always holds 3 values. In the next line you intend to store these three values in the i-th element of mean.sds, but i is unknown, and storing three values in a single element is not possible anyway. As written, mean.sds[i] only reads that element; it never assigns anything.
Is this what you want?
sample.sizes =c(3,10,50,100,500,1000)
mean.sds = numeric(0)
for ( i in seq_along(sample.sizes )){
x <- rnorm(sample.sizes[i], mean=0, sd=1)
mean.sds[i] <- mean(x)
}
mean.sds
[1] 0.6085489531 -0.1547286299 0.0052106559 -0.0452804986 -0.0374094936 0.0005667246
I replaced N with i in seq_along(sample.sizes), which iterates once per element of the vector: six times in this example.
I passed the i-th element to the first argument of rnorm to generate that many random values.
The random values are stored in x; their mean (a single value) is then stored in the i-th element of mean.sds.
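The question also asks for the standard deviation and requires the means to end up in a vector called p. A sketch extending the loop above to cover both; sds is a hypothetical name for the vector of standard deviations, and the numbers change on every run because no seed is set.

```r
sample.sizes <- c(3, 10, 50, 100, 500, 1000)
p   <- numeric(length(sample.sizes))   # means, one slot per sample size
sds <- numeric(length(sample.sizes))   # standard deviations (hypothetical name)
for (i in seq_along(sample.sizes)) {
  x <- rnorm(sample.sizes[i], mean = 0, sd = 1)
  p[i]   <- mean(x)
  sds[i] <- sd(x)
}
# One row per sample size, for easy comparison
data.frame(N = sample.sizes, mean = p, sd = sds)
```

Preallocating with numeric(length(...)) instead of numeric(0) avoids growing the vectors inside the loop; the result is the same.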

Find column with values closest to vector

I have a vector containing times in milliseconds, like this:
vector <- c(667753, 671396, 675356, 679286, 683413, 687890, 691742,
695651, 700100, 704552, 708832, 713117, 717082, 720872, 725002, 729490,
733824, 738233, 742239, 746092, 750003, 754236, 867342, 870889, 873704,
876617, 879626, 882595, 885690, 888602, 891789, 894717, 897547, 900797,
903615, 906646, 909624, 912613, 915645, 918566, 921792, 924625, 927538,
930721, 933542)
Now I want to search a large data frame with many time columns for the single column whose values are closest (row-wise) to the values in my vector.
The data frame containing all the columns has the same number of rows as my vector has elements. So if my vector has 240 elements, every column in the larger data frame has 240 rows.
Any idea how to do this?
You can calculate the Euclidean distance between your vector and each column of the data frame, and then check which column has the smallest distance:
which.min(sapply(seq_len(ncol(dataFrame)), function(i) sqrt(sum((v - dataFrame[, i])^2))))
This returns the index of the column with the smallest distance.
Here dataFrame is the data frame containing the time columns (each column is compared with the vector v), and v is your vector.
The following is the square root of the sum of squared differences, i.e. the Euclidean distance:
sqrt(sum((v - dataFrame[, i])^2))
You can also use the sum of absolute differences (the Manhattan distance) as a distance measure:
sum(abs(v - dataFrame[, i]))
EDIT
As Evan Friedland pointed out, you can simply use:
which.min(colSums(abs(v - dataFrame)))
or
which.min(sqrt(colSums((v - dataFrame)^2)))
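A small self-contained check of the one-liners, with hypothetical toy data. v - dataFrame recycles v down each column, which is what makes the comparison row-wise; this assumes every column of the data frame is numeric.

```r
# Hypothetical toy data: v plays the role of the question's vector and
# dataFrame holds three candidate columns of the same length.
v <- c(1, 2, 3, 4)
dataFrame <- data.frame(a = v + 10, b = v + 0.5, c = rev(v))

# v - dataFrame subtracts each column from v element by element
manhattan <- colSums(abs(v - dataFrame))
euclid    <- sqrt(colSums((v - dataFrame)^2))

which.min(manhattan)   # column "b" (index 2)
which.min(euclid)      # column "b" (index 2)
```

Since sqrt() is monotone, it never changes which column is smallest; it only matters if you want the distance values themselves.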

Iterating a vector over a list in R

I am working on a computational feature-extraction problem with RNA data, and I am stuck on the following question:
I have n sequences (say two, for example), for which I computed a statistic over i iterations (a kind of Monte Carlo procedure for analyzing the distribution of the obtained statistics compared with the original).
Example:
Say we iterate 10 times
n <- 10
I got a vector of 20 values with all the iterations, but this vector corresponds to two different sequences, so I must divide it into two equal parts (the iterations are ordered 1:10, then 1:10 again, one block per sequence).
MFEit <- c(10, 12, 34, 32, 12 .....) ## vector of length 20
MFEit.split <- split(MFEit, ceiling(seq_along(MFEit)/n))
This generates a list of two items, each with 10 values, named $`1` and $`2`.
On the other hand, I have a vector of two values, the original statistics, one for each original sequence:
MFE <- c(25, 15)
What I want is to know how many values in the first item of the list MFEit.split are less than or equal to the first value of MFE, then how many values in the second item are less than or equal to the second value of MFE, and so on, in case I have more than two values or items.
I know how to do it one by one:
R <- length(subset(MFEit.split$`1`, MFEit.split$`1`<=MFE[1]))
R <- length(subset(MFEit.split$`2`, MFEit.split$`2`<=MFE[2]))
But how do I put this into a loop so that I get each comparison in turn, no matter how many MFE values or list items I have?
The desired output is a vector called R with n values, one per comparison.
Any help?
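No answer is included here, so a minimal sketch, assuming the list items and the MFE values are in the same order. The 20 values are hypothetical stand-ins for the elided MFEit vector. Two changes do most of the work: sum() over a logical vector counts its TRUE values (replacing length(subset(...))), and indexing with [[k]] instead of $`1` lets the loop reach the k-th item.

```r
# Hypothetical data with the shapes described in the question:
# 2 sequences x 10 iterations, stored one sequence after the other.
n <- 10
MFEit <- c(10, 12, 34, 32, 12, 20, 25, 30, 5, 40,   # iterations for sequence 1
           14, 16, 18, 9, 22, 15, 11, 13, 30, 28)   # iterations for sequence 2
MFE <- c(25, 15)                                    # original statistic per sequence

# Consecutive blocks of n values: list elements named "1", "2", ...
MFEit.split <- split(MFEit, ceiling(seq_along(MFEit) / n))

# Loop version: one count per sequence
R <- integer(length(MFE))
for (k in seq_along(MFE)) {
  R[k] <- sum(MFEit.split[[k]] <= MFE[k])
}
R   # 6 5

# Loop-free equivalent: pairs the k-th list element with the k-th MFE value
R2 <- mapply(function(vals, ref) sum(vals <= ref), MFEit.split, MFE)
```

mapply() keeps the list names, so R2 is a named vector; wrap it in unname() if a plain vector is needed.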

How to use pointDistance with a very large vector

I've got a big problem.
I've got a large raster (rows = 180, columns = 480, number of cells = 86400).
First I binarized it (so that there are only 1s and 0s), and then I labelled the clusters (cells that are 1 and connected to each other got the same label).
Now I need to calculate all the distances between the cells that are not 0.
There are quite a lot of them, and that's my big problem.
I did this to get the coordinates of the cells I'm interested in (the positions, i.e. cell numbers, of the cells that are not 0):
V=getValues(label)
Vu=c(1:max(V))
pos=which(V %in% Vu)
XY=xyFromCell(label,pos)
This works very well. XY is a matrix containing the coordinates of all cells that are not 0. But now I'm struggling: I need to calculate the distances between ALL of these coordinates, and then put each distance into one of 43 distance bins, roughly like this (just an example):
0 < x < 0.2: bin 1
0.2 < x < 0.4: bin 2
When I use this:
pD=pointDistance(XY,lonlat=FALSE)
R says it cannot allocate a vector of this size; the result is too large.
Then I thought I could create an empty data frame df (or something similar) and let pointDistance run over every single row of XY:
for (i in 1:nrow(XY)) {
  # distances from point i to all points in XY
  pD=pointDistance(XY,XY[i,],lonlat=FALSE)
  # count how many distances fall into each of the 43 bins of width 0.2
  pDbin=as.matrix(table(cut(pD,breaks=seq(0,8.6,by=0.2),labels=1:43)))
  df=cbind(df,pDbin)
  # collapse to a running total per bin
  df=apply(df,1,FUN=function(x) sum(x))
}
This works when I try it with, e.g., the first 50 rows of XY.
But when I use it on the whole XY matrix, it takes far too long (sometimes XY contains 10000 xy-coordinates).
Does anyone have an idea how to do it faster?
I don't know whether this will be fast enough, but I recommend trying this:
Let's say you have a data frame with the value 0 or 1 in each cell. To find the coordinates of the cells equal to 1, you can write:
cord_matrix <- which(dataframe == 1, arr.ind = TRUE)
This gives a coordinate matrix with the row index and column index of each such cell.
To compute the Euclidean distances, use the dist() function (see ?dist). It will look like this:
dist_vector <- dist(cord_matrix)
It returns a "dist" object holding the lower triangle of the distance matrix, which can be converted into a vector with as.vector() or into a full symmetric matrix with as.matrix(). Storing only the lower triangle needs about half the memory of the full matrix, but for 10000 points that is still about 50 million distances (roughly 400 MB). Then all you have to do is bin the distances according to your requirements.
Let me know whether this fits within your memory limits.
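If even the lower triangle is too large, one option is to never store all the distances at all. A minimal sketch of that route, under stated assumptions: loop over points, compute distances only to later points (so each pair is counted once, whereas the question's loop counts every pair twice), and add the bin counts to a running total with findInterval() and tabulate(), which avoid building a factor per row as cut() and table() do. It assumes XY is the two-column coordinate matrix from xyFromCell() with planar coordinates (the lonlat = FALSE case), so plain Euclidean distance applies; bin_pair_distances is a hypothetical helper name. Note that which(..., arr.ind = TRUE) above yields row/column indices, so its distances are in cell units rather than the map units of the question's bins.

```r
# Count pairwise Euclidean distances per bin without building the full
# distance matrix. Assumes XY is a two-column numeric matrix of planar
# coordinates. Bins are (b[k], b[k+1]], matching cut()'s default right = TRUE.
bin_pair_distances <- function(XY, breaks = seq(0, 8.6, by = 0.2)) {
  nbins <- length(breaks) - 1
  counts <- numeric(nbins)
  n <- nrow(XY)
  for (i in seq_len(n - 1)) {
    j <- (i + 1):n                      # only later points: each pair once
    d <- sqrt((XY[j, 1] - XY[i, 1])^2 + (XY[j, 2] - XY[i, 2])^2)
    # left.open = TRUE gives (b[k], b[k+1]] intervals; a distance of 0 maps to
    # 0 and one beyond the last break maps to nbins + 1, and tabulate()
    # ignores both because they lie outside 1..nbins
    counts <- counts + tabulate(findInterval(d, breaks, left.open = TRUE), nbins)
  }
  counts
}
```

Memory stays proportional to nrow(XY); the price is n - 1 R-level iterations, each fully vectorised.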

How can I skip increments in R 'for' loop?

I need to find stretches of values above 0 in a numeric vector where each stretch has at least 10 members. I do not want to check every single position, as that would be very time-consuming (the vector has over 10 million elements).
Here is what I'm trying to do (very preliminary, as I can't figure out how to skip increments in a for loop):
1. Check whether x[i] (start position) is positive.
a) If positive, check whether x[i+10] (end position) is positive (since we want a stretch of at least 10 positive values).
* If positive, check every position in between to see whether it is positive.
* If negative, move to x[i+11] and skip the positions between the start and end positions (e.g. the new start position is x[i+12]), since we would not get >10 members if the negative end position were included.
x <- rnorm(50, mean=0, sd=4)
for(i in 1:length(x)){
if(x[i]>0){ # IF START POSITION IS POSITIVE
flag=1
print(paste0(i, ": start greater than 0"))
if(x[i+10]>0){ # IF END POSITION POSITIVE, THEN CHECK ALL POSITIONS IN BETWEEN
for(j in (i+1):(i+9)){ # parentheses needed: i+1:i+9 parses as i+(1:i)+9
if(x[j]>0){ # IF POSITION IS POSITIVE, CHECK NEXT POSITION IF POSITIVE
print(paste0(j, ": for j1"))
}else{ # IF POSITION IS NEGATIVE, THEN SKIP CHECKING & SET NEW START POSITION
print(paste0(j, ": for j2"))
i <- i+11
break;
}
}
}else{ # IF END POSITION IS NOT POSITIVE, START CHECK ONE POSITION AFTER END POSITION
i <- i+11
}
}
}
The issue I have is that even when I manually increment i, the for loop overwrites it: on the next iteration i simply takes the next value of 1:length(x), so my new value is lost. I'd appreciate any insight.
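A for loop reassigns i from its sequence at the start of every iteration, so changing i inside the body has no lasting effect. A while loop leaves i entirely under your control. A minimal sketch of the question's skipping idea rewritten that way; find_positive_runs is a hypothetical helper that returns the start and end of every maximal run of at least min_len positive values. Instead of testing only x[i+10], it looks at the whole window and jumps just past the last non-positive value in it, which is the largest skip that cannot miss a run.

```r
find_positive_runs <- function(x, min_len = 10) {
  n <- length(x)
  starts <- integer(0)
  ends <- integer(0)
  i <- 1
  # Invariant: i == 1 or x[i - 1] <= 0, so a run found at i starts exactly at i
  while (i + min_len - 1 <= n) {
    window <- x[i:(i + min_len - 1)]
    bad <- which(window <= 0)
    if (length(bad) > 0) {
      # Any min_len-long stretch starting between i and the last non-positive
      # position would contain that position, so jump just past it
      i <- i + max(bad)
      next
    }
    end <- i + min_len - 1
    while (end < n && x[end + 1] > 0) end <- end + 1   # extend to the full run
    starts <- c(starts, i)
    ends <- c(ends, end)
    i <- end + 2                                      # x[end + 1] is not positive
  }
  data.frame(start = starts, end = ends)
}

x <- c(rep(1, 12), -1, rep(1, 5), 0, rep(2, 10))
find_positive_runs(x)   # runs 1..12 and 20..29
```

Even with skipping, an interpreted loop over 10 million elements is likely to be slower than the vectorised answers below.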
I don't know whether this approach is as efficient as Curt F's, but how about
runs <- rle(x>0)
and then working with the regions defined by runs$lengths >= 10 & runs$values == TRUE?
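The rle() idea can be turned into start and end positions by taking cumsum() over the run lengths. A minimal sketch; positive_runs_rle is a hypothetical helper name, and it uses >= min_len to match "at least 10".

```r
positive_runs_rle <- function(x, min_len = 10) {
  runs <- rle(x > 0)
  ends <- cumsum(runs$lengths)          # last index of every run
  starts <- ends - runs$lengths + 1     # first index of every run
  keep <- runs$values & runs$lengths >= min_len
  data.frame(start = starts[keep], end = ends[keep])
}

x <- c(rep(1, 12), -1, rep(1, 5), 0, rep(2, 10))
positive_runs_rle(x)   # runs 1..12 and 20..29
```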
Here is a solution that finds stretches of ten positive numbers in a vector of length ten million. It does not use the loop approach suggested in the OP.
The idea is to take the cumulative sum of the logical expression vec>0. The difference between the values at positions n and n-10 will be 10 only if all values of the vector at positions n-9 through n are positive.
filter() from the stats package is an easy and relatively fast way to calculate these differences (call it as stats::filter() if dplyr is loaded, because dplyr::filter masks it).
#generate random data
vec <- runif(1e7,-1,1)
#cumulative sum
csvec <- cumsum(vec>0)
#construct a filter that computes the difference between the nth and the (n-10)th values of the cumulative sum
f11 <- c(1,rep(0,9),-1)
#apply the filter
fv <- filter(csvec, f11, sides = 1)
#find where the difference as computed by the filter is 10
inds <- which(fv == 10)
#check a few results
> vec[(inds[1]-9):(inds[1])]
[1] 0.98457526 0.03659257 0.77507743 0.69223183 0.70776891 0.34305865 0.90249491 0.93019927 0.18686722 0.69973176
> vec[(inds[2]-9):(inds[2])]
[1] 0.0623790 0.8489058 0.3783840 0.8781701 0.6193165 0.6202030 0.3160442 0.3859175 0.8416434 0.8994019
> vec[(inds[200]-9):(inds[200])]
[1] 0.0605163 0.7921233 0.3879834 0.6393018 0.2327136 0.3622615 0.1981222 0.8410318 0.3582605 0.6530633
#check all the results
> prod(sapply(1:length(inds),function(x){prod(sign(vec[(inds[x]-9):(inds[x])]))}))
[1] 1
I played around with system.time() to see how long the various steps took. On my not-very-powerful laptop the longest step was filter(), which took just over half a second for a vector of length ten million.
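Each index in inds marks the end of a 10-long all-positive window, so one long run yields several overlapping hits. Also, because filter() returns NA for the first 10 positions, a window ending at position 10 is missed. A sketch that handles both on a small hypothetical vector: pad the cumulative sum with a leading 0, shift the indices back by one, and keep only the first hit of each run.

```r
x <- c(rep(1, 12), -1, rep(1, 5), 0, rep(2, 10))

# Leading 0 lets the filter see a window that starts at position 1
csvec <- c(0, cumsum(x > 0))
fv <- stats::filter(csvec, c(1, rep(0, 9), -1), sides = 1)  # csvec[m] - csvec[m-10]
inds <- which(fv == 10) - 1          # undo the padding shift: window end positions in x

# Consecutive window ends belong to the same run; keep the first of each
run_starts <- inds[c(TRUE, diff(inds) > 1)] - 9
run_starts   # 1 20
```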
Vectorised solution using only basic commands:
x <- runif(1e7,-1,1) # generate random vector
y <- c(0, which(x<=0), length(x)+1) # find boundaries, i.e. negatives and zeros, padded so runs touching either end of x are found
dif <- diff(y) # find distances between consecutive boundaries
drange <- which(dif > 10) # find distances more than 10
starts <- y[drange]+1 # starting positions of sequence
ends <- y[drange+1]-1 # last positions of sequence
The first range you want is from x[starts[1]] to x[ends[1]], and so on.
