if-else is not producing the expected output in R - r

I have a string:
> all_scn[1]
[1] "Cars_20160601_01.hdf5"
I want to use it to repeat some numbers based on a variable last_step:
> last_step
[1] 439
if-else statement:
> ifelse(substring(all_scn[1], 1, 1)=="C",
rep(seq(0, last_step-1, 1), 13),
rep(seq(0, last_step-1, 1), 12))
[1] 0
But you see that instead of repeating the numeric vector 0:438 13 times, it just produces zero. Outside of ifelse I get the following:
> rep(seq(0, last_step-1, 1), 13)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[30] 29 30 31 32 . . . (I truncated the output due to space limitation)
What am I doing wrong?

From help("ifelse"):
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
This means that if the shape of test is a vector with just one element, the output will be a vector with just one element. That is the case with your test.
substring(all_scn[1], 1, 1) == "C"
#[1] TRUE
In cases like this, you don't need to vectorize because there is nothing to vectorize. All you need is a simple if/else. Note the braces: at top level, an else that starts a new line without them is a syntax error.
if (substring(all_scn[1], 1, 1) == "C") {
  rep(seq(0, last_step-1, 1), 13)
} else {
  rep(seq(0, last_step-1, 1), 12)
}
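To make the shape rule concrete, here is a minimal sketch. The value 5 for last_step is a small stand-in chosen for illustration, not the asker's 439.

```r
all_scn <- "Cars_20160601_01.hdf5"
last_step <- 5  # small stand-in value, for illustration only

test <- substring(all_scn[1], 1, 1) == "C"
length(test)        # 1

# ifelse() returns a result shaped like `test`, so only one element survives
res_ifelse <- ifelse(test,
                     rep(seq(0, last_step - 1, 1), 13),
                     rep(seq(0, last_step - 1, 1), 12))
length(res_ifelse)  # 1

# if/else returns the whole vector of the chosen branch
res_if <- if (test) {
  rep(seq(0, last_step - 1, 1), 13)
} else {
  rep(seq(0, last_step - 1, 1), 12)
}
length(res_if)      # 65
```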

You are using ifelse in the wrong way. It does not work the way you guessed. You imagined ifelse(condition, result_if_true, result_if_false) working just like IF in Excel, but it works differently in R.
Take the following example from R documentation:
> x <- c(6:-4)
> x
[1] 6 5 4 3 2 1 0 -1 -2 -3 -4
> sqrt(x)
[1] 2.449490 2.236068 2.000000 1.732051 1.414214 1.000000 0.000000 NaN NaN NaN NaN
Warning message:
In sqrt(x) : NaNs produced
You can see NaNs (Not a Number) produced, because sqrt() of a negative number is an imaginary number, hence NaN. It also produced a warning. Now let's see the same with ifelse.
> sqrt(ifelse(x >= 0, x, NA))
[1] 2.449490 2.236068 2.000000 1.732051 1.414214 1.000000 0.000000 NA NA NA NA
See, no warnings.
The solution for you is to use a simple if...else statement.
if (condition) {
  statement if true
} else {
  statement if false
}
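Since if/else is itself an expression in R, one possible shortening is to make only the repeat count conditional. This is a sketch using the values from the question; the names n_rep and out are made up here.

```r
all_scn <- "Cars_20160601_01.hdf5"
last_step <- 439

# pick the number of repetitions with a scalar if/else expression
n_rep <- if (substring(all_scn[1], 1, 1) == "C") 13 else 12

out <- rep(seq(0, last_step - 1, 1), times = n_rep)
length(out)  # 439 * 13 = 5707
```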

Related

From a sequence of numbers, how do I find an immediate smaller (and an immediate bigger) number than a particular random number, In R?

So I have 10 increasing sequences of numbers, each of which looks like (say) x(i) <- c(2, 3, 5, 6, 8, 10, 11, 17) for i ranging from 1 to 10, and I have a randomly sampled number, say p=9.
Now for each sequence x(i), I need to find the number immediately smaller than p and the number immediately bigger than p; then, for each i (from 1 to 10), I need to take the difference of these two numbers and store the results in a string.
For the x(i) that I have given here, the immediate smaller number than p=9 would be 8 and the immediate bigger number than p=9 would be 10, the difference of these would be (10-8)=2.
I am trying to get a code that would create a string of these differences, where first number of the string would mean the difference for i=1, second number would mean the difference for i=2 and so on. The string would have i numbers.
I am relatively new to R, so anywhere connected to loops throws me off a little bit. Any help would be appreciated. Thanks.
EDIT: I am putting the code I am working with for clarification.
fr = 100
dt = 1/1000 #dt in milliseconds
duration = 2 #no of duration in s
nBins = 2000 #SpikeTrain
nTrials = 20 #NumberOfSimulations
MyPoissonSpikeTrain = function(p, fr= 100) {
p = runif(nBins)
q = ifelse(p < fr*dt, 1, 0)
return(q)
}
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
Spike_times <- function(i) {
c(dt*which( SpikeMat[i, ]==1))}
set.seed(4)
RT <- runif(1, 0 , 2)
for (i in 1:nTrials){
The explanation for this code is given in my previous question. I have 20 (the number of trials, aka nTrials) strings named Spike_times(i) here. Each Spike_times(i) is a string of time stamps between 0 and 2 seconds where spikes occurred, and they have different numbers of entries. Now I have a random time sample in the form of RT, which is a random number between 0 and 2 seconds. Say RT is 1.17 seconds and Spike_times(i) are sequences of increasing time stamps between 0 and 2 seconds.
Let me give you an example, Spike_times(3) looks like 0.003 0.015 0.017 ... 1.169 1.176 1.189 ... 1.985 1.990 1.997 then I need a code that picks out 1.169 and 1.176 and gives me the difference of these entries 0.007 and stores it in another string say W as the third entry c(_, _, 0.007, ...) and does this for all 20 strings Spike_times(i) and gives me W with 20 entries.
I hope my question is clear enough. Please let me know if I need to correct something.
This approach should do what you want. I am making a function that extracts the desired result from a single sequence and then applying it to each sequence. I am assuming here that your sequences are row-vectors and are stacked in a matrix. If your actual data structure is different the code can be adapted, but you need to indicate how your sequences are actually stored.
x <- matrix(rep(c(2,3,5,6,8,10,11,17), 10), nrow=10, byrow = T)
x
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 2 3 5 6 8 10 11 17
#> [2,] 2 3 5 6 8 10 11 17
#> [3,] 2 3 5 6 8 10 11 17
#> [4,] 2 3 5 6 8 10 11 17
#> [5,] 2 3 5 6 8 10 11 17
#> [6,] 2 3 5 6 8 10 11 17
#> [7,] 2 3 5 6 8 10 11 17
#> [8,] 2 3 5 6 8 10 11 17
#> [9,] 2 3 5 6 8 10 11 17
#> [10,] 2 3 5 6 8 10 11 17
set.seed(123)
p = sample(10, 1)
# write a function to do what you want on one sequence:
# NOTE: If p appears in the sequence I assume you want the
# closest numbers not equal to p! If you want the closest
# numbers to p including p itself change the less than/
# greater than to <= / >=
get_l_r_diff <- function(row, p) {
temp <- row - p
lower <- max(row[temp < 0])
upper <- min(row[temp > 0])
upper - lower
}
apply(x, 1, function(row)get_l_r_diff(row, p))
#> [1] 3 3 3 3 3 3 3 3 3 3
apply(x, 1, function(row) get_l_r_diff(row, 9))
#> [1] 2 2 2 2 2 2 2 2 2 2
# if the result really needs to be a string
paste(apply(x, 1, function(row) get_l_r_diff(row, 9)), collapse = "")
#> [1] "2222222222"
For your case you can just apply the two functions to your indices:
spikes <- sapply(1:20, function(i){get_l_r_diff(Spike_times(i), RT)})
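One caveat with this approach: if p lies below the smallest or above the largest element of a sequence, max() or min() is called on an empty vector and returns -Inf or Inf with a warning. A hedged variant that returns NA in that case could look like this (the name get_l_r_diff_safe is made up for this sketch):

```r
get_l_r_diff_safe <- function(row, p) {
  below <- row[row < p]
  above <- row[row > p]
  # no neighbour on one side: return NA instead of +/-Inf plus a warning
  if (length(below) == 0 || length(above) == 0) return(NA_real_)
  min(above) - max(below)
}

x1 <- c(2, 3, 5, 6, 8, 10, 11, 17)
get_l_r_diff_safe(x1, 9)   # 2
get_l_r_diff_safe(x1, 20)  # NA (nothing above 20)
get_l_r_diff_safe(x1, 1)   # NA (nothing below 1)
```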
By making a small change to your Spike_times function you can do this with sapply returning a vector of all calculated values
Spike_times <- function(i) {
x <- c(dt*which( SpikeMat[i, ]==1))
min(x[x > RT]) - max(x[x < RT])
}
set.seed(4)
RT <- runif(1, 0 , 2)
results <- sapply(1:20, Spike_times)

How to sort odd and even numbers of an array in a specific format

I have a vector like this
seq_vector <- c(3,12,5,9,11,8,4,6,7,11,15,3,9,10,12,2)
I want to format them in descending order of odd numbers, followed by ascending order of even numbers. Output of above seq_vector will be
new_seq_vector <- c(15,11,11,9,9,7,5,3,3,2,4,6,8,10,12,12)
Can you please help me with the logic of the same?
Try x[order(x*v)] where v is -1 for odd, +1 for even.
Thanks to @lmo for this:
x[order( x*(-1)^x )]
# [1] 15 11 11 9 9 7 5 3 3 2 4 6 8 10 12 12
So v = (-1)^x here.
Some other ways to build v: @d.b's (-1)^(x %% 2); and mine, 1-2*(x %% 2).
(Thanks @d.b) If x contains negative integers, an additional sorting vector is needed:
# new example
x = c(2, 5, -15, -10, 1, -3, 12)
x[order(v <- (-1)^x, x*v)]
# [1] 5 1 -3 -15 -10 2 12
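To see why the extra sorting key matters once negative values appear, the following sketch compares the single-key and two-key orderings on the same example (the names single_key and two_keys are made up here):

```r
x <- c(2, 5, -15, -10, 1, -3, 12)
v <- (-1)^x   # -1 for odd, +1 for even

# single key: odd and even values end up interleaved
single_key <- x[order(x * v)]
single_key
# [1] -10   5   1   2  -3  12 -15

# two keys: group by parity first, then sort within each group
two_keys <- x[order(v, x * v)]
two_keys
# [1]   5   1  -3 -15 -10   2  12
```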
Take modulus by 2 (%% 2) to determine the odd and even elements and sort accordingly.
c(sort(seq_vector[seq_vector %% 2 == 1], decreasing = TRUE), #For odd
sort(seq_vector[seq_vector %% 2 == 0])) #For even
#[1] 15 11 11 9 9 7 5 3 3 2 4 6 8 10 12 12
Use an auxiliary function.
is.odd <- function(x) (x %% 2) == 1
result <- c(sort(seq_vector[is.odd(seq_vector)], decreasing = TRUE),
sort(seq_vector[!is.odd(seq_vector)]))
result

finding the length and positions of sub-series within a series of numbers

I have a vector made of 0 and non-zero numbers. I would like to know the length and starting-position of each of the non-zero number series:
a = c(0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 2.6301334, 1.8372030, 0.0000000, 0.0000000, 0.0000000, 1.5632647, 1.1433757, 0.0000000, 1.5412216, 0.8762267, 0.0000000, 1.3087967, 0.0000000, 0.0000000, 0.0000000)
based on a previous post it is easy to find the starting positions of the non-zero regions:
Finding the index of first changes in the elements of a vector in R
c(1,1+which(diff(a)!=0))
However I cannot seem to configure a way of finding the length of these regions....
I have tried the following:
dif=diff(which(a==0))
dif_corrected=dif-1 # to correct for the added lengths
row=rbind(position=seq(length(a)), length=c(1, dif_corrected))
position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
length 1 0 0 0 0 2 0 0 2 2 1 0 0 1 0
NOTE: not all columns are displayed ( there are actually 20)
Then I subset this to take away 0 values:
> row[,-which(row[2,]==0)]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
position 1 6 9 10 11 14 19
length 1 2 2 2 1 1 2
This seems like a decent way of coming up with the positions and lengths of each non-zero series in the series, but it is incorrect:
The position 9 (identified as the start of a non-zero series) is a 0 and instead 10 and 11 are non-zero so I would expect the position 10 and a length of 2 to appear here....
The only result that is correct is position 6 which is the start of the first non-zero series- it is correctly identified as having a length of 2- all other positions are incorrect.
Can anyone tell me how to index correctly to identify the starting-position of each of the non-zero series and the corresponding lengths?
NOTE: I only did this in R because of the usefulness of the which command, but it would also be good to know how to do this in numpy and create a dictionary of positions and length values.
It seems like rle could be useful here.
# a slightly simpler vector
a <- c(0, 0, 1, 2, 0, 2, 1, 2, 0, 0, 0, 1)
# runs of zero and non-zero elements
r <- rle(a != 0)
# lengths of non-zero elements
r$lengths[r$values]
# [1] 2 3 1
# start of non-zero runs
cumsum(r$lengths)[r$values] - r$lengths[r$values] + 1
# [1] 3 6 12
This also works on vectors with only 0 or non-0, and does not depend on whether or not the vector starts/ends with 0 or non-0. E.g.:
a <- c(1, 1)
a <- c(0, 0)
a <- c(1, 1, 0, 1, 1)
a <- c(0, 0, 1, 1, 0, 0)
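As a sketch, the same rle steps can be wrapped in a small helper and applied to the vector as posted in the question. The helper name nonzero_runs and the data.frame layout are choices made for this example only.

```r
a <- c(0, 0, 0, 0, 0, 2.6301334, 1.8372030, 0, 0, 0,
       1.5632647, 1.1433757, 0, 1.5412216, 0.8762267, 0,
       1.3087967, 0, 0, 0)

nonzero_runs <- function(v) {  # hypothetical helper name
  r <- rle(v != 0)
  ends <- cumsum(r$lengths)    # last position of every run
  data.frame(start  = (ends - r$lengths + 1)[r$values],
             length = r$lengths[r$values])
}

runs <- nonzero_runs(a)
runs
#   start length
# 1     6      2
# 2    11      2
# 3    14      2
# 4    17      1
```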
A possible data.table alternative, using rleid to create groups, and .I to get the start index and calculate the length.
library(data.table)
d <- data.table(a)
d[ , .(start = min(.I), len = max(.I) - min(.I) + 1, nonzero = (a != 0)[1]),
by = .(run = rleid(a != 0))]
# run start len nonzero
# 1: 1 1 2 FALSE
# 2: 2 3 2 TRUE
# 3: 3 5 1 FALSE
# 4: 4 6 3 TRUE
# 5: 5 9 3 FALSE
# 6: 6 12 1 TRUE
If desired, the runs can then easily be sliced by the 'nonzero' column.
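For numpy this is a parallel method to @Maple (with a fix for arrays ending with a nonzero):
import numpy as np
def subSeries(a):
    # 1 where the element is non-zero (within floating-point tolerance), else 0
    d = np.logical_not(np.isclose(a, np.zeros_like(a))).astype(int)
    # pad with zeros so that runs touching either end are detected
    edges = np.diff(np.r_[0, d, 0])
    starts = np.where(edges == 1)[0]   # 0-based index of the first element of each run
    ends = np.where(edges == -1)[0]    # 0-based index one past the last element of each run
    return np.c_[starts, ends - starts]  # columns: start, length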
Definition:
sublistLen = function(list) {
z_list <- c(0, list, 0)
ids_start <- which(diff(z_list != 0) == 1)
ids_end <- which(diff(z_list != 0) == - 1)
lengths <- ids_end - ids_start
return(
list(
'ids_start' = ids_start,
'ids_end' = ids_end - 1,
'lengths' = lengths)
)
}
Example:
> a <- c(-2,0,0,12,5,0,124,0,0,0,0,4,48,24,12,2,0,9,1)
> sublistLen(a)
$ids_start
[1] 1 4 7 12 18
$ids_end
[1] 1 5 7 16 19
$lengths
[1] 1 2 1 5 2

How do you write a matrix using a "for" loop in R?

So for data evaluation that I am doing at the moment I want to write a matrix using a "for" loop.
Let's say I have random numbers between 0 and 100:
E <- runif(100, 0, 100)
t <- 0 #start
for(t in 0:90) {
D <- length(E[E >= t, E < (t + 10)])
t = t + 10
}
So what I want to do is write "D" into a matrix at each iteration with "t" in one column and "D" in the other.
I've heard that you should avoid loops in R, but I don't know an alternative.
Rather than using a loop, you can do this with sapply, which operates on each item in a sequence and stores the result in a vector, and then cbind to create the matrix:
E <- runif(100, 0, 100)
t <- seq(0, 90, 10)
D <- sapply(t, function(ti) {
sum(E >= ti & E < (ti + 10))
})
cbind(t, D)
#> t D
#> [1,] 0 11
#> [2,] 10 12
#> [3,] 20 14
#> [4,] 30 11
#> [5,] 40 9
#> [6,] 50 12
#> [7,] 60 7
#> [8,] 70 7
#> [9,] 80 6
#> [10,] 90 11
Note that I also used sum(E >= ti & E < (ti + 10)) rather than length(E[E >= ti & E < (ti + 10)]), as a slightly shorter way of finding the number of items in E that are at least ti but less than ti + 10.
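If you do want an explicit for loop, as in the question, one way to sketch it is to preallocate the matrix and fill one row per iteration. The seed and the names t_vals, res and k are assumptions made for this example.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
E <- runif(100, 0, 100)
t_vals <- seq(0, 90, 10)

# preallocate the result: one row per bin, columns "t" and "D"
res <- matrix(NA_real_, nrow = length(t_vals), ncol = 2,
              dimnames = list(NULL, c("t", "D")))

for (k in seq_along(t_vals)) {
  lo <- t_vals[k]
  res[k, "t"] <- lo
  res[k, "D"] <- sum(E >= lo & E < lo + 10)
}
res
```

Looping over seq_along(t_vals) rather than over t itself keeps the row index and the bin start separate, so nothing has to be incremented by hand inside the loop.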
It seems that you want to bin your variable into categories - this is exactly what cut does:
E <- runif(100, 0, 100)
table(cut(E, breaks = seq(0,100,10), right=FALSE))
#> [0,10) [10,20) [20,30) [30,40) [40,50) [50,60) [60,70) [70,80) [80,90)
#> 10 10 7 10 8 10 12 11 10
#>[90,100)
#> 12
If you don't want to see categories labels, remove table call; if you want it in "tabular" format, wrap it in as.matrix.
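A minimal sketch of the as.matrix variant; the seed and the names counts and m are assumptions for this example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
E <- runif(100, 0, 100)

counts <- table(cut(E, breaks = seq(0, 100, 10), right = FALSE))
m <- as.matrix(counts)  # one column of counts, bin labels as row names
dim(m)                  # 10 1
```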
Please note that if you are doing it for plotting purposes, then both hist and ggplot will do it automatically for you:
hist(E, breaks = seq(0,100,10))
library("ggplot2")
ggplot(data.frame(var=E), aes(x=var)) + geom_histogram(binwidth = 10)

cbind() time series without NAs

I've observed that for many operators on overlapping time series, the result is given only for the overlapping portion, which is nice:
> (ts1 <- ts(1:5, start=1, freq=3))
Time Series:
Start = c(1, 1)
End = c(2, 2)
Frequency = 3
[1] 1 2 3 4 5
> (ts2 <- ts((7:3)^2, start=2, freq=3))
Time Series:
Start = c(2, 1)
End = c(3, 2)
Frequency = 3
[1] 49 36 25 16 9
> ts1 + ts2
Time Series:
Start = c(2, 1)
End = c(2, 2)
Frequency = 3
[1] 53 41
However, this doesn't seem to be the case with cbind(). While the output is aligned properly, NAs are created for the non-overlapping data:
> (mts <- cbind(ts1, ts2))
Time Series:
Start = c(1, 1)
End = c(3, 2)
Frequency = 3
ts1 ts2
1.000000 1 NA
1.333333 2 NA
1.666667 3 NA
2.000000 4 49
2.333333 5 36
2.666667 NA 25
3.000000 NA 16
3.333333 NA 9
Is there a way to perform that cbind() without creating the rows with NA in them? Or if not, what's a good way to take the result and strip off the rows with the NAs? It's not a simple matter of subscripting, because then it loses its timeseries nature:
> mts[complete.cases(mts),]
ts1 ts2
[1,] 4 49
[2,] 5 36
Maybe something with window(), but calculating the start & end times for the window seems a little yucky. Any advice is welcome.
Why not just na.omit the result?
> na.omit(cbind(ts1,ts2))
Time Series:
Start = c(2, 1)
End = c(2, 2)
Frequency = 3
ts1 ts2
2.000000 4 49
2.333333 5 36
If you want to avoid na.omit, stats:::cbind.ts calls stats:::.cbind.ts, which has a union argument. You could set that to FALSE and call stats:::.cbind.ts directly (after creating appropriate arguments):
> stats:::.cbind.ts(list(ts1,ts2),list('ts1','ts2'),union=FALSE)
Time Series:
Start = c(2, 1)
End = c(2, 2)
Frequency = 3
ts1 ts2
2.000000 4 49
2.333333 5 36
But the na.omit solution seems a tad easier. ;-)
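Another option that avoids both na.omit and the unexported function is ts.intersect from the stats package, which binds series over their common time window only. A short sketch with the series from the question (the name both is made up here):

```r
ts1 <- ts(1:5, start = 1, frequency = 3)
ts2 <- ts((7:3)^2, start = 2, frequency = 3)

# keeps only the times present in both series; the result is still a ts
both <- ts.intersect(ts1, ts2)
both
```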
