Grouping values of a vector

Let's say I have the following vectors:
a) numbers = [14 14 2 25 25 14 14 14 2 23 23 23]
b) frequency_of_the_numbers_above = [2 1 2 3 1 3]
c) new_numbers = [14 14 14 14 14 2 2 2 2 25 25 25 25 25 25 25 14 14 14 14 14 2 2 2 2 23 23 23 23]
b describes how often each value in vector a appears in a row. For example, 14 occurs twice at the start, so the first number in b is 2; then 2 occurs once, so the second number in b is 1; and so on.
What I now want is to adapt vector b to vector c, so that the result looks like:
new_numbers = [14 14 14 14 14 2 2 2 2 25 25 25 25 25 25 25 14 14 14 14 14 2 2 2 2 23 23 23 23]
frequency_numbers_for_new_numbers = [2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 1 1 1 1 3 3 3 3]

Your question doesn't quite match the vectors you posted. What b seems to record is how many times a given number appears in a in a ROW, i.e. run lengths, since a contains five 14s in total. And your desired result seems to be the corresponding entry of b repeated once for each element of that run in c. Is that what you want? Do a and c always contain the same numbers in the same order?
Assuming so, here is code to do what I described (in Java):
int index = 0;
int currentVal = c[0];
int[] result = new int[c.length];
for (int i = 0; i < result.length; i++) {
    if (c[i] != currentVal) {
        index++;               // the run changed: move to the next entry of b
        currentVal = c[i];     // update currentVal to the new value
    }
    result[i] = b[index];      // copy the run's frequency into the result vector
}
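For comparison, the same run-wise mapping can be sketched in Python with itertools.groupby; this assumes, as the Java code does, that the runs in c line up one-to-one with the entries of b:

```python
from itertools import groupby

b = [2, 1, 2, 3, 1, 3]
c = [14]*5 + [2]*4 + [25]*7 + [14]*5 + [2]*4 + [23]*4

# pair each run of equal values in c with the next entry of b,
# repeating that entry once per element of the run
result = []
for bi, (_, run) in zip(b, groupby(c)):
    result.extend([bi] * len(list(run)))
```

Each run of c consumes exactly one entry of b, so `result` has the same length as c.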

Related

R:How to apply a sliding conditional branch to consecutive values in the sequential data

I want to apply a conditional statement to consecutive values in a sliding manner.
For example, I have a dataset like this:
data <- data.frame(ID = rep.int(c("A", "B"), times = c(24, 12)),
                   time = c(1:24, 1:12),
                   visit = as.integer(runif(36, min = 0, max = 20)))
which gives the table below:
> data
ID time visit
1 A 1 7
2 A 2 0
3 A 3 6
4 A 4 6
5 A 5 3
6 A 6 8
7 A 7 4
8 A 8 10
9 A 9 18
10 A 10 6
11 A 11 1
12 A 12 13
13 A 13 7
14 A 14 1
15 A 15 6
16 A 16 1
17 A 17 11
18 A 18 8
19 A 19 16
20 A 20 14
21 A 21 15
22 A 22 19
23 A 23 5
24 A 24 13
25 B 1 6
26 B 2 6
27 B 3 16
28 B 4 4
29 B 5 19
30 B 6 5
31 B 7 17
32 B 8 6
33 B 9 10
34 B 10 1
35 B 11 13
36 B 12 15
I want to flag each ID based on consecutive values of "visit".
If "visit" stays below 10 for 6 consecutive rows, I'd attach "empty", and "busy" otherwise.
In the data above, "A" is continuously below 10 from rows 1 to 6, so it is "empty". "B", on the other hand, never has 6 consecutive single-digit values, so it is "busy".
If the condition isn't fulfilled in one segment of 6 values, I want to apply it to the next segment.
I'd like to achieve this in R. Any advice will be appreciated.
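The question asks for R, but the non-overlapping segment rule itself is easy to sketch. Here is an illustration in Python (the function name and the hard-coded visit vectors are mine, copied from the table in the question):

```python
def flag(visits, seg_len=6, limit=10):
    """Return 'empty' if some non-overlapping segment of seg_len
    consecutive visits lies entirely below limit, else 'busy'."""
    for start in range(0, len(visits) - seg_len + 1, seg_len):
        segment = visits[start:start + seg_len]
        if all(v < limit for v in segment):
            return "empty"
    return "busy"

# visit values for IDs "A" and "B" from the table above
a_visits = [7, 0, 6, 6, 3, 8, 4, 10, 18, 6, 1, 13,
            7, 1, 6, 1, 11, 8, 16, 14, 15, 19, 5, 13]
b_visits = [6, 6, 16, 4, 19, 5, 17, 6, 10, 1, 13, 15]
```

Here `flag(a_visits)` gives "empty" (rows 1 to 6 are all below 10) and `flag(b_visits)` gives "busy"; applying this per ID group would be the remaining step.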

How can I create a new column with the same id every n rows in R?

I have a data frame where I want to create a new column in which to assign the same ID every 30 rows.
My data frame is from an experiment and I wish to create a new "bloc" column, so that every 30 rows it increments by 1
example:
col1 : response latency = 1.0002, 1.2566, ...30times, 1.5422, ...
col2 : difficulty = easy, hard, intermediate, ...
col3 : ID = 1, 2, 3, ...30times, 31, 32, ...
And I want a new column
new col : bloc = 1, 1, ...30times, 2, 2, ...30times, 3, 3, ...
Using 5 as an example, but this of course works the same for 30
df <- data.frame(rownum = 1:23)
bloc_len <- 5
df$bloc <-
rep(seq(1, 1 + nrow(df) %/% bloc_len), each = bloc_len, length.out = nrow(df))
df
# rownum bloc
# 1 1 1
# 2 2 1
# 3 3 1
# 4 4 1
# 5 5 1
# 6 6 2
# 7 7 2
# 8 8 2
# 9 9 2
# 10 10 2
# 11 11 3
# 12 12 3
# 13 13 3
# 14 14 3
# 15 15 3
# 16 16 4
# 17 17 4
# 18 18 4
# 19 19 4
# 20 20 4
# 21 21 5
# 22 22 5
# 23 23 5
You could also use %/% (same output)
df$bloc <-
1 + seq(0, nrow(df) - 1) %/% bloc_len
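The same integer-division idea works in any language; a quick sketch in Python for comparison (0-based row index divided by the block length, plus 1):

```python
bloc_len = 5
# row i (0-based) belongs to block i // bloc_len + 1
bloc = [i // bloc_len + 1 for i in range(23)]
```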
You can use the rep(x, times) function to create the bloc column you want.
See the example below:
set.seed(12345)
# create a random data set
data <- data.frame(
  response_latency = abs(rnorm(90, 2, 1)),
  difficulty = sample(c("easy", "hard", "intermediate"), 90, replace = TRUE),
  ID = 1:90
)
Here, to add the bloc column in your dataset, you can use the following code:
bloc <- c(rep(x = 1, times = 30), rep(x = 2, times = 30), rep(x = 3, times = 30))
data$bloc <- bloc
head(data,n=35)
The new dataset will be as follows:
response_latency difficulty ID bloc
1 1.8890497 intermediate 1 1
2 2.9996586 intermediate 2 1
3 3.0255886 hard 3 1
4 0.3949156 hard 4 1
5 2.0027199 easy 5 1
6 2.9580737 hard 6 1
7 1.3337903 intermediate 7 1
8 1.4844084 hard 8 1
9 1.3941750 hard 9 1
10 1.6923244 intermediate 10 1
11 1.8186642 easy 11 1
12 0.9167691 easy 12 1
13 2.5987185 easy 13 1
14 1.8345693 intermediate 14 1
15 0.9177725 hard 15 1
16 2.3445309 easy 16 1
17 2.5187724 hard 17 1
18 1.2220053 hard 18 1
19 2.1636086 hard 19 1
20 0.7847963 hard 20 1
21 1.3785363 hard 21 1
22 2.9451529 intermediate 22 1
23 2.3722482 intermediate 23 1
24 2.1812877 intermediate 24 1
25 0.1383615 easy 25 1
26 1.3996498 easy 26 1
27 3.7593749 hard 27 1
28 2.0056114 hard 28 1
29 3.2195714 hard 29 1
30 2.1481248 easy 30 1
31 3.2546741 intermediate 31 2
32 2.4221608 hard 32 2
33 2.0465687 intermediate 33 2
34 1.7649423 easy 34 2
35 1.7338255 hard 35 2

R: producing vector with number of times an element appears in sample

I have a homework problem involving a population of 30 men; here is one random sample of 10 of them:
men
[1] 15 18 14 6 22 17 20 3 16 9
From this population, I must take 12 random samples and determine how many times each man appears.
The problem statement, verbatim, is "Perform 12 samples of 10 men from a population of size 30 and for each man, record the number of samples in which he appears."
I have attempted a loop that should produce a vector of 10 elements, each one lined up with the appropriate index.
mtimes <- rep(0, 12)
repeat{
  mtimes[menind] <- sum(sample(pop1, 12, replace = TRUE) == men[menind])
  menind <- menind + 1
  if (menind == 10) {
    break
  }
}
This resulted in a vector:
mtimes
[1] 0 0 1 0 0 0 0 0 0 0
It seems wrong that the 3rd man would appear only once while no one else appears in any of the samples.
You can use replicate and table here
set.seed(1)
table(replicate(n = 12, expr = sample(30, size = 10, replace = TRUE)))
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
# 3 2 3 5 2 2 5 5 3 3 6 7 4 5 8 2 1 3 2 9 3 7 2 8 3 3 5 3 3 3
I assume that by "men" you mean 1:30.
Another option would be to increase the size of the sample to 10*12 as in
set.seed(1)
table(sample(30, size = 10*12, replace = TRUE))
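For reference, the same simulation can be sketched in Python with collections.Counter (random.choices samples with replacement, mirroring replace = TRUE; the seed here is illustrative and the counts are not tied to the R output above):

```python
import random
from collections import Counter

random.seed(1)
counts = Counter()
for _ in range(12):                                    # 12 samples
    counts.update(random.choices(range(1, 31), k=10))  # 10 men each, with replacement
# counts[m] is the number of times man m was drawn across all samples
```

Men who are never drawn simply don't appear in `counts`, just as they would be absent from the R table.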

Creating Groups by Matching Values of Different Columns

I would like to create groups from a base by matching values.
I have the following data table:
now<-c(1,2,3,4,24,25,26,5,6,21,22,23)
before<-c(0,1,2,3,23,24,25,4,5,0,21,22)
after<-c(2,3,4,5,25,26,0,6,0,22,23,24)
df<-as.data.frame(cbind(now,before,after))
which reproduces the following data:
now before after
1 1 0 2
2 2 1 3
3 3 2 4
4 4 3 5
5 24 23 25
6 25 24 26
7 26 25 0
8 5 4 6
9 6 5 0
10 21 0 22
11 22 21 23
12 23 22 24
I would like to get:
now before after group
1 1 0 2 A
2 2 1 3 A
3 3 2 4 A
4 4 3 5 A
5 5 4 6 A
6 6 5 0 A
7 21 0 22 B
8 22 21 23 B
9 23 22 24 B
10 24 23 25 B
11 25 24 26 B
12 26 25 0 B
I would like to get to the answer without using a "for" loop, because the real data is too large.
Any help you could provide will be appreciated.
Here is one way. It is hard to avoid a for loop here, as this is quite a tricky algorithm. The objection to for loops is often on the grounds of elegance rather than speed, and sometimes they are entirely appropriate.
df$group <- seq_len(nrow(df))  # assign each row to its own group
stop <- FALSE                  # indicates convergence
while (!stop) {
  pre <- df$group  # group column at start of loop
  for (i in seq_len(nrow(df))) {
    matched <- which(df$before == df$now[i] | df$after == df$now[i])  # matches in before and after columns
    group <- min(df$group[i], df$group[matched])  # smallest group number among matching rows
    df$group[i] <- group        # set to smallest group
    df$group[matched] <- group  # set to smallest group
  }
  if (identical(df$group, pre)) stop <- TRUE  # stop if no change
}
df$group <- LETTERS[match(df$group, sort(unique(df$group)))]  # convert groups to letters
# (just use match(...) to keep them as integers, e.g. if you have more than 26 groups)
df <- df[order(df$group, df$now), ]  # reorder as required
df
now before after group
1 1 0 2 A
2 2 1 3 A
3 3 2 4 A
4 4 3 5 A
8 5 4 6 A
9 6 5 0 A
10 21 0 22 B
11 22 21 23 B
12 23 22 24 B
5 24 23 25 B
6 25 24 26 B
7 26 25 0 B
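An alternative sketch of the same grouping uses a union-find (disjoint-set) structure, which avoids the repeated passes of the while loop. This is an illustration in Python, not the answer's R code; it assumes, as the data shows, that 0 marks a missing neighbour:

```python
parent = {}

def find(x):
    """Find the representative of x's group, with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    """Merge the groups containing a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra

now    = [1, 2, 3, 4, 24, 25, 26, 5, 6, 21, 22, 23]
before = [0, 1, 2, 3, 23, 24, 25, 4, 5, 0, 21, 22]
after  = [2, 3, 4, 5, 25, 26, 0, 6, 0, 22, 23, 24]

for n, b, a in zip(now, before, after):
    if b != 0:      # 0 marks "no neighbour" in this data
        union(n, b)
    if a != 0:
        union(n, a)

roots = sorted({find(n) for n in now})
labels = {r: chr(ord("A") + i) for i, r in enumerate(roots)}
group = [labels[find(n)] for n in now]
```

For the sample data this yields the same A/B split as the answer above, in a single pass over the rows.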

Smart way to convert polar to Cartesian coordinates with numpy

I have an array of Cartesian coordinates produced from polar coordinates in the usual way:
for k in range(0, Phi_term):
    for j in range(0, R_term):
        X[k, j] = R[j] * np.cos(phi[k])
        Y[k, j] = R[j] * np.sin(phi[k])
The problem is that the zeroth element of such an array corresponds to the origin of the polar circle. I would like an array with the same elements but starting in the top right corner. For example, the elements of the current array are distributed as follows (for the upper half):
11 10 9 6 7 8
14 13 12 3 4 5
17 16 15 0 1 2
(imagine it's a circle). What I want to get is the grid starting with the zeroth element:
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
though preserving the values, i.e. the value of the 11th element of the initial array is now the value of the 0th element of the new array.
Is there any smart way to perform such a transformation in numpy?
def quasiCartesianOrder(arr, R_term, Phi_term):
    # arr is unused: only the index order is generated
    # deal with an odd phi count by starting from the top of the top spike
    rhsOddOffset = 0
    if Phi_term % 2 == 1:
        rhsOddOffset = R_term
        for r in range(0, R_term):
            yield (Phi_term + 1) // 2 * R_term - r - 1
    # 'rectangular' section, going down the 11 o'clock side and up the 1 o'clock side
    phiBaseLeft = Phi_term // 2 + rhsOddOffset // R_term
    phiBaseRight = Phi_term // 2
    for phiLine in range(0, Phi_term // 2):
        # down 11 o'clock
        base = (phiBaseLeft + phiLine) * R_term - 1
        for idx in range(base + R_term, base, -1):
            yield idx
        # up 1 o'clock
        base = (phiBaseRight - phiLine) * R_term
        for idx in range(base - R_term, base):
            yield idx
Behaviour:
11
10
9
14 13 12 6 7 8
17 16 15 3 4 5
20 19 18 0 1 2
Becomes
0
1
2
3 4 5 6 7 8
9 10 11 12 13 14
15 16 17 18 19 20
Result
11 10 9 14 13 12 6 7 8 17 16 15 3 4 5 20 19 18 0 1 2
The function is written as a generator, so you can iterate over it. If you just want the indices themselves, call list on the returned generator; you should then be able to use the result with numpy's index arrays.
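Once you have the index order as a list, numpy fancy indexing applies the whole permutation in one step. A minimal sketch, with the order hard-coded from the Result line above and a plain arange standing in for the real coordinate array:

```python
import numpy as np

arr = np.arange(21)  # stand-in for the flattened coordinate array
order = [11, 10, 9, 14, 13, 12, 6, 7, 8, 17, 16, 15,
         3, 4, 5, 20, 19, 18, 0, 1, 2]
new = arr[np.array(order)]  # new[0] now holds the old 11th element
```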
