unexpected result of eval parse - r

I have a set of identically dimensioned tables for 322 areas. I need to sum these tables into 29 higher-level areas, where each higher-level area contains a varying number of lower-level areas.
I am proposing to compute the sum in a loop, so the first task is to determine the number of lower-level areas to be summed (each of which has a character identifier).
So for example the first list of lower level areas is a list of four:
lad_list_black
[1] "00CW" "00CU" "00CS" "00CR"
The lists for the higher-level areas are differentiated by the last term of their names, in this case "black". The next is "bucks" (i.e. lad_list_bucks), etc.
I was proposing to use a loop which counted the number of lower level areas in its first step -- something like
nam <- c("black","bucks")
lad_list_black <- c("00CW","00CU","00CS","00CR")
for(i in 1:1){
eval(parse(text=length(paste("lad_list_",nam[i],sep=""))))}
but when I tested it outside the loop, the result was:
eval(parse(text=length(paste("lad_list_",nam[1],sep=""))))
[1] 1
which is not correct, since:
length(lad_list_black)
[1] 4
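A note on why the result is 1: length() is applied to the pasted string itself, which is a character vector of length one, and parse(text=...) then simply parses and evaluates the number 1. A minimal sketch of one way around this is to look the object up by name with get() (or mget() for several names) before taking its length. The codes in lad_list_bucks are made-up placeholders, since the question does not list them.

```r
nam <- c("black", "bucks")
lad_list_black <- c("00CW", "00CU", "00CS", "00CR")
lad_list_bucks <- c("XXAA", "XXAB")  # hypothetical codes, for illustration only

# The pasted name is a single string, so its length is 1
name_length <- length(paste0("lad_list_", nam[1]))

# Look the object up by its name first, then take its length
n_black <- length(get(paste0("lad_list_", nam[1])))

# Or fetch every list at once and count the elements in each
lad_lists <- mget(paste0("lad_list_", nam))
n_areas <- lengths(lad_lists)
```

Keeping the area lists in one named list from the start (rather than in separately named variables) would avoid the name lookup altogether.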

Related

R: Rank cells in a list of matrices based on cell position

I have a list of matrices containing association measurements between GPS tracked animals. One matrix in the list is observed association rates, the others are association rates for randomized versions of the GPS tracking trajectories. For example, I currently have 99 permutations of randomized tracking trajectories resulting in a list of 99 animal association matrices, plus the observed association matrix. I am expecting that for the animals that belong to the same pack, the observed association rates will be higher than the randomized association rates. Accordingly, I would like to determine the rank of the observed rates compared to the randomized rates for each dyad (cell). Essentially, I am doing a rank-permutation test. However, since I am only really concerned with determining if the observed association data is greater than the randomized trajectory association data, any result just giving the rank of the observed cells is sufficient.
ls <- list(matrix(10:18,3,3), matrix(18:10,3,3))
I've seen that sapply can be used to get the ranks of particular cells. Could I do the following for all cells and take the final number in the resulting vector to get the rank of the cell in that position in the list (knowing the position of the observed data in the list of matrices, e.g. last)?
rank(sapply(ls, '[',1,1))
The ideal result would be a matrix of the same form as those in the list giving the rank of the observed data, although any similar solutions are welcome. Thanks in advance.
You can proceed that way, but there are cleaner and quicker methods to get what you want.
Here's some code that takes your ls and produces a 3x3 matrix with the following properties:
if the entry in ls[[1]] is greater than the corresponding entry of ls[[2]], record a 1
if the entry in ls[[1]] is less than the corresponding entry of ls[[2]], record a 2
if the entries are equal, record a 1.5
result <- 1 * (ls[[1]] > ls[[2]]) + 2 * (ls[[1]] < ls[[2]]) + 1.5 * (ls[[1]] == ls[[2]])
How it works: an expression like ls[[1]] > ls[[2]] pulls out the two matrices and compares them element by element. The result is a TRUE/FALSE matrix, which R treats as 1/0 in arithmetic. We can then multiply it by whatever coefficient we want to represent that situation.
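For more than two matrices (for example 99 randomized ones plus the observed one), the same idea can be sketched by stacking the list into a 3D array and ranking along the third dimension. This is only a sketch, assuming the observed matrix is the last element of the list; the name mats is used here instead of ls to avoid masking the base function ls().

```r
# Two-matrix example from the question; the observed matrix is assumed to be last
mats <- list(matrix(10:18, 3, 3), matrix(18:10, 3, 3))

# Stack the matrices into a rows x cols x n array
arr <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))

# For each cell, rank the values across matrices and keep the rank of the last one
obs_rank <- apply(arr, c(1, 2), function(v) rank(v)[length(v)])
```

Ties get averaged ranks by default (the 1.5 in the two-matrix case); the ties.method argument of rank() changes that.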

Is there a way in R to draw from a list without replacement, so that a drawn value is not available for subsequent draws?

I would like to populate a matrix with values from user defined lists (this part is not important yet). The selection from these lists should be random without replacement until the list is exhausted.
For example, if we have list_1 <- c(1,2,3) and list_2 <- c('a', 'b', 'c')
The (1,1) element of the matrix is drawn from list_1 and is assigned 2. The (1,2) element draws from list_1 and is assigned 3 (2 is not available as it has already been assigned). The (1,3) element draws from list_2 and is assigned 'b'. The (2,1) element also draws from list_1 and is assigned 1, as this is the only remaining element in list_1. This would continue until all elements within the matrix have been assigned a value. There are as many elements in the matrix as there are in the lists (in total).
Given the structure of the matrix, I am not able to simply use the c() function to combine a number of vectors and randomise within each vector.
As I am an R novice, please forgive me if the above explanation is not clear.
Thanks in advance for any help.
EDIT: The rationale for the specific position within the matrix is that the matrix represents an experimental design. Each position in the matrix might represent a small plot of soil. Each list might represent a different type of seed, and the elements within each list variations on the seed type. list_1 might contain all flower seeds (Rose, Daisy, etc.) and list_2 might contain all herb seeds (Parsley, Oregano, etc.). I want to constrain the type of seed that can be planted in a particular plot. The seed that is actually allocated to that plot is then randomly selected from the seed list. Some plots are allowed to contain a randomly allocated flower seed, but not a herb seed.
As I don't want to end up with all roses, for example, once a rose seed has been allocated to a plot, that seed option should be removed from the list.
For example, (1,1) might be a 'flower' plot. A seed is randomly selected from the flower list, and say a daisy seed is assigned to (1,1). We then move on to the next 'flower' plot, but now the daisy seed is not available for assignment, as it is no longer in the list (it has already been assigned to (1,1)).
I hope this makes more sense.
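One possible sketch of this: since drawing without replacement until a list is exhausted is the same as shuffling that list, sample() can permute each seed list and the shuffled values can be written into the plots of the matching type. The layout matrix, the pool names and the extra seeds (Tulip, Basil) are assumptions made up for illustration; the sketch also assumes each list has exactly as many elements as there are plots of its type.

```r
set.seed(1)  # only so that the sketch is reproducible

# Hypothetical seed lists
pools <- list(
  flower = c("Rose", "Daisy", "Tulip"),
  herb   = c("Parsley", "Oregano", "Basil")
)

# Hypothetical layout: which seed type each plot is allowed to contain
layout <- matrix(c("flower", "flower", "herb",
                   "flower", "herb",   "herb"),
                 nrow = 2, byrow = TRUE)

design <- matrix(NA_character_, nrow(layout), ncol(layout))
for (type in names(pools)) {
  cells <- which(layout == type)
  # Each pool must contain exactly one seed per plot of its type
  stopifnot(length(cells) == length(pools[[type]]))
  # sample() without a size returns a random permutation of the pool
  design[cells] <- sample(pools[[type]])
}
```

Each seed then appears exactly once, and only in a plot of its own type.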

How to obtain the maximum sum of the array with the following condition?

Suppose the problem posed is as follows:
On Mars there lives a colony of worms. Each worm is represented as an element in a 1D array. Worms decide to eat each other, but any worm can eat only its nearest neighbour. Each worm has a preset amount of energy (i.e. the value of the element). On Mars, the laws dictate that when a worm i with energy x eats another worm with energy y, the i-th worm’s final energy becomes x-y. A worm is allowed to have negative energy levels.
Find the maximum value of energy of the last standing worm.
Sample data:
0,-1,-1,-1,-1 has answer 4.
2,1,2,1 has answer 4.
What will be the suitable logic to address this problem?
This problem has a surprisingly simple O(N) solution.
If any two members of the array have different signs, the answer is the sum of the absolute values of all elements.
To see why, imagine a single positive value in the array, with all other elements negative (Example 1). The best strategy is to keep this value positive and gradually eat all its neighbors to increase it. The position of the positive value doesn't matter. The strategy is the same in the case of a single negative element.
In the more general case, if an array of size N has values of different signs, we can always find an array of size N-1 with different signs, because there must be a pair of neighbors with different signs, which we can combine to form a number of whichever sign we prefer.
For example, with this array: [1,2,-5,4,-10]
we can combine (2,-5), (-5,4) or (4,-10). Let's combine (4,-10) to get [1,2,-5,-14]
We can only take (2,-5) now. So our array is now: [1,-7,-14]
Again only (1,-7) is possible. But this time we have to keep the combined value positive. So we are left with: [8,-14]
The final combination gives us 22, the sum of all absolute values.
If all values have the same sign, our first move is to produce a value of the opposite sign by combining a neighbor pair at as little "cost" as possible. Intuitively, we don't want to waste two big numbers on this conversion. If we take a neighbor pair x,y, the combined value (of opposite sign) has magnitude abs(x-y). Since the result is simply the sum of absolute values, we can interpret this as "losing" abs(x) and abs(y) from the maximum possible output and "gaining" abs(x-y) instead. So the "cost" of using this pair for sign conversion is abs(x)+abs(y)-abs(x-y). Since we need to minimise this cost, we choose the neighbor pair in the initial array with the lowest such value.
So if we take the above array but now all values are positive [1,2,5,4,10]:
"cost" of converting (1,2) to -1 is 1+2-abs(-1)=2.
"cost" of converting (2,5) to -3 is 2+5-abs(-3)=4.
"cost" of converting (5,4) to -1 is 5+4-abs(-1)=8.
"cost" of converting (4,10) to -6 is 4+10-abs(-6)=8.
So we convert the pair (1,2) to -1, then sum the absolute values of the resulting array [-1,5,4,10] to get 20. Notice that this is exactly 2 less than the 22 from the previous example.
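The reasoning above can be sketched as a short function. This is an illustrative sketch rather than a reference implementation; the function name is made up, and it assumes that a single worm simply keeps its own energy.

```r
max_last_energy <- function(a) {
  n <- length(a)
  if (n == 1) return(a)  # assumption: a lone worm keeps its energy

  total <- sum(abs(a))

  # Mixed signs: the answer is the sum of absolute values
  if (any(a > 0) && any(a < 0)) return(total)

  # Same sign everywhere (zeros allowed): pay the cheapest neighbor-pair cost
  x <- a[-n]  # left element of each neighbor pair
  y <- a[-1]  # right element of each neighbor pair
  cost <- abs(x) + abs(y) - abs(x - y)
  total - min(cost)
}

sample_1 <- max_last_energy(c(0, -1, -1, -1, -1))
sample_2 <- max_last_energy(c(2, 1, 2, 1))
mixed    <- max_last_energy(c(1, 2, -5, 4, -10))
same     <- max_last_energy(c(1, 2, 5, 4, 10))
```

Both branches make a single pass over the array, which is where the O(N) bound comes from.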

How to perform the same operation(s) on all elements in a list

I have a large list with 317 elements. Each element contains a varying number of cases. These elements all have the exact same categories, but they have different numbers for all of them.
Each element has five categories:
Location
Species 1 count
Species 2 count
Species 3 count
Total species count
I originally had a single dataframe containing all of the records, but I split it by location because I am trying to find the proportion of the three species for each site (hence the 317 elements: there were 317 different locations, so it was split into that many).
I just want to perform the same operation on every element, receiving a number for each of them. I don't know how to calculate the proportion, but I do not need help with that. I just want to perform the same function on every single element in the list I have.
This is the code so far that I want to execute for every single element. I need to add the proportions code, but I will do that when I find out how to work it out.
##df = name of the large list
df$location <- df$location[!( ((df$species1) + (df$species2) + (df$species3)) != (df$totalSpecies) ),]
##remove any records where the three species do not equal the total
Thank you in advance!
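One common way to do this is lapply(), which applies a function to every element of a list and returns a list of the results. The sketch below is only illustrative: the records data frame is a made-up stand-in for the real data, and clean_site is a hypothetical helper that wraps the filtering step from the question.

```r
# Hypothetical stand-in for the original data frame
records <- data.frame(
  location     = c("A", "A", "B", "B"),
  species1     = c(1, 2, 3, 4),
  species2     = c(1, 0, 2, 1),
  species3     = c(0, 1, 1, 1),
  totalSpecies = c(2, 3, 6, 7)
)
by_site <- split(records, records$location)  # one data frame per location

# Keep only the rows where the three species counts add up to the total
clean_site <- function(site) {
  ok <- site$species1 + site$species2 + site$species3 == site$totalSpecies
  site[ok, , drop = FALSE]
}

cleaned <- lapply(by_site, clean_site)  # same operation on every element
n_kept  <- sapply(cleaned, nrow)        # one number per location
```

Any later per-site calculation can be added to the helper or applied with another lapply() or sapply() call over cleaned.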

COUNTIF where criterion is a specific sequence of cells

I'm doing some work with arithmetic sequences modulo P, in which the sequences become periodic under the modulo. My worksheet generates a sequence mod P with the first term being 0, the second term being a number K (referencing another cell), and the following terms following the recurrence relation. The period of the sequence (the number of values before it repeats itself) is related to the ratio P/K. So, for example, if P=2 and K=1, I get the sequence {0,1,1,0,1,1,0,1,1,...}, which has a period of 3, so when P/K=2, the period is 3.
I currently have a formula which uses the COUNTIF function to count the number of zeroes in the range, which is then divided out of the total range, currently an arbitrary size of 120, and this gives me the correct period for many ratios of P/K. Most of the time, however, the sequence generated exhibits semi-periodicity and sometimes even quasi-periodicity, such as in the case of K=1 and modulo 9: {0,1,1,2,3,5,8,4,3,7,1,8,0,8,8,7,6,4,1,5,6,2,8,1,...}, where P/K=9, the period is 24, and the semi-period is 12 (because of the 0,8,8,... part of the sequence). In such cases, my current COUNTIF formula thinks the full period is 12, even though it should be 24, because it counts the zeroes which define the semi-period.
What I would like to do is adjust the formula so that instead of the criterion for counting being 0, it would only count triplet sequences of cells in the pattern 0,K,K.
My current formula:
=QUOTIENT(120,(COUNTIF(B2:DQ2,0)))
So if I have =QUOTIENT(120,(COUNTIF(B2:DQ2,*X*))) I want the "X", which is currently 0, to reference a specific sequence of cells, namely the first three of the overall series, so something like: =QUOTIENT(120,(COUNTIF(B2:DQ2,(0,C2,D2)))) although obviously that criterion is not in remotely the correct syntax.
I'm not well-versed in writing macros, so that would probably be out of the question.
I would do this with four helper rows plus the final formula. Someone more clever than I am might be able to do it in one cell with an array formula; but compared to array formulas I think the helper rows are easier to understand and, if desired, tweak.
Once this is set up, if you're always going to use three as your criterion, you can hide the helper rows (to hide a row, right-click on the gray number label on the left side of the spreadsheet, and choose "hide").
So your sequence is in row 2, starting in column B. We'll set up the first helper row in row 3, starting in column C. In cell C3 put the formula =C2=$B$2. This will evaluate to FALSE, which is equivalent to 0. Copy and paste that formula all the way to cell DQ3 (or however many columns you want to run it). Cells below a sequence number equal to the first number in the sequence will evaluate to TRUE, which is equivalent to 1.
The next two helper rows are very similar. In cell D4 put the formula =D2=$C$2 and copy and paste to cell DQ4. This row tests which cells are equal to the second number in the sequence.
In cell E5 put the formula =E2=$D$2 and copy and paste to cell DQ5, showing which cells are equal to the third number in the sequence.
The last helper row is a little different, so I left an empty row after the first three helpers. In cell E7 I put the formula =SUM(C3,D4,E5); copy and paste that over to column DQ. This counts how many matches were found in the previous three helper rows. If all three match, the result of this formula will be 3 and your criterion for determining the period will have been fulfilled.
Now to show the period: in the cell where you want this number, put the formula =MATCH(3,E7:DQ7,0). This searches the last (fourth) helper row for a cell that is equal to 3. (Obviously you could modify this method to match only the first two sequence numbers, or to match more than 3, and then you'd adjust the first parameter in the MATCH formula.) The last parameter in this MATCH formula is 0 because the helper row is not sorted. The return value is the index of the first match: a match in E7 would be index 1, a match in F7 would be index 2, etc.
I tested this in LibreOffice 4.4.4.3.
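The same triplet-matching logic can be cross-checked outside the spreadsheet with a short R sketch. It assumes the recurrence is Fibonacci-like (each term is the sum of the previous two, mod P), which is consistent with both example sequences in the question but is not stated there; the function name is hypothetical.

```r
find_period <- function(P, K, n = 120) {
  s <- numeric(n)
  s[1] <- 0
  s[2] <- K %% P
  # Assumed recurrence: each term is the sum of the previous two, mod P
  for (i in 3:n) s[i] <- (s[i - 1] + s[i - 2]) %% P

  # Compare every later triplet with the first triplet (s[1], s[2], s[3])
  starts <- 2:(n - 2)
  hits <- s[starts] == s[1] & s[starts + 1] == s[2] & s[starts + 2] == s[3]

  # Index of the first match equals the shift, i.e. the period (NA if none)
  match(TRUE, hits)
}

period_mod_2 <- find_period(2, 1)
period_mod_9 <- find_period(9, 1)
```

As with the MATCH formula, the index of the first matching triplet is the period, so the semi-period at 0,8,8 in the modulo 9 example is skipped.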
