Loop error (List Index out of Range) - python-3.6

def clean_data(self):
    """Limit samples to greater than the sequence length and fewer
    than N frames. Also limit it to classes we want to use."""
    data_clean = []
    for item in self.data:
        if int(item[3]) >= self.seq_length and int(item[3]) <= self.max_frames \
                and item[1] in self.classes:
            data_clean.append(item)
    return data_clean
When I try to run this module, it raises the error "list index out of range". Can anyone please help me solve this issue? Thanks in advance.

len(item) > 3 must be true for every item in self.data, or item[3] will fail with the error you reported. Check the length of your items, then either fix the short ones or add code that skips the if test when an item is too short.
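A minimal sketch of the second option, skipping items that are too short. The function name clean_rows and the sample rows are made up for illustration; it assumes, as in the question, that item[3] holds the frame count and item[1] the class name. A blank line in an input file is one plausible source of such short rows, but that is only a guess about the asker's data.

```python
def clean_rows(rows, seq_length, max_frames, classes):
    """Keep rows whose frame count and class pass the filter, skipping short rows."""
    cleaned = []
    for item in rows:
        # A row needs at least 4 fields before item[3] can be read safely.
        if len(item) <= 3:
            continue
        frames = int(item[3])
        if seq_length <= frames <= max_frames and item[1] in classes:
            cleaned.append(item)
    return cleaned


# Made-up rows: the second one is truncated and would raise IndexError unguarded.
rows = [
    ["train", "Archery", "v_001", "120"],
    ["train", "Archery"],
    ["test", "Bowling", "v_002", "400"],
]
print(clean_rows(rows, seq_length=40, max_frames=300, classes={"Archery", "Bowling"}))
# [['train', 'Archery', 'v_001', '120']]
```

Printing the skipped rows instead of silently dropping them may make it easier to find where the bad data comes from.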


Find which sum of any numbers in an array equals amount

I have a customer who sends electronic payments but doesn't bother to specify which invoices they cover. I'm left guessing which ones, and I would rather not try every single combination manually. I need some sort of pseudo-code for this, which I can then adapt, but I'm not sure I can come up with a good algorithm myself. I'm familiar with PHP, bash, and Python, but I can adapt.
I would need an array with the following numbers: [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]. Those are the amounts of the invoices. Then I would need to find a sum of any combination of two or more of those numbers that adds up to 596.57. Once found the program would need to tell me exactly which numbers it used to reach the sum so I can then know which invoices got paid.
This is very similar to the Subset Sum problem and can be solved using a similar approach to the typical brute-force method used for that problem. I have to do this often enough that I keep a simple template of this algorithm handy for when I need it. What is posted below is a slightly modified version [1].
This has no restrictions on whether the values are integer or float. The basic idea is to iterate over the list of input values and keep a running list of every subset that sums to less than the target value (since there might be a later value in the inputs that will yield the target). It could be modified to handle negative values as well by removing the rule that only keeps candidate subsets if they sum to less than the target. In that case, you'd keep all subsets, and then search through them at the end.
import copy

def find_subsets(base_values, target):
    possible_matches = [[0, []]]  # [[known_attainable_value, [list, of, components]], [...], ...]
    matches = []  # we'll return ALL subsets that sum to `target`
    for base_value in base_values:
        temp = copy.deepcopy(possible_matches)  # Can't modify in loop, so use a copy
        for possible_match in possible_matches:
            new_val = possible_match[0] + base_value
            if new_val <= target:
                # Build a new component list rather than appending to the old one in place
                new_possible_match = [new_val, possible_match[1] + [base_value]]
                temp.append(new_possible_match)
                if new_val == target:
                    matches.append(new_possible_match[1])
        possible_matches = temp
    return matches

# Usage: find_subsets(list_of_input_values, target_sum)
This is a very inefficient algorithm and it will blow up quickly as the size of the input grows. The Subset Sum problem is NP-Complete, so you are not likely to find a generalized solution that will work in all cases and is efficient.
1: The way lists are being used here is kludgy. If the goal was to simply find any match, the nested lists could be replaced with a dictionary, and we could exit right away once a match is found. But doing that will cause intermediate subsets that sum to the same value to also map to the same dictionary slot, so only one subset with that sum is kept. Since we need to report all matching subsets (because the values represent checks and are presumably not fungible even if the dollar amounts are equal), a dictionary won't work.
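For comparison, the dictionary-based variant described in the note might look like the sketch below. The name find_any_subset is hypothetical, and the sketch assumes non-negative values (integer cents are safer than floats for the equality test). It stops at the first match and keeps only one subset per attainable sum, which is exactly the limitation the note warns about.

```python
def find_any_subset(base_values, target):
    """Return one list of values summing to target, or None (non-negative inputs)."""
    reachable = {0: []}  # attainable sum -> one subset that produces it
    for value in base_values:
        # Iterate over a snapshot so each input value is used at most once.
        for known_sum, parts in list(reachable.items()):
            new_sum = known_sum + value
            if new_sum > target or new_sum in reachable:
                continue  # too large, or this sum already has a subset recorded
            reachable[new_sum] = parts + [value]
            if new_sum == target:
                return reachable[new_sum]
    return None


print(find_any_subset([1, 2, 3, 4], 6))   # [1, 2, 3]
print(find_any_subset([1, 2, 3, 4], 11))  # None
```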
You can use itertools.combinations(t,r) to list all combinations of r elements in array t.
So we loop on the possible values of r, then on the results of itertools.combinations:
import itertools

def find_sum(t, obj):
    t = [x for x in t if x <= obj]  # filter out elements which are too big
    for r in range(1, len(t)+1):  # loop on number of elements
        for subt in itertools.combinations(t, r):  # loop on combinations of r elements
            if sum(subt) == obj:
                return subt
    return None

find_sum([1,2,3,4], 6)
# (2, 4)
find_sum([1,2,3,4], 10)
# (1, 2, 3, 4)
find_sum([1,2,3,4], 11)
# None
find_sum([35715, 22373, 10699, 8996, 31239, 12000], 59657)
# None
Rounding errors:
The code above is meant to be used with integers, rather than floats.
To use with floats, replace the test sum(subt) == obj with the more forgiving test abs(sum(subt) - obj) < 0.01. Without abs(), every subset that sums to less than obj would pass the test.
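Another way to sidestep the float comparison is to convert every amount to integer cents before summing, which is what the last example above does by hand. The sketch below is one possible way to do that; to_cents and find_sum_cents are hypothetical names, and it assumes the amounts have at most two decimal places.

```python
from decimal import Decimal
from itertools import combinations


def to_cents(amount):
    """Convert a currency amount to integer cents (hypothetical helper)."""
    # Going through str() avoids carrying binary float noise into the Decimal.
    return int((Decimal(str(amount)) * 100).to_integral_value())


def find_sum_cents(amounts, target):
    """Return the first combination of amounts whose cents sum to target, else None."""
    target_cents = to_cents(target)
    for r in range(1, len(amounts) + 1):
        for combo in combinations(amounts, r):
            if sum(to_cents(a) for a in combo) == target_cents:
                return combo
    return None


print(0.1 + 0.2 == 0.3)                      # False: binary floats are inexact
print(find_sum_cents([0.1, 0.2, 0.4], 0.3))  # (0.1, 0.2)
```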
Relevant documentation:
itertools.combinations

Creating nested lists in a loop in R

This bit of code does what I want it to do, but generates a warning for every iteration of the loop:
library(epiR)
cccList <- list()
for (i in 3:ncol(dfData)){
  tmpvar <- paste("cccIntactVs.", i, sep = "")
  assign(
    tmpvar,
    epi.ccc(
      dfData[2:nrow(dfData),2],
      dfData[2:nrow(dfData),i],
      ci = "z-transform",
      conf.level = 0.95,
      rep.measure = FALSE
    )
  )
  cccList[i] <- get(paste0("cccIntactVs.", i))
}
I get this warning every time the output of epi.ccc() is added to cccList:
Warning in cccList[i] <- get(paste0("cccIntactVs.", i)) :
number of items to replace is not a multiple of replacement length
Is there a more proper way of accomplishing this? The output of epi.ccc() is a list of 7 elements. Since the output is the same length each time and I'm only adding to the list, why is it complaining about mismatched lengths or replacement?
You want to use [[i]] instead of [i].
Basically, [ means you want to replace a certain part of a list with different content, and the replacement needs to have just as many items as the number of slots you are trying to replace.
On the other hand, using [[ means you want to put everything you are assigning into one slot, which seems to be what you want.
An example of what happens:
myList <- list(1,2,3,4,5,6,7)
myList[3:5] <- c(11, 12, 13)
myList[[6]] <- c(14, 15, 16)
Here, 11-13 are distributed among slots 3 through 5: 3 replacement items in 3 slots.
And 14-16 are placed in one slot: This slot now contains a length-3 vector.
Now what happens if we try this?
myList[1] <- c(17,18,19)
We tell R to distribute 3 items over one slot. It does the best it can, which is not much: it discards everything but the first item, though luckily it warns you. If you really want to assign only the first item, you can use
myList[1] <- c(17,18,19)[1]
But that's not really useful, since there's no point in supplying 18 and 19.
You could make it a list of length one:
myList[1] <- list(c(17,18,19))
But generally, if you want to put it in one slot, using [[ is the way to go.
And as a sidetrack: why was it built this way?
The reason is that [ can give you access to multiple slots, and you might not know which ones or how many beforehand. What should happen if I try this?
someVar <- readLines(somefile) # someVar happens to be c(1, 2) instead of having length 1
myList[someVar] <- 21:23
Put 21:23 in both slots? Put 21 in the first slot and the rest (22, 23) in slot 2?
Using [[ means you are sure only one slot is used, and you're not unexpectedly overwriting anything.
cccList[i] <- get(paste0("cccIntactVs.", i)) will trigger this warning if get(paste0("cccIntactVs.", i)) is not of length 1.
Using get(paste0("cccIntactVs.", i))[1] should solve it, but if you didn't expect get(paste0("cccIntactVs.", i)) to have a length greater than 1, you likely have a mistake somewhere else in your code, even if the result looks fine to you now.

How to write a function for and for loop with embedded if else statement?

I have just started using R for a course I'm taking. The assignment asks for a function that takes integer values > 0 (argument 1), multiplies values < 25 and > 75 by a set multiplier (argument 2), and multiplies the other elements by a different multiplier (argument 3).
I already have the previous h and s values:
h=sample(1:100,40)
s=c()
for(i in 1:100){
  if(h[i]<25){s[i]<-h[i]*10}
  else if(h[i]>75){s[i]<-h[i]*10}
  else{s[i]<-h[i]*0.1}
}
Error in if (h[i] < 25) { : missing value where TRUE/FALSE needed
The error message shows up in the above for loop, but if I ignore it I still get the answer I want. However, the same approach does not work inside the function.
fun2<-function(x=s,arg1,arg2,arg3){
  w<-for(i in 1:100){
    if(h[i]>0){s[i]<-h[i]*arg1}
    else if(h[i]<25){s[i]<-h[i]*arg2}
    else if(h[i]>75){s[i]<-h[i]*arg2}
    else{s[i]<-h[i]*arg3}
  }
  return(w)
}
fun2(arg1=10,arg2=3,arg3=10)
Error in if (h[i] > 0) { : missing value where TRUE/FALSE needed
I am unsure where to put the true/false statement in the equation.
Take a look at length(h).
You'll see that you are trying to loop over 100 indexes while having only 40 elements. If you replace your first line by h = sample(1:100,100), your first code should work.
As for your second attempt, you cannot assign a for-loop to a variable. Create your variable w before looping and then assign the newly calculated values to it inside the loop, like this.
fun2<-function(x=s,arg1,arg2,arg3){
  w<-s
  for(i in 1:100){
    if(h[i]>0){
      w[i]<-h[i]*arg1
    }
    ... # some if statement
  }
  return(w)
}
As a side note, your function won't give you the results you are looking for because your first if condition discards the following else if. I would remove the first if block and replace the next one by if(h[i] < 25 & h[i]> 0).

Creating a counter in R using a loop

I am a beginner at R and searched the forums but did not find an answer to this question. I am trying to create a loop in R that counts whether a condition is met between 2 rows in a dataframe. I understand that this is not an efficient way to do this, but it is for a class assignment. My problem is that my code is creating an endless loop rather than giving me the counter output, and it is unclear to me how to fix it. I would greatly appreciate any suggestions. The code is below:
counter=0
for (i in 1:nrow(dataframe))
{if (dataframe$column1[i]>dataframe$column2[i]==TRUE)
{
counter=counter+1}
}
print(counter)
If you just want to know how many times column 1 is higher than column 2, you don't have to use a loop:
counter <- sum(dataframe$column1>dataframe$column2)
dataframe$column1>dataframe$column2 gives you a logical vector of length nrow(dataframe), with TRUE where the condition holds and FALSE where it doesn't; R performs the comparison element by element on vectors.
When you sum it, TRUE counts as 1 and FALSE as 0, so the result is the number of rows in which the condition holds between the two columns.

R: sum of unknown number of matrices

I am trying to write a loop that will summarize my set of matrices that all start with the same name plus a number (e.g. "day11"). However, in each run of the loop the number of matrices varies.
Without the loop it can be done once like this:
combmat<-(day1+day3+day4+day5+day6+day8+day9+day10+day11+day12+day13+day14+day15+day16+day17+day18+day19+day20+day22+day23+day24+day25+day26+day27+day28+day29)
I have tried
sum(list=ls(pattern="^day"))
without any luck ...
Thank you!
Maybe something like
day1<-matrix(c(1:4),2,2)
day2<-matrix(c(1:4),2,2)
day3<-matrix(c(1:4),2,2)
day4<-matrix(c(1:4),2,2)
list=ls(pattern="^day")
res<-lapply(list,"get")
do.call("sum",res)
> do.call("sum",res)
[1] 40
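will work for you.
get returns the value of a named object, so get("x") returns the value of the variable x. Note that do.call("sum", res) adds every element of every matrix into a single number; if you need the element-wise sum as a matrix, as in day1+day3+..., use Reduce("+", res) instead.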
