for loop in R using if & print [closed] - r

Closed 8 years ago.
Maybe I'm overthinking this, but I need to write a for loop with an if statement to find the highest value in my data set. We also have to write a print statement that prints that value and its day. There are 93 rows and 4 columns in the initial matrix. Column 4 has the needed data. The days are in column 1.
I don't know programming at all. So far this is what I got:
I created a vector out of the column with the data:
only.data <- c(data[,4])
Here's my feeble attempt at a for & if statement:
for (counter in 1:93) {
if (only.data >= data[,4])
print (only.data)
}
How do I get it to spit out the highest value using this method? It prints the max value 93 times and that's not what I want. Do I need to create the only.data vector or can I use the original matrix? I also need to print out the corresponding date next to the highest value.
ps - I know I can use the max function which is much quicker but that's not the assignment.

It seems like you are cheating, so I won't post a full solution here, but only point you in the right direction.
data[,4] is already a vector, so there is no reason to wrap it in c(). There is also no need to save it in a new object only.data, although doing so can make your loop slightly faster because the column does not have to be extracted from the matrix in each iteration.
The idea of a loop is that you use its index inside the body (you don't have to, but there is no real reason not to). That is why you specify the index in for(). You specified an index (counter) but never used it, so your loop prints the whole of only.data in every iteration regardless of anything else.
All your if does is check whether only.data >= only.data in every iteration, which is always true and therefore unnecessary.
Calculating the maximum in a loop is not entirely obvious, because you compare a single value in each iteration, so you'll need some strategy. For example, you could create a dummy variable that is compared in each iteration against only.data[counter] and replaced whenever only.data[counter] is bigger.
To illustrate my last point, consider a toy example
set.seed(1)
only.data <- sample(10,10)
only.data
#[1] 3 4 5 7 2 8 9 6 10 1
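You can see that the maximum value is in the 9th position. (The output shown is from an older R version; R 3.6.0 changed the default sample() algorithm, so the same seed may give a different permutation there, but the loop works the same way.) Now we will assign the first value of this vector to a dummy variable and use a for loop to find the maximum.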
dummy <- only.data[1]
dummy
## [1] 3
for (counter in only.data) {
if (counter > dummy) dummy <- counter
}
dummy
## [1] 10
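As a sketch of a variant on the same idea: looping over positions instead of values also lets you remember where the largest value sits, which is what you would need to look up the matching day. The vector is typed in by hand here so the result does not depend on the random number generator, and best and best.pos are made-up names.

```r
# Toy vector typed in by hand (same values as the printed output above)
only.data <- c(3, 4, 5, 7, 2, 8, 9, 6, 10, 1)

best <- only.data[1]   # largest value seen so far
best.pos <- 1          # position of that value

for (counter in seq_along(only.data)) {
  if (only.data[counter] > best) {
    best <- only.data[counter]
    best.pos <- counter
  }
}

print(c(value = best, position = best.pos))
```

With the original matrix, the same position could then be used to index the day column, for example data[best.pos, 1].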

Related

for and if loop operations

Hi! I have a question and I hope someone can help. I have a data frame in R and some code that runs a double for loop with an if inside. The data frame has some values, and if the condition is TRUE the code performs some operations. The problem is that I understand neither the loop nor the operations the code performs under the condition.
I have reproduced the code in a simpler form below, but the idea is the same. Could someone please explain the whole operation?
w<-c(2,5,4,3,5,6,8,2,4,6,8)
x<-c(2,5,6,7,1,1,4,9,8,8,2)
y<-c(2,5,6,3,2,4,5,6,7,3,5)
z<-c(2,5,4,5,6,3,2,5,6,4,6)
letras<-data.frame(w,x,y,z)
l=1
o=1
v=nrow(letras)
letras$op1<-c(1)
letras$op2<-c(0)
for (l in 1:v) {
for (o in 1:v) {
if(letras$x[o]==letras$y[l] & letras$z[l]==letras$z[o] & letras$w[l]){
letras$op1<-letras$op1+1
letras$op2<-letras$x*letras$y
}
}
}
The result is the following:
Thanks!
This segment of code is storing values into vectors labeled w,x,y,z.
w<-c(2,5,4,3,5,6,8,2,4,6,8)
x<-c(2,5,6,7,1,1,4,9,8,8,2)
y<-c(2,5,6,3,2,4,5,6,7,3,5)
z<-c(2,5,4,5,6,3,2,5,6,4,6)
It then transforms the 4 vectors into a data frame
letras<-data.frame(w,x,y,z)
This bit of code isn't doing anything as far as I can tell.
l=1 #???
o=1 #???
This counts how many rows are in the letras data frame and stores the count in v, in this case 11 rows.
v=nrow(letras)
This creates new columns in letras dataframe with all ones in op1 and all zeros in op2
letras$op1<-c(1)
letras$op2<-c(0)
Here each for loop is acting as a counter, and will run the code beneath it iteratively from 1 to v (11), so 11 iterations. Each iteration the value of l will increase by 1. So first iteration l = 1, second l=2... etc.
for (l in 1:v) {
You then have a second counter, which runs inside the first one. It iterates from 1 to 11 in exactly the same way as above. The difference is that this counter has to complete its full 1 to 11 cycle before the top-level counter can move on to the next number. So o cycles from 1 to 11 for each single value of l. With the two together, the inner for loop counts from 1 to 11, 11 times.
for (o in 1:v) {
You then have a logical test that runs the code beneath it if the x value at position o equals the y value at position l. Remember that the two are indexed separately, so it could be the 1st x value against the 2nd y value. Because of the AND (&), the z values at positions l and o must also be equal. The last part, letras$w[l], is always true in this particular example because every w value is nonzero (R treats nonzero numbers as TRUE), so it could be removed.
if(letras$x[o]==letras$y[l] & letras$z[l]==letras$z[o] & letras$w[l]){
Lastly, there is the bit that happens if the above statement is true.
op1 gets 1 added to every row (remember it started at 1), and op2 is set to the x and y columns multiplied together. This multiplication is a little inefficient: x and y do not change, so it computes the same result every time the if statement evaluates to TRUE.
letras$op1<-letras$op1+1
letras$op2<-letras$x*letras$y
}
}
}
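To see what the two loops amount to, here is a small sketch that only counts how often the condition holds. The names hits and hits.vec are made up for illustration, and the always-true letras$w[l] term is left out. Because each hit adds 1 to the whole op1 column, every row of op1 ends up at 1 + hits.

```r
w <- c(2,5,4,3,5,6,8,2,4,6,8)
x <- c(2,5,6,7,1,1,4,9,8,8,2)
y <- c(2,5,6,3,2,4,5,6,7,3,5)
z <- c(2,5,4,5,6,3,2,5,6,4,6)
letras <- data.frame(w, x, y, z)
v <- nrow(letras)

# Count the (l, o) pairs for which the if condition is TRUE
hits <- 0
for (l in 1:v) {
  for (o in 1:v) {
    if (letras$x[o] == letras$y[l] && letras$z[l] == letras$z[o]) {
      hits <- hits + 1
    }
  }
}

# The same count without loops: outer() builds the full l-by-o comparison tables
hits.vec <- sum(outer(letras$y, letras$x, "==") & outer(letras$z, letras$z, "=="))

print(hits)
print(hits.vec)
```

Both counts should agree, which is a quick way to convince yourself of what the nested loop is doing.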
Hope this helps.

R programming- adding column in dataset error

cv.uk.df$new.d[2:nrow(cv.uk.df)] <- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1) # this line of code works
I wanted to know why we use -1 in tail and -1 in head to create this new column.
I tried to understand it by removing the -1, and R (the code is run in RStudio) throws this error.
Could anyone shed some light on this? I can't explain how much I would appreciate it.
Look at what is being done. On the left-hand side of the assignment operator, we have:
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
Let's pick this apart.
cv.uk.df # This is the data.frame
$new.d # a new column to assign or a column to reassign
[2:nrow(cv.uk.df)] # the rows which we are going to assign
Specifically, this line of code will assign a new value to all rows of this column except the first. Why would we want to do that? We don't have your data, but from your example, it looks like you want to calculate the change from one line to the next. That calculation is invalid for the first row (no previous row).
Now let's look at the right-hand side.
<- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)
The cv.uk.df$deaths column has the same number of rows as the data.frame. R gets grouchy when the numbers of elements don't follow some rules. For data.frames, the right-hand side needs to have the same number of elements, or a number that can be recycled a whole number of times. For example, if you have 10 rows, you need to have a replacement of 10 values. Or you can have 5 values that R will recycle.
If your data.frame has 100 rows, only 99 are being replaced in this operation. You cannot feed 100 values into an operation that expects 99. We need to trim the data. Let's look at what is happening. The tail() function has the usage tail(x, n), where it returns the last n values of x. If n is a negative integer, tail() returns all values but the first |n|. The head() function works similarly, dropping values from the end instead.
tail(cv.uk.df$deaths, -1) # This returns all values but the first
head(cv.uk.df$deaths, -1) # This returns all values but the last
This makes sense for your calculation. You cannot subtract the number of deaths in the row before the first row from the number in the first row, nor can you subtract the number of deaths in the last row from the number in the row after the last row. There are more intuitive ways to do this thing using functions from other packages, but this gets the job done.
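A small sketch on made-up numbers may make this concrete. The data frame toy and its values are invented for illustration; only tail(), head() and diff() from base R are used. diff() is one of the shorter ways to get the same differences.

```r
# Made-up cumulative counts, standing in for cv.uk.df$deaths
toy <- data.frame(deaths = c(1, 3, 6, 10))

tail(toy$deaths, -1)   # everything but the first value: 3 6 10
head(toy$deaths, -1)   # everything but the last value:  1 3 6

# Row-to-row change, stored from row 2 onwards; row 1 is left as NA
toy$new.d[2:nrow(toy)] <- tail(toy$deaths, -1) - head(toy$deaths, -1)
print(toy)

# diff() computes the same row-to-row differences in one call
print(diff(toy$deaths))
```

Without the -1, both tail() and head() would return their default of up to six values, so the lengths on the two sides of the assignment would no longer line up in general.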

Count duration of value in vector in R

I am trying to count the length of occurrences of a value in a vector such as
q <- c(1,1,1,1,1,1,4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,6,6,6,6,6,6,6,1,1,4,4,4)
Actual vectors are longer than this, and are time based. What I would like would be an output for 4 that tells me it occurred for 12 time steps (before the vector changes to 6) and then 3 time steps. (Not that it occurred 15 times total).
Currently my ideas to do this are pretty inefficient (a loop that looks element by element that I can have stop when it doesn't equal the value I specified). Can anyone recommend a more efficient method?
x <- with(rle(q), data.frame(values, lengths)) will pull the information that you want (courtesy of d.b. in the comments).
From the R Documentation: rle is used to "Compute the lengths and values of runs of equal values in a vector – or the reverse operation."
y <- x[x$values == 4, ] will subset the data frame to include only the value of interest (4). You can then see clearly that 4 ran for 12 times and then later for 3.
Modifying the code will let you check whatever value you want.
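As a sketch of how that could be wrapped up for any value, here is a small helper run on a short made-up vector. The function name run_lengths_of and the vector q.toy are hypothetical; rle() is the only base function doing the work.

```r
# Hypothetical helper: lengths of each consecutive run of `value` in `v`
run_lengths_of <- function(v, value) {
  runs <- rle(v)
  runs$lengths[runs$values == value]
}

# Short made-up vector: a run of three 4s, then later a single 4
q.toy <- c(1, 1, 4, 4, 4, 6, 4)

print(run_lengths_of(q.toy, 4))
print(run_lengths_of(q.toy, 1))
```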

Return matching names instead of binary variables in R

I'm new here and diving into R, and I'm encountering a problem while trying to solve a knapsack problem.
For optimization purposes I wrote a dynamic program in R, however, now that I am at the point of returning the items, which I succeeded in, I only get the binary numbers saying whether the item has been selected or not (1 = yes). Like this:
Select
[1] 1 0 0 1
However, now I would like the Select function to return the names of values instead of these binary values. Underneath I created an example of what my problem looks like.
This would be the data and a related data frame.
items <- c("Glasses","gloves","shoes")
grams <- c(4,2,3)
value <- c(100,20,50)
data <- data.frame(items,grams,value)
Now, I created various functions, with the final one clarifying whether a product has been selected by 1 (yes) or 0 (no). Like above. However, I would really like for it to return the related name of the item. Is there a manner to go around this by linking back to the dataframe created?
So that it would say instead of (in case all products are selected)
Select
[1] 1 1 1
Select
[1] Glasses gloves shoes
I believe I would have to create a new function. But as I mentioned, is there a good way to refer back to the data frame to take related values from another column in the data frame in case of a 1 (yes)?
I really hope my question is more clear now and someone can direct me in the right direction.
Best, Berber
Let's say your binary vector is
idx <- c(1, 0, 1)
Then just use
items[as.logical(idx)]
which will give you the names of the selected items, and
items[!as.logical(idx)]
which will give you the names of the unselected items.
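Since the question asks about linking back to the data frame, here is a minimal sketch of the same indexing applied to the data frame from the question. The Select vector is an assumed example result, and chosen is a made-up name.

```r
items <- c("Glasses", "gloves", "shoes")
grams <- c(4, 2, 3)
value <- c(100, 20, 50)
# stringsAsFactors = FALSE keeps the names as plain character strings
data <- data.frame(items, grams, value, stringsAsFactors = FALSE)

# Assumed output of the knapsack routine: first and third item selected
Select <- c(1, 0, 1)

# Names of the selected items
chosen <- data$items[Select == 1]
print(chosen)

# Or the complete rows of the selected items
print(data[Select == 1, ])
```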

Conditional Label in R without Loops

I'm trying to find the best way (best as in performance), given a data frame of the form below, to add a new column called "Season" holding one of the four seasons of the year:
  MON DAY YEAR
1   1   1 2010
2   1   1 2010
3   1   1 2010
4   1   1 2010
5   1   1 2010
6   1   1 2010
One straightforward way to do this is to write a loop conditioned on the MON and DAY columns and assign the value one by one, but I think there is a better way. I've seen suggestions in other posts for ifelse or := or apply, but most of the problems stated there are just binary, or the value can be assigned by a single function f of the parameters.
In my situation I believe a vector containing the four season labels and somehow the conditions would suffice, but I don't see how to put everything together. My situation more closely resembles a switch case.
Using modulo arithmetic and the fact that arithmetic operators coerce logical-values to 0/1 will be far more efficient if the number of rows is large:
d$SEASON <- with(d, c( "Winter","Spring", "Summer", "Autumn")[
1+(( (DAY>=21) + MON-1) %/% 3)%%4 ] )
The first added "1" shifts the range of the %%4 operation on all the results inside the parentheses from 0:3 to 1:4. The second subtracted "1" shifts the (inner) 1:12 range back to 0:11 and the (DAY >= 21) advances the boundary months forward one.
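As a quick check of the formula on dates around the season boundaries, here is a sketch on a small made-up data frame. The names chk and season.names are invented for illustration; the index expression is the one from the answer above.

```r
# Made-up dates on either side of the 21st-of-the-month boundaries
chk <- data.frame(MON = c(1, 3, 3, 6, 9, 12, 12),
                  DAY = c(1, 20, 21, 21, 21, 20, 21))

season.names <- c("Winter", "Spring", "Summer", "Autumn")

# (DAY >= 21) is coerced to 0/1, so late-month days move one month forward
chk$SEASON <- with(chk, season.names[1 + (((DAY >= 21) + MON - 1) %/% 3) %% 4])

print(chk)
```

March 20 still falls in winter while March 21 is spring, and December 21 wraps around to winter again through the %% 4.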
I'll start by giving a simple answer then I'll delve into the details.
A quick way to do this would be to check the values of MON and DAY and output the correct season. This is trivial:
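f=function(m,d){
  # season code: 0 = spring, 1 = summer, 2 = autumn, 3 = winter
  if(m==12 && d>=21) i=3
  else if(m>9 || (m==9 && d>=21)) i=2
  else if(m>6 || (m==6 && d>=21)) i=1
  else if(m>3 || (m==3 && d>=21)) i=0
  else i=3
  i  # return the code explicitly instead of relying on the invisible value of the assignment
}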
This f function, given a month and a day, will return an integer corresponding to the season (it doesn't matter much whether it's an integer or a string; an integer only saves a bit of memory, which is a technicality).
Now you want to apply it to your data.frame. There is no need for a loop here; we'll use mapply. d will be our simulated data.frame. We'll convert the output to a factor to get nice season names.
d=data.frame(MON=rep(1:12,each=30),DAY=rep(1:30,12),YEAR=2012)
d$SEA=factor(
mapply(f,d$MON,d$DAY),
levels=0:3,
labels=c("Spring","Summer","Autumn","Winter")
)
There you have it !
I realize seasons don't always change on the 21st. If you need fine tuning, you should define a 3-dimensional array as a global variable to store the accurate days. Given a season and a year, you could access the corresponding day and replace the "21"s in the f function with the right calls (you would obviously add a third argument for the year).
About the things you mentioned in your question:
ifelse is the "functional" way to make a conditional test. On single values it's only slightly better than conditional statements, but it is vectorized, meaning that if the argument is a vector, it loops over the elements by itself. I'm not familiar with it, but it's the way to go for an optimized solution.
mapply is derived from sapply of the "apply family" and allows you to call a function with several arguments on vectors (see ?mapply).
I don't think := is a standard operator in R, which brings me to my next point:
data.table! It's a package that provides a new structure that extends data.frame for fast computing and typing (among other things). := is an operator in that package and allows you to define new columns. In our case you could write d[,SEA:=mapply(f,MON,DAY)] if d is a data.table.
If you really care about performance, I can't insist enough on using data.table, as it is a major improvement if you have a lot of data. I don't know if it would really affect computing time with the solution I proposed, though.
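For completeness, here is a rough sketch of what an ifelse version of f could look like. It is not part of the answer above: the name season_code is made up, and it assumes the same 21st-of-the-month boundaries and the same 0 to 3 codes as f. Because ifelse is vectorized, it takes whole columns at once and needs no mapply.

```r
# Hypothetical vectorized counterpart of f: 0 = spring, 1 = summer, 2 = autumn, 3 = winter
season_code <- function(m, d) {
  ifelse(m == 12 & d >= 21, 3L,
  ifelse(m > 9 | (m == 9 & d >= 21), 2L,
  ifelse(m > 6 | (m == 6 & d >= 21), 1L,
  ifelse(m > 3 | (m == 3 & d >= 21), 0L, 3L))))
}

# Same simulated data as above
d <- data.frame(MON = rep(1:12, each = 30), DAY = rep(1:30, 12), YEAR = 2012)
d$SEA <- factor(season_code(d$MON, d$DAY),
                levels = 0:3,
                labels = c("Spring", "Summer", "Autumn", "Winter"))

print(table(d$SEA))
```

Note the single & and | inside ifelse: they work element by element, whereas && and || in f only handle one value at a time.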

Resources