Couldn't reduce the looping variable inside the "for" loop in R

I have a for loop that manipulates a matrix in R. When certain checks are true, I need to process the same row again, which means i needs to be decreased by 1.
for (i in 1:10)
{
  if (some_check)  # placeholder for the actual condition
  {
    i = i - 1
  }
}
However, i is not actually decreased. For example, in the 5th row I decrease i to 4, so the next iteration should be 5 again, but it is 6.
Please advise.
My intention is:
I check the values in the first column of a matrix. If I find a duplicate value, I take its second-column value, append it to the second-column value of the first row with that key, and remove the duplicate row. So when I remove a row, I do not want i to increase in the loop. (This is just a map-reduce step: append the values of the same key.)

Assigning to the loop variable inside the body of an R for loop does not change which iteration comes next. At the start of every iteration, R sets i to the next element of the sequence (1:10 here), overwriting whatever you assigned. What you have written would be solved quite differently in idiomatic R code, and the exact solution depends on the actual problem. There isn't a generic, direct replacement, except rewriting the whole thing as a while loop, which is both ugly and probably unnecessary.
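Here is a small demonstration of both behaviours, offered as a sketch. The condition i == 3 is an arbitrary stand-in for the asker's check. The flag repeated is a hypothetical guard that makes the while loop revisit position 3 only once.

```r
# for loop: record which value of i each iteration actually starts with
visited <- integer(0)
for (i in 1:6) {
  visited <- c(visited, i)
  if (i == 3) i <- i - 1  # stand-in for the real check; has no lasting effect
}
print(visited)  # 1 2 3 4 5 6: the next iteration still starts with i = 4

# while loop: the programmer updates the counter, so a decrement sticks
visited_while <- integer(0)
i <- 1
repeated <- FALSE  # hypothetical guard so position 3 is revisited only once
while (i <= 6) {
  visited_while <- c(visited_while, i)
  if (i == 3 && !repeated) {
    repeated <- TRUE
    i <- i - 1  # step back so the next pass lands on 3 again
  }
  i <- i + 1
}
print(visited_while)  # 1 2 3 3 4 5 6
```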
To illustrate how such problems are usually solved without an explicit loop, consider these two typical examples.
Assume you want to filter all duplicated elements out of a list. Instead of looping over the list and copying every element that is not a duplicate, you can use the duplicated function. It tells you, for each element, whether it duplicates an earlier one.
Then you use standard R subsetting syntax to select just those elements which are not duplicates:
x = x[! duplicated(x)]
(This example works on a one-dimensional vector or list, but it can be generalised to more dimensions.)
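The asker's actual task is collapsing rows that share a key in the first column by joining their second-column values. It can be handled without a loop in the same spirit. This is a minimal sketch assuming a character matrix whose first column holds keys and whose second column holds values; the data are made up.

```r
# Hypothetical data: column 1 holds keys, column 2 holds values
m <- cbind(key = c("a", "b", "a", "c", "b"),
           value = c("1", "2", "3", "4", "5"))

# Keys in order of first appearance
keys <- unique(m[, "key"])
# Split the values by key (factor levels fix the group order), then join each group
groups <- split(m[, "value"], factor(m[, "key"], levels = keys))
collapsed <- vapply(groups, paste, character(1), collapse = ",")

result <- cbind(key = keys, value = unname(collapsed))
print(result)  # a "1,3" / b "2,5" / c "4"
```

A shorter alternative is tapply(m[, 2], m[, 1], paste, collapse = ","), which returns the same groups but orders the keys alphabetically instead of by first appearance.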
For a more complex case, let’s say that you have a vector of numbers and, for every even number in the vector, you want to double the preceding number (this is highly artificial but in signal processing you might face similar problems). In other words:
input = c(1, 3, 2, 5, 6, 7, 1, 8)
output = ???
output
# [1] 1 6 2 10 6 7 2 8
… we want to fill in ???. In the first step, we check which numbers are even:
even = input %% 2 == 0
# [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
Next, we shift the result one position towards the start, because we want to know whether the next number is even. To do that, we remove the first element and append a dummy element (FALSE) at the end.
even = c(even[-1], FALSE)
# [1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
And now we can multiply just these inputs by two:
output = input
output[even] = output[even] * 2
There, done.
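Put together as one runnable snippet, as a sketch, the steps look like this. The name next_is_even is introduced here for readability. The ifelse() line is an equivalent alternative, not part of the original answer.

```r
input <- c(1, 3, 2, 5, 6, 7, 1, 8)
# TRUE where the next element is even; the last element has no successor
next_is_even <- c(input[-1] %% 2 == 0, FALSE)

output <- input
output[next_is_even] <- output[next_is_even] * 2
print(output)  # 1 6 2 10 6 7 2 8

# Equivalent one-liner with ifelse()
output2 <- ifelse(next_is_even, input * 2, input)
```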

Related

Removing certain elements from a vector in R [duplicate]

This question already has answers here:
How to delete multiple values from a vector?
(9 answers)
Closed 5 years ago.
I have a numeric vector called data with approximately 35,000 elements, and a numeric vector A. I want to remove every occurrence of the elements of A from data. For example, if A is the vector c(1, 2), I want to remove all occurrences of 1 and 2 from data. How can I do that? Is there a built-in function that does this? I couldn't think of a way, and I assume doing it with a loop would take a long time. Thanks!
There is the handy %in% operator. Look it up; it is one of the best things I can think of in any programming language! A %in% B checks every element of vector A against all elements of vector B. It returns a logical vector, the same length as A, that is TRUE at the positions of elements of A that can be found in B. It is what you need! If you are new to R, it might seem a bit weird, but you will get very used to it.
OK, so how do you use it? Let's say datvec is your numeric vector:
datvec = c(1, 4, 1, 7, 5, 2, 8, 2, 10, -1, 0, 2)
elements_2_remove = c(1, 2)
datvec %in% elements_2_remove
## [1] TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
So you get a logical vector that marks the positions of either 1 or 2 in datvec. You can use it to index what you want to retain (by negating it):
datvec = datvec[!(datvec %in% elements_2_remove)]
And you are done!
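Here is a runnable version of the answer, plus a comparison with setdiff(), as a sketch. %in% is defined in terms of match(), so it is vectorised and should be fast on a vector of about 35,000 elements. setdiff() also removes the unwanted values, but it returns unique values only. That matters if the kept values repeat. The small vector x is made up to show the difference.

```r
datvec <- c(1, 4, 1, 7, 5, 2, 8, 2, 10, -1, 0, 2)
elements_2_remove <- c(1, 2)
kept <- datvec[!(datvec %in% elements_2_remove)]
print(kept)  # 4 7 5 8 10 -1 0

# setdiff() also drops repeated survivors, which may not be what you want
x <- c(3, 1, 3, 5)
x[!(x %in% 1)]  # 3 3 5
setdiff(x, 1)   # 3 5
```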

Error when merging 2 dataframes and assigning values in "new" column

Below is my code to merge two data frames and assign the values 3 and -1:
candidate_score<-merge(check7,anskey,by='Question.ID')
candidate_score$correct <- candidate_score$Selected.Option.ID == candidate_score$Correct.Option.ID
candidate_score$score <-
ifelse(candidate_score$correct== TRUE, 3,
ifelse(candidate_score$correct== FALSE, -1, ifelse(candidate_score$Correct.Option.ID == Full Marks ,3,NA)))
I have student data. When I assign marks of 3 and -1 with the candidate_score$score code above, the mark 3 is not assigned to rows whose Correct.Option.ID column contains Full Marks. How can I achieve my desired output?
I also want to assign 3 marks wherever Correct.Option.ID is Full Marks.
The second ifelse has 4 arguments. You need to decide whether the consequent of that conditional should be -1 or NA. There's no way of determining your intent from the material presented. Best would be a sample dataframe or vector and some description.
It's often easier to find errors if you insert space after commas and surround assignment operators with spaces as well. I've tried to edit the code in a more structured manner.
Responding to the request for clarification, this is the second ifelse call:
ifelse(candidate_score$correct== FALSE, # arg 1 (the condition)
-1 , # arg 2 (the consequent)
NA, # arg 3 (should be the alternative)
# and the following fourth argument causes an error.
ifelse(candidate_score$Correct.Option.ID == Full Marks ,3,NA))
Still not entirely clear what logical tests are to be applied, but perhaps you want this:
candidate_score$score <-
  ifelse(candidate_score$correct == TRUE | candidate_score$Correct.Option.ID == 'Full Marks',
         3,
         ifelse(candidate_score$correct == FALSE, -1, NA))
You should also realize that the == TRUE parts are not needed, since TRUE has the same value as TRUE == TRUE, and FALSE has the same value as FALSE == TRUE. Similarly, x == FALSE can be written as !x.
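Here is a small end-to-end sketch with made-up data. The column names follow the question; the values are hypothetical. It shows what the suggested nested ifelse() produces, including the NA case when no option was selected. stringsAsFactors = FALSE is set so that the comparison of the two ID columns also works on R versions older than 4.0.

```r
# Hypothetical sample data; only the column names come from the question
check7 <- data.frame(Question.ID = 1:4,
                     Selected.Option.ID = c("A", "B", "C", NA),
                     stringsAsFactors = FALSE)
anskey <- data.frame(Question.ID = 1:4,
                     Correct.Option.ID = c("A", "C", "Full Marks", "D"),
                     stringsAsFactors = FALSE)

candidate_score <- merge(check7, anskey, by = "Question.ID")
candidate_score$correct <-
  candidate_score$Selected.Option.ID == candidate_score$Correct.Option.ID

# 3 if correct or if the key says "Full Marks", -1 if wrong, NA otherwise
candidate_score$score <-
  ifelse(candidate_score$correct | candidate_score$Correct.Option.ID == "Full Marks",
         3,
         ifelse(!candidate_score$correct, -1, NA))
print(candidate_score$score)  # 3 -1 3 NA
```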

ifelse with for loop

I would like to traverse through rows of a matrix and perform some operations on data entries based on a condition.
Below is my code
m = matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)
if (m[,colSums(!is.na(m)) > 1, drop = FALSE]){
for(i in 1:4){
a = which(m[i,] != "NA") - mean(which(!is.na(m[i,])))
for(j in 2:5){
b = which(m[j,] != "NA") - mean(which(!is.na(m[j,])))
prod(a,b)
}
}
}
I get a warning message as below in my "if" condition
Warning message:
In if (m[, colSums(!is.na(m)) > 1, drop = FALSE]) { :
the condition has length > 1 and only the first element will be used
I know it returns a vector and that I should be using an ifelse block. How do I incorporate for loops inside an ifelse block? It seems to be a basic question; I am new to R.
Based on your description, you want to count the non-NA values in each column of the matrix and then do something depending on the result (that is why you need an "if"/"ifelse" statement). You can implement it as below and put the inner loops in a dedicated function.
yourFunc <- function(x, data) {
  # do what you want here (e.g. your loops on "data")
  # as a sample: return 1 if x > 1, otherwise 0; check the result below
  if (x > 1) 1
  else 0
}
m = matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)
# use "apply" series function in here
sapply(colSums(!is.na(m)), yourFunc, data=m)
#[1] 1 0 1 0
Actually, I think you need to reorganize your problem and optimize the code; the "ifelse with for loop" may be totally unnecessary.
As you are new to R, I assume some of the terminology is a bit confusing, so here is a short explanation of the if statement.
Let's look at the if condition:
m[,colSums(!is.na(m)) > 1, drop = FALSE]
[,1] [,2]
[1,] 1 NA
[2,] 2 NA
[3,] NA 4
[4,] NA 5
[5,] 5 NA
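This is nothing that if can work with, because an if condition has to be a single boolean value (TRUE or FALSE). In R versions before 4.2.0 you get the warning above and only the first element is used; from R 4.2.0 on, a condition of length greater than one is an error. So why do you get a matrix here? Well, the result of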
colSums(!is.na(m))
[1] 3 1 2 0
is a vector of counts of the entries that are not NA (the number of TRUEs in each column). Be careful, as this is not the same as
colSums(m, na.rm = TRUE)
[1] 8 1 9 0
which returns a vector of sums over all five rows for each column, excluding NA's. My guess is that the latter is what you are looking for. In any case: be aware of the difference!
By asking which of those counts is greater than 1, you do get a boolean vector:
colSums(!is.na(m)) > 1
[1] TRUE FALSE TRUE FALSE
However, when you use that boolean vector as the criterion for selecting columns, you (correctly) get a matrix, which is obviously not boolean:
m[,colSums(!is.na(m)) > 1]
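Note: drop = FALSE makes no difference for this particular m, because two columns are selected and there is no dimension to drop. It would only matter if exactly one column met the condition. See ?`[` or ?drop. You can verify this using identical: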
identical(m[,colSums(!is.na(m)) > 1, drop = FALSE],
m[,colSums(!is.na(m)) > 1])
Now to the loop. You will find tons of discussions on avoiding for loops and using the apply family of functions, and I suspect you need to take some time to go through all that. Contrary to common belief, however, apply is not necessarily faster than a for loop. apply() itself is essentially a wrapper around a for loop (check its source code by typing apply at the console). It is, however, clearly superior in terms of code clarity, as it is compact and explicit about what it is doing. So do try to use apply functions where possible!
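As a rough illustration of the point about loops and apply, here are three ways to count the non-NA entries in each column of the question's matrix m. The first is an explicit for loop, the second is apply() over columns (MARGIN = 2), and the third is the vectorised colSums() used earlier in this answer. They should give the same counts. This is a sketch, not a benchmark.

```r
m <- matrix(c(1, 2, NA, NA, 5, NA, NA, 1, NA, NA,
              NA, NA, 4, 5, NA, NA, NA, NA, NA, NA), nrow = 5, ncol = 4)

# Explicit for loop over columns
counts_loop <- integer(ncol(m))
for (j in seq_len(ncol(m))) {
  counts_loop[j] <- sum(!is.na(m[, j]))
}

# apply(): MARGIN = 2 calls the function once per column
counts_apply <- apply(m, 2, function(col) sum(!is.na(col)))

# Vectorised version
counts_vec <- colSums(!is.na(m))

print(counts_loop)  # 3 1 2 0
```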
To rewrite your loop, it would help if you could describe in words what you actually want to do. I assume that what the loop is doing right now is probably not what you want. Because which() returns the indices (positions) of elements in a vector or matrix, not their values, what you are basically doing is:
indices of the non-NA entries in row i - mean of these indices
While this is theoretically possible, it usually doesn't make much sense. So, with all my notes at hand, clearly state your problem so we can think of a fix.
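Here is a small sketch of the difference between indices and values, using row 3 of the question's matrix. The "centre the values" variant is only a guess at what the loop may have been meant to do.

```r
m <- matrix(c(1, 2, NA, NA, 5, NA, NA, 1, NA, NA,
              NA, NA, 4, 5, NA, NA, NA, NA, NA, NA), nrow = 5, ncol = 4)

row3 <- m[3, ]                    # NA 1 4 NA
idx <- which(!is.na(row3))        # positions 2 3, not the values 1 and 4
centered_idx <- idx - mean(idx)   # -0.5 0.5: what the loop computes

# Assumption: if the goal was to centre the non-NA values, use the values
vals <- row3[!is.na(row3)]        # 1 4
centered_vals <- vals - mean(vals)  # -1.5 1.5
```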

R in simple terms - why do I have to feel like such an idiot? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
My question is simple. Every reference I find in books and on the internet for learning R programming is presented in a very linear way with no context. When I try to learn things like functions, I see the code and my brain just freezes, because it's looking for something to relate these R terms to and I have no frame of reference. I have a PhD and did a lot of statistics for my dissertation. That was years ago, though, when we were using different programming languages, and when it comes to R, I don't know why I can't get this into my head. Can someone explain this simple code in plain English? For example:
above <- function(x, n){
use <- x > n
x[use]
}
x <- 1:20
above(x, 12)
## [1] 13 14 15 16 17 18 19 20
I'm trying to understand what's going on in this code but simply don't. As a result, I could never just write this code on my own because I don't have the language in my head that explains what is happening with this. I get stuck at the first line:
above <- function(x, n) {
Can someone just explain this code sample in plain English so I have some kind of context for understanding what I'm looking at and why I'm doing what I'm doing in this code? And what I mean by plain English is, walking through the code, step by step and not just repeating the official terms from R like vector and function and array and all these other things, but telling me, in a common sense way, what this means.
Given your background (a PhD with a lot of statistics), the best way to understand this is in mathematical terms.
Mathematically speaking, you are defining a parametric function named above that extracts all elements of a vector x that are greater than a certain value n. You are just filtering the set (or the vector) x.
In set notation you can write something like:
above: (x, n) --> {y in x ; y > n}
Now, going through the code and paraphrasing it (on the left the math side, on the right its equivalent in R):
Math R
---------------- ---------------------
above: (x,n) <---> above <- function(x, n)
{y in x ; y>n} <---> x[x > n]
So, to wrap all the statements together within a function, you need to follow this syntax:
function_name <- function(arg1, arg2) { statements }
Applying this to the example (we have only one statement here):
above <- function(x,n) { x[x>n]}
Finally calling this function is exactly the same thing as calling a mathematical function.
above(x,2)
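Here is a quick check, as a sketch, that the two-line body from the question and the one-statement version in this answer define the same function. above_short is a name introduced here only for the comparison.

```r
# Version from the question: name the logical filter, then apply it
above <- function(x, n) {
  use <- x > n  # TRUE where an element of x exceeds n
  x[use]        # keep only the elements marked TRUE
}

# One-statement version: x[x > n]
above_short <- function(x, n) x[x > n]

x <- 1:20
above(x, 12)  # 13 14 15 16 17 18 19 20
identical(above(x, 12), above_short(x, 12))  # TRUE
```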
OK, I will try. If this is too detailed, let me know, but I tried to go really slowly:
above <- function(x, n)
This defines a function, which is just a procedure that produces some output given some input. The <- means "assign what is on the right-hand side to what is on the left-hand side"; in other words, put everything on the right into the object on the left. For example, container <- 1 puts 1 into container. In this case we put a function inside the object above.
function(x, n): everything in the parentheses specifies which inputs the function takes, so this one takes two variables, x and n.
Now we come to the body of the function, which defines what it does with the inputs x and n. The body of the function is everything inside the curly braces:
{
use <- x > n
x[use]
}
so let's explain that piece by piece:
use <- x > n
This part again puts what is on the right side into the object on the left. What is happening on the right-hand side? A comparison: it returns TRUE if x is bigger than n, and FALSE if x is equal to or smaller than n. So if x is 5 and n is 3, the result is TRUE, and this value gets stored inside use, so use now contains TRUE. If x contains more than one value, every value inside x gets compared to n. For example, if x = c(1, 2, 3) and n = 2,
then we have
1 > 2 FALSE
2 > 2 FALSE
3 > 2 TRUE
so use will contain FALSE, FALSE, TRUE.
x[use]
Now we are taking a part of x. The square brackets specify which parts of x we want. In my example, x has 3 elements and use has 3 elements; if we combine them, we have:
x use
1 FALSE
2 FALSE
3 TRUE
So now we say "I don't want 1 and 2, but I want 3", and the result is 3.
Now that we have defined the function, we call it, or in plain words, we use it:
x <- 1:20
above(x, 12)
First we assign the numbers 1 through 20 to x. Then we tell the function above to execute (do everything inside its curly braces) with the inputs x = 1:20 and n = 12. In other words, we do the following:
above(x, 12)
execute the function above with the inputs x = 1:20 and n = 12
use <- 1:20 > 12
compare every number from 1 to 20 with 12, returning TRUE for each number that is in fact bigger than 12 and FALSE otherwise, then store all the results inside use
x[use]
now give me the corresponding elements of x for which the vector use contains TRUE
so:
x use
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 FALSE
8 FALSE
9 FALSE
10 FALSE
11 FALSE
12 FALSE
13 TRUE
14 TRUE
15 TRUE
16 TRUE
17 TRUE
18 TRUE
19 TRUE
20 TRUE
so we get the numbers 13:20 back as a result
I'll give it a crack too. A few basic points that should get you going in the right direction.
1) The idea of a function. Basically, a function is reusable code. Say I know that in my analysis, for some bizarre reason, I will often want to add two numbers, multiply the sum by a third, and divide the result by a fourth. (Just suspend disbelief here.) One way I could do that would be to write the operation over and over, as follows:
(75 + 93)*4/18
(847 + 3)*3.1415/2.7182
(999 + 380302)*-6901834529/2.5
But that's tedious and error-prone. (What happens if I forget a parenthesis?) Alternatively, I can just define a function that takes whatever numbers I feed into it and carries out the operation. In R:
stupidMath <- function(a, b, c, d){
  result <- (a + b)*c/d
  result
}
That code says: "I'd like to store this series of commands and attach them to the name stupidMath." That's called defining a function. When you define a function, the series of commands is just stored in memory; it doesn't actually do anything until you "call" it. "Calling" it is just ordering it to run, and when you do so, you give it "arguments". The names in the parentheses in the first line are the arguments it expects. In my example, it wants four distinct pieces of data, which will be called 'a', 'b', 'c', and 'd'.
Then it'll do the things it's supposed to do with whatever you give it. "The things it's supposed to do" is the stuff in the curly brackets {}. That's the "body" of the function, which describes what to do with the arguments you give it. So now, whenever you want to carry out that mathematical operation, you can just "call" the function. To do the first computation, for example, you'd just write stupidMath(75, 93, 4, 18). Then the function gets executed, treating 75 as 'a', 93 as 'b', and so forth.
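Here is a runnable version of the example, as a sketch. The body ends with result on its own line. A function whose last expression is an assignment still returns the value, but invisibly, so nothing would be printed when you call it at the console.

```r
stupidMath <- function(a, b, c, d) {
  result <- (a + b) * c / d
  result  # the last evaluated expression is the return value
}

stupidMath(75, 93, 4, 18)             # same as (75 + 93) * 4 / 18
stupidMath(847, 3, 3.1415, 2.7182)    # same as (847 + 3) * 3.1415 / 2.7182
```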
In your example, the function is named "above" and it takes two arguments, denoted 'x' and 'n'.
2) The "assignment operator": R's customary assignment operator is <-, which does the job of = in most other languages (R also accepts = for ordinary assignment). It says "the name on the left has the value on the right." Conceptually, it's just like how a variable in algebra works.
3) So the "body" of the function (the stuff in the curly brackets) first assigns the name "use" to the value of the expression x > n. What's going on there? Well, an expression is something that the computer evaluates to get data. Remember that when you call the function, you give it values for x and n. The first thing this function does is figure out whether x is greater than n. If it is, the expression x > n evaluates to TRUE; otherwise, to FALSE.
So if you were to define the function in your example and then call it with above(10, 5), the first line of the body would set the local variable 'use' to TRUE (don't worry right now about what a 'local' variable is). This is a boolean value; R calls it a logical value.
Then the next line of the function is a "filter." Filtering is a long topic in R, but basically, R thinks of everything as a "vector," that is, a bunch of pieces of data in a row. A vector in R can be like a vector in linear algebra, i.e., (1, 2, 3, 4, 5, 99) is a vector, but it can also hold things other than numbers. For now let's just focus on numbers.
The wacky thing about R (one of the many wacky things about R) is that it treats a single number (a "scalar" in linear algebra terms) just as a vector with only one item in it.
OK, so why did I just go into that? Because in lots of places in R, a vector and a scalar are interchangeable.
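Here are a few console checks, as a sketch, that support the point about scalars. A single number is simply a vector of length one, and comparisons work element by element.

```r
length(5)           # 1
is.vector(5)        # TRUE: a "scalar" is a numeric vector of length one
identical(5, c(5))  # TRUE
c(5, 10) > 7        # FALSE TRUE: the comparison is applied to each element
```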
So in your example code, instead of giving a scalar for the first argument, when we call the function we've given 'above' a vector for its first argument. R likes vectors. R really likes vectors. (Just talk to R people for a while. They're all obsessed with doing every goddamn thing in terms of a vector.) So it's no problem to pass a vector for the first argument. But what that means is that the variable 'use' is going to be a vector too. Specifically, 'use' is going to be a vector of booleans, i.e., TRUE or FALSE for each individual value of x.
To take a simpler version: suppose you said:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
When the code runs, the first thing it's going to do is define that 'use' variable. But x is a vector now, not a scalar: the c(5, 10) code said "make a vector with two elements, and fill them with the numbers 5 and 10". So R's going to go ahead and carry out the comparison for each element of x. Since 5 is less than 7 and 10 is greater than 7, use becomes the two-element vector of boolean values (FALSE, TRUE).
OK, now we can talk about filtering. A vector of boolean values is called a 'logical vector.' The code x[use] says "filter x by the stuff in the variable use." When you tell R to filter something by a logical vector, it returns the elements of the thing being filtered which correspond to the TRUE values.
So in the example just given:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
The value of myresult will just be 10. Why? Because the function filtered 'x' by the logical vector 'use'. 'x' was (5, 10), and 'use' was (FALSE, TRUE). Since the second element of the logical vector was the only TRUE, you only got the second element of x.
And that gets assigned to the variable myresult, because myresult <- above(mynums, 7) means "assign the name myresult to the value of above(mynums, 7)."
Voilà.
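The filtering step in the mynums example can also be run outside the function, line by line, which may make it easier to see what use holds. This is a sketch that mirrors the function body by hand.

```r
mynums <- c(5, 10)
use <- mynums > 7        # FALSE TRUE
mynums[use]              # 10
mynums[c(FALSE, TRUE)]   # 10: the same filter written out by hand
```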

remove first occurrence data frame R

So I've been playing around with a data frame in R, although I'm still thinking too much in Python and cannot seem to find a solution for my problem.
I have a data frame, and one of the columns is a user id. I would like to remove the first occurrence of each number, for instance:
1,2,3,4,3,4,2,1,3,4,6,7,7
I would like to have an output like this:
3,4,2,1,3,4,7
Where the first time the user_id appears I would remove it but keep all the others even if repeated.
With Python I would probably use enumerate or loop over it. For R, I've seen some functions that seem cool, like rle, but I'm not sure how to use them with the data frame.
Any pointers will be really helpful since right now I'm a bit lost about the best approach for this problem.
Thank you all
The function duplicated() is going to be helpful here:
x <- c(1,2,3,4,3,4,2,1,3,4,6,7,7)
x[duplicated(x)]
[1] 3 4 2 1 3 4 7
This works because duplicated() returns a logical vector indicating whether that element is, well, duplicated:
duplicated(x)
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
You then use this logical vector to subset (extract) the values you want from x. But notice that in the extraction I keep all of the duplicated values, not remove them.
To remove all of the duplicated values (not what you want, but I illustrate regardless), try the negation:
x[!duplicated(x)]
[1] 1 2 3 4 6 7
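Since the question is about a data frame, the same logical vector can be used to select rows. This is a minimal sketch assuming a hypothetical data frame df whose id column is named user_id; the visit column is made-up filler that shows whole rows are kept.

```r
# Hypothetical data frame; column names are assumptions
df <- data.frame(user_id = c(1, 2, 3, 4, 3, 4, 2, 1, 3, 4, 6, 7, 7),
                 visit = 1:13)

# Keep every row except the first one for each user_id
df_trimmed <- df[duplicated(df$user_id), ]
df_trimmed$user_id  # 3 4 2 1 3 4 7
```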
