Error when merging 2 dataframes and assigning values in "new" column - r

Below I have code to merge two data frames and assign values 3 and -1,
candidate_score<-merge(check7,anskey,by='Question.ID')
candidate_score$correct <- candidate_score$Selected.Option.ID == candidate_score$Correct.Option.ID
candidate_score$score <-
ifelse(candidate_score$correct== TRUE, 3,
ifelse(candidate_score$correct== FALSE, -1, ifelse(candidate_score$Correct.Option.ID == Full Marks ,3,NA)))
I have student data. When I assign marks of 3 and -1 with the candidate_score$score code above, 3 is not assigned to the rows where the Correct.Option.ID column contains Full Marks. How can I achieve my desired output?
I also want to assign 3 marks wherever Correct.Option.ID has Full Marks.

The second ifelse has 4 arguments. You need to decide whether the consequent of that conditional should be -1 or NA. There's no way of determining your intent from the material presented. Best would be a sample dataframe or vector and some description.
It's often easier to find errors if you insert space after commas and surround assignment operators with spaces as well. I've tried to edit the code in a more structured manner.
Responding to the request for clarification... this is the second ifelse call:
ifelse(candidate_score$correct== FALSE, # arg 1 (the condition)
-1 , # arg 2 (the consequent)
NA, # arg 3 (should be the alternative)
# and the following fourth argument causes an error.
ifelse(candidate_score$Correct.Option.ID == Full Marks ,3,NA))
Still not entirely clear what logical tests are to be applied but perhaps you want this:
candidate_score$score <-
ifelse(candidate_score$correct == TRUE | candidate_score$Correct.Option.ID == 'Full Marks',
3,
ifelse(candidate_score$correct == FALSE, -1, NA))
You should also realize that the ==TRUE parts are not needed, since TRUE has the same value as TRUE==TRUE, and FALSE has the same value as FALSE==TRUE.
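Since the question did not include sample data, here is a minimal sketch on a made-up four-row data frame (the option IDs are invented for illustration; only the column names come from the question). It assumes the intent is 3 for a correct answer or a Full Marks question, -1 for a wrong answer, and NA when no option was selected.

```r
# Hypothetical sample data; column names follow the question.
candidate_score <- data.frame(
  Selected.Option.ID = c("A", "B", "C", NA),
  Correct.Option.ID  = c("A", "C", "Full Marks", "D"),
  stringsAsFactors   = FALSE
)

# TRUE, FALSE, FALSE, NA (comparing with NA gives NA)
candidate_score$correct <-
  candidate_score$Selected.Option.ID == candidate_score$Correct.Option.ID

# Each ifelse() call takes exactly three arguments: test, yes, no.
candidate_score$score <- ifelse(
  candidate_score$correct | candidate_score$Correct.Option.ID == "Full Marks",
  3,
  ifelse(!candidate_score$correct, -1, NA)
)

candidate_score$score
# [1]  3 -1  3 NA
```

Note that "Full Marks" has to be quoted; unquoted, R tries to parse it as two separate symbols and stops with a syntax error.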

What does index do in r?

I have a code I'm working with which has the following line,
data2 <- apply(data1[,-c(1:(index-1))],2,log)
I understand that this creates a new data frame from data1, with the values log-transformed column-wise and some columns eliminated, but I don't understand how the columns are removed. What does 1:(index-1) do exactly?
The ":" operator creates an integer sequence. Because (1:(index-1)) is numeric and is used in the second position of the extraction operator "[" applied to a dataframe, it refers to column numbers. The person writing the code didn't need the c-function. It could have been more economically written:
data1[,-(1:(index-1))]
# but the outer "("...")"'s are needed so it starts at 1 rather than -1
So it removes the first index-1 columns from the object passed to apply. (As MrFlick points out, index must have been defined before this gets passed to R. There's no default value or interpretation for index in R.)
Suppose index is 5. Then index - 1 is 4, so the sequence runs from 1 to 4. The leading - drops those columns, so apply loops over every column other than the first 4, because MARGIN = 2.
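To make this concrete, here is a small sketch with a made-up data1 and an assumed index of 3 (neither comes from the original code). It also shows why the parentheses around the sequence matter.

```r
# Hypothetical data: two leading identifier columns, then two numeric columns.
data1 <- data.frame(id = 1:3, site = c("a", "b", "c"),
                    x = c(1, 10, 100), y = c(2, 20, 200))
index <- 3  # assumed: position of the first column to keep

drop_cols <- 1:(index - 1)    # 1 2
kept <- data1[, -drop_cols]   # negative indices drop columns 1 and 2
names(kept)
# [1] "x" "y"

data2 <- apply(kept, 2, log)  # log of each remaining column, returned as a matrix

# Without the outer parentheses, unary minus is applied before ":".
-1:(index - 1)
# [1] -1  0  1  2
```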

Add a specified number of blank rows to a data table without overwriting the heading

I'm trying to make a large blank data.table with a header row so that I can add values in specific places once it is set up. I have been able to duplicate the first row and then clear every other row or every row, but what I'd like to do is clear every row after the header row. Some columns are numeric input and some are character input.
[input3]:
headers: header1 header2 header3..... header 60+
Values: NA NA NA ... NA
Duplicate row:
input3 <- input2[rep(1:nrow(input2), each = 2), ]
Clear every row:
input3[1:nrow(input3) %% 1 == 0, ] <- NA
But if I try to rewrite that as duplicating blank rows starting at row 2 (to preserve the header) I get this error:
input3[2:nrow(input3) %% 1 == 0, ] <- NA
"Error in [.data.table(x, i, which = TRUE) : i evaluates to a logical vector length 9 but there are 10 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle."
I need to be able to dynamically add rows while keeping the header as this is going to be a gigantic table I will export to another program.
Edit: this is different from this link in that I'm adding additional rows not specified originally in the data. Not just wiping rows.
Instead use
input3[c(FALSE, 2:nrow(input3) %% 1 == 0), ] <- NA
By using 2:nrow, you were explicitly giving a shortened vector. When that thing is a logical vector, it must be length 1 or the same as the number of rows. Period.
Though this has its problems and I discourage its use, perhaps you were expecting it to behave like this:
input3[which(2:nrow(input3) %% 1 == 0),] <- NA
The "good" of this is that the which(...) returns a vector of integer, so it does not need to be the same length as the number of rows in the frame/table.
From ?Extract (which includes [ and friends):
For '['-indexing only: 'i', 'j', '...' can be logical
vectors, indicating elements/slices to select. Such vectors
are recycled if necessary to match the corresponding extent.
'i', 'j', '...' can also be negative integers, indicating
elements/slices to leave out of the selection.
"Recycling" is why length 1 works: its logical value is used for all rows. If you use length 2 and there are an even number of rows (e.g., mtcars[c(T,F),]), then it will give every-other-row. On a similar vein, if you assume recycling and there are not an even multiple of rows (e.g., mtcars[c(T,F,F),]), then your assumptions start becoming less clear.
Add to that the behavior of data.table, which refuses to recycle a logical i at all. Recycling can get you in trouble, so data.table doesn't encourage it.
library(data.table)
mt <- as.data.table(mtcars)
mt[c(T,F),] <- NA
# Error in `[.data.table`(x, i, which = TRUE) :
# i evaluates to a logical vector length 2 but there are 32 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.
mt[c(1,3),] <- NA
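The length mismatch and the integer-index alternative can be sketched in base R. This uses a plain data.frame as a stand-in for input2; the column contents and n_blank are made up for illustration, and the same assignment has not been checked here against a real data.table.

```r
# Hypothetical one-row stand-in for input2 with mixed column types.
input2 <- data.frame(header1 = 1.5, header2 = "a", stringsAsFactors = FALSE)
n_blank <- 4  # assumed number of blank rows to add

# Repeat row 1, then blank everything after it using integer row numbers.
input3 <- input2[rep(1, n_blank + 1), ]
rownames(input3) <- NULL
blank_rows <- seq_len(n_blank) + 1L   # 2 3 4 5
input3[blank_rows, ] <- NA

# Why the logical version failed: 2:n has n - 1 elements, not n.
n <- nrow(input3)
length(2:n %% 1 == 0)            # n - 1
length(c(FALSE, 2:n %% 1 == 0))  # n
```

Integer row numbers do not have to match the number of rows, which is why they sidestep the recycling error.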

R: Produce Index Values to Group Increasing Values in Vector

I have a list of increasing year values that occasionally has breaks in it and I want to create a grouping value for each unbroken sequence. Think of a vector like this one (missing 2005,2011):
x <- c(2001,2002,2003,2004,2006,2007,2008,2009,2010,2013,2014,2015,2016)
I would like to produce an equal length vector that numbers every value in a run with the same index to end up with something like this.
[1] 1 1 1 1 2 2 2 2 2 3 3 3 3
I would like to do this using best R practices so I am trying to avoid falling back to a for loop but I am not sure how to get from Vector A to Vector B. Does anyone have any suggestions?
Some things I know I can do:
I can flag the record before or after a gap as true with an ifelse
I can get the index of when the counter should change by wrapping that in a which statement
This is the code to do each
ifelse(!is.na(lag(x)) & x == lag(x)+1, FALSE, TRUE)
which(ifelse(!is.na(lag(x)) & x == lag(x)+1, FALSE, TRUE))
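I think there are a couple of solutions to this problem. One, which d.b posted in a comment above, reproduces the desired output for this x. Note that cummax tracks the largest gap seen so far, so it steps up at a break only when that gap is larger than every earlier one, which happens to hold here (gaps of 2, then 3).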
cummax(c(1, diff(x)))
There is a similar solution that I chose to use, with ifelse() flagging breaks and cumsum() counting them. I chose this solution because additional information, like other vectors, can be included in the decision, and diff seems to have problems with very erratic up and down values.
cumsum(ifelse(!is.na(lag(x)) & x == lag(x) + 1, FALSE, TRUE))
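A common base R variant that does not depend on the gap sizes, and needs no lag() (the lag() calls above presumably rely on dplyr's version), is to flag each break with diff() and count the flags with cumsum(). A minimal sketch, where y is a made-up vector whose two gaps have the same size:

```r
x <- c(2001, 2002, 2003, 2004, 2006, 2007, 2008, 2009, 2010,
       2013, 2014, 2015, 2016)

# TRUE at the first element and wherever the step from the previous value is not 1.
grp <- cumsum(c(TRUE, diff(x) != 1))
grp
# [1] 1 1 1 1 2 2 2 2 2 3 3 3 3

# Two gaps of equal size: cumsum() still counts both, cummax() does not.
y <- c(1, 2, 4, 5, 7, 8)
cumsum(c(TRUE, diff(y) != 1))
# [1] 1 1 2 2 3 3
cummax(c(1, diff(y)))
# [1] 1 1 2 2 2 2
```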

Couldn't reduce the looping variable inside the "for" loop in R

I have a for loop that does a matrix manipulation in R. When certain checks are true, I need to process the same row again, which means i needs to be reduced by 1.
for(i in 1:10)
{
if(some chk)
{
i=i-1
}
}
However, i is not actually reduced. For example, in the 5th row I reduce i to 4, so the next iteration should come back to 5, but it comes out as 6.
Please advise.
My intention is:
I check the first-column values of a matrix. If I find a duplicate value, I take its second-column value, append it to the second column of the first row with that value, and remove the duplicate row. So when I remove a row, I do not need to increase i in the loop. (This is just a map-reduce method: append the values of the same key.)
The loop variable in an R for loop is reassigned from the sequence at the start of every iteration, so changing i inside the body has no effect on which element comes next. What you have written would be solved completely differently in normal R code. The exact solution depends on the actual problem; there isn't a generic, direct replacement (except by replacing the whole thing with a while loop, but this is both ugly and probably unnecessary).
To illustrate this, consider these two typical examples.
Assume you want to filter all duplicated elements from a list. Instead of looping over the list and copying all duplicated elements, you can use the duplicated function which tells you, for each element, whether it’s a duplicate.
Secondly, you use standard R subsetting syntax to select just those elements which are not a duplicate:
x = x[! duplicated(x)]
(This example works on a one-dimensional vector or list, but it can be generalised to more dimensions.)
For a more complex case, let’s say that you have a vector of numbers and, for every even number in the vector, you want to double the preceding number (this is highly artificial but in signal processing you might face similar problems). In other words:
input = c(1, 3, 2, 5, 6, 7, 1, 8)
output = ???
output
# [1] 1 6 2 10 6 7 2 8
… we want to fill in ???. In the first step, we check which numbers are even:
even = input %% 2 == 0
# [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
Next, we shift the result down – because we want to know whether the next number is even – by removing the first element, and appending a dummy element (FALSE) at the end.
even = c(even[-1], FALSE)
# [1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
And now we can multiply just these inputs by two:
output = input
output[even] = output[even] * 2
There, done.
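For the stated intention in the question (append the second-column values of rows that share a first-column key), a loop-free sketch could look like the following. The matrix m and the comma separator are made up for illustration; this is one possible approach, not the only one.

```r
# Hypothetical two-column character matrix: key in column 1, value in column 2.
m <- matrix(c("a", "b", "a", "c", "b",
              "1", "2", "3", "4", "5"), ncol = 2)

# Group the values by key, then paste each group into a single string.
merged <- vapply(split(m[, 2], m[, 1]), paste, character(1), collapse = ",")
merged
#     a     b     c
# "1,3" "2,5"   "4"

# Back to a two-column matrix with one row per key.
result <- cbind(names(merged), unname(merged))
```

One caveat: split() orders the groups by sorted key, not by first appearance, so reorder afterwards if the original order matters.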

Why does R need the name of the dataframe?

If you have a dataframe like this
mydf <- data.frame(firstcol = c(1,2,1), secondcol = c(3,4,5))
Why would
mydf[mydf$firstcol,]
work but
mydf[firstcol,]
wouldn't?
You can do this:
mydf[,"firstcol"]
Remember that the column goes second, not first.
In your example, to see what mydf[mydf$firstcol,] gives you, let's break it down:
> mydf$firstcol
[1] 1 2 1
So really mydf[mydf$firstcol,] is the same as
> mydf[c(1,2,1),]
    firstcol secondcol
1          1         3
2          2         4
1.1        1         3
So you are asking for rows 1, 2, and 1. That is, you are asking for your row one to be the same as row 1 of mydf, your row 2 to be the same as row 2 of mydf and your row 3 to be the same as row 1 of mydf; and you are asking for both columns.
Another question is why the following doesn't work:
> mydf[,firstcol]
Error in `[.data.frame`(mydf, , firstcol) : object 'firstcol' not found
That is, why do you have to put quotes around the column name when you ask for it like that, but not when you do mydf$firstcol? The answer is just that the operators you are using require different types of arguments. You can look at ?'$' to see the form x$name, and thus the second argument can be a name, which is not quoted. You can then look up ?'[', which will actually lead you to the same help page. There you will find the following, which explains it. Note that a "character" vector needs to have quoted entries (that is how you enter a character vector in R and many other languages).
i, j, ...: indices specifying elements to extract or replace. Indices
are ‘numeric’ or ‘character’ vectors or empty (missing) or
‘NULL’. Numeric values are coerced to integer as by
‘as.integer’ (and hence truncated towards zero). Character
vectors will be matched to the ‘names’ of the object (or for
matrices/arrays, the ‘dimnames’): see ‘Character indices’
below for further details.
Nothing to add to the very clear explanation of Xu Wang. You might want to note in addition that the package data.table allows you to use notation such as mydf[firstcol==1,] or mydf[,firstcol], that many find more natural.
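If you would rather stay in base R, subset() and with() also let you write the bare column name, because they evaluate the expression inside the data frame. A short sketch using the mydf from the question (the variable names by_dollar, by_subset and by_with are just for illustration):

```r
mydf <- data.frame(firstcol = c(1, 2, 1), secondcol = c(3, 4, 5))

# "[" evaluates its arguments in the calling environment,
# so the column has to be reached through the data frame.
by_dollar <- mydf[mydf$firstcol == 1, ]

# subset() and with() look up firstcol inside mydf.
by_subset <- subset(mydf, firstcol == 1)
by_with   <- with(mydf, mydf[firstcol == 1, ])

by_dollar
#   firstcol secondcol
# 1        1         3
# 3        1         5
```

All three return the same two rows. Note that this filters on the value of firstcol, which is different from mydf[mydf$firstcol, ], where the values are used as row numbers.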
