Get changing patterns with grep in R

I want to grade several students' exams using an answer key with grep. For example, the student's answers were
A B B C E D D
and the key is
A D B C E CD ABD
I want to check whether each of the student's answers is found in the corresponding position in the answer key (multiple letters indicate "or", not "and", so "CD" means "C" or "D"). How would I go about that using grep?

We can also use Map/mapply from base R:
unname(mapply(grepl, answer, key))
#[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE
Data:
answer <- c("A", "B", "B", "C", "E", "D", "D")
key <- c("A", "D", "B", "C", "E", "CD", "ABD")
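A small variation, offered as a sketch: grepl() treats each student answer as a regular expression, so passing fixed = TRUE through mapply()'s MoreArgs argument makes the match literal, and sum() over the logical result gives the score. The fixed = TRUE setting and the correct/score names are additions for illustration, not part of the original answer.

```r
answer <- c("A", "B", "B", "C", "E", "D", "D")
key <- c("A", "D", "B", "C", "E", "CD", "ABD")

# Match each answer literally (no regex) against the key entry in the same position
correct <- unname(mapply(grepl, answer, key, MoreArgs = list(fixed = TRUE)))
correct
#[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE

# Number of correct answers
score <- sum(correct)
score
#[1] 6
```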

We can use the map2_lgl function from the purrr package with grepl. TRUE means the answer matches the key; FALSE means no match.
# Create example of answer and key
answer <- c("A", "B", "B", "C", "E", "D", "D")
key <- c("A", "D", "B", "C", "E", "CD", "ABD")
# Load packages
library(purrr)
# Check if answer is in key
map2_lgl(answer, key, ~grepl(.x, .y))
#[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE
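Since the question mentions grading several students, here is a base R sketch that avoids regular expressions altogether: each key entry is split into its individual letters with strsplit(), and %in% checks the answer against them. This swaps grep for a set-membership test; the answers matrix (one row per student) and the grade_one() helper are hypothetical, made up for illustration.

```r
key <- c("A", "D", "B", "C", "E", "CD", "ABD")

# Hypothetical answer sheets: one row per student, one column per question
answers <- rbind(
  student1 = c("A", "B", "B", "C", "E", "D", "D"),
  student2 = c("A", "D", "C", "C", "A", "C", "B")
)

# Split each key entry into its acceptable letters, e.g. "CD" -> c("C", "D")
key_letters <- strsplit(key, "")

# Hypothetical helper: TRUE where the answer is one of the acceptable letters
grade_one <- function(ans) {
  mapply(function(a, ok) a %in% ok, ans, key_letters, USE.NAMES = FALSE)
}

# Logical matrix: one row per student, one column per question
correct <- t(apply(answers, 1, grade_one))

# Number of correct answers per student
scores <- rowSums(correct)
scores
```

With these made-up sheets, student1 scores 6 and student2 scores 5.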

Related

Merging two vectors, but keeping unique elements of only one vector

This may be an easy question, but I failed nevertheless.
I have two vectors:
v1 <- c("A", "B", "C", "F", "G")
v2 <- c( "B", "C", "F", "G", "H","I")
I want to merge v1 and v2 to obtain a vector which contains all common elements and all unique elements of v2, but does not include any unique elements of v1.
Essentially, remove all "FALSE" of
> v1 %in% v2
[1] FALSE TRUE TRUE TRUE TRUE
but keep all "FALSE" of
> v2 %in% v1
[1] TRUE TRUE TRUE TRUE FALSE FALSE
plus any common element.
Desired output:
c("B", "C", "F", "G", "H","I")
Thank you very much!
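No answer is included here for this question. As a sketch, the desired vector is the common elements plus the elements found only in v2, which base R's set functions express directly. Because every common element is already in v2, the result contains the same elements as unique(v2); the order of the common part follows v1, which here matches v2. The common/only_v2/result names are illustrative.

```r
v1 <- c("A", "B", "C", "F", "G")
v2 <- c("B", "C", "F", "G", "H", "I")

# Elements present in both vectors (in v1's order)
common <- intersect(v1, v2)
# Elements that appear only in v2
only_v2 <- setdiff(v2, v1)
# Combine them; elements unique to v1 (here "A") are never included
result <- union(common, only_v2)
result
#[1] "B" "C" "F" "G" "H" "I"
```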

R - combining columns by specific conditions

I currently have a data frame as follows:
groups <- data.frame(name=paste("person",c(1:27),sep=""),
assignment1 = c("F","A","B","H", "A", "E", "D", "G", "I", "I", "E", "A", "D", "C", "F", "C", "D", "H", "F", "H", "G", "I", "G", "C", "B", "E", "B"),
assignment2 = c("H", "F", "F", "D", "E", "G", "A", "E", "I", "C", "A", "H", "G", "B", "I", "C", "E", "I", "C", "A", "B", "B", "G", "D", "H", "F", "D"),stringsAsFactors = FALSE)
I would like to create a list for each person that contains only the people they have already worked with. For example, person1 is in group F for the 1st assignment and group H for the 2nd assignment, and:
The members of group F on the 1st assignment are {"person1", "person15", "person19"}.
The members of group H on the 2nd assignment are {"person1", "person12", "person25"}.
I would like to create a vector for person1 like
{"person15", "person19", "person12", "person25"}.
Does anyone know a convenient way to do this in R?
Any help will be appreciated. Thanks in advance.
You could do this:
teammates <- lapply(1:nrow(groups), function(i) {
assig1 <- subset(groups, assignment1 == groups$assignment1[i])$name
assig2 <- subset(groups, assignment2 == groups$assignment2[i])$name
unq_set <- unique(c(assig1, assig2))
return(setdiff(unq_set, groups$name[i]))
})
This takes a vector of row indices and, for each one, applies a function that a) gets the names of everyone whose assignment 1 group or assignment 2 group matches that of the given row, b) takes the unique union of these, and c) returns that union minus the name of the person around whom the group is built.
The output is a list like this:
[[1]]
[1] "person15" "person19" "person12" "person25"
[[2]]
[1] "person5" "person12" "person3" "person26"
[[3]]
[1] "person25" "person27" "person2" "person26"
...and so on
For brevity, the following is equivalent (though the order inside list items may differ). It uses the same subsetting logic as @user5219763's answer, but the setdiff step is important:
teammates <- lapply(1:nrow(groups), function(i) {
setdiff(
with(groups, name[assignment1 == assignment1[i] |
assignment2 == assignment2[i] ]),
groups$name[i])
})
Here's a solution using dplyr and tidyr:
library(dplyr)
library(tidyr)
groups %>%
gather(var, val, -name) %>%
unite(comb, var, val) %>%
left_join(.,., by = 'comb') %>%
group_by(name.x) %>%
summarise(out = list(name.y))
The heavy lifting is done by the left_join (a self-join on comb); before that, we combine the columns so that we can join on, e.g., assignment1_F. The output includes each person themselves and is not deduplicated; that is up to you.
However, as @akrun says, if you are doing a lot of this kind of thing, use igraph.
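For comparison, the same self-join idea can be sketched in base R with merge(), this time also dropping self-matches and duplicate partners, which the dplyr version leaves to you. groups is the data frame from the question; the long, pairs and teammates names are illustrative.

```r
groups <- data.frame(name = paste("person", 1:27, sep = ""),
  assignment1 = c("F","A","B","H","A","E","D","G","I","I","E","A","D","C","F","C","D","H","F","H","G","I","G","C","B","E","B"),
  assignment2 = c("H","F","F","D","E","G","A","E","I","C","A","H","G","B","I","C","E","I","C","A","B","B","G","D","H","F","D"),
  stringsAsFactors = FALSE)

# Long format: one row per person and assignment, keyed like "assignment1_F"
long <- rbind(
  data.frame(name = groups$name, comb = paste0("assignment1_", groups$assignment1),
             stringsAsFactors = FALSE),
  data.frame(name = groups$name, comb = paste0("assignment2_", groups$assignment2),
             stringsAsFactors = FALSE)
)

# Self-join: every pair of people who share a group on some assignment
pairs <- merge(long, long, by = "comb")

# Drop each person matched with themselves, then any duplicate pairs
pairs <- unique(pairs[pairs$name.x != pairs$name.y, c("name.x", "name.y")])

# One character vector of teammates per person, named by person
teammates <- split(pairs$name.y, pairs$name.x)
teammates[["person1"]]
```

The list is named by person, so teammates[["person1"]] holds person15, person19, person12 and person25 (the order within each element may differ from the lapply() answers).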
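You could use is.element():
workedWith <- function(index, data = groups) {
  # Names of everyone sharing the person's group on assignment 1 or assignment 2
  # (the result includes the person themselves)
  data[is.element(data[, 2], data[index, 2]) | is.element(data[, 3], data[index, 3]), 1]
}
lapply(X = seq_len(nrow(groups)), FUN = workedWith)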

Filtering only unique values from multiple columns in R

I have data like this:
X <- data.frame(fac_1 = c("A", "B", "C", "X", "Y"), fac_2 = c("B", "X", "P", "Q", "C"), fac_3 = c("C", "P", "Q", "T", "U"))
fac_1 fac_2 fac_3
A B C
B X P
C P Q
X Q T
Y C U
I want only those letters that are common
(1) between fac_1 and fac_2 (like B, C, X), and
(2) among all of fac_1, fac_2 and fac_3 (like C only).
You can use intersect:
intersect(intersect(X$fac_1, X$fac_2), X$fac_3)
#[1] "C"
intersect(X$fac_1, X$fac_2)
#[1] "B" "C" "X"
Alternatively, Reduce can be used, as suggested by @docendo discimus in the comments:
Reduce(intersect, X)
#[1] "C"
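Building on the Reduce() idea, a sketch that covers both parts of the question: Reduce() over a subset of columns gives case (1), and combn() lists every pairwise intersection in case other column pairs are of interest too. stringsAsFactors = FALSE is added so the columns are character vectors in every R version; the common12, common_all and pairwise names are illustrative.

```r
X <- data.frame(fac_1 = c("A", "B", "C", "X", "Y"),
                fac_2 = c("B", "X", "P", "Q", "C"),
                fac_3 = c("C", "P", "Q", "T", "U"),
                stringsAsFactors = FALSE)

# (1) Values common to fac_1 and fac_2: Reduce over a subset of columns
common12 <- Reduce(intersect, X[c("fac_1", "fac_2")])
common12
#[1] "B" "C" "X"

# (2) Values common to all three columns
common_all <- Reduce(intersect, X)
common_all
#[1] "C"

# Every pairwise intersection, one list element per column pair
col_pairs <- combn(names(X), 2, simplify = FALSE)
pairwise <- setNames(
  lapply(col_pairs, function(p) intersect(X[[p[1]]], X[[p[2]]])),
  vapply(col_pairs, paste, character(1), collapse = "&")
)
pairwise[["fac_2&fac_3"]]
#[1] "P" "Q" "C"
```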

Can R display how many changes were made to a variable like Stata does?

When one is, e.g., replacing a variable in Stata, the Stata output will say that x real changes were made to the variable. This is very useful to know. Is there any similar functionality in R?
I think you could achieve the desired result by comparing the newly created vector with the original one and tabulating the results:
A <- c("A", "B", "C", "D")
B <- c("A", "C", "C", "E")
A == B
# OR
table(A == B)
In effect, you can save your transformation as a new column/vector and compare it with the original object; summarising the TRUE/FALSE values then tells you how many values were changed.
Full output
> A <- c("A", "B", "C", "D")
> B <- c("A", "C", "C", "E")
> A == B
[1] TRUE FALSE TRUE FALSE
> table(A == B)["TRUE"]
TRUE
2
> table(A == B)
FALSE TRUE
2 2
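One caveat worth noting: == returns NA wherever either value is NA, so table(A == B) silently leaves those positions out and a plain sum(A != B) returns NA. A minimal sketch of an NA-aware counter follows; count_changes() is a hypothetical helper, and the message() call only imitates Stata's wording, it is not a built-in R feature.

```r
# Hypothetical helper: number of positions where old and new differ.
# A value becoming NA (or an NA being filled in) counts as a change;
# NA staying NA does not.
count_changes <- function(old, new) {
  stopifnot(length(old) == length(new))
  one_na <- xor(is.na(old), is.na(new))
  both_present_and_differ <- !is.na(old) & !is.na(new) & old != new
  sum(one_na | both_present_and_differ)
}

x <- c(1, 2, NA, 4, 5)
x_new <- x
x_new[which(x > 3)] <- 0     # 4 -> 0 and 5 -> 0
x_new[is.na(x_new)] <- -1    # NA -> -1

sum(x != x_new)              # NA, because of the missing value
n <- count_changes(x, x_new)
message(n, " real changes made")
```

Here n is 3, and count_changes(A, B) on the vectors above gives 2, matching the FALSE count from table(A == B).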

NAs in the dummies package

I am using the dummy.data.frame function from the dummies package in R to create dummy variables for the k levels of my factor. Unfortunately, my factor has NAs. When I use dummy.data.frame, it creates k dummies with no NAs plus a new dummy that flags the missing values with 1.
However, I would like to still have the NAs in the k dummies and not a dummy for the missing values.
Is this possible with that function? Do you know any other functions that can help me?
I usually do this kind of thing with model.matrix(). Setting the option na.action to "na.pass" retains the NAs in their correct places. This option does not seem to change the behavior of the dummy() function, so model.matrix() might be your easiest bet. For example, for a single factor letters, the following should do the trick:
options(na.action="na.pass")
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )
model.matrix(~letters-1)
Or for several variables or columns of a data frame as well:
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )
betters <- c( "a", "a", "c", "c", "c", "d", "d", "d", NA, "e", "e", "e" )
model.matrix(~letters+betters-1)
The key trick here is setting the na.action option. After this dummy recoding, it is a good idea to return the option to its default value:
options(na.action="na.omit")
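One thing to watch in the two-variable example: with - 1, only the first factor gets a column for every level; betters still loses its first level to the reference category. A sketch that keeps all k levels for every factor, and passes na.action = na.pass to model.frame() directly instead of changing the global option, could look like this. The df data frame is an assumption built from the vectors in the answer; contrasts(..., contrasts = FALSE) supplies an identity coding for each factor.

```r
df <- data.frame(
  letters = c("a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA),
  betters = c("a", "a", "c", "c", "c", "d", "d", "d", NA, "e", "e", "e"),
  stringsAsFactors = TRUE
)

fml <- ~ letters + betters - 1

# Build the model frame ourselves so rows with NAs are kept, without touching options()
mf <- model.frame(fml, data = df, na.action = na.pass)

# contrasts = FALSE gives one dummy per level for every factor
mm <- model.matrix(fml, data = mf,
                   contrasts.arg = lapply(df, contrasts, contrasts = FALSE))
dim(mm)
mm[c(9, 12), ]
```

The result has 8 letters columns and 4 betters columns; row 9 has NAs in the betters dummies and row 12 in the letters dummies.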
