Can R display how many changes were made to a variable like Stata does - r

When one is, e.g., replacing a variable in Stata, the Stata output will say that x real changes were made to the variable. This is very useful to know. Is there any similar functionality in R?

I think you could achieve the desired results by simply comparing newly created vectors and tabulating the results:
A <- c("A", "B", "C", "D")
B <- c("A", "C", "C", "E")
A == B
# OR
table(A == B)
In effect, you can save your transformation as a new column/vector and then compare it with the original object. Summarising the TRUE/FALSE values then tells you how many values were changed: FALSE marks a changed value and TRUE an unchanged one.
Full output:
> A <- c("A", "B", "C", "D")
> B <- c("A", "C", "C", "E")
> A == B
[1] TRUE FALSE TRUE FALSE
> table(A == B)["TRUE"]
TRUE
2
> table(A == B)
FALSE TRUE
2 2
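To get a single Stata-style count rather than a table, the comparison can be wrapped in a small helper. This is only a sketch: count_changes is a made-up name, not a base R function, and treating a switch between NA and a value as a change is an assumption about what should count as a "real change".

```r
# Hypothetical helper (not part of base R): count elements that differ
# between the original vector and the transformed one.
count_changes <- function(old, new) {
  stopifnot(length(old) == length(new))
  differs <- old != new                    # NA wherever either side is NA
  differs[is.na(differs)] <- FALSE         # decide the NA cases separately
  na_flip <- xor(is.na(old), is.na(new))   # NA became a value, or the reverse
  sum(differs | na_flip)
}

A <- c("A", "B", "C", "D")   # original values
B <- c("A", "C", "C", "E")   # values after the replacement
n_changed <- count_changes(A, B)
message(n_changed, " real changes made")
```

Calling the helper right after a replacement gives the number of elements that differ, which is the count of FALSE values in the table above.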

Related

Can the "c" statement be used along with the "which" statement?

I am using the R programming language. I am interested in seeing whether the "c" statement can be used along with the "which" statement in R. For example, consider the following code (var1 and var2 are both "Factor" variables):
my_file
var1 var2
1 A AA
2 B CC
3 D CC
4 C AA
5 A BB
ouput <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 !== c("AA", "CC")), ]
But this does not seem to be working.
I can run each of these conditions individually, e.g.
output <- my_file[which(my_file$var1 == "A" | my_file$var1 == "B" | my_file$var1 == "C"), ]
output1 <- output[which(output$var2 == "AA" | output$var2 == "CC" ), ]
But I would like to run them in a more "compact" form, e.g.:
ouput <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 !== c("AA", "CC")), ]
Can someone please tell me what I am doing wrong?
Thanks
When you compare my_file$var1 == c("A", "B", "C"), the comparison takes place element by element, but because the two vectors have different lengths, the shorter one is recycled (with a warning, because the recycling is incomplete: 5 is not a multiple of 3). The comparison is effectively
c("A", "B", "D", "C", "A") == c("A", "B", "C", "A", "B"), giving
c(TRUE, TRUE, FALSE, FALSE, FALSE), which which() then converts to c(1, 2).
The reason it works when you use one letter at a time is that the single element is repeated 5 times: my_file$var1 == "A" leads to c("A", "B", "D", "C", "A") == c("A", "A", "A", "A", "A") and gives the result you expect.
@deschen is right, you should use %in%:
output <- my_file[which(my_file$var1 %in% c("A", "B", "C") & !my_file$var2 %in% c("AA", "CC")), ]
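The difference between the two operators can be seen directly on the var1 values from the question. A minimal sketch, using a plain character vector instead of the factor column (an assumption made to keep the example short); suppressWarnings() is there only to silence the recycling warning:

```r
var1 <- c("A", "B", "D", "C", "A")

# == recycles the 3-element vector to A B C A B before comparing,
# so each position is checked against only one of the three letters.
recycled <- suppressWarnings(var1 == c("A", "B", "C"))
which(recycled)

# %in% asks, for every element of var1, whether it occurs anywhere
# in the set on the right-hand side.
member <- var1 %in% c("A", "B", "C")
which(member)
```

The first which() returns positions 1 and 2, the second returns 1, 2, 4 and 5, which is the selection the question was after.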
As @deschen says in a comment, you should use %in% rather than ==. You can also (1) get rid of the which() (logical indexing works just as well here as indexing by position) and (2) use subset to avoid re-typing my_file.
output <- subset(my_file, var1 %in% c("A", "B", "C") &
!(var2 %in% c("AA", "CC")))
Alternatively, if you like the tidyverse, this would be:
library(dplyr)
output <- my_file %>% dplyr::filter(var1 %in% c("A", "B", "C"),
!(var2 %in% c("AA", "CC")))
(comma-separated conditions in filter() work the same as &).
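To check the compact filter against the data in the question, here is a small self-contained sketch. The data frame is rebuilt by hand from the printed my_file, which is an assumption about how the original object was created, and logical indexing is used without which().

```r
# Rebuild the example data; both columns are factors, as in the question.
my_file <- data.frame(
  var1 = factor(c("A", "B", "D", "C", "A")),
  var2 = factor(c("AA", "CC", "CC", "AA", "BB"))
)

# TRUE for rows whose var1 is A, B or C and whose var2 is neither AA nor CC.
keep <- my_file$var1 %in% c("A", "B", "C") & !(my_file$var2 %in% c("AA", "CC"))
output <- my_file[keep, ]
output
```

Only row 5 (A, BB) satisfies both conditions on this data.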

Exclude common rows in tibbles [duplicate]

I'm looking for a way to join two tibbles so that only the rows unique to the first tibble are kept, or only the rows unique to either tibble: simply those that do not have a matching key.
Here is an example:
A <- tibble( A = c("a", "b", "c", "d", "e"))
B <- tibble( A = c("a", "b", "c"))
With the usual dplyr joins I am not able to get this:
A
1 d
2 e
Is there some way to do this within dplyr, or in the tidyverse in general?
Use the setdiff() function from the dplyr library:
A <- tibble( A = c("a", "b", "c", "d", "e"))
B <- tibble( A = c("a", "b", "c"))
C <- setdiff(A,B)
Just to add: setdiff(A, B) returns the rows present in A but not in B.
dplyr::anti_join will keep only the rows that are unique to the tibble/data.frame of the first argument.
A <- tibble( A = c("a", "b", "c", "d", "e"))
B <- tibble( A = c("a", "b", "c"))
dplyr::anti_join(A, B, by = "A")
# A
# <chr>
# 1 d
# 2 e
A base R possibility (well except the tibble):
A[!A$A %in% B$A,]
returns
# A tibble: 2 x 1
A
<chr>
1 d
2 e
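The answers above cover rows unique to the first tibble. For the other case in the question, rows without a match in either direction, one possible sketch is to run anti_join() both ways and stack the results. The extra value "f" in B is added here purely for illustration and is not part of the original example.

```r
library(dplyr)

A <- tibble(A = c("a", "b", "c", "d", "e"))
B <- tibble(A = c("a", "b", "c", "f"))   # "f" added for illustration only

only_in_A <- anti_join(A, B, by = "A")   # rows of A with no match in B
only_in_B <- anti_join(B, A, by = "A")   # rows of B with no match in A

# Rows that have no matching key in the other table, from either side.
unmatched <- bind_rows(only_in_A, only_in_B)
unmatched$A
```

On this data the unmatched keys are d and e from A, followed by f from B.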

Generating distinct groups based on vector/column pairs in R

SEE UPDATE BELOW:
Given a data frame with two columns (x1, x2) representing pairs of objects, I would like to generate groups where all members of each group are paired with all other members in that group. Thus far, I have been able to generate groups by showing all items in x2 that are paired with each item in x1, but this leaves me with groups where a couple of members are only paired with one other group member. I'm having a hard time getting off the ground with this one... Thanks in advance for any help you may have. Please let me know if I should edit this post as I am new to Stack Overflow and new to R coding.
x1 <- c("A", "B", "B", "B", "C", "C", "D", "D", "D", "E", "E")
x2 <- c("A", "B", "C", "D", "B", "C", "B", "D", "E", "D", "E")
df <- data.frame(x1, x2)
I would like to go from this df, to an output that looks like df2.
group1 <- c("A")
group2 <- c("B", "C")
group3 <- c("B", "D")
group4 <- c("D", "E")
df2 <- data.frame(cbind.fill(group1, group2, group3, group4, fill = "NULL"))
UPDATE:
Given the following dataset....
x1 <- c("A", "B", "B", "B", "C", "C", "D", "D", "D", "E", "E", "B", "C", "F")
x2 <- c("A", "B", "C", "D", "B", "C", "B", "D", "E", "D", "E", "F", "F", "F")
df <- data.frame(x1, x2)
.... I would like to identify groups of x1/x2 where all objects within said group are connected to all other objects of that group.
This is what I have thus far (I'm sure this is riddled with best-practice errors, feel free to call them out. I'm eager to learn)...
n <- nrow(as.data.frame(unique(df$x1)))
RosterGuide <- as.data.frame(matrix(nrow = n , ncol = 1))
RosterGuide$V1 <- seq.int(nrow(RosterGuide))
RosterGuide$Object <- (unique(df$x1))
colnames(RosterGuide) <- c("V1","Object")
groups_frame <- matrix(, ncol= length(n), nrow = length(n))
for (loopItem in 1:nrow(RosterGuide)) {
object <- subset(RosterGuide$Object, RosterGuide$V1 == loopItem)
group <- as.data.frame(subset(df$x2, df$x1 == object))
groups_frame <- cbind.fill(group, groups_frame, fill = "NULL")
}
Groups <- as.data.frame(groups_frame)
Groups <- subset(Groups, select = - c(object))
colnames(Groups) <- RosterGuide$V1
This yields the data frame 'Groups'....
     1    2    3    4    5    6
1    F    D    B    B    B    A
2 NULL    E    D    C    C NULL
3 NULL NULL    E    F    D NULL
4 NULL NULL NULL NULL    F NULL
... which is exactly what I am looking for, except that if you look at the original df, objects F and D are never paired, rendering group 5 invalid. Also, objects B and E are never paired, rendering group 3 invalid. A valid output should look like this...
     1    2    3    4    5
1    D    B    B    B    A
2    E    D    C    C NULL
3 NULL NULL NULL    F NULL
Question: is there some way that I can relate the groups listed above in the 'Groups' data frame to the original df to remove groups with invalid relationships? This really has me stumped.
For context: What I am really trying to do is group items based on pairwise connections derived from a network of nodes where not all nodes are connected.
Here is one way of doing it in base R, using apply and unique:
df <- data.frame(x1, x2, stringsAsFactors = FALSE)
df <- df[df$x1 != df$x2, ]
unique(t(apply(df, 1, sort)))
[,1] [,2]
3 "B" "C"
4 "B" "D"
9 "D" "E"
dplyr:
library(dplyr)
df %>%
  dplyr::filter(x1 != x2) %>%
  dplyr::filter(!duplicated(paste(pmin(x1, x2), pmax(x1, x2), sep = "-")))
x1 x2
1 B C
2 B D
3 D E
data.table (there might be a better way):
library(data.table)
as.data.table(df)[, .SD[x1 != x2]][, .GRP, by = .(x1 = pmin(x1,x2), x2 = pmax(x1,x2))]
x1 x2 GRP
1: B C 1
2: B D 2
3: D E 3
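The approaches above list the distinct pairs. For the updated question, where every member of a group must be paired with every other member, the groups correspond to what graph theory calls maximal cliques. The following is a sketch only, assuming the igraph package is installed; it treats each row of df as an undirected edge and drops the self-pairs such as A/A.

```r
library(igraph)

x1 <- c("A", "B", "B", "B", "C", "C", "D", "D", "D", "E", "E", "B", "C", "F")
x2 <- c("A", "B", "C", "D", "B", "C", "B", "D", "E", "D", "E", "F", "F", "F")
df <- data.frame(x1, x2)

# Build an undirected graph from the pairs, then remove self-loops and
# duplicated edges (B/C and C/B describe the same connection).
g <- igraph::simplify(igraph::graph_from_data_frame(df, directed = FALSE))

# Maximal cliques: sets of nodes that are all connected to each other and
# cannot be extended by adding another node.
cliques <- igraph::max_cliques(g)

# One string per group, with members and groups in alphabetical order.
groups <- sort(vapply(
  cliques,
  function(cl) paste(sort(igraph::as_ids(cl)), collapse = " "),
  character(1)
))
groups
```

On this data the maximal cliques are A, B C F, B D and D E. The pair B C from the desired output is not reported separately, because it is contained in the larger group B C F.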

For loop with factor data

I have two vectors of factor data with equal length. Just for example's sake:
observed=c("a", "b", "c", "a", "b", "c", "a")
predicted=c("a", "a", "b", "b", "b", "c", "c")
Ultimately, I am trying to generate a classification matrix showing the number of times each factor is correctly predicted. This would look like the following for the example:
name T F
a 1 2
b 1 1
c 1 1
Note that the table() command doesn't work here because I have 11 different factors, and the output would be 11x11 instead of 11x2. My plan is to create three vectors, and combine them into a data frame.
First, a vector of the unique factor values in the existing vectors. This is simple enough,
names=unique(df$observed)
Next, a vector of values showing the number of correct predictions. This is where I am running into trouble. I can get the number of correct predictions for an individual factor like so:
correct.a=sum(predicted[which(observed == "a")] == "a")
But this is cumbersome to repeat time and time again, and then combine into a vector like
correct=c(correct.a, correct.b, correct.c)
Is there a way to use a loop (or other strategy that you can think of) to improve this process?
Also note that the final vector I would create would be something like this:
incorrect.a=sum(observed == "a")-correct.a
t(sapply(split(predicted == observed, observed), table))
# FALSE TRUE
#a 2 1
#b 1 1
#c 1 1
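One caveat with the split/sapply approach: if some level is always predicted correctly (or never), its table has a single entry and the result may no longer simplify to a matrix. A possible variant, sketched here with column labels chosen to match the question, cross-tabulates the observed value against a factor that always has both levels:

```r
observed  <- c("a", "b", "c", "a", "b", "c", "a")
predicted <- c("a", "a", "b", "b", "b", "c", "c")

# Fixing the levels keeps both columns even when one of them is all zeros.
hit <- factor(predicted == observed, levels = c(TRUE, FALSE), labels = c("T", "F"))

# Rows: observed class; columns: number of correct (T) and incorrect (F) predictions.
counts <- table(name = observed, hit)
counts
```

This gives one row per observed class and exactly two columns, regardless of how many classes there are.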
I would suggest you use data.table for an explicit, clean way to define your results:
library(data.table)
observed=c("a", "b", "c", "a", "b", "c", "a")
predicted=c("a", "a", "b", "b", "b", "c", "c")
dt <- data.table(observed, predicted)
res <- dt[, .(
    T = sum(observed == predicted),
    F = sum(observed != predicted)),
  by = observed
]
res
# observed T F
# 1: a 1 2
# 2: b 1 1
# 3: c 1 1

Get changing patterns with grep -- R

I want to grade several students' exams using an answer key with grep. So for example, the student's answers were
A B B C E D D
and the key is
A D B C E CD ABD
I want to check whether each of the student's answers is found in the corresponding position in the answer key (multiple letters indicate "or", not "and", so "CD" means "C" or "D"). How would I go about that using grep?
Or we can use Map/mapply from base R
unname(mapply(grepl, answer, key))
#[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE
data
answer <- c("A", "B", "B", "C", "E", "D", "D")
key <- c("A", "D", "B", "C", "E", "CD", "ABD")
We can use the map2_lgl function from the purrr package with grepl. TRUE means the answer matches the key; FALSE means no match.
# Create example of answer and key
answer <- c("A", "B", "B", "C", "E", "D", "D")
key <- c("A", "D", "B", "C", "E", "CD", "ABD")
# Load packages
library(purrr)
# Check if answer is in key
map2_lgl(answer, key, ~grepl(.x, .y))
[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE
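Both answers pass the student's answer to grepl as a regular expression, so a blank answer would, as far as I can tell, match any key because an empty pattern matches every string. A possible alternative that avoids pattern matching altogether is to split each key into its accepted letters and test membership. This is a sketch only; the last answer is blanked out here for illustration and is not part of the original data.

```r
answer <- c("A", "B", "B", "C", "E", "D", "")   # last answer left blank (illustrative)
key    <- c("A", "D", "B", "C", "E", "CD", "ABD")

# Split each key entry into single letters, e.g. "CD" becomes c("C", "D").
accepted <- strsplit(key, "")

# An answer is correct when it equals one of the accepted letters.
correct <- unname(mapply(function(a, ok) a %in% ok, answer, accepted))
correct
```

The blank answer in the last position is marked FALSE, and the remaining positions agree with the grepl results shown above.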
