Can the "c" statement be used along with the "which" statement? - r

I am using the R programming language. I am interested in seeing whether the "c" statement can be used along with the "which" statement in R. For example, consider the following code (var1 and var2 are both "Factor" variables):
my_file
var1 var2
1 A AA
2 B CC
3 D CC
4 C AA
5 A BB
output <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 != c("AA", "CC")), ]
But this does not seem to be working.
I can run each of these conditions individually, e.g.
output <- my_file[which(my_file$var1 == "A" | my_file$var1 == "B" | my_file$var1 == "C"), ]
output1 <- output[which(output$var2 == "AA" | output$var2 == "CC" ), ]
But I would like to run them in a more "compact" form, e.g.:
output <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 != c("AA", "CC")), ]
Can someone please tell me what I am doing wrong?
Thanks

When you compare my_file$var1 == c("A", "B", "C"), the comparison takes place element by element. Because the two vectors have different lengths, the shorter one is recycled (with a warning, because the recycling is incomplete: 5 is not a multiple of 3). In effect, R evaluates
c("A", "B", "D", "C", "A") == c("A", "B", "C", "A", "B")
which gives c(TRUE, TRUE, FALSE, FALSE, FALSE), and which() then converts that to c(1, 2).
The reason it works when you use one letter at a time is that the single element is repeated 5 times: my_file$var1 == "A" becomes c("A", "B", "D", "C", "A") == c("A", "A", "A", "A", "A") and gives the result you expect.
@deschen is right, you should use %in%:
output <- my_file[which(my_file$var1 %in% c("A", "B", "C") & !my_file$var2 %in% c("AA", "CC")), ]
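To make the difference concrete, here is a minimal sketch that rebuilds the example data from the question (the data.frame() call is my reconstruction of the printed table, not code from the original post) and compares the recycled == with %in%:

```r
# Reconstruct the example data shown in the question (assumed layout)
my_file <- data.frame(
  var1 = factor(c("A", "B", "D", "C", "A")),
  var2 = factor(c("AA", "CC", "CC", "AA", "BB"))
)

# == recycles c("A", "B", "C") to length 5; the warning is suppressed here
recycled <- suppressWarnings(my_file$var1 == c("A", "B", "C"))

# %in% tests each value of var1 for membership in the set
member <- my_file$var1 %in% c("A", "B", "C")

# Combine both conditions and index with the logical vector directly
keep <- member & !(my_file$var2 %in% c("AA", "CC"))
output <- my_file[keep, ]

print(recycled)  # TRUE TRUE FALSE FALSE FALSE
print(member)    # TRUE TRUE FALSE TRUE TRUE
print(output)    # only row 5 (A, BB) remains
```

Only row 5 survives because it is the only row whose var1 is in the set and whose var2 is neither "AA" nor "CC".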

As @deschen says in a comment, you should use %in% rather than ==. You can also (1) get rid of the which() (logical indexing works just as well here as indexing by position) and (2) use subset() to avoid re-typing my_file.
output <- subset(my_file, var1 %in% c("A", "B", "C") &
                          !(var2 %in% c("AA", "CC")))
Alternatively, if you like the tidyverse, this would be:
library(dplyr)
output <- my_file %>% dplyr::filter(var1 %in% c("A", "B", "C"),
                                    !(var2 %in% c("AA", "CC")))
(comma-separated conditions in filter() work the same as &).

Related

Finding in which vector does the element belong to

suppose I have 3 vectors:
a = c("A", "B", "C")
b = c("D", "E", "F")
c = c("G", "H", "I")
then I have an element:
element = "E"
I want to find which vector my element belongs to. In this case, vector b.
A more general solution would be appreciated, because my real data set has more than a hundred of these vectors.
element = "E"
names(our_lists)[sapply(our_lists, `%in%`, x = element)]
# [1] "b"
Data
our_lists <- list(
a = c("A", "B", "C"),
b = c("D", "E", "F"),
c = c("G", "H", "I")
)
Using grep.
element <- "E"
l <- mget(c("a", "b", "c"))
names(l)[grep(element, l)]
# [1] "b"
If you keep the data in individual objects, you need to check for the element in each one individually. Get them in a list.
list_data <- mget(c('a', 'b', 'c'))
names(Filter(any, lapply(list_data, `==`, element)))
#[1] "b"
If all your vectors have the same length then a vectorised idea can be,
c('a', 'b', 'c')[ceiling(which(c(a, b, c) == 'E') / length(a))]
#[1] "b"
You can use dplyr::lst, which creates a named list from variable names, and then purrr::keep to keep only the vectors that contain your element.
require(tidyverse)
lst(a, b, c) %>%
keep(~ element %in% .x) %>%
names()
output:
[1] "b"
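If many elements need to be looked up against a hundred or more vectors, one possible base-R variation (not taken from the answers above) is to flatten the named list once with stack() and then use match(). The helper name find_owner is made up for this sketch, and it assumes each value appears in only one vector:

```r
our_lists <- list(
  a = c("A", "B", "C"),
  b = c("D", "E", "F"),
  c = c("G", "H", "I")
)

# stack() turns the named list into a two-column data frame:
# `values` holds the elements, `ind` holds the name of the source vector
lookup <- stack(our_lists)

# Hypothetical helper: name of the vector holding each element (NA if absent)
find_owner <- function(elements, lookup) {
  as.character(lookup$ind[match(elements, lookup$values)])
}

print(find_owner("E", lookup))                    # "b"
print(find_owner(c("E", "G", "A", "Z"), lookup))  # "b" "c" "a" NA
```

match() returns only the first hit, so a value that occurs in several vectors would be reported for the first one only.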

Using fct_relevel over a list of variables using map_at

I have a bunch of factor variables that have the same levels, and I want them all reordered in the same way using fct_relevel from the forcats package. Many of the variable names start with the same characters ("Q11A" to "Q11X", "Q12A" to "Q12X", "Q13A" to "Q13X", etc.). I wanted to use the starts_with function from dplyr to shorten the task. The following code didn't give me an error, but it didn't do anything either. Is there anything I'm doing wrong?
library(dplyr)
library(purrr)
library(forcats)
library(tibble)
#Setting up dataframe
f1 <- factor(c("a", "b", "c", "d"))
f2 <- factor(c("a", "b", "c", "d"))
f3 <- factor(c("a", "b", "c", "d"))
f4 <- factor(c("a", "b", "c", "d"))
f5 <- factor(c("a", "b", "c", "d"))
df <- tibble(f1, f2, f3, f4, f5)
levels(df$f1)
[1] "a" "b" "c" "d"
#Attempting to move level "c" up before "a" and "b".
df <- map_at(df, starts_with("f"), fct_relevel, "c")
levels(df$f1)
[1] "a" "b" "c" "d" #Didn't work
#If I just re-level for one variable:
fct_relevel(df$f1, "c")
[1] a b c d
Levels: c a b d
#That worked.
I think you're looking for mutate_at:
df <- mutate_at(df, vars(starts_with("f")), fct_relevel, "c")
df$f1
[1] a b c d
Levels: c a b d
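For comparison, the same reordering can be sketched in base R without dplyr or forcats. This swaps fct_relevel for stats::relevel(), which moves a single reference level to the front; the extra column named other is an assumption added here only to show that non-matching columns are left alone:

```r
# Example data: two columns starting with "f" plus one that should not change
df <- data.frame(
  f1 = factor(c("a", "b", "c", "d")),
  f2 = factor(c("a", "b", "c", "d")),
  other = factor(c("a", "b", "c", "d"))
)

# Logical vector marking the columns whose names start with "f"
cols <- startsWith(names(df), "f")

# relevel() moves the level given in `ref` to the first position
df[cols] <- lapply(df[cols], relevel, ref = "c")

print(levels(df$f1))     # "c" "a" "b" "d"
print(levels(df$other))  # "a" "b" "c" "d"
```

Unlike fct_relevel, relevel() handles only one level at a time, so this is a sketch for the single-level case in the question.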

Counting on dataframe in R

I have a data frame like
A B
A E
B E
B C
..
I want to convert it to two dataframes
One counts how many times A, B, C, ... appear in the first column, and the other counts how many times A, B, C, ... appear in the second column.
A 5
B 4
...
Could you give me some suggestions?
Thanks
Try plyr library:
library(plyr)
myDataFrame <- as.data.frame(cbind( c("A", "A", "B", "B", "B", "C"), c("B", "E", "E", "C", "C", "E") ))
count(myDataFrame[,1]) ##prints counts of first column
count(myDataFrame[,2]) ##prints counts of second column
We can use lapply to loop over the columns, get the frequency with table, convert to data.frame and if needed as separate datasets, use list2env (not recommended)
list2env(setNames(lapply(df1, function(x)
as.data.frame(table(x))), paste0("df", 1:2)), envir=.GlobalEnv)
Alternatively, you could also use the dplyr library:
library("dplyr")
df<- as.data.frame(cbind( c("A", "A", "B", "B", "B", "C"), c("B", "E", "E", "C", "C", "E") ))
names(df)<-c("V1","V2")
df <- as_tibble(df)  # tbl_df() is deprecated in current dplyr
df %>% group_by(V1) %>% summarise(c1 = n()) ## for column 1
df %>% group_by(V2) %>% summarise(c1 = n()) ## for column 2
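A self-contained base-R version of the table() idea might look like the sketch below. It keeps the per-column results in a named list instead of writing them into the global environment; the column names V1 and V2 and the sample values are the ones used in the answers above:

```r
df1 <- data.frame(
  V1 = c("A", "A", "B", "B", "B", "C"),
  V2 = c("B", "E", "E", "C", "C", "E")
)

# One frequency table per column, each converted to a data frame
# with columns `x` (the value) and `Freq` (its count)
counts <- lapply(df1, function(x) {
  as.data.frame(table(x), stringsAsFactors = FALSE)
})

print(counts$V1)  # A 2, B 3, C 1
print(counts$V2)  # B 1, C 2, E 3
```

Each list element is an ordinary data frame, so counts$V1 and counts$V2 can be used wherever two separate data frames are needed.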

Can R display how many changes were made to a variable like Stata does

When one is, e.g., replacing a variable in Stata, the Stata output will say that x real changes were made to the variable. This is very useful to know. Is there any similar functionality in R?
I think you could achieve the desired results by simply comparing newly created vectors and tabulating the results:
A <- c("A", "B", "C", "D")
B <- c("A", "C", "C", "E")
A == B
# OR
table(A == B)
In effect, you should be able to save your transformation as a new column or vector and then compare it with the original object. Summarising the TRUE/FALSE values then tells you how many values were changed: TRUE marks values that stayed the same, and FALSE marks values that changed.
Full output
> A <- c("A", "B", "C", "D")
> B <- c("A", "C", "C", "E")
> A == B
[1] TRUE FALSE TRUE FALSE
> table(A == B)["TRUE"]
TRUE
2
> table(A == B)
FALSE TRUE
2 2
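Since A == B marks the unchanged values, the number of changes is the count of FALSE, or equivalently sum(A != B). A small helper along these lines could report it directly; the name count_changes and its NA handling are assumptions of this sketch, not an existing R function:

```r
# Hypothetical helper: number of positions where `new` differs from `old`.
# A value going to or from NA counts as a change; NA to NA does not.
count_changes <- function(old, new) {
  stopifnot(length(old) == length(new))
  changed <- (old != new) | (is.na(old) != is.na(new))
  changed[is.na(old) & is.na(new)] <- FALSE
  sum(changed)
}

A <- c("A", "B", "C", "D")
B <- c("A", "C", "C", "E")
n_changed <- count_changes(A, B)
print(n_changed)  # 2

# With missing values: positions 2 and 3 changed, position 4 did not
n_changed_na <- count_changes(c(1, NA, 3, NA), c(1, 2, NA, NA))
print(n_changed_na)  # 2
```

Wrapping the result in message(sprintf("(%d real changes made)", n_changed)) would print a line similar to Stata's.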

Subset a data frame using OR when the column contains a factor

I would like to make a subset of a data frame in R that is based on one OR another value in a column of factors but it seems I cannot use | with factor values.
Example:
# fake data
x <- sample(1:100, 9)
nm <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
fake <- cbind(as.data.frame(nm), as.data.frame(x))
# subset fake to only rows with name equal to a or b
fake.trunk <- fake[fake$nm == "a" | "b", ]
produces the error:
Error in fake$nm == "a" | "b" :
operations are possible only for numeric, logical or complex types
How can I accomplish this?
Obviously my actual data frame has more than 3 values in the factor column so just using != "c" won't work.
You need fake.trunk <- fake[fake$nm == "a" | fake$nm == "b", ]. A more concise way of writing that (especially with more than two conditions) is:
fake[ fake$nm %in% c("a","b"), ]
Another approach would be to use subset() and write
fake.trunk = subset(fake, nm %in% c('a', 'b'))
