Method to extract all existing combinations in two columns [duplicate]

This question already has answers here:
R equivalent of SELECT DISTINCT on two or more fields/variables
(4 answers)
Closed 2 years ago.
I prepared a simple example for my question because the original data volume is huge.
df <- data.frame(X = c(0, 0, 1, 1, 1, 1), Y = c(0, 0, 0, 0, 1, 1), Z = c(1.5, 2, 5, 0.7, 3.5, 4.2))
I'm trying to extract all combinations that actually exist in columns X and Y, so the expected result is (0,0), (1,0), (1,1).
But if I use expand.grid, it returns every mathematically possible combination of the elements 0 and 1, so (0,1) is included in the result even though it never occurs in the data.
So my question is: how can I extract only the combinations that actually exist across two columns?
Any opinion is welcome!

We can subset the relevant columns and then call unique() on the result:
unique(df[c('X', 'Y')])
# X Y
#1 0 0
#3 1 0
#5 1 1
Or, in dplyr, use distinct():
library(dplyr)
df %>% distinct(X, Y)
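To contrast this with expand.grid, here is a minimal base R sketch using the example df from the question. It keeps the first occurrence of each (X, Y) pair with duplicated(), which returns the same rows as unique(), and then lists the possible pairs that never occur in the data. The helper key() is a hypothetical name introduced only for this illustration.

```r
df <- data.frame(X = c(0, 0, 1, 1, 1, 1),
                 Y = c(0, 0, 0, 0, 1, 1),
                 Z = c(1.5, 2, 5, 0.7, 3.5, 4.2))

# Every mathematically possible pair of the observed values (what expand.grid returns)
all_pairs <- expand.grid(X = unique(df$X), Y = unique(df$Y))

# Pairs that actually occur: drop rows whose (X, Y) pair was already seen
observed <- df[!duplicated(df[c("X", "Y")]), c("X", "Y")]

# Hypothetical helper: turn each (X, Y) row into a single string key
key <- function(d) paste(d$X, d$Y)

# Possible pairs that never appear in the data
never_seen <- all_pairs[!(key(all_pairs) %in% key(observed)), ]

print(observed)
print(never_seen)
```

Here observed holds the three existing pairs, while never_seen holds only (0,1), the combination expand.grid adds.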

Related

Combine vectors in R using vectorization where values only sum if they are not equal [duplicate]

This question already has answers here:
How can I take pairwise parallel maximum or minimum between two vectors?
(3 answers)
Closed 2 years ago.
I have two vectors that I need to add together, but only in instances where their corresponding values are not equal. Ex:
aa <- c(1,0,0,1,0)
bb <- c(0,1,1,1,0)
I want to generate a combined vector like so:
cc <- c(1,1,1,1,0)
How might I go about doing this, particularly with vectorization?
It looks like you are trying to implement an OR gate. You can use pmax():
pmax(aa, bb)
#[1] 1 1 1 1 0
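pmax() gives the expected result because both vectors contain only 0 and 1. A small base R sketch comparing it with two alternatives: a logical OR converted back to integers, and a literal translation of "add only where the values differ". For 0/1 inputs all three agree; for other values they would not, so which one fits depends on the real data.

```r
aa <- c(1, 0, 0, 1, 0)
bb <- c(0, 1, 1, 1, 0)

via_pmax  <- pmax(aa, bb)                  # element-wise maximum
via_or    <- as.integer(aa | bb)           # logical OR, converted back to 0/1 integers
via_ifsum <- ifelse(aa != bb, aa + bb, aa) # sum where different, else keep the shared value

print(via_pmax)
```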

How to add a new column to calculate mean for each group using dplyr in R [duplicate]

This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 2 years ago.
I have a table with 2 columns.
Type: 1 or 2 or 3 or 4
Data: corresponding data (there are multiple data for each type)
Now I want to create a third column that contains the mean of Data for each Type, i.e., all rows with Type 1 have the same mean value. I think I should do it with the mutate function, but I'm not sure how to proceed.
data %>% mutate(meanData = ifelse(...))
Can somebody help?
Thank you in advance.
We can do a grouped operation: group by Type, then compute the mean within each group.
library(dplyr)
data <- data %>%
  group_by(Type) %>%
  mutate(meanData = mean(Data)) %>%
  ungroup()  # drop the grouping so later operations are not done per Type
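If you would rather avoid dplyr, base R's ave() applies a function within groups and returns a vector the same length as the input, which is exactly the shape needed for a new column. The small data frame below is hypothetical sample data standing in for the real table.

```r
# Hypothetical sample data with the Type and Data columns described in the question
data <- data.frame(Type = c(1, 1, 2, 2, 3, 4),
                   Data = c(2, 4, 10, 20, 5, 7))

# ave() returns, for each row, the mean of Data within that row's Type
data$meanData <- ave(data$Data, data$Type, FUN = mean)

print(data)
```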

How to write for loop that extracts value of variable based on similarity of another variable of two datasets (in R)? [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 3 years ago.
I have a dataset that contains only one variable ("text") and a second dataset that is made up of a subset of this variable from dataset one plus a new variable called "code".
library(tibble)
dat1 <- tibble(text = c("book", "chair", "banana", "cherry"))
dat2 <- tibble(text = c("banana", "cherry"), code = c(1, NA))
What I would like is a for loop that yields the value of "code" for every row (i) where dat1$text is the same as dat2$text, and 0 otherwise. The ultimate goal is a vector c(0,0,1,NA) that I could then add back to the first dataset.
However, I don't know how to select the row corresponding to i in the for loop to get the value of "code" I need for this vector. Also, even if I knew how to extract these values, I'm not sure the whole thing would work, let alone keep the order I need (c(0,0,1,NA)).
for (i in dat2$text) {
ifelse(i==dat1$text, print(dat[...,2]), print(0))
}
Does anyone know how to fix that?
We can match the text column of both data frames, then use 0 where there is no match (match() returns NA) and the corresponding code value otherwise.
inds <- match(dat1$text, dat2$text)
dat1$out <- ifelse(is.na(inds), 0, dat2$code[inds])
dat1
# A tibble: 4 x 2
# text out
# <chr> <dbl>
#1 book 0
#2 chair 0
#3 banana 1
#4 cherry NA
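We can also do a join. Because cherry's code is genuinely NA, the join has to tell unmatched rows apart from that NA, so flag the rows that come from dat2:
library(dplyr)
dat1 %>%
  left_join(mutate(dat2, matched = TRUE), by = "text") %>%  # flag rows found in dat2
  mutate(code = if_else(is.na(matched), 0, code)) %>%       # 0 only for unmatched rows
  select(-matched)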
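Since the question asked specifically for a for loop, here is a minimal sketch of one that loops over the rows of dat1 (not dat2) and looks up each text value with match(). It uses base data.frame objects instead of tibbles so it runs without extra packages; the vectorised match() answer does the same work in one step and is usually preferable.

```r
dat1 <- data.frame(text = c("book", "chair", "banana", "cherry"))
dat2 <- data.frame(text = c("banana", "cherry"), code = c(1, NA))

out <- numeric(nrow(dat1))  # preallocated with 0, the value for unmatched rows
for (i in seq_len(nrow(dat1))) {
  j <- match(dat1$text[i], dat2$text)   # row of dat2 with the same text, or NA
  if (!is.na(j)) out[i] <- dat2$code[j] # copy code, even if that code is itself NA
}
dat1$out <- out
print(dat1)
```

Looping over dat1 is what keeps the result in the order c(0,0,1,NA): each position i in out corresponds to row i of dat1.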

R count number of variables with value ="mq" per row [duplicate]

This question already has answers here:
How to count the frequency of a string for each row in R
(4 answers)
Closed 4 years ago.
I have a data frame with 70 variables. I want to create a new variable that counts, for each row, how many of the 70 variables take the value "mq".
I am looking for something like this:
ID  Var1  Var2  Count_mq
1   mq    mq    2
2   1     mq    1
3   1     7     0
I have found this solution:
count_row_if("mq",DT)
But it gives me a vector with those values for the whole data frame and it is quite slow to compute.
I would like to find a solution using the function apply() but I don't know how to achieve this.
Best.
You can use the apply() function to count a particular value in each row of your existing data frame df:
df$count.MQ <- apply(df, 1, function(x) length(which(x=="mq")))
Here the second argument (MARGIN) is 1 because you want to count per row (2 would count per column). You can read more about it at https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/apply
I assume the name of the dataset is DT. I'm a bit unsure what you want to get, but this is how I understand it: the data frame consists of 70 columns and a number of rows, some of which contain the observation 'mq'.
If I get it right, please see the code below.
apply(DT, MARGIN = 1, function(x) sum(x == "mq", na.rm = TRUE))  # x is one row; count its "mq" entries
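Because the question mentions that speed matters, a vectorised alternative to apply() is worth sketching: comparing a data frame with == returns a logical matrix, and rowSums() counts the TRUE values per row. The data below is hypothetical, rebuilt from the small example table in the question and assuming a plain data.frame (data.table indexes differently); with the real data you would list the 70 variable columns instead of Var1 and Var2.

```r
# Hypothetical sample data mirroring the example table in the question
DT <- data.frame(ID   = 1:3,
                 Var1 = c("mq", "1", "1"),
                 Var2 = c("mq", "mq", "7"))

vars <- c("Var1", "Var2")  # columns to scan; ID is excluded

# DT[vars] == "mq" is a logical matrix; rowSums counts TRUE per row (NA treated as no match)
DT$Count_mq <- rowSums(DT[vars] == "mq", na.rm = TRUE)

print(DT)
```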

How to get a list of all the different values within a variable [duplicate]

This question already has answers here:
Extracting unique rows from a data table in R [duplicate]
(2 answers)
Closed 4 years ago.
I've always wondered if there's a command in R that gives you the entire domain of values within a variable.
For example, let's say I have the following data.table:
dt
Household Number_of_children
1 0
2 3
3 3
Is there a command along the lines of summary() or str() that would return the list 0, 3?
I believe summary and str only do that when your variable is a character string. I don't know how to do this when your variable is an integer, numeric, etc.
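You can use the unique() function with the column (vector) as input. With a data.table, dt[,'Number_of_children'] returns a one-column table rather than a vector, so extract the column with $ to get the values directly:
unique(dt$Number_of_children)
#[1] 0 3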
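A short base R sketch of related ways to inspect the distinct values of a column, using a data.frame rebuilt from the example (the question uses a data.table, but $ extraction works the same way on both). table() additionally reports how many rows take each value.

```r
# Example data from the question, as a plain data.frame
dt <- data.frame(Household = 1:3, Number_of_children = c(0, 3, 3))

vals   <- unique(dt$Number_of_children)  # distinct values, in order of first appearance
sorted <- sort(vals)                     # the same values in ascending order
counts <- table(dt$Number_of_children)   # how many rows take each distinct value

print(vals)
print(counts)
```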
