setdiff two single column data frames [duplicate] - r

This question already has answers here:
Find complement of a data frame (anti - join)
(7 answers)
Closed 7 years ago.
I'm stuck on what should be a very simple problem. I have two one-column data frames:
a <- data.frame(C=c("c1","c2","c3","c4","c5","c6","c7","c8"))
b <- data.frame(C=c("c1","c4","c5","c8"))
I would like to get a one-column data frame with the entries that are in a but do NOT appear in b, i.e. a data frame with "c2","c3","c6","c7".
I tried
c <- setdiff(a,b)
but that just returned the data frame a. I also tried
c <- merge(a,b,all.x=TRUE)
but that doesn't give me what I want either. Where am I going wrong?

We can use anti_join
library(dplyr)
# by = "C" names the join column explicitly instead of relying on the default
anti_join(a, b, by = "C")
Or
data.frame(C= setdiff(a[,1], b[,1]))
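As for why the original attempts returned a: base setdiff() appears to treat a data frame as a list of columns, so it compares the whole column C of a against the whole column C of b instead of comparing rows, and merge(a, b, all.x=TRUE) is a left join, which keeps every row of a by design. A minimal base R sketch of the row-wise complement, using only the data from the question (the name only_in_a is made up for illustration):

```r
a <- data.frame(C = c("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"))
b <- data.frame(C = c("c1", "c4", "c5", "c8"))

# Keep the rows of a whose C value does not occur in b.
# drop = FALSE stops R from collapsing the one-column result to a vector.
only_in_a <- a[!a$C %in% b$C, , drop = FALSE]

print(only_in_a)
```

This needs no extra packages, at the cost of naming the comparison column by hand.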

Related

Select only numeric variables of a data frame in R [duplicate]

This question already has answers here:
Why does apply convert logicals in data frames to strings of 5 characters?
(2 answers)
Selecting only numeric columns from a data frame
(12 answers)
Closed 2 years ago.
I know that the question is very easy, but I have a more specific one:
I have a data frame, with 50 variables (numeric and non-numeric) and 5000 observations.
Now what I want to do is create another data frame containing only the numeric variables of the original one.
On this website I found a solution to my problem:
numeric_variables<-unlist(lapply(original_data,is.numeric))
X<-original_data[numeric_variables]
But I was wondering: why doesn't it work if I try it like this instead? What's wrong?
numeric_variables2<-apply(original_data,2,is.numeric)
x<-original_data[numeric_variables2]
Try this:
names_num <- names(which(sapply(df, is.numeric)))
# drop = FALSE keeps a data frame even when only one column is numeric
df_num <- df[, names_num, drop = FALSE]
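On the "why" part of the question: apply() works on matrices, so it first converts the data frame with as.matrix(). A matrix holds a single type, so as soon as one column is non-numeric every column becomes character and is.numeric() returns FALSE for all of them. lapply(), sapply() and vapply() visit the columns one at a time with their original types. A small sketch with a made-up three-column data frame (the data is hypothetical, not from the question):

```r
# Hypothetical stand-in for original_data: two numeric columns, one character
original_data <- data.frame(x = c(1.5, 2, 3),
                            y = c("a", "b", "c"),
                            z = 1:3)

# apply() coerces the data frame to a character matrix first,
# so no column is reported as numeric
via_apply <- apply(original_data, 2, is.numeric)

# vapply() tests each column with its real type
via_vapply <- vapply(original_data, is.numeric, logical(1))

X <- original_data[via_vapply]

print(via_apply)
print(via_vapply)
print(names(X))
```

vapply() is used here rather than sapply() only because it guarantees a logical vector; either works for this purpose.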

Difference between two lists to create a dataset [duplicate]

This question already has answers here:
Find complement of a data frame (anti - join)
(7 answers)
Closed 5 years ago.
I have a dataset, loaded with mushrooms <- read.csv("mushrooms.csv"), and I already have a mushrooms.training_set which is 1/3 of the whole dataset. For both variables, typeof() returns list.
Now I want to select the rows in the original dataset mushrooms that are not in mushrooms.training_set. How would I do this? I have tried the following:
mushrooms[c(!mushrooms.training_set),] but this returns something in the order of 64K rows.
mushrooms[!mushrooms.training_set,]
mushrooms[!duplicated(mushrooms.training_set)]
Can anyone help me out?
From where you are in the question, you can use dplyr::setdiff:
library(dplyr)
mushrooms.test = setdiff(mushrooms, mushrooms.training_set)
But most of the time it's easier to create the test set at the same time as the training set. There are lots of examples at How to split data into training and test sets?
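Creating both sets at once could look roughly like this in base R. The data frame below is a made-up stand-in for the CSV, and the names train_idx and mushrooms.test_set are invented for the sketch:

```r
set.seed(42)  # only so the sketch is reproducible

# Hypothetical stand-in for read.csv("mushrooms.csv")
mushrooms <- data.frame(id = 1:9,
                        cap = rep(c("x", "y", "z"), times = 3))

# Draw the row numbers for the training set (1/3 of the rows)
n_train   <- floor(nrow(mushrooms) / 3)
train_idx <- sample(nrow(mushrooms), n_train)

# Positive index selects the training rows, negative index selects the rest
mushrooms.training_set <- mushrooms[train_idx, ]
mushrooms.test_set     <- mushrooms[-train_idx, ]

print(nrow(mushrooms.training_set))
print(nrow(mushrooms.test_set))
```

Because the split is done on row numbers, it does not depend on the rows being unique, which a set-difference on row contents does.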

Extra Observations When Merging Datasets [duplicate]

This question already has an answer here:
Why does merge result in more rows than original data?
(1 answer)
Closed 3 years ago.
I am trying to merge two datasets: Dataset A (407 observations) and Dataset B (1462 observations). I merged them with:
C <- merge(A, B, by="ID", all.x=TRUE)
It creates 416 observations in Dataset C.
Is there a reason why?
See Why does the result from merge have more rows than original file?
It looks like there were multiple matches in the ID column. Since you didn't specify what your goal is, I recommend going through the full documentation:
https://www.rdocumentation.org/packages/base/versions/3.4.1/topics/merge
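To make the "multiple matches" point concrete, here is a small sketch with made-up data (the columns x and y and all values are hypothetical). With all.x=TRUE, each row of A is repeated once for every row of B that shares its ID, so a duplicated ID in B adds rows:

```r
A <- data.frame(ID = c(1, 2, 3), x = c("a", "b", "c"))
B <- data.frame(ID = c(1, 2, 2, 4), y = c("p", "q", "r", "s"))  # ID 2 appears twice

C <- merge(A, B, by = "ID", all.x = TRUE)

# A has 3 rows, but ID 2 matches two rows of B, so C has 4
print(nrow(C))

# IDs that occur more than once in B are the ones that can multiply rows
dup_ids <- unique(B$ID[duplicated(B$ID)])
print(dup_ids)
```

Running the same duplicated() check on the real Dataset B should show which IDs account for the 9 extra rows.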

trying to isolate two columns in a data frame in r [duplicate]

This question already has answers here:
Extracting specific columns from a data frame
(10 answers)
Closed 6 years ago.
Does anybody know of a way to create a new data frame that contains only specific columns from a master data frame that has many columns? I have a master data frame and I'm trying to run various tests (regression, ANOVA, etc.) on specific columns in it. Any suggestions would be greatly appreciated.
If you want to choose columns 3, 12 and 15 from the old DF:
newDF <- oldDF[,c(3,12,15)]
If you want to remove columns 3, 12 and 15 from the old DF:
newDF <- oldDF[,-c(3,12,15)]
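Selecting by column name instead of position tends to survive changes in column order. A sketch with a made-up master data frame (all column names and values here are hypothetical):

```r
# Hypothetical master data frame
oldDF <- data.frame(id     = 1:4,
                    height = c(1.2, 1.5, 1.7, 1.9),
                    weight = c(10, 14, 15, 20),
                    note   = c("a", "b", "c", "d"))

# Keep columns by name
newDF <- oldDF[, c("height", "weight")]

# Drop columns by name: keep every name that is not in the drop list
droppedDF <- oldDF[, setdiff(names(oldDF), c("id", "note"))]

# For a regression, the formula can pick the columns directly,
# so a separate subset is not strictly needed
fit <- lm(weight ~ height, data = oldDF)

print(names(newDF))
print(names(droppedDF))
```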

Reorganizing the data frame [duplicate]

This question already has answers here:
Moving columns within a data.frame() without retyping
(17 answers)
Closed 9 years ago.
I'd like to reorganize my data frame. I just want to move the last column into first place and leave the rest in the same order. I used the subset function to do it. It works, but it would be painful if I had 100 columns or so.
Is there an easier way to do it?
tbl_comp <- subset(tbl_comp, select=c("Description","Meve_mean","Mmor_mean", "Mtot_mean", "tot_meanMe", "tot_meanMm", "tot_sdMe", "tot_sdMm", "Wteve_mean", "Wtmor_mean", "Wttot_mean", "tot_meanwte", "tot_meanwtm", "tot_sdwte", "tot_sdwtm"))
Try this:
tbl_comp <- subset(tbl_comp, select=c(Description, Meve_mean:tot_sdwtm))
Alternatively,
tbl_comp <- cbind(tbl_comp[ncol(tbl_comp)], tbl_comp[-ncol(tbl_comp)])
will do the trick.
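The same move can also be written as a single column-index reorder, which needs no column names at all. A sketch on a small made-up version of tbl_comp (three columns only, values invented):

```r
# Hypothetical small version of tbl_comp, with Description as the last column
tbl_comp <- data.frame(Meve_mean   = c(1, 2),
                       Mmor_mean   = c(3, 4),
                       Description = c("a", "b"))

last <- ncol(tbl_comp)

# New order: the last column first, then columns 1 .. last - 1 unchanged
tbl_comp <- tbl_comp[c(last, seq_len(last - 1))]

print(names(tbl_comp))
```

seq_len() is used instead of 1:(last - 1) so the line still behaves sensibly if the data frame has only one column.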
