Remove rows that exist in another dataframe using two columns - r

Using this:
list_one[!list_one$letters %in% list_two$letters,]
it is possible to remove rows from one dataframe that exist in another.
How can we adapt this command in order to check two columns from both data frames?
For example (pseudocode, not valid R):
list_one[![list_one$letters, list_one$id] %in% [list_two$letters, list_two$letters],]
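One way to get this behaviour in base R is to build a combined key from both columns and apply the same %in% test to the keys. The sketch below uses made-up example data and assumes the separator "|" never occurs in the letters column; if dplyr is available, anti_join(list_one, list_two, by = c("letters", "id")) should give the same rows.

```r
# Hypothetical example data; only the column names come from the question
list_one <- data.frame(letters = c("a", "b", "c", "a"),
                       id      = c(1, 2, 3, 2),
                       stringsAsFactors = FALSE)
list_two <- data.frame(letters = c("a", "c"),
                       id      = c(1, 3),
                       stringsAsFactors = FALSE)

# Build one key per row from both columns
# (assumption: "|" does not appear in the data)
key_one <- paste(list_one$letters, list_one$id, sep = "|")
key_two <- paste(list_two$letters, list_two$id, sep = "|")

# Keep only the rows of list_one whose (letters, id) pair is absent from list_two
result <- list_one[!key_one %in% key_two, ]
print(result)
```

A row is dropped only when both letters and id match a row in list_two, so ("a", 2) survives even though ("a", 1) is removed.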

Related

Performing tidyverse functions in multiple data frames with unique names using a loop in R

In R I have several dataframes with unique names. These names are also stored in a column called names. I need to run a loop like this:
for(i in 1:length(names)) {
}
in which I select each dataframe and do the following:
1. Rename the first column to series.
2. Remove NAs from series:
filter(!is.na(series))
3. Use the following command to make duplicated names unique:
group_by(series) %>%
mutate(series1 = if (n() > 1) {paste0(series, row_number())}
else {paste0(series)}) %>% ungroup(series)
4. Use the gather command to convert columns into rows and some rows into columns.
5. Add a column to the dataframe, with the header label, that repeats the unique dataframe name.
The problem is that my attempt inside the loop, shown below, does not work:
colnames(names[i])[1] <- "series"
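The assignment above fails because names[i] is only a character string, not the dataframe it names. A minimal base R sketch of one possible workaround is to look each dataframe up with get(), modify the copy, and collect the results in a list. The dataframes df_a and df_b and the vector df_names are hypothetical stand-ins, and the dplyr and gather steps from the question would go where the comment indicates.

```r
# Hypothetical data frames and name vector, for illustration only
df_a <- data.frame(x = c("u", NA, "u"), y = 1:3, stringsAsFactors = FALSE)
df_b <- data.frame(x = c("v", "w"), y = 4:5, stringsAsFactors = FALSE)
df_names <- c("df_a", "df_b")

results <- list()
for (nm in df_names) {
  d <- get(nm)                              # fetch the data frame by its name
  colnames(d)[1] <- "series"                # step 1: rename the first column
  d <- d[!is.na(d$series), , drop = FALSE]  # step 2: drop rows where series is NA
  # steps 3 and 4 (the group_by/mutate pipeline and gather) would go here
  d$label <- nm                             # step 5: repeat the data frame name
  results[[nm]] <- d
}

print(results$df_a)
```

Keeping the processed dataframes in a list avoids assign() and makes it easy to combine them afterwards, for example with do.call(rbind, results).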

Is there an R methodology to select the columns from a dataframe that are listed in a separate array

I have a dataframe with over 100 columns. After applying certain conditions, I need a subset of the dataframe containing only the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me in the right direction?
Try this:
library(dplyr)
iris <- iris %>% select(contains(dataframe_with_names$names))
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
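A small sketch of the bracket approach, using made-up data (myData and selection are hypothetical stand-ins for the 100-column dataframe and the two-column array). It assumes the names column may be stored as a factor, so it is converted to character first, and it silently skips any listed name that is not actually a column. Note that contains() in the dplyr answer matches substrings, whereas this selection matches names exactly.

```r
# Hypothetical data: a wide data frame and a lookup table of wanted columns
myData <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
selection <- data.frame(names = c("a", "c"),
                        value = c(0.5, 0.8),
                        stringsAsFactors = TRUE)

# Convert to character so a factor column is not used as integer positions
wanted <- as.character(selection$names)

# Keep only names that really exist in myData (avoids an "undefined columns" error)
wanted <- intersect(wanted, colnames(myData))

# drop = FALSE keeps the result a data frame even if only one column is selected
subset_data <- myData[, wanted, drop = FALSE]
print(subset_data)
```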

R: multiple merge with big data frames

I have two big dataframes: DBa and DBb. All columns of DBb are in DBa.
I want to merge these two dataframes by all of DBb's columns.
I'm trying:
new <- merge(DBa, DBb, by=colnames(DBb))
but it gives me the error:
Elements listed in `by` must be valid column names in x and y
How can I do it?
I don't think you are looking to merge the data frames; you should stack them on top of each other with rbind. merge puts two data frames next to each other, and it only needs one common column (the key), which should be unique or the results will be a mess.
So use row bind (rbind). For data frames, rbind matches columns by name, so both data frames must have the same column names and neither may have more columns than the other.
new_data <- rbind(data1, data2)
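If a merge on all of DBb's columns is really what is wanted, the error message suggests that at least one name in colnames(DBb) is not found in DBa. A quick check with setdiff() can confirm this before merging. The sketch below uses small made-up versions of DBa and DBb and base R's merge; the error text in the question may come from a different merge method, which is an assumption here.

```r
# Hypothetical small stand-ins for the two big data frames
DBa <- data.frame(id  = 1:4,
                  grp = c("x", "y", "x", "y"),
                  val = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
DBb <- data.frame(id  = c(2L, 4L),
                  grp = c("y", "y"),
                  stringsAsFactors = FALSE)

# Names that are in DBb but not in DBa; anything listed here would break `by`
missing_cols <- setdiff(colnames(DBb), colnames(DBa))
print(missing_cols)

# Merge only when every DBb column is present in DBa
if (length(missing_cols) == 0) {
  merged <- merge(DBa, DBb, by = colnames(DBb))
  print(merged)
}
```

Differences in case or stray whitespace in column names are a common reason for a non-empty missing_cols.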

Add a new row to R dataframe - but if does not exist already?

Is there any possibility to add a new row only if it does not already exist in a dataframe? I create a big dataframe (100k records) with different combinations of the variables (randomly selected) and I want to add them to the existing dataframe with a condition: they must all be different (at least one variable must differ).
You can use the following if, for example, the dataframe df1 may or may not exist and you want to append the dataframe df2 to it:
rbind(if(exists("df1")) df1, df2)
But this can often be avoided by simply creating an empty data frame first. For example, if your dataframe contains a single column x of type character, then you could probably do:
df1 <- data.frame(x = character())
rbind(df1, df2)
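For the condition in the question itself (every row must differ in at least one variable), one possible base R approach is to append all candidate rows and then drop exact duplicates with unique(). The data below is made up for illustration; existing and candidates are hypothetical names.

```r
# Hypothetical existing data and randomly generated candidate rows
existing   <- data.frame(a = c(1, 2),
                         b = c("x", "y"),
                         stringsAsFactors = FALSE)
candidates <- data.frame(a = c(2, 3, 3),
                         b = c("y", "z", "z"),
                         stringsAsFactors = FALSE)

# Stack both, then keep only the first occurrence of each distinct row
combined <- unique(rbind(existing, candidates))
print(combined)
```

Because unique() keeps the first occurrence, rows already in existing are preserved and only new, distinct candidates are added.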

How to use the names in a list to access a dataframe column

I have two lists and a dataframe. The columns in the dataframe have the same names as the entries in the lists. The dataframe also has other columns besides the ones specified in the lists.
category.list <- c('Reserve_Book','choicepriv_and_points','Latency_freeze_load','signin','gift_card','mystery_gift','credit_card','call_support','account')
crosstab.list <- c('browser','OS','Device','comment_cat','comment_focus','recommend')
Now, how do I iterate through the elements in the lists and use them to access the dataframe columns?
Below is the code I am trying, but I get errors when accessing the dataframe column via the iterator variable.
for (i in category.list){
for (j in crosstab.list){
ftable(dataframe[j]~dataframe[i])
}
}
Specific to your question: your dataframe references need to specify both the rows and the columns. dataframe[j] returns a one-column data frame, whereas dataframe[, j] returns the column itself as a vector.
ftable(dataframe[j]~dataframe[i])
needs to be
ftable(dataframe[,j]~dataframe[,i])
Note the addition of the commas.
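A runnable variant of the loop, as a sketch on made-up data: it passes the two selected columns to ftable() as a small data frame instead of using a formula, stores each table in a list, and prints it explicitly, since values are not auto-printed inside a for loop. The column contents and the shortened lists are hypothetical.

```r
# Hypothetical data frame with a few of the columns named in the question
dataframe <- data.frame(
  signin  = c("yes", "no", "yes", "no"),
  account = c("yes", "yes", "no", "no"),
  browser = c("chrome", "firefox", "chrome", "chrome"),
  OS      = c("mac", "win", "win", "mac"),
  stringsAsFactors = FALSE
)
category.list <- c("signin", "account")
crosstab.list <- c("browser", "OS")

tables <- list()
for (i in category.list) {
  for (j in crosstab.list) {
    # Cross-tabulate column i (rows) against column j (columns)
    tab <- ftable(dataframe[, c(i, j)])
    tables[[paste(i, j, sep = "_by_")]] <- tab
    print(tab)  # explicit print is needed inside a loop
  }
}
```

Each table can later be retrieved by name, for example tables[["signin_by_browser"]].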
