Combining Two Data Frames Horizontally in R

I would like to combine two data frames horizontally in R.
These are my two data frames:
dataframe 1:
veg loc quantity
carrot sak three
pepper lon two
tomato apw five
dataframe 2:
seller quantity veg
Ben eleven eggplant
Nour six potato
Loni four zucchini
Ahmed two broccoli
I want the outcome to be one data frame that looks like this:
veg quantity
carrot three
pepper two
tomato five
eggplant eleven
potato six
zucchini four
broccoli two

The question says "horizontally" but from the sample output it seems that what you meant was "vertically".
Now, assuming the input shown reproducibly in the Note at the end, rbind them like this. No packages are used and no objects are overwritten.
sel <- c("veg", "quantity")
rbind( df1[sel], df2[sel] )
If you like you could replace the first line of code with the following which picks out the common columns giving the same result for sel.
sel <- intersect(names(df1), names(df2))
Note
Lines1 <- "veg loc quantity
carrot sak three
pepper lon two
tomato apw five"
Lines2 <- "seller quantity veg
Ben eleven eggplant
Nour six potato
Loni four zucchini
Ahmed two broccoli"
df1 <- read.table(text = Lines1, header = TRUE, strip.white = TRUE)
df2 <- read.table(text = Lines2, header = TRUE, strip.white = TRUE)
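As a small self-contained illustration of the approach above, the sketch below uses cut-down, two-row stand-ins for the question's data frames (an assumption made to keep the example short). It shows that rbind on data frames lines columns up by name rather than by position, so the different column order in the second data frame does not matter.

```r
# Cut-down versions of the question's data frames (assumed for illustration)
df1 <- data.frame(veg = c("carrot", "pepper"),
                  loc = c("sak", "lon"),
                  quantity = c("three", "two"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(seller = c("Ben", "Nour"),
                  quantity = c("eleven", "six"),
                  veg = c("eggplant", "potato"),
                  stringsAsFactors = FALSE)

# Columns present in both data frames, in the order they appear in df1
sel <- intersect(names(df1), names(df2))

# Stack the rows; df2's columns are deliberately given in a different
# order here, and rbind() still matches them to df1's columns by name
combined <- rbind(df1[sel], df2[c("quantity", "veg")])
combined
```

The result keeps the column order of the first data frame, so selecting with the same sel on both sides, as in the answer, is the tidier habit.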

You can do it like this:
library(tidyverse)
df1 <- df1 %>% select(veg, quantity)
df2 <- df2 %>% select(veg, quantity)
df3 <- rbind(df1, df2)

Related

Convert list of data.frames to a single data.frame maintaining structure [duplicate]

This question already has answers here:
Combine a list of data frames into one data frame by row
I'm trying to re-format a list containing multiple data frames into one data frame. I've read around and can't find the specific syntax I need to achieve this.
I have a list:
list1 = list(df1 = data.frame(bread = c("sourdough","baguette","boule","multigrain"), cheese = c("parmigiano","cheddar","mozzarella","stilton")),
df2 = data.frame(bread = c("toast","brioche","focaccia","whole wheat"), cheese = c("gorgonzola","camembert","gouda","feta")))
and I require the dataframe to be stacked vertically with an additional column representing the list element name from which they came, as in the following example:
df = data.frame(breads = c("sourdough","baguette","boule","multigrain","toast","brioche","focaccia","whole wheat"),
cheese = c("parmigiano","cheddar","mozzarella","stilton","gorgonzola","camembert","gouda","feta"),
factor = rep(c("df1","df2"),each = 4))
Very simple, but can't get my head around it.
You can use:
do.call(rbind,list1)
bread cheese
df1.1 sourdough parmigiano
df1.2 baguette cheddar
df1.3 boule mozzarella
df1.4 multigrain stilton
df2.1 toast gorgonzola
df2.2 brioche camembert
df2.3 focaccia gouda
df2.4 whole wheat feta
Edit: if you want an explicit "From" column:
new_df <- do.call(rbind,list1)
new_df$From <- sub("\\..*$","",rownames(new_df))
bread cheese From
df1.1 sourdough parmigiano df1
df1.2 baguette cheddar df1
df1.3 boule mozzarella df1
df1.4 multigrain stilton df1
df2.1 toast gorgonzola df2
df2.2 brioche camembert df2
df2.3 focaccia gouda df2
df2.4 whole wheat feta df2
You can try this:
#Bind
DF <- do.call(rbind,list1)
#Create factor
DF$factor <- rownames(DF)
rownames(DF)<-NULL
DF$factor <- gsub("\\..*","",DF$factor)
DF
bread cheese factor
1 sourdough parmigiano df1
2 baguette cheddar df1
3 boule mozzarella df1
4 multigrain stilton df1
5 toast gorgonzola df2
6 brioche camembert df2
7 focaccia gouda df2
8 whole wheat feta df2
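Both answers above recover the source name from the row names that rbind generates, which relies on the list names not containing a "." themselves. A minimal base R sketch that avoids this by attaching the name before binding is shown below; the shortened two-row data frames and the helper names tagged and stacked are assumptions for illustration only.

```r
# Shortened version of the question's list (assumed for illustration)
list1 <- list(
  df1 = data.frame(bread = c("sourdough", "baguette"),
                   cheese = c("parmigiano", "cheddar"),
                   stringsAsFactors = FALSE),
  df2 = data.frame(bread = c("toast", "brioche"),
                   cheese = c("gorgonzola", "camembert"),
                   stringsAsFactors = FALSE)
)

# Add a column holding the list element name to each data frame
tagged <- Map(function(d, nm) cbind(d, factor = nm), list1, names(list1))

# Stack the tagged data frames and drop the generated row names
stacked <- do.call(rbind, tagged)
rownames(stacked) <- NULL
stacked
```

If dplyr is already loaded, bind_rows(list1, .id = "factor") should give a similar result in one call.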

Mutate Column with Match Against List Variable Values

I would like to create a new column in my dataframe that corresponds to values in list variables.
My dataframe includes many rows with a 'product names' column. My intention is to create a new column that allows me to sort products into categories.
Sample code -
library(dplyr)
products <- c('Apple', 'orange', 'pear',
'carrot', 'cabbage',
'strawberry', 'blueberry')
df <- data.frame(products)
ls <- list(Fruit = c('Apple', 'orange', 'pear'),
Veg = c('carrot', 'cabbage'),
Berry = c('strawberry', 'blueberry'))
test <- df %>%
mutate(category = products %in% ls)
I hope that illustrates what I'm trying to do. By creating the list, I've basically got a register of products and their categories which could change over time.
Is there a solution to this using a list, or am I over-complicating it and not seeing the wood for the trees?
edit - It might help to let you know that I'm working with 100s of products.
Stack the list and then join with the data frame:
df %>%
left_join(stack(ls), by = c('products' = 'values')) %>%
rename(category = ind)
# products category
#1 Apple Fruit
#2 orange Fruit
#3 pear Fruit
#4 carrot Veg
#5 cabbage Veg
#6 strawberry Berry
#7 blueberry Berry
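The same register-of-categories idea can also be sketched in base R with a named lookup vector, with no join needed. The names categories and lookup, and the extra unmatched product "kiwi", are assumptions added for illustration; the list is renamed from ls only to avoid shadowing the base function ls().

```r
products <- c("Apple", "orange", "pear",
              "carrot", "cabbage",
              "strawberry", "blueberry",
              "kiwi")  # not in any category, added to show the NA case
df <- data.frame(products, stringsAsFactors = FALSE)

categories <- list(Fruit = c("Apple", "orange", "pear"),
                   Veg   = c("carrot", "cabbage"),
                   Berry = c("strawberry", "blueberry"))

# Named vector: names are the products, values are their categories
lookup <- setNames(rep(names(categories), lengths(categories)),
                   unlist(categories, use.names = FALSE))

# Index the lookup by product name; products without a category become NA
df$category <- unname(lookup[df$products])
df
```

Because the register lives in one list, adding or moving a product only means editing categories and rerunning the lookup.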

Count elements in dataframe column, then create separate columns in R

I have been struggling with this for a few days and have not found a solution myself. I hope you can help.
I checked the following already:
Counting the number of elements with the values of x in a vector
Split strings in a matrix column and count the single elements in a new vector
http://tidyr.tidyverse.org/reference/separate.html
Count of Comma separate values in r
I have a dataframe as follows:
df<-list(column=c("apple juice,guava-peach juice,melon apple juice","orange juice,pineapple strawberry lemon juice"))
df<-data.frame(df)
I want to put each element separated by "," into its own column. The number of columns must be based on the maximum number of elements in any row of column:
column1       column2                           column3
apple juice   guava-peach juice                 melon apple juice
orange juice  pineapple strawberry lemon juice  NA
I tried using
library(tidyverse)
library(stringr)
#want to calculate number of columns needed and the sequence
x<-str_count(df$column)
results<-df%>%separate(column,x,",")
Unfortunately I am not getting what I wish to.
Thank you for your help.
Do you mean this?
library(splitstackshape)
library(dplyr)
df %>%
cSplit("column", ",")
Output is:
       column_1                         column_2          column_3
1:  apple juice                guava-peach juice melon apple juice
2: orange juice pineapple strawberry lemon juice              <NA>
Sample data:
df <- structure(list(column = structure(1:2, .Label = c("apple juice,guava-peach juice,melon apple juice",
"orange juice,pineapple strawberry lemon juice"), class = "factor")), .Names = "column", row.names = c(NA,
-2L), class = "data.frame")
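For readers who prefer not to add a package, the splitting step can be sketched in base R: split each string on the comma, pad the shorter rows with NA up to the longest row, and bind the pieces into columns. The names parts, n_cols, padded and wide are assumptions for illustration.

```r
df <- data.frame(
  column = c("apple juice,guava-peach juice,melon apple juice",
             "orange juice,pineapple strawberry lemon juice"),
  stringsAsFactors = FALSE
)

# One character vector of pieces per row
parts <- strsplit(df$column, ",", fixed = TRUE)

# The number of columns is the largest number of pieces in any row
n_cols <- max(lengths(parts))

# Extending a vector's length pads it with NA
padded <- lapply(parts, function(p) {
  length(p) <- n_cols
  p
})

# Bind the padded rows into a matrix, then convert to a data frame
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("column", seq_len(n_cols))
wide
```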

Group words (from defined list) into themes in R

I am new to Stack Overflow and trying to learn R.
I want to find a set of defined words in a text and return the count of these words in a table, along with the associated theme I have defined.
Here is my attempt:
text <- c("Green fruits are such as apples, green mangoes and avocados are good for high blood pressure. Vegetables range from greens like lettuce, spinach, Swiss chard, and mustard greens are great for heart disease. When researchers combined findings with several other long-term studies and looked at coronary heart disease and stroke separately, they found a similar protective effect for both. Green mangoes are the best.")
library(qdap)
# Own defined lists
fruit <- c("apples", "green mangoes", "avocados")
veg <- c("lettuce", "spinach", "Swiss chard", "mustard greens")
# Splitting into sentences
stext <- strsplit(text, split="\\.")[[1]]
# Obtain and count occurrences
library(plyr)
fruitres <- laply(fruit, function(x) grep(x, stext))
vegres <- laply(veg, function(x) grep(x, stext))
# Quick check: not returning 2 results for "green mangoes"
grep("green mangoes", stext)
# Trying with the stringr package
tag_ex <- paste0('(', paste(fruit, collapse = '|'), ')')
tag_ex
library(dplyr)
library(stringr)
themes = sapply(str_extract_all(stext, tag_ex), function(x) paste(x, collapse=','))[[1]]
themes
#Create data table
library(data.table)
data.table(fruit,fruitres)
Using the qdap and stringr packages, I am unable to obtain the solution I want.
Desired solution for fruits and veg combined in a table:
apples          fruit  1
green mangoes   fruit  2
avocados        fruit  1
lettuce         veg    1
spinach         veg    1
Swiss chard     veg    1
mustard greens  veg    1
Any help will be appreciated. Thank you
I tried to generalize this to any number of vectors.
A tidyverse and stringr solution:
library(tidyverse)
library(stringr)
Create a data.frame of your vectors
data <- c("fruit","veg") # vector names
L <- map(data, ~get(.x))
names(L) <- data
long <- map_df(1:length(L), ~data.frame(category=rep(names(L)[.x]), type=L[[.x]]))
# You may receive warnings about coercing to characters
# category type
# 1 fruit apples
# 2 fruit green mangoes
# 3 fruit avocados
# etc
To count instances of each
long %>%
mutate(count=str_count(tolower(text), tolower(type)))
Output
category type count
1 fruit apples 1
2 fruit green mangoes 2
3 fruit avocados 1
4 veg lettuce 1
# etc
Extra stuff
We can add another vector easily
health <- c("blood", "heart")
data <- c("fruit","veg", "health")
# code as above
Extra output (tail)
6 veg Swiss chard 1
7 veg mustard greens 1
8 health blood 1
9 health heart 2
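The counting step can also be sketched in base R, which may help show why the question's grep call found only one "green mangoes": grep is case-sensitive by default and returns the positions of matching elements, not the number of occurrences. The short text, the themes list and the helper count_hits below are assumptions for illustration, not the question's full data.

```r
# Shortened text and word lists (assumed for illustration)
text <- "Green mangoes and apples are fruits. Spinach is a vegetable. I like green mangoes."
themes <- list(fruit = c("apples", "green mangoes"),
               veg   = c("spinach"))

# Long table: one row per word, with its theme in the 'ind' column
long <- stack(themes)

# Count case-insensitive, literal occurrences of one word in the text
count_hits <- function(word, txt) {
  m <- gregexpr(tolower(word), tolower(txt), fixed = TRUE)[[1]]
  sum(m > 0)  # gregexpr returns -1 when there is no match
}

long$count <- vapply(as.character(long$values), count_hits,
                     integer(1), txt = text, USE.NAMES = FALSE)
long
```

Lowercasing both sides is what lets "Green mangoes" at the start of a sentence count alongside "green mangoes".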

Ordering data frame in R

I have the following data frame structure:
Animal Food
1 cat fish, milk, shrimp
2 dog steak, poo
3 fish seaweed, shrimp, krill, insects
I would like to reorganize it so that the rows are in descending order of the number of items in the "Food" column:
Animal Food
1 fish seaweed, shrimp, krill, insects
2 cat fish, milk, shrimp
3 dog steak, poo
Is there an R function that can help me with that?
Thanks
You can use count.fields to figure out how many items there are in each "food" row and order by that.
count.fields(textConnection(mydf$Food), ",")
# [1] 3 2 4
Assuming your data.frame is called "mydf":
mydf[order(count.fields(textConnection(mydf$Food), ","), decreasing=TRUE),]
# Animal Food
# 3 fish seaweed, shrimp, krill, insects
# 1 cat fish, milk, shrimp
# 2 dog steak, poo
Make a new variable and sort by that (edit: thanks to Ananda and alexis):
df$nFood <- sapply(strsplit(as.character(df$Food), ","), length)
df[order(df$nFood, decreasing = TRUE), ]
You can order the frame according to the results of your counting function:
animals = data.frame( rbind(c("cat","fish, milk, shrimp"),
c("dog","steak, poo"),
c("fish","seaweed, shrimp, krill, insects")))
colnames(animals) = c("Animal","Food")
animals[order(sapply(animals$Food, function(x) { length(strsplit(as.character(x),split=",")[[1]]) }), decreasing=TRUE), ]
I put in the as.character because the column defaults to a factor (this is the default in R versions before 4.0.0). You probably don't need it; alternatively, and more quickly, you can use stringsAsFactors=FALSE when creating the data frame.
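Putting the pieces from these answers together, a compact base R sketch of the count-then-order idea might look like the following. It assumes the Food column is stored as character, and the names n_items and sorted are illustrative.

```r
mydf <- data.frame(
  Animal = c("cat", "dog", "fish"),
  Food = c("fish, milk, shrimp",
           "steak, poo",
           "seaweed, shrimp, krill, insects"),
  stringsAsFactors = FALSE
)

# Number of comma-separated items in each row of Food
n_items <- lengths(strsplit(mydf$Food, ",", fixed = TRUE))

# Reorder the rows from most items to fewest
sorted <- mydf[order(n_items, decreasing = TRUE), ]
sorted
```

Keeping the counts in a separate vector rather than a new column leaves the data frame's columns unchanged.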

Resources