Ordering data frame in R

I have the following data frame structure:
Animal Food
1 cat fish, milk, shrimp
2 dog steak, poo
3 fish seaweed, shrimp, krill, insects
I would like to reorganize it so that the rows are in descending order of the number of items in the "Food" column:
Animal Food
1 fish seaweed, shrimp, krill, insects
2 cat fish, milk, shrimp
3 dog steak, poo
Is there an R function that can help me with that?
Thanks

You can use count.fields to figure out how many items there are in each "food" row and order by that.
count.fields(textConnection(mydf$Food), ",")
# [1] 3 2 4
Assuming your data.frame is called "mydf":
mydf[order(count.fields(textConnection(mydf$Food), ","), decreasing=TRUE),]
# Animal Food
# 3 fish seaweed, shrimp, krill, insects
# 1 cat fish, milk, shrimp
# 2 dog steak, poo

Make a new variable and sort by that (edit: thanks to Ananda and alexis):
# count the comma-separated items in each row
df$nFood <- sapply(strsplit(as.character(df$Food), ","), length)
# sort by that count, largest first
df[order(df$nFood, decreasing = TRUE), ]

You can order the frame according to the results of your counting function:
animals = data.frame( rbind(c("cat","fish, milk, shrimp"),
c("dog","steak, poo"),
c("fish","seaweed, shrimp, krill, insects")))
colnames(animals) = c("Animal","Food")
animals[order(sapply(animals$Food, function(x) { length(strsplit(as.character(x),split=",")[[1]]) }), decreasing=TRUE), ]
I put in the as.character because the column may default to a factor. You probably don't need it (dropping it is quicker) if the column is already character; alternatively, you can use stringsAsFactors=FALSE when creating the data frame. decreasing=TRUE gives the descending order asked for.
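The counting step in the answers above can also be written with lengths(), which returns the length of each list element directly. This is only a minimal sketch on the question's data; the names n_items and sorted are made up for illustration.

```r
# Rebuild the example data with character (not factor) columns.
animals <- data.frame(
  Animal = c("cat", "dog", "fish"),
  Food = c("fish, milk, shrimp", "steak, poo", "seaweed, shrimp, krill, insects"),
  stringsAsFactors = FALSE
)

# Split each Food string on commas and count the pieces per row.
n_items <- lengths(strsplit(animals$Food, ",", fixed = TRUE))

# Reorder the rows by that count, largest first.
sorted <- animals[order(n_items, decreasing = TRUE), ]
sorted
```

The original row names are kept, so the rows print as 3, 1, 2; use rownames(sorted) <- NULL if you want them renumbered.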

Related

How can I rename multiple string in same column with another name in R

The following names are in a column. I want to retain just five distinct names and replace the rest with "Other". How do I go about that?
df <- data.frame(names = c('Marvel Comics','Dark Horse Comics','DC Comics','NBC - Heroes','Wildstorm',
'Image Comics',NA,'Icon Comics',
'SyFy','Hanna-Barbera','George Lucas','Team Epic TV','South Park',
'HarperCollins','ABC Studios','Universal Studios','Star Trek','IDW Publishing',
'Shueisha','Sony Pictures','J. K. Rowling','Titan Books','Rebellion','Microsoft',
'J. R. R. Tolkien'))
If I am understanding you correctly, use %in% and ifelse. Here, I chose the first five names as an example. I also created it in a new column, but you could just overwrite the column as well or create a vector:
df <- data.frame(names = c('Marvel Comics','Dark Horse Comics','DC Comics','NBC - Heroes','Wildstorm',
'Image Comics',NA,'Icon Comics',
'SyFy','Hanna-Barbera','George Lucas','Team Epic TV','South Park',
'HarperCollins','ABC Studios','Universal Studios','Star Trek','IDW Publishing',
'Shueisha','Sony Pictures','J. K. Rowling','Titan Books','Rebellion','Microsoft',
'J. R. R. Tolkien'))
fivenamez <- c('Marvel Comics','Dark Horse Comics','DC Comics','NBC - Heroes','Wildstorm')
df$names_transformed <- ifelse(df$names %in% fivenamez, df$names, "Other")
# names names_transformed
# 1 Marvel Comics Marvel Comics
# 2 Dark Horse Comics Dark Horse Comics
# 3 DC Comics DC Comics
# 4 NBC - Heroes NBC - Heroes
# 5 Wildstorm Wildstorm
# 6 Image Comics Other
# 7 <NA> Other
# 8 Icon Comics Other
# 9 SyFy Other
If you want to keep NA values as NA, just use df$names_transformed <- ifelse(df$names %in% fivenamez | is.na(df$names), df$names, "Other")
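To make the difference between the two ifelse() variants concrete, here is a small sketch on a toy vector (names_in and keep are hypothetical names, not from the question). The as.character() call is a precaution: on R versions before 4.0, data.frame() converted strings to factors by default, and ifelse() would then return the integer codes instead of the labels.

```r
# Toy input with one missing value.
names_in <- c("Marvel Comics", "SyFy", NA, "DC Comics")
keep <- c("Marvel Comics", "DC Comics")

# Guard against factor input (harmless if already character).
x <- as.character(names_in)

# Variant 1: NA is not in `keep`, so it becomes "Other".
na_as_other <- ifelse(x %in% keep, x, "Other")

# Variant 2: NA passes the test and is copied through unchanged.
na_kept <- ifelse(x %in% keep | is.na(x), x, "Other")
```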
You can also use something like case_when from dplyr. The following code keeps Marvel Comics, Dark Horse Comics, DC Comics, J. K. Rowling and George Lucas the same and changes all others to "Other". It is functionally the same as jpsmith's answer, but (in my humble opinion) offers a little more flexibility, because you can change multiple things a bit more easily or give different comics the same name should you choose to do so.
library(dplyr)
df = df %>%
mutate(new_names = case_when(names == 'Marvel Comics' ~ 'Marvel Comics',
names == 'Dark Horse Comics' ~ 'Dark Horse Comics',
names == 'DC Comics' ~ 'DC Comics',
names == 'George Lucas' ~ 'George Lucas',
names == 'J. K. Rowling' ~ 'J. K. Rowling',
TRUE ~ "Other"))
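If you would rather not load dplyr, the same "map some names, send everything else to Other" idea can be sketched in base R with a named lookup vector. This swaps in a different technique from the answer above; the vector names and the "Lucasfilm" label are invented purely to show a rename.

```r
# Toy input; NA and unknown names should both end up as "Other".
names_in <- c("Marvel Comics", "DC Comics", "SyFy", NA, "George Lucas")

# Old name on the left, new name on the right ("Lucasfilm" is a made-up label).
lookup <- c(
  "Marvel Comics" = "Marvel Comics",
  "DC Comics"     = "DC Comics",
  "George Lucas"  = "Lucasfilm"
)

# Indexing by name returns NA for anything not in the lookup.
recoded <- unname(lookup[names_in])
recoded[is.na(recoded)] <- "Other"
```

As with the case_when() version, missing values fall through to "Other" here.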

match products in a list in R

I have to classify a list of products like these:
product_list<-data.frame(product=c('banana from ecuador 1 unit', 'argentinian meat (1 kg) cow','chicken breast','noodles','salad','chicken salad with egg'))
Based on the words included in each element of this vector:
product_to_match<-c('cow meat','deer meat','cow milk','chicken breast','chicken egg salad','anana')
I need every word of an entry in product_to_match to appear in a product for that product to be assigned to it.
I am not sure of the best way to do this. I want to classify each product in a new column, to end up with something like this:
product_list<-data.frame(product=c('banana from ecuador 1 unit', 'argentinian meat (1 kg) cow','chicken breast','noodles','salad','chicken salad with egg'),class=c(NA,'cow meat','chicken breast',NA,NA,'chicken egg salad'))
Notice that 'anana' did not match 'banana': the characters are contained in the string, but not as a whole word. I am not sure how to do this.
Thank you.
Perhaps this could help
q <- outer(
strsplit(product_to_match, "\\s+"),
strsplit(product_list$product, "\\s+"),
FUN = Vectorize(function(x, y) all(x %in% y))
)
product_list$class <- product_to_match[replace(colSums(q * row(q)), colSums(q) == 0, NA)]
such that
> product_list
product class
1 banana from ecuador 1 unit <NA>
2 argentinian meat (1 kg) cow cow meat
3 chicken breast chicken breast
4 noodles <NA>
5 salad <NA>
6 chicken salad with egg chicken egg salad
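The same whole-word test can be spelled out more explicitly with vapply(), which may be easier to follow than the outer() matrix. This is a sketch rather than a drop-in replacement: classify() is a hypothetical helper, and it takes the first matching class when several match, whereas colSums(q * row(q)) above would add the matching indices together in that case.

```r
products <- c("banana from ecuador 1 unit", "argentinian meat (1 kg) cow",
              "chicken breast", "noodles", "salad", "chicken salad with egg")
classes <- c("cow meat", "deer meat", "cow milk", "chicken breast",
             "chicken egg salad", "anana")

# Split both sides into whole words, so "anana" cannot match inside "banana".
product_words <- strsplit(products, "\\s+")
class_words <- strsplit(classes, "\\s+")

# Hypothetical helper: first class whose words all occur in the product, else NA.
classify <- function(words) {
  hit <- vapply(class_words, function(cw) all(cw %in% words), logical(1))
  if (any(hit)) classes[which(hit)[1]] else NA_character_
}

assigned <- vapply(product_words, classify, character(1))
```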
A fuzzy join based on string distance could also find some matches:
library(fuzzyjoin)
library(tibble) # for tibble()
stringdist_left_join(product_list, tibble(product = product_to_match),
method = 'soundex')

Convert list of data.frames to a single data.frame maintaining structure [duplicate]

I'm trying to re-format a list containing multiple dataframes into one data frame. I've read around and can't find the specific syntax I need to achieve this.
I have a list:
list1 = list(df1 = data.frame(bread = c("sourdough","baguette","boule","multigrain"), cheese = c("parmigiano","cheddar","mozzarella","stilton")),
df2 = data.frame(bread = c("toast","brioche","focaccia","whole wheat"), cheese = c("gorgonzola","camembert","gouda","feta")))
and I require the dataframe to be stacked vertically with an additional column representing the list element name from which they came, as in the following example:
df = data.frame(breads = c("sourdough","baguette","boule","multigrain","toast","brioche","focaccia","whole wheat"),
cheese = c("parmigiano","cheddar","mozzarella","stilton","gorgonzola","camembert","gouda","feta"),
factor = rep(c("df1","df2"),each = 4))
Very simple, but can't get my head around it.
You can use:
do.call(rbind,list1)
bread cheese
df1.1 sourdough parmigiano
df1.2 baguette cheddar
df1.3 boule mozzarella
df1.4 multigrain stilton
df2.1 toast gorgonzola
df2.2 brioche camembert
df2.3 focaccia gouda
df2.4 whole wheat feta
Edit: if you want an explicit "From" column:
new_df <- do.call(rbind,list1)
new_df$From <- sub("\\..*$","",rownames(new_df))
bread cheese From
df1.1 sourdough parmigiano df1
df1.2 baguette cheddar df1
df1.3 boule mozzarella df1
df1.4 multigrain stilton df1
df2.1 toast gorgonzola df2
df2.2 brioche camembert df2
df2.3 focaccia gouda df2
df2.4 whole wheat feta df2
You can try this:
#Bind
DF <- do.call(rbind,list1)
#Create factor
DF$factor <- rownames(DF)
rownames(DF)<-NULL
DF$factor <- gsub("\\..*","",DF$factor)
DF
bread cheese factor
1 sourdough parmigiano df1
2 baguette cheddar df1
3 boule mozzarella df1
4 multigrain stilton df1
5 toast gorgonzola df2
6 brioche camembert df2
7 focaccia gouda df2
8 whole wheat feta df2
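Both answers above recover the source name by cutting the row names at the first dot, which would truncate a list name that itself contains a dot (for example "my.df"). A possible way around that, sketched here on a shortened version of the question's list, is to attach the name to each data frame before binding; stacked is just an illustrative name.

```r
list1 <- list(
  df1 = data.frame(bread = c("sourdough", "baguette"),
                   cheese = c("parmigiano", "cheddar")),
  df2 = data.frame(bread = c("toast", "brioche"),
                   cheese = c("gorgonzola", "camembert"))
)

# Add the list element's name as a column to each data frame, then stack them.
stacked <- do.call(
  rbind,
  Map(function(d, nm) cbind(d, factor = nm), list1, names(list1))
)

# Drop the "df1.1"-style row names that rbind() generates.
rownames(stacked) <- NULL
stacked
```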

Combining Two Data Frames Horizontally in R

I would like to combine two data frames horizontally in R.
These are my two data frames:
dataframe 1:
veg loc quantity
carrot sak three
pepper lon two
tomato apw five
dataframe 2:
seller quantity veg
Ben eleven eggplant
Nour six potato
Loni four zucchini
Ahmed two broccoli
I want the outcome to be one data frame that looks like this:
veg quantity
carrot three
pepper two
tomato five
eggplant eleven
potato six
zucchini four
broccoli two
The question says "horizontally" but from the sample output it seems that what you meant was "vertically".
Now, assuming the input shown reproducibly in the Note at the end, rbind them like this. No packages are used and no objects are overwritten.
sel <- c("veg", "quantity")
rbind( df1[sel], df2[sel] )
If you like you could replace the first line of code with the following which picks out the common columns giving the same result for sel.
sel <- intersect(names(df1), names(df2))
Note
Lines1 <- "veg loc quantity
carrot sak three
pepper lon two
tomato apw five"
Lines2 <- "seller quantity veg
Ben eleven eggplant
Nour six potato
Loni four zucchini
Ahmed two broccoli"
df1 <- read.table(text = Lines1, header = TRUE, strip.white = TRUE)
df2 <- read.table(text = Lines2, header = TRUE, strip.white = TRUE)
You can do it like this:
library (tidyverse)
df1 <- df1%>%select(veg, quantity)
df2 <- df2%>%select(veg, quantity)
df3 <- rbind(df1, df2)

Group words (from defined list) into themes in R

I am new to Stackoverflow and trying to learn R.
I want to find a set of defined words in a text and return the count of these words in a table, together with the theme I have defined for each word.
Here is my attempt:
text <- c("Green fruits are such as apples, green mangoes and avocados are good for high blood pressure. Vegetables range from greens like lettuce, spinach, Swiss chard, and mustard greens are great for heart disease. When researchers combined findings with several other long-term studies and looked at coronary heart disease and stroke separately, they found a similar protective effect for both. Green mangoes are the best.")
library(qdap)
# Own defined lists
fruit <- c("apples", "green mangoes", "avocados")
veg <- c("lettuce", "spinach", "Swiss chard", "mustard greens")
# Splitting into sentences
stext <- strsplit(text, split="\\.")[[1]]
# Obtain and count occurrences
library(plyr)
fruitres <- laply(fruit, function(x) grep(x, stext))
vegres <- laply(veg, function(x) grep(x, stext))
# Quick check: this does not return 2 results for "green mangoes"
grep("green mangoes", stext)
# Trying with the stringr package
tag_ex <- paste0('(', paste(fruit, collapse = '|'), ')')
tag_ex
library(dplyr)
library(stringr)
themes = sapply(str_extract_all(stext, tag_ex), function(x) paste(x, collapse=','))[[1]]
themes
#Create data table
library(data.table)
data.table(fruit,fruitres)
Using the qdap and stringr packages, I am unable to obtain the solution I desire.
Desired solution for fruits and veg combined in a table
apples fruit 1
green mangoes fruit 2
avocados fruit 1
lettuce veg 1
spinach veg 1
Swiss chard veg 1
mustard greens veg 1
Any help will be appreciated. Thank you
I tried to generalize for N number of vectors
tidyverse and stringr solution
library(tidyverse)
library(stringr)
Create a data.frame of your vectors
data <- c("fruit","veg") # vector names
L <- map(data, ~get(.x))
names(L) <- data
long <- map_df(1:length(L), ~data.frame(category=rep(names(L)[.x]), type=L[[.x]]))
# You may receive warnings about coercing to characters
# category type
# 1 fruit apples
# 2 fruit green mangoes
# 3 fruit avocados
# etc
To count instances of each
long %>%
mutate(count=str_count(tolower(text), tolower(type)))
Output
category type count
1 fruit apples 1
2 fruit green mangoes 2
3 fruit avocados 1
4 veg lettuce 1
# etc
Extra stuff
We can add another vector easily
health <- c("blood", "heart")
data <- c("fruit","veg", "health")
# code as above
Extra output (tail)
6 veg Swiss chard 1
7 veg mustard greens 1
8 health blood 1
9 health heart 2
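For comparison, here is a rough base R sketch of the same counting idea. It also shows why the grep() attempt in the question fell short: grep() reports which sentences match, not how many times a phrase occurs, and it is case-sensitive, so "Green mangoes" is missed unless both sides are lowercased. The text is shortened, and themes and count_fixed() are hypothetical names introduced for the example.

```r
text <- "Green fruits such as apples, green mangoes and avocados. Green mangoes are the best."

# One named list entry per theme; stack() turns it into a long table.
themes <- list(fruit = c("apples", "green mangoes", "avocados"),
               veg = c("lettuce"))
long <- stack(themes)  # columns: values (the word), ind (the theme)

# Hypothetical helper: number of literal occurrences of `pattern` in `x`.
count_fixed <- function(pattern, x) {
  m <- gregexpr(pattern, x, fixed = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

# Lowercase both sides so "Green mangoes" and "green mangoes" both count.
long$count <- vapply(
  tolower(as.character(long$values)),
  function(p) count_fixed(p, tolower(text)),
  integer(1),
  USE.NAMES = FALSE
)
long
```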
