Count elements in dataframe column, then create separate columns in R - r

I have been struggling for a few days to find a solution myself and hope you can help.
I checked the following already:
Counting the number of elements with the values of x in a vector
Split strings in a matrix column and count the single elements in a new vector
http://tidyr.tidyverse.org/reference/separate.html
Count of Comma separate values in r
I have a dataframe as follows:
df<-list(column=c("apple juice,guava-peach juice,melon apple juice","orange juice,pineapple strawberry lemon juice"))
df<-data.frame(df)
I want to put each element separated by "," into its own column. The number of columns must be based on the maximum number of elements in any row of column:
column1       column2                           column3
apple juice   guava-peach juice                 melon apple juice
orange juice  pineapple strawberry lemon juice  NA
I tried using
library(tidyverse)
library(stringr)
#want to calculate number of columns needed and the sequence
x<-str_count(df$column)
results<-df%>%separate(column,x,",")
Unfortunately, I am not getting the result I want.
Thank you for your help.

Do you mean this?
library(splitstackshape)
library(dplyr)
df %>%
  cSplit("column", ",")
Output is:
       column_1                          column_2           column_3
1:  apple juice                 guava-peach juice  melon apple juice
2: orange juice  pineapple strawberry lemon juice               <NA>
Sample data:
df <- structure(list(column = structure(1:2, .Label = c("apple juice,guava-peach juice,melon apple juice",
"orange juice,pineapple strawberry lemon juice"), class = "factor")), .Names = "column", row.names = c(NA,
-2L), class = "data.frame")
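If you would rather not add a package, here is a minimal base-R sketch of the same idea (not the method used in the answer above): split each row on the comma, take the longest row as the number of columns, and pad the shorter rows with NA. The names `parts`, `n_cols`, `padded` and `result` are just illustrative.

```r
# Sample data from the question, kept as character rather than factor
df <- data.frame(
  column = c("apple juice,guava-peach juice,melon apple juice",
             "orange juice,pineapple strawberry lemon juice"),
  stringsAsFactors = FALSE
)

# Split every row on the literal comma
parts <- strsplit(as.character(df$column), ",", fixed = TRUE)

# The number of columns is the largest number of pieces in any row
n_cols <- max(lengths(parts))

# Pad shorter rows with NA so that every row has n_cols pieces
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n_cols - length(p))))

# Bind the rows into a data frame and name the columns column1, column2, ...
result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(result) <- paste0("column", seq_len(n_cols))
result
```

Note that the attempt in the question calls str_count() without a pattern, which counts characters rather than comma-separated pieces; counting pieces is what `lengths(parts)` does here.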

Related

How to present the frequencies of each of the choices of multiple-choices questions that are presented in different ways?

I have this example dataframe (my real dataframe is larger and this one includes all the cases I am facing with my big dataframe)
df = data.frame(ingridents = c('bread', 'BREAD', 'Bread orange juice',
                               'orange juice', 'Apple', 'apple bread, orange juice',
                               'bread Apple ORANGE JUICE'),
                Frequency = c(10,3,5,4,2,3,1) )
In this df data frame we can see that:
the ingredient bread is written as bread, BREAD and Bread (alone or with other answers). The same goes for the ingredient apple.
the ingredient orange juice is written in multiple forms; in one group of responses there is a comma and in another there is none. Also, I want R to recognise the whole expression orange juice, not orange alone and juice alone.
The objective is to create another data frame with each of these 3 ingredients and their frequencies, as follows:
  ingridents    Frequency
1 BREAD                22
2 ORANGE JUICE         13
3 APPLE                 6
How can I program this in R so that it recognises each response and its total frequency (whether it is written in capital or small letters, and whether it is a two-word expression such as orange juice)?
Here is one way to do it. First, we do some string preprocessing (convert all strings to upper case, remove commas, and join ORANGE JUICE into a single token with an underscore), then split on spaces and sum the frequencies:
library(tidyr)
library(dplyr)
library(stringr)
df |>
  mutate(ingridents = ingridents |>
           toupper() |>
           str_remove_all(",") |>
           str_replace_all("ORANGE JUICE", "ORANGE_JUICE")) |>
  separate_rows(ingridents, sep = " ") |>
  count(ingridents, wt = Frequency) |>
  arrange(desc(n)) |>
  mutate(ingridents = str_replace_all(ingridents, "ORANGE_JUICE", "ORANGE JUICE"))
Output:
# A tibble: 3 × 2
  ingridents       n
  <chr>        <dbl>
1 BREAD           22
2 ORANGE JUICE    13
3 APPLE            6
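If the list of ingredients is known in advance, a possible base-R variant is to search for each term directly instead of splitting on spaces. This is only a sketch under that assumption: `vocab` is a hypothetical, hand-written list of terms, and the matching is plain substring matching, so a response such as "pineapple" would also count towards APPLE.

```r
df <- data.frame(ingridents = c('bread', 'BREAD', 'Bread orange juice',
                                'orange juice', 'Apple', 'apple bread, orange juice',
                                'bread Apple ORANGE JUICE'),
                 Frequency = c(10, 3, 5, 4, 2, 3, 1),
                 stringsAsFactors = FALSE)

# Hypothetical list of known ingredients, written in upper case
vocab <- c("BREAD", "ORANGE JUICE", "APPLE")

# Upper-case the responses once so that matching ignores case
responses <- toupper(df$ingridents)

# For each term, add up the Frequency of every response containing it
totals <- sapply(vocab, function(term) {
  sum(df$Frequency[grepl(term, responses, fixed = TRUE)])
})

result <- data.frame(ingridents = names(totals),
                     Frequency = unname(totals),
                     stringsAsFactors = FALSE)
result
```

Because the multi-word term is matched as a whole, no underscore placeholder is needed, at the cost of having to list the terms yourself.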

Combining Two Data Frames Horizontally in R

I would like to combine two data frames horizontally in R.
These are my two data frames:
dataframe 1:
veg     loc  quantity
carrot  sak  three
pepper  lon  two
tomato  apw  five
dataframe 2:
seller  quantity  veg
Ben     eleven    eggplant
Nour    six       potato
Loni    four      zucchini
Ahmed   two       broccoli
I want the outcome to be one data frame that looks like this:
veg       quantity
carrot    three
pepper    two
tomato    five
eggplant  eleven
potato    six
zucchini  four
broccoli  two
The question says "horizontally" but from the sample output it seems that what you meant was "vertically".
Now, assuming the input shown reproducibly in the Note at the end, rbind them like this. No packages are used and no objects are overwritten.
sel <- c("veg", "quantity")
rbind( df1[sel], df2[sel] )
If you like you could replace the first line of code with the following which picks out the common columns giving the same result for sel.
sel <- intersect(names(df1), names(df2))
Note
Lines1 <- "veg loc quantity
carrot sak three
pepper lon two
tomato apw five"
Lines2 <- "seller quantity veg
Ben eleven eggplant
Nour six potato
Loni four zucchini
Ahmed two broccoli"
df1 <- read.table(text = Lines1, header = TRUE, strip.white = TRUE)
df2 <- read.table(text = Lines2, header = TRUE, strip.white = TRUE)
You can do it like this:
library(tidyverse)
df1 <- df1 %>% select(veg, quantity)
df2 <- df2 %>% select(veg, quantity)
df3 <- rbind(df1, df2)
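As a small self-contained check of the rbind approach used in the answers above, the sketch below rebuilds the two data frames by hand (with stringsAsFactors = FALSE so the columns stay character in any R version) and also illustrates that rbind() on data frames lines columns up by name, so the column order of the second data frame does not matter. The object names `combined` and `same` are only illustrative.

```r
df1 <- data.frame(veg = c("carrot", "pepper", "tomato"),
                  loc = c("sak", "lon", "apw"),
                  quantity = c("three", "two", "five"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(seller = c("Ben", "Nour", "Loni", "Ahmed"),
                  quantity = c("eleven", "six", "four", "two"),
                  veg = c("eggplant", "potato", "zucchini", "broccoli"),
                  stringsAsFactors = FALSE)

# Columns present in both data frames, in the order they appear in df1
sel <- intersect(names(df1), names(df2))

# Stack the common columns vertically
combined <- rbind(df1[sel], df2[sel])

# Columns are matched by name, so a different column order in the
# second data frame should give the same values
same <- rbind(df1[c("veg", "quantity")], df2[c("quantity", "veg")])

combined
```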

R: Conditional replacement using two data frames

I have a data frame df like this:
df <- data.frame(fruits = c("apple", "orange", "pineapple", "banana", "grape"))
df_rep <- data.frame(eng = c("apple", "orange", "grape"),
esp = c("manzana", "naranja", "uva"))
>df
fruits
apple
orange
pineapple
banana
grape
>df_rep
eng esp
apple manzana
orange naranja
grape uva
I want to replace the values in the fruits column of df by looking them up in df_rep. If a value in the fruits column of df appears in the eng column of df_rep, I want to replace it with the corresponding value in the esp column of df_rep. So the result should look like this:
>df
fruits
manzana
naranja
pineapple
banana
uva
Note: I don't want to use ifelse, because my real data has more than 100 replacement pairs (the example here is simplified for easy understanding). I also don't want a for loop, because my data frame contains more than 40,000 rows. I am looking for a simple, single-step solution.
Thank you very much for your help!
We can use the merge function (to simulate a SQL left join) and then the ifelse function to replace the fruits with non-NA esp values. Note that merge sorts the result by the key by default, so the row order differs from that of df:
df2 <- merge(df, df_rep, by.x = 'fruits', by.y = 'eng', all.x = TRUE)
df2$fruits <- ifelse(is.na(df2$esp), df2$fruits, df2$esp)
#      fruits     esp
# 1   manzana manzana
# 2    banana    <NA>
# 3       uva     uva
# 4   naranja naranja
# 5 pineapple    <NA>
Data
In R versions before 4.0.0 it's important to set stringsAsFactors = FALSE when creating the data (from R 4.0.0 onward this is the default):
df <- data.frame(fruits = c("apple", "orange", "pineapple", "banana", "grape"),
                 stringsAsFactors = FALSE)
df_rep <- data.frame(eng = c("apple", "orange", "grape"),
                     esp = c("manzana", "naranja", "uva"),
                     stringsAsFactors = FALSE)
Another option is coalesce from dplyr to replace the NAs that result from match with the respective values from df$fruits.
library(dplyr)
df$fruits2 <- coalesce(df_rep$esp[match(df$fruits, df_rep$eng)], df$fruits)
df
#     fruits   fruits2
#1     apple   manzana
#2    orange   naranja
#3 pineapple pineapple
#4    banana    banana
#5     grape       uva
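A base-R variant of the match idea, sketched here without any packages and under the assumption that the columns are character (not factor): look up each fruit's position in df_rep$eng, then overwrite only the rows that were found. It keeps the original row order and needs neither ifelse nor a loop. The helper names `idx` and `hit` are just illustrative.

```r
df <- data.frame(fruits = c("apple", "orange", "pineapple", "banana", "grape"),
                 stringsAsFactors = FALSE)
df_rep <- data.frame(eng = c("apple", "orange", "grape"),
                     esp = c("manzana", "naranja", "uva"),
                     stringsAsFactors = FALSE)

# Position of each fruit in the lookup table, NA where there is no translation
idx <- match(df$fruits, df_rep$eng)

# Rows that have a translation
hit <- !is.na(idx)

# Replace only those rows; the others keep their original value
df$fruits[hit] <- df_rep$esp[idx[hit]]
df
```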

R: factor to character does not work

I just do not understand why my column remains a factor even though I try to convert it to character.
Here I have a data frame list_attribute$affrete.
affrete
<chr>
Fruits
Apple
Grape Fruits
Lemon
Peach
Banana
Orange
Strawberry
Apple
And I applied a function to replace some values in list_attribute$affrete with other values using another data frame, renaming, which has two columns (Name and Rename).
affrete <- plyr::mapvalues(x = unlist(list_attribute$affrete, use.names = F),
                           from = as.character(renaming$Name),
                           to = as.character(renaming$Rename))
affrete <- as.character(affrete)
list_attribute$affrete <- data.frame(affrete)
The data frame renaming looks like this:
Name      Rename
<fctr>    <fctr>
Apple     Manzana
Orange    Naranja
Lemon     Limon
Grape     Uva
Peach     Melocoton
Pinapple  Anana
And here is list_attribute$affrete after applying these processes above.
affrete
<fctr>
Manzana
Grape Fruits
Limon
Melocoton
Banana
Naranja
Strawberry
Manzana
Why is this column still a factor? I tried the methods discussed here but none of them works. Why? I'd appreciate any help!
In R versions before 4.0.0, data.frame() defaults to stringsAsFactors = TRUE (from R 4.0.0 onward the default is FALSE). When you call data.frame(affrete) in an older version, it converts character vectors to factors. You can either:
Call data.frame(affrete, stringsAsFactors = FALSE) instead
Set this behaviour off permanently for your session with options(stringsAsFactors = FALSE)
Fix after the fact once it's already in the list with list_attribute$affrete$affrete <- as.character(list_attribute$affrete$affrete)
Use tibbles from the tidyverse, i.e. call tibble(affrete) instead. Tibbles never convert characters to factors, among other benefits.
I think the problem comes from list_attribute$affrete <- data.frame(affrete): in R versions before 4.0.0, the default behaviour of data.frame() is stringsAsFactors = TRUE.
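To see the effect for yourself, here is a small sketch that sets stringsAsFactors explicitly both ways, so it behaves the same regardless of R version. The vector is a shortened, made-up stand-in for the real affrete column.

```r
# Shortened stand-in for the real column
affrete <- c("Manzana", "Grape Fruits", "Limon")

# Explicitly ask for factors: this reproduces the behaviour in the question
as_factor_df <- data.frame(affrete, stringsAsFactors = TRUE)
class(as_factor_df$affrete)

# Explicitly keep characters
as_char_df <- data.frame(affrete, stringsAsFactors = FALSE)
class(as_char_df$affrete)

# Fixing an existing factor column after the fact
as_factor_df$affrete <- as.character(as_factor_df$affrete)
class(as_factor_df$affrete)
```

The first class() call should report a factor and the other two a character vector.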

Group words (from defined list) into themes in R

I am new to Stackoverflow and trying to learn R.
I want to find a set of defined words in a text and return the count of each word in a table, together with the theme I have defined for it.
Here is my attempt:
text <- c("Green fruits are such as apples, green mangoes and avocados are good for high blood pressure. Vegetables range from greens like lettuce, spinach, Swiss chard, and mustard greens are great for heart disease. When researchers combined findings with several other long-term studies and looked at coronary heart disease and stroke separately, they found a similar protective effect for both. Green mangoes are the best.")
library(qdap)
# Own defined lists
fruit <- c("apples", "green mangoes", "avocados")
veg <- c("lettuce", "spinach", "Swiss chard", "mustard greens")
# Splitting into sentences
stext <- strsplit(text, split="\\.")[[1]]
# Obtain and count occurrences
library(plyr)
fruitres <- laply(fruit, function(x) grep(x, stext))
vegres <- laply(veg, function(x) grep(x, stext))
# Quick check: this does not return 2 results for "green mangoes"
grep("green mangoes", stext)
# Trying with the stringr package
tag_ex <- paste0('(', paste(fruit, collapse = '|'), ')')
tag_ex
library(dplyr)
library(stringr)
themes = sapply(str_extract_all(stext, tag_ex), function(x) paste(x, collapse=','))[[1]]
themes
#Create data table
library(data.table)
data.table(fruit,fruitres)
Using the qdap and stringr packages, I am unable to obtain the result I want.
Desired result for fruits and veg combined in a table:
apples          fruit  1
green mangoes   fruit  2
avocados        fruit  1
lettuce         veg    1
spinach         veg    1
Swiss chard     veg    1
mustard greens  veg    1
Any help will be appreciated. Thank you.
I tried to generalize this to any number of vectors.
tidyverse and stringr solution
library(tidyverse)
library(stringr)
Create a data.frame of your vectors
data <- c("fruit","veg") # vector names
L <- map(data, ~get(.x))
names(L) <- data
long <- map_df(seq_along(L), ~data.frame(category = names(L)[.x], type = L[[.x]]))
# You may receive warnings about coercing to characters
#   category          type
# 1    fruit        apples
# 2    fruit green mangoes
# 3    fruit      avocados
# etc
To count instances of each
long %>%
  mutate(count = str_count(tolower(text), tolower(type)))
Output
  category          type count
1    fruit        apples     1
2    fruit green mangoes     2
3    fruit      avocados     1
4      veg       lettuce     1
# etc
Extra stuff
We can add another vector easily
health <- c("blood", "heart")
data <- c("fruit","veg", "health")
# code as above
Extra output (tail)
6      veg    Swiss chard     1
7      veg mustard greens     1
8   health          blood     1
9   health          heart     2
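For comparison, here is a package-free sketch of the same counting idea. It assumes the themes are kept in one named list (called `themes` here, a name made up for this example) rather than looked up with get(), and it counts literal, case-insensitive occurrences with gregexpr. The helper `count_fixed` is hypothetical, not part of any package.

```r
text <- "Green fruits are such as apples, green mangoes and avocados are good for high blood pressure. Vegetables range from greens like lettuce, spinach, Swiss chard, and mustard greens are great for heart disease. When researchers combined findings with several other long-term studies and looked at coronary heart disease and stroke separately, they found a similar protective effect for both. Green mangoes are the best."

# Hypothetical named list holding one character vector per theme
themes <- list(
  fruit = c("apples", "green mangoes", "avocados"),
  veg   = c("lettuce", "spinach", "Swiss chard", "mustard greens")
)

# Long table: one row per word, with its theme
long <- data.frame(
  category = rep(names(themes), lengths(themes)),
  type = unlist(themes, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Count literal occurrences of term in txt; gregexpr returns -1 when nothing matches
count_fixed <- function(term, txt) {
  hits <- gregexpr(term, txt, fixed = TRUE)[[1]]
  sum(hits > 0)
}

# Lower-case both sides so that "Green mangoes" and "green mangoes" both count
long$count <- vapply(tolower(long$type), count_fixed, integer(1),
                     txt = tolower(text), USE.NAMES = FALSE)
long
```

Adding another theme then only means adding another element to `themes`.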

Resources