Using several fields on the same level in pivottabler

I'm looking for a way in pivottabler to build a table with two parallel fields, like this:
|          SEX          |               POPULATION GROUPS               |
|     1     |     2     |     1     |     2     |     3     |     4     |
| AgeGroups | AgeGroups | AgeGroups | AgeGroups | AgeGroups | AgeGroups |
| 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
How can I add two or more fields in parallel, rather than hierarchically, with addColumnDataGroups?

This can be done by specifying atLevel=1 when adding the population group columns to the pivot table.
I have made up some sample data to use in an example below.
n <- 100
sex <- sample(x=c("M","F"), size=n, replace=TRUE)
pg <- sample(x=c("pg1","pg2","pg3","pg4"), size=n, replace=TRUE)
ag <- sample(x=c("ag1","ag2","ag3"), size=n, replace=TRUE)
grp <- sample(x=c("g1","g2","g3","g4"), size=n, replace=TRUE)
df <- data.frame(sex, pg, ag, grp)
library(pivottabler)
pt <- PivotTable$new()
pt$addData(df)
pt$addColumnDataGroups("sex", addTotal=FALSE)
pt$addColumnDataGroups("pg", atLevel=1, addTotal=FALSE)
pt$addColumnDataGroups("ag", addTotal=FALSE)
pt$addRowDataGroups("grp")
pt$defineCalculation(calculationName="Count", summariseExpression="n()")
pt$renderPivot()
Hope that helps
Chris
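To sanity-check what the parallel layout contains, here is a minimal base R sketch (it does not use pivottabler). It assumes the same made-up columns as in the answer, with a fixed seed added for reproducibility, and builds the sex block and the population group block independently before placing them side by side. The column order may differ from what pivottabler renders.

```r
# Made-up sample data, as in the answer (the seed is an addition)
set.seed(1)
n <- 100
df <- data.frame(
  sex = sample(c("M", "F"), n, replace = TRUE),
  pg  = sample(c("pg1", "pg2", "pg3", "pg4"), n, replace = TRUE),
  ag  = sample(c("ag1", "ag2", "ag3"), n, replace = TRUE),
  grp = sample(c("g1", "g2", "g3", "g4"), n, replace = TRUE)
)

# Counts of grp by (sex, ag) and by (pg, ag), computed independently
by_sex <- table(df$grp, interaction(df$sex, df$ag, sep = "/"))
by_pg  <- table(df$grp, interaction(df$pg, df$ag, sep = "/"))

# Parallel layout: the two blocks sit side by side, neither nested in the other
parallel <- cbind(by_sex, by_pg)
parallel
```

Every row is counted once in each block, so the row totals of the two blocks should agree.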

Related

frequency table for banner list

I am trying to write a function that generates a frequency table (count, valid percent, percent) for a list of banner variables.
I want to export the tables to xlsx files.
For example, for the variable "gear", I want to calculate the table for the banner below:
library(expss)
df <- mtcars
df$all<- 1
df$small<-ifelse(df$vs==1,1,NA)
df$large<-ifelse(df$am ==1,1,NA)
val_lab(df$all)<-c("Total"=1)
val_lab(df$small)<-c("Small"=1)
val_lab(df$large)<-c("Large"=1)
banner <- list(df$all, df$small, df$large)
data <- df
var <- "gear"
var1 <- rlang::parse_expr(var)
expss::var_lab(data[[var]])
#tab1 <- expss::fre(data[[var1]])
table1 <- expss::fre(data[[var1]],
stat_lab = getOption("expss.fre_stat_lab", c("Count N", "Valid percent", "Percent",
"Responses, %", "Cumulative responses, %")))
table1
the output table should be like below

You need to write a custom function around fre:
library(expss)
df <- mtcars
df$all<- 1
df$small<-ifelse(df$vs==1,1,NA)
df$large<-ifelse(df$am ==1,1,NA)
val_lab(df$all)<-c("Total"=1)
val_lab(df$small)<-c("Small"=1)
val_lab(df$large)<-c("Large"=1)
my_fre <- function(curr_var) setNames(expss::fre(curr_var)[, 1:3],
c("row_labels", "Count N", "Valid percent"))
cross_fun_df(df, gear, list(all, small, large), fun = my_fre)
# | | Total | | Small | | Large | |
# | | Count N | Valid percent | Count N | Valid percent | Count N | Valid percent |
# | ------ | ------- | ------------- | ------- | ------------- | ------- | ------------- |
# | 3 | 15 | 46.88 | 3 | 21.43 | | |
# | 4 | 12 | 37.50 | 10 | 71.43 | 8 | 61.54 |
# | 5 | 5 | 15.62 | 1 | 7.14 | 5 | 38.46 |
# | #Total | 32 | 100.00 | 14 | 100.00 | 13 | 100.00 |
# | <NA> | 0 | | 0 | | 0 | |
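As a cross-check on the numbers above, the same counts and valid percents can be reproduced in base R without expss. This is only a sketch: the logical banner vectors and the names `banner`, `freq` and `valid_pct` are illustrative choices, not part of the expss API.

```r
df <- mtcars

# One logical filter per banner column (TRUE = row belongs to that column)
banner <- list(
  Total = rep(TRUE, nrow(df)),
  Small = df$vs == 1,
  Large = df$am == 1
)

gear_levels <- sort(unique(df$gear))

# Count of each gear value within each banner column
freq <- sapply(banner, function(keep) {
  table(factor(df$gear[keep], levels = gear_levels))
})

# Valid percent: share of each gear value within its banner column
valid_pct <- round(100 * prop.table(freq, margin = 2), 2)

freq
valid_pct
```

Fixing the factor levels up front keeps a zero count (gear 3 under Large) as an explicit row instead of dropping it.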

Function to eliminate rows from a dataframe with certain condition in R

Hello everyone!
I will try to explain my problem, which I find very difficult. I hope you can help me.
I have a data frame, let's call it DF1, that looks like this:
|Symbol | Date | Volume | Price |
|-------|------|--------|-------|
|A |2014-01-01 | 0 | 4 |
|A |2014-01-02 | 7 | 7 |
|A |2014-01-03 | 8 | 9 |
|A |2014-01-04 | 1 | 5 |
|B |2014-01-01 |45 | 6 |
|B |2014-01-02 |0 | 11 |
|B |2014-01-03 |34 | 8 |
|B |2014-01-04 |45 | 5 |
|C |2014-01-01 |4 | 6 |
|C |2014-01-02 |0 | 5 |
|C |2014-01-03 |14 | 25 |
|D |2014-01-01 |31 | 4 |
|D |2014-01-02 |7 | 6 |
|D |2014-01-03 |18 | 3 |
|D |2014-01-04 |15 | 7 |
|E |2014-01-01 |13 | 8 |
|E |2014-01-02 |0 | 9 |
From this data frame I create a new data frame, RM, with the following lines of code:
RM <- DF1 %>% group_by(Date) %>%
mutate(weight = Volume/sum(Volume),
R_i = weight*(log(Price)-log(lag(Price)))) %>%
summarise(RM = sum(R_i, na.rm = TRUE))
From RM, I select only the dates that are of interest:
RM_reg <- subset(RM, Date >= "2014-03-05" & Date <= "2014-09-03")
Finally, RM_reg looks like this:
| Date | RM |
|2014-03-05 | 0 |
|2014-03-06 | 7 |
|2014-03-07 | 8 |
|2014-03-08 | 1 |
|2014-03-09 | 45 |
|2014-03-10 | 0 |
|2014-03-11 | 34 |
|2014-03-12 | 45 |
|2014-03-13 | 4 |
|2014-03-14 | 0 |
|2014-03-15 | 14 |
|2014-03-16 | 31 |
It should be noted that the values in the RM_reg column are not the actual values, but only examples. Starting from my original dataframe, RM_reg has 125 rows.
Then, from data frame DF1, I extract the rows for which the Symbol column is equal to A, with the following code:
DF_A <- DF1 %>%
filter(Symbol=="A")
And I add a column of returns to the dataframe DF_A, through the following code:
RA <- DF_A %>% group_by(Symbol)%>%
mutate(Ret_i = log(Price) - lag(log(Price)))
I eliminate the first row, which is NA:
AR <- na.omit(RA)
From AR, I select only the dates that are of interest:
AR_reg <- subset(AR, Date >= "2014-03-05" & Date <= "2014-09-03")
AR_reg looks like this:
|Symbol | Date | volume |price | Ret_i |
|--------------------------------------------|
|A |2014-03-05 | 1 | 5 | 2 |
|A |2014-03-06 | 3 | 8 | 3 |
|A |2014-03-07 | 7 | 4 | 4 |
|A |2014-03-08 |3 | 6 | 5 |
|A |2014-03-09 |34 | 7 | 1 |
|A |2014-03-10 |45 | 34 | 4 |
|A |2014-03-11 |4 | 5 | 3 |
|A |2014-03-12 |9 | 7 | 5 |
|A |2014-03-13 |8 | 6 | 6 |
|A |2014-03-14 |4 | 4 | 1 |
|A |2014-03-15 |0 | 7 | 4 |
|A |2014-03-16 |4 | 7 | 7 |
It should be noted that the values in the AR_reg column are not the actual values, but only examples. Starting from my original dataframe, AR_reg also has 125 rows.
Finally, because RM_reg and AR_reg have the same number of rows, I can regress the Ret_i column of AR_reg on the RM column of RM_reg with the following code:
mod <- lm(AR_reg$Ret_i ~ RM_reg$RM)
What I need to do is to do the same as described above for all the Symbols in the dataframe DF1, in this case for, "B", "C", "D", "E". The problem is that we do not have the same amount of entries, or the same amount of rows corresponding to all Symbols, and this is a necessary condition to be able to do the regression. To do the regression I need to have 125 observations of returns for each Symbol.
My idea is to eliminate the Symbols for which the resulting data frame (the equivalent of AR_reg) does not have 125 rows, but I do not know how to do this. I suppose a function must be written, but this is a subject I have not yet mastered.
Thank you very much for reading. I hope I have explained myself clearly. Any help or suggestion will be greatly appreciated.

Join DF1 with RM by Date, keep only the data between the specific dates, then for each Symbol calculate Ret_i, drop the NA values, and fit a model, storing the models in a list column.
The complete code would look like:
library(dplyr)
DF1$Date <- as.Date(DF1$Date)
RM <- DF1 %>%
group_by(Date) %>%
mutate(weight = Volume/sum(Volume),
R_i = weight*(log(Price)-log(lag(Price)))) %>%
summarise(RM = sum(R_i, na.rm = TRUE))
result <- DF1 %>%
left_join(RM, by = 'Date') %>%
filter(between(Date, as.Date("2014-03-05"), as.Date("2014-09-03"))) %>%
group_by(Symbol) %>%
mutate(Ret_i = log(Price) - lag(log(Price))) %>%
na.omit() %>%
summarise(model = list(lm(Ret_i~RM)))
result
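The question also asks how to drop Symbols that do not have the required number of rows. A minimal base R sketch of that step follows, using a toy version of DF1 from the question and a hypothetical `required_rows` of 4 in place of 125. In a dplyr pipeline the same idea is usually written as group_by(Symbol) %>% filter(n() == required_rows).

```r
# Toy version of DF1: A, B and D have 4 rows; C has 3; E has 2
DF1 <- data.frame(
  Symbol = rep(c("A", "B", "C", "D", "E"), times = c(4, 4, 3, 4, 2)),
  Date   = as.Date("2014-01-01") + c(0:3, 0:3, 0:2, 0:3, 0:1),
  Price  = c(4, 7, 9, 5, 6, 11, 8, 5, 6, 5, 25, 4, 6, 3, 7, 8, 9)
)

required_rows <- 4  # hypothetical; 125 in the question

# For every row, the number of rows that share its Symbol
rows_per_symbol <- ave(seq_len(nrow(DF1)), DF1$Symbol, FUN = length)

# Keep only Symbols with exactly the required number of rows
complete <- DF1[rows_per_symbol == required_rows, ]
unique(complete$Symbol)
```

Applying this after the date filter and na.omit() steps means the row count reflects the observations that actually enter the regression.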

How to drop unused value labels in crosstabulations table outputs using cro function from expss package?

I'm using haven-labelled data frames (the variables already have value labels when the datasets are imported). I need to run many crosstabulations of two variables. I'm using the cro function from the expss package because it displays value labels by default and computes weighted crosstabs.
However, the output tables I get display unused value labels. How can I drop unused labels without manually dropping unused value labels for each variable? (by the way: the fre function from expss package has this argument by default: drop_unused_labels = TRUE, but cro function doesn’t)
Here is a reproducible example:
# Dataframe
df <- data.frame(sex = c(1, 2, 99, 2, 1, 2, 2, 2, 1, 2),
agegroup= c(1, 2, 99, 2, 3, 3, 2, 2, 2, 1),
weight = c(100, 20, 400, 300, 50, 50, 80, 250, 100, 100))
library(expss)
# Variable labels
var_lab(df$sex) <-"Sex"
var_lab(df$agegroup) <-"Age group"
# Value labels
val_lab(df$sex) <- make_labels("1 Male
2 Female
97 Didn't know
98 Didn't respond
99 Abandoned survey")
val_lab(df$agegroup) <- make_labels("1 1-29
2 30-49
3 50 and more
97 Didn't know
98 Didn't respond
99 Abandoned survey")
cro(df$sex, df$agegroup, weight = df$weight)
| | | Age group | | | | | |
| | | 1-29 | 30-49 | 50 and more | Didn't know | Didn't respond | Abandoned survey |
| --- | ---------------- | --------- | ----- | ----------- | ----------- | -------------- | ---------------- |
| Sex | Male | 100 | 100 | 50 | | | |
| | Female | 100 | 650 | 50 | | | |
| | Didn't know | | | | | | |
| | Didn't respond | | | | | | |
| | Abandoned survey | | | | | | 400 |
| | #Total cases | 2 | 5 | 2 | | | 1 |
I want to get rid of the columns and rows called ‘Didn't know’ and ‘Didn't respond’.

You can use the drop_unused_labels function to remove the labels that are not used.
library(expss)
df1 <- drop_unused_labels(df)
cro(df1$sex, df1$agegroup, weight = df1$weight)
| | | Age group | | | |
| | | 1-29 | 30-49 | 50 and more | Abandoned survey |
| --- | ---------------- | --------- | ----- | ----------- | ---------------- |
| Sex | Male | 100 | 100 | 50 | |
| | Female | 100 | 650 | 50 | |
| | Abandoned survey | | | | 400 |
| | #Total cases | 2 | 5 | 2 | 1 |
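For comparison, the same idea can be sketched in base R with plain factors instead of expss labels: droplevels() removes the levels that never occur, and xtabs() gives the weighted crosstab. This is an analogue under the assumption that the labelled variables are converted to factors; it is not how expss handles labels internally.

```r
responses <- data.frame(
  sex = factor(c(1, 2, 99, 2, 1, 2, 2, 2, 1, 2),
               levels = c(1, 2, 97, 98, 99),
               labels = c("Male", "Female", "Didn't know",
                          "Didn't respond", "Abandoned survey")),
  agegroup = factor(c(1, 2, 99, 2, 3, 3, 2, 2, 2, 1),
                    levels = c(1, 2, 3, 97, 98, 99),
                    labels = c("1-29", "30-49", "50 and more", "Didn't know",
                               "Didn't respond", "Abandoned survey")),
  weight = c(100, 20, 400, 300, 50, 50, 80, 250, 100, 100)
)

# Drop factor levels that have no observations in any factor column
used_only <- droplevels(responses)

# Weighted crosstab: sums of weight per sex x agegroup cell
weighted <- xtabs(weight ~ sex + agegroup, data = used_only)
weighted
```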

For each combination of a set of variables in a list, calculating correlations between this combination and another variable in R

In R I want to generate correlation co-efficients by comparing 2 variables whilst also retaining a phylogenetic signal.
The way I first thought of doing this is not computationally efficient, and I think there is a much simpler way, but I do not have the skills in R to do it.
I have a csv file which looks like this:
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Species | OGT | Domain | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Aeropyrum pernix | 95 | Archaea | 9.7659115711 | 0.6720465616 | 4.3895390781 | 7.6501943794 | 2.9344881615 | 8.8666657183 | 1.5011817208 | 5.6901432494 | 4.1428307243 | 11.0604191603 | 2.21143353 | 1.9387130928 | 5.1038552753 | 1.6855017182 | 7.7664358772 | 6.266067034 | 4.2052190807 | 9.2692433532 | 1.318690698 | 3.5614200159 |
| Argobacterium fabrum | 26 | Bacteria | 11.5698896021 | 0.7985475923 | 5.5884500155 | 5.8165463343 | 4.0512504104 | 8.2643271309 | 2.0116736244 | 5.7962804605 | 3.8931525401 | 9.9250463349 | 2.5980609708 | 2.9846761128 | 4.7828063605 | 3.1262365491 | 6.5684282943 | 5.9454781844 | 5.3740045968 | 7.3382308193 | 1.2519739683 | 2.3149400984 |
| Anaeromyxobacter dehalogenans | 27 | Bacteria | 16.0337898849 | 0.8860252895 | 5.1368827707 | 6.1864992608 | 2.9730203513 | 9.3167603253 | 1.9360386851 | 2.940143349 | 2.3473650439 | 10.898494736 | 1.6343905351 | 1.5247123262 | 6.3580285706 | 2.4715303021 | 9.2639057482 | 4.1890063803 | 4.3992339725 | 8.3885969061 | 1.2890166336 | 1.8265589289 |
| Aquifex aeolicus | 85 | Bacteria | 5.8730327277 | 0.795341216 | 4.3287799008 | 9.6746388172 | 5.1386954322 | 6.7148035486 | 1.5438364179 | 7.3358775924 | 9.4641440609 | 10.5736658776 | 1.9263080969 | 3.6183861236 | 4.0518679067 | 2.0493569604 | 4.9229955632 | 4.7976564501 | 4.2005259246 | 7.9169763709 | 0.9292167138 | 4.1438942987 |
| Archaeoglobus fulgidus | 83 | Archaea | 7.8742687687 | 1.1695110027 | 4.9165979364 | 8.9548767369 | 4.568636662 | 7.2640358917 | 1.4998752909 | 7.2472039919 | 6.8957233203 | 9.4826333048 | 2.6014466253 | 3.206476915 | 3.8419576418 | 1.7789787933 | 5.7572748236 | 5.4763351139 | 4.1490633048 | 8.6330814159 | 1.0325605451 | 3.6494619148 |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
What I want to do is, for each possible combination of the 20 single-letter columns (amino acids, so 2^20 - 1 = 1,048,575 non-empty combinations, about 1 million), calculate the correlation between the summed percentages of that combination and the OGT variable in the CSV (whilst retaining a phylogenetic signal).
My current code is this:
library(parallel)
library(dplyr)
library(tidyr)
library(magrittr)
library(ape)
library(geiger)
library(caper)
taxonomynex <- read.nexus("taxonomyforzeldospecies.nex")
zeldodata <- read.csv("COMPLETECOPYFORR.csv")
Species <- dput(zeldodata)
SpeciesLong <-
Species %>%
gather(protein, proportion,
A:Y) %>%
arrange(Species)
S <- unique(SpeciesLong$protein)
Scombi <- unlist(lapply(seq_along(S),
function(x) combn(S, x, FUN = paste0, collapse = "")))
joint_protein <- function(protein_combo, data){
sum(data$proportion[vapply(data$protein,
grepl,
logical(1),
protein_combo)])
}
SplitSpecies <-
split(SpeciesLong,
SpeciesLong$Species)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("Scombi", "joint_protein"))
SpeciesAggregate <-
parLapply(cl,
X = SplitSpecies,
fun = function(data){
X <- lapply(Scombi,
joint_protein,
data)
names(X) <- Scombi
as.data.frame(X)
})
Species <- cbind(Species, SpeciesAggregate)
Which attempts to feed in each combination into memory and then calculate the sum of each proportion of each of the acids, but this takes forever to finish and crashes before completion.
I think it would be better to store the correlation coefficients in a vector, and then just print out the coefficients for each combination, but I don't know the best way of doing this in R.
I also aim to retain a phylogenetic signal using the ape package using something along the lines of this:
pglsModel <- gls(OGT ~ AminoAcidCombination, correlation = corBrownian(phy = taxonomynex),
data = zeldodata, method = "ML")
summary(pglsModel)
Apologies for how unclear this is, if anyone has any advice, much appreciated!
Edit: Link to taxonomyforzeldospecies.nex
Output from dput(Zeldodata):
1 Species OGT Domain A C D E F G H I K L M N P Q R S T V W Y
------------------------------- ----- ---------- --------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- --------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- --------------
2 Aeropyrum pernix 95 Archaea 9.7659115711 0.6720465616 4.3895390781 7.6501943794 2.9344881615 8.8666657183 1.5011817208 5.6901432494 4.1428307243 11.0604191603 2.21143353 1.9387130928 5.1038552753 1.6855017182 7.7664358772 6.266067034 4.2052190807 9.2692433532 1.318690698 3.5614200159
3 Argobacterium fabrum 26 Bacteria 11.5698896021 0.7985475923 5.5884500155 5.8165463343 4.0512504104 8.2643271309 2.0116736244 5.7962804605 3.8931525401 9.9250463349 2.5980609708 2.9846761128 4.7828063605 3.1262365491 6.5684282943 5.9454781844 5.3740045968 7.3382308193 1.2519739683 2.3149400984
4 Anaeromyxobacter dehalogenans 27 Bacteria 16.0337898849 0.8860252895 5.1368827707 6.1864992608 2.9730203513 9.3167603253 1.9360386851 2.940143349 2.3473650439 10.898494736 1.6343905351 1.5247123262 6.3580285706 2.4715303021 9.2639057482 4.1890063803 4.3992339725 8.3885969061 1.2890166336 1.8265589289
5 Aquifex aeolicus 85 Bacteria 5.8730327277 0.795341216 4.3287799008 9.6746388172 5.1386954322 6.7148035486 1.5438364179 7.3358775924 9.4641440609 10.5736658776 1.9263080969 3.6183861236 4.0518679067 2.0493569604 4.9229955632 4.7976564501 4.2005259246 7.9169763709 0.9292167138 4.1438942987
6 Archaeoglobus fulgidus 83 Archaea 7.8742687687 1.1695110027 4.9165979364 8.9548767369 4.568636662 7.2640358917 1.4998752909 7.2472039919 6.8957233203 9.4826333048 2.6014466253 3.206476915 3.8419576418 1.7789787933 5.7572748236 5.4763351139 4.1490633048 8.6330814159 1.0325605451 3.6494619148

This will give you a long data frame with each combination and its sum per species (it takes about 35 seconds on my machine):
zeldodata <-
Species %>%
gather(protein, proportion, A:Y) %>%
group_by(Species) %>%
mutate(combo = sapply(1:n(), function(i) combn(protein, i, FUN = paste0, collapse = ""))) %>%
mutate(sum = sapply(1:n(), function(i) combn(proportion, i, FUN = sum))) %>%
unnest() %>%
select(-protein, -proportion)
Here is an example of calculating each species separately and saving the data to disk, before reading each file back in and combining them:
library(readr)
library(dplyr)
library(tidyr)
library(purrr)
# read in CSV file
zeldodata <-
read_delim(
delim = "|",
trim_ws = TRUE,
col_names = TRUE,
col_types = "cicdddddddddddddddddddd",
file = "Species | OGT | Domain | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y
Aeropyrum pernix | 95 | Archaea | 9.7659115711 | 0.6720465616 | 4.3895390781 | 7.6501943794 | 2.9344881615 | 8.8666657183 | 1.5011817208 | 5.6901432494 | 4.1428307243 | 11.0604191603 | 2.21143353 | 1.9387130928 | 5.1038552753 | 1.6855017182 | 7.7664358772 | 6.266067034 | 4.2052190807 | 9.2692433532 | 1.318690698 | 3.5614200159
Argobacterium fabrum | 26 | Bacteria | 11.5698896021 | 0.7985475923 | 5.5884500155 | 5.8165463343 | 4.0512504104 | 8.2643271309 | 2.0116736244 | 5.7962804605 | 3.8931525401 | 9.9250463349 | 2.5980609708 | 2.9846761128 | 4.7828063605 | 3.1262365491 | 6.5684282943 | 5.9454781844 | 5.3740045968 | 7.3382308193 | 1.2519739683 | 2.3149400984
Anaeromyxobacter dehalogenans | 27 | Bacteria | 16.0337898849 | 0.8860252895 | 5.1368827707 | 6.1864992608 | 2.9730203513 | 9.3167603253 | 1.9360386851 | 2.940143349 | 2.3473650439 | 10.898494736 | 1.6343905351 | 1.5247123262 | 6.3580285706 | 2.4715303021 | 9.2639057482 | 4.1890063803 | 4.3992339725 | 8.3885969061 | 1.2890166336 | 1.8265589289
Aquifex aeolicus | 85 | Bacteria | 5.8730327277 | 0.795341216 | 4.3287799008 | 9.6746388172 | 5.1386954322 | 6.7148035486 | 1.5438364179 | 7.3358775924 | 9.4641440609 | 10.5736658776 | 1.9263080969 | 3.6183861236 | 4.0518679067 | 2.0493569604 | 4.9229955632 | 4.7976564501 | 4.2005259246 | 7.9169763709 | 0.9292167138 | 4.1438942987
Archaeoglobus fulgidus | 83 | Archaea | 7.8742687687 | 1.1695110027 | 4.9165979364 | 8.9548767369 | 4.568636662 | 7.2640358917 | 1.4998752909 | 7.2472039919 | 6.8957233203 | 9.4826333048 | 2.6014466253 | 3.206476915 | 3.8419576418 | 1.7789787933 | 5.7572748236 | 5.4763351139 | 4.1490633048 | 8.6330814159 | 1.0325605451 | 3.6494619148"
)
# save an RDS file for each species
for(species in unique(zeldodata$Species)) {
zeldodata %>%
filter(Species == species) %>%
gather(protein, proportion, A:Y) %>%
mutate(combo = sapply(1:n(), function(i) combn(protein, i, FUN = paste0, collapse = ""))) %>%
mutate(sum = sapply(1:n(), function(i) combn(proportion, i, FUN = sum))) %>%
unnest() %>%
select(-protein, -proportion) %>%
saveRDS(file = paste0(species, ".RDS"))
}
# read in and combine all the RDS files
zeldodata <-
list.files(pattern = "\\.RDS") %>%
map(read_rds) %>%
bind_rows()
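The code above builds the combination sums; the correlation step the question asks about can be sketched as follows. This is a minimal base R example restricted to four amino acid columns (values rounded from the question's table), so it runs over 2^4 - 1 = 15 combinations. It uses plain Pearson correlation via cor(), which is an assumption made for illustration and does not account for the phylogenetic signal; the gls() model from the question would be needed for that.

```r
# Four amino acid columns and OGT, rounded from the question's table
d <- data.frame(
  OGT = c(95, 26, 27, 85, 83),
  A = c(9.77, 11.57, 16.03, 5.87, 7.87),
  C = c(0.67, 0.80, 0.89, 0.80, 1.17),
  D = c(4.39, 5.59, 5.14, 4.33, 4.92),
  E = c(7.65, 5.82, 6.19, 9.67, 8.95)
)
aa <- c("A", "C", "D", "E")

# All non-empty combinations of the amino acid columns
combos <- unlist(
  lapply(seq_along(aa), function(k) combn(aa, k, simplify = FALSE)),
  recursive = FALSE
)
names(combos) <- vapply(combos, paste, character(1), collapse = "")

# Correlation between each combination's summed percentage and OGT
cors <- vapply(combos, function(cols) cor(rowSums(d[cols]), d$OGT), numeric(1))

# Strongest correlations first, by absolute value
cors[order(abs(cors), decreasing = TRUE)]
```

Only one number per combination is kept, so memory use grows with the number of combinations rather than with combinations times species.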

How to make a multiple corpora in R

This is car review data with more than 40,000 rows, and each review has more than 500 characters. This is sample data: https://drive.google.com/open?id=1ZRwzYH5McZIP2NLKxncmFaQ0mX1Pe0GShTMu57Tac_E
| brand | review | favorite | c4 | c5 | c6 | c7 | c8 |
| brand1 | 500 characters1 | 100 characters1 | | | | | |
| brand2 | 500 characters2 | 100 Characters2 | | | | | |
| brand2 | 500 characters3 | 100 Characters3 | | | | | |
| brand2 | 500 characters4 | 100 Characters4 | | | | | |
| brand3 | 500 characters5 | 100 Characters5 | | | | | |
| brand3 | 500 characters6 | 100 characters6 | | | | | |
I'd like to merge the review column by brand, like this:
| Brand | review | favorite | c4 | c5 | c6 | c7 | c8 |
| brand1 | 500 characters1 | 100 characters1 | | | | | |
| brand2 | 500 characters2 | 100 Characters2 | | | | | |
| | 500 characters3 | 100 Characters3 | | | | | |
| | 500 characters4 | 100 Characters4 | | | | | |
| brand3 | 500 characters5 | 100 Characters5 | | | | | |
| | 500 characters6 | 100 characters6 | | | | | |
So, I tried to use aggregate().
temp <- aggregate(data$review ~ data$brand , data, as.list )
But it takes a very long time.
Is there a simple way to merge them?
Thank you in advance!

Try splitting them on each factor and then pasting them together. aggregate() is a horribly slow function and should be avoided for all but the smallest datasets.
This should do the trick: (note I downloaded your Google file as sampleDF.csv here)
sampleDF <- read.csv("~/Downloads/sampleDF.csv", stringsAsFactors = FALSE)
# aggregate text by brand
brand.split <- split(sampleDF$text, as.factor(sampleDF$Brand))
brand.grouped <- sapply(brand.split, paste, collapse = " ")
# aggregate favorite by brand
favorite.split <- split(sampleDF$favorite, as.factor(sampleDF$Brand))
favorite.grouped <- sapply(favorite.split, paste, collapse = " ")
newDf <- data.frame(brand = names(brand.split),
text = brand.grouped,
favorite = favorite.grouped,
stringsAsFactors = FALSE)
If you want to bring in other variables they will need to vary at the brand level only.
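The split-then-paste approach can also be written in one step with tapply(). The sketch below uses a small made-up data frame (the column names `brand` and `review` follow the question; the review strings are placeholders), so adjust the names to match the real file.

```r
# Placeholder data with the question's brand layout
reviews <- data.frame(
  brand  = c("brand1", "brand2", "brand2", "brand2", "brand3", "brand3"),
  review = paste("review text", 1:6),
  stringsAsFactors = FALSE
)

# One merged document per brand: paste all of a brand's reviews together
merged <- tapply(reviews$review, reviews$brand, paste, collapse = " ")

corpus_df <- data.frame(
  brand = names(merged),
  text  = as.vector(merged),
  stringsAsFactors = FALSE
)
corpus_df
```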
