R: transposing and aggregating columns - aggregate-functions

I have this dataframe with v1 as names of categories and v2 as items.
I need to aggregate v1 so that each category (e.g. fruit) appears only once, with all of its items from v2 transposed and placed together in a single cell, separated by commas.
Any assistance will be appreciated :)

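You can do this with aggregate, using transpose (t) as the aggregating function. Note that the argument name is FUN, in upper case.
dt <- aggregate(v2 ~ v1, data = dt, FUN = t)
However, the rolled-up string will contain some unwanted text, which you can remove with gsub. Be aware that the character class below also strips every letter c that occurs inside an item name.
dt$v2 <- gsub("[c()]", "", dt$v2)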
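An alternative that avoids the clean-up step is to collapse each group directly with toString, which joins a vector with ", ". This is only a sketch: the sample data frame below is made up for illustration, since the original data was not shown.

```r
# Hypothetical sample data; the question's real data frame was not shown
dt <- data.frame(
  v1 = c("fruit", "fruit", "veg", "veg", "veg"),
  v2 = c("apple", "cherry", "carrot", "leek", "cucumber"),
  stringsAsFactors = FALSE
)

# One row per category in v1; items in v2 joined into a single string
out <- aggregate(v2 ~ v1, data = dt, FUN = toString)
print(out)
```

Because the items are joined rather than deparsed, no gsub pass is needed and letters inside the item names are left untouched.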

Related

How to dynamically add pair of columns

I have a dataframe with columns like this:
Income | Wt | Ht | Growth_Income | Growth_Wt | Growth_Ht.
Each column has 300 rows of numeric values. I would like to find a way to add together the columns that belong to each other (e.g. Income and Growth_Income). I would also like to populate the dataframe so that I do the summation five times, with each iteration based on the previous output.
Sorry, I'm quite new to R and I haven't thought of any way to write the code yet. In Excel it would be easy to drag the formula, but I need to code it in R because otherwise my program won't work. I hope someone can help me out here :(
Assuming that the patterns are as shown in the example, we remove the prefix substring from the column names, use the result to split the dataset into a list of data.frames, and loop through the list to get the rowSums
nm1 <- sub(".*_", "", names(df1))
sapply(split.default(df1, nm1), rowSums)
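To make the approach above concrete, here is a small sketch on made-up data (two rows instead of 300). The five-step loop at the end is one possible reading of "do the summation five times", namely adding the growth columns to the running totals on every pass; that interpretation is an assumption, not something the question spells out.

```r
# Hypothetical data with the column layout described in the question
df1 <- data.frame(
  Income = c(10, 20), Wt = c(1, 2), Ht = c(5, 6),
  Growth_Income = c(1, 2), Growth_Wt = c(0.5, 0.5), Growth_Ht = c(1, 1)
)

# Strip everything up to the underscore so paired columns share a key
nm1 <- sub(".*_", "", names(df1))

# Split the columns by key and add each pair row by row
sums <- sapply(split.default(df1, nm1), rowSums)

# Assumed reading of the iteration: add the growth values five times,
# each pass starting from the previous result
level <- df1[c("Income", "Wt", "Ht")]
growth <- df1[paste0("Growth_", names(level))]
for (i in 1:5) {
  level <- level + growth
}
```

Note that split.default orders the groups alphabetically, so the columns of sums come back as Ht, Income, Wt rather than in their original order.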

Remove rows for multiple dataframes having a name matching a pattern

I am trying to remove the first 9 rows of multiple dataframes that have the same structures but different names (keeping similar name structure). In my example, there are 4 dataframes with respectively the names
Mydataframe_A, Mydataframe_B, Mydataframe_C, Mydataframe_D.
Currently it is working with the following code:
`Mydataframe_A`<- `Mydataframe_A`[-c(1:9),]
`Mydataframe_B`<- `Mydataframe_B`[-c(1:9),]
`Mydataframe_C`<- `Mydataframe_C`[-c(1:9),]
`Mydataframe_D`<- `Mydataframe_D`[-c(1:9),]
But I would like to write this in only one line, without having to specify the name of each dataframe every time.
I think this could work by using a name pattern and lists because, for example, this is what I am doing to rbind different dataframes:
All_mydataframes <- rbindlist(mget(ls(pattern = "^Mydataframe_")))
Any idea how to do this?
Thanks a ton!
Since mget turns this into a list, you can use the apply family of functions:
rbindlist(lapply(mget(ls(pattern = "^Mydataframe_")), function(x) x[-c(1:9), ]))
This takes the list from mget, removes the first 9 rows from each element, then binds the list into a single data.table with rbindlist. The only problem is that you can no longer tell which data.frame each row originally came from.
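If the goal is to trim each data frame in place rather than combine them, the same list can be written back with list2env. The following is a base R sketch under that assumption; the two small data frames are invented for illustration.

```r
# Hypothetical example data frames, 12 rows each
Mydataframe_A <- data.frame(x = 1:12)
Mydataframe_B <- data.frame(x = 101:112)

# Collect every object whose name matches the pattern into a named list
dfs <- mget(ls(pattern = "^Mydataframe_"))

# Drop the first 9 rows of each data frame
trimmed <- lapply(dfs, function(x) x[-(1:9), , drop = FALSE])

# Overwrite the original objects with their trimmed versions
invisible(list2env(trimmed, envir = environment()))
```

If the combined table is what is wanted after all, rbindlist accepts an idcol argument that records the list name for every row, which addresses the loss of origin mentioned above.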

Combine/unite series of columns

I have a data frame like this, that continues to variable length (an even column number):
V1 V2 V3 V4 V5 V6
A B C D E F
I would like the first half of the data frame to form pairs with the second half of the data frame. (In the case above that would be pairs such as AD, BE and CF.)
Taken from another post, I have made this but I can't manage to make a data frame out of it.
lapply(1:(ncol(df)/2), function(x) paste(df[,c(x,x+(ncol(df)/2))], collapse = "")) %>%
data.frame
Could someone explain what actually happens in this piece of code?
I am not sure exactly which problem you are facing, but I see two potential ones. The first is that your character variables may actually be factors, in which case you get back the underlying integer codes rather than the characters. The second could be in the paste call. Written as below, it gives me the right results. You will have to use rename_all from dplyr to make the variable names usable.
lapply(1:(ncol(df)/2), function(x) paste0(df[[x]], df[[x + ncol(df) / 2]])) %>% data.frame
Now what is going on here:
Assuming that you will always have an even number of columns, we divide that number by two and then, for each column index in the first half, apply the paste0 function. paste0 is a simple wrapper for paste(..., sep = ''). We paste column x together with column x + half the number of columns. In my updated code I use [[ ]] because it returns the column as a vector rather than as a one-column data frame.
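The explanation above can be sketched in base R, without the pipe and with explicit column names so that no renaming is needed afterwards. The names pair1, pair2, ... are placeholders chosen for this example.

```r
# Example data matching the layout in the question
df <- data.frame(V1 = "A", V2 = "B", V3 = "C", V4 = "D", V5 = "E", V6 = "F",
                 stringsAsFactors = FALSE)

half <- ncol(df) / 2  # assumes an even number of columns

# Paste column i together with its partner in the second half
pairs <- lapply(seq_len(half), function(i) paste0(df[[i]], df[[i + half]]))

# Placeholder names, then convert the list to a data frame
names(pairs) <- paste0("pair", seq_len(half))
result <- as.data.frame(pairs, stringsAsFactors = FALSE)
print(result)
```

Setting stringsAsFactors = FALSE keeps the columns as character, which avoids the factor problem described above.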

How to split a list to only use one column

I'm relatively new to R, and I can't figure out how to split the list that I'm working with. I have
B<-tapply(newdata$lf.d1, newdata$year, mean)
But I want to concatenate the mean values onto another matrix without the year values. How would I go about doing this?
The result of tapply with a single grouping factor is a one-dimensional array whose names are the group labels (here, the years). It is not really a column, because the object has only a single dimension unless you coerce it with as.matrix. If you want to remove the names, use the unname function.
unname(B)
unname(as.matrix(B))
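As a small illustration of the point above, the following sketch strips the year labels and binds the means onto another matrix. The data and the matrix called other are invented for the example.

```r
# Hypothetical data: two years, two observations each
newdata <- data.frame(year = c(2001, 2001, 2002, 2002),
                      lf.d1 = c(1, 3, 10, 20))

# Named one-dimensional array of yearly means
B <- tapply(newdata$lf.d1, newdata$year, mean)

# Drop the year labels and the array dimension to get a plain vector
means <- as.vector(unname(B))

# Hypothetical existing matrix with one row per year
other <- matrix(1:4, nrow = 2)

# Append the means as an extra column
combined <- cbind(other, means)
```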

how to remove rows from dataframe in r (different situation)

I want to remove the first row of the data frame. My code is:
reddot.info<-reddot.info[-1,]
But then I found that I cannot view this data frame. I figured out the reason: when I run
reddot.info[1,]
it appears as
Num Product_names URL
1 NA Product Names URL
which means that if I use this code I will also remove the column names.
So what should I do to remove only the first row, instead of removing the column names and the first row together?
Thank you so much.
You probably don't have column names to begin with because removing the first row like that doesn't remove the column names.
colnames(reddot.info) <- reddot.info[1,]
reddot.info <- reddot.info[-1,]
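A slightly more defensive sketch of the same idea is shown below. It assumes the header text really sits in the first data row, as the answer suggests; the example data and the default names V1 and V2 are made up.

```r
# Hypothetical data frame whose first row holds the intended headers
reddot.info <- data.frame(
  V1 = c("Num", "1", "2"),
  V2 = c("Product_names", "Lamp", "Chair"),
  stringsAsFactors = FALSE
)

# Flatten the first row to a character vector before using it as names
colnames(reddot.info) <- as.character(unlist(reddot.info[1, ]))

# Drop the header row; drop = FALSE keeps a data frame in every case
reddot.info <- reddot.info[-1, , drop = FALSE]

# Renumber the rows so they start at 1 again
rownames(reddot.info) <- NULL
```

Converting with as.character(unlist(...)) guards against factor columns, where the row would otherwise yield integer codes instead of the header text.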
