Remove duplicates in a dataframe - r

I have a dataframe called Tgame with two columns, game and hours_played. I am trying to remove duplicates in the game column and compute the average of hours_played for each game.

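Should be as simple as this (using data.table):
library(data.table)
# one row per game; the result column keeps the name hours_played instead of the default V1
setDT(Tgame)[, .(hours_played = mean(hours_played)), by = game]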
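For readers without data.table, the same per-game average can be sketched in base R with aggregate. This is an alternative to the answer above, not the answerer's method, and the sample rows are made up for illustration; only the names Tgame, game and hours_played come from the question.

```r
# Hypothetical sample data; the real Tgame comes from the question.
Tgame <- data.frame(
  game = c("Dota 2", "Dota 2", "Portal", "Portal", "Portal"),
  hours_played = c(10, 30, 2, 4, 6)
)

# One row per game, holding the mean of hours_played within that game.
avg_hours <- aggregate(hours_played ~ game, data = Tgame, FUN = mean)
print(avg_hours)
```

The result has one row per distinct game, so the duplicates are gone as a side effect of the grouping.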

Related

R - Make combinations of multiple dataframes & columns of different size

My first question here...
I have 2 dataframes, both with a different number of rows.
The first one has 3 columns, the second one has 1 column.
I want to combine every value in the 1st column of the 1st dataframe with every value in the 1st (and only) column of the second dataframe, then do the same for the 2nd column of the 1st dataframe, and so on...
I assume the result will be a one-column dataframe (?).
Something like this:
Attempts with combn did not help me yet...
Thanks!

Probably not exactly what you want, but it provides a starting point. Assuming your first dataframe is called df and the other one (with one column) is called df2:
library(data.table)
# reshape df to long format using tidyr
df_long <- tidyr::pivot_longer(df, cols = c("loc1", "loc2", "loc3"))
# Cartesian join of the stacked values with the single column of df2
CJ(df_long$value, df2[[1]])
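The same idea (stack the columns, then cross them with the second dataframe) can be sketched in base R with expand.grid. The data below and the column name code are hypothetical stand-ins; only loc1, loc2 and loc3 are taken from the answer above.

```r
# Hypothetical stand-ins for the two dataframes in the question.
df <- data.frame(loc1 = c("a", "b"), loc2 = c("c", "d"), loc3 = c("e", "f"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(code = c("x", "y", "z"), stringsAsFactors = FALSE)

# Stack the three columns of df into one vector, column by column.
values <- unlist(df, use.names = FALSE)

# Every pairing of a stacked value with a value from df2.
combos <- expand.grid(value = values, code = df2$code, stringsAsFactors = FALSE)

# Collapse each pair into one string to get a one-column result.
result <- data.frame(combo = paste(combos$value, combos$code, sep = "_"))
print(nrow(result))
```

With 6 stacked values and 3 codes this gives 18 rows. Whether the pairs should be pasted together or kept as two columns depends on the output the asker had in mind, which the question does not show.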

Aggregate rows across some columns using ID and keep others unchanged in a large R dataframe

I have a large dataframe (6000 rows x 42 columns) with an almost unique ID. A few IDs have multiple rows that differ only in 2 numerical columns, which I am happy to add up into 1 row.
I've spent ages looking, and aggregate seems to work; however, I need to list all the columns I am keeping, which is a pain. Can someone suggest a better solution? I am not wedded to aggregate.
NewDF <- aggregate(cbind(AddCol1, AddCol2) ~ ID + OtherCol1 + OtherCol2 + OtherCol3...OtherCol39, DF, sum)
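One possible way to avoid typing out every kept column is the dot shorthand in the aggregate formula, where . on the right-hand side stands for all columns not named on the left. The sketch below uses a small made-up DF with only one extra column; it assumes the other columns really are identical within each ID. Note that the formula method drops rows containing NA by default.

```r
# Hypothetical miniature of the 42-column dataframe from the question.
DF <- data.frame(
  ID = c(1, 1, 2),
  OtherCol1 = c("a", "a", "b"),
  AddCol1 = c(1, 2, 3),
  AddCol2 = c(10, 20, 30)
)

# Sum AddCol1 and AddCol2 within each combination of all remaining columns.
NewDF <- aggregate(cbind(AddCol1, AddCol2) ~ ., data = DF, FUN = sum)
print(NewDF)
```

Here the two rows for ID 1 collapse into one with AddCol1 = 3 and AddCol2 = 30, while ID 2 is left unchanged.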

Updating column values according to a specific combination of duplicates in R

I am still new to R and I am attempting to solve a seemingly simple problem. I would like to identify all of the unique combinations of values across several columns, and update an additional column in my df to annotate whether or not each row is unique.
Given a df with columns A-Z, I have used the following code to identify unique combinations of columns A, B, C, D, and E. I am trying to update column F with this information.
unique(df[, c("A", "B", "C", "D", "E")])
This returns each of the individual rows with unique combinations as expected, but I cannot figure out what next step I should take in order to update column "F" with a value to indicate that it is a unique row. Thanks in advance for any pointers!
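One way to get from the unique() call to a flag column is duplicated(), run in both directions so that every member of a repeated combination is marked. This is a sketch on made-up data, and it assumes "unique" means the combination of A to E occurs exactly once in the dataframe.

```r
# Hypothetical data: rows 1 and 2 share the same A-E combination.
df <- data.frame(
  A = c(1, 1, 2), B = c("x", "x", "y"), C = c(TRUE, TRUE, FALSE),
  D = c(0.5, 0.5, 0.7), E = c("p", "p", "q"),
  stringsAsFactors = FALSE
)
key_cols <- c("A", "B", "C", "D", "E")

# duplicated() alone skips the first occurrence, so also scan from the end.
repeated <- duplicated(df[, key_cols]) |
  duplicated(df[, key_cols], fromLast = TRUE)

# TRUE means the A-E combination appears only once.
df[["F"]] <- !repeated
print(df)
```

For this data F comes out as FALSE, FALSE, TRUE.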

difference between last row and row meeting condition dplyr

This is probably easy, but in a grouped data frame I'm trying to find the difference in diff.col between the last row and the row where var.col is B. The condition appears only once within each group. I'd like to make that difference a new variable using summarize from dplyr.
my.data <- data.frame(diff.col = 1:10, var.col = c(rep('A', 5), 'B', rep('A', 4)))
I'd like to keep this in dplyr, and I know how to code it except for selecting diff.col where var.col == 'B'.
my.data %>% summarize(new.var = last(diff.col) - ????)
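One way to fill in the ???? is to subset diff.col with a logical condition inside summarize. The sketch below uses the asker's own example data, which has no grouping column; for the grouped case a group_by() call on the (unnamed) grouping variable would go before summarize. It relies on the stated assumption that B occurs exactly once per group.

```r
library(dplyr)

my.data <- data.frame(
  diff.col = 1:10,
  var.col = c(rep("A", 5), "B", rep("A", 4))
)

# diff.col[var.col == "B"] picks the single value on the row where var.col is B.
result <- my.data %>%
  summarize(new.var = last(diff.col) - diff.col[var.col == "B"])

print(result)  # new.var is 10 - 6 = 4 for this data
```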

extract columns that don't have a header or name in R

I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the second two.
I know how to do it when the columns have names ~ plot(data$V1, data$V2) but in this case they do not. How do I access each column individually when they do not have names?
Thanks

Why not give them sensible names?
names(data) <- c("This", "That", "Other")
plot(data$This, data$That)
That's a better solution than using the column number, since names are meaningful, and if your data changes to have a different number of columns, code that relies on positions may break in several places. Give your data the correct names, and as long as you always refer to data$This, your code will work.

I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The 1st number in brackets refers to rows, the second to columns. Here, I didn't use a "1st number" so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it stems from matrix calculations. E.g., a 4x3 dimensional matrix has 4 rows and 3 columns. Thus when I want to select the 1st row of the third column, I could do something like matrix[1,3]
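The positional indexing described above can be tried on a small made-up table. The sketch reads three unnamed columns with read.table, which assigns the default names V1, V2 and V3 when header = FALSE; the values are hypothetical.

```r
# Hypothetical 3-column data without a header row.
dataset <- read.table(text = "1 10 100
2 20 200
3 30 300", header = FALSE)

print(names(dataset))     # default names: V1, V2, V3

first <- dataset[, 1]     # all rows of column 1
second <- dataset[[2]]    # column 2, list-style extraction
cell <- dataset[1, 3]     # row 1 of column 3
```

From there, plot(dataset[, 1], dataset[, 2]) plots the first column against the second without relying on any names.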
