This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 7 years ago.
I am trying to calculate the mean salary for every job title in a data set that has 2159 job titles, and to store the result as a list. My code:
for (i in 1:length(unique(sfs$JobTitle))) {
  a <- print(paste((sfs$JobTitle[[i]])))
  b <- print(paste((mean(sfs$BasePay[[i]]))))
  ms <- list(a, b)
}
I also tried:
for (i in 1:length(unique(sfs$JobTitle))) { ms<-matrix((sfs$JobTitle[[i]]),(mean(sfs$BasePay[[i]]))) }
The output I get is a list of only 2 elements. Can you help? Thanks.
Perhaps you don't need a for loop. There are other ways to do it.
If you have a data.frame try this:
agg = aggregate(BasePay ~ JobTitle, data=sfs, mean)
This would work also:
sapply(split(sfs$BasePay, sfs$JobTitle), mean)
If you insist on having a list, use lapply:
lapply(split(sfs$BasePay, sfs$JobTitle), mean)
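To see the grouped approach return one mean per job title, here is a minimal sketch on a made-up data frame. The name `sfs_demo` and all of its values are invented for illustration; only the column names `JobTitle` and `BasePay` come from the question.

```r
# Toy stand-in for sfs; the values are invented for illustration only.
sfs_demo <- data.frame(
  JobTitle = c("Clerk", "Clerk", "Nurse", "Nurse", "Pilot"),
  BasePay  = c(100, 200, 300, 500, 900)
)

# split() groups BasePay by JobTitle; lapply() returns one mean per group as a list.
ms <- lapply(split(sfs_demo$BasePay, sfs_demo$JobTitle), mean)

length(ms)  # one element per unique job title
ms$Nurse    # mean of 300 and 500
```

The loop in the question overwrites `ms` on every iteration and indexes single rows rather than groups, which is why it ends with a list of only two elements.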
This question already has answers here:
How to remove columns with same value in R
(4 answers)
Closed 2 years ago.
I have a really large dataset and I want to filter out some of the columns because they contain the same value throughout (e.g. the company name is always "Walmart"). I could go through and remove these manually, but I'm looking for code that does it automatically.
I had in mind a function to subset based on whether sum(unique(colnam)) == 1, but I'm not sure how to get it to work. Thanks.
which(sapply(dat, function(col) length(unique(col)) == 1))
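The line above only reports which columns are constant. A minimal sketch of actually dropping them could look like this, assuming a data frame; `dat_demo` and its contents are invented for illustration.

```r
# Toy data; the column names and values are invented for illustration.
dat_demo <- data.frame(
  company = rep("Walmart", 4),
  store   = c(1, 2, 3, 4),
  region  = c("N", "S", "N", "S")
)

# TRUE for columns that hold a single distinct value.
is_constant <- sapply(dat_demo, function(col) length(unique(col)) == 1)

# Keep only the non-constant columns; drop = FALSE keeps a data frame
# even if just one column remains.
dat_clean <- dat_demo[, !is_constant, drop = FALSE]
names(dat_clean)
```

Note that `unique()` treats `NA` as a value of its own, so a column with one value plus some `NA`s is not considered constant here.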
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 4 years ago.
I have a CSV file with 4 columns. I would like to sum the values in the 4th column for rows where the 3rd column has the same value.
The data that I have looks like this:
Now I want to aggregate the above data like this:
Can anyone help me with ideas?
Using aggregate would do the trick here. Below I'm summing the value variable using id as the group (notice ids 6 and 10 are repeating).
df <- data.frame(id = c(1,2,3,4,5,6,6,7,8,9,10,10),
value = c(9,5,6,8,4,3,2,5,3,5,1,2))
df_sum <- aggregate(value ~ id, data=df, FUN=sum)
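Since the question refers to columns by position (3rd and 4th), the same idea can be sketched with positional indexing. The data frame `csv_demo` below is hypothetical and stands in for whatever `read.csv()` would return for the real file.

```r
# Hypothetical 4-column data standing in for the CSV contents.
csv_demo <- data.frame(
  a   = c("x", "y", "z", "w"),
  b   = c(1, 2, 3, 4),
  grp = c("A", "B", "A", "B"),
  amt = c(10, 20, 30, 40)
)

# Sum column 4 within each distinct value of column 3.
totals <- aggregate(csv_demo[[4]], by = list(group = csv_demo[[3]]), FUN = sum)

# aggregate() names the summed column "x" when given a bare vector; rename it.
names(totals)[2] <- "total"
totals
```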
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 5 years ago.
I have a data frame containing sales by day, but there are many values for each day, so I'd like to rebuild my data frame to have just one value per day: the sum of all the sales on that day.
What I have:
What I'd like to have:
(The values are not calculated correctly, but it's just for the example.)
I tried aggregate and similar functions, but it doesn't work and I don't know how to do this.
Thanks for the help.
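Aggregate should work (assuming the data frame is called df, the columns have to be referenced through it):
aggregate(df$TOT_OP_MAIN_PAID_MNT, by = list(DATE_ID = df$DATE_ID), FUN = sum)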
This should work
df <- data.frame(DATE_ID = 1, TOT_OP_MAIN_PAID_MNT = 1)
aggregate(TOT_OP_MAIN_PAID_MNT ~ DATE_ID, df, sum)
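With several rows per day the effect is easier to see. This is a minimal sketch with invented values; `sales_demo` is a hypothetical name, and only the two column names are taken from the question.

```r
# Invented sales data with repeated days; values are for illustration only.
sales_demo <- data.frame(
  DATE_ID = c(20200101, 20200101, 20200102, 20200102, 20200102),
  TOT_OP_MAIN_PAID_MNT = c(5, 7, 1, 2, 3)
)

# One row per DATE_ID, holding the sum of that day's amounts.
daily <- aggregate(TOT_OP_MAIN_PAID_MNT ~ DATE_ID, data = sales_demo, FUN = sum)
daily
```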
This question already has answers here:
How to find the highest value of a column in a data frame in R?
(10 answers)
Closed 7 years ago.
I have the following dataframe:
I would like to create a table of the column headers, in a column with their maximum value to the right of them. I will be looking to embed this in a Shiny App. Does anyone know how I can do this?
I used:
colMax <- function(data) sapply(data, max, na.rm=TRUE)
Then I called as.data.frame(colMax(data)) and it worked as needed.
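If the column names should appear as a column of their own rather than as row names, one possible sketch is the following. It assumes all columns are numeric; `data_demo` and `max_table` are invented names.

```r
# Invented numeric data; one column contains an NA to show na.rm in action.
data_demo <- data.frame(a = c(1, 5, 3), b = c(10, NA, 2), c = c(7, 8, 9))

# Same helper as above: the maximum of each column, ignoring NAs.
colMax <- function(data) sapply(data, max, na.rm = TRUE)

maxes <- colMax(data_demo)

# Two-column table: the header name on the left, its maximum on the right.
max_table <- data.frame(column = names(maxes), max = unname(maxes))
max_table
```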
Duplicate: How to find the highest value of a column in a data frame in R?
This question already has answers here:
Moving columns within a data.frame() without retyping
(17 answers)
Closed 9 years ago.
I'd like to reorganize my data frame: I just want to move the last column into first place and leave the rest in the same order. I used the subset function to do it. It works, but it would be painful if I had 100 columns or so.
Is there an easier way to do it?
tbl_comp <- subset(tbl_comp, select=c("Description","Meve_mean","Mmor_mean", "Mtot_mean", "tot_meanMe", "tot_meanMm", "tot_sdMe", "tot_sdMm", "Wteve_mean", "Wtmor_mean", "Wttot_mean", "tot_meanwte", "tot_meanwtm", "tot_sdwte", "tot_sdwtm"))
Try this
tbl_comp <- subset(tbl_comp, select=c(Description , Meve_mean:tot_sdwtm))
Alternatively, this will do the trick:
tbl_comp <- cbind(tbl_comp[ncol(tbl_comp)], tbl_comp[-ncol(tbl_comp)])
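An index-based variant of the same idea avoids typing any column names, which helps when there are many columns. This is a minimal sketch on an invented data frame (`tbl_demo` is a hypothetical name):

```r
# Invented data frame whose last column should move to the front.
tbl_demo <- data.frame(x1 = 1:2, x2 = 3:4, Description = c("a", "b"))

n <- ncol(tbl_demo)

# Column order: the last index first, then 1..(n - 1) in their original order.
tbl_demo <- tbl_demo[, c(n, seq_len(n - 1)), drop = FALSE]
names(tbl_demo)
```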