aggregate over multiple columns [duplicate] - r

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have some data that looks like this:
ExpNum Compound Peak Tau SS
1 a 100 30 50
2 a 145 23 45
3 b 78 45 56
4 b 45 43 23
5 c 344 23 56
I'd like to find the mean of each column, grouped by Compound name.
What I have so far:
Norm_Table$Norm_Peak = (aggregate(data[[3]],by=list(Compound),FUN=normalization))
This works, but I repeat this line three times, changing only the data[[x]] index. Would lapply work here, or a for loop?

A dplyr solution:
library(dplyr)
data %>%
  group_by(Compound) %>%
  # across() supersedes the deprecated summarize_each()/funs()
  summarise(across(-ExpNum, mean))
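Where dplyr is not available, the same grouped means can be sketched in base R with the formula interface of aggregate. The data frame below is rebuilt from the sample in the question; the names dat and means are assumptions made for illustration.

```r
# Sample data rebuilt from the question (the name `dat` is illustrative)
dat <- data.frame(
  ExpNum   = 1:5,
  Compound = c("a", "a", "b", "b", "c"),
  Peak     = c(100, 145, 78, 45, 344),
  Tau      = c(30, 23, 45, 43, 23),
  SS       = c(50, 45, 56, 23, 56)
)

# cbind() on the left-hand side aggregates several columns in one call,
# so no loop over column indices is needed
means <- aggregate(cbind(Peak, Tau, SS) ~ Compound, data = dat, FUN = mean)
print(means)
```

Any other summary function, such as a custom normalization, can be passed as FUN in the same way.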

Related

creating multi rows depend on special conditions [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have data.frame as follows :
duration classlabel
100 W
120 1
390 2
30 3
30 2
150 3
30 4
60 3
60 4
30 3
120 4
30 3
120 4
I need to create a number of rows equal to each duration, each carrying that row's class label. For example, I need 100 rows with the class label 'W', then 120 rows with the class label '1', and so on.
Can anyone help me solve this problem?
One option is uncount from tidyr:
library(tidyr)
uncount(df1, duration, .remove = FALSE)
Or, in base R, use rep to repeat each row index as many times as its 'duration' value, then subset the data frame with that expanded index:
df1[rep(seq_len(nrow(df1)), df1$duration),]
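As a quick check of the base R approach, here is a small sketch on the first three rows of the question's data. The name expanded is an assumption for illustration.

```r
# First three rows of the question's data (the name `df1` follows the answer)
df1 <- data.frame(
  duration   = c(100, 120, 390),
  classlabel = c("W", "1", "2")
)

# Row index i is repeated df1$duration[i] times
expanded <- df1[rep(seq_len(nrow(df1)), df1$duration), ]

# One output row per unit of duration: 100 + 120 + 390
nrow(expanded)  # 610
table(expanded$classlabel)
```

The duration column is kept in the result, which matches what uncount does with .remove = FALSE.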

Least Absolute Deviation in R [duplicate]

This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 4 years ago.
I have a LIST of dataframes. Each dataframe has the same number of rows and columns.
Here is a sample dataframe:
df
TIME AMOUNT
20 456
30 345
15 122
12 267
Here is the expected RESULT:
I would like to compute an AMOUNT_NORM column in which
each value in the AMOUNT column is divided by the sum of all values in the AMOUNT column.
df
TIME AMOUNT AMOUNT_NORM
20 456 0.38
30 345 0.29
15 122 0.1
12 267 0.22
The following should do what you want
library(tidyverse)
df %>% mutate(AMOUNT_NORM = AMOUNT / sum(AMOUNT))
EDIT: I missed that you have a list of dataframes. In that case, apply the same step to each element:
lapply(your_df_list, function(x) {
  x %>% mutate(AMOUNT_NORM = AMOUNT / sum(AMOUNT))
})
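The same per-dataframe normalization can also be sketched without the tidyverse. The list df_list below is a hypothetical stand-in built from the sample dataframe in the question.

```r
# Sample dataframe from the question
df <- data.frame(TIME = c(20, 30, 15, 12), AMOUNT = c(456, 345, 122, 267))

# Hypothetical list of dataframes with identical structure
df_list <- list(df, df)

# Add AMOUNT_NORM to every dataframe: each amount divided by the column total
normalized <- lapply(df_list, function(x) {
  x$AMOUNT_NORM <- x$AMOUNT / sum(x$AMOUNT)
  x
})

round(normalized[[1]]$AMOUNT_NORM, 2)
```

Note that R is case-sensitive: the built-in function is sum, and SUM is not defined.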

R Concatenate column in data frame with one value/string [duplicate]

This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 4 years ago.
I am trying to concatenate some data in a column of a df, with "0000"
I tried to use paste() in a loop, but it is very slow because I have more than 2,000,000 rows.
Is there a smart, less performance heavy way to do it?
#DF:
CUSTID VALUE
103 12
104 10
105 15
106 12
... ...
#Desired result:
#DF:
CUSTID VALUE
0000103 12
0000104 10
0000105 15
0000106 12
... ...
How can this be achieved?
paste is vectorized, so it works directly on a vector of values (i.e. a column in a data frame) and no loop is needed. The following should work:
DF <- data.frame(
CUSTID = 103:107,
VALUE = 13:17
)
DF$CUSTID <- paste0('0000', DF$CUSTID)
This should give you:
CUSTID VALUE
1 0000103 13
2 0000104 14
3 0000105 15
4 0000106 16
5 0000107 17
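If the goal is a fixed-width ID rather than a fixed "0000" prefix, a sketch with sprintf may fit better. This assumes CUSTID holds whole numbers and that a total width of 7 is wanted; both are assumptions, not something stated in the question.

```r
DF <- data.frame(CUSTID = 103:107, VALUE = 13:17)

# "%07d" pads each integer with leading zeros to a total width of 7;
# sprintf is vectorized, so the whole column is handled in one call
DF$CUSTID <- sprintf("%07d", DF$CUSTID)
DF$CUSTID
```

Unlike paste0, this pads to a constant width, so a four-digit ID would receive three zeros instead of four.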

Random selection based on a variable in a R dataframe [duplicate]

This question already has answers here:
Take the subsets of a data.frame with the same feature and select a single row from each subset
(3 answers)
Closed 7 years ago.
I have a data frame with 1000 rows. It is a dataset of animals from different breeds, but some breeds have more animals than others. What I want to do is take a random sample from the larger breeds so that every breed has the same number of observations.
In detail: I have 400 Holstein animals, 300 Jersey, 100 Hereford, 150 Nelore and 50 Canchim. I want to randomly select 50 animals from each breed, so that I end up with 250 animals in total. I know how to select randomly using runif, but I am not sure how to apply that in my case.
My data looks like:
Breed ID Trait1 Trait2 Trait3
Holstein 1 11 22 44
Jersey 2 22 33 55
Nelore 3 33 44 66
Nelore 4 44 55 77
Canchim 5 55 66 88
I have tried:
Data = data[!!ave(seq_along(data$Breed), unique(data$Breed), FUN=function(x) sample(x, 50) == x),]
However, it does not work and I am not allowed to install the package dplyr in the server that I am using.
Thanks in advance.
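You can split your animals data frame on the breed, apply a custom function to each chunk that randomly extracts 50 rows, and then bind the chunks back together:
# One data frame per breed
animals.split <- split(animals, animals$Breed)
# Draw 50 rows at random (without replacement) from each breed
animals.list <- lapply(animals.split, function(x) {
  x[sample(nrow(x), 50), ]
})
# Recombine with rbind; unsplit() would fail here because the sampled
# chunks no longer match the length of animals$Breed
result <- do.call(rbind, animals.list)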
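A small, reproducible sketch of per-breed sampling in base R is shown below. The toy data, the seed, and the names n_per_breed, idx and balanced are all assumptions for illustration; the question's real data would use 50 per breed.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible

# Hypothetical stand-in for the real data, with unequal group sizes
animals <- data.frame(
  Breed = rep(c("Holstein", "Jersey", "Canchim"), times = c(8, 6, 3)),
  ID    = 1:17
)
n_per_breed <- 3  # the question uses 50

# Sample row indices within each breed, then subset the data frame once
idx <- unlist(lapply(
  split(seq_len(nrow(animals)), animals$Breed),
  function(i) i[sample.int(length(i), n_per_breed)]
))
balanced <- animals[idx, ]

table(balanced$Breed)
```

Sampling positions with sample.int avoids the surprise that sample(x, ...) treats a single number x as the range 1:x. The call raises an error if a breed has fewer rows than n_per_breed.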

Sum multiple columns [duplicate]

This question already has an answer here:
Summarizing multiple columns with data.table
(1 answer)
Closed 3 years ago.
I am trying to write a function that will sum the column(s) in the data frame according to the values in the first two columns. For example, I have a matrix M:
Crs gr P_7 P_8
38 1 3 16
38 1 12 45
38 1 9 28
40 2 3 9
40 2 14 29
40 1 4 3
40 2 8 2
I want to sum the columns grouped first by column 1 (Crs) and then by column 2 (gr). The result should be:
Crs gr P_7 P_8
38 1 24 89
40 2 25 40
40 1 4 3
Currently I am using,
M <- M[, list(sum(P_7),sum(P_8)), by=list(Crs,gr)]
The problem with this is that I have to name the columns explicitly, and the column names won't be fixed. So I was wondering how I can do this without naming the columns.
Thanks in advance!
You're looking for this:
M[, lapply(.SD, sum), by = list(Crs, gr)]
The package plyr has some magic for situations just like this. Use a combination of ddply and numcolwise, like this:
library(plyr)
ddply(dat, .(Crs, gr), numcolwise(sum))
results in:
Crs gr P_7 P_8
1 38 1 24 89
2 40 1 4 3
3 40 2 25 40
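For comparison, the same grouped sums can be sketched in base R without naming the value columns. The data frame below is rebuilt from the question's sample; the name sums is an assumption for illustration.

```r
# Sample data rebuilt from the question
M <- data.frame(
  Crs = c(38, 38, 38, 40, 40, 40, 40),
  gr  = c(1, 1, 1, 2, 2, 1, 2),
  P_7 = c(3, 12, 9, 3, 14, 4, 8),
  P_8 = c(16, 45, 28, 9, 29, 3, 2)
)

# "." stands for every column not named on the right-hand side,
# so any number of value columns is summed per (Crs, gr) group
sums <- aggregate(. ~ Crs + gr, data = M, FUN = sum)
print(sums)
```

The formula interface drops rows with missing values by default, which may or may not be wanted for real data.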
