This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have a data.frame as follows:
duration classlabel
100 W
120 1
390 2
30 3
30 2
150 3
30 4
60 3
60 4
30 3
120 4
30 3
120 4
In R, I need to generate a number of rows equal to each duration, each carrying the corresponding class label. For example, I need 100 rows with the class label 'W', then 120 rows with the class label '1', and so on.
Can anyone help me solve this problem?
One option is uncount from tidyr:
library(tidyr)
uncount(df1, duration, .remove = FALSE)
Or use rep from base R to repeat each row index 'duration' times, then subset the data frame with that expanded index:
df1[rep(seq_len(nrow(df1)), df1$duration),]
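To make the base R approach concrete, here is a minimal sketch on a small, made-up data frame (the three rows below are illustrative only, not the question's data). It assumes 'duration' holds non-negative whole numbers.

```r
# Hypothetical three-row sample in the same shape as the question's data.
df1 <- data.frame(duration   = c(3, 2, 1),
                  classlabel = c("W", "1", "2"),
                  stringsAsFactors = FALSE)

# Repeat each row index `duration` times, then subset by the expanded index.
idx <- rep(seq_len(nrow(df1)), times = df1$duration)
expanded <- df1[idx, ]

# Drop the 1, 1.1, 1.2, ... row names that repeated subsetting creates.
rownames(expanded) <- NULL

nrow(expanded)       # equals sum(df1$duration)
expanded$classlabel  # "W" "W" "W" "1" "1" "2"
```

The result has one row per unit of duration, in the original row order.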
This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 4 years ago.
I am trying to prepend "0000" to the values in a column of a df.
I tried using paste() in a loop, but it is very slow because I have more than 2,000,000 rows, so it takes forever.
Is there a smart, less performance heavy way to do it?
#DF:
CUSTID VALUE
103 12
104 10
105 15
106 12
... ...
#Desired result:
#DF:
CUSTID VALUE
0000103 12
0000104 10
0000105 15
0000106 12
... ...
How can this be achieved?
paste is vectorized, so it works on a whole vector of values (i.e. a column in a data frame) without a loop. The following should work:
DF <- data.frame(
CUSTID = 103:107,
VALUE = 13:17
)
DF$CUSTID <- paste0('0000', DF$CUSTID)
This should give you:
CUSTID VALUE
1 0000103 13
2 0000104 14
3 0000105 15
4 0000106 16
5 0000107 17
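Note that paste0 always prepends exactly four zeros, whatever the length of the ID. If the goal is instead a fixed total width (an assumption, since the question only shows three-digit IDs), sprintf is also vectorized and pads to that width. A minimal sketch with made-up IDs of varying length:

```r
# Hypothetical IDs of different lengths; stored as integers so "%d" applies.
DF <- data.frame(CUSTID = c(103L, 1042L, 7L),
                 VALUE  = c(12, 10, 15))

# "%07d" pads each integer with leading zeros to a total width of 7.
DF$PADDED <- sprintf("%07d", DF$CUSTID)

DF$PADDED  # "0000103" "0001042" "0000007"
```

Either way the result is a character column, so it no longer behaves as a number in arithmetic or numeric sorting.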
This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 4 years ago.
I am trying to divide all rows of my dataframe column by a number (say 10). I thought it to be a trivial problem until I tried it. In the example below, I am trying to get the 'mm' column to result in values 8100, 3222.2 and 5433.3
test <- data.frame(locations=c("81000","32222","54333"), value=c(87,54,43))
test$mm <- as.numeric(test$locations) / 10
head(test)
locations value mm
1 81000 87 0.3
2 32222 54 0.1
3 54333 43 0.2
What am I doing wrong?
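Your 'locations' column is a factor, and calling as.numeric directly on a factor returns the internal integer level codes rather than the labels. Convert the factor to character first, then apply as.numeric: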
> test$mm <- as.numeric(as.character(test$locations)) / 10
> test
locations value mm
1 81000 87 8100.0
2 32222 54 3222.2
3 54333 43 5433.3
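The difference between level codes and labels can be checked on a standalone factor. This is a small sketch using the same three values as the question; the variable names are illustrative.

```r
# A factor built from the question's three values.
locations <- factor(c("81000", "32222", "54333"))

# Levels are sorted as strings: "32222" "54333" "81000".
levels(locations)

# Direct conversion returns the position of each value in levels().
codes <- as.numeric(locations)                  # 3 1 2

# Going through character recovers the actual numbers.
values <- as.numeric(as.character(locations))   # 81000 32222 54333

# Equivalent alternative: convert the levels once, then index by the factor.
values2 <- as.numeric(levels(locations))[locations]

values / 10  # 8100.0 3222.2 5433.3
```

Dividing the codes 3, 1, 2 by 10 reproduces the 0.3, 0.1, 0.2 seen in the question's output.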
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have some data that looks like this:
ExpNum Compound Peak Tau SS
1 a 100 30 50
2 a 145 23 45
3 b 78 45 56
4 b 45 43 23
5 c 344 23 56
I'd like to find the mean of each column, grouped by Compound name.
What I have:
Norm_Table$Norm_Peak = (aggregate(data[[3]],by=list(Compound),FUN=normalization))
This works, but I have this code repeated 3 times, changing only the data[[x]] index. Would lapply work here, or a for loop?
A dplyr solution:
library(dplyr)
data %>%
group_by(Compound) %>%
summarise(across(-ExpNum, mean))  # summarize_each() and funs() are deprecated in current dplyr
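If you would rather stay in base R, a single aggregate call can also cover all three columns, so neither lapply nor a for loop is needed. A minimal sketch, assuming the data frame is rebuilt from the sample above under the hypothetical name dat and that a plain mean is the summary you want:

```r
# The question's sample data, rebuilt under the hypothetical name `dat`.
dat <- data.frame(
  ExpNum   = 1:5,
  Compound = c("a", "a", "b", "b", "c"),
  Peak     = c(100, 145, 78, 45, 344),
  Tau      = c(30, 23, 45, 43, 23),
  SS       = c(50, 45, 56, 23, 56)
)

# cbind() on the left-hand side summarises several columns in one call.
means <- aggregate(cbind(Peak, Tau, SS) ~ Compound, data = dat, FUN = mean)
means
```

The result has one row per Compound with the mean of Peak, Tau and SS; any other summary function can be swapped in for mean via FUN.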
This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 7 years ago.
How to sort a matrix based on columns and its values in R?
For example, I have a matrix like this:
ID Name Number
1 Bat 43
2 Apple 42
4 Dog 41
5 Ball 41
6 Cat 40
I want to sort the matrix by the values of the column Number. If two values are the same, it should then sort by the column Name. The expected output should be:
ID Name Number
6 Cat 40
5 Ball 41
4 Dog 41
2 Apple 42
1 Bat 43
Since Ball and Dog have the same value in the column Number, they are sorted by the column Name (that is, alphabetically). Can someone help me do this?
Using order, which takes the primary sort key first and uses later keys to break ties:
df[with(df, order(Number, Name)), ]
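As a quick check, here is the same call run on the question's sample data, assuming it is stored as a data frame named df. The descending variant at the end is an extra illustration, not something the question asked for.

```r
# The question's sample data as a data frame.
df <- data.frame(ID     = c(1, 2, 4, 5, 6),
                 Name   = c("Bat", "Apple", "Dog", "Ball", "Cat"),
                 Number = c(43, 42, 41, 41, 40),
                 stringsAsFactors = FALSE)

# Sort by Number, breaking ties by Name.
sorted <- df[order(df$Number, df$Name), ]
sorted$ID  # 6 5 4 2 1

# Negating a numeric key sorts that key in descending order.
sorted_desc <- df[order(-df$Number, df$Name), ]
sorted_desc$ID  # 1 2 5 4 6
```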
This question already has answers here:
Sum of rows based on column value
(4 answers)
Closed 8 years ago.
I have a dataframe as such:
Response Spent Saved
1 Yes 100 25
2 Yes 200 50
3 No 20 2
4 No 13 3
I would like to sum up the amounts Spent and Saved, depending on the Response, ie:
Response Spent Saved
1 Yes 300 75
2 No 33 5
Right now, I am using a hackneyed approach: I subset the dataframe into 2 new dataframes, convert the 2nd and 3rd columns into numeric data, do a colSums on each column individually, save the outputs into a vector, and then create a new dataframe. Suffice to say, it is a terrible approach.
How could I do this in a more effective manner?
Thanks for reading
Check ?aggregate
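If your data.frame is DF, the following should do what you want. The dot on the left-hand side of the formula means "all other columns", so both Spent and Saved are summed within each Response group.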
aggregate(. ~ Response, data = DF, FUN = sum)
## Response Spent Saved
## 1 No 33 5
## 2 Yes 300 75