This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 4 years ago.
I am trying to divide all rows of my dataframe column by a number (say 10). I thought it to be a trivial problem until I tried it. In the example below, I am trying to get the 'mm' column to result in values 8100, 3222.2 and 5433.3
test <- data.frame(locations=c("81000","32222","54333"), value=c(87,54,43))
test$mm <- as.numeric(test$locations) / 10
head(test)
locations value mm
1 81000 87 0.3
2 32222 54 0.1
3 54333 43 0.2
What am I doing wrong?
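Convert the factor to character first, then apply as.numeric. Calling as.numeric directly on a factor returns its internal integer level codes (here 3, 1 and 2), not the values shown in the labels.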
> test$mm <- as.numeric(as.character(test$locations)) / 10
> test
locations value mm
1 81000 87 8100.0
2 32222 54 3222.2
3 54333 43 5433.3
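A minimal sketch of why the direct conversion fails, assuming locations is stored as a factor. That was the default for strings in data.frame() before R 4.0.0; it is forced here with factor() so the behaviour reproduces on any version. The variable names are illustrative only.

```r
# Assumption: 'locations' is a factor (forced here with factor()).
locations <- factor(c("81000", "32222", "54333"))

# as.numeric() on a factor returns the integer level codes, not the labels.
codes <- as.numeric(locations)

# Going through character recovers the labels as numbers.
via_character <- as.numeric(as.character(locations))

# Alternative: convert the levels once, then index them by the factor.
via_levels <- as.numeric(levels(locations))[locations]

mm <- via_character / 10
```

Both conversions give the same numbers; the levels-based form only has to convert each distinct level once, which can matter on long vectors.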
This question already has answers here:
Limit result of subtraction to a minimum of zero
(1 answer)
Constrain lower limit of the result of a subtraction
(1 answer)
Closed 2 years ago.
I am a beginner at working with functions in R.
I would like help constructing a simple function for the example below:
database 1
a b
1 70
3 74
4 76
6 68
I would like to create a new column in this dataset with the following condition:
Column c: I want to generate values based on a threshold of 73. For each row, the value is column b minus 73 (for example, 70 - 73 in the first row), but if that difference is negative I want to put 0.
Like this:
database 2
a b c
1 70 0
3 74 1
4 76 3
6 68 0
Could someone please show me a function that does this?
Thanks!
You can try pmax
df$c <- pmax(df$b-73,0)
or
df$c <- (df$b-73)*(df$b>73)
or
df$c <- ifelse(df$b-73<0,0,df$b-73)
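Since the question asks for a function, the pmax approach could be wrapped as in the sketch below. The function name diff_above and its threshold argument are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: difference above a threshold, floored at zero.
diff_above <- function(x, threshold = 73) {
  pmax(x - threshold, 0)
}

# The example data from the question.
df <- data.frame(a = c(1, 3, 4, 6), b = c(70, 74, 76, 68))
df$c <- diff_above(df$b)
```

Because pmax is vectorized, the function works on a whole column at once and needs no loop.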
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have a data.frame as follows:
duration classlabel
100 W
120 1
390 2
30 3
30 2
150 3
30 4
60 3
60 4
30 3
120 4
30 3
120 4
I have to repeat each row as many times as its duration, keeping the class label. For example, I have to make 100 rows with the class label 'W', then 120 rows with the class label '1', and so on.
Can anyone help me solve this problem?
An option would be uncount
library(tidyr)
uncount(df1, duration, .remove = FALSE)
Or, in base R, use rep to repeat each row index as many times as its 'duration' value, then subset the data frame with that expanded index
df1[rep(seq_len(nrow(df1)), df1$duration),]
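A small sketch of the base R approach, using a shortened toy data frame (durations of 2, 3 and 1 are an assumption made here so the result is easy to inspect; the question's data has much larger values).

```r
# Toy data: much smaller durations than in the question, for illustration.
df1 <- data.frame(duration = c(2, 3, 1), classlabel = c("W", "1", "2"))

# Repeat each row index 'duration' times: 1 1 2 2 2 3
idx <- rep(seq_len(nrow(df1)), df1$duration)

# Subset by the expanded index, then reset the row names.
expanded <- df1[idx, ]
rownames(expanded) <- NULL
```

Resetting the row names is optional; without it R labels the repeated rows as 1, 1.1, 2, 2.1 and so on.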
This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 4 years ago.
I am trying to prepend "0000" to the values in a column of a data frame.
I tried to use paste() in a loop, but that is very slow because I have more than 2,000,000 rows.
Is there a smart, less performance heavy way to do it?
#DF:
CUSTID VALUE
103 12
104 10
105 15
106 12
... ...
#Desired result:
#DF:
CUSTID VALUE
0000103 12
0000104 10
0000105 15
0000106 12
... ...
How can this be achieved?
paste is vectorized, so it works on a whole vector of values (i.e. a column in a data frame) without a loop. The following should work:
DF <- data.frame(
CUSTID = 103:107,
VALUE = 13:17
)
DF$CUSTID <- paste0('0000', DF$CUSTID)
This should give you:
CUSTID VALUE
1 0000103 13
2 0000104 14
3 0000105 15
4 0000106 16
5 0000107 17
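If the goal is a fixed-width ID rather than a literal "0000" prefix, sprintf is another vectorized option. This is a sketch under the assumption that CUSTID holds integers and that a total width of 7 is wanted; neither is stated in the question.

```r
DF <- data.frame(CUSTID = 103:107, VALUE = 13:17)

# Fixed prefix: always adds exactly four zeros.
prefixed <- paste0("0000", DF$CUSTID)

# Fixed width (assumed to be 7): pads with as many zeros as needed.
padded <- sprintf("%07d", DF$CUSTID)

# The two differ once IDs have a different number of digits.
short_prefixed <- paste0("0000", 12L)
short_padded <- sprintf("%07d", 12L)
```

For three-digit IDs the two results are identical; for the two-digit ID the prefix version gives six characters and the padded version gives seven.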
This question already has answers here:
Order of occurance of the same value in a vector
(1 answer)
Adding an repeated index for factors in data frame
(4 answers)
R create ID within a group [duplicate]
(2 answers)
Closed 5 years ago.
Say I have a long-form data frame of time series data that looks like the one below. Somewhere along my conversion of the raw data the numbering got lost, so I'd like to get back a column of frame numbers (starting from 1 within each name).
The $frame column is my desired output.
Edit: Newly added NaN values in my example, see comments below. Also changed title of question to reflect this specifically.
name value frame
A 41 1
A NaN 2
A 72 3
B 24 1
B 51 2
C 28 1
C NaN 2
C 57 3
C NaN 4
C 34 5
D 24 1
D 75 2
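One possible way to rebuild the frame column in base R is ave with seq_along, sketched below. It assumes the rows are already in time order within each name. The numbering depends only on name, so the NaN entries in value do not affect it.

```r
df <- data.frame(
  name  = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "D", "D"),
  value = c(41, NaN, 72, 24, 51, 28, NaN, 57, NaN, 34, 24, 75)
)

# Number the rows 1, 2, 3, ... separately within each 'name' group.
df$frame <- ave(seq_along(df$name), df$name, FUN = seq_along)
```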
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
Hey, I have some data that looks like this:
ExpNum Compound Peak Tau SS
1 a 100 30 50
2 a 145 23 45
3 b 78 45 56
4 b 45 43 23
5 c 344 23 56
I'd like to find the mean based on Compound name.
What I have:
Norm_Table$Norm_Peak = (aggregate(data[[3]],by=list(Compound),FUN=normalization))
This works, but I have this code repeated 3 times, changing only the data[[x]] index. Would lapply work here? Or a for loop?
A dplyr solution:
library(dplyr)
data %>%
group_by(Compound) %>%
# summarize_each() and funs() are deprecated; across() (dplyr >= 1.0.0) replaces them
summarise(across(-ExpNum, mean))
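The same grouped summary can also be written in base R without repeating the call per column. This is a sketch using the example data from the question and mean as the summary function; the asker's own normalization function is not shown, so it is not used here.

```r
data <- data.frame(
  ExpNum   = 1:5,
  Compound = c("a", "a", "b", "b", "c"),
  Peak     = c(100, 145, 78, 45, 344),
  Tau      = c(30, 23, 45, 43, 23),
  SS       = c(50, 45, 56, 23, 56)
)

# cbind() on the left-hand side summarises several columns in one call.
means <- aggregate(cbind(Peak, Tau, SS) ~ Compound, data = data, FUN = mean)
```

Any function that reduces a numeric vector to a single value could be passed as FUN in place of mean.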