This question already has answers here:
Using row-wise column indices in a vector to extract values from data frame [duplicate]
(2 answers)
Closed 3 years ago.
I'm looking to create a new variable, d, which takes its value from either a or b depending on the variable c.
dat = data.frame(a=1:10,b=11:20,c=rep(1:2,5))
The result would be:
d = c(1,12,3,14,... etc)
We can use row/column matrix indexing: the row index is the sequence of rows and the column index is the 'c' column. cbind the two into a two-column matrix and use it to extract the elements from the first two columns of the dataset.
dat$d <- dat[1:2][cbind(seq_len(nrow(dat)), dat$c)]
dat$d
#[1] 1 12 3 14 5 16 7 18 9 20
NOTE: This should also work when there are more than two columns to extract from, as long as 'c' holds the position of the column to use in each row.
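As a rough illustration of that note, here is a minimal sketch with three value columns. The data frame dat3 and the column names e and pick are made up for this example and are not part of the question.

```r
# Hypothetical data: three value columns and a selector column `pick`
dat3 <- data.frame(a = 1:6, b = 11:16, e = 21:26, pick = rep(1:3, 2))

# Two-column index matrix: (row number, column position to take)
idx <- cbind(seq_len(nrow(dat3)), dat3$pick)

# Index the value columns as a matrix, one element per row
dat3$d <- as.matrix(dat3[c("a", "b", "e")])[idx]
dat3$d
#[1]  1 12 23  4 15 26
```

Each row of idx picks exactly one cell, so the result has one value per row of the data.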
You can do
dat$d <- ifelse(dat$c==1,dat$a,dat$b)
A dplyr variant
library(dplyr)
dat %>%
  mutate(d = case_when(c == 1 ~ a,
                       TRUE ~ b))
This question already has answers here:
R Create column which holds column name of maximum value for each row
(4 answers)
Closed 1 year ago.
Say we have the following matrix,
x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
What I'm trying to do is:
1- Find the maximum value of each row. For this part, I'm doing the following,
df <- apply(X=x, MARGIN=1, FUN=max)
2- Then, I want to extract the column names of the maximum values and put them next to the values. Following the reproducible example, it would be "C" for the three rows.
Any assistance would be wonderful.
You can use apply like
maxColumnNames <- apply(x,1,function(row) colnames(x)[which.max(row)])
Since you have a numeric matrix, you can't add the names as an extra column (the whole matrix would be converted to a character matrix).
You can choose a data.frame and do
resDf <- cbind(data.frame(x),data.frame(maxColumnNames = maxColumnNames))
resulting in
resDf
  A B C maxColumnNames
X 1 4 7              C
Y 2 5 8              C
Z 3 6 9              C
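Since the question asks for the maximum values next to the column names, here is one possible sketch that builds both in a single data frame. It swaps in base R's max.col for the apply/which.max step; the names res, max_value and max_col are only illustrative.

```r
x <- matrix(1:9, nrow = 3, dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))

res <- data.frame(
  x,
  max_value = apply(x, 1, max),                              # row-wise maximum
  max_col   = colnames(x)[max.col(x, ties.method = "first")] # name of the column holding it
)
res
```

ties.method = "first" is chosen here so that ties resolve to the leftmost column, which matches what which.max does; the default for max.col picks among ties at random.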
This question already has answers here:
Sum elements of a vector beween zeros in R
(3 answers)
Closed 2 years ago.
I want to add values from a column. They go in sequence:
0,225,2352,34234,23442,23456,0,123,...
I want to add the values from 0 until the following 0 but not including the second.
For example, I want an output of
(0+225+2352+34234+23442+23456),(0+123+,...,),...
I want to store them as a new column of totals
One simple solution in base R is
sapply(split(x, cumsum(x == 0)), sum)
cumsum(x == 0) increases by one at every zero, so it gives each run of values starting at a zero its own group number. split then breaks x into those groups and sapply sums each one. The final result will be a named numeric vector.
Sample data
x <- c(0,225,2352,34234,23442,23456,0,123,2,0,1,42)
sapply(split(x, cumsum(x == 0)), sum)
# 1 2 3
# 83709 125 43
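The question also asks to store the totals as a new column. One way to sketch that, assuming the values live in a data frame column (df, grp and total are names invented for this example), is to reuse the same grouping with ave, which returns a value for every row:

```r
x <- c(0, 225, 2352, 34234, 23442, 23456, 0, 123, 2, 0, 1, 42)

df <- data.frame(x = x)
df$grp <- cumsum(df$x == 0)               # 1 1 1 1 1 1 2 2 2 3 3 3
df$total <- ave(df$x, df$grp, FUN = sum)  # group sum repeated on every row of the group
df
```

If only one total per group is needed, the sapply/split result above is the more compact form.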
This question already has answers here:
Convert row names into first column
(9 answers)
Closed 6 years ago.
I have the following data frame:
        RMSE
A 0.03655830
B 0.24513014
C 0.02009853
D 0.02223135
I want to move the row names (A, B, C, D) into the first column and add a numeric index to the data.frame.
Try this:
df <- cbind(newColName = rownames(df), df)
rownames(df) <- 1:nrow(df)
I hope this is what you meant. The result will be:
  newColName       RMSE
1          A 0.03655830
2          B 0.24513014
3          C 0.02009853
4          D 0.02223135
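For anyone who wants to run this end to end, here is a self-contained sketch that first rebuilds the data frame from the question. The object name out is an assumption, and rownames(out) <- NULL is used as a shorthand for resetting the index to 1, 2, 3, ...

```r
# Rebuild the example data with A..D as row names
df <- data.frame(RMSE = c(0.03655830, 0.24513014, 0.02009853, 0.02223135),
                 row.names = c("A", "B", "C", "D"))

out <- cbind(newColName = rownames(df), df)  # row names become the first column
rownames(out) <- NULL                        # reset to the default 1..n index
out
```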
This question already has answers here:
Find complement of a data frame (anti - join)
(7 answers)
Closed 7 years ago.
I would like to remove rows that have specific values for columns that match values in another data frame.
a<-c(1,1,2,2,2,4,5,5,5,5)
b<-c(10,10,22,30,30,30,40,40,40,40)
c<-c(1,2,1,2,2,2,2,1,1,2)
d<-rnorm(10)
data<-data.frame(a,b,c,d)
a<-c(2,5)
b<-c(30,40)
c<-c(2,1)
x<-data.frame(a,b,c)
So that y can become:
a b c d
1 10 1 -0.2509255
1 10 2 0.4142277
2 22 1 -0.1340514
4 30 2 -1.5372009
5 40 2 1.9001932
5 40 2 -1.2825212
I tried the following, which did not work:
y<-data[!data$a==a & !data$b==b & !data$c==c,]
y<-subset(data, !data$a==x$a & !data$b==x$b & !data$c==x$c)
I also tried to just flag the ones that should be removed in order to subset in a second step, but this did not work either:
y<-data
y$rm<-ifelse(y$a==x$a & y$b==x$b & y$c==x$c, 1, 0)
The real "data" and "x" are much longer, and there are variable number of rows in data that match each row in x.
We can use anti_join from dplyr. It returns all rows of 'data' that have no matching row in 'x'. The columns used for matching are given in the by argument.
library(dplyr)
anti_join(data, x, by=c('a', 'b', 'c'))
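The attempts in the question fail because data$a == x$a compares element by element, recycling the two values of x$a along the ten rows, instead of asking whether a row's (a, b, c) combination appears anywhere in x. For comparison, a base R sketch of the same anti-join idea is below. The helper key is made up for this example, the d column is left out for brevity, and it assumes the "|" separator never occurs inside the values.

```r
data <- data.frame(a = c(1, 1, 2, 2, 2, 4, 5, 5, 5, 5),
                   b = c(10, 10, 22, 30, 30, 30, 40, 40, 40, 40),
                   c = c(1, 2, 1, 2, 2, 2, 2, 1, 1, 2))
x <- data.frame(a = c(2, 5), b = c(30, 40), c = c(2, 1))

# Hypothetical helper: one string key per row, built from the matching columns
key <- function(df) paste(df$a, df$b, df$c, sep = "|")

# Keep only the rows whose key does not occur in x
y <- data[!key(data) %in% key(x), ]
y
```

This keeps rows 1, 2, 3, 6, 7 and 10 of data, the same six rows as the expected output in the question.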
This question already has answers here:
Extract matrix column values by matrix column name
(2 answers)
Closed 7 years ago.
In R I can access the data in a column of a matrix with the following:
mat2[,1]
Each column of mat2 has a name. How can I retrieve the data from the first column by using the name attribute instead of [,1]?
For example suppose my first column had the name "saturn". I want something like
mat2[,1] == mat2[saturn]
The following should do it:
mat2[,'saturn']
For example:
> x <- matrix(1:21, nrow=7, ncol=3)
> colnames(x) <- paste('name', 1:3)
> x[,'name 1']
[1] 1 2 3 4 5 6 7
Bonus information (adding to the first answer)
x[,c('name 1','name 2')]
would return two columns just as if you had done
x[,1:2]
And finally, the same operations can be used to subset rows
x[1:2,]
And if rows were named...
x[c('row 1','row 2'),]
Note the position of the comma within the brackets and with respect to the indices.
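To tie the pieces together, here is a small sketch that names both rows and columns and indexes by name on each side of the comma. The row names are invented for the example, and drop = FALSE is shown as an optional extra for keeping a single column as a matrix.

```r
x <- matrix(1:21, nrow = 7, ncol = 3)
colnames(x) <- paste('name', 1:3)
rownames(x) <- paste('row', 1:7)   # hypothetical row names

# Rows before the comma, columns after it
sel <- x[c('row 1', 'row 2'), 'name 2']
sel
# row 1 row 2
#     8     9

# A single column normally drops to a vector; drop = FALSE keeps a 7 x 1 matrix
col1 <- x[, 'name 1', drop = FALSE]
dim(col1)
```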