R: find row value based on value in another column without ifelse

I have a data.frame built up as follows:
a b c d column_name
1 2 3 4 a
2 3 4 1 b
3 4 1 2 c
4 1 2 3 d
Now I want to get, for each row, the value of the column whose name matches column_name. I built this with an ifelse like so:
df$value <- ifelse(df$column_name == "a", df$a,
            ifelse(df$column_name == "b", df$b,
            ifelse(df$column_name == "c", df$c,
            ifelse(df$column_name == "d", df$d, NA))))
However, this is neither pretty nor efficient, and with more than 4 possible columns it quickly becomes unmanageable.
Does anyone know a more efficient and elegant way? I tried apply(), but couldn't get it to work.

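We can create a column index by matching 'column_name' against the column names of the dataset (match(df$column_name, colnames(df))), cbind it with the row index (1:nrow(df)), and use the resulting two-column (row, column) matrix to extract one element per row from 'df'. df[-ncol(df)] drops the 'column_name' column first, so the extracted values stay numeric. Finally, assign (<-) the result to create the 'value' column.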
df$value <- df[-ncol(df)][cbind(1:nrow(df), match(df$column_name, colnames(df)))]
df$value
#[1] 1 3 1 3
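The same matrix-indexing idea can be written as a self-contained sketch. It rebuilds the question's data frame (stringsAsFactors = FALSE is an assumption) and, instead of df[-ncol(df)], selects the lookup columns by name, so it does not rely on column_name being the last column. If a row's column_name matches no column, match() returns NA and that row's value comes back as NA.

```r
# Rebuild the data frame from the question
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(2, 3, 4, 1),
                 c = c(3, 4, 1, 2),
                 d = c(4, 1, 2, 3),
                 column_name = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

value_cols <- c("a", "b", "c", "d")   # assumed set of lookup columns
m <- as.matrix(df[value_cols])        # numeric matrix of candidate values

# One (row, column) pair per row: row i, column named in column_name[i]
idx <- cbind(seq_len(nrow(df)), match(df$column_name, value_cols))

# Indexing a matrix with a two-column matrix picks exactly one cell per row
df$value <- m[idx]
df$value
# 1 3 1 3
```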

Related

To count how many times one row is equal to a value

I have a df here:
df <- data.frame('v1'=c(1,2,3,4,5),
'v2'=c(1,2,1,1,2),
'v3'=c(NA,2,1,4,'1'),
'v4'=c(1,2,3,NaN,5),
'logical'=c(1,2,3,4,5))
For each row, I would like to count how many of v1 to v4 are equal to the value of the variable 'logical', and store that count in a new variable 'count'.
I wrote a for loop like this:
attach(df)
df$count <- 0
for(i in colnames(v1:v4)){
if(df$logical == i){
df$count <- df$count+1}
}
But it doesn't work: the new variable 'count' is still all 0.
Please help me fix it.
The expected result should look like this:
df <- data.frame('v1'=c(1,2,3,4,5),
'v2'=c(1,2,1,1,2),
'v3'=c(NA,2,1,4,'1'),
'v4'=c(1,2,3,NaN,5),
'logical'=c(1,2,3,4,5),
'count'=c(3,4,2,2,2))
Many thanks from a beginner.
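We can use rowSums after creating a logical matrix that compares the first four columns with 'logical' (na.rm = TRUE makes the NA comparisons count as non-matches):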
df$count <- rowSums(df[1:4] == df$logical, na.rm = TRUE)
df$count
#[1] 3 4 2 2 2
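For comparison, the for loop from the question can also be repaired. It needs to loop over the column names (with attach, colnames(v1:v4) evaluates to NULL, so the original loop body never runs and count stays 0), compare the whole column with 'logical' element-wise (if() expects a single TRUE/FALSE, not a vector), and treat NA comparisons as non-matches. This is only a sketch of that fix; rowSums() above does the same work in one vectorised call.

```r
df <- data.frame('v1'=c(1,2,3,4,5),
                 'v2'=c(1,2,1,1,2),
                 'v3'=c(NA,2,1,4,'1'),
                 'v4'=c(1,2,3,NaN,5),
                 'logical'=c(1,2,3,4,5))

df$count <- 0
for (col in c("v1", "v2", "v3", "v4")) {
  # Element-wise comparison of the whole column with 'logical'
  hit <- df[[col]] == df$logical
  # NA/NaN comparisons give NA; count them as non-matches
  df$count <- df$count + (!is.na(hit) & hit)
}
df$count
# 3 4 2 2 2
```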
Personally, I think the solution by @akrun is so far the most elegant and efficient way to add the count column.
Another way (I'm not sure whether this is the kind of "elegance" you are looking for) to attach the count column to the end of df is to use within(), i.e.,
df <- within(df, count <- rowSums(df[1:4] == logical, na.rm = TRUE))
which gives:
> df
v1 v2 v3 v4 logical count
1 1 1 <NA> 1 1 3
2 2 2 2 2 2 4
3 3 1 1 3 3 2
4 4 1 4 NaN 4 2
5 5 2 1 5 5 2

Is there a way I can automate several which.max functions on a dataframe?

Say I have a dataframe of:
DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
>DF
V1 V2 V3
1 2 7 9
2 8 3 6
3 1 5 4
I know you can return the rowname of the highest value for a single column by using:
which.max(DF[,1])
But is it possible to use which.max to return the row name of the highest value in each column, without manually typing a which.max call for each column of the data frame?
Use sapply, which loops over the columns for you:
sapply(DF, which.max)
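which.max() returns a position, not a row name, so if the row names matter they can be looked up afterwards. A minimal sketch using the question's DF:

```r
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# sapply() applies which.max() to every column and keeps the column names
pos <- sapply(DF, which.max)
pos
# named integer vector: V1 = 2, V2 = 1, V3 = 1

# Translate the positions into row names
rownames(DF)[pos]
# "2" "1" "1"
```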

Assigning values to consecutive series in R

I hope you can help me with this issue I have.
I have a big data frame; simplified, it looks like this:
df <- data.frame(radius = c (2,3,5,7,4,6,9,8,3,7,8,9,2,4,5,2,6,7,8,9,1,10,8))
df$num <- c(1,2,3,4,5,6,7,8,9,10,11,1,12,13,1,14,15,16,17,18,19,1,1)
df
The column $num contains runs of consecutive numbers (1-11, 1, 12-13, 1, 14-19, 1, 1).
I would like to add a column that numbers each consecutive run in order. The outcome should look like this:
df$outcome <- c(1,1,1,1,1,1,1,1,1,1,1,2,3,3,4,5,5,5,5,5,5,6,7)
df
thanks a lot!
A.
We can take the differences between adjacent elements of 'num' with diff and check whether each one is not equal to 1; a difference other than 1 marks the start of a new run. The logical vector is one element shorter than 'num', so we pad it with a leading TRUE (the first element always starts a run) and take the cumsum to get the expected output.
df$outcome <- cumsum(c(TRUE,diff(df$num)!=1))
df$outcome
#[1] 1 1 1 1 1 1 1 1 1 1 1 2 3 3 4 5 5 5 5 5 5 6 7
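The same diff/cumsum logic can be wrapped in a small helper. run_id is a hypothetical name, and the sketch assumes the input contains no NA values (an NA would propagate through cumsum()).

```r
# Hypothetical helper: label runs of consecutive numbers (step of +1)
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  # TRUE wherever the step from the previous element is not +1,
  # i.e. wherever a new run starts
  breaks <- diff(x) != 1
  # The first element always starts a run; cumsum counts run starts so far
  cumsum(c(TRUE, breaks))
}

num <- c(1,2,3,4,5,6,7,8,9,10,11,1,12,13,1,14,15,16,17,18,19,1,1)
run_id(num)
# 1 1 1 1 1 1 1 1 1 1 1 2 3 3 4 5 5 5 5 5 5 6 7
```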

Sum of cells with same row and column name in R

I have a matrix created with the table() command in R, in which the rows and columns do not have the same values.
  0 1 2
1 1 2 3
2 4 5 6
3 7 7 8
How can I sum the elements whose row and column names are the same? In this example the result is 2 + 6 = 8.
Here's one approach:
# find the values present in both the row names and the column names
common <- do.call(intersect, unname(dimnames(x)))
# sum the cells whose row and column names are equal
sum(x[cbind(common, common)])
where x is your table.
Another approach, which compares the row name and column name of every cell:
sum(x[colnames(x)[col(x)] == rownames(x)[row(x)]])
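Both approaches can be checked on a reconstruction of the question's table. Building it with as.table() from a matrix with explicit dimnames is an assumption made here for a self-contained example; the original came from a table() call on unknown data.

```r
# Rebuild the example as a table (counts taken from the question)
x <- as.table(matrix(c(1, 2, 3,
                       4, 5, 6,
                       7, 7, 8),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("1", "2", "3"), c("0", "1", "2"))))

# Approach 1: character index matrix built from the shared names "1" and "2"
common <- intersect(rownames(x), colnames(x))
sum(x[cbind(common, common)])   # 2 + 6 = 8

# Approach 2: logical mask of cells whose row name equals their column name
mask <- colnames(x)[col(x)] == rownames(x)[row(x)]
sum(x[mask])                    # 8
```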

How to match 1 column to 2 columns?

I'm trying to match numbers from one column to numbers in two other columns. I can do this just fine when matching to only a single column, but have problems extending to two columns. Here is what I am doing:
I have 2 dataframes, df1:
number value
1
2
3
4
5
and df2:
number_a number_b value
3 NA 3
NA 1 5
5 NA 1
NA 4 2
NA 2 4
What I want to do is match column "number" from df1 to EITHER "number_a" or "number_b" in df2, then insert the corresponding "value" from df2 into "value" of df1, giving the resulting df1:
number value
1 5
2 4
3 3
4 2
5 1
My approach is to use
df1$value <- df2$value[match(df1$number, df2$number_a)]
or
df1$value <- df2$value[match(df1$number, df2$number_b)]
which yields, respectively, for df1
number value
1 NA
2 NA
3 3
4 NA
5 1
and
number value
1 5
2 4
3 NA
4 2
5 NA
However, I can't seem to fill in the whole "value" column in df1 this way. How can I match "number" against both "number_a" and "number_b" in one fell swoop? I tried
df1$value <- df2$value[match(df1$number, df2$number_a:number_b)]
but that didn't work.
Thanks!
An easier solution is to first combine number_a and number_b into a single column, then match against it:
df2$number <- ifelse(is.na(df2$number_a), df2$number_b, df2$number_a)
If you're not familiar with ifelse, it works with vectors in the form:
ifelse(Condition, ValueIfTrue, ValueIfFalse)
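Putting the two steps together gives a complete lookup. This sketch assumes the empty cells in the question's df2 are NA (as the is.na() test implies) and that exactly one of number_a and number_b is filled in each row.

```r
# Rebuild df1 and df2 from the question; NA marks the empty cells
df1 <- data.frame(number = 1:5)
df2 <- data.frame(number_a = c(3, NA, 5, NA, NA),
                  number_b = c(NA, 1, NA, 4, 2),
                  value    = c(3, 5, 1, 2, 4))

# Collapse the two key columns into one
df2$number <- ifelse(is.na(df2$number_a), df2$number_b, df2$number_a)

# Same match() lookup as in the question, now against the combined key
df1$value <- df2$value[match(df1$number, df2$number)]
df1$value
# 5 4 3 2 1
```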
I am a newbie to R (coming from several years with C). I was trying out the suggestions and thought I would paste what I came up with:
# Assuming exactly one of 'number_a' or 'number_b' is non-NA in each row,
# combine them into a new column 'number' and drop the original columns
df2 <- transform(df2, number = ifelse(is.na(number_a), number_b, number_a))[-c(1:2)]
# Combine the two data frames by the column 'number'
# (keeping only df1's 'number' so the result has a single 'value' column)
df <- merge(df1["number"], df2, by = "number")
number value
1 5
2 4
3 3
4 2
5 1
