Select column dynamically based on value from another column in R [duplicate] - r

This question already has answers here:
Select values from different columns based on a variable containing column names [duplicate]
(2 answers)
Closed 4 years ago.
How do I use column in data table as variable name to fetch values from other columns based on the said column.
library(data.table)
a = c(2,3,5)
b = c(5,7,7)
c = c(1,2,3)
x = c ('a','b','c')
dt <- data.table(a,b,c,x)
> dt
a b c x
1: 2 5 1 a
2: 3 7 2 b
3: 5 7 3 c
output I desire column y which is based on values of column x which contains the column names of values to be fetched.
dt
a b c x y
1: 2 5 1 a 2
2: 3 7 2 b 7
3: 5 7 3 c 3
I tried
dt[,get(x)]
dt[,match(x,colnames(dt))]

By looping through the sequence of rows, extract the value with get and assign it to create 'y'
dt[, y := .SD[, get(x), seq_len(.N)]$V1]
dt
# a b c x y
#1: 2 5 1 a 2
#2: 3 7 2 b 7
#3: 5 7 3 c 3

Related

Repeat rows with a variable in r [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have a data.frame with n rows and I would like to repeat this rows according to the observation of another variable
This is an example for a data.frame
df <- data.frame(a=1:3, b=letters[1:2])
df
a b
1 1 a
2 2 b
3 3 c
And this one is an example for a variable
df1 <- data.frame(x=1:3)
df1
x
1 1
2 2
3 3
In the next step I would like to repeat every row from the df with the observation of df1
So that it would look like this
a b
1 1 a
2 2 b
3 2 b
4 3 c
5 3 c
6 3 c
If you have any idea how to solve this problem, I would be very thankful
You simply can repeat the index like:
df[rep(1:3,df1$x),]
# a b
#1 1 a
#2 2 b
#2.1 2 b
#3 3 c
#3.1 3 c
#3.2 3 c
or not fixed to size 3
df[rep(seq_along(df1$x),df1$x),]

In data.table in R, how can we create an sequenced indicator variable by the values of two columns? [duplicate]

This question already has answers here:
data.table "key indices" or "group counter"
(2 answers)
Create a new data frame column based on the values of two other columns
(2 answers)
Closed 4 years ago.
In the data.table package in R, for a given data table, I am wondering how an indicator index can be created for the values that are the same in two columns. For example, for the following data table,
> M <- data.table(matrix(c(2,2,2,2,2,2,2,5,2,5,3,3,3,6), ncol = 2, byrow = T))
> M
V1 V2
1: 2 2
2: 2 2
3: 2 2
4: 2 5
5: 2 5
6: 3 3
7: 3 6
I would like to create a new column that essentially orders the values that are the same for each row of the two columns, so that I can get something like:
> M
V1 V2 Index
1: 2 2 1
2: 2 2 1
3: 2 2 1
4: 2 5 2
5: 2 5 2
6: 3 3 3
7: 3 6 4
I essentially would like to repeat values of .N above, is there a nice way to do it?
We can use .GRP after grouping by 'V1' and 'V2'
M[, Index := .GRP, .(V1, V2)]

Sort a data.table programmatically using character vector of multiple column names

I need to sort a data.table on multiple columns provided as character vector of variable names.
This is my approach so far:
DT = data.table(x = rep(c("b","a","c"), each = 3), y = c(1,3,6), v = 1:9)
#column names to sort by, stored in a vector
keycol <- c("x", "y")
DT[order(keycol)]
x y v
1: b 1 1
2: b 3 2
Somehow It displays just 2 rows and removes other records. But if I do this:
DT[order(x, y)]
x y v
1: a 1 4
2: a 3 5
3: a 6 6
4: b 1 1
5: b 3 2
6: b 6 3
7: c 1 7
8: c 3 8
9: c 6 9
It works like fluid.
Can anyone help with sorting using column name vector?
You need ?setorderv and its cols argument:
A character vector of column names of x by which to order
library(data.table)
DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
#column vector
keycol <-c("x","y")
setorderv(DT, keycol)
DT
x y v
1: a 1 4
2: a 3 5
3: a 6 6
4: b 1 1
5: b 3 2
6: b 6 3
7: c 1 7
8: c 3 8
9: c 6 9
Note that there is no need to assign the output of setorderv back to DT. The function updates DT by reference.

split a dataframe with numbers separated by the add sign '+' into new rows [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
Sorry for the naive question but I have a dataframe like this:
n sp cap
1 1 a 3
2 2 b 3+2+4
3 3 c 2
4 4 d 1+5
I need to split the numbers separated by the add sign ("+") into new rows in order to the get a new dataframe like this below:
n sp cap
1 1 a 3
2 2 b 3
3 2 b 2
4 2 b 4
5 3 c 2
6 4 d 1
7 4 d 5
How can I do that? strsplit?
thanks in advance
We could use cSplit from splitstackshape
library(splitstackshape)
cSplit(df1, 'cap', sep="+", 'long')
# n sp cap
#1: 1 a 3
#2: 2 b 3
#3: 2 b 2
#4: 2 b 4
#5: 3 c 2
#6: 4 d 1
#7: 4 d 5
Or could do this in base R. Use strsplit to split the elements of "cap" column to substrings, which returns a list (lst), Replicate the rows of dataset by the length of each list element, subset the dataset based on the new index, convert the "lst" elements to "numeric", unlist, and cbind with the modified dataset.
lst <- strsplit(as.character(df1$cap), "[+]")
df2 <- cbind(df1[rep(1:nrow(df1), sapply(lst, length)),1:2],
cap= unlist(lapply(lst, as.numeric)))

R delete non max values in redundant rows

I have a matrix that contains following:
A B C D
a 1 3 2 5
b 3 2 5 8
a 2 1 0 9
a 4 2 1 3
c 4 3 1 1
b 2 5 1 9
A, B, C, D are column names and
a, b, c, d are row names.
I want to make it look like
A B C D
a 4 3 2 9
b 3 5 5 9
c 4 3 1 1
using R, Which is to
1) order the row in alphabetical order,
2) and then if there are redundant rows (i.e. there are other rows with the same row name), pick a maximum value among the redundant rows for each column and delete the others.
I first used python to do this process, but I was wondering if there is
more convenient way for this job in R.
I would appreciate any help.
You can use data.table
dt_in <- data.table(matrix_in)
dt_in[, name := rownames(matrix_in)]
dt_max <- dt_in[, list(A = max(A), B = max(B), C = max(C), D = max(D)), by = "name"]
as.matrix(data.frame(dt_max))
Here's a one liner using data.table you can keep the rows while converting to data.table and then apply max function over all columns using lapply(.SD,...) by the rn variable (the saved row names)
library(data.table)
data.table(m, keep.rownames = TRUE)[, lapply(.SD, max), by = rn]
# rn A B C D
# 1: a 4 3 2 9
# 2: b 3 5 5 9
# 3: c 4 3 1 1
You can simply use aggregate function:
aggregate(matrix ~ rownames(matrix), matrix, max)

Resources