Extract specific columns based on a character value in a row - R

I have a data frame that contains characters in the first row, like this:
J K L M N O P
A T F T F F F T
B 14 15 10 2 3 4 78
C 10 47 15 9 6 12 12
D 17 44 17 1 0 15 11
E 3 12 14 3 2 15 17
I want to extract only the columns that contain the value "T" in row A,
so the result I want is this:
J L P
A T T T
B 14 10 78
C 10 15 12
D 17 17 11
E 3 14 17
Also, as a second step, I want to know how to do the same thing using two conditions. For example: extract all columns that contain the value "T" in row A and the value 17 in row D, so the result will be:
J L
A T T
B 14 10
C 10 15
D 17 17
E 3 14
Thank you

Here is your answer.
> df <- df[, df["A",] == "T" & df["D",] == 17]
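You can index the columns with a logical condition, and conditions can be combined with &. Add drop = FALSE if the result could be a single column; otherwise R returns a vector instead of a data frame.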
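A self-contained sketch of the same idea, assuming the table is read with read.table() so that the first column (A to E) becomes the row names. Every column then holds character strings, because row A contains "T"/"F", which is why the second condition compares against the string "17". unlist() turns the one-row data frame into a plain logical vector:

```r
# Rebuild the example table; the header has one field fewer than the data
# rows, so read.table() uses the first column (A to E) as row names.
txt <- "J K L M N O P
A T F T F F F T
B 14 15 10 2 3 4 78
C 10 47 15 9 6 12 12
D 17 44 17 1 0 15 11
E 3 12 14 3 2 15 17"
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# One condition: columns whose value in row A is "T"
keep_one <- unlist(df["A", ]) == "T"
one <- df[, keep_one, drop = FALSE]

# Two conditions combined with &: "T" in row A and 17 in row D
# (the columns are character, so compare with the string "17")
keep_two <- unlist(df["A", ]) == "T" & unlist(df["D", ]) == "17"
two <- df[, keep_two, drop = FALSE]

one
two
```

Here one keeps columns J, L and P, and two keeps J and L, matching the desired results in the question.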

Related

R: Change the name of a variable using a for loop

I have a data frame and vectors containing the names of variables. From each vector I calculate the sum of the variables it contains, and I want to put the result in a new variable with a different name.
Let's say I have three vectors:
>data
Name A B C D E
r1 1 5 12 21 15
r2 2 4 7 10 9
r3 5 15 6 9 6
r4 7 8 0 7 18
And I have these vectors, which are generated in a for loop and stored in the variable vec:
V1 <- c("A","B","C")
V2 <- c("B","D")
V3 <- c("D","E")
Edit 1:
These vectors are generated using a for loop, and I don't know in advance which vectors will be generated or which elements they will contain; here I'm just giving an example. I want to calculate the sum of the variables in each vector and store the result in a new variable in my data frame.
The issue is that I don't know how to give a new name to each variable created (the one that contains the sum for each vector):
data$column[j] <- rowSums(all_data_Second_program[,vec])
j <- j+1
I want to obtain this result, for example:
Name A B C Column1 D Column2 E Column3
r1 1 5 12 18 21 26 15 36
r2 2 4 7 13 10 14 9 19
r3 5 15 6 26 9 24 6 15
r4 7 8 0 15 7 15 18 25
But I didn't obtain this result.
Please tell me if you need any more information or clarification.
Can you please tell me how to do that?
Put the vectors in a list; then you can use rowSums inside lapply:
list_vec <- list(c("A","B","C"), c("B","D"), c("D","E"))
new_cols <- paste0('Column', seq_along(list_vec))
data[new_cols] <- lapply(list_vec, function(x) rowSums(data[x]))
data
# Name A B C D E Column1 Column2 Column3
#1 r1 1 5 12 21 15 18 26 36
#2 r2 2 4 7 10 9 13 14 19
#3 r3 5 15 6 9 6 26 24 15
#4 r4 7 8 0 7 18 15 15 25
We may use a for loop
for(i in 1:3) {
data[[paste0('Column', i)]] <- rowSums(data[get(paste0('V', i))],
na.rm = TRUE)
}
Output:
> data
Name A B C D E Column1 Column2 Column3
1 r1 1 5 12 21 15 18 26 36
2 r2 2 4 7 10 9 13 14 19
3 r3 5 15 6 9 6 26 24 15
4 r4 7 8 0 7 18 15 15 25
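For the situation in Edit 1, where the vectors come out of a loop one at a time, a minimal sketch of fixing the original attempt is to build each new column name as a string and assign it with [[ ]] instead of data$column[j]. The switch() call below is only a hypothetical stand-in for whatever loop body actually produces vec; drop = FALSE keeps rowSums() working even if a vector names a single column:

```r
data <- data.frame(Name = c("r1", "r2", "r3", "r4"),
                   A = c(1, 2, 5, 7), B = c(5, 4, 15, 8), C = c(12, 7, 6, 0),
                   D = c(21, 10, 9, 7), E = c(15, 9, 6, 18))

j <- 1
for (k in 1:3) {
  # Hypothetical stand-in for the real loop that generates `vec`
  vec <- switch(k, c("A", "B", "C"), c("B", "D"), c("D", "E"))
  # Build the new column name as a string and assign with [[ ]]
  data[[paste0("Column", j)]] <- rowSums(data[, vec, drop = FALSE])
  j <- j + 1
}
data
```

Because the name is computed from the counter, the number of vectors does not need to be known in advance.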

Replace a column in a data.table with another column using dynamic indexing

Similar to Replace a value in a datatable by giving the column index,
I'd like to replace a column in a data.table with another column in the same data.table using column indexes only (yes, notwithstanding the fact that this is generally not good practice; in my case, it is the only way).
DT <- data.table(A=1:5, B=6:10, C=10:14)
and I want
DT[, A:=C]
but without using the names A and C, only their index numbers 1 and 3.
Edit: I need to elaborate a bit more on my use case. I have multiple columns that need to be replaced by multiple other columns. The replacements are indicated by two columns in the data.table.
DT <- data.table(A=1:5
, B=6:10
, C=10:14
, D=15:19
, E=20:24
, F=25:29
, G=c(1,2,NA,NA,NA)
, H=c(3,4,NA,NA,NA))
> DT
A B C D E F G H
1: 1 6 10 15 20 25 1 3 # --> column 1 (A) should be replaced by column 3 (C)
2: 2 7 11 16 21 26 2 4 # --> column 2 (B) should be replaced by column 4 (D)
3: 3 8 12 17 22 27 NA NA
4: 4 9 13 18 23 28 NA NA
5: 5 10 14 19 24 29 NA NA
Column G indicates the columns that need to be replaced. Column H indicates the columns that should replace those indicated in column G. I am dealing with a data.table of a few thousand columns, and I know the names of columns G and H, so they don't need to be dynamic.
Desired outputs:
> desired_output1:
A B C D E F G H
1: 10 15 10 15 20 25 1 3 #all of column A was replaced by column C
2: 11 16 11 16 21 26 2 4 #all of column B was replaced by column D
3: 12 17 12 17 22 27 NA NA
4: 13 18 13 18 23 28 NA NA
5: 14 19 14 19 24 29 NA NA
> desired_output2:
A B C D E F G H
1: 10 6 10 15 20 25 1 3 # col A for this row was replaced by col C
2: 2 16 11 16 21 26 2 4 # col B for this row was replaced by col D
3: 3 8 12 17 22 27 1 2
4: 4 9 13 18 23 28 NA NA
5: 5 10 14 19 24 29 NA NA
Well, I don't think there is any really elegant way to accomplish this other than looping over the assignment. Basically, you use DT[["G"]][i] for the i-th column to be replaced and DT[["H"]][i] for the replacement column, using list notation. In data.table you can refer to the column to be replaced by its number, but to get the replacement values you need DT[[DT[["H"]][i]]], which for i=1 is DT[[3]]. Putting everything together inside an lapply loop gives the following:
lapply(seq_along(na.omit(DT[["G"]])),function(i) DT[,DT[["G"]][i]:=DT[[DT[["H"]][i]]]])
Since columns G and H will either both contain values or both be NA, you can use either one as the index in lapply; I chose G. However, make sure that the NA values are at the end of the columns, or seq_along will give you bad indices when executing the loop. Based on your description, I assume that this will be the case.
Since you don't care about the list produced by lapply and are only using it as a loop, you can suppress the output to the console (which may get annoying if you have thousands of columns to change) by wrapping the above in invisible() if you wish:
invisible(lapply(seq_along(na.omit(DT[["G"]])),function(i) DT[,DT[["G"]][i]:=DT[[DT[["H"]][i]]]]))
Hope this helps some!
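An alternative sketch, not taken from the answer above, uses data.table's set() in a plain for loop over which(!is.na(G)), which removes the requirement that the NA values come last. It covers both desired outputs: whole-column replacement and the row-wise variant. copy() is used so the two results don't modify the same table and so the assigned column is not shared with its source column:

```r
library(data.table)

DT <- data.table(A = 1:5, B = 6:10, C = 10:14, D = 15:19, E = 20:24, F = 25:29,
                 G = c(1, 2, NA, NA, NA), H = c(3, 4, NA, NA, NA))

# Rows where G/H hold a replacement instruction (NAs may be anywhere)
todo <- which(!is.na(DT$G))

# desired_output1: replace whole column G[r] with column H[r]
DT1 <- copy(DT)
for (r in todo) {
  g <- as.integer(DT1$G[r])
  h <- as.integer(DT1$H[r])
  set(DT1, j = g, value = copy(DT1[[h]]))
}

# desired_output2: replace only the cell in row r
DT2 <- copy(DT)
for (r in todo) {
  g <- as.integer(DT2$G[r])
  h <- as.integer(DT2$H[r])
  set(DT2, i = r, j = g, value = DT2[[h]][r])
}
DT1
DT2
```

set() skips the overhead of `[.data.table`, which may matter when the loop runs over thousands of columns. If a replaced column is later used as a source, the result depends on the order of the rows in G and H.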

Reordering selected columns of a data.table in R

I have a data.table dt which contains 500 columns. I need to pick 6 columns, say (a, c, k, m, n, o), from the data.table and put them at the start of the data.table.
Is there any way of doing this ?
We can create a vector of the columns of interest ('nm1'), then concatenate it with the column names that are not in 'nm1' (using setdiff). In data.table, to select columns with a character vector, we use with = FALSE.
nm1 <- c('a', 'c', 'k', 'm', 'n', 'o')
dt[, c(nm1, setdiff(names(dt), nm1)), with=FALSE]
Another option is setcolorder, but the above method is more convenient in that it does not change the column order of the original dataset (setcolorder reorders by reference).
NOTE: No external packages used.
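A short sketch, assuming only data.table (no SOfun), that contrasts the two options mentioned in this answer: the with = FALSE subset returns a reordered copy, while setcolorder() reorders dt itself by reference. The 26-column dt is a small stand-in for the 500-column table:

```r
library(data.table)

# Small stand-in for the 500-column table: one row, columns a..z
dt <- as.data.table(as.list(setNames(1:26, letters)))
nm1 <- c("a", "c", "k", "m", "n", "o")
new_order <- c(nm1, setdiff(names(dt), nm1))

# Option 1: returns a reordered copy; dt keeps its original order
reordered <- dt[, new_order, with = FALSE]

# Option 2: reorders dt in place (by reference)
setcolorder(dt, new_order)
names(dt)
```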
Whether dealing with data.frames or data.tables, I would suggest loading "data.table" and using setcolorder.
Paired up with moveMe from my "SOfun" package, you have a very flexible means of reordering columns.
Loading packages and creating sample data:
library(SOfun)
library(data.table)
DT <- as.data.table(as.list(setNames(1:26, letters)))
DF <- setDF(copy(DT))
DT
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
DF
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Moving columns:
setcolorder(DT, moveMe(names(DT), "a,c,k,m,n,o first"))
DT
# a c k m n o b d e f g h i j l p q r s t u v w x y z
# 1: 1 3 11 13 14 15 2 4 5 6 7 8 9 10 12 16 17 18 19 20 21 22 23 24 25 26
setcolorder(DF, moveMe(names(DF), "a,c,k,m,n,o first"))
DF
# a c k m n o b d e f g h i j l p q r s t u v w x y z
# 1 1 3 11 13 14 15 2 4 5 6 7 8 9 10 12 16 17 18 19 20 21 22 23 24 25 26
Beyond "first", you also have "last", "before", and "after".
setcolorder(DF, moveMe(names(DF), "a,c,k,m,n,o first; l,e,q,r,w last"))
DF
# a c k m n o b d f g h i j p s t u v x y z l e q r w
# 1 1 3 11 13 14 15 2 4 6 7 8 9 10 16 19 20 21 22 24 25 26 12 5 17 18 23

Combining a list of named vectors without mangling the names

How do I combine a list of named vectors? I need to split a vector of integers (with characters for names) for use with parallel::parSapply() and combine them back again. Example code:
text <- 1:26
names(text) <- letters
n <- 4
text <- split(text, cut(1:length(text),breaks=n,labels=1:n))
# text <- parSapply(..., text, ...) would go here in the actual code
However, the names get mangled when I use unlist to convert the data back into a named vector:
> unlist(text)
1.a 1.b 1.c 1.d 1.e 1.f 1.g 2.h 2.i 2.j 2.k 2.l 2.m 3.n 3.o 3.p 3.q 3.r 3.s 4.t 4.u 4.v
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
4.w 4.x 4.y 4.z
23 24 25 26
What I'm looking for is the following result (except that it should work with any value of n):
> c(text[[1]],text[[2]],text[[3]],text[[4]])
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
One option, without changing the structure of 'text', would be to replace the names of the vector (unlist(text)) with the names of the objects within the list elements.
setNames(unlist(text), unlist(sapply(text, names)))
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Or, if it is okay to remove the names of the 'text' object, set the names of 'text' to NULL and then unlist:
unlist(setNames(text, NULL))
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
You can remove the names of the list elements first; then no compound names will be created.
> names(text) <- NULL
> do.call(c, text)
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Same as
> unlist(text)
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Or, as @RichardScriven pointed out in the comments, you can do it as follows without removing the names in the source variable: do.call("c", c(text, use.names = FALSE))
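Putting the second approach together as a self-contained sketch: unname() drops only the list-level names ("1" to "4") that unlist() would otherwise use as prefixes, so the names of the inner vectors come through unchanged for any n. The parSapply() step is left as a comment, as in the question:

```r
text <- 1:26
names(text) <- letters
n <- 4
chunks <- split(text, cut(seq_along(text), breaks = n, labels = 1:n))
# chunks <- parSapply(..., chunks, ...) would go here in the actual code

# Drop the list-level names, then concatenate; inner names are kept
combined <- unlist(unname(chunks))
combined
```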

Matrix addition with multiple unique identifiers

I am trying to add elements from two different matrices. Each matrix has three unique identifiers, as below:
Matrix A:
A B C D E F G H
1 x 1 2 10 11 12 13 10
2 y 1 2 11 12 14 12 13
3 y 1 3 12 10 11 12
The second matrix looks like:
A B C D E F G H
1 x 1 2 20 14 17 10 10
2 y 1 2 11 12 14 12 13
3 y 1 3 17 10 19 12
Please note that the variables A, B, and D form unique identifiers for each of the participants.
I would like to write code that takes these identifiers into account when summing the matrix values.
You should put your data in long format.
library(reshape2)
dat.l <- melt(dat, id.vars = c('A','B','D'))
dat1.l <- melt(dat1, id.vars = c('A','B','D'))
Then you just sum the value columns (this assumes both long tables list the rows in the same order):
dat.l$value = dat.l$value + dat1.l$value
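The positional sum above only works if both tables list the identifiers in exactly the same order. A sketch that matches rows on the identifiers instead, using base merge(); the two small data frames below are hypothetical (identifiers A, B, D plus value columns E and F), with the second one deliberately in a different row order:

```r
# Hypothetical inputs: A, B, D identify a participant; E and F are values
dat  <- data.frame(A = c("x", "y", "y"), B = c(1, 1, 1), D = c(2, 2, 3),
                   E = c(10, 11, 12), F = c(11, 12, 10))
dat1 <- data.frame(A = c("y", "x", "y"), B = c(1, 1, 1), D = c(3, 2, 2),
                   E = c(17, 20, 11), F = c(10, 14, 12))

ids <- c("A", "B", "D")
value_cols <- setdiff(names(dat), ids)

# Match rows on the identifiers; shared value columns get suffixes .1 / .2
m <- merge(dat, dat1, by = ids, suffixes = c(".1", ".2"))
for (v in value_cols) {
  m[[v]] <- m[[paste0(v, ".1")]] + m[[paste0(v, ".2")]]
}
result <- m[c(ids, value_cols)]
result
```

By default merge() keeps only identifier combinations present in both tables; all = TRUE keeps the rest, with NA on the missing side.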
