Sorting data frame by character string

I have a data frame and need to sort its columns by a character string.
I tried it like this:
# character string
a <- c("B", "E", "A", "D", "C")
# data frame
data <- data.frame(A = c(0, 0, 1), B = c(1, 1, 1), C = c(1, 0, 1), D = c(0, 0, 1), E = c(0, 1, 1))
data
# A B C D E
# 1 0 1 1 0 0
# 2 0 1 0 0 1
# 3 1 1 1 1 1
# sorting
data.sorted <- data[, order(a)]
# order of characters in data
colnames(data.sorted)
# [1] "C" "A" "E" "D" "B"
However, the column order in the sorted data frame does not match the order of the characters in the original character string.
Is there a way to sort the columns so that it does?

The function order(a) returns, for each rank, the position in the vector a at which the value with that rank lies. So, since "A" (ranked first) lies in the third position of a, order(a)[1] is equal to 3. Similarly, "C" (ranked third) lies in the fifth position of a, so order(a)[3] equals 5.
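To make this concrete, here is a small sketch using the vector and data frame from the question. The helper name `pos` is only illustrative; the values in the comments are what base R should return for these inputs.

```r
# Vector and data frame from the question
a <- c("B", "E", "A", "D", "C")
data <- data.frame(A = c(0, 0, 1), B = c(1, 1, 1), C = c(1, 0, 1),
                   D = c(0, 0, 1), E = c(0, 1, 1))

# order() gives the positions that would sort a alphabetically
pos <- order(a)            # 3 1 5 4 2
sorted_a <- a[pos]         # "A" "B" "C" "D" "E"

# Using those positions as column indices picks columns 3, 1, 5, 4, 2,
# which is why the attempt in the question ends up with C A E D B
picked <- names(data)[pos] # "C" "A" "E" "D" "B"
```

So order(a) describes how to sort a itself; it says nothing about where the columns named in a sit inside data.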
Luckily, the solution is actually even simpler, thanks to the way R indexes with brackets. If you ask for just the column named "B", you get:
> data[, "B", drop=FALSE]
B
1 1
2 1
3 1
Or if you want two specific columns
> data[, c("B", "E")]
B E
1 1 0
2 1 1
3 1 1
And finally, more generally, if you have a whole vector by which you want to order your columns, then you can do that, too:
> data.sorted <- data[, a]
> data.sorted
B E A D C
1 1 0 0 0 1
2 1 1 0 0 0
3 1 1 1 1 1
> all(colnames(data.sorted)==a)
[1] TRUE
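One caveat: indexing by name fails with an "undefined columns selected" error if a contains a name that is not a column of data. A minimal sketch of a guard, assuming you simply want to skip such names ("Z" below is a hypothetical extra name, and `keep` is an illustrative helper):

```r
# "Z" is a hypothetical name that has no matching column
a <- c("B", "E", "A", "D", "C", "Z")
data <- data.frame(A = c(0, 0, 1), B = c(1, 1, 1), C = c(1, 0, 1),
                   D = c(0, 0, 1), E = c(0, 1, 1))

# intersect() keeps the order of its first argument,
# so the columns still follow the order given in a
keep <- intersect(a, names(data))
data.sorted <- data[, keep, drop = FALSE]
```

Whether silently dropping unknown names is acceptable depends on the use case; stopping with an error may be the safer default.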

string[] str = { "H", "G", "F", "D", "S","A" };
Array.Sort(str);
for (int i = 0; i < str.Length; i++)
{
Console.WriteLine(str[i]);
}
Console.ReadLine();

Related

Find the number of items row wise with an exception

I have a set of the following form:-
a <- data.frame(X1=c("A", "B", "C", "D", "0"),
X2=c("B", "A", "D", "E", "A"),
X3=c("0", "0", "B", "A", "0"),
X4=c("A", "0", "A", "0", "0")
)
# a
# X1 X2 X3 X4
# A B 0 A
# B A 0 0
# C D B A
# D E A 0
# 0 A 0 0
What I want to know is how many items there are in each row, excluding "0", and to save the counts in a new column. The expected output is:
# b
# 3
# 2
# 4
# 3
# 1
Duplicates should be counted separately, i.e., if a row consists of 2 "A", 1 "B" and a "0", it should return 3. Thanks in advance.

We can compare the data frame with 0 and use rowSums to count the entries other than 0 in each row.
rowSums(a != 0)
#[1] 3 2 4 3 1
Although it is not needed here (since rowSums is straightforward), we can also use apply row-wise:
apply(a!= 0 , 1, sum)
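Put together with the data from the question, the rowSums approach can be sketched as follows. The comparison is written against the string "0" here, and stringsAsFactors = FALSE is set explicitly; both are my choices for clarity rather than part of the original answer.

```r
# Data from the question, kept as character columns
a <- data.frame(X1 = c("A", "B", "C", "D", "0"),
                X2 = c("B", "A", "D", "E", "A"),
                X3 = c("0", "0", "B", "A", "0"),
                X4 = c("A", "0", "A", "0", "0"),
                stringsAsFactors = FALSE)

# a != "0" is a logical matrix; each TRUE counts as 1 in rowSums()
a$b <- rowSums(a != "0")
```

Because every non-"0" cell is counted individually, repeated values in a row are counted separately, as the question requires.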

If each cell of data frame a holds a single character, here is a base R option. Otherwise (if some cells hold multiple characters), please turn to the approach by @Ronak Shah.
a$b <- nchar(gsub("0","",do.call(paste0,a)))
such that
> a
X1 X2 X3 X4 b
1 A B 0 A 3
2 B A 0 0 2
3 C D B A 4
4 D E A 0 3
5 0 A 0 0 1

We can use lengths with split
lengths(split(a[a!=0], row(a)[a != 0]))

Create a vector or a list with the content of several columns

I want to create a vector or a list in a dataset concatenating the scores of several columns in another dataset.
I can do it like this:
my_vec <- c(x$v1, x$v2, x$v3...)
But I would need like 60 lines of code. I am pretty sure there is another way of doing it. When I try this:
my_vec <- c(x$v1:x$v644)
I get this error message
Warning messages:
1: In t$`1`:t$`644` :
numerical expression has 20 elements: only the first used
My dataset looks like this
x <- read.table(
text = " v1 v2 V3
0 1 0
1 0 1
0 0 0
0 0 1
1 0 0",
header = TRUE
)
And as an output I would like just a vector with values for each column one after the other, like this:
my_vec <- c(0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0)

could a simple solution like
my_vec = x[, 1:644]
work?

You can use paste(). If they are all in the same data.frame, you can create a new column with the concatenation.
require(tidyverse)
df %>%
mutate(conc = paste(a, b, c))
Output:
a b c conc
1 a e g a e g
2 b f h b f h
3 c g i c g i
4 d h j d h j
Sample data:
df <- data.frame(a = letters[1:4],
b = letters[5:8],
c = letters[7:10])
Edit
If they are in different vectors, you can do something like this:
a <- letters[1:4]
b <- letters[5:8]
c <- letters[7:10]
reduce(list(a, b, c), union)
Output:
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

Assuming that x is a data.frame:
> as.vector(unlist( split(as.matrix(x), row(x)) ))
The output might be a vector of characters, however you can make the values numeric again using as.numeric().
Credit:
Got the idea using the following post as a guide
How to split a R data frame into vectors (unbind)
EDIT:
Using your example data, we get the following:
x <- read.table(
text = " v1 v2 V3
0 1 0
1 0 1
0 0 0
0 0 1
1 0 0",
header = TRUE
)
> as.vector(unlist( split(as.matrix(x), row(x)) ))
[1] 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0
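Note that the output above runs row by row, whereas the vector requested in the question runs column by column. A minimal base R sketch of both orders, using the example data (the names `by_col`, `by_row` and `first_two` are illustrative):

```r
# Example data from the question, built directly as a data frame
x <- data.frame(v1 = c(0, 1, 0, 0, 1),
                v2 = c(1, 0, 0, 0, 0),
                V3 = c(0, 1, 0, 1, 0))

# A data frame is a list of columns, so unlist() concatenates column by column
by_col <- unlist(x, use.names = FALSE)

# Transposing the matrix first gives the row-by-row order instead
by_row <- as.vector(t(as.matrix(x)))

# To use only some columns, subset before unlisting
first_two <- unlist(x[, 1:2], use.names = FALSE)
```

For the real data set, the same idea should apply to a column range such as x[, 1:644], assuming those are the wanted columns and they share one type.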

make binary (presence/absence) data matrix from multiple lists in r

I have a series of separate variable lists (character strings) of different lengths. I want to combine them into one data frame to make a presence (1)/absence (0) matrix. Because they have different lengths, I can't figure out how to even create the initial data frame. Here is my example:
data1 <- c("a", "b", "c", "d", "e", "f")
data2 <- c("e", "f", "g")
data3 <- c("a", "c", "g")
As a final result, I would like to create a binary presence/absence matrix like the one below, so that I can create a graphic (similar to a heatmap) to display it.
data1 data2 data3
a 1 0 1
b 1 0 0
c 1 0 1
d 1 0 0
e 1 1 0
f 1 1 0
g 0 1 1
I'm still new to R so hope my explanation is okay. Thanks for the help.

There is a helper function in the splitstackshape package called charMat that you might want to give a try
dat <- paste0("data", 1:3)
out <- t(splitstackshape:::charMat(listOfValues = mget(dat), fill = 0L))
colnames(out) <- dat
out
# data1 data2 data3
#a 1 0 1
#b 1 0 0
#c 1 0 1
#d 1 0 0
#e 1 1 0
#f 1 1 0
#g 0 1 1
data
data1 <- c("a", "b", "c", "d", "e", "f")
data2 <- c("e", "f", "g")
data3 <- c("a", "c", "g")
explanation
The function expects a list as first argument. We can use mget to create that list
mget(dat)
#$data1
#[1] "a" "b" "c" "d" "e" "f"
#$data2
#[1] "e" "f" "g"
#$data3
#[1] "a" "c" "g"
where dat is a character vector that contains the names of your input data
dat
#[1] "data1" "data2" "data3"
t is used to transpose the output of charMat.
Hope this helps.

I would do it like this, using %in%, which returns a logical vector indicating whether each value is present. We then use as.integer to convert the logical values to 0 and 1.
# create a master list
master_list <- unique(c(data1, data2, data3))
# pad each vector to the length of the master list with a dummy value ('ll');
# note that %in% does not require equal lengths, so this step is optional
data1 <- c(data1, rep('ll', length(master_list) - length(data1)))
data2 <- c(data2, rep('ll', length(master_list) - length(data2)))
data3 <- c(data3, rep('ll', length(master_list) - length(data3)))
# create output matrix
mat <- matrix(c(as.integer(master_list %in% data1),
as.integer(master_list %in% data2),
as.integer(master_list %in% data3)),
nrow = length(master_list),
dimnames = list(master_list))
[,1] [,2] [,3]
a 1 0 1
b 1 0 0
c 1 0 1
d 1 0 0
e 1 1 0
f 1 1 0
g 0 1 1

Select columns based on columns sum

Any suggestions on how to select, for each row, the columns where the value is 1 and the column sum is 1? That is, I only want to select the unique values, those not shared with the other individuals.
indv. X Y Z W T J
A 1 0 1 0 0 1
B 0 1 1 0 0 0
C 0 0 1 1 0 0
D 0 0 1 0 1 0
A: X, J
B: Y
C: W
D: T

Here you go! A solution in base R.
First we simulate your data, a data.frame with named rows and columns.
You can use sapply() to loop over the column indices.
A for-loop over the column indices will achieve the same thing.
Finally, save the results in a data.frame however you want.
# Simulate your example data
df <- data.frame(matrix(c(1, 0, 1, 0, 0, 1,
0, 1, 1, 0, 0, 0,
0, 0, 1, 1, 0, 0,
0, 0, 1, 0, 1, 0), nrow = 4, byrow = T))
# Names rows and columns accordingly
names(df) <- c("X", "Y", "Z", "W", "T", "J")
rownames(df) <- c("A", "B","C", "D")
> df
X Y Z W T J
A 1 0 1 0 0 1
B 0 1 1 0 0 0
C 0 0 1 1 0 0
D 0 0 1 0 1 0
Then we select the columns whose sum is 1, i.e. the columns with unique values.
For every one of these columns, we find the row of this value.
# Select columns with unique values (if sum of column == 1)
unique.cols <- which(colSums(df) == 1)
# For every one of these columns, select the row where row-value==1
unique.rows <- sapply(unique.cols, function(x) which(df[, x] == 1))
> unique.cols
X Y W T J
1 2 4 5 6
> unique.rows
X Y W T J
1 2 3 4 1
The rows are not named correctly yet (they still carry the element names of unique.cols), so we index rownames(df) to get the row names.
# Data.frame of unique values
# Rows and columns in separate columns
df.unique <- data.frame(Cols = unique.cols,
Rows = unique.rows,
Colnames = names(unique.cols),
Rownames = rownames(df)[unique.rows],
row.names = NULL)
The result:
df.unique
Cols Rows Colnames Rownames
1 1 1 X A
2 2 2 Y B
3 4 3 W C
4 5 4 T D
5 6 1 J A
Edit:
This is how you could summarise the values per row using dplyr.
library(dplyr)
df.unique %>% group_by(Rownames) %>%
summarise(paste(Colnames, collapse=", "))
# A tibble: 4 x 2
Rownames `paste(Colnames, collapse = ", ")`
<fct> <chr>
1 A X, J
2 B Y
3 C W
4 D T

One idea is to use rowwise apply to find the columns with 1, after we filter out the columns with sum != to 1, i.e.
apply(df[colSums(df) == 1], 1, function(i) names(df[colSums(df) == 1])[i == 1])
$A
[1] "X" "J"
$B
[1] "Y"
$C
[1] "W"
$D
[1] "T"
You can play around with the output to get it to desired state, i.e.
apply(df[colSums(df) == 1], 1, function(i) toString(names(df[colSums(df) == 1])[i == 1]))
# A B C D
#"X, J" "Y" "W" "T"
Or
data.frame(cols = apply(df[colSums(df) == 1], 1, function(i) toString(names(df[colSums(df) == 1])[i == 1])))
# cols
#A X, J
#B Y
#C W
#D T
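As a further base R variation on the same idea, the matching cells can be located once with row() and col() and then grouped by row name. This is only a sketch; `m`, `hit` and `res` are illustrative names, and the data are rebuilt here so the block runs on its own.

```r
# Example data from the question, with individuals as row names
df <- as.data.frame(matrix(c(1, 0, 1, 0, 0, 1,
                             0, 1, 1, 0, 0, 0,
                             0, 0, 1, 1, 0, 0,
                             0, 0, 1, 0, 1, 0),
                           nrow = 4, byrow = TRUE,
                           dimnames = list(c("A", "B", "C", "D"),
                                           c("X", "Y", "Z", "W", "T", "J"))))

# Keep only the columns whose sum is 1 (values held by a single individual)
m <- as.matrix(df[, colSums(df) == 1])

# Locate the cells equal to 1, then group their column names by row name
hit <- m == 1
res <- split(colnames(m)[col(m)[hit]], rownames(m)[row(m)[hit]])
```

An individual with no unique column would simply be absent from res, which may or may not be what you want.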

Here is an option with tidyverse. We gather the dataset to 'long' format, group by 'key', filter the rows where 'val' is 1 and the sum of 'val' is 1, then group by 'indv.' and summarise 'key' by pasting the elements together.
library(dplyr)
library(tidyr)
gather(df1, key, val, -indv.) %>%
group_by(key) %>%
filter(sum(val) == 1, val == 1) %>%
group_by(indv.) %>%
summarise(key = toString(key))
# A tibble: 4 x 2
# indv. key
# <chr> <chr>
#1 A X, J
#2 B Y
#3 C W
#4 D T

How to utilize recursive functions to help rank matrix rows - R

I would like some advice as to how best solve this puzzle. I have got some of the way to solving it using manually written long-hand code. I feel as if I need to utilize recursive functions, but I am still not very good at using them. I hope this question is not too long, I'm trying to be as succinct as possible whilst giving enough information. Sorry if it's too long - though hopefully somebody finds it of interest.
I have a matrix mat1
# A B C D E F G
# A 0 2 1 1 0 1 1
# B 0 0 0 1 2 2 1
# C 1 2 0 0 0 2 1
# D 1 1 2 0 1 2 1
# E 2 0 2 1 0 2 1
# F 1 0 0 0 0 0 1
# G 1 1 1 1 1 1 0
This represents the results of contests between individuals in rows and columns. Numbers refer to how often the individual in the row 'won' against the individual in the column.
I wish to rank individuals A-G from 1-7 using the following criteria:
number of wins against all others (most wins should be ranked 1, least wins 7, 2nd most wins 2, etc.)
if number of wins are tied, then ranks should be based on the number of wins obtained when considering contests only between those individuals with the same number of wins.
if individuals still have a tied number of wins, then ranks should be applied randomly.
I realize that this is not a very good ranking system, but that's not the issue here. According to the above scheme, ranks should be the following:
1 - D or E - D & E have joint highest overall wins (8), and equal wins also in contests between them.
2 - E or D - pick randomly D or E for rank 1 and rank 2
3 - A or C - tied with A,B,C,G for overall 6 wins, both have 4 wins in contests with ABCG
4 - C or A - considering contests between C&A both have 1 win, so randomly pick for rank3 and rank4
5 - G - tied with A,B,C,G for overall 6 wins, has 3 wins in contests between A,B,C,G
6 - B - tied with A,B,C,G for overall 6 wins, but only has 1 win in contests between A,B,C,G
7 - F - has the fewest wins of all in the overall win matrix
What I have tried:
storeresults <- vector("list") #use this to store results of the following
Step 1: Use winsfun function (see below) to identify number of wins of each individual & whether wins are unique (as noted by dupes column):
w1 <- winsfun(mat1)
storeresults[[1]] <- w1 #store results
w1
Only "F" has a unique number of wins and so can be ranked (7th) in the first instance:
# wins ranks dupes
#A 6 4.5 TRUE
#B 6 4.5 TRUE
#C 6 4.5 TRUE
#D 8 1.5 TRUE
#E 8 1.5 TRUE
#F 2 7.0 FALSE
#G 6 4.5 TRUE
Step 2: For individuals with non-unique wins (i.e. duplicated ranks) subset them into matrices considering only contests against others with the same number of wins, and determine new ranks if possible.
allSame(w1[,3]) #FALSE - this says that not all wins/ranks are unique so need to subset
s2 <- subsetties(w1) #this just splits the data into groups by number of wins (see below)
w2 <- lapply(s2, winsfun, m=mat1)
storeresults[[2]] <- w2 # store results
w2
As can be seen, those individuals with 8 wins (the most of anyone) from Step 1 ("D" and "E") each have one win versus each other. They cannot be teased apart, so will be ranked 1 and 2 randomly. Those individuals with 6 wins (A, B, C, G) have different numbers of wins when only considering contests between each other. "B" and "G" can be ranked 6th overall and 5th overall respectively. We need to reconsider "A" and "C" in contests against only each other:
$`6`
wins ranks dupes
A 4 1.5 TRUE
B 1 4.0 FALSE
C 4 1.5 TRUE
G 3 3.0 FALSE
$`8`
wins ranks dupes
D 1 1.5 TRUE
E 1 1.5 TRUE
Step 3: Repeat Step 2 where required
allSame(w2[[1]][,3]) #FALSE - need to subset again as not everyone has same number of wins
allSame(w2[[2]][,3]) #TRUE - no more action required
s3 <- subsetties(w2[[1]])
w3 <- winsfun(s3[[1]], m=mat1)
storeresults[[3]] <- w3 #store results
w3
When considering "A" and "C" together, they have one win each, so they should now be ranked randomly in 3rd and 4th place. They cannot be teased apart.
wins ranks dupes
A 1 1.5 TRUE
C 1 1.5 TRUE
allSame(w3[,3]) #TRUE - no more action required - both have same number of wins
Step 4 Processing Stored Results
storeresults
# I can manually work out ranks from this, but have yet to work out how to do it in R
Below are the functions used in the above:
Function to calculate wins and ranks of subsetted matrices
winsfun <- function(m, out = NULL) {
  # if 'out' is supplied, keep only the contests among the individuals in its rows
  if (!is.null(out)) {
    m <- m[rownames(out), rownames(out)]
  }
  wins <- apply(m, 1, sum)
  ranks <- rank(-wins)
  dupes <- duplicated(wins) | duplicated(wins, fromLast = TRUE)
  df <- data.frame(wins, ranks, dupes)
  return(df)
}
Function to subset those rows with duplicated ranks
subsetties <- function(df){
df1 <- df[df[,3]==T,]
df1.sp <- split(df1, df1$wins)
return(df1.sp)
}
Function to test if all elements of vector are identical
allSame <- function(x) length(unique(x)) == 1
Code to recreate above matrix:
mat1 <- structure(c(0, 0, 1, 1, 2, 1, 1, 2, 0, 2, 1, 0, 0, 1, 1, 0, 0,
2, 2, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 2, 0, 1, 0, 0, 1, 1, 2, 2,
2, 2, 0, 1, 1, 1, 1, 1, 1, 1, 0), .Dim = c(7L, 7L), .Dimnames = list(
c("A", "B", "C", "D", "E", "F", "G"), c("A", "B", "C", "D",
"E", "F", "G")))
I hope this question is clear. I am trying to work out how to perform this algorithm iteratively. I am not too sure how to achieve this, but hopefully by writing this out long-hand and providing the functions I have been using, it may be obvious to somebody. One extra thing is that it's best to have the proposed solution be generally applicable (i.e. to matrices of different sizes).

calc_gain<-function(mat=mat1){
if(nrow(mat)==1) {
return(row.names(mat))
} else {
classement<-sort(rowSums(mat),decreasing=T)
diffgains<-diff(classement)
if (all(diffgains!=0)){
return(names(classement))
} else {
if (all(diffgains==0)){
return(sample(names(classement)))
} else {
parex<-split(classement,factor(classement,levels=unique(classement)))
class_parex<-lapply(parex,function(vect){calc_gain(mat[names(vect),names(vect),drop=F])})
return(unlist(class_parex))
}
}
}
}
Here is what the function does:
if there is only one element, it returns its name (it is the only "player" there is);
otherwise, it calculates the scores.
If there are no ties, it returns the "players" in order from first to last.
Otherwise, if all "players" have the same score, it returns them in a random order.
Otherwise, it splits the ordered list by score and applies the function (this is the recursive part) to each subset of "players" with tied scores.
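The steps above can be sketched in a slightly condensed form and run on the matrix from the question. This is my own restatement under the rules listed in the question, not the answerer's exact code; `rank_rows` and `ranking` are illustrative names, and the seed is fixed only to make the random tie-breaks repeatable.

```r
# Win matrix from the question (rows = winners, columns = losers)
ids <- c("A", "B", "C", "D", "E", "F", "G")
mat1 <- matrix(c(0, 2, 1, 1, 0, 1, 1,
                 0, 0, 0, 1, 2, 2, 1,
                 1, 2, 0, 0, 0, 2, 1,
                 1, 1, 2, 0, 1, 2, 1,
                 2, 0, 2, 1, 0, 2, 1,
                 1, 0, 0, 0, 0, 0, 1,
                 1, 1, 1, 1, 1, 1, 0),
               nrow = 7, byrow = TRUE, dimnames = list(ids, ids))

rank_rows <- function(mat) {
  # A single individual needs no ranking
  if (nrow(mat) == 1) return(rownames(mat))
  wins <- sort(rowSums(mat), decreasing = TRUE)
  # Everyone tied even within this subgroup: break the tie at random
  if (length(unique(wins)) == 1) return(sample(names(wins)))
  # Otherwise group individuals by win count (best group first) and
  # re-rank each group using only the contests inside that group
  groups <- split(names(wins), factor(wins, levels = unique(wins)))
  unlist(lapply(groups, function(g) rank_rows(mat[g, g, drop = FALSE])),
         use.names = FALSE)
}

set.seed(1)  # only so the random tie-breaks are repeatable
ranking <- rank_rows(mat1)
```

The result is a character vector of names from rank 1 to rank 7. D and E share the first two places, A and C share the next two, followed by G, B and F, and the order inside each tied pair depends on the random draw.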

Here's a start:
Step0:
> split(rownames(m), -rowSums( m ) )
$`-8`
[1] "D" "E"
$`-6`
[1] "A" "B" "C" "G"
$`-2`
[1] "F"
Step1:
m <- m[ order( -rowSums(m) ), ] # order by overall wins
A B C D E F G
D 1 1 2 0 1 2 1
E 2 0 2 1 0 2 1
A 0 2 1 1 0 1 1
B 0 0 0 1 2 2 1
C 1 2 0 0 0 2 1
G 1 1 1 1 1 1 0
F 1 0 0 0 0 0 1
> rowSums( m )
D E A B C G F
8 8 6 6 6 6 2
Step2: Order within the group that has 6 wins
> mred <- m[c("A","B","C","G"), c("A","B","C","G") ]
> mred
A B C G
A 0 2 1 1
B 0 0 0 1
C 1 2 0 1
G 1 1 1 0
> rowSums(mred)
A B C G
4 1 4 3
> rownames(mred)[order(-rowSums(mred))]
[1] "A" "C" "G" "B"
