Create FiniteDimensionalAlgebra from OperationTable - sage

Suppose we have an OperationTable T:
sage: T = OperationTable(Set, operation=operation, names='elements');
We would like to create a FiniteDimensionalAlgebra from T.
Question.
What is the fastest way to convert a table T into the form expected by FiniteDimensionalAlgebra?
The list table must have the following form: there exists a finite-dimensional k-algebra of degree n with basis (e_1, …, e_n) such that the i-th element of table is the matrix of right multiplication by e_i with respect to the basis (e_1, …, e_n).
Example.
An example of a list of matrices accepted by FiniteDimensionalAlgebra over the field QQ:
sage: B = FiniteDimensionalAlgebra(QQ, [Matrix([[1,0,0], [0,1,0], [0,0,0]]), Matrix([[0,1,0], [0,0,0], [0,0,0]]), Matrix([[0,0,0], [0,0,0], [0,0,1]])])
An instance of T.
sage: T
. a b c d e f g h i j
+--------------------
a| a a a a a a a a a a
b| a a a a a a b b b b
c| a a a b b b b b c c
d| a a b b b c c c c d
e| a a b b c c c d d e
f| a a b c c c d e e e
g| a b b c c d e e f f
h| a b b c d e e f g g
i| a b c c d e f g g h
j| a b c d e e f g h i
sage: T.table()
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 2, 2],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 2, 2, 3, 3, 4],
[0, 0, 1, 2, 2, 2, 3, 4, 4, 4],
[0, 1, 1, 2, 2, 3, 4, 4, 5, 5],
[0, 1, 1, 2, 3, 4, 4, 5, 6, 6],
[0, 1, 2, 2, 3, 4, 5, 6, 6, 7],
[0, 1, 2, 3, 4, 4, 5, 6, 7, 8]]

I'm not sure I have handled left/right correctly here, but something like this should work:
# elt_to_idx maps each element to its index, e.g. {'a': 0, 'b': 1, ...}
keys = T.column_keys()
n = len(keys)
elt_to_idx = {elt: idx for (idx, elt) in enumerate(keys)}
mat_list = []
# Convert T to a list of matrices, one per element a.
for a in keys:
    # Dictionary of nonzero entries of the matrix for right multiplication by a.
    dict_a = {}
    for (idx_b, b) in enumerate(keys):
        # Each row (or should it be column?) has a single entry equal to 1.
        dict_a[idx_b, elt_to_idx[T[b, a]]] = 1
    # The matrix is n x n, where n is the number of elements in T.
    mat_list.append(matrix(QQ, n, n, dict_a))
FiniteDimensionalAlgebra(QQ, mat_list)
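To check the construction without Sage, here is a minimal plain-Python sketch of the same idea. It assumes the input is a nested list of integer indices laid out like the output of T.table() above, so that table[b][a] is the index of the product b * a. The function name right_mult_matrices and the small min-table are hypothetical, chosen only for illustration. Like the answer above, it puts the single 1 of row b in the column of b * a; whether that is the orientation FiniteDimensionalAlgebra expects is still worth verifying.

```python
def right_mult_matrices(table):
    """Return one n x n 0/1 matrix (as nested lists) per basis element.

    table[b][a] is assumed to be the index of the product b * a,
    in the same layout as the nested list shown for T.table().
    """
    n = len(table)
    mats = []
    for a in range(n):
        m = [[0] * n for _ in range(n)]
        for b in range(n):
            # Row b holds the coordinates of e_b * e_a: a single 1.
            m[b][table[b][a]] = 1
        mats.append(m)
    return mats


# Hypothetical 3-element table for the operation min(i, j).
example_table = [[min(i, j) for j in range(3)] for i in range(3)]
mats = right_mult_matrices(example_table)
print(mats[1])  # [[1, 0, 0], [0, 1, 0], [0, 1, 0]]
```

Inside Sage, each nested list could then be turned into a matrix with matrix(QQ, m) before passing the list to FiniteDimensionalAlgebra. Every row containing exactly one 1 is a quick sanity check that the table was read correctly.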

Related

R reshape wide to long: multiple variables, observations with multiple indices

I have some data containing observations with multiple indices $y_{ibc}$ stored in a messy wide format. I have been fiddling around with tidyr and reshape2 but could not figure it out (reshaping really is my nemesis).
Here is an example:
df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a1b1c1 = c(5,
2, 1, 4, 3, 1, 0, 1, 3), a2b1c1 = c(3, 4, 1, 1, 3, 2, 1, 4, 4
), a3b1c1 = c(4, 0, 0, 1, 1, 1, 0, 0, 1), a1b2c1 = c(1, 0, 4,
2, 4, 1, 0, 4, 2), a2b2c1 = c(2, 0, 1, 0, 1, 0, 3, 2, 0), a3b2c1 = c(2,
4, 3, 0, 2, 3, 3, 3, 4), yc1 = c(1, 2, 2, 1, 2, 2, 2, 1, 1), a1b1c2 = c(4,
2, 3, 0, 4, 4, 2, 1, 4), a2b1c2 = c(3, 0, 3, 3, 4, 4, 3, 2, 2
), a3b1c2 = c(3, 1, 0, 1, 4, 0, 2, 2, 3), a1b2c2 = c(2, 2, 0,
3, 2, 1, 4, 1, 0), a2b2c2 = c(3, 0, 2, 3, 4, 4, 4, 0, 4), a3b2c2 = c(0,
0, 0, 2, 0, 0, 1, 4, 3), yc2 = c(2, 2, 2, 1, 2, 2, 2, 1, 1), X = c(5,
6, 3, 7, 4, 3, 2, 3, 2)), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
This is what I want (excerpt):
id b c y a1 a2 a3 X
1 1 b1 c1 1 5 3 4 5
2 1 b2 c1 1 1 2 2 5
3 1 b1 c2 2 4 3 3 5
4 1 b2 c2 2 2 3 0 5
Using tidyr & dplyr:
library(tidyverse)
df %>%
  pivot_longer(cols = matches("a.b.c."), names_to = "name", values_to = "value") %>%
  separate(name, into = c("a", "b", "c"), sep = c(2,4)) %>%
  mutate(y = case_when(c == "c1" ~ yc1,
                       c == "c2" ~ yc2)) %>%
  pivot_wider(names_from = a, values_from = value) %>%
  select(id, b, c, y, a1, a2, a3, X)
First, convert all your a/b/c columns to long format and split the three parts of each name into separate columns. Then combine your y columns into one, depending on the value of c, using mutate and case_when (you could also use if_else for two options, but case_when extends more easily to more values). Finally, pivot your a columns back to wide format and use select to put the columns in the right order and drop yc1 and yc2.

as.factor not working with INT values on R

Hey guys if you could please help me. I got this dataset:
q1 q2 q3 m1 m2 b1 b2
A 78 150 2887 4 4 0 1
B 74 142 2904 4 4 1 1
C 79 137 1564 4 4 1 0
D 80 164 4522 2 2 0 0
E 74 173 5025 2 3 0 1
F 73 140 1971 3 3 0 1
I want to transform m1:b2 into factors. If I do
data[,4:7] <- as.factor(data[,4:7])
it doesn't work, the values change to char vectors. It gets messed up like this:
q1 q2 q3 m1 m2 b1
A 78 150 2887 c(4, 4, 4, 2, 2, 3) c(0, 1, 1, 0, 0, 0) c(4, 4, 4, 2, 2, 3)
B 74 142 2904 c(4, 4, 4, 2, 3, 3) c(1, 1, 0, 0, 1, 1) c(4, 4, 4, 2, 3, 3)
C 79 137 1564 c(0, 1, 1, 0, 0, 0) c(4, 4, 4, 2, 2, 3) c(0, 1, 1, 0, 0, 0)
D 80 164 4522 c(1, 1, 0, 0, 1, 1) c(4, 4, 4, 2, 3, 3) c(1, 1, 0, 0, 1, 1)
E 74 173 5025 c(4, 4, 4, 2, 2, 3) c(0, 1, 1, 0, 0, 0) c(4, 4, 4, 2, 2, 3)
F 73 140 1971 c(4, 4, 4, 2, 3, 3) c(1, 1, 0, 0, 1, 1) c(4, 4, 4, 2, 3, 3)
b2
A c(0, 1, 1, 0, 0, 0)
B c(1, 1, 0, 0, 1, 1)
C c(4, 4, 4, 2, 2, 3)
D c(4, 4, 4, 2, 3, 3)
E c(0, 1, 1, 0, 0, 0)
F c(1, 1, 0, 0, 1, 1)
But if I use lapply it works fine. Can you explain why? I've been using as.factor(d[]) on other occasions and it worked just fine with other data.frame objects. Thank you.
Checking the documentation for as.factor (by typing ?as.factor), you'll see it says that the first argument x is "a vector of data, usually taking a small number of distinct values". A data frame is a list of columns rather than a single vector, so as.factor does not convert each column separately. Instead the whole selection is coerced at once, and each column ends up deparsed into a single string such as "c(4, 4, 4, 2, 2, 3)". Those four strings are then recycled across the cells of columns 4 through 7, which is the garbled output you see above.
You should use:
data[4:7] <- lapply(data[4:7], as.factor)
or (requiring tidyverse packages)
data <- data %>% mutate_at(4:7, as.factor)
Both of these solutions will correctly treat each column supplied, here columns 4, 5, 6, and 7, as their own vectors, individually. Each one is converted to a factor separately, and re-assigned appropriately.

Within rows of data frame, find first occurrence and longest sequence of value

Consider this data frame, which provides the scored responses on a 15-item test for 10 individuals:
library(tidyverse)
input <- tribble(
~ID, ~i1, ~i2, ~i3, ~i4, ~i5, ~i6, ~i7, ~i8, ~i9, ~i10, ~i11, ~i12, ~i13, ~i14, ~i15,
"A", 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
"B", 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
"C", 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
"D", 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
"F", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
"G", 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
"H", 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,
"I", 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
"J", 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1
)
I want R to go row-by-row, and scan the cells in each row from left to right, in order to create these new columns:
first_0_name: returns the column name of the cell containing the first occurrence of the value 0
first_0_loc: returns the column location of the cell containing the first occurrence of the value 0
streak_1: starting from the first occurrence of 0, find the next occurrence of 1, and then count how many consecutive 1 before the next occurrence of 0.
The new columns should appear as below
new_cols <- tribble(
~first_0_name, ~first_0_loc, ~streak_1,
"i9", 10, 5,
"i4", 5, 4,
"i6", 7, 8,
"i8", 9, 4,
"i9", 10, 5,
NA, NA, NA,
"i1", 2, 5,
"i3", 4, 8,
"i2", 3, NA,
"i1", 2, 1
)
Thanks in advance for any help!
If you want to use base R a little more directly and avoid the cost of transforming the whole data frame, here is an option. This solution also retains the order of rows without having to create extra ordering columns (unlike the tidyverse solution).
results <- apply(input, 1, function(x) {
  # get indices of all zeros
  zeros <- which(x == 0)
  # exit early if no zeros are found
  if (length(zeros) == 0) {
    return(data.frame(first_0_name = NA, first_0_loc = NA, streak_1 = NA))
  }
  first.name <- names(zeros[1]) # name of first 0 column
  first.idx <- zeros[1] # location of first zero
  longest.streak <- diff(zeros)[1] - 1 # length of first 0-0 streak
  return(data.frame(first_0_name = first.name,
                    first_0_loc = first.idx,
                    streak_1 = ifelse(longest.streak == 0, NA, longest.streak))
  )
})
output <- do.call(rbind, results)
first_0_name first_0_loc streak_1
i9 i9 10 5
i4 i4 5 4
i6 i6 7 8
i8 i8 9 NA
i91 i9 10 5
1 <NA> NA NA
i1 i1 2 5
i3 i3 4 8
i2 i2 3 NA
i31 i3 4 2
Edit #2: Rewrote as combination of two summarizations.
input_tidy <- input %>%
  gather(col, val, -ID) %>%
  group_by(ID) %>%
  arrange(ID) %>%
  mutate(col_num = row_number() + 1)
input[,1] %>%
  # Combine with summary of each ID's first zero
  left_join(input_tidy %>% filter(val == 0) %>%
              summarize(first_0_name = first(col),
                        first_0_loc = first(col_num))) %>%
  # Combine with length of each ID's first post-0 streak of 1's
  left_join(input_tidy %>%
              filter(val == 1 & cumsum(val == 1 & lag(val, default = 1) == 0) == 1) %>%
              summarize(streak_1 = n()))
# A tibble: 10 x 4
ID first_0_name first_0_loc streak_1
<chr> <chr> <dbl> <int>
1 A i9 10 5
2 B i4 5 4
3 C i6 7 8
4 D i8 9 4
5 E i9 10 5
6 F NA NA NA
7 G i1 2 5
8 H i3 4 8
9 I i2 3 NA
10 J i3 4 2
An option using melt from data.table
library(data.table)
melt(setDT(input), id.var = 'ID')[, .(first_o_name = first(variable[value == 0]),
first_o_loc = which(value == 0)[1] +1,
streak_1 = sum(cumsum(c(TRUE, diff(value == 0) < 0)) == 2) - 1 ), ID
][streak_1 < 0, streak_1 := NA_real_][]
A base R option is also possible with apply and rle:
do.call(rbind, apply(input[-1], 1, function(x) {
  first_o_loc <- unname(which(x == 0)[1] + 1)
  first_o_name <- names(x)[first_o_loc-1]
  rl <- rle(x)
  rl1 <- within.list(rl, {
    i1 <- cumsum(values == 0) == 1
    values <- values[i1]
    lengths <- lengths[i1]})
  streak_1 <- unname(rl1$lengths[2])
  data.frame(first_o_name, first_o_loc, streak_1)}))
# first_o_name first_o_loc streak_1
#1 i9 10 5
#2 i4 5 4
#3 i6 7 8
#4 i8 9 4
#5 i9 10 5
#6 <NA> NA NA
#7 i1 2 5
#8 i3 4 8
#9 i2 3 NA
#10 i3 4 2

Advanced if/then/loop function to create new columns

I am learning R (focused on the tidyverse packages) and am hoping that someone could help with the following problem that has me stumped.
I have a data-set that looks similar to the following:
library("tibble")
myData <- frame_data(
~id, ~r1, ~r2, ~r3, ~r4, ~r5, ~r6, ~r7, ~r8, ~r9, ~r10, ~r11, ~r12, ~r13, ~r14, ~r15, ~r16,
"A", 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
"B", 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
"C", 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2,
"D", 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
)
Basically, I have multiple rows of respondent data, and each respondent gave 16 responses of either "1" or "2".
For each respondent (i.e., each row) I would like to create an additional three columns:
The first new column - called "switchCount" - identifies the number of times the respondent switched from a "2" response to a "1" response.
The second new column - called "switch1" - identifies the index of the first time the respondent switched from a "2" response to a "1" response.
The third new column - called "switch2" - identifies the index of the final time the respondent switched from a "2" response to a "1" response.
If there is no switch and all values are "2", then return the index of 0.
If there is no switch and all values are "1", then return the index of 16.
The final datatable should therefore look like this:
myData <- frame_data(
~id, ~r1, ~r2, ~r3, ~r4, ~r5, ~r6, ~r7, ~r8, ~r9, ~r10, ~r11, ~r12, ~r13, ~r14, ~r15, ~r16, ~switchCount, ~switch1, ~switch2,
"A", 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1, 1,
"B", 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4,
"C", 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 3, 9,
"D", 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 3, 6, 15,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 16, 16
)
One approach could be to concatenate all response columns row wise and then find the occurrences of 2,1 using gregexpr
library(dplyr)
myData %>%
  rowwise() %>%
  mutate(concat_cols = paste(r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14,r15,r16,sep=";"),
         switchCount = ifelse(gregexpr("2;1", concat_cols)[[1]][1] == -1,
                              0,
                              length(gregexpr("2;1", concat_cols)[[1]])),
         switch1 = ifelse(switchCount == 0,
                          ifelse(grepl("2",concat_cols), 1, 16),
                          min(floor(gregexpr("2;1", concat_cols)[[1]]/2)+1)),
         switch2 = ifelse(switchCount == 0,
                          ifelse(grepl("2",concat_cols), 1, 16),
                          max(floor(gregexpr("2;1", concat_cols)[[1]]/2)+1))) %>%
  select(-concat_cols)
Output is:
id r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 switchCount switch1 switch2
1 A 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 1 1
2 B 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4
3 C 2 2 2 1 1 1 2 2 2 1 1 1 1 2 2 2 2 3 9
4 D 1 1 2 2 2 2 1 1 2 2 1 1 1 2 2 1 3 6 15
5 E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 16 16
Sample data:
myData <- structure(list(id = c("A", "B", "C", "D", "E"), r1 = c(2, 2,
2, 1, 1), r2 = c(2, 2, 2, 1, 1), r3 = c(2, 2, 2, 2, 1), r4 = c(2,
2, 1, 2, 1), r5 = c(2, 1, 1, 2, 1), r6 = c(2, 1, 1, 2, 1), r7 = c(2,
1, 2, 1, 1), r8 = c(2, 1, 2, 1, 1), r9 = c(2, 1, 2, 2, 1), r10 = c(2,
1, 1, 2, 1), r11 = c(2, 1, 1, 1, 1), r12 = c(2, 1, 1, 1, 1),
r13 = c(2, 1, 1, 1, 1), r14 = c(2, 1, 2, 2, 1), r15 = c(2,
1, 2, 2, 1), r16 = c(2, 1, 2, 1, 1), switchCount = c(0, 1,
2, 3, 0), switch1 = c(1, 4, 3, 6, 16), switch2 = c(1, 4,
9, 15, 16)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))

Find the weight of each element of another vector

I have a vector v and I want a vector w that holds the weight (the number of occurrences) of each distinct element of v. How can I get the result (vector w) in R? For example,
v = c(0, 0, 1, 1, 1, 3, 4, 4, 4, 4, 5, 5, 6)
u = unique(v)
w = c(2, 3, 1, 4, 2, 1)
Use table:
table(v)
v
0 1 3 4 5 6
2 3 1 4 2 1
