In Julia, I've defined a polynomial using DynamicPolynomials, e.g.:
using DynamicPolynomials
@polyvar x y
p = x + y + x^2 + x*y + y^2
cx = rand(10)
cy = rand(10)
Now I would like to iterate over the terms of the polynomial and evaluate each term at x=cx[i] and y=cy[i]. How can I do this? Finally, I would like to create a matrix M[i, j] = t[j]([cx[i], cy[i]]), where t[j] is the j-th term of the polynomial p.
I guess you can do it directly. Here is an example:
using DynamicPolynomials
@polyvar x y
p = x + y + x^2 + x*y + y^2
cx = 1:10
cy = 11:20
and now
julia> res = [t(x=>vx,y=>vy) for (vx, vy) in zip(cx,cy), t in p]
10×5 Array{Int64,2}:
   1   11  121   1  11
   4   24  144   2  12
   9   39  169   3  13
  16   56  196   4  14
  25   75  225   5  15
  36   96  256   6  16
  49  119  289   7  17
  64  144  324   8  18
  81  171  361   9  19
 100  200  400  10  20
You can annotate the rows and columns to check more easily that you get what you want, in the following way:
julia> using NamedArrays
julia> NamedArray(res, (collect(zip(cx,cy)), collect(p)), ("point", "term"))
10×5 Named Array{Int64,2}
point ╲ term │ x^2   xy  y^2   x   y
─────────────┼──────────────────────
(1, 11)      │   1   11  121   1  11
(2, 12)      │   4   24  144   2  12
(3, 13)      │   9   39  169   3  13
(4, 14)      │  16   56  196   4  14
(5, 15)      │  25   75  225   5  15
(6, 16)      │  36   96  256   6  16
(7, 17)      │  49  119  289   7  17
(8, 18)      │  64  144  324   8  18
(9, 19)      │  81  171  361   9  19
(10, 20)     │ 100  200  400  10  20
seq can only use a single value in the by parameter. Is there a way to vectorize by, i.e. to use multiple intervals?
Something like this:
seq(1, 10, by = c(1, 2))
would return c(1, 2, 4, 5, 7, 8, 10). Now, it is possible to do this with e.g. seq(1, 10, by = 1)[c(T, T, F)] because it's a simple case, but is there a way to make it generalizable to more complex sequences?
Some examples
seq(1, 100, by = 1:5)
#[1] 1 2 4 7 11 16 17 19 22 26 31...
seq(8, -5, by = c(3, 8))
#[1] 8 5 -3
This looks like a close base R solution:
ans <- Reduce(`+`, rep(1:5, 100), init = 1, accumulate = TRUE)
ans[1:(which.max(ans >= 100) - 1)]
[1] 1 2 4 7 11 16 17 19 22 26 31 32 34 37 41 46 47 49 52 56 61 62 64
[24] 67 71 76 77 79 82 86 91 92 94 97
You would have to invert part of it if you want to calculate 'down':
ans <- Reduce(`+`, rep(c(-3, -8), 20), init = 8, accumulate = TRUE)
ans[1:(which.max(ans <= -5) - 1)]
[1] 8 5 -3
Still, you would have to 'guess' the number of repetitions needed (20 or 100 in the examples above) to create ans.
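As a side note, here is a minimal sketch (my addition, not part of the original answer; the helper name reps_needed is mine) for deriving that repetition count instead of guessing it: each full cycle of by advances the sequence by sum(abs(by)), so this many cycles is always enough.

# compute how many repetitions of `by` are sufficient to cover the range
reps_needed <- function(from, to, by) ceiling(abs(to - from) / sum(abs(by)))
reps_needed(1, 100, 1:5)     # 7, so rep(1:5, 7) suffices
reps_needed(8, -5, c(3, 8))  # 2, so rep(c(-3, -8), 2) suffices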
This doesn't seem possible, Maël. I suppose it's easy enough to write one:
seq2 <- function(from, to, by) {
  vals <- c(0, cumsum(rep(by, abs(ceiling((to - from) / sum(by))))))
  if (from > to) return((from - vals)[(from - vals) >= to])
  else (from + vals)[(from + vals) <= to]
}
Testing:
seq2(1, 10, by = c(1, 2))
#> [1] 1 2 4 5 7 8 10
seq2(1, 100, by = 1:5)
#> [1] 1 2 4 7 11 16 17 19 22 26 31 32 34 37 41 46 47 49 52 56 61 62 64 67 71
#> [26] 76 77 79 82 86 91 92 94 97
seq2(8, -5, by = c(3, 8))
#> [1] 8 5 -3
Created on 2022-12-23 with reprex v2.0.2
We are looking to create a vector with the following sequence:
1,4,5,8,9,12,13,16,17,20,21,...
Start with 1, then skip 2 numbers, then add 2 numbers, then skip 2 numbers, etc., not going above 2000. We also need the inverse sequence 2,3,6,7,10,11,...
We may use a recycling logical vector to filter the sequence:
(1:21)[c(TRUE, FALSE, FALSE, TRUE)]
[1] 1 4 5 8 9 12 13 16 17 20 21
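A hedged extension of the same recycling idea (my addition): cap the sequence at 2000, and get the inverse sequence by negating the logical index.

# recycle the take/skip pattern over the full range
idx <- rep(c(TRUE, FALSE, FALSE, TRUE), length.out = 2000)
(1:2000)[idx]   # 1 4 5 8 9 12 ...
(1:2000)[!idx]  # 2 3 6 7 10 11 ...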
Here's an approach using rep and cumsum. Effectively, "add up alternating increments of 1 (successive #s) and 3 (skip two)."
cumsum(rep(c(1,3), 500))
and
cumsum(rep(c(3,1), 500)) - 1
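A small follow-up sketch (my addition, not from the answer): derive the repetition count from the 2000 cap instead of hard-coding 500, and trim any overshoot.

n <- ceiling(2000 / sum(c(1, 3)))                          # number of (take, skip) pairs needed
s1 <- cumsum(rep(c(1, 3), n));     s1 <- s1[s1 <= 2000]    # 1 4 5 8 ...
s2 <- cumsum(rep(c(3, 1), n)) - 1; s2 <- s2[s2 <= 2000]    # 2 3 6 7 ...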
Got this one myself - head(sort(c(seq(1, 2000, 4), seq(4, 2000, 4))), 20)
We can try it like below:
> (v <- seq(21))[v %% 4 %in% c(0, 1)]
[1] 1 4 5 8 9 12 13 16 17 20 21
You may arrange the data in a matrix and extract the 1st and 4th columns:
val <- 1:100
sort(c(matrix(val, ncol = 4, byrow = TRUE)[, c(1, 4)]))
# [1] 1 4 5 8 9 12 13 16 17 20 21 24 25 28 29 32 33
#[18] 36 37 40 41 44 45 48 49 52 53 56 57 60 61 64 65 68
#[35] 69 72 73 76 77 80 81 84 85 88 89 92 93 96 97 100
A tidyverse option.
library(purrr)
library(dplyr)
map_int(1:11, ~ case_when(. == 1 ~ as.integer(1),
                          . %% 2 == 0 ~ as.integer(. * 2),
                          T ~ as.integer((. * 2) - 1)))
# [1] 1 4 5 8 9 12 13 16 17 20 21
This is a follow-up to this question, which was marked as a duplicate of this one, but the suggested solution does not work.
I have the following data.frame:
set.seed(1)
mydf <- data.frame(A=paste(sample(LETTERS, 4), sample(1:20, 20), sep=""),
                   B=paste(sample(1:20, 20), sample(LETTERS, 4), sep=""),
                   C=sample(LETTERS, 20), D=sample(1:100, 20), value=rnorm(20))
> mydf
A B C D value
1 G5 6N T 9 -0.68875569
2 J18 8T R 87 -0.70749516
3 N19 1A L 34 0.36458196
4 U12 7K Z 82 0.76853292
5 G11 14N J 98 -0.11234621
6 J1 20T F 32 0.88110773
7 N3 17A B 45 0.39810588
8 U14 19K W 83 -0.61202639
9 G9 15N U 80 0.34111969
10 J20 3T I 36 -1.12936310
11 N8 9A K 70 1.43302370
12 U16 16K G 86 1.98039990
13 G6 10N M 39 -0.36722148
14 J7 18T D 62 -1.04413463
15 N13 5A Y 35 0.56971963
16 U4 11K N 28 -0.13505460
17 G17 4N O 64 2.40161776
18 J15 2T C 17 -0.03924000
19 N2 12A P 59 0.68973936
20 U10 13K X 10 0.02800216
I want to order it according to columns A to D, but A and B contain mixed alphanumeric values, so natural ordering is required.
I know I can apply regular ordering, like:
mydf2 <- mydf[do.call(order, c(mydf[1:4], list(decreasing = FALSE))),]
> mydf2
A B C D value
5 G11 14N J 98 -0.11234621
17 G17 4N O 64 2.40161776
1 G5 6N T 9 -0.68875569
13 G6 10N M 39 -0.36722148
9 G9 15N U 80 0.34111969
6 J1 20T F 32 0.88110773
18 J15 2T C 17 -0.03924000
2 J18 8T R 87 -0.70749516
10 J20 3T I 36 -1.12936310
14 J7 18T D 62 -1.04413463
15 N13 5A Y 35 0.56971963
3 N19 1A L 34 0.36458196
19 N2 12A P 59 0.68973936
7 N3 17A B 45 0.39810588
11 N8 9A K 70 1.43302370
20 U10 13K X 10 0.02800216
4 U12 7K Z 82 0.76853292
8 U14 19K W 83 -0.61202639
12 U16 16K G 86 1.98039990
16 U4 11K N 28 -0.13505460
But this is not the result I need. I need 10 after 9, not after 1 (you can check column A to see it is not in the order I need.)
In the comments of my original question, it was suggested to use the multi.mixedorder function.
However, as you can see below, the result is identical to the one using just order, which is still not what I want.
library(gtools)  # provides mixedsort

multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
  do.call(order, c(
    lapply(list(...), function(l){
      if(is.character(l)){
        factor(l, levels = mixedsort(unique(l)))
      } else {
        l
      }
    }),
    list(na.last = na.last, decreasing = decreasing)
  ))
}
mydf3 <- mydf[do.call(multi.mixedorder, c(mydf[1:4], list(decreasing = FALSE))),]
> mydf3
A B C D value
5 G11 14N J 98 -0.11234621
17 G17 4N O 64 2.40161776
1 G5 6N T 9 -0.68875569
13 G6 10N M 39 -0.36722148
9 G9 15N U 80 0.34111969
6 J1 20T F 32 0.88110773
18 J15 2T C 17 -0.03924000
2 J18 8T R 87 -0.70749516
10 J20 3T I 36 -1.12936310
14 J7 18T D 62 -1.04413463
15 N13 5A Y 35 0.56971963
3 N19 1A L 34 0.36458196
19 N2 12A P 59 0.68973936
7 N3 17A B 45 0.39810588
11 N8 9A K 70 1.43302370
20 U10 13K X 10 0.02800216
4 U12 7K Z 82 0.76853292
8 U14 19K W 83 -0.61202639
12 U16 16K G 86 1.98039990
16 U4 11K N 28 -0.13505460
OK, solved it: the multi.mixedorder function needs a fix to be able to deal with factors:
multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
  do.call(order, c(
    lapply(list(...), function(l){
      if(is.character(l)){
        factor(l, levels = mixedsort(unique(l)))
      } else if(is.factor(l)){
        # natural-sort the existing levels instead of relying on their current order
        factor(as.character(l), levels = mixedsort(levels(l)))
      } else {
        l  # numeric and other columns are left as they are
      }
    }),
    list(na.last = na.last, decreasing = decreasing)
  ))
}
Otherwise convert all factor columns in mydf into character, with:
mydf[] <- lapply(mydf, as.character)
But with the fix above, this shouldn't be needed.
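If you do go the conversion route, here is a slightly more conservative sketch (my addition, assuming gtools is loaded for mixedsort and using the multi.mixedorder function as posted above; the name mydf_sorted is mine): convert only the factor columns to character, so numeric columns keep their numeric ordering.

# convert factor columns to character, leave everything else untouched
mydf[] <- lapply(mydf, function(col) if (is.factor(col)) as.character(col) else col)
mydf_sorted <- mydf[do.call(multi.mixedorder, c(mydf[1:4], list(decreasing = FALSE))), ]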
I'm new to R so this question might be quite basic.
There is a column in my data which goes like 4 4 4 4 7 7 7 13 13 13 13 13 13 13 4 4 7 7 7 13 13 13 13 13 13 13 13 4 4.....
One cycle of 4...7...13... is considered one complete run, and I will assign a Run Number (1, 2, 3, ...) to each run.
The number of times that each value (4, 7, 13) repeats is not fixed, and the total number of rows in a run is not fixed either. The total number of runs is unknown (but it typically ranges from 60 to 90). The order of (4, 7, 13) is fixed.
I have attached my current code here. It works fine, but it does take a minute or two when there are a few million rows of data. I'm aware that growing vectors in a for loop is really not recommended in R, so I would like to ask if anyone has a more elegant solution to this.
Sample data can be generated with the code below, and the desired output can be produced with the current solution that follows.
#Generates sample data
df <- data.frame(Temp = c(sample(50:250, 30)), Pres = c(sample(500:1000, 30)),
                 Message = c(rep(4, 3), rep(7, 2), rep(13, 6), rep(4, 4), rep(7, 1),
                             rep(13, 7), rep(4, 3), rep(7, 4)))
Current Solution
prev_val = 0
Rcount = 1
Run_Count = c()
for (val in df$Message)
{
  delta = prev_val - val
  if (delta == 9)   # a drop from 13 back to 4 marks the start of a new run
    Rcount = Rcount + 1
  prev_val = val
  Run_Count = append(Run_Count, Rcount)
}
df$Run = Run_Count
The desired output (columns: Temp, Pres, Message, Run):
226 704 4 1
138 709 4 1
136 684 4 1
57 817 7 1
187 927 7 1
190 780 13 1
152 825 13 1
126 766 13 1
202 855 13 1
214 757 13 1
172 922 13 1
50 975 4 2
159 712 4 2
212 802 4 2
181 777 4 2
102 933 7 2
165 753 13 2
67 962 13 2
119 631 13 2
The data frame will later be split by the Run Number, but after being categorized according to the value, i.e.
... 4 1
... 4 1
... 4 1
... 4 1
... 4 2
... 4 2
... 4 2
... 4 3
.....
I am not sure if this is an improvement, but it uses the rle (run length encoding) function to determine the length of each repeat in each run.
df <- data.frame(Temp = c(sample(50:250, 30)), Pres = c(sample(500:1000, 30)),
                 Message = c(rep(4, 3), rep(7, 2), rep(13, 6), rep(4, 4), rep(7, 1),
                             rep(13, 7), rep(4, 3), rep(7, 4)))
rleout<-rle(df$Message)
#find the length of the runs and create the numbering
runcounts<-ceiling(length(rleout$lengths)/3)
runs<-rep(1:runcounts, each=3)
#need to trim the length of run numbers for cases where there is not a
# full sequence, as in the test case.
rleout$values<-runs[1:length(rleout$lengths)]
#create the new column
df$out<-inverse.rle(rleout)
I'm sure someone can come along and demonstrate a better and faster method using data.table.
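Since data.table is mentioned above, here is a minimal, untested sketch (my addition) that numbers the runs by detecting where Message drops, which is essentially the cumsum idea shown in the next answer:

library(data.table)
dt <- as.data.table(df)
# a new run starts wherever Message decreases (13 -> 4)
dt[, Run := cumsum(c(-1L, diff(Message)) < 0)]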
You can easily use:
df$runID <- cumsum(c(-1,diff(df$Message)) < 0)
# Temp Pres Message runID
# 1 174 910 4 1
# 2 181 612 4 1
# 3 208 645 4 1
# 4 89 601 7 1
# 5 172 812 7 1
# 6 213 672 13 1
# 7 137 848 13 1
# 8 153 833 13 1
# 9 127 591 13 1
# 10 243 907 13 1
# 11 146 599 13 1
# 12 151 567 4 2
# 13 139 855 4 2
# 14 147 793 4 2
# 15 227 533 4 2
# 16 241 959 7 2
# 17 206 948 13 2
# 18 236 875 13 2
# 19 133 537 13 2
# 20 70 688 13 2
# 21 218 528 13 2
# 22 244 927 13 2
# 23 161 697 13 2
# 24 177 572 4 3
# 25 179 911 4 3
# 26 192 559 4 3
# 27 60 771 7 3
# 28 245 682 7 3
# 29 196 614 7 3
# 30 171 536 7 3
Ok, first of all let me generate some sample data:
A_X01 <- c(34, 65, 23, 43, 22)
A_X02 <- c(2, 4, 7, 8, 3)
B_X01 <- c(24, 45, 94, 23, 54)
B_X02 <- c(4, 2, 4, 9, 1)
C_X01 <- c(34, 65, 876, 45, 87)
C_X02 <- c(123, 543, 86, 87, 34)
Var <- c(3, 5, 7, 2, 3)
DF <- data.frame(A_X01, A_X02, B_X01, B_X02, C_X01, C_X02, Var)
What I want to do is apply an equation to the corresponding columns of A and B for both X01 and X02, with a third column "Var" used in the equation.
So far I have been doing this the following way:
DF$D_X01 <- (DF$A_X01 + DF$B_X01) * DF$Var
DF$D_X02 <- (DF$A_X02 + DF$B_X02) * DF$Var
My desired output is as follows:
A_X01 A_X02 B_X01 B_X02 C_X01 C_X02 Var D_X01 D_X02
1 34 2 24 4 34 123 3 174 18
2 65 4 45 2 65 543 5 550 30
3 23 7 94 4 876 86 7 819 77
4 43 8 23 9 45 87 2 132 34
5 22 3 54 1 87 34 3 228 12
As you'll appreciate, this is a lot of lines of code to do something fairly simple, meaning that at present my scripts are rather long (as I have multiple columns in the actual dataset)!
One of the apply functions must be the way to go, but I can't seem to get my head around it for corresponding columns. I did think about using lapply, but how would I get this to work for the two lists of columns, and for the right columns to be added together?
I've looked around and can't seem to find a way to do this, which must be a fairly common problem.
Thanks.
EDIT:
The original question was a bit confusing, so I have updated it with a desired output and some extra conditions.
Try this
indx <- gsub("\\D", "", grep("A_X|B_X", names(DF), value = TRUE)) # Retrieving indexes
indx2 <- DF[grep("A_X|B_X", names(DF))] # Considering only the columns of interest
DF[paste0("D_X", unique(indx))] <-
sapply(unique(indx), function(x) rowSums(indx2[which(indx == x)])*DF$Var)
DF
# A_X01 A_X02 B_X01 B_X02 C_X01 C_X02 Var D_X01 D_X02
# 1 34 2 24 4 34 123 3 174 18
# 2 65 4 45 2 65 543 5 550 30
# 3 23 7 94 4 876 86 7 819 77
# 4 43 8 23 9 45 87 2 132 34
# 5 22 3 54 1 87 34 3 228 12
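To see how the indexing works (my note, using the DF defined above), the intermediate pieces look roughly like this:

grep("A_X|B_X", names(DF), value = TRUE)                           # "A_X01" "A_X02" "B_X01" "B_X02"
gsub("\\D", "", grep("A_X|B_X", names(DF), value = TRUE))          # "01" "02" "01" "02"
unique(gsub("\\D", "", grep("A_X|B_X", names(DF), value = TRUE)))  # "01" "02"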
You may also try
indxA <- grep("^A", colnames(DF))
indxB <- grep("^B", colnames(DF))
f1 <- function(x,y,z) (x+y)*z
DF[sprintf('D_X%02d', indxA)] <- Map(f1, DF[indxA], DF[indxB], list(DF$Var))
DF
# A_X01 A_X02 B_X01 B_X02 C_X01 C_X02 Var D_X01 D_X02
#1 34 2 24 4 34 123 3 174 18
#2 65 4 45 2 65 543 5 550 30
#3 23 7 94 4 876 86 7 819 77
#4 43 8 23 9 45 87 2 132 34
#5 22 3 54 1 87 34 3 228 12
Or you could use mapply
DF[sprintf('D_X%02d', indxA)] <- mapply(`+`, DF[indxA],DF[indxB])*DF$Var
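Finally, a hypothetical generalization (my sketch, not from any of the answers above; the name suffixes is mine): pair every A_* column with its B_* counterpart by shared suffix, so the same pattern scales when there are many more column pairs.

# find the suffixes ("X01", "X02", ...) shared by the A_ and B_ columns
suffixes <- sub("^A_", "", grep("^A_", names(DF), value = TRUE))
DF[paste0("D_", suffixes)] <- lapply(
  suffixes,
  function(s) (DF[[paste0("A_", s)]] + DF[[paste0("B_", s)]]) * DF$Var
)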