Sort 22x2 array by the first column - julia

I have the following:
one = [0.3, 0.3, 0.3, 0.3, 0.3, 0.17, 0.255, 0.1, 0.145, 0.275, 0.17, 0.225, 0.25, 0.25, 0.28, 0.29, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
two = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0]
data_needed = [one two] # build 22×2 Array{Float64,2}
Example output (truncated):
22×2 Array{Float64,2}:
0.3 0.5
0.3 1.0
0.3 1.5
0.3 2.0
0.3 2.5
0.17 3.0
0.255 3.5
0.1 4.0
0.145 4.5
0.275 5.0
So I wish to sort the full 22×2 array by the first column:
data_needed[1:size(data_needed,1)]
Float64[22]
0.300
0.300
0.300
0.300
0.300
0.170
0.255
0.100
0.145
0.275
0.170
0.225
0.250
0.250
0.280
0.290
0.300
0.300
0.300
0.300
0.300
0.300
Sort in ascending order:
Float64[22]
0.100
0.145
0.170
0.170
0.225
0.250
0.250
0.255
0.275
0.280
0.290
0.300
0.300
When I sort by the first column in ascending order, the corresponding values in the second column should stay on the same row as their sorted first-column values.
If I sort a full data frame by a specific column, the other data on the same row follows the sorted order. Does this happen for arrays? I tried sort() to no avail.

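Answer:
Use sortslices, which sorts whole rows as units. Rows are compared lexicographically, so ties in the first column are ordered by the second column:
sortslices(data_needed,dims=1)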
22×2 Array{Float64,2}:
0.1 4.0
0.145 4.5
0.17 3.0
0.17 5.5
0.225 6.0
0.25 6.5
0.25 7.0
0.255 3.5
0.275 5.0
0.28 7.5

Related

How to keep the name of vector as a column in the tibble

I have the following list of named vectors:
lof <- list(PP1 = c(A = -0.96, R = 0.8, N = 0.82, D = 1, C = -0.55,
E = 0.94, Q = 0.78, G = -0.88, H = 0.67, I = -0.94, L = -0.9,
K = 0.6, M = -0.82, F = -0.85, P = -0.81, S = 0.41, T = 0.4,
W = 0.06, Y = 0.31, V = -1), PP2 = c(A = -0.76, R = 0.63, N = -0.57,
D = -0.89, C = -0.47, E = -0.54, Q = -0.3, G = -1, H = -0.11,
I = -0.05, L = 0.03, K = 0.1, M = 0.03, F = 0.48, P = -0.4, S = -0.82,
T = -0.64, W = 1, Y = 0.42, V = -0.43), PP3 = c(A = 0.31, R = 0.99,
N = 0.02, D = -1, C = 0.19, E = -0.99, Q = -0.38, G = 0.49, H = 0.37,
I = -0.18, L = -0.24, K = 1, M = -0.08, F = -0.58, P = -0.07,
S = 0.57, T = 0.37, W = -0.47, Y = -0.2, V = -0.14))
I want to convert it to a tibble while keeping the names of the vector elements as a column in the tibble.
With this:
library(tidyverse)
as_tibble(lof)
I get this:
# A tibble: 20 × 3
PP1 PP2 PP3
<dbl> <dbl> <dbl>
1 -0.96 -0.76 0.31
2 0.8 0.63 0.99
.. etc ...
What I want to get is this:
PP1 PP2 PP3 residue
1 -0.96 -0.76 0.31 A
2 0.8 0.63 0.99 R
.. etc ...
How can I achieve that?
It is straightforward to add a new column named "residue":
as_tibble(lof) %>%
mutate(residue = names(lof[[1]]))
# A tibble: 20 × 4
PP1 PP2 PP3 residue
<dbl> <dbl> <dbl> <chr>
1 -0.96 -0.76 0.31 A
2 0.8 0.63 0.99 R
3 0.82 -0.57 0.02 N
4 1 -0.89 -1 D
5 -0.55 -0.47 0.19 C
6 0.94 -0.54 -0.99 E
7 0.78 -0.3 -0.38 Q
8 -0.88 -1 0.49 G
9 0.67 -0.11 0.37 H
10 -0.94 -0.05 -0.18 I
11 -0.9 0.03 -0.24 L
12 0.6 0.1 1 K
13 -0.82 0.03 -0.08 M
14 -0.85 0.48 -0.58 F
15 -0.81 -0.4 -0.07 P
16 0.41 -0.82 0.57 S
17 0.4 -0.64 0.37 T
18 0.06 1 -0.47 W
19 0.31 0.42 -0.2 Y
20 -1 -0.43 -0.14 V
or
lof_new <- as_tibble(lof)
lof_new$residue <- names(lof[[1]])
The other answer works. However, if the names are not in the same order in all of the list items, or the items have different lengths, you could also use the approach below.
library(tidyverse)
lof %>%
map(~tibble(name = names(.x), value=.x)) %>%
bind_rows(.id = "ID") %>%
pivot_wider(names_from = ID, values_from = value, values_fill = NA)
# A tibble: 20 x 4
name PP1 PP2 PP3
<chr> <dbl> <dbl> <dbl>
1 A -0.96 -0.76 0.31
2 R 0.8 0.63 0.99
3 N 0.82 -0.57 0.02
4 D 1 -0.89 -1
5 C -0.55 -0.47 0.19
6 E 0.94 -0.54 -0.99
7 Q 0.78 -0.3 -0.38
8 G -0.88 -1 0.49
9 H 0.67 -0.11 0.37
10 I -0.94 -0.05 -0.18
11 L -0.9 0.03 -0.24
12 K 0.6 0.1 1
13 M -0.82 0.03 -0.08
14 F -0.85 0.48 -0.58
15 P -0.81 -0.4 -0.07
16 S 0.41 -0.82 0.57
17 T 0.4 -0.64 0.37
18 W 0.06 1 -0.47
19 Y 0.31 0.42 -0.2
20 V -1 -0.43 -0.14
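The same align-by-name idea can be sketched in base R, without the tidyverse. This is only an illustration under assumptions: `lof_small` is a made-up list whose vectors differ in name order and length, and `residues` and `aligned` are hypothetical names, not part of the answers above.

```r
# Made-up input: names differ in order and length between the vectors.
lof_small <- list(PP1 = c(A = 1, R = 2, N = 3),
                  PP2 = c(N = 30, A = 10))

# Union of all element names, in order of first appearance.
residues <- unique(unlist(lapply(lof_small, names), use.names = FALSE))

# Index every vector by name; names missing from a vector become NA.
aligned <- data.frame(
  residue = residues,
  lapply(lof_small, function(v) unname(v[residues])),
  stringsAsFactors = FALSE
)
aligned
```

Indexing by name rather than by position is what keeps each value on the row of its own residue, and `R` ends up as NA in `PP2` because that vector has no such element.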
Another option is bind_cols:
bind_cols(lof, residue=names(lof$PP1))

How to apply the function rollapply for all variables using "across" dplyr's function?

I have this data frame with type double columns from A to Z.
df
t A B C ... X Y Z
1 1 0.97 12.50 5.10 ... 0.67 4.46 5.72
2 2 -0.81 5.45 2.75 ... 0.82 -7.46 3.57
3 3 0.28 8.64 2.12 ... -0.56 23.71 2.64
4 4 -0.16 -4.38 2.54 ... 0.79 -5.60 3.28
5 5 1.94 1.62 4.72 ... -1.13 5.93 3.23
6 6 1.72 26.38 5.74 ... -1.62 15.05 2.43
7 7 0.36 12.47 4.20 ... -1.21 6.20 5.92
8 8 0.30 -29.05 4.41 ... 0.62 8.63 3.99
9 9 -0.39 16.78 2.79 ... 0.04 -8.90 2.37
10 10 0.79 3.57 4.14 ... -0.14 24.26 2.85
11 11 0.67 6.13 2.72 ... -0.15 -8.22 5.72
12 12 -0.95 0.56 3.81 ... -0.04 -4.88 3.19
13 13 0.04 16.40 3.27 ... -0.64 19.51 4.61
14 14 2.12 2.29 2.46 ... 0.38 14.48 5.60
15 15 0.17 7.72 2.74 ... -1.55 -8.20 5.96
16 16 -0.88 1.80 4.92 ... 0.99 -0.06 3.72
17 17 0.95 26.38 3.65 ... -0.38 2.08 3.58
18 18 -0.70 12.16 3.66 ... 1.24 0.95 2.57
19 19 -1.08 8.15 3.92 ... -0.95 -14.75 3.12
20 20 -0.05 23.40 3.71 ... 1.77 30.34 4.26
I would like to compute a moving average (MA) for each variable that looks back two years, that is, the average of t-2, t-1 and t.
It would look like this:
t A B C ... X Y Z
1 1 NA NA NA ... NA NA NA
2 2 NA NA NA ... NA NA NA
3 3 0.15 8.86 3.32 ... 0.31 6.90 3.98
4 4 -0.23 3.24 2.47 ... 0.35 3.55 3.16
5 5 0.69 1.96 3.13 ... -0.30 8.01 3.05
6 6 1.17 7.87 4.33 ... -0.65 5.13 2.98
7 7 1.34 13.49 4.89 ... -1.32 9.06 3.86
8 8 0.79 3.27 4.78 ... -0.74 9.96 4.11
9 9 0.09 0.07 3.80 ... -0.18 1.98 4.09
10 10 0.23 -2.90 3.78 ... 0.17 8.00 3.07
11 11 0.36 8.83 3.22 ... -0.08 2.38 3.65
12 12 0.17 3.42 3.56 ... -0.11 3.72 3.92
13 13 -0.08 7.70 3.27 ... -0.28 2.14 4.51
14 14 0.40 6.42 3.18 ... -0.10 9.70 4.47
15 15 0.78 8.80 2.82 ... -0.60 8.60 5.39
16 16 0.47 3.94 3.37 ... -0.06 2.07 5.09
17 17 0.08 11.97 3.77 ... -0.31 -2.06 4.42
18 18 -0.21 13.45 4.08 ... 0.62 0.99 3.29
19 19 -0.28 15.56 3.74 ... -0.03 -3.91 3.09
20 20 -0.61 14.57 3.76 ... 0.69 5.51 3.32
For this, I have tried to do the following:
> df %>% as_tibble() %>%
mutate(across(where(is.double), ~ zoo::rollapply(3, mean, align='right', fill=NA)) )
But the output is an error.
Error in `mutate()`:
! Problem while computing `..1 = across(...)`.
Caused by error in `across()`:
! Problem while computing column `date`.
Caused by error in `match.fun()`:
! argument "FUN" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
Any idea why this is happening and any suggestions on how to fix it?
Thank you. Regards
A few points:
the code in the question omitted the first argument to rollapply
as long as there are no NA's in the input you can use rollmean in place of rollapply
instead of align="right" use rollapplyr or rollmeanr with r on the end
these all work with multiple columns already so you don't really need to use across in the first place
it is even simpler if you work with zoo objects
dplyr clobbers filter and lag from base R, which can break code that relies on them, so it is best to load dplyr excluding those functions. You can still access the dplyr versions using dplyr::filter and dplyr::lag
please read the instructions at the top of the r tag page and, in particular show the input in reproducible form using dput. We did that for you this time in the Note at the end.
Try any of these:
library(dplyr, exclude = c("filter", "lag"))
library(zoo)
df %>% mutate(rollmeanr(.[-1], 3, fill = NA) %>% as_tibble)
df %>% mutate(across(-1, rollmeanr, 3, fill = NA))
df %>% mutate(across(where(is.double), rollmeanr, 3, fill = NA))
df %>% read.zoo %>% rollmeanr(3, fill = NA) %>% fortify.zoo
# with zoo objects
z <- read.zoo(df); rollmeanr(z, 3, fill = NA)
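For comparison, a right-aligned moving average can also be sketched in base R alone. This is a minimal sketch, not the zoo approach above: it assumes no NA values in the input, `roll_mean_right` is a hypothetical helper name, and `df_small` holds only the first five rows of columns A and B from the question. The `stats::` prefix is written out so the call still works when dplyr has masked `filter`.

```r
# Hypothetical helper: mean of the current and the (width - 1) previous values.
# sides = 1 makes stats::filter use past values only, so the first
# (width - 1) results are NA.
roll_mean_right <- function(x, width = 3) {
  as.numeric(stats::filter(x, rep(1 / width, width), sides = 1))
}

# First five rows of columns A and B from the question.
df_small <- data.frame(
  t = 1:5,
  A = c(0.97, -0.81, 0.28, -0.16, 1.94),
  B = c(12.5, 5.45, 8.64, -4.38, 1.62)
)

# Apply the helper to every column except t.
ma <- df_small
ma[-1] <- lapply(df_small[-1], roll_mean_right)

round(ma$A, 2)  # NA NA 0.15 -0.23 0.69
```

The rounded values agree with rows 3 to 5 of the expected output shown in the question.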
Note
df <- structure(list(t = 1:20, A = c(0.97, -0.81, 0.28, -0.16, 1.94,
1.72, 0.36, 0.3, -0.39, 0.79, 0.67, -0.95, 0.04, 2.12, 0.17,
-0.88, 0.95, -0.7, -1.08, -0.05), B = c(12.5, 5.45, 8.64, -4.38,
1.62, 26.38, 12.47, -29.05, 16.78, 3.57, 6.13, 0.56, 16.4, 2.29,
7.72, 1.8, 26.38, 12.16, 8.15, 23.4), C = c(5.1, 2.75, 2.12,
2.54, 4.72, 5.74, 4.2, 4.41, 2.79, 4.14, 2.72, 3.81, 3.27, 2.46,
2.74, 4.92, 3.65, 3.66, 3.92, 3.71), X = c(0.67, 0.82, -0.56,
0.79, -1.13, -1.62, -1.21, 0.62, 0.04, -0.14, -0.15, -0.04, -0.64,
0.38, -1.55, 0.99, -0.38, 1.24, -0.95, 1.77), Y = c(4.46, -7.46,
23.71, -5.6, 5.93, 15.05, 6.2, 8.63, -8.9, 24.26, -8.22, -4.88,
19.51, 14.48, -8.2, -0.06, 2.08, 0.95, -14.75, 30.34), Z = c(5.72,
3.57, 2.64, 3.28, 3.23, 2.43, 5.92, 3.99, 2.37, 2.85, 5.72, 3.19,
4.61, 5.6, 5.96, 3.72, 3.58, 2.57, 3.12, 4.26)), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"), class = "data.frame")

Remove doubles with no decimal places

I have a vector:
x <- c(0.0, 0.5, 1.000, 1.5, 1.6, 1.7, 1.75, 2.0, 2.4, 2.5, 3.0, 74.0)
How can I extract only the values of x which contain nonzero values after the decimal place? For example, the resultant vector would look like this:
c(0.5, 1.5, 1.6, 1.7, 1.75, 2.4, 2.5)
Which has removed 0.0, 1.000, 2.0, 3.0, and 74.0.
alternatively
x[x %% 1 != 0]
#[1] 0.50 1.50 1.60 1.70 1.75 2.40 2.50
or
x[trunc(x) != x]
#[1] 0.50 1.50 1.60 1.70 1.75 2.40 2.50
or
x[as.integer(x) != x]
#[1] 0.50 1.50 1.60 1.70 1.75 2.40 2.50
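or (now I stop!), by string matching; note that this pattern misses values with a 0 among the decimal digits, such as 1.05: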
x[grepl("\\.[^0]+$",x)]
#[1] 0.50 1.50 1.60 1.70 1.75 2.40 2.50
:D
We can construct a logical index with round
x[round(x) != x]
#[1] 0.50 1.50 1.60 1.70 1.75 2.40 2.50
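All of the exact comparisons above work for values typed in literally, but a number that results from floating-point arithmetic can miss a whole number by a tiny amount. A tolerance-based variant is sketched below; `has_fraction` and its default tolerance are assumptions chosen for illustration, not something taken from the answers.

```r
x <- c(0.0, 0.5, 1.000, 1.5, 1.6, 1.7, 1.75, 2.0, 2.4, 2.5, 3.0, 74.0)

# Hypothetical helper: TRUE where a value is further than tol from
# the nearest whole number.
has_fraction <- function(v, tol = sqrt(.Machine$double.eps)) {
  abs(v - round(v)) > tol
}

x[has_fraction(x)]

# sqrt(2)^2 is not exactly 2 in floating point, so the exact test
# reports a fractional part while the tolerance test does not.
y <- sqrt(2)^2
y %% 1 != 0
has_fraction(y)
```

Whether a tolerance is appropriate depends on where the numbers come from; for data read directly from a file, the exact tests are usually sufficient.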

Find all the sums of all combinations of 3 rows from 5 columns

I've loaded a table from a .csv file using
mydata = read.csv("CS2Data.csv") # read csv file
which gave me:
mydata
Date DCM TMUS SKM RCI SPOK
1 11/2/2015 -0.88 -2.16 -1.04 1.12 0.67
2 12/1/2015 1.03 3.26 -2.25 -5.51 -0.23
3 1/4/2016 1.94 1.29 0.13 -1.16 0.11
4 2/1/2016 -0.41 -2.94 0.99 3.93 -0.19
5 3/1/2016 -0.68 1.27 -0.79 -2.06 -0.33
6 4/1/2016 1.82 1.22 -0.05 -1.27 -0.46
7 5/2/2016 -0.36 3.40 0.63 -2.77 0.46
8 6/1/2016 1.94 0.77 0.51 -0.26 1.66
9 7/1/2016 0.12 3.18 1.84 -1.34 -0.67
10 8/1/2016 -1.83 -0.20 -1.10 -0.90 -1.91
11 9/1/2016 0.05 0.31 1.11 0.80 1.17
12 10/3/2016 -0.02 3.19 -0.81 -4.00 0.29
I'd like to find the sums of all combinations of any 3 of the 5 numbers for each month (row).
I tried using the combn function based on an answer I found here:
combin <- combn(mydata, 3, rowSums, simplify = TRUE)
but that gave me the error:
"Error in FUN(x[a], ...) : 'x' must be numeric"
Next I tried naming each column separately
DCM=mydata[2]
TMUS=mydata[3]
SKM=mydata[4]
RCI=mydata[5]
SPOK=mydata[6]
and then using:
stock_ret <- data.table(DCM, TMUS,SKM,RCI,SPOK)
combin <- combn(stock_ret, 3, rowSums, simplify = TRUE)
I suspect there's an easier way to just use the column headers directly from the .csv file to do this, but I'm stuck.
Get all but the first column with dates (origin of the error in the question):
mydata <- mydata[,-1]
Use combn to calculate selecting 3 columns at a time:
combn(mydata, m = 3, FUN = rowSums, simplify = TRUE)
Example:
> mydata <- iris[1:10,1:4]
> combn(mydata, m = 3, FUN = rowSums, simplify = TRUE)
[,1] [,2] [,3] [,4]
[1,] 10.0 8.8 6.7 5.1
[2,] 9.3 8.1 6.5 4.6
[3,] 9.2 8.1 6.2 4.7
[4,] 9.2 7.9 6.3 4.8
[5,] 10.0 8.8 6.6 5.2
[6,] 11.0 9.7 7.5 6.0
[7,] 9.4 8.3 6.3 5.1
[8,] 9.9 8.6 6.7 5.1
[9,] 8.7 7.5 6.0 4.5
[10,] 9.5 8.1 6.5 4.7
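Instead of dropping the date column by position, the numeric columns can be selected by type. The sketch below is an illustration under assumptions: it rebuilds only the first two rows of the question's table by hand, and `is_num` is a hypothetical name.

```r
# First two rows of the question's table, typed in by hand.
mydata <- data.frame(
  Date = c("11/2/2015", "12/1/2015"),
  DCM  = c(-0.88, 1.03),
  TMUS = c(-2.16, 3.26),
  SKM  = c(-1.04, -2.25),
  RCI  = c(1.12, -5.51),
  SPOK = c(0.67, -0.23),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns; the Date column is what made rowSums fail.
is_num <- vapply(mydata, is.numeric, logical(1))

# One column per combination of 3 of the 5 numeric columns, one row per month.
sums <- combn(mydata[is_num], m = 3, FUN = rowSums, simplify = TRUE)
dim(sums)
```

Selecting by type keeps working if the date column is moved or further non-numeric columns are added.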
The general logic to apply for any dataframe:
set.seed(1) # for reproducibility
# create a data frame
df <- as.data.frame(matrix(c(rnorm(10), rnorm(10), rnorm(10),rnorm(10),rnorm(10)), nrow=10))
df # show it
# V1 V2 V3 V4 V5
# 1 -0.6264538 1.51178117 0.91897737 1.35867955 -0.1645236
# 2 0.1836433 0.38984324 0.78213630 -0.10278773 -0.2533617
# ...
# 10 -0.3053884 0.59390132 0.41794156 0.76317575 0.8811077
combinations <- combn(5,3) #123 124 125 ...345
# all combinations of any 3 of the 5 columns
lapply(1:dim(combinations)[[2]], function(x) {df[combinations[,x]]})
# sums of all combinations of any 3 of the 5 columns
lapply(1:dim(combinations)[[2]], function(x) {rowSums(df[combinations[,x]])})
# use "matrix(unlist(...), nrow)" for better presentation and easier later handling
matrix(unlist(lapply(1:dim(combinations)[[2]], function(x) {rowSums(df[combinations[,x]])})),nrow=nrow(df))
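The solution for the specific data of the questioner (note that the unquoted dates, such as 11/2/2015, are evaluated as arithmetic divisions, so the Date column holds meaningless numbers; it is not used in the sums):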
mydata <- as.data.frame(matrix(c(
11/2/2015, -0.88, -2.16, -1.04, 1.12, 0.67,
12/1/2015, 1.03, 3.26, -2.25, -5.51, -0.23,
1/4/2016, 1.94, 1.29, 0.13, -1.16, 0.11,
2/1/2016, -0.41, -2.94, 0.99, 3.93, -0.19,
3/1/2016, -0.68, 1.27, -0.79, -2.06, -0.33,
4/1/2016, 1.82, 1.22, -0.05, -1.27, -0.46,
5/2/2016, -0.36, 3.40, 0.63, -2.77, 0.46,
6/1/2016, 1.94, 0.77, 0.51, -0.26, 1.66,
7/1/2016, 0.12, 3.18, 1.84, -1.34, -0.67,
8/1/2016, -1.83, -0.20, -1.10, -0.90, -1.91,
9/1/2016, 0.05, 0.31, 1.11, 0.80, 1.17,
10/3/2016, -0.02, 3.19, -0.81, -4.00, 0.29), nrow=12, byrow=TRUE))
names(mydata) <- c("Date", "DCM", "TMUS", "SKM", "RCI", "SPOK") # name the columns
mydata # show the dataframe
# Date DCM TMUS SKM RCI SPOK
# 1 0.0027295285 -0.88 -2.16 -1.04 1.12 0.67
# 2 0.0059553350 1.03 3.26 -2.25 -5.51 -0.23
# ............................................
# 12 0.0016534392 -0.02 3.19 -0.81 -4.00 0.29
combinations <- combn(5,3) #123 124 125 ...345
# all combinations of any 3 of the 5 columns
lapply(1:dim(combinations)[[2]], function(x) {mydata[,2:6][combinations[,x]]})
# sums of all combinations of any 3 of the 5 columns
lapply(1:dim(combinations)[[2]], function(x) {rowSums(mydata[,2:6][combinations[,x]])})
# use "matrix(unlist(...), nrow)" for better presentation and easier later handling
matrix(unlist(lapply(1:dim(combinations)[[2]], function(x) {rowSums(mydata[,2:6][combinations[,x]])})),nrow=nrow(mydata))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] -4.08 -1.92 -2.37 -0.80 -1.25 0.91 -2.08 -2.53 -0.37 0.75
# [2,] 2.04 -1.22 4.06 -6.73 -1.45 -4.71 -4.50 0.78 -2.48 -7.99
# [3,] 3.36 2.07 3.34 0.91 2.18 0.89 0.26 1.53 0.24 -0.92
# ...............................................................
# [12,] 2.36 -0.83 3.46 -4.83 -0.54 -3.73 -1.62 2.67 -0.52 -4.52
The result is correct.
Check, e.g., the 10th combination (columns SKM, RCI and SPOK): 0.75 = sum(-1.04, 1.12, 0.67) for row 1, -7.99 = sum(-2.25, -5.51, -0.23) for row 2, and so on.
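Checks like this are easier when each column of the result says which three columns it sums. A possible sketch is given below; `returns`, `idx` and the "+"-joined labels are illustrative assumptions, and only the first two rows of the data are used.

```r
# Numeric columns only, first two rows of the question's data.
returns <- data.frame(
  DCM  = c(-0.88, 1.03),
  TMUS = c(-2.16, 3.26),
  SKM  = c(-1.04, -2.25),
  RCI  = c(1.12, -5.51),
  SPOK = c(0.67, -0.23)
)

# Each column of idx holds the positions of one 3-column combination.
idx <- combn(ncol(returns), 3)

# Row sums for every combination: one row per month, one column per combination.
sums <- apply(idx, 2, function(cols) rowSums(returns[cols]))

# Label every result column with the names of the columns it adds up.
colnames(sums) <- apply(idx, 2, function(cols) {
  paste(names(returns)[cols], collapse = "+")
})

sums[, c("DCM+TMUS+SKM", "SKM+RCI+SPOK")]
```

With the labels in place, a single combination can be looked up by name instead of by counting its position in the output of combn.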

Replacing values in each column independently according to value order in R

I have a matrix:
mat <- structure(c(0.35, 0.27, 0.26, 0.28, 0.23, 0.37, 0.28, 0.27, 0.28,
0.22, 0.34, 0.27, 0.25, 0.25, 0.24, 0.35, 0.27, 0.25, 0.29, 0.27,
0.66, 0.37, 0.49, 0.46, 0.42, 0.64, 0.4, 0.48, 0.45, 0.42, 0.81,
0.39, 0.36, 0.37, 0.36, 0.34, 0.34, 0.43, 0.42, 0.34), .Dim = c(5L,
8L), .Dimnames = list(c("a", "b", "c", "d", "e"), c("f", "g",
"h", "i", "j", "k", "l", "m")))
print(mat)
f g h i j k l m
a 0.35 0.37 0.34 0.35 0.66 0.64 0.81 0.34
b 0.27 0.28 0.27 0.27 0.37 0.40 0.39 0.34
c 0.26 0.27 0.25 0.25 0.49 0.48 0.36 0.43
d 0.28 0.28 0.25 0.29 0.46 0.45 0.37 0.42
e 0.23 0.22 0.24 0.27 0.42 0.42 0.36 0.34
For each column I want to keep the k highest values and replace all lower values by 0.
To achieve this, I used a for loop and ifelse:
k <- 3
for (j in 1:ncol(mat)) { mat[,j][tail(order(mat[,j], decreasing = TRUE, na.last = FALSE), ifelse(nrow(mat)<=k, 0, nrow(mat)-k))] <- 0 }
print(mat)
f g h i j k l m
a 0.35 0.37 0.34 0.35 0.66 0.64 0.81 0.34
b 0.27 0.28 0.27 0.27 0.00 0.00 0.39 0.00
c 0.00 0.00 0.25 0.00 0.49 0.48 0.00 0.43
d 0.28 0.28 0.00 0.29 0.46 0.45 0.37 0.42
e 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
So, it all worked fine but unfortunately the loop is very slow for a large number of columns.
How can I speed up things?
apply does not seem suitable, as I want the whole matrix returned.
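We can use apply with rank. The cutoff has to mirror the loop in the question, which keeps the k highest values of each column and sets the rest to 0; ranking -x with ties.method='first' also reproduces the loop's handling of ties:
apply(mat, 2, function(x)
replace(x, rank(-x, ties.method='first') > k, 0))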
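A quick way to gain confidence in a rank-based replacement is to compare it with the loop from the question on a small matrix that contains ties. The sketch below does that; `keep_top_k` is a hypothetical helper, and the two test columns are columns f and m of the question's matrix.

```r
# Hypothetical helper: keep the k largest entries of x and zero the rest.
# Ranking -x with ties.method = "first" gives rank 1 to the largest value
# and lets earlier positions win ties.
keep_top_k <- function(x, k) {
  replace(x, rank(-x, ties.method = "first") > k, 0)
}

# Columns f and m of the question's matrix; column m contains ties.
mat <- matrix(c(0.35, 0.27, 0.26, 0.28, 0.23,
                0.34, 0.34, 0.43, 0.42, 0.34),
              nrow = 5,
              dimnames = list(c("a", "b", "c", "d", "e"), c("f", "m")))
k <- 3

# Reference result: the loop from the question.
loop_res <- mat
for (j in 1:ncol(loop_res)) {
  n_drop <- ifelse(nrow(loop_res) <= k, 0, nrow(loop_res) - k)
  ord <- order(loop_res[, j], decreasing = TRUE, na.last = FALSE)
  loop_res[, j][tail(ord, n_drop)] <- 0
}

# Rank-based result; apply returns a full matrix of the same shape.
apply_res <- apply(mat, 2, keep_top_k, k = k)

all.equal(loop_res, apply_res)
```

Because the function passed to apply returns a vector of the same length as its input column, apply reassembles the results into a matrix, which addresses the concern about getting the whole matrix back.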
