I have a 2-dimensional matrix whose unique rows I want to find. For example, if
M=[1 2 3;
4 7 8;
1 2 3;
0 2 3]
then, the output of the Base.unique command,
Base.unique(M, dims=1)
is the matrix
[1 2 3;
4 7 8;
0 2 3]
which is the one I am looking for. However, Base.unique seems to be quite slow when the matrix is large. Is there a faster option?
PS: The ThreadsX module has a similar function, ThreadsX.unique, which, according to this link, is faster than Base.unique. However, it seems to accept only one-dimensional vectors (or at least that's what I have inferred from that function).
Finding the unique elements in an array is a fundamentally hard algorithmic problem. Most naive implementations are O(n^2), and while you can do it in O(n) time for a single list that is already sorted, well, then you have to sort first, and even quicksort is, as we know, no better than O(n log n) on average.
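For intuition, here is a minimal sketch of the hash-set approach — roughly what Base.unique does under the hood (unique_hashset is my own name, not a library function): one pass with amortized O(1) membership tests, so O(n) overall, but with real hashing overhead on every element:

function unique_hashset(v::AbstractVector{T}) where {T}
    seen = Set{T}()          # hash set: amortized O(1) lookups and inserts
    out = T[]
    for x in v
        if !(x in seen)      # first time we see this value
            push!(seen, x)
            push!(out, x)
        end
    end
    return out
end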
I spent a bit of time trying to cook up something that beats Base.unique, but I could only manage a few percent even in the best case so far; usually, I can only beat Base if I can get the problem to make better use of the CPU's SIMD instructions (e.g. AVX on x86 or NEON on ARM) with LoopVectorization.jl, and this particular problem is just not very SIMD-friendly.
There is one trick I can think of which might help you a bit here, though. If unique is hard, then unique on the rows of a matrix is, in technical terms, a pain in the a** to optimize. For example, consider the performance difference of:
julia> using BenchmarkTools
julia> M = rand(1:20,10^6,4)
1000000×4 Matrix{Int64}:
17 9 17 4
2 19 19 18
13 9 14 7
⋮
19 8 2 17
20 20 5 9
julia> @benchmark unique($M, dims=1)
BenchmarkTools.Trial: 58 samples with 1 evaluation.
Range (min … max): 67.044 ms … 105.911 ms ┊ GC (min … max): 0.00% … 11.76%
Time (median): 85.795 ms ┊ GC (median): 0.00%
Time (mean ± σ): 86.759 ms ± 10.448 ms ┊ GC (mean ± σ): 3.87% ± 5.09%
█ █ ▁ ▁ ▁▄▁ ▄ ▁ ▄ ▁ ▁ ▁▁▁ ▁
▆▆▁▆▆▁▁▁▁▁▆█▁█▁▁▆█▁▁█▁███▁▁█▆▆█▁▁▆▆▆▆▁▁▁█▆█▁█▁███▁█▆▆▆▁▁▁▆▁▆ ▁
67 ms Histogram: frequency by time 106 ms <
Memory estimate: 27.14 MiB, allocs estimate: 47.
julia> V = prod(M, dims=2)
1000000×1 Matrix{Int64}:
10404
12996
11466
⋮
5168
18000
julia> @benchmark unique($V)
BenchmarkTools.Trial: 555 samples with 1 evaluation.
Range (min … max): 7.739 ms … 12.680 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.512 ms ┊ GC (median): 0.00%
Time (mean ± σ): 8.989 ms ± 1.044 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▆█▁▃▄
▃▄▅▇████████▅▄▄▄▅▅▅▃▃▃▄▄▅▄▅▃▅▄▅▄▃▄▄▃▄▃▂▄▃▁▃▂▃▂▃▃▃▃▃▃▂▃▃▃▃▃ ▃
7.74 ms Histogram: frequency by time 11.9 ms <
Memory estimate: 103.06 KiB, allocs estimate: 24.
about an order of magnitude. (The prod here is just a quick way to get a vector of the same length for timing purposes; it is not a collision-free key for the rows.) So, if there is any way you can reversibly turn your matrix into a vector, that could help you a lot.
In particular, if the numbers in your real matrix can fit into a smaller type than Int64 or Float64 and there are only a few of them in each row, then you can potentially reversibly reinterpret each row as a 64-bit type:
julia> M = Int16.(M)
1000000×4 Matrix{Int16}:
17 9 17 4
2 19 19 18
13 9 14 7
8 1 2 12
8 16 19 13
⋮
12 19 2 15
4 5 10 14
19 8 2 17
20 20 5 9
julia> Mᵥ = vec(reinterpret(UInt64, M'))
1000000-element reshape(reinterpret(UInt64, adjoint(::Matrix{Int16})), 1000000) with eltype UInt64:
0x0004001100090011
0x0012001300130002
0x0007000e0009000d
0x000c000200010008
0x000d001300100008
⋮
0x000f00020013000c
0x000e000a00050004
0x0011000200080013
0x0009000500140014
julia> reinterpret(Int16, unique(Mᵥ)')'
159674×4 adjoint(reinterpret(Int16, adjoint(::Vector{UInt64}))) with eltype Int16:
17 9 17 4
2 19 19 18
13 9 14 7
8 1 2 12
8 16 19 13
⋮
17 5 5 11
1 18 14 17
2 20 5 9
15 17 14 4
N.B.: you have to make sure the number of bits in each row adds up to the size of the target type (padding if you have to), and you have to transpose first, because Julia is column-major!
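A quick sanity check of that size invariant, using the M from above:

julia> size(M, 2) * sizeof(eltype(M)) == sizeof(UInt64)  # 4 columns × 2 bytes = 8 bytes
true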
However, because reinterpret is basically free, this whole operation can still end up being significantly faster than unique-by-rows:
julia> @benchmark unique($M, dims=1)
BenchmarkTools.Trial: 65 samples with 1 evaluation.
Range (min … max): 60.212 ms … 93.517 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 78.691 ms ┊ GC (median): 0.00%
Time (mean ± σ): 77.909 ms ± 6.788 ms ┊ GC (mean ± σ): 3.85% ± 5.57%
▁ ▃ ▁ ▃▃▁ █ ▁ ▁
▄▁▁▁▄▁▁▁▁▁▁▁▁▄▄▄▄▄▇▇▄▁▄▁▄█▇▁▁▁▄▄▄▄▇█▄█▇▁███▄▄▄█▁▁▁▄▇▁▁▁█▁▁█ ▁
60.2 ms Histogram: frequency by time 89.1 ms <
Memory estimate: 23.48 MiB, allocs estimate: 47.
julia> @benchmark reinterpret(Int16,unique(vec(reinterpret(UInt64, $M')))')'
BenchmarkTools.Trial: 137 samples with 1 evaluation.
Range (min … max): 30.070 ms … 46.186 ms ┊ GC (min … max): 0.00% … 15.62%
Time (median): 36.334 ms ┊ GC (median): 0.00%
Time (mean ± σ): 36.433 ms ± 3.491 ms ┊ GC (mean ± σ): 1.65% ± 4.51%
▁ ▁ ▃ ▁ █▆ ▁▁▃▆ ▁▃▆▁▁▃▁█▃█▁▆▃▁ █ ▃
▄▁█▇▇█▇█▇█▇██▇████▇██████████████▇█▇▁▄▁▄▄█▁▁▇▁▄▄▄▁▁▄▁▄▁▁▄▁▄ ▄
30.1 ms Histogram: frequency by time 46.2 ms <
Memory estimate: 5.96 MiB, allocs estimate: 44.
Since we've made your matrix into a vector, we could now in principle also substitute ThreadsX.unique, but in my tests so far this turned out not to be faster:
julia> @benchmark reinterpret(Int16,ThreadsX.unique(vec(reinterpret(UInt64, $M')))')'
BenchmarkTools.Trial: 56 samples with 1 evaluation.
Range (min … max): 69.633 ms … 136.835 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 80.427 ms ┊ GC (median): 0.00%
Time (mean ± σ): 89.547 ms ± 18.360 ms ┊ GC (mean ± σ): 10.91% ± 13.49%
▄ █ ▁
▆▆▁▄█▇▆█▇▁▇▇▁▆▁▄▁▁▁▁▁▁▁▁▄▄▁▁▁▄▁▁▆▁▄▄▁▁▁▁▄█▆▁▄▄▁▁▁▁▄▁▁▄▁▁▁▁▁▄ ▁
69.6 ms Histogram: frequency by time 131 ms <
Memory estimate: 121.28 MiB, allocs estimate: 3310.
I have 300 stocks (here, for example, I show you 5). How can I create an equally weighted portfolio and then backtest it?
Book1
# A tibble: 3,385 x 6
...1 AAA BBB CCC DDD EEE
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2007-02-08 00:00:00 100 100 100 100 100
2 2007-02-09 00:00:00 100. 100. 100. 100. 101.
3 2007-02-12 00:00:00 100. 100. 100. 100. 101.
4 2007-02-13 00:00:00 99.9 99.9 100. 99.9 100.
5 2007-02-14 00:00:00 100. 100. 99.9 100. 99.9
6 2007-02-15 00:00:00 100. 100. 99.9 100. 99.5
7 2007-02-16 00:00:00 100. 100. 100. 100. 100.
8 2007-02-20 00:00:00 100. 100. 99.9 100. 100.
9 2007-02-21 00:00:00 101. 100. 100. 100. 101.
10 2007-02-22 00:00:00 101. 101. 100. 100. 102.
# ... with 3,375 more rows
Could you help me? I tried to follow other posts, but it does not seem to work when creating the portfolio, and as a consequence it is impossible to do any backtesting.
There are different packages that could help you run a backtest. Which is most appropriate (and whether you will want to use a package at all) will depend on how fine-grained a backtest you want to run.
Here is one example, using the PMwR package (which I maintain).
I start by creating a dataset of five assets, using data from Kenneth French's website.
library("PMwR")
library("NMOF")
P <- French(tempdir(),
"5_Industry_Portfolios_daily_CSV.zip",
frequency = "daily",
price.series = TRUE)
head(P)
## Cnsmr Manuf HiTec Hlth Other
## 1926-06-30 1.000000 1.000000 1.000000 1.000000 1.000000
## 1926-07-01 0.999200 1.002200 0.998900 1.009700 1.002100
## 1926-07-02 1.003796 1.009115 1.001997 1.011013 1.003202
## 1926-07-06 1.006507 1.011941 1.005203 1.013338 1.001296
## 1926-07-07 1.006406 1.013054 1.006409 1.016682 1.002798
## 1926-07-08 1.008821 1.013966 1.010234 1.025934 1.006709
These five series are now stored in a data frame named P.
Running a backtest for an equal-weight portfolio could look as follows:
bt <- btest(prices = list(as.matrix(P)),
timestamp = as.Date(row.names(P)),
signal = function(k) rep(1/k, k),
do.signal = "lastofquarter",
initial.cash = 100,
convert.weights = TRUE,
k = 5)
Results:
journal(bt)
## instrument timestamp amount price
## 1 Cnsmr 1926-09-30 18.13189758568 1.1082127
## 2 Manuf 1926-09-30 19.15734113773 1.0465962
## 3 HiTec 1926-09-30 19.00858248070 1.0538398
## 4 Hlth 1926-09-30 18.63527183032 1.0685114
## 5 Other 1926-09-30 18.75046122697 1.0696270
## 6 Cnsmr 1926-12-31 -0.15078058427 1.1441818
## 7 Manuf 1926-12-31 -0.03046886314 1.0757494
## ....
summary(as.NAVseries(bt))
## ---------------------------------------------------------
## 30 Jun 1926 ==> 29 Jan 2021 (24,916 data points, 0 NAs)
## 100 1528568
## ---------------------------------------------------------
## High 1590130.44 (20 Jan 2021)
## Low 43.43 (08 Jul 1932)
## ---------------------------------------------------------
## Return (%) 10.7 (annualised)
## ---------------------------------------------------------
## Max. drawdown (%) 82.3
## _ peak 245.20 (03 Sep 1929)
## _ trough 43.43 (08 Jul 1932)
## _ recovery (13 Jun 1944)
## _ underwater now (%) 3.9
## ---------------------------------------------------------
## Volatility (%) 18.1 (annualised)
## _ upside 14.4
## _ downside 11.5
## ---------------------------------------------------------
##
## Monthly returns ▁▁▆█▁▁▁
##
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec YTD
## 1926 0.0 0.0 0.0 -2.4 3.4 2.2 3.1
## 1927 1.2 4.0 1.2 1.5 5.5 -1.2 8.0 1.9 5.0 -2.2 6.0 1.7 37.2
## 1928 0.1 -1.7 9.7 3.5 3.4 -4.2 1.0 8.4 1.7 1.2 10.5 0.7 38.8
## 1929 5.5 -0.2 -0.2 1.6 -5.5 9.2 5.1 7.6 -5.7 -18.8 -11.3 1.7 -14.1
## 1930 5.0 3.2 6.7 -2.3 -1.1 -14.6 4.6 2.1 -11.4 -8.0 -2.6 -7.3 -24.9
## 1931 7.2 10.0 -4.7 -8.0 -12.5 13.1 -4.9 0.1 -28.8 8.0 -8.8 -11.2 -39.4
## ....
## 2020 -0.5 -8.0 -12.7 13.4 5.0 1.6 5.4 6.6 -3.2 -2.2 12.3 4.4 20.6
## 2021 0.0 0.0
As I said, there are many different ways, and a number of decisions you'll have to take (transaction costs, how often to rebalance, ...); but I hope the example gets you started.
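For instance, here is a hedged sketch of how two of those decisions could be expressed with btest — monthly instead of quarterly rebalancing, plus 10 bp proportional transaction costs via the tc argument (check ?btest for the exact semantics in your version of PMwR):

bt2 <- btest(prices = list(as.matrix(P)),
             timestamp = as.Date(row.names(P)),
             signal = function(k) rep(1/k, k),
             do.signal = "lastofmonth",   ## rebalance monthly instead of quarterly
             tc = 0.001,                  ## 10 bp proportional transaction costs
             initial.cash = 100,
             convert.weights = TRUE,
             k = 5)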
I have two xts objects: one with a time series of deciles and one with a time series of returns, sort of like the below. How can I create a time series of average returns for each month, grouped by decile (as depicted in the Decile Average Return table)?
Decile Series
A B C D
20180331 7 3 3
20180430 4 2 2
20180531 1 8 3 8
20180630 2 4 4 1
20180731 3 9 9
20180831 6 4 9
Return Series
A B C D
20180331 0.50% -4.80% NA 1.60%
20180430 1.50% -5.00% NA 0.10%
20180531 -1.80% 1.00% 1.80% 0.10%
20180630 -1.08% 2.00% 1.75% -2.00%
20180731 NA 1.50% 3.02% -1.50%
20180831 NA 1.00% 0.80% 1.00%
Decile Average Return
Date Min 1 2 3 4 5 6 7 8 9 Max
20180331 -4.80% 0.00% 0.00% -1.60% 0.00% 0.00% 0.00% 0.50% 0.00% 0.00% 1.60%
20180430 -5.00% 0.00% -2.45% 0.00% 1.50% 0.00% 0.00% 0.00% 0.00% 0.00% 1.50%
20180531 -1.80% -1.80% 0.00% 1.80% 0.00% 0.00% 0.00% 0.00% 0.55% 0.00% 1.80%
20180630 -2.00% -2.00% -1.08% 0.00% 1.87% 0.00% 0.00% 0.00% 0.00% 0.00% 2.00%
20180731 -1.50% 0.00% 0.00% 1.50% 0.00% 0.00% 0.00% 0.00% 0.00% 0.76% 3.02%
20180831 0.80% 0.00% 0.00% 0.00% 0.80% 0.00% 1.00% 0.00% 0.00% 1.00% 1.00%
I ended up finding the answer through a for loop. Here x is the Decile Series and y is the Return Series.
library(xts)

DecileCharacteristic <- function(x, y) {
  w <- xts(order.by = index(x))   # zero-column xts to merge results into
  for (i in 1:9) {
    r <- x == i                   # which assets are in decile i each month
    z <- y * r
    z[z == 0] <- NA               # zeros mark assets outside this decile (genuine 0% returns are dropped too)
    w <- merge(w, xts(apply(z, 1, mean, na.rm = TRUE), order.by = index(x)))
  }
  ## row-wise max/min over all assets' returns
  w <- merge(w, xts(apply(y, 1, max, na.rm = TRUE), order.by = index(x)))
  w <- merge(w, xts(apply(y, 1, min, na.rm = TRUE), order.by = index(x)))
  colnames(w) <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "max", "min")
  w                               # return the merged series
}
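A small, self-contained usage sketch (the values are taken from the first two rows of the question's tables; the object names are mine, and NaN appears for deciles with no assets that month):

dates   <- as.Date(c("2018-03-31", "2018-04-30"))
deciles <- xts(matrix(c(7, 4,  3, 2,  NA, NA,  3, 2), nrow = 2,
                      dimnames = list(NULL, c("A", "B", "C", "D"))),
               order.by = dates)
returns <- xts(matrix(c(0.005, 0.015,  -0.048, -0.050,  NA, NA,  0.016, 0.001),
                      nrow = 2,
                      dimnames = list(NULL, c("A", "B", "C", "D"))),
               order.by = dates)
DecileCharacteristic(deciles, returns)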
Hello, I am currently using InfiniBand and testing its performance with the IMB benchmark; right now I am running the parallel transfer tests
and was wondering whether the results indeed reflect the parallel performance of the 8 processes.
The explanation of the results is too vague for me to understand.
Since "( 6 additional processes waiting in MPI_Barrier)" is mentioned in the results, I suspect that each of those runs only uses 2 processes?
The t_avg[usec] column seems to give the proper result, but I need to make sure that I am understanding it correctly.
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
Does the passage above mean that I am running 8 processes in parallel?
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
And does this passage mean that 4 processes are running in parallel?
Help from someone who is familiar with the IMB benchmark would be greatly appreciated. Thanks!
Here is the full result below
# np - 8
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 2018, MPI-1 part
#------------------------------------------------------------
# Date : Mon Oct 16 14:14:20 2017
# Machine : x86_64
# System : Linux
# Release : 4.4.0-96-generic
# Version : #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
# MPI Version : 3.0
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-MPI1 Sendrecv Exchange
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Sendrecv
# Exchange
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 13.85 13.85 13.85 0.00
1 1000 12.22 12.22 12.22 0.16
2 1000 10.08 10.08 10.08 0.40
4 1000 9.43 9.43 9.43 0.85
8 1000 8.89 8.91 8.90 1.80
16 1000 8.70 8.71 8.71 3.67
32 1000 9.00 9.00 9.00 7.11
64 1000 8.82 8.82 8.82 14.51
128 1000 8.90 8.90 8.90 28.77
256 1000 8.98 8.98 8.98 56.99
512 1000 9.78 9.78 9.78 104.75
1024 1000 12.65 12.65 12.65 161.91
2048 1000 18.31 18.32 18.31 223.63
4096 1000 20.05 20.05 20.05 408.52
8192 1000 21.15 21.16 21.16 774.11
16384 1000 27.46 27.47 27.46 1193.05
32768 1000 36.93 36.94 36.93 1774.31
65536 640 60.56 60.59 60.57 2163.39
131072 320 117.62 117.63 117.63 2228.57
262144 160 202.67 202.68 202.67 2586.78
524288 80 323.86 324.28 324.07 3233.56
1048576 40 615.05 615.47 615.26 3407.42
2097152 20 1214.74 1216.89 1215.82 3446.74
4194304 10 2471.83 2488.45 2480.14 3371.02
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 11.14 11.15 11.15 0.00
1 1000 11.16 11.16 11.16 0.18
2 1000 11.11 11.12 11.12 0.36
4 1000 11.10 11.11 11.10 0.72
8 1000 11.03 11.04 11.03 1.45
16 1000 11.21 11.22 11.22 2.85
32 1000 11.81 11.81 11.81 5.42
64 1000 11.58 11.58 11.58 11.05
128 1000 11.77 11.78 11.78 21.72
256 1000 11.88 11.89 11.89 43.05
512 1000 13.03 13.03 13.03 78.57
1024 1000 14.73 14.74 14.74 138.92
2048 1000 19.37 19.39 19.38 211.24
4096 1000 21.31 21.34 21.33 383.96
8192 1000 26.19 26.22 26.20 624.84
16384 1000 32.65 32.69 32.67 1002.26
32768 1000 48.71 48.78 48.75 1343.52
65536 640 75.14 75.22 75.18 1742.63
131072 320 174.66 175.15 174.94 1496.65
262144 160 301.22 302.02 301.44 1735.95
524288 80 539.40 542.68 540.78 1932.21
1048576 40 1015.45 1026.34 1020.59 2043.32
2097152 20 1959.53 1985.57 1971.34 2112.39
4194304 10 3549.00 3641.61 3590.76 2303.55
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.81 12.83 12.82 0.00
1 1000 12.82 12.84 12.83 0.16
2 1000 12.73 12.75 12.74 0.31
4 1000 12.82 12.85 12.84 0.62
8 1000 12.87 12.88 12.87 1.24
16 1000 12.83 12.86 12.84 2.49
32 1000 13.25 13.28 13.26 4.82
64 1000 13.44 13.46 13.45 9.51
128 1000 13.49 13.51 13.50 18.94
256 1000 13.72 13.74 13.73 37.27
512 1000 13.69 13.71 13.70 74.72
1024 1000 15.73 15.75 15.74 130.07
2048 1000 20.72 20.76 20.74 197.28
4096 1000 22.68 22.74 22.72 360.28
8192 1000 29.48 29.52 29.50 555.04
16384 1000 39.89 39.95 39.92 820.31
32768 1000 57.38 57.48 57.43 1140.24
65536 640 95.23 95.34 95.29 1374.78
131072 320 214.61 215.16 214.83 1218.38
262144 160 365.75 368.39 367.28 1423.18
524288 80 679.82 687.10 683.13 1526.08
1048576 40 1277.18 1309.22 1295.65 1601.83
2097152 20 2292.99 2377.56 2339.35 1764.12
4194304 10 4617.95 4919.67 4778.37 1705.12
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.41 12.42 12.42 0.00
1 1000 12.47 12.48 12.47 0.32
2 1000 11.93 11.94 11.94 0.67
4 1000 11.95 11.96 11.95 1.34
8 1000 11.91 11.92 11.92 2.69
16 1000 11.97 11.98 11.97 5.34
32 1000 12.80 12.81 12.80 10.00
64 1000 12.84 12.84 12.84 19.93
128 1000 12.90 12.91 12.91 39.67
256 1000 12.90 12.91 12.91 79.34
512 1000 14.04 14.04 14.04 145.82
1024 1000 17.13 17.14 17.13 239.02
2048 1000 21.06 21.06 21.06 389.05
4096 1000 23.32 23.33 23.32 702.41
8192 1000 28.07 28.07 28.07 1167.45
16384 1000 37.81 37.82 37.82 1732.64
32768 1000 55.23 55.24 55.24 2372.75
65536 640 101.04 101.06 101.05 2593.84
131072 320 212.88 212.88 212.88 2462.84
262144 160 362.37 362.38 362.37 2893.62
524288 80 668.88 668.89 668.88 3135.26
1048576 40 1286.48 1287.81 1287.15 3256.92
2097152 20 2463.56 2464.13 2463.84 3404.29
4194304 10 4845.24 4854.75 4849.99 3455.83
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 16.46 16.46 16.46 0.00
1 1000 16.42 16.43 16.42 0.24
2 1000 16.17 16.17 16.17 0.49
4 1000 16.17 16.17 16.17 0.99
8 1000 16.19 16.20 16.20 1.98
16 1000 16.21 16.22 16.22 3.94
32 1000 17.20 17.21 17.20 7.44
64 1000 17.09 17.10 17.10 14.97
128 1000 17.24 17.25 17.25 29.68
256 1000 17.40 17.41 17.40 58.83
512 1000 17.59 17.61 17.60 116.32
1024 1000 21.43 21.45 21.44 190.95
2048 1000 29.49 29.50 29.49 277.71
4096 1000 31.63 31.66 31.64 517.58
8192 1000 36.70 36.72 36.71 892.41
16384 1000 49.50 49.53 49.52 1323.07
32768 1000 68.35 68.36 68.36 1917.38
65536 640 108.80 108.85 108.82 2408.31
131072 320 314.38 314.72 314.56 1665.91
262144 160 521.71 522.24 521.94 2007.84
524288 80 930.03 933.47 931.82 2246.62
1048576 40 1729.81 1738.30 1734.66 2412.87
2097152 20 3384.33 3414.99 3403.61 2456.41
4194304 10 6972.50 7058.12 7028.16 2377.01
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 18.91 18.93 18.92 0.00
1 1000 19.06 19.08 19.07 0.21
2 1000 18.91 18.92 18.92 0.42
4 1000 19.07 19.09 19.08 0.84
8 1000 18.81 18.83 18.82 1.70
16 1000 19.02 19.03 19.03 3.36
32 1000 19.85 19.85 19.85 6.45
64 1000 19.76 19.78 19.77 12.94
128 1000 19.94 19.96 19.95 25.65
256 1000 20.16 20.18 20.17 50.75
512 1000 20.50 20.51 20.50 99.86
1024 1000 24.52 24.55 24.54 166.83
2048 1000 36.35 36.39 36.37 225.14
4096 1000 38.77 38.81 38.79 422.20
8192 1000 44.79 44.82 44.81 731.12
16384 1000 59.28 59.33 59.31 1104.68
32768 1000 86.39 86.47 86.42 1515.87
65536 640 142.47 142.60 142.53 1838.29
131072 320 402.11 402.98 402.57 1301.04
262144 160 648.90 650.30 649.68 1612.44
524288 80 1209.17 1213.71 1211.74 1727.89
1048576 40 2332.69 2355.17 2344.35 1780.89
2097152 20 4686.88 4767.48 4733.77 1759.55
4194304 10 9457.18 9674.69 9567.31 1734.13
# All processes entering MPI_Finalize
The IMB benchmark tests, all at once:
various communication patterns (the Sendrecv and Exchange benchmarks here),
various message sizes (from 0 bytes to 4 MB here), and
various communicator sizes (2, 4, and 8 here).
Since mpirun is invoked once with -np 8, 8 MPI tasks are created.
So when testing a size-2 communicator, an extra size-6 communicator is created under the hood, and its 6 MPI tasks simply hang in MPI_Barrier, hence the message
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
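By the way, if you only care about the full 8-task case, IMB's -npmin option lets you skip the smaller communicators, e.g.:

# run Sendrecv and Exchange on all 8 ranks only, skipping the 2- and 4-process runs
mpirun -np 8 ./IMB-MPI1 -npmin 8 Sendrecv Exchange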
For system monitoring purposes, I need to redirect the output of the top command to a file so that I can parse it.
When I try to do so, however, the CPU performance stats are not saved in the file; see the
outputs below.
Expected output:
[root@v100 /usr/local/bin]# top
last pid: 6959; load averages: 0.01, 0.03, 0.03 up 0+02:47:34 17:51:16
114 processes: 1 running, 108 sleeping, 5 zombie
CPU: 0.0% user, 0.0% nice, 1.6% system, 0.0% interrupt, 98.4% idle
Mem: 734M Active, 515M Inact, 226M Wired, 212M Buf, 491M Free
Swap: 4095M Total, 4095M Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1953 root 150 20 0 3084M 635M uwait 2:44 0.00% java
1663 mysql 46 20 0 400M 139M sbwait 1:29 0.00% mysqld
1354 root 31 20 0 94020K 50796K uwait 0:24 0.00% beam
4233 root 1 20 0 122M 23940K select 0:06 0.00% python
1700 zabbix 1 20 0 20096K 2436K nanslp 0:03 0.00% zabbix_agentd
1799 zabbix 1 20 0 103M 7240K nanslp 0:02 0.00% zabbix_server
4222 root 1 30 0 122M 23300K select 0:02 0.00% python
1696 zabbix 1 20 0 19968K 2424K nanslp 0:02 0.00% zabbix_agentd
2853 root 1 20 0 126M 29780K select 0:02 0.00% python
1793 zabbix 1 20 0 103M 7152K nanslp 0:01 0.00% zabbix_server
1797 zabbix 1 20 0 103M 8348K nanslp 0:01 0.00% zabbix_server
1752 root 1 20 0 122M 22344K select 0:01 0.00% python
1796 zabbix 1 20 0 103M 8136K nanslp 0:01 0.00% zabbix_server
1795 zabbix 1 20 0 103M 8208K nanslp 0:01 0.00% zabbix_server
1801 zabbix 1 20 0 103M 7100K nanslp 0:01 0.00% zabbix_server
3392 root 1 20 0 122M 23392K select 0:01 0.00% python
1798 zabbix 1 20 0 103M 7860K nanslp 0:01 0.00% zabbix_server
2812 root 1 20 0 134M 25184K select 0:01 0.00% python
1791 zabbix 1 20 0 103M 7188K nanslp 0:01 0.00% zabbix_server
1827 root 1 -52 r0 14368K 1400K nanslp 0:01 0.00% watchdogd
1790 zabbix 1 20 0 103M 7164K nanslp 0:01 0.00% zabbix_server
1778 zabbix 1 20 0 103M 8608K nanslp 0:01 0.00% zabbix_server
1780 zabbix 1 20 0 103M 8608K nanslp 0:01 0.00% zabbix_server
2928 root 1 20 0 122M 23272K select 0:01 0.00% python
2960 root 1 20 0 116M 22288K select 0:01 0.00% python
1776 zabbix 1 20 0 103M 7248K nanslp 0:01 0.00% zabbix_server
2892 root 1 20 0 122M 22648K select 0:01 0.00% python
1789 zabbix 1 20 0 103M 7128K nanslp 0:01 0.00% zabbix_server
1814 root 1 20 0 216M 15796K select 0:01 0.00% httpd
1779 zabbix 1 20 0 103M 8608K nanslp 0:01 0.00% zabbix_server
1783 zabbix 1 20 0 103M 8608K nanslp 0:01 0.00% zabbix_server
1800 zabbix 1 20 0 103M 7124K nanslp 0:01 0.00% zabbix_server
1782 zabbix 1 20 0 103M 8608K nanslp 0:01 0.00% zabbix_server
1781 zabbix 1 20 0 103M 8608K nanslp 0:00 0.00% zabbix_server
1792 zabbix 1 20 0 103M 7172K nanslp 0:00 0.00% zabbix_server
2259 root 2 20 0 48088K 4112K uwait 0:00 0.00% cb_heuristics
If I do:
[root@v100 /usr/local/bin]# top > /tmp/top.output
then it shows:
[root@v100 /usr/local/bin]# cat /tmp/top.output
last pid: 7080; load averages: 0.09, 0.06, 0.03 up 0+02:52:24 17:56:06
114 processes: 1 running, 108 sleeping, 5 zombie
Mem: 731M Active, 515M Inact, 219M Wired, 212M Buf, 501M Free
Swap: 4095M Total, 4095M Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1953 root 150 20 0 3084M 633M uwait 2:17 0.00% java
1663 mysql 46 20 0 400M 136M sbwait 1:08 0.00% mysqld
1354 root 31 20 0 94020K 49924K uwait 0:18 0.00% beam
4233 root 1 20 0 122M 23776K select 0:04 0.00% python
1700 zabbix 1 20 0 20096K 2436K nanslp 0:02 0.00% zabbix_agentd
1799 zabbix 1 20 0 103M 7240K nanslp 0:01 0.00% zabbix_server
2853 root 1 20 0 126M 29780K select 0:01 0.00% python
1696 zabbix 1 20 0 19968K 2424K nanslp 0:01 0.00% zabbix_agentd
4222 root 1 28 0 122M 23264K select 0:01 0.00% python
1793 zabbix 1 20 0 103M 7152K nanslp 0:01 0.00% zabbix_server
1752 root 1 20 0 122M 22344K select 0:01 0.00% python
1797 zabbix 1 20 0 103M 8088K nanslp 0:01 0.00% zabbix_server
1796 zabbix 1 20 0 103M 7944K nanslp 0:01 0.00% zabbix_server
1795 zabbix 1 20 0 103M 8044K nanslp 0:01 0.00% zabbix_server
1801 zabbix 1 20 0 103M 7100K nanslp 0:01 0.00% zabbix_server
3392 root 1 20 0 122M 23312K select 0:01 0.00% python
2812 root 1 20 0 134M 25184K select 0:01 0.00% python
1798 zabbix 1 20 0 103M 7628K nanslp 0:01 0.00% zabbix_server
So here I am able to monitor memory but not CPU.
The reason is that while top's output is redirected, the CPU stats do not get updated.
How can I capture the CPU stats as well?
If you have any suggestions, please tell me.
top -b -n 1 seems to work on my Linux box here (-b: batch mode operation, -n: number of iterations).
Edit:
I just tried it on FreeBSD 9.2, which uses the 3.5beta12 version of top. It seems it needs at least one additional iteration to get CPU stats, so you might want to use:
top -b -d2 -s1 | sed -e '1,/USERNAME/d' | sed -e '1,/^$/d'
-b: batch mode; -d2: 2 displays (the first one does not contain CPU stats, the second one does); -s1: wait one second between displays
The sed pipeline removes the first display which does not contain CPU stats (by skipping header and process list).
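Putting it together with the redirect from your question, the whole capture could look like this (a sketch; flags may differ slightly across top versions):

top -b -d2 -s1 | sed -e '1,/USERNAME/d' | sed -e '1,/^$/d' > /tmp/top.output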
Output of # top -o size
last pid: 61935; load averages: 0.82, 0.44, 0.39 up 10+13:28:42 16:49:43
152 processes: 2 running, 150 sleeping
CPU: 10.3% user, 0.0% nice, 1.8% system, 0.2% interrupt, 87.7% idle
Mem: 5180M Active, 14G Inact, 2962M Wired, 887M Cache, 2465M Buf, 83M Free
Swap: 512M Total, 26M Used, 486M Free, 5% Inuse
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
1471 mysql 62 44 0 763M 349M ucond 3 222:19 74.76% mysqld
1171 root 4 44 0 645M 519M sbwait 0 20:56 3.86% tfs
41173 root 4 44 0 629M 516M sbwait 4 19:17 0.59% tfs
41350 root 4 44 0 585M 467M sbwait 7 15:17 0.10% tfs
36382 root 4 45 0 581M 401M sbwait 1 206:50 0.10% tfs
41157 root 4 44 0 551M 458M sbwait 5 16:23 0.98% tfs
36401 root 4 45 0 199M 108M uwait 2 17:50 0.00% tfs
36445 root 4 44 0 199M 98M uwait 4 20:11 0.00% tfs
36420 root 4 45 0 191M 98M uwait 4 19:57 0.00% tfs
3491 root 9 45 0 79320K 41292K uwait 4 40:22 0.00% tfs_db
40690 root 1 44 0 29896K 4104K select 1 0:05 0.00% sshd
44636 root 1 44 0 29896K 3896K select 4 0:00 0.00% sshd
22224 root 1 44 0 29896K 3848K select 6 0:00 0.00% sshd
42956 root 1 44 0 29896K 3848K select 4 0:00 0.00% sshd
909 bind 11 76 0 27308K 14396K kqread 1 0:00 0.00% named
1586 root 1 44 0 26260K 3464K select 4 0:00 0.00% sshd
40590 root 4 45 0 23480K 7592K uwait 1 5:11 0.00% auth
1472 root 1 44 0 22628K 8776K select 0 0:41 0.00% perl5.8.9
22229 root 1 44 0 20756K 2776K select 0 0:00 0.00% sftp-server
42960 root 1 44 0 20756K 2772K select 2 0:00 0.00% sftp-server
44638 root 1 44 0 10308K 2596K pause 2 0:00 0.00% csh
42958 root 1 47 0 10308K 1820K pause 3 0:00 0.00% csh
22227 root 1 48 0 10308K 1820K pause 0 0:00 0.00% csh
36443 root 1 57 0 10248K 1792K wait 0 0:00 0.00% bash
36418 root 1 51 0 10248K 1788K wait 2 0:00 0.00% bash
41171 root 1 63 0 10248K 1788K wait 0 0:00 0.00% bash
36399 root 1 50 0 10248K 1784K wait 2 0:00 0.00% bash
41155 root 1 56 0 10248K 1784K wait 0 0:00 0.00% bash
40588 root 1 76 0 10248K 1776K wait 6 0:00 0.00% bash
36380 root 1 50 0 10248K 1776K wait 2 0:00 0.00% bash
41348 root 1 54 0 10248K 1776K wait 0 0:00 0.00% bash
1169 root 1 54 0 10248K 1772K wait 0 0:00 0.00% bash
3485 root 1 76 0 10248K 1668K wait 4 0:00 0.00% bash
61934 root 1 44 0 9372K 2356K CPU4 4 0:00 0.00% top
1185 mysql 1 76 0 8296K 1356K wait 3 0:00 0.00% sh
1611 root 1 44 0 7976K 1372K nanslp 0 0:08 0.00% cron
824 root 1 44 0 7048K 1328K select 0 0:03 0.00% syslogd
1700 root 1 76 0 6916K 1052K ttyin 3 0:00 0.00% getty
1703 root 1 76 0 6916K 1052K ttyin 2 0:00 0.00% getty
1702 root 1 76 0 6916K 1052K ttyin 5 0:00 0.00% getty
1706 root 1 76 0 6916K 1052K ttyin 0 0:00 0.00% getty
1705 root 1 76 0 6916K 1052K ttyin 1 0:00 0.00% getty
1701 root 1 76 0 6916K 1052K ttyin 6 0:00 0.00% getty
1707 root 1 76 0 6916K 1052K ttyin 4 0:00 0.00% getty
1704 root 1 76 0 6916K 1052K ttyin 7 0:00 0.00% getty
490 root 1 44 0 3204K 556K select 1 0:00 0.00% devd
My game server lags badly, and I have noticed that there are only 83M of free RAM.
It's not just top; I have also tried another tool:
# /usr/local/bin/freem
SYSTEM MEMORY INFORMATION:
mem_wire: 3104976896 ( 2961MB) [ 12%] Wired: disabled for paging out
mem_active: + 5440778240 ( 5188MB) [ 21%] Active: recently referenced
mem_inactive:+ 15324811264 ( 14614MB) [ 61%] Inactive: recently not referenced
mem_cache: + 1015689216 ( 968MB) [ 4%] Cached: almost avail. for allocation
mem_free: + 86818816 ( 82MB) [ 0%] Free: fully available for allocation
mem_gap_vm: + 946176 ( 0MB) [ 0%] Memory gap: UNKNOWN
-------------- ------------ ----------- ------
mem_all: = 24974020608 ( 23817MB) [100%] Total real memory managed
mem_gap_sys: + 772571136 ( 736MB) Memory gap: Kernel?!
-------------- ------------ -----------
mem_phys: = 25746591744 ( 24553MB) Total real memory available
mem_gap_hw: + 23212032 ( 22MB) Memory gap: Segment Mappings?!
-------------- ------------ -----------
mem_hw: = 25769803776 ( 24576MB) Total real memory installed
SYSTEM MEMORY SUMMARY:
mem_used: 9342484480 ( 8909MB) [ 36%] Logically used memory
mem_avail: + 16427319296 ( 15666MB) [ 63%] Logically available memory
-------------- ------------ ----------- ------
mem_total: = 25769803776 ( 24576MB) [100%] Logically total memory
As you can see, the output is similar:
mem_free: + 86818816 ( 82MB) [ 0%] Free: fully available for allocation.
My dedicated server has 24 GB of RAM, which is quite a lot for my game server.
How can I find out which process is eating that amount of memory?
I am using FreeBSD 8.2.
According to top's output, you are only using 5% of your swap. This means you are not short on RAM -- whatever is slowing you down, it is not a memory shortage. (On FreeBSD, "Inactive" memory is readily reclaimable, so a small "Free" figure by itself is normal.) If anything, I'd suspect mysqld: not only was it quite busy when you took the snapshot, it had also accumulated quite a bit of CPU time prior to that.
Perhaps some frequently-running queries could be helped by a new index or two?
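That said, if you do want to see which processes hold the most actual memory, you can sort top by resident size (res) rather than virtual size, e.g.:

# batch mode, sorted by resident memory; show the header and the biggest consumers
top -b -o res | head -n 25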