I'm having performance issues with a Gremlin query on an AWS Neptune graph database.
This is the scenario:
Basically, there are 5000+ users connected to the same IP node.
I want all users whose connection with the IP node has a date within a one-day window of any of user-1's connection dates. For example, starting from user-1 I want to find only user-2 and user-4.
I already have a query that works (thanks to the responses to a question I posted a while back), and it looks something like this:
g.V('user-1')
.outE().as('ip_edges')
.inV().inE('uses_ip').as('related')
.where(P.lte('related')).by(math('ip_edges - 86400000').by('date_millis')).by('date_millis')
.where(P.gte('related')).by(math('ip_edges + 86400000').by('date_millis')).by('date_millis')
.outV()
But I'm experiencing performance issues in this scenario because the query traverses all 5000+ edges of the IP node.
I understand that Neptune has indexes that should allow me to filter edges by the property date_millis without having to go through all 5000+ edges. But I'm failing to write a query that actually uses those indexes.
This is what the profiling of the query looks like (the node ids are a bit different because I simplified them for this example):
*******************************************************
Neptune Gremlin Profile
*******************************************************

Query String
==================
g.V('user-lt1001').outE().as('ip_edges').inV().inE('uses_ip').as('related').where(P.lte('related')).by(math('ip_edges - 86400000').by('at_millis')).by('at_millis').where(P.gte('related')).by(math('ip_edges + 86400000').by('at_millis')).by('at_millis').outV()

Original Traversal
==================
[GraphStep(vertex,[user-lt1001]), VertexStep(OUT,edge)#[ip_edges], EdgeVertexStep(IN), VertexStep(IN,[uses_ip],edge)#[related], WherePredicateStep(lte(related),[[MathStep(ip_edges - 86400000,[value(at_millis)])], value(at_millis)]), WherePredicateStep(gte(related),[[MathStep(ip_edges + 86400000,[value(at_millis)])], value(at_millis)]), EdgeVertexStep(OUT)]

Optimized Traversal
===================
Neptune steps: [
    NeptuneGraphQueryStep(Edge) {
        JoinGroupNode {
            PatternNode[(?1=<user-lt1001>, ?5, ?3, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=3, expectedTotalOutput=2, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=2}
            PatternNode[(?8, ?10=<uses_ip>, ?3, ?11) . project ?3,?11 . IsEdgeIdFilter(?11) .], {estimatedCardinality=10022, indexTime=0, joinTime=13, numSearches=1}
        }, annotations={path=[Vertex(?1):GraphStep, Edge(?6):VertexStep#[ip_edges], Vertex(?3):EdgeVertexStep, Edge(?11):VertexStep#[related]], joinStats=true, optimizationTime=1, maxVarId=12, executionTime=633}
    },
    NeptuneTraverserConverterStep
]

+ not converted into Neptune steps: [WherePredicateStep(lte(related),[[MathStep(ip_edges - 86400000,[value(at_millis)]), ProfileStep], value(at_millis)]), NoOpBarrierStep(2500), WherePredicateStep(gte(related),[[MathStep(ip_edges + 86400000,[value(at_millis)]), ProfileStep], value(at_millis)]), NoOpBarrierStep(2500), EdgeVertexStep(OUT)]

WARNING: >> WherePredicateStep(lte(related),[[MathStep(ip_edges - 86400000,[value(at_millis)]), ProfileStep], value(at_millis)]) << (or one of its children) is not supported natively yet

Physical Pipeline
=================
NeptuneGraphQueryStep
    |-- StartOp
    |-- JoinGroupOp
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?1=<user-lt1001>, ?5, ?3, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=3, expectedTotalOutput=2})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?8, ?10=<uses_ip>, ?3, ?11) . project ?3,?11 . IsEdgeIdFilter(?11) .], {estimatedCardinality=10022})

Runtime (ms)
============
Query Execution: 633.282

Traversal Metrics
=================
Step                                                             Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Edge)                                       9358        9358          20.431     3.23
NeptuneTraverserConverterStep                                     9358        9358          23.427     3.70
WherePredicateStep(lte(related),[[MathStep(ip_e...                   7           7         588.350    92.96
  MathStep(ip_edges - 86400000,[value(at_millis)])                9358        9358         293.918
NoOpBarrierStep(2500)                                                7           7           0.036     0.01
WherePredicateStep(gte(related),[[MathStep(ip_e...                   5           5           0.542     0.09
  MathStep(ip_edges + 86400000,[value(at_millis)])                   7           7           0.285
NoOpBarrierStep(2500)                                                5           5           0.023     0.00
EdgeVertexStep(OUT)                                                  5           5           0.118     0.02
                                            >TOTAL                   -           -         632.929        -

Predicates
==========
# of predicates: 38

Results
=======
Count: 5
Output: [v[user-lt1001], v[user-lt1004], v[user-lt1001], v[user-lt1003], v[user-lt1002]]

Index Operations
================
Query execution:
    # of statement index ops: 18737
    # of unique statement index ops: 4686
    Duplication ratio: 4.00
    # of terms materialized: 0
To compare execution times: while this query takes 600+ ms, the same query without those 5000 extra edges takes 8 ms.
EDIT 1
Here's a query that improves the execution times, but still traverses all the edges.
g.V('user-1')
.outE().as('ip_edges')
.values('at_millis').math('_ + 86400001').as('plus_one_day')
.select('ip_edges').values('at_millis').math('_ - 86400001').as('minus_one_day')
.select('ip_edges')
.inV().inE('uses_ip').as('result')
.values('at_millis')
.where(P.between('minus_one_day', 'plus_one_day'))
.select('result')
.outV()
And this is the profiling of this query:
*******************************************************
Neptune Gremlin Profile
*******************************************************

Query String
==================
g.V('user-lt1001').outE().as('ip_edges').values('at_millis').math('_ + 86400001').as('plus_one_day').select('ip_edges').values('at_millis').math('_ - 86400001').as('minus_one_day').select('ip_edges').inV().inE('uses_ip').as('result').values('at_millis').where(P.between('minus_one_day', 'plus_one_day')).select('result').outV()

Original Traversal
==================
[GraphStep(vertex,[user-lt1001]), VertexStep(OUT,edge)#[ip_edges], PropertiesStep([at_millis],value), MathStep(_ + 86400001)#[plus_one_day], SelectOneStep(last,ip_edges), PropertiesStep([at_millis],value), MathStep(_ - 86400001)#[minus_one_day], SelectOneStep(last,ip_edges), EdgeVertexStep(IN), VertexStep(IN,[uses_ip],edge)#[result], PropertiesStep([at_millis],value), WherePredicateStep(and(gte(minus_one_day), lt(plus_one_day))), SelectOneStep(last,result), EdgeVertexStep(OUT)]

Optimized Traversal
===================
Neptune steps: [
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1=<user-lt1001>, ?5, ?3, ?6) . project ?1,?6 . IsEdgeIdFilter(?6) .], {estimatedCardinality=3, expectedTotalOutput=2, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=2}
            PatternNode[(?6, ?7=<at_millis>, ?8, <~>) . project ?6,?8 .], {estimatedCardinality=8892, indexTime=0, joinTime=0, numSearches=1}
        }, annotations={path=[Vertex(?1):GraphStep, Edge(?6):VertexStep#[ip_edges], PropertyValue(?8):PropertiesStep], joinStats=true, optimizationTime=1, maxVarId=9, executionTime=271}
    },
    NeptuneTraverserConverterStep
]

+ not converted into Neptune steps: [MathStep(_ + 86400001)#[plus_one_day], NoOpBarrierStep(2500), SelectOneStep(last,ip_edges), NoOpBarrierStep(2500), PropertiesStep([at_millis],value), MathStep(_ - 86400001)#[minus_one_day], NoOpBarrierStep(2500), SelectOneStep(last,ip_edges), NoOpBarrierStep(2500), EdgeVertexStep(IN), VertexStep(IN,[uses_ip],edge)#[result], PropertiesStep([at_millis],value), WherePredicateStep(and(gte(minus_one_day), lt(plus_one_day))), NoOpBarrierStep(2500), SelectOneStep(last,result), NoOpBarrierStep(2500), EdgeVertexStep(OUT)]

WARNING: >> MathStep(_ + 86400001)#[plus_one_day] << (or one of its children) is not supported natively yet

Physical Pipeline
=================
NeptuneGraphQueryStep
    |-- StartOp
    |-- JoinGroupOp
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?1=<user-lt1001>, ?5, ?3, ?6) . project ?1,?6 . IsEdgeIdFilter(?6) .], {estimatedCardinality=3, expectedTotalOutput=2})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?6, ?7=<at_millis>, ?8, <~>) . project ?6,?8 .], {estimatedCardinality=8892})

Runtime (ms)
============
Query Execution: 271.410

Traversal Metrics
=================
Step                                                             Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(PropertyValue)                                 2           2           0.338     0.12
NeptuneTraverserConverterStep                                        2           2           0.058     0.02
MathStep(_ + 86400001)#[plus_one_day]                                2           2           0.085     0.03
NoOpBarrierStep(2500)                                                2           2           0.027     0.01
SelectOneStep(last,ip_edges)                                         2           2           0.015     0.01
NoOpBarrierStep(2500)                                                2           2           0.012     0.00
PropertiesStep([at_millis],value)                                    2           2           0.215     0.08
MathStep(_ - 86400001)#[minus_one_day]                               2           2           0.064     0.02
NoOpBarrierStep(2500)                                                2           2           0.051     0.02
SelectOneStep(last,ip_edges)                                         2           2           0.014     0.01
NoOpBarrierStep(2500)                                                2           2           0.012     0.00
EdgeVertexStep(IN)                                                   2           2           0.097     0.04
VertexStep(IN,[uses_ip],edge)#[result]                            9358        9358          28.307    10.45
PropertiesStep([at_millis],value)                                 9358        9358         233.549    86.18
WherePredicateStep(and(gte(minus_one_day), lt(p...                   5           5           8.080     2.98
NoOpBarrierStep(2500)                                                5           5           0.042     0.02
SelectOneStep(last,result)                                           5           5           0.013     0.01
NoOpBarrierStep(2500)                                                5           5           0.013     0.00
EdgeVertexStep(OUT)                                                  5           5           0.012     0.00
                                            >TOTAL                   -           -         271.012        -

Predicates
==========
# of predicates: 38

Results
=======
Count: 5
Output: [v[user-lt1001], v[user-lt1003], v[user-lt1002], v[user-lt1004], v[user-lt1001]]

Index Operations
================
Query execution:
    # of statement index ops: 9366
    # of unique statement index ops: 4686
    Duplication ratio: 2.00
    # of terms materialized: 0
Any help will be really appreciated! Thanks!
Looking at the profile, there is quite a bit more going on than just looking things up in an index. First of all, far more than 5K edges are being found: the number of traversers indicates the actual count is 9,358 (so almost 10K). For each of those edges the time property has to be fetched and the math step applied. This is done twice, once for each where(), but note that most of the time is spent on the first where(), as that filters out most of the edges; the second where() has a lot less work to do.
If this IP address node has the potential to keep growing in degree (the number of connected edges), you will likely want to change your data model: add intermediate nodes that break the times up into chunks or ranges (or something similar), precompute a value, and use it to home in on the data you need. Given the amount of work the query has to do, the time does not seem that unreasonable.
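To make that concrete, here is a minimal sketch of one possible bucketing model. Everything in it (the day_bucket label, the bucket_of edge, the ip-1-YYYY-MM-DD id scheme, and the literal values) is an illustrative assumption, not an established pattern:
// Write path: hang each uses_ip edge off a per-day bucket vertex
// instead of directly off the IP vertex.
g.addV('day_bucket').property(id, 'ip-1-2015-03-31')
 .addE('bucket_of').to(__.V('ip-1'))
g.V('user-1').addE('uses_ip').to(__.V('ip-1-2015-03-31'))
 .property('at_millis', 1427802744000L)
// Read path: compute the candidate days (each of user-1's connection
// days plus/minus one) in application code, then jump straight to those
// bucket vertices by id, so only the edges in the relevant buckets are
// ever touched.
g.V('ip-1-2015-03-30', 'ip-1-2015-03-31', 'ip-1-2015-04-01')
 .inE('uses_ip')
 .has('at_millis', P.between(1427716344000L, 1427889144000L))
 .outV()
The has() filter still scans edges, but only those in the matching day buckets rather than all 5000+ hanging off the IP vertex.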
Sorry for raising this topic again, but after reading the tool's documentation and a ticket similar to my question (https://github.com/esnet/iperf/issues/343), I still don't really understand the meaning of the Retr column in a TCP measurement, and I don't get how to "use" it :-(
Let's say there is a result like the one below, with 5 retransmissions. I get that this is the number of TCP segments retransmitted, but
were these retransmitted successfully, or were they merely resent without any knowledge of the outcome?
If I would like to see some kind of summary at the end as a percentage (%), can the tool print it, similar to the UDP measurement? If not, how can I get the total sent/received segment counts to compute the failure ratio?
Version of the tool:
batman@bat-image:~$ iperf3 -v
iperf 3.8.1 (cJSON 1.7.13)
Linux bat-image 4.15.0-106-generic #107-Ubuntu SMP Thu Jun 4 11:27:52 UTC 2020 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing
batman@bat-image:~$
OS:
Ubuntu-18.04
batman@bat-image:~$ uname -a
Linux bat-image 4.15.0-106-generic #107-Ubuntu SMP Thu Jun 4 11:27:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
batman@bat-image:~$
The log:
batman@bat-image:~$ iperf3 -c 192.168.122.1 -f K -B 192.168.122.141 -b 10m -t 10
Connecting to host 192.168.122.1, port 5201
[ 5] local 192.168.122.141 port 34665 connected to 192.168.122.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 1.00-2.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 2.00-3.00 sec 1.12 MBytes 9.43 Mbits/sec 0 297 KBytes
[ 5] 3.00-4.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 4.00-5.00 sec 1.12 MBytes 9.43 Mbits/sec 0 297 KBytes
[ 5] 5.00-6.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 6.00-7.00 sec 1.12 MBytes 9.44 Mbits/sec 2 1.41 KBytes
[ 5] 7.00-8.00 sec 512 KBytes 4.19 Mbits/sec 1 1.41 KBytes
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 Mbits/sec 1 1.41 KBytes
[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 Mbits/sec 1 1.41 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 8.87 MBytes 7.44 Mbits/sec 5 sender
[ 5] 0.00-16.91 sec 7.62 MBytes 3.78 Mbits/sec receiver
iperf Done.
Thanks for your help,
/Robi
In iperf3 the Retr column stands for Retransmitted TCP packets and indicates the number of TCP packets that had to be sent again (i.e. retransmitted).
The lower the value in Retr, the better. An optimal value would be 0, meaning that no matter how many TCP packets have been sent, not a single one had to be resent. A value greater than zero indicates packet loss, which might arise from network congestion (too much traffic) or from corruption due to faulty hardware.
The original issue you linked on GitHub has also been answered: https://github.com/esnet/iperf/issues/343
You are asking about the different outputs of iperf3 based on whether you test UDP or TCP.
When using UDP it's acceptable for packets not to arrive at the destination. To indicate the quality of the connection/data transfer you get a percentage of how many packets did not arrive.
When using TCP, all packets are supposed to reach the destination and are checked for missing or corrupted ones (hence Transmission Control Protocol). If a packet is missing, it gets retransmitted. To indicate the quality of the connection you get a count of how many packets had to be retransmitted.
So both the percentage with UDP and the Retr count with TCP are quality indicators that are adjusted to the specifics of each protocol.
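As for computing a failure ratio yourself: iperf3 does not print a retransmission percentage for TCP, but with the -J/--json flag you can derive a rough one. Here is a sketch in Python; note that iperf3 reports no segment count, so the total is estimated as bytes divided by the MSS, and the tcp_mss_default field name is quoted from memory, so treat both as assumptions:
import json
import subprocess

# Run iperf3 with JSON output (-J/--json) and parse the report.
out = subprocess.run(
    ["iperf3", "-c", "192.168.122.1", "-t", "10", "--json"],
    capture_output=True, text=True, check=True,
).stdout
report = json.loads(out)

sent = report["end"]["sum_sent"]           # sender-side totals
retr = sent["retransmits"]                 # same number as the Retr column
mss = report["start"]["tcp_mss_default"]   # assumed field name for the MSS

# Estimate the number of segments sent as bytes / MSS (an approximation).
segments = max(1, sent["bytes"] // mss)
print(f"~{100.0 * retr / segments:.2f}% of segments retransmitted "
      f"({retr} of ~{segments})")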
If you are wondering what the Cwnd column means, it stands for Congestion Window. The Congestion Window is a TCP state variable that limits the amount of data the TCP can send into the network before receiving an ACK.
Source: https://blog.stackpath.com/glossary-cwnd-and-rwnd/
I'm trying to subset a large data frame by a date field and am facing strange behaviour.
1) Find an interesting time interval:
> ld[ld$bps>30000000,]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1400199 2015-03-31 13:52:24 0.008 TCP 3.3.3.3 3128 4.4.4.4 65115 0 39 32507 32500000
1711899 2015-03-31 14:58:10 0.004 TCP 3.3.3.3 3128 4.4.4.7 49357 0 29 23830 47700000
2) Then try to look at what's happening in that second:
> ld[ld$Date.first.seen=="2015-03-31 13:52:24",]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1401732 2015-03-31 13:52:24 17.436 TCP 3.3.3.3 3128 6.6.6.6 51527 0 3 1608 737
I don't really understand the behavior; I should get way more results.
For example:
> ld[1399074,]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1399074 2015-03-31 13:52:24 0.152 TCP 10.10.10.10 3128 11.11.11.11 62375 0 8 3910 205789
For the date I use POSIXlt:
> str(ld)
'data.frame': 2657583 obs. of 11 variables:
$ Date.first.seen: POSIXlt, format: "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:01" ...
...
I would appreciate any assistance. Thanks!
POSIXlt may carry additional info which is suppressed when printing the entire data.frame: timezone, daylight savings, etc. Have a look at https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html.
Printing only the POSIXlt variable (ld$Date.first.seen) does generally supply at least some of this additional information.
If you're not required to keep your variable in POSIXlt for some particular reason, and you don't need the extra functionality the format enables, a simple
ld$Date.first.seen <- as.character(ld$Date.first.seen)
added before your subset statement will probably solve your problem.
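To illustrate one common way this bites (an assumption about your data, but consistent with the sub-second Duration values in it): POSIXlt can carry fractional seconds that the default print hides, so an equality test against the printed string silently fails:
# Build a timestamp with a hidden fractional second.
t <- as.POSIXlt("2015-03-31 13:52:24")
t$sec <- t$sec + 0.152                    # stored as 13:52:24.152 ...
print(t)                                  # ... but prints as "2015-03-31 13:52:24"

t == "2015-03-31 13:52:24"                # FALSE: the comparison keeps the fraction
as.character(t) == "2015-03-31 13:52:24"  # TRUE: matches what print shows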
I am using R 2.14.0 (64-bit) on Linux. I went ahead and used the example described here. I am then running the example:
library(doMC)
registerDoMC()

# Data setup as in the example being followed (this matches the
# foreach/doMC vignette's bootstrap example):
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000

system.time({
  r <- foreach(icount(trials), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})
However, I see in top that it is using only one CPU core. To prove it another way, if I check a process that uses all cores, I see:
ignorant@mybox:~/R$ ps -p 5369 -L -o pid,tid,psr,pcpu
PID TID PSR %CPU
5369 5369 0 0.1
5369 5371 1 0.0
5369 5372 2 0.0
5369 5373 3 0.0
5369 5374 4 0.0
5369 5375 5 0.0
5369 5376 6 0.0
5369 5377 7 0.0
But in this case, I see:
ignorant@mybox:~/R$ ps -p 7988 -L -o pid,tid,psr,pcpu
PID TID PSR %CPU
7988 7988 0 19.9
ignorant@mybox:~/R$ ps -p 7991 -L -o pid,tid,psr,pcpu
PID TID PSR %CPU
7991 7991 0 19.9
How can I get it to use multiple cores? I am using multicore instead of doSMP or something else, because I do not want to have copies of my data for each process.
You could try executing your script using the command:
$ taskset 0xffff R --slave -f parglm.R
If this fixes the problem, then you may have a version of R that was built with OpenBLAS or GotoBLAS2, which sets the CPU affinity so that you can only use one core; this is a known problem.
If you want to run your example interactively, start R with:
$ taskset 0xffff R
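If you would rather not change how R is launched, the affinity mask can also be reset from inside the running session (a sketch; Linux-only, and it assumes the taskset utility is installed):
# Reset the CPU affinity of the current R process so all cores are allowed.
system(sprintf("taskset -p 0xffff %d", Sys.getpid()))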
First, you might want to look at htop, which is probably available for your distribution. You can clearly see the usage for each CPU.
Second, have you tried setting the number of cores on the machine directly?
Run this with htop open:
library(doMC)
registerDoMC(cores=12) # Try setting this appropriately.
system.time({
  r <- foreach(1:1000, .combine=cbind) %dopar% {
    mean(rnorm(100000))
  }
})
# I get:
# user system elapsed
# 12.789 1.136 1.860
If the user time is much higher than elapsed (not always -- I know, but a rule of thumb), you are probably using more than one core.