Can't understand the link between in/out traffic data and its graph

I feel dumb, but I can't see the time step used to plot the following graph from this data (it was retrieved via tcpdump, and I'm supposed to produce the same kind of plot on my own for various websites):
18:43:39.577369 0 out
18:43:39.577449 0 out
18:43:39.819272 0 in
18:43:39.819300 0 out
18:43:39.819531 194 out
18:43:39.827914 0 out
18:43:39.829722 0 in
18:43:39.829741 0 out
18:43:39.829944 194 out
18:43:40.059952 0 in
18:43:40.061021 1448 in
18:43:40.061050 0 out
18:43:40.061108 1448 in
18:43:40.061124 0 out
18:43:40.061163 1200 in
18:43:40.061176 0 out
18:43:40.064159 0 in
18:43:40.064225 0 out
18:43:40.064864 194 out
18:43:40.069418 1448 in
18:43:40.069436 0 out
18:43:40.070015 859 in
18:43:40.070023 0 out
18:43:40.076474 126 out
18:43:40.081113 0 in
18:43:40.082162 1448 in
18:43:40.082174 0 out
18:43:40.082194 1448 in
18:43:40.082199 0 out
18:43:40.082208 1200 in
18:43:40.082212 0 out
18:43:40.094615 1448 in
18:43:40.094636 0 out
etc
Any help would be greatly appreciated; I really need to know this quickly!

Each line of the data has a timestamp, a packet size (bytes), and an indication of direction (in or out).
The graph divides time into slots of 10 ms and sums up the bytes sent (out) and received (in) within each time slot. A data point is created at the end of each time slot.
E.g., between 30 and 40 ms, packets of sizes 1448, 1448, and 1200 are received, accounting for a data point of ca. 4100 at 40 ms in the red graph.
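To reproduce that kind of binning, here is a minimal base-R sketch. The data frame `pkts` and its column names are my own illustration, with times already converted to seconds relative to the first packet:

```r
# Minimal sketch: sum bytes per direction within 10 ms slots,
# which is how the plotted data points are formed.
pkts <- data.frame(
  time = c(0.000, 0.242, 0.242, 0.484, 0.485, 0.492),  # seconds from start
  size = c(194, 1448, 1200, 1448, 859, 126),           # packet size in bytes
  dir  = c("out", "in", "in", "in", "in", "out")
)
pkts$slot <- floor(pkts$time / 0.010)   # index of the 10 ms slot
# one row per (slot, direction): the total bytes for that data point
traffic <- aggregate(size ~ slot + dir, data = pkts, sum)
traffic
```

Plotting `traffic$size` against `traffic$slot * 10` (ms), with one line per direction, should give the same shape as the graph above.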


Execute a condition after leaving a certain number of values in a column

I have a data frame as shown below which has around 130k data values.
Eng_RPM Veh_Spd
340 56
450 65
670 0
800 0
890 0
870 0
... ..
800 0
790 0
940 0
... ...
1490 67
1540 78
1880 81
I need another variable called Idling_Count, which increments whenever Eng_RPM >= 400 and Veh_Spd == 0. The catch is that the counter has to start 960 data points after the data point that satisfied the condition; also, the condition should not be applicable to the first 960 data points, as shown below.
Expected Output
Eng_RPM Veh_Spd Idling_Count
340 56 0
450 65 0
670 0 0
... ... 0 (up to the first 960 values)
600 0 0 (the idling time starts, but the counter should wait another 960 values before incrementing)
... ... 0
800 0 1 (this is the 961st value after the start of idling time, i.e. Eng_RPM > 400 and Veh_Spd == 0)
890 0 2
870 0 3
... .. ..
800 1 0
790 2 0
940 3 0
450 0 0 (data point which satisfies the condition, but the counter should not increment for another 960 values)
1490 0 4 (961st value from the above data point)
1540 0 5
1880 81 0
.... ... ... (this cycle should continue for the rest of the data points)
Here is how to do it with data.table (avoiding a for loop, which is known to be slow in R).
library(data.table)
setDT(df)
# create a serial number for each observation
df[, serial := seq_len(nrow(df))]
# find runs of consecutive observations matching the condition,
# then create an internal serial id within each run
df[Eng_RPM > 400 & Veh_Spd == 0, group_serial:= seq_len(.N),
by = cumsum((serial - shift(serial, type = "lag", fill = 1)) != 1) ]
df[is.na(group_serial), group_serial := 0]
# identify observations with group_serial larger than 960, add id
df[group_serial > 960, Idling_Count := seq_len(.N)]
df[is.na(Idling_Count), Idling_Count := 0]
You can also do this with a for loop, like this.
First, create sample data and an empty column Indling_Cnt:
End_RMP <- round(runif(1800,340,1880),0)
Veh_Spd <- round(runif(1800,0,2),0)
dta <- data.frame(End_RMP,Veh_Spd)
dta$Indling_Cnt <- rep(0,1800)
For counting in Indling_Cnt you can use a for loop with a few if conditions. This is probably not the most efficient way to do it, but it should work. There are better yet more complex solutions, for example using packages such as data.table, as mentioned in the other answer.
# index of the first row satisfying the condition after the first 960
# rows, plus the 960-row waiting period (computed once, outside the loop)
n <- which(dta$End_RMP[-(1:960)] >= 400 & dta$Veh_Spd[-(1:960)] == 0)[1] + 960 + 960
for (i in 2:dim(dta)[1]) {
  if (i >= n) {
    if (dta$End_RMP[i] >= 400 & dta$Veh_Spd[i] == 0) {
      dta$Indling_Cnt[i] <- dta$Indling_Cnt[i - 1] + 1
    } else {
      dta$Indling_Cnt[i] <- dta$Indling_Cnt[i - 1]
    }
  }
}

How does wildcard mask really work?

I'm studying for my Cisco CCENT and I'm having a really hard time understanding wildcard masking. This video was pretty straightforward and simple:
Wildcard Mask Video
However when applied to this question it yields the wrong answer:
You need to create a wildcard mask for the entire Class B private IPv4 address space
172.16.0.0/16 through 172.31.0.0/16. What is the wildcard mask?
Based on the video, one is tempted to answer 0.0.255.255; however, this is incorrect, the correct answer being 0.15.255.255???
No matter how much I tinker with the bits to try to make sense of it, I feel I am missing something. Can someone explain?
It occurs to me they could be super strict and only want the wildcard mask for the given range, but in that case I would answer 2^5 for 32 = 255.248.0.0 = WCM 0.7.255.255. Alas, this was not the answer either, not even close. What am I missing?
Here is another similar question...
Which address and wildcard mask combination will match all IPv4 addresses in the networks
192.168.0.0/24 through 192.168.63.0/24?
What I wanted to say: 192.168.0.0 0.0.0.255
Answer: 192.168.0.0 0.0.63.255
Create an IP slide rule with the mask values on the first row and the bit values on the second row.
Mask 128 192 224 240 248 252 254 255
Bit  128  64  32  16   8   4   2   1
Now you need to convert the first and last IP addresses to binary. I hope my paste doesn't go funky; here are the examples (x.x = 192.168, since those octets are the same):
192.168.0  -> 0 0 0 0 0 0 0 0 . 00000000
192.168.63 -> 0 0 1 1 1 1 1 1 . 00000000
Now find on the IP slide rule the last bit position where the two addresses are identical.
In this case it is the 64, and the mask for 64 is 192. Now subtract the new mask from 255:
255.255.255.255
255.255.192.  0 =
  0.  0. 63.255
The wildcard mask is 0.0.63.255, so the answer is 192.168.0.0 0.0.63.255.
You need to create a wildcard mask for the entire Class B private IPv4 address space,
172.16.0.0/16 through 172.31.0.0/16. What is the wildcard mask?
Answer: 0.15.255.255
172.16.0.0 -> 172. 0 0 0 1 0 0 0 0 .0.0
172.31.0.0 -> 172. 0 0 0 1 1 1 1 1 .0.0
Now find on the IP slide rule the last bit position where the two addresses are identical.
In this case it is the 16, and the mask for 16 is 240. Now subtract the new mask from 255:
255.255.255.255
255.240.  0.  0 =
  0. 15.255.255
The wildcard mask is 0.15.255.255, so the answer is 172.0.0.0 0.15.255.255.
The results of the wildcard mask calculation give the first and last IP addresses in the wildcard-mask network range.
If it were only 192.168.0.0/24, your answer 0.0.0.255 would be right, but the question asks for the combination of 192.168.0.0/24 through 192.168.63.0/24, so you must calculate it.
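The slide-rule procedure can also be expressed in a few lines of R. This is a hypothetical helper, just to illustrate the arithmetic: XOR the first and last addresses octet by octet, grow each differing octet into an all-ones wildcard value, and make every octet after the first differing one fully wild.

```r
# Sketch of the procedure above: derive the wildcard mask that covers
# the range between a first and last network address.
wildcard <- function(first, last) {
  a <- as.integer(strsplit(first, ".", fixed = TRUE)[[1]])
  b <- as.integer(strsplit(last,  ".", fixed = TRUE)[[1]])
  d <- mapply(bitwXor, a, b)                 # bits that differ, per octet
  # smallest all-ones value covering the differing bits (0, 1, 3, ..., 255)
  mask <- sapply(d, function(x) { m <- 0; while (m < x) m <- m * 2 + 1; m })
  i <- which(mask > 0)[1]                    # first octet with any wild bits
  if (!is.na(i) && i < 4) mask[(i + 1):4] <- 255
  paste(mask, collapse = ".")
}
wildcard("172.16.0.0", "172.31.0.0")    # "0.15.255.255"
wildcard("192.168.0.0", "192.168.63.0") # "0.0.63.255"
```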

adding and subtracting values in multiple data frames of different lengths - flow analysis

Thank you jakub and Hack-R!
Yes, these are my actual data. The data I am starting from are the following:
[A] #first, longer dataset
CODE_t2 VALUE_t2
111 3641
112 1691
121 1271
122 185
123 522
124 0
131 0
132 0
133 0
141 626
142 170
211 0
212 0
213 0
221 0
222 0
223 0
231 95
241 0
242 0
243 0
244 0
311 129
312 1214
313 0
321 0
322 0
323 565
324 0
331 0
332 0
333 0
334 0
335 0
411 0
412 0
421 0
422 0
423 0
511 6
512 0
521 0
522 0
523 87
In the above table, we can see the 44 land use CODES (which I inappropriately named "class" in my first entry) for a certain city. Some values are just 0, meaning that there are no land uses of that type in that city.
Starting from this table, which displays all the land use types for t2 and their corresponding values ("VALUE_t2") I have to reconstruct the previous amount of land uses ("VALUE_t1") per each type.
To do so, I have to add and subtract the value per each land use (if not 0) by using the "change land use table" from t2 to t1, which is the following:
[B] #second, shorter dataset
CODE_t2 CODE_t1 VALUE_CHANGE1
121 112 2
121 133 12
121 323 0
121 511 3
121 523 2
123 523 4
133 123 3
133 523 4
141 231 12
141 511 37
So, in order to get VALUE_t1 from VALUE_t2 I have, for instance, to subtract 2 + 12 + 0 + 3 + 2 hectares (the first 5 values of the second, shorter table) from the value of land use type/code 121 in the first, longer table (1271 ha), and to add 2 hectares to land type 112, 12 hectares to land type 133, 3 hectares to land type 511, and 2 hectares to land type 523. I have to do that for all the land use types different from 0, and later also from t1 to t0.
What I have to do is a sort of loop that would both add and subtract, for each land use type/code, the values from VALUE_t2 to VALUE_t1, and then from VALUE_t1 to VALUE_t0.
Once I estimated VALUE_t1 and VALUE_t0, I will put the values in a simple table showing the relative variation (here the values are not real):
CODE VALUE_t0 VALUE_t2 % VAR t2-t0
code1 50 100 ((100-50)/50)*100
code2 70 80 ((80-70)/70)*100
code3 45 34 ((34-45)/45)*100
What I could do so far is:
land_code <- names(A)[-1]
land_code
A$VALUE_t1 <- for(code in land_code{
cbind(A[1], A[land_code] - B[match(A$CODE_t2, B$CODE_t2), land_code])
}
If I use the loop I get an error, while if I take it away:
A$VALUE_t1 <- cbind(A[1], A[land_code] - B[match(A$CODE_t2, B$CODE_t2), land_code])
it works, but it doesn't give me what I want... So far I have been working on how to get a new column containing the new "add & subtract" values, but I haven't succeeded yet. So I worked on getting a new column that would at least match the land use types first, to then include the "add and subtract" formula.
Another problem is that, by using match, I get a shorter A$VALUE_t1 table (13 rows instead of 44), while I would like to keep all the land use types in dataset A, because I will then have to match it with the table including VALUES_t0 (which I haven't shown here).
Sorry that I cannot do better than this at the moment... I hope I have explained more clearly what I have to do. I am extremely grateful for any help you can provide.
Thanks a lot!
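A possible starting point for the add-and-subtract step (not the full t2 to t1 to t0 loop): base R's aggregate and match can apply the change table while keeping all rows of A. The cut-down A and B below use the values from the question.

```r
# Cut-down versions of the two tables from the question
A <- data.frame(
  CODE_t2  = c(112, 121, 123, 133, 141, 231, 511, 523),
  VALUE_t2 = c(1691, 1271, 522, 0, 626, 95, 6, 87)
)
B <- data.frame(
  CODE_t2       = c(121, 121, 121, 121, 121, 123, 133, 133, 141, 141),
  CODE_t1       = c(112, 133, 323, 511, 523, 523, 123, 523, 231, 511),
  VALUE_CHANGE1 = c(2, 12, 0, 3, 2, 4, 3, 4, 12, 37)
)
# total hectares leaving each code (rows of B where it appears as CODE_t2) ...
out_change <- aggregate(VALUE_CHANGE1 ~ CODE_t2, data = B, sum)
# ... and total hectares arriving at each code (where it appears as CODE_t1)
in_change  <- aggregate(VALUE_CHANGE1 ~ CODE_t1, data = B, sum)
minus <- out_change$VALUE_CHANGE1[match(A$CODE_t2, out_change$CODE_t2)]
plus  <- in_change$VALUE_CHANGE1[match(A$CODE_t2, in_change$CODE_t1)]
minus[is.na(minus)] <- 0   # codes with no outgoing change
plus[is.na(plus)]   <- 0   # codes with no incoming change
A$VALUE_t1 <- A$VALUE_t2 - minus + plus
```

Because match is applied to A's codes (not the other way round), A keeps all its rows; for code 121 this gives 1271 - (2 + 12 + 0 + 3 + 2) = 1252. The same step, applied again with the t1-to-t0 change table, would give VALUE_t0.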

Parsing out all repeat and consecutive numbers in R

Suppose I have a dataframe like this:
1360 C 0 403
1361 A 0 403
1362 G 0 403
1402 0 A 444
2019 T 0 1060
2020 T 0 1060
2021 G 0 1060
2022 T 0 1060
2057 T 0 1085
2062 0 A 1093
2062 0 C 1094
2062 0 C 1095
Desired Output
1402 0 A 444
2057 0 0 1085
I am trying to parse out all the rows whose value in column 1 is a repeat or a consecutive number. That is, I want only the rows whose numbers are neither repeated nor consecutive in the dataset. Any help will be much appreciated.
You can use diff to find the difference between adjacent elements in a vector. Assuming the vector is sorted, diff will return zero for repeat numbers and one for consecutive numbers.
keep1 <- diff(df[,1]) > 1
keep1 marks rows that are followed by a gap, which would still include the last row of a consecutive run, so we must also require a gap before the row: check the lagged value as well, and pad the logical vector at both ends to make it as long as the original.
keep <- c(keep1, TRUE) & c(TRUE, keep1)
df[keep,]
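As a quick, runnable check of this approach against the first column of the sample data above (assumed sorted):

```r
# first column of the sample data
df <- data.frame(pos = c(1360, 1361, 1362, 1402, 2019, 2020, 2021, 2022,
                         2057, 2062, 2062, 2062))
keep1 <- diff(df$pos) > 1                 # TRUE where the next row jumps by more than 1
keep  <- c(keep1, TRUE) & c(TRUE, keep1)  # keep rows with a jump on both sides
df$pos[keep]                              # 1402 and 2057, as in the desired output
```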

mistake in multivePenal but not in frailtyPenal

The libraries used are: library(survival), library(splines), library(boot), library(frailtypack). The function used is in the frailtypack library.
In my data I have two recurrent events (delta.stable and delta.unstable) and one terminal event (delta.censor). There are some time-varying explanatory variables, like the unemployment rate (u.rate), which is quarterly; that's why my dataset has been split by quarters.
Here there is a link to the subsample used in the code just below, just in case it may be helpful to see the mistake. https://www.dropbox.com/s/spfywobydr94bml/cr_05_males_services.rda
The problem is that it runs for a long time before the warning message appears.
Main variables of the Survival function are:
I have two recurrent events:
delta.unstable (unst.): takes the value one when the individual finds an unstable job.
delta.stable (stable): takes the value one when the individual finds a stable job.
And one terminal event:
delta.censor (d.censor): takes the value one when the individual has died, retired, or emigrated.
row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392
When I apply multivePenal I obtain the following message:
Error in aggregate.data.frame(as.data.frame(x), ...) :
arguments must have same length
In addition: lost warning messages
In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created
#### multivePenal function
fit.joint.05_malesP <- multivePenal(Surv(.t0,.t,delta.stable)~cluster(contadorbis)+terminal(as.factor(delta.censor))+event2(delta.unstable),formula.terminalEvent=~1, formula2=~as.factor(h.skill),data=cr_05_males_serv,Frailty=TRUE,recurrentAG=TRUE,cross.validation=F,n.knots=c(7,7,7), kappa=c(1,1,1), maxit=1000, hazard="Splines")
I have checked if Surv(.t0,.t,delta.stable) contains NA, and there are no NA's.
In addition, when I apply frailtyPenal to the same data for both possible combinations, the function runs well and I get results. I have spent one week looking at this and I cannot find the key. I would appreciate some light on this problem.
#delta unstable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.unstable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+ as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+ as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities)+
terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###Be patient. The program is computing ...
###The program took 2259.42 seconds
#delta stable+death
fit.joint.05_males <- frailtyPenal(Surv(.t0,.t,delta.stable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities)+terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###The program took 3167.15 seconds
Because you provide neither information about the packages used nor the data necessary to run multivePenal or frailtyPenal, I can only help you with the Surv part (because I happened to have that package loaded).
The Surv warning message you provided (In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created) suggests that something is strange with your variables .t0 (the time argument in Surv, referred to as 'start time' in the warning) and/or .t (the time2 argument, 'Stop time' in the warning). I check this possibility with a simple example:
# read the data you feed `Surv` with
df <- read.table(text = "row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392", header = TRUE)
# create survival object
mysurv <- with(df, Surv(time = .t0, time2 = .t, event = stable))
mysurv
# create a new data set where one .t is, for some reason, less than .t0
# on row five .t0 is 61, so I set the corresponding .t (86) to 60
df2 <- df
df2$.t[df2$.t == 86] <- 60
# create survival object using new data which contains at least one Stop time that is less than Start time
mysurv2 <- with(df2, Surv(time = .t0, time2 = .t, event = stable))
# Warning message:
# In Surv(time = .t0, time2 = .t, event = stable) :
# Stop time must be > start time, NA created
# i.e. the same warning message as you got
# check the survival object
mysurv2
# as you can see, the fifth interval contains NA
# I would recommend you check .t0 and .t in your data set carefully
# one way to examine rows where Stop time (.t) is less than start time (.t0) is:
df2[which(df2$.t0 > df2$.t), ]
I am not familiar with multivePenal, but it seems that it does not accept a survival object containing intervals with NA, whereas frailtyPenal might.
The authors of the package have told me that the function is not finished yet, so perhaps that is the reason it is not working well.
I encountered the same error and arrived at this solution.
frailtyPenal() will not accept data.frames of different lengths: the data.frame used in Surv and the data.frame named in data= in frailtyPenal must be the same length. I used a Cox regression to identify the incomplete cases, reset the survival object to exclude the missing cases, and finally ran frailtyPenal:
library(survival)
library(frailtypack)
data(readmission)
#Reproduce the error
#change the first start time to NA
readmission[1,3] <- NA
#create a survival object with one missing time
surv.obj1 <- with(readmission, Surv(t.start, t.stop, event))
#observe the error
frailtyPenal(surv.obj1 ~ cluster(id) + dukes,
data=readmission,
cross.validation=FALSE,
n.knots=10,
kappa=1,
hazard="Splines")
#repair by resetting the surv object to omit the missing value(s)
#identify NAs using a Cox model
cox.na <- coxph(surv.obj1 ~ dukes, data = readmission)
#remove the NA cases from the original set to create complete cases
readmission2 <- readmission[-cox.na$na.action,]
#reset the survival object using the complete cases
surv.obj2 <- with(readmission2, Surv(t.start, t.stop, event))
#run frailtyPenal using the complete cases dataset and the complete cases Surv object
frailtyPenal(surv.obj2 ~ cluster(id) + dukes,
data = readmission2,
cross.validation = FALSE,
n.knots = 10,
kappa = 1,
hazard = "Splines")
