Subset a dataframe with specific condition in R - r

hello I have this df
res1 res4 aa1234
1 1 4 IVGG
2 10 13 RQFP
3 102 105 TSSV
4 112 115 LQNA
5 118 121 EAGT
6 12 15 FPFL
7 132 135 RSGG
8 138 141 SRFP
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
12 165 168 TRRG
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
16 181 184 CEGL
17 195 198 PCGR
18 20 23 NQGR
19 205 208 RVAL
20 32 35 HARF
21 39 42 AASC
22 40 43 ASCF
23 48 51 PGVS
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
33 95 98 LDRE
I want to subset it considering only rows in which res1 are in sequence as i and i <= i+4, as :
res1 res4 aa1234
29 8 11 RPRQ
6 12 15 FPFL
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
I tried something woth functions "filter" and "subset" but I didn't got the result expected.
So in general, I need to have the overlap between two rows in a range (i-i+4) including i+4.
For example, in this 3 lines there is the overlap between rows [9] and [10] (150-153 overlaps with 151-154), but also row [11] corresponds to res1[10] + 4 (151+4 = 155). So maybe an idea should be to consider res1[i] and check if res1[i+1] is =< res[i].
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN

why not we are simply doing this?
df[df$res1 %in% c(df$res1 -4,df$res1 -3, df$res1-2, df$res1 -1, df$res1+1,df$res1 +2, df$res1 +3, df$res1 +4),]
res1 res4 aa1234
2 10 13 RQFP
6 12 15 FPFL
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
edited scenario just order the df, and rest will be same. See
df <- df[order(df$res1),]
df[sort(unique(c(which(rev(diff(rev(df$res1))) >= -3 & rev(diff(rev(df$res1))) <= 0), which(diff(df$res1) <= 4 & diff(df$res1) >= 0)+1))),]
res1 res4 aa1234
29 8 11 RPRQ
2 10 13 RQFP
6 12 15 FPFL
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
old answer Use this
df[sort(unique(c(which(rev(diff(rev(df$res1))) >= -3 & rev(diff(rev(df$res1))) <= 0), which(diff(df$res1) <= 4 & diff(df$res1) >= 0)+1))),]
res1 res4 aa1234
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
Data used
df <- read.table(text = "res1 res4 aa1234
1 1 4 IVGG
2 10 13 RQFP
3 102 105 TSSV
4 112 115 LQNA
5 118 121 EAGT
6 12 15 FPFL
7 132 135 RSGG
8 138 141 SRFP
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
12 165 168 TRRG
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
16 181 184 CEGL
17 195 198 PCGR
18 20 23 NQGR
19 205 208 RVAL
20 32 35 HARF
21 39 42 AASC
22 40 43 ASCF
23 48 51 PGVS
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
33 95 98 LDRE", header = T)

Related

K-nearest neighbor for spatial weights R

I was wondering if you could help me with this problem. I have a dataset of US counties that I am trying to do k-nearest neighbor analysis for spatial weighting, following the method proposed here (section 4.5), but the results aren't making sense, or potentially I'm not understanding them.
library(spdep)
library(tigris)
library(sf)
counties <- counties("Georgia", cb = TRUE)
coords <- st_centroid(st_geometry(counties), of_largest_polygon=TRUE)
col.knn <- knearneigh(coords)
gck4.nb <- knn2nb(knearneigh(coords, k=4, longlat=TRUE))
summary(gck4.nb, coords, longlat=TRUE, scale=0.5)
However, the output I'm getting, with regards to the distances, seems rather small, on the order of less than 1 km:
Neighbour list object:
Number of regions: 159
Number of nonzero links: 636
Percentage nonzero weights: 2.515723
Average number of links: 4
Non-symmetric neighbours list
Link number distribution:
4
159
159 least connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
159 most connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
Summary of link distances:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1355 0.2650 0.3085 0.3112 0.3482 0.6224
The decimal point is 1 digit(s) to the left of the |
1 | 44
1 | 7799999999999999
2 | 00000000000011111111112222222222222233333333333333333333333333444444
2 | 55555555555555555555555555556666666666666666666666666666666666667777+92
3 | 00000000000000000000000000000001111111111111111111111111111111111111+121
3 | 55555555555555555555555555555556666666666667777777777777777777777777+19
4 | 00000000000111111111112222222222223333333444
4 | 555667777999
5 | 0000014
5 | 7888
6 | 2

When using table on a vector, the numbers in the names are out of order

I have a data frame with a column Session. There are 215 unique values for Session, and I am trying to treat it as a categorical variable.
However, when I run table(df$Session), the sessions are not appearing in order and some appear to be missing:
table(df$Session)
1 10 100 101 102 103 104 105 106 107 108 109 11 110 111 113 114 115 116 117 118
6 11 20 14 17 8 14 11 8 14 15 17 12 16 15 17 19 26 24 31 28
12 120 121 122 123 124 125 126 127 128 13 130 131 132 133 134 135 136 137 138 139
13 36 27 20 23 18 12 12 40 52 19 91 78 88 78 8 7 74 5 8 6
14 140 141 142 143 144 145 146 147 148 149 15 150 151 152 153 154 155 156 157 158
14 7 6 7 5 3 75 3 70 75 68 16 68 67 67 68 58 69 70 68 26
159 16 160 161 162 163 164 165 166 167 168 169 17 170 171 172 173 174 175 176 177
75 17 65 70 63 76 57 43 45 32 31 18 18 20 17 22 13 15 12 7 7
178 179 18 180 181 182 183 184 185 186 187 188 189 19 190 191 192 193 194 195 196
6 7 17 9 9 13 12 18 19 22 15 3 10 3 21 32 43 54 66 77 84
197 198 199 2 20 200 201 202 203 204 205 206 207 208 209 21 210 211 212 213 215
77 85 79 6 17 89 87 93 85 85 98 80 78 68 54 17 34 24 50 50 65
22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 36 37 38 39 4 40
11 12 12 10 11 7 7 10 4 7 8 7 6 9 11 10 23 27 14 3 21
41 42 43 44 45 46 47 48 49 5 50 51 52 53 54 55 56 57 58 59 6
27 16 16 18 10 12 19 7 6 4 5 13 21 17 25 31 32 30 15 10 3
60 61 62 63 64 65 66 67 68 69 7 70 71 73 74 75 76 77 78 79 8
18 17 11 14 14 15 18 11 13 9 7 13 12 7 8 8 9 12 8 9 6
80 81 82 83 84 85 86 87 88 89 9 90 91 92 93 94 95 97 98 99
1 11 8 17 20 13 14 18 19 19 9 14 16 12 15 17 19 13 7 16
If we only look at a couple of columns:
table(df$Session)
# 1 10 100 101 ... 197 198 199 2 20 200 201 202 ...
# 6 11 20 14 ... 77 85 79 6 17 89 87 93 ...
Why are they not ordered by number (1, 2, 3 instead of 1, 10, 100)? And how can I correct this?
Answer
The variable will be sorted correctly if you make it numeric first:
table(as.numeric(df$Session))
table(as.factor(as.numeric(df$Session)))
Explanation
Your variable is or was of the class character. The order of your variable is alphabetically, i.e. what would happen if you sort a character vector. Try: sort(c("1", "11", "2")). When you apply factor or as.factor to a character vector, the levels will be ordered as such (see ?factor):
levels: an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).
Keep in mind that R reads in numbers as numeric by default. If you expected the column to be numeric from the start but R made it character, then you likely have values in there that are not strictly numbers. It is important to find out why the vector was character.
Reproducible example
vec <- c(22, 11, 3, 2, 1)
table(vec) # correct: numeric
# 1 2 3 11 22
# 1 1 1 1 1
table(as.character(vec)) # incorrect: character
# 1 11 2 22 3
# 1 1 1 1 1
table(as.factor(as.character(vec))) # incorrect: character -> factor
# 1 11 2 22 3
# 1 1 1 1 1
table(as.factor(vec)) # correct: numeric -> factor
# 1 2 3 11 22
# 1 1 1 1 1

How can we determine a transition point in a curve for a bar graph

I have plotted a bar graph and knowing that the transition in the curve takes place around values 21 to 25 , I want to find such point in general.
barplot(Views$V2,names=Views$V1,las=2,cex.names=0.2,border="blue",xpd=FALSE)
abline(v=25,col="red")
Any help is appreciated
Update :
A part of my data frame is given below. It is sorted by values V2 . By transition what I want to say is that a point in the sorted data where there is a large deviation and after which it continues in that pattern. So a point where there is difference in patterns.
V1 V2
1 1 16154424
2 2 3701944
3 3 1618377
4 4 903302
5 5 569824
6 6 389772
7 7 281751
8 8 212450
9 9 166364
10 10 133339
11 11 109410
12 12 90934
13 13 77155
14 14 66124
15 15 57861
16 16 50765
17 17 44805
18 18 39996
19 19 35850
20 20 32492
21 21 29522
22 22 27152
23 23 24821
24 24 22619
25 25 21238
26 26 19639
27 27 18320
28 28 16867
29 29 15890
30 30 14936
31 31 14252
32 32 13150
33 33 12696
34 34 11656
35 35 11191
36 36 10951
37 37 10232
38 38 9605
39 39 9058
40 40 8916
41 41 8531
42 42 8010
43 43 7932
44 44 7436
45 45 6991
46 46 6750
47 47 6613
48 48 6254
49 49 6292
50 50 5731
51 51 5659
52 52 5551
53 53 5396
54 54 5122
55 55 4845
56 56 4860
57 57 4591
58 58 4504
59 59 4233
60 60 4371
61 61 4014
62 62 4083
63 63 3923
64 64 3796
65 65 3616
66 66 3519
67 67 3466
68 68 3409
69 69 3357
70 70 3215
71 71 3118
72 72 3081
73 73 3040
74 74 2951
75 75 2808
76 76 2797
77 77 2829
78 78 2714
79 79 2564
80 80 2563
81 81 2445
82 82 2528
83 83 2443
84 84 2316
85 85 2314
86 86 2212
87 87 2215
88 88 2102
89 89 2172
90 90 2020
91 91 2108
92 92 2020
93 93 2027
94 94 1982
95 95 1936
96 96 1836
97 97 1801
98 98 1850
99 99 1751
100 100 1810
It has to be something to do with the ratios of differences maybe and something which can be evident after observing the graph.

Sequence with different intervals in R: matching sensor data

I need a vector that repeats numbers in a sequence at varying intervals. I basically need this
c(rep(1:42, each=6), rep(43:64, each = 7),
rep(65:106, each=6), rep(107:128, each = 7),
.... but I need to this to keep going, until almost 2 million.
So I want a vector that looks like
[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 ...
.....
[252] 43 43 43 43 43 43 43 44 44 44 44 44 44 44
....
[400] 64 64 64 64 64 64 65 65 65 65 65 65...
and so on. Not just alternating between 6 and 7 repetitions, rather mostly 6s and fewer 7s until the whole vector is something like 1.7 million rows. So, is there a loop I can do? Or apply, replicate? I need the 400th entry in the vector to be 64, the 800th entry to be 128, and so on, in somewhat evenly spaced integers.
UPDATE
Thank you all for the quick clever tricks there. It worked, at least well enough for the deadline I was dealing with. I realize repeating 6 xs and 7 xs are a really dumb way to try to solve this, but it was quick at least. But now that I have some time, I would like to get everyone's opinions /ideas on my real underlying issue here.
I have two datasets to merge. They are both sensor datasets, both with stopwatch time as primary keys. But one records every 1/400 of a second, and the other records every 1/256 of a second. I have trimmed the top of each so that they are starting the exact same moment. But.. now what? I have 400 records for each second in one set, and 256 records for 1 second in the other. Is there a way to merge these without losing data? Interpolating or just repeating obs is a-ok, necessary, I think, but I'd rather not throw any data out.
I read this post here, that had to do with using xts and zoo for a very similar problem to mine. But they have nice epoch date/times for each. I just have these awful fractions of seconds!
sample data (A):
time dist a_lat
1 139.4300 22 0
2 139.4325 22 0
3 139.4350 22 0
4 139.4375 22 0
5 139.4400 22 0
6 139.4425 22 0
7 139.4450 22 0
8 139.4475 22 0
9 139.4500 22 0
10 139.4525 22 0
sample data (B):
timestamp hex_acc_x hex_acc_y hex_acc_z
1 367065215501 -0.5546875 -0.7539062 0.1406250
2 367065215505 -0.5468750 -0.7070312 0.2109375
3 367065215509 -0.4218750 -0.6835938 0.1796875
4 367065215513 -0.5937500 -0.7421875 0.1562500
5 367065215517 -0.6757812 -0.7773438 0.2031250
6 367065215521 -0.5937500 -0.8554688 0.2460938
7 367065215525 -0.6132812 -0.8476562 0.2109375
8 367065215529 -0.3945312 -0.8906250 0.2031250
9 367065215533 -0.3203125 -0.8906250 0.2226562
10 367065215537 -0.3867188 -0.9531250 0.2578125
(oh yeah, and btw, the B dataset timestamps are epoch format * 256, because life is hard. i haven't converted it for this because dataset A has nothing like that, only just 0.0025 intervals. Also the B data sensor was left on for hours later the A data sensor turned off, so that doesn't help)
Or if you like, you can try this using apply
# using this sample data
df <- data.frame(from=c(1,4,7,11), to = c(3,6,10,13),rep=c(6,7,6,7));
> df
# from to rep
#1 1 3 6
#2 4 6 7
#3 7 10 6
#4 11 13 7
unlist(apply(df, 1, function(x) rep(x['from']:x['to'], each=x['rep'])))
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4
#[26] 5 5 5 5 5 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8
#[51] 8 9 9 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 12 12 12 12 12
#[76] 12 12 13 13 13 13 13 13 13
Now that you put it that way ... I have absolutely no idea how you are planning on using all of the 6s and 7s. :-)
Regardless, I recommend standardizing the time, adding a "sample" column, and merging on them. Having the "sample" column may facilitate your processing later on, perhaps.
Your data:
df400 <- structure(list(time = c(139.43, 139.4325, 139.435, 139.4375, 139.44, 139.4425,
139.445, 139.4475, 139.45, 139.4525),
dist = c(22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L),
a_lat = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
.Names = c("time", "dist", "a_lat"),
class = "data.frame", row.names = c(NA, -10L))
df256 <- structure(list(timestamp = c(367065215501, 367065215505, 367065215509, 367065215513,
367065215517, 367065215521, 367065215525, 367065215529,
367065215533, 367065215537),
hex_acc_x = c(-0.5546875, -0.546875, -0.421875, -0.59375, -0.6757812,
-0.59375, -0.6132812, -0.3945312, -0.3203125, -0.3867188),
hex_acc_y = c(-0.7539062, -0.7070312, -0.6835938, -0.7421875,
-0.7773438, -0.8554688, -0.8476562, -0.890625,
-0.890625, -0.953125),
hex_acc_z = c(0.140625, 0.2109375, 0.1796875, 0.15625, 0.203125,
0.2460938, 0.2109375, 0.203125, 0.2226562, 0.2578125)),
.Names = c("timestamp", "hex_acc_x", "hex_acc_y", "hex_acc_z"),
class = "data.frame", row.names = c(NA, -10L))
Standardize your time frames:
colnames(df256)[1] <- 'time'
df400$time <- df400$time - df400$time[1]
df256$time <- (df256$time - df256$time[1]) / 256
Assign a label for easy reference (not that the NAs won't be clear enough):
df400 <- cbind(sample='A', df400, stringsAsFactors=FALSE)
df256 <- cbind(sample='B', df256, stringsAsFactors=FALSE)
And now for the merge and sorting:
dat <- merge(df400, df256, by=c('sample', 'time'), all.x=TRUE, all.y=TRUE)
dat <- dat[order(dat$time),]
dat
## sample time dist a_lat hex_acc_x hex_acc_y hex_acc_z
## 1 A 0.000000 22 0 NA NA NA
## 11 B 0.000000 NA NA -0.5546875 -0.7539062 0.1406250
## 2 A 0.002500 22 0 NA NA NA
## 3 A 0.005000 22 0 NA NA NA
## 4 A 0.007500 22 0 NA NA NA
## 5 A 0.010000 22 0 NA NA NA
## 6 A 0.012500 22 0 NA NA NA
## 7 A 0.015000 22 0 NA NA NA
## 12 B 0.015625 NA NA -0.5468750 -0.7070312 0.2109375
## 8 A 0.017500 22 0 NA NA NA
## 9 A 0.020000 22 0 NA NA NA
## 10 A 0.022500 22 0 NA NA NA
## 13 B 0.031250 NA NA -0.4218750 -0.6835938 0.1796875
## 14 B 0.046875 NA NA -0.5937500 -0.7421875 0.1562500
## 15 B 0.062500 NA NA -0.6757812 -0.7773438 0.2031250
## 16 B 0.078125 NA NA -0.5937500 -0.8554688 0.2460938
## 17 B 0.093750 NA NA -0.6132812 -0.8476562 0.2109375
## 18 B 0.109375 NA NA -0.3945312 -0.8906250 0.2031250
## 19 B 0.125000 NA NA -0.3203125 -0.8906250 0.2226562
## 20 B 0.140625 NA NA -0.3867188 -0.9531250 0.2578125
I'm guessing your data was just a small representation. If I've guessed poorly (that A's integers are seconds and B's integers are 1/400ths of a second) then just scale differently. Either way, by resetting the first value to zero and then merging/sorting, they are easy to merge and sort.
alt <- data.frame(len=c(42,22),rep=c(6,7));
alt;
## len rep
## 1 42 6
## 2 22 7
altrep <- function(alt,cyc,len) {
cyclen <- sum(alt$len*alt$rep);
if (missing(cyc)) {
if (missing(len)) {
cyc <- 1;
len <- cyc*cyclen;
} else {
cyc <- ceiling(len/cyclen);
};
} else if (missing(len)) {
len <- cyc*cyclen;
};
if (isTRUE(all.equal(len,0))) return(integer());
result <- rep(1:(cyc*sum(alt$len)),rep(rep(alt$rep,alt$len),cyc));
length(result) <- len;
result;
};
altrep(alt,2);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128
altrep(alt,len=1000);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128 129 129 129 129
## [817] 129 129 130 130 130 130 130 130 131 131 131 131 131 131 132 132 132 132 132 132 133 133 133 133 133 133 134 134 134 134 134 134 135 135 135 135 135 135 136 136 136 136 136 136 137 137 137 137 137 137 138
## [868] 138 138 138 138 138 139 139 139 139 139 139 140 140 140 140 140 140 141 141 141 141 141 141 142 142 142 142 142 142 143 143 143 143 143 143 144 144 144 144 144 144 145 145 145 145 145 145 146 146 146 146
## [919] 146 146 147 147 147 147 147 147 148 148 148 148 148 148 149 149 149 149 149 149 150 150 150 150 150 150 151 151 151 151 151 151 152 152 152 152 152 152 153 153 153 153 153 153 154 154 154 154 154 154 155
## [970] 155 155 155 155 155 156 156 156 156 156 156 157 157 157 157 157 157 158 158 158 158 158 158 159 159 159 159 159 159 160 160
You can specify len=1.7e6 (and omit the cyc argument) to get exactly 1.7 million elements, or you can get a whole number of cycles using cyc.
How about
len <- 2e6
step <- 400
x <- rep(64 * seq(0, ceiling(len / step) - 1), each = step) +
sort(rep(1:64, length.out = step))
x <- x[seq(len)] # to get rid of extra elements

Using barplot in R doesn't not match the data?

I want to use barplot (or any other better options) to plot the following data:
action_number times
1 1 13408
2 2 5550
3 3 2757
4 4 1782
5 5 1114
6 6 847
7 7 582
8 8 410
9 9 306
10 10 278
11 11 212
12 12 165
13 13 139
14 14 112
15 15 106
16 16 82
17 17 64
18 18 61
19 19 69
20 20 47
21 21 31
22 22 40
23 23 34
24 24 31
25 25 28
26 26 26
27 27 21
28 28 16
29 29 14
30 30 16
31 31 11
32 32 10
33 33 11
34 34 10
35 35 4
36 36 6
37 37 5
38 38 8
39 39 6
40 40 3
41 41 6
42 42 8
43 43 3
44 44 3
45 45 7
46 46 8
47 47 4
48 48 4
49 49 1
50 50 4
51 51 2
52 52 4
53 53 3
54 54 1
55 55 2
56 56 1
57 58 2
58 59 4
59 60 1
60 62 2
61 63 1
62 66 1
63 67 4
64 68 2
65 69 1
66 70 1
67 71 1
68 73 1
69 74 1
70 77 1
71 79 1
72 80 1
73 82 1
74 92 2
75 97 1
76 98 1
77 103 1
78 106 1
79 114 1
80 118 1
81 128 1
82 142 1
83 148 1
84 153 1
85 155 1
86 166 1
87 183 1
88 218 1
89 224 1
90 298 1
91 536 1
I am using the following, but it does not match the data correctly:
mp <- barplot(data$times,axes=FALSE,ylim=c(0,13408))
axis(1,at=data$action_number,labels=data$action_number)
#??? Should I use at=data$action_number to at=data$times
axis(2,seq(0,91,3),c(0:30))
![enter image description here][1]
Problems:
- the x-axis does not have 536, it only goes to 224
- the Y axis only shows one number
Can you please give me advice and if I should use any package?
still, unclear but may be something like this
barplot(data$times, xlab=data$action_number)
mp <- barplot(data$times,axes=FALSE,ylim=c(0,13408))
axis(1,at=seq(1,91,10),labels=data$action_number[seq(1,91,10)])
axis(2,seq(0,13408,500),seq(0,13408,500))

Resources