How to show all threads on a specific CPU on Solaris?

Some process (or thread) is hammering CPU0, as you can see in the output of mpstat 30 2:
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 13 0 2 7 0 151 0 4250 99 1 0 0
1 114 0 2 197 84 5220 5 10 109 0 10518 30 2 0 67
2 79 0 1 184 83 5208 5 5 89 0 9788 30 2 0 68
3 67 0 1 181 84 5150 5 4 87 0 9510 30 2 0 69
4 53 0 3 171 72 12238 4 7 183 0 22214 3 3 0 94
5 43 0 3 135 7 218 2 6 16 0 162 0 1 0 99
6 110 0 2 172 79 4918 5 3 164 0 9553 34 2 0 64
7 120 0 1 180 80 4873 4 4 194 0 9494 32 2 0 66
8 53 0 1 23 2 28665 5 7 494 0 62023 12 9 0 79
9 43 0 0 34 2 21469 6 8 676 0 58090 10 13 0 77
10 59 0 1 210 2 33462 4 4 227 0 63500 7 16 0 78
11 93 0 2 16940 16627 1261 2 6 1027 0 2043 0 10 0 90
12 17 0 1 65 3 59 0 3 3 0 19 0 0 0 100
13 6 0 1 89 4 104 0 3 2 0 9 0 0 0 100
14 4 0 10 65 5 54 0 3 1 0 12 0 0 0 100
15 4 0 1 66 6 56 0 3 2 0 21 0 0 0 100
16 2 0 0 91 16 78 0 3 2 0 30 0 0 0 100
17 17 0 1 80 15 70 0 4 2 0 79 0 0 0 100
18 76 0 3 14946 14928 25 0 4 24 0 102 0 4 0 96
19 57 0 0 20 2 17 0 3 15 0 107 0 0 0 100
20 18 0 0 26 0 25 0 3 10 0 21 0 0 0 100
21 0 0 0 106 70 46 0 3 4 0 40 0 1 0 99
22 13 0 0 31 3 28 0 3 4 0 49 0 0 0 100
23 0 0 0 35 5 24 0 3 5 0 54 0 0 0 100
But with prstat -P0 I only see ndbmtd running, with around 15% on CPU0:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
20028 root 77G 75G cpu0 40 0 8369:33:0 15% ndbmtd/44
660 root 6200K 3700K sleep 59 0 0:00:53 0.0% inetd/4
159 daemon 4540K 2408K sleep 59 0 0:00:09 0.0% kcfd/3
11 root 11M 10M sleep 59 0 0:00:58 0.0% svc.configd/15
Is there a way to show all processes and threads on CPU0?

To show all processes and threads (LWPs) on CPU0, add -L so that prstat reports each LWP on its own line:
prstat -P0 -L

Related

Producing dataframe in R with rows summing to same number, including all possible combinations of numbers in each column

I am trying to create a dataframe in R.
I have 4 categories (e.g. Blue, Red, Yellow, Green) and I would like each row to sum to 100%. For each category I want to use increments of 5% and produce a dataframe which has all possible combinations of numbers (to the nearest 5%) for the 4 categories. I realise I am not explaining this well, so I have tried to show what I mean in the following table:
Blue Red Yellow Green
95 5 0 0
95 0 5 0
95 0 0 5
5 95 0 0
0 95 5 0
0 95 0 5
5 0 95 0
0 5 95 0
0 0 95 5
5 0 0 95
0 5 0 95
0 0 5 95
90 10 0 0
90 0 10 0
90 0 0 10
10 90 0 0
0 90 10 0
0 90 0 10
10 0 90 0
0 10 90 0
0 0 90 10
10 0 0 90
0 10 0 90
0 0 10 90
90 5 5 0
90 5 0 5
90 0 5 5
5 90 5 0
5 90 0 5
0 90 5 5
5 5 90 0
5 0 90 5
0 5 90 5
5 5 0 90
5 0 5 90
0 5 5 90
85 15 0 0
85 0 15 0
85 0 0 15
15 85 0 0
0 85 15 0
0 85 0 15
15 0 85 0
0 15 85 0
0 0 85 15
15 0 0 85
0 15 0 85
0 0 15 85
85 10 5 0
85 10 0 5
85 5 10 0
85 0 10 5
85 5 0 10
85 0 5 10
10 85 5 0
10 85 0 5
5 85 10 0
0 85 10 5
5 85 0 10
0 85 5 10
10 5 85 0
10 0 85 5
5 10 85 0
0 10 85 5
5 0 85 10
0 5 85 10
10 5 0 85
10 0 5 85
5 10 0 85
5 0 10 85
0 10 5 85
0 5 10 85
85 5 5 5
I am struggling to know where to start here...
You could nest three for loops and bind the results together:
target_df <- data.frame()
for (i in seq(95, 0, by = -5)) {
  for (j in seq(100 - i, 0, by = -5)) {
    for (k in seq(100 - i - j, 0, by = -5)) {
      target_df <- rbind(target_df, data.frame(Blue = i, Red = j, Yellow = k, Green = 100 - i - j - k))
    }
  }
}
This returns (first 10 rows shown):
Blue Red Yellow Green
1 95 5 0 0
2 95 0 5 0
3 95 0 0 5
4 90 10 0 0
5 90 5 5 0
6 90 5 0 5
7 90 0 10 0
8 90 0 5 5
9 90 0 0 10
10 85 15 0 0
You might want to remove three rows containing 100 in columns Red, Yellow and Green.
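
As an alternative to the nested loops, the same table can be sketched with expand.grid(): build every combination of three categories and let the fourth be the remainder. This is only a sketch; the names steps, grid and combos are made up for the example, and dropping the rows where one category takes the full 100 is an assumption based on the table in the question.

```r
# All 5% steps for three of the categories; Green is whatever is left over.
steps <- seq(0, 100, by = 5)
grid <- expand.grid(Blue = steps, Red = steps, Yellow = steps)
grid$Green <- 100 - grid$Blue - grid$Red - grid$Yellow

# Keep rows with a non-negative remainder.
# Assumption: rows where a single category is 100 are not wanted.
keep <- grid$Green >= 0 & apply(grid, 1, max) < 100
combos <- grid[keep, ]

# Sort so that the largest Blue values come first, as in the question's table.
combos <- combos[order(-combos$Blue, -combos$Red, -combos$Yellow), ]
rownames(combos) <- NULL

head(combos)
```

Because the whole table is built in one step, this avoids growing a data frame with rbind() inside a loop, which gets slow as the number of rows increases.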

How to make a random strata sample in R?

I have a data.frame called "per" with three variables: nrodocumento, cod_jer (42 groups) and grupo_fict (8 groups). I would like to draw a random sample (as a data.frame) for each cod_jer and, within it, for each grupo_fict.
> dput(head(per))
structure(list(nrodocumento = c(49574917L, 54692750L, 54731807L,
57364176L, 57364198L, 46867674L), cod_jer = c(1146L, 32L, 0L,
0L, 0L, 0L), grupo_fict = c(3L, 1L, 8L, 1L, 1L, 1L)), .Names =
c("nrodocumento",
"cod_jer", "grupo_fict"), row.names = c(NA, 6L), class = "data.frame")
> head(per,n=100)
nrodocumento cod_jer grupo_fict
1 49574917 1146 3
2 54692750 32 1
3 54731807 0 8
4 57364176 0 1
5 57364198 0 1
6 46867674 0 1
7 46867668 0 1
8 57364201 0 1
9 53767871 0 1
10 55339012 0 1
11 49204318 0 8
12 53743017 0 1
13 47622958 0 1
14 49019862 0 1
15 50167428 0 2
16 48783260 0 4
17 52020945 433 5
18 54486680 236 4
19 51402916 0 4
20 48543242 0 2
21 54671603 0 1
22 50644599 0 8
23 53293608 0 1
24 52742799 0 4
25 49815210 0 8
26 50967719 236 3
27 51938997 0 8
28 50057188 324 3
29 52754706 0 6
30 55322102 0 3
31 53040748 0 1
32 50321642 0 5
33 51621354 236 8
34 49611806 0 7
35 53347667 0 8
36 52462498 0 3
37 54158570 0 8
38 54034849 0 8
39 52507674 321 3
40 50218598 317 7
41 45078442 432 7
42 51491066 0 8
43 53278953 0 2
44 52661658 0 2
45 50092873 236 3
46 50308064 0 7
47 51941635 0 7
48 53527966 0 1
49 49614579 0 1
50 49450678 318 8
51 52953427 1146 7
52 52133221 0 8
53 53363128 0 7
54 52819643 0 1
55 47516589 0 1
56 52563137 0 3
57 49511296 0 7
58 54154013 0 2
59 50822420 1349 4
60 50822408 1349 4
61 50822414 1349 6
62 52339683 0 1
63 50026113 0 7
64 47328586 0 7
65 56041961 0 7
66 47756955 432 8
67 53158397 0 7
68 53151167 0 7
69 54710039 0 3
70 54408844 114 4
71 46286323 114 4
72 50310877 0 1
73 50929135 0 7
74 49817218 0 1
75 53604540 0 8
76 52812736 1147 1
77 53726314 1147 1
78 50835936 0 8
79 55429334 0 1
80 48421020 329 8
81 49800217 0 3
82 52818263 0 1
83 45884978 0 1
84 50203385 0 1
85 53433610 0 2
86 54515938 0 1
87 50263935 0 8
88 52439152 0 2
89 48424129 236 3
90 47031563 0 8
91 53577610 11 1
92 48759083 11 1
93 50344731 432 1
94 51164013 0 3
95 52026977 163 7
96 50965482 0 3
97 45947594 433 8
98 53357234 0 7
99 48367529 0 8
100 54286153 0 3
> table(per$cod_jer,per$grupo_fict)
1 2 3 4 5 6 7 8
0 3990 2296 1743 1453 356 250 2031 2051
11 149 85 29 34 14 6 34 25
13 2 4 1 0 0 0 1 1
14 3 1 0 0 0 0 0 1
32 37 12 13 10 3 1 23 13
101 19 12 6 5 3 0 6 12
102 2 0 0 0 0 0 0 0
103 11 10 3 3 0 1 3 0
104 17 8 1 7 2 1 7 9
105 11 12 3 3 3 0 6 10
106 147 57 30 29 8 1 43 42
107 33 37 5 9 3 2 8 9
108 6 10 2 3 0 2 3 4
109 44 37 11 9 6 2 14 14
111 112 81 26 28 8 3 22 18
112 21 8 4 8 2 0 3 2
113 94 61 14 16 4 1 17 24
114 60 52 10 14 9 5 8 20
115 72 24 21 13 5 1 11 16
125 5 4 1 0 1 0 0 1
138 15 5 2 2 1 0 2 0
163 50 35 26 26 7 12 43 41
234 51 43 31 32 10 7 49 53
236 78 29 46 35 7 7 39 37
317 44 28 21 13 7 2 28 21
318 20 27 5 10 4 3 12 14
319 45 21 25 19 1 2 26 21
321 6 4 9 3 0 3 8 1
322 43 30 24 16 5 3 16 34
323 30 14 25 15 3 4 24 22
324 59 29 31 27 8 5 28 27
325 15 12 6 5 1 2 8 11
326 18 12 17 13 4 2 20 15
327 45 28 23 26 7 6 25 40
328 52 49 33 32 5 9 31 35
329 42 36 26 20 2 3 23 30
431 6 2 4 1 2 0 2 6
432 39 18 27 24 5 1 28 34
433 139 92 90 89 18 13 61 66
1146 97 49 26 14 7 5 24 29
1147 56 33 26 25 9 0 19 20
1349 15 9 11 10 0 1 10 3
1544 62 33 20 32 4 3 25 43
1545 37 13 22 14 1 3 14 31
1848 16 27 11 15 3 0 10 12
On the other hand, I have a data.frame with vacancies, i.e. the size of the sample I need inside each group.
> dput(head(vacantes))
structure(list(cod_jer = c(101L, 316L, 325L, 1349L, 1544L, 102L
), vacantes = c(132, 180, 54, 63, 45, 0), vac1 = c(27, 36, 11,
13, 9, 0), vac2 = c(27, 36, 11, 13, 9, 0), vac3 = c(24, 33, 10,
12, 9, 0), vac4 = c(24, 33, 10, 12, 9, 0), vac5 = c(8, 11, 4,
4, 3, 0), vac6 = c(8, 11, 4, 4, 3, 0), vac7 = c(7, 10, 3, 3,
2, 0), vac8 = c(7, 10, 3, 3, 2, 0)), .Names = c("cod_jer", "vacantes",
"vac1", "vac2", "vac3", "vac4", "vac5", "vac6", "vac7", "vac8"
), row.names = c(NA, 6L), class = "data.frame")
> vacantes
cod_jer vacantes vac1 vac2 vac3 vac4 vac5 vac6 vac7 vac8
1 101 132 27 27 24 24 8 8 7 7
2 316 180 36 36 33 33 11 11 10 10
3 325 54 11 11 10 10 4 4 3 3
4 1349 63 13 13 12 12 4 4 3 3
5 1544 45 9 9 9 9 3 3 2 2
6 102 0 0 0 0 0 0 0 0 0
7 103 0 0 0 0 0 0 0 0 0
8 104 0 0 0 0 0 0 0 0 0
9 105 0 0 0 0 0 0 0 0 0
10 106 0 0 0 0 0 0 0 0 0
11 107 0 0 0 0 0 0 0 0 0
12 108 0 0 0 0 0 0 0 0 0
13 109 0 0 0 0 0 0 0 0 0
14 110 0 0 0 0 0 0 0 0 0
15 111 0 0 0 0 0 0 0 0 0
16 112 0 0 0 0 0 0 0 0 0
17 113 0 0 0 0 0 0 0 0 0
18 114 0 0 0 0 0 0 0 0 0
19 115 0 0 0 0 0 0 0 0 0
20 137 0 0 0 0 0 0 0 0 0
21 138 0 0 0 0 0 0 0 0 0
22 139 0 0 0 0 0 0 0 0 0
23 140 0 0 0 0 0 0 0 0 0
24 234 0 0 0 0 0 0 0 0 0
25 236 0 0 0 0 0 0 0 0 0
26 317 0 0 0 0 0 0 0 0 0
27 318 0 0 0 0 0 0 0 0 0
28 319 0 0 0 0 0 0 0 0 0
29 320 0 0 0 0 0 0 0 0 0
30 321 0 0 0 0 0 0 0 0 0
31 322 0 0 0 0 0 0 0 0 0
32 323 0 0 0 0 0 0 0 0 0
33 324 0 0 0 0 0 0 0 0 0
34 326 0 0 0 0 0 0 0 0 0
35 327 0 0 0 0 0 0 0 0 0
36 328 0 0 0 0 0 0 0 0 0
37 329 0 0 0 0 0 0 0 0 0
38 431 0 0 0 0 0 0 0 0 0
39 432 0 0 0 0 0 0 0 0 0
40 433 0 0 0 0 0 0 0 0 0
41 1146 0 0 0 0 0 0 0 0 0
42 1147 0 0 0 0 0 0 0 0 0
43 1545 0 0 0 0 0 0 0 0 0
44 1630 0 0 0 0 0 0 0 0 0
45 1848 0 0 0 0 0 0 0 0 0
I would like to draw a stratified sample for each combination of cod_jer and grupo_fict; where the number of vacancies is 0, the sample size should be 0.
I was trying this:
size=subset(vacantes,select=c(vac1,vac2,vac3,vac4,vac5,vac6,vac7,vac8))
size=as.matrix(size)
size=as.vector(size)
for(i in 1:length(size)) {
  if (size[i] > 0 ) {
    s=strata(per,c("cod_jer","grupo_fict"),size=size,
             method="srswor")
  } else {
    s="0"
  }
}
But I can't get it to work.
Any suggestions?
Thanks!
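
One thing that stands out in the attempt above is that every pass through the loop hands the whole size vector to strata() and overwrites s, so nothing is accumulated per group. A minimal base-R sketch of the intended result follows. It swaps strata() for plain sample.int() inside each cod_jer/grupo_fict cell, and it runs on small made-up stand-ins for per and vacantes; the helper names sizes, draw and muestra are hypothetical.

```r
# Toy stand-ins with the same column layout as `per` and `vacantes` (made-up values).
set.seed(1)
per <- data.frame(
  nrodocumento = 1:400,
  cod_jer      = rep(c(101L, 325L), each = 200),
  grupo_fict   = rep(1:8, times = 50)
)
vacantes <- data.frame(
  cod_jer = c(101L, 325L),
  vac1 = c(3, 0), vac2 = c(2, 1), vac3 = c(0, 0), vac4 = c(1, 1),
  vac5 = c(0, 0), vac6 = c(0, 2), vac7 = c(1, 0), vac8 = c(0, 0)
)

# Long form: one row per (cod_jer, grupo_fict) cell with its requested sample size.
# unlist() walks the vac columns one after another, hence the rep() pattern.
sizes <- data.frame(
  cod_jer    = rep(vacantes$cod_jer, times = 8),
  grupo_fict = rep(1:8, each = nrow(vacantes)),
  n          = unlist(vacantes[paste0("vac", 1:8)], use.names = FALSE)
)

# Simple random sampling without replacement inside one cell,
# capped at the number of people available in that cell.
draw <- function(i) {
  pool <- per[per$cod_jer == sizes$cod_jer[i] &
              per$grupo_fict == sizes$grupo_fict[i], ]
  k <- min(sizes$n[i], nrow(pool))
  pool[sample.int(nrow(pool), k), , drop = FALSE]
}

muestra <- do.call(rbind, lapply(seq_len(nrow(sizes)), draw))
table(muestra$cod_jer, muestra$grupo_fict)
```

Cells with 0 vacancies simply contribute zero rows, so no special case is needed for them. Whether a cell with fewer people than vacancies should be capped, as done here, or reported as an error is a design decision the question leaves open.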

Calculate post error slowing in R

For my research, I would like to calculate the post-error slowing in the stop signal task to find out whether people become slower after they failed to inhibit their response. Here is some data and I would like to do the following:
1. For each subject, first determine whether a trial is a stop-trial (signal = 1).
2. For each correct stop-trial (signal = 1 & correct = 2), determine whether the next trial (the trial directly after the stop-trial) is a go-trial (signal = 0). Then calculate the average reaction time for all go-trials that directly follow a correct stop-trial and are themselves answered correctly (signal = 0 & correct = 2).
3. For each incorrect stop-trial (signal = 1 & correct = 0), determine whether the next trial (the trial directly after the stop-trial) is a go-trial (signal = 0). Then calculate the average reaction time for all go-trials that directly follow an incorrect stop-trial and are themselves answered correctly (correct = 2).
4. Then calculate the difference between the RTs calculated in steps 2 and 3 (= post-error slowing).
I'm not experienced enough in R to achieve this. I hope someone can help me with this script.
subject trial signal correct RT
1 1 0 2 755
1 2 0 2 543
1 3 1 0 616
1 4 0 2 804
1 5 0 2 594
1 6 0 2 705
1 7 1 2 0
1 8 1 2 0
1 9 0 2 555
1 10 1 0 604
1 11 0 2 824
1 12 0 2 647
1 13 0 2 625
1 14 0 2 657
1 15 1 0 578
1 16 0 2 810
1 17 1 2 0
1 18 0 2 646
1 19 0 2 574
1 20 0 2 748
1 21 0 0 856
1 22 0 2 679
1 23 0 2 738
1 24 0 2 620
1 25 0 2 715
1 26 1 2 0
1 27 0 2 675
1 28 0 2 560
1 29 1 0 584
1 30 0 2 564
1 31 0 2 994
1 32 1 2 0
1 33 0 2 715
1 34 0 2 644
1 35 0 2 545
1 36 0 2 528
1 37 1 2 0
1 38 0 2 636
1 39 0 2 684
1 40 1 2 0
1 41 0 2 653
1 42 0 2 766
1 43 0 2 747
1 44 0 2 821
1 45 0 2 612
1 46 0 2 624
1 47 0 2 665
1 48 1 2 0
1 49 0 2 594
1 50 0 2 665
1 51 1 0 658
1 52 0 2 800
1 53 1 2 0
1 54 1 0 738
1 55 0 2 831
1 56 0 2 815
1 57 0 2 776
1 58 0 2 710
1 59 0 2 842
1 60 1 0 516
1 61 0 2 758
1 62 1 2 0
1 63 0 2 628
1 64 0 2 713
1 65 0 2 835
1 66 1 0 791
1 67 0 2 871
1 68 0 2 816
1 69 0 2 769
1 70 0 2 930
1 71 0 2 676
1 72 0 2 868
2 1 0 2 697
2 2 0 2 689
2 3 0 2 584
2 4 1 0 788
2 5 0 2 448
2 6 0 2 564
2 7 0 2 587
2 8 1 0 553
2 9 0 2 706
2 10 0 2 442
2 11 1 0 245
2 12 0 2 601
2 13 0 2 774
2 14 1 0 579
2 15 0 2 652
2 16 0 2 556
2 17 0 2 963
2 18 0 2 725
2 19 0 2 751
2 20 0 2 709
2 21 0 2 741
2 22 1 0 613
2 23 0 2 781
2 24 1 2 0
2 25 0 2 634
2 26 1 2 0
2 27 0 2 487
2 28 1 2 0
2 29 0 2 692
2 30 0 2 745
2 31 1 2 0
2 32 0 2 610
2 33 0 2 836
2 34 1 0 710
2 35 0 2 757
2 36 0 2 781
2 37 0 2 1029
2 38 0 2 832
2 39 1 0 626
2 40 1 2 0
2 41 0 2 844
2 42 0 2 837
2 43 0 2 792
2 44 0 2 789
2 45 0 2 783
2 46 0 0 0
2 47 0 0 468
2 48 0 2 686
This may be too late to be useful, but here's my solution. I first split the data frame by subject and then apply the same algorithm to each subject. The result is:
# 1 2
# -74.60317 23.39286
X <- read.table(
text=" subject trial signal correct RT
1 1 0 2 755
1 2 0 2 543
1 3 1 0 616
1 4 0 2 804
1 5 0 2 594
1 6 0 2 705
1 7 1 2 0
1 8 1 2 0
1 9 0 2 555
1 10 1 0 604
1 11 0 2 824
1 12 0 2 647
1 13 0 2 625
1 14 0 2 657
1 15 1 0 578
1 16 0 2 810
1 17 1 2 0
1 18 0 2 646
1 19 0 2 574
1 20 0 2 748
1 21 0 0 856
1 22 0 2 679
1 23 0 2 738
1 24 0 2 620
1 25 0 2 715
1 26 1 2 0
1 27 0 2 675
1 28 0 2 560
1 29 1 0 584
1 30 0 2 564
1 31 0 2 994
1 32 1 2 0
1 33 0 2 715
1 34 0 2 644
1 35 0 2 545
1 36 0 2 528
1 37 1 2 0
1 38 0 2 636
1 39 0 2 684
1 40 1 2 0
1 41 0 2 653
1 42 0 2 766
1 43 0 2 747
1 44 0 2 821
1 45 0 2 612
1 46 0 2 624
1 47 0 2 665
1 48 1 2 0
1 49 0 2 594
1 50 0 2 665
1 51 1 0 658
1 52 0 2 800
1 53 1 2 0
1 54 1 0 738
1 55 0 2 831
1 56 0 2 815
1 57 0 2 776
1 58 0 2 710
1 59 0 2 842
1 60 1 0 516
1 61 0 2 758
1 62 1 2 0
1 63 0 2 628
1 64 0 2 713
1 65 0 2 835
1 66 1 0 791
1 67 0 2 871
1 68 0 2 816
1 69 0 2 769
1 70 0 2 930
1 71 0 2 676
1 72 0 2 868
2 1 0 2 697
2 2 0 2 689
2 3 0 2 584
2 4 1 0 788
2 5 0 2 448
2 6 0 2 564
2 7 0 2 587
2 8 1 0 553
2 9 0 2 706
2 10 0 2 442
2 11 1 0 245
2 12 0 2 601
2 13 0 2 774
2 14 1 0 579
2 15 0 2 652
2 16 0 2 556
2 17 0 2 963
2 18 0 2 725
2 19 0 2 751
2 20 0 2 709
2 21 0 2 741
2 22 1 0 613
2 23 0 2 781
2 24 1 2 0
2 25 0 2 634
2 26 1 2 0
2 27 0 2 487
2 28 1 2 0
2 29 0 2 692
2 30 0 2 745
2 31 1 2 0
2 32 0 2 610
2 33 0 2 836
2 34 1 0 710
2 35 0 2 757
2 36 0 2 781
2 37 0 2 1029
2 38 0 2 832
2 39 1 0 626
2 40 1 2 0
2 41 0 2 844
2 42 0 2 837
2 43 0 2 792
2 44 0 2 789
2 45 0 2 783
2 46 0 0 0
2 47 0 0 468
2 48 0 2 686", header=TRUE)
sapply(split(X, X["subject"]), function(D){
PCRT <- with(D, RT[which(c(signal[-1],NA)==1 & c(correct[-1], NA)==2 & signal==0) ])
PERT <- with(D, RT[which(c(signal[-1],NA)==1 & c(correct[-1], NA)==0 & signal==0) ])
mean(PERT) - mean(PCRT)
})
This is ok if you can be sure that every respondent has at least 1 correct and 1 incorrect "stop" trial followed by a "go" trial. A more general case would be (giving NA if they are either always correct or always mistaken):
sapply(split(X, X["subject"]), function(D){
PCRT <- with(D, RT[which(c(signal[-1],NA)==1 & c(correct[-1], NA)==2 & signal==0) ])
PERT <- with(D, RT[which(c(signal[-1],NA)==1 & c(correct[-1], NA)==0 & signal==0) ])
if(length(PCRT)>0 & length(PERT)>0) mean(PERT) - mean(PCRT) else NA
})
Does that help you? A little bit redundant maybe, but I tried to follow your steps as best as possible (not sure whether I mixed something up, please check for yourself looking at the table). The idea is to put the data in a csv file first and treat it as a data frame. Find the csv raw file here: http://pastebin.com/X5b2ysmQ
data <- read.csv("datatable.csv", header = TRUE)
data[, "condition1"] <- data[, "signal"] == 1
data[, "condition2"] <- data[, "condition1"] & data[, "correct"] == 2
data[, "RT1"] <- NA
for (i in which(data[, "condition2"])) {
  # next is a go trial
  if (nrow(data) > i && !data[i + 1, "condition1"] && data[i + 1, "correct"] == 2)
    data[i + 1, "RT1"] <- data[i + 1, "RT"]
}
averageRT1 <- mean(data[!is.na(data[, "RT1"]), "RT1"])
data[, "RT2"] <- NA
for (i in which(data[, "condition1"] & data[, "correct"] == 0)) {
  # next is a go trial
  if (nrow(data) > i && !data[i + 1, "condition1"] && data[i + 1, "correct"] == 2)
    data[i + 1, "RT2"] <- data[i + 1, "RT"]
}
averageRT2 <- mean(data[!is.na(data[, "RT2"]), "RT2"])
postErrorSlowing <- abs(averageRT2 - averageRT1)
@Nilsole I just tried it and it is almost perfect. How could the code be changed so that postErrorSlowing is calculated for each subject and placed in a data frame, i.e. a new data frame consisting of the subject number (1, 2, 3, etc.) and the postErrorSlowing variable? Something like this (the postErrorSlowing values are made up):
subject postErrorSlowing
1 50
2 75
....
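
A per-subject data frame of this kind could be sketched as follows. Two things are worth noting about the answers above: in the sapply() version, c(signal[-1], NA) refers to the trial that follows, so those expressions pick the go-trial before each stop-trial rather than the one after it; and the loop version works on the whole file at once, so the last trial of one subject can be paired with the first trial of the next. The sketch below looks back one trial within each subject instead. The toy data and the names prev_signal, prev_correct, go_ok, pes and result are made up for the example.

```r
# Made-up toy data: signal 1 = stop-trial, correct 2 = correct, correct 0 = incorrect.
X <- data.frame(
  subject = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  signal  = c(0, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  correct = c(2, 0, 2, 2, 2, 2, 2, 2, 0, 2),
  RT      = c(500, 450, 700, 0, 600, 550, 0, 520, 480, 640)
)

pes <- sapply(split(X, X$subject), function(D) {
  n <- nrow(D)
  prev_signal  <- c(NA, D$signal[-n])   # signal of the trial directly before
  prev_correct <- c(NA, D$correct[-n])  # outcome of the trial directly before
  go_ok <- D$signal == 0 & D$correct == 2  # correctly answered go-trials

  # which() drops the NA produced for the first trial of each subject.
  post_correct <- D$RT[which(go_ok & prev_signal == 1 & prev_correct == 2)]
  post_error   <- D$RT[which(go_ok & prev_signal == 1 & prev_correct == 0)]

  # NA when a subject has no usable trial of one of the two kinds.
  if (length(post_correct) > 0 && length(post_error) > 0) {
    mean(post_error) - mean(post_correct)
  } else {
    NA_real_
  }
})

result <- data.frame(subject = names(pes), postErrorSlowing = unname(pes))
result
```

The difference is kept signed here (post-error minus post-correct), so a positive value means slowing after an error; wrap it in abs() if only the magnitude is wanted.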

ctree to produce predictions other than the existing categories

I have a ctree to run. In my training set, the response variable has 3 categories: 0, 1, 99.
However, the tree's predictions include additional values between 0 and 1:
Actual
Prediction 0 1 99
0 6281 0 0
0.0869565217391304 63 6 0
0.288888888888889 32 13 0
0.529411764705882 24 27 0
0.588235294117647 35 50 0
0.625 9 15 0
0.641891891891892 53 95 0
0.684014869888476 85 184 0
0.807692307692308 5 21 0
0.853035143769968 46 267 0
0.864406779661017 8 51 0
0.892018779342723 23 190 0
0.896103896103896 8 69 0
0.95668549905838 23 508 0
0.98695652173913 3 227 0
1 0 58 0
99 0 0 3018
Does anyone know how this is possible?
Thank you!
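
One plausible explanation, assuming the response column is stored as a number rather than a factor: ctree then fits a regression tree, and each terminal node predicts the mean of the responses that fall into it. For a node containing only 0s and 1s, that mean is the share of 1s, which matches the table. The check below uses only counts from the table; the names node_a and node_b are made up.

```r
# Second row of the table: 63 observations with actual 0 and 6 with actual 1.
node_a <- c(rep(0, 63), rep(1, 6))
mean_a <- mean(node_a)   # 6/69
print(mean_a, digits = 15)

# Third row: 32 observations with actual 0 and 13 with actual 1.
node_b <- c(rep(0, 32), rep(1, 13))
mean_b <- mean(node_b)   # 13/45
print(mean_b, digits = 15)

# A factor response has categories instead of magnitudes,
# so there is nothing to average.
response <- factor(c(0, 1, 99))
levels(response)
```

If that is the cause, converting the response with factor() before fitting should make ctree treat the problem as classification, so that the predictions are the categories 0, 1 and 99 themselves.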

Importing DAT file into R but uneven columns

I have a DAT file I want to read into R, but when I import the data it shows 10 columns/variables (taken from the first line) when there are actually supposed to be 29 columns/variables. How do I fix this problem?
DAT file example on notepad:
smsa66 smsa76 nearc2 nearc4 nearc4a nearc4b ed76 ed66 age76 daded
nodaded momed nomomed momdad14 sinmom14 step14 south66 south76
lwage76 famed black wage76 enroll76 kww iqscore mar76 libcrd14
exp76 exp762
1 1 0 0 0 0 7
5 29 9.94 1 10.25 1 1
0 0 0 0 6.306275 9 1
548 0 15 . 1 0 16
256
1 1 0 0 0 0 12
11 27 8 0 8 0 1
0 0 0 0 6.175867 8 0
481 0 35 93 1 1 9
81
1 1 0 0 0 0 12
12 34 14 0 12 0 1
0 0 0 0 6.580639 2 0
721 0 42 103 1 1 16
256
1 1 1 1 1 0 11
11 27 11 0 12 0 1
0 0 0 0 5.521461 6 0
250 0 25 88 1 1 10
100
1 1 1 1 1 0 12
12 34 8 0 7 0 1
0 0 0 0 6.591674 8 0
729 0 34 108 1 0 16
256
1 1 1 1 1 0 12
11 26 9 0 12 0 1
0 0 0 0 6.214608 6 0
500 0 38 85 1 1 8
64
1 1 1 1 1 0 18
16 33 14 0 14 0 1
0 0 0 0 6.336826 1 0
565 0 41 119 1 1 9
81
1 1 1 1 1 0 14
13 29 14 0 14 0 1
0 0 0 0 6.410175 1 0
608 0 46 108 1 1 9
81
txt1<-" smsa66 smsa76 nearc2 nearc4 nearc4a nearc4b ed76 ed66 age76 daded
nodaded momed nomomed momdad14 sinmom14 step14 south66 south76
lwage76 famed black wage76 enroll76 kww iqscore mar76 libcrd14
exp76 exp762"
txt2 <-
" 1 1 0 0 0 0 7
5 29 9.94 1 10.25 1 1
0 0 0 0 6.306275 9 1
548 0 15 NA 1 0 16
256
1 1 0 0 0 0 12
11 27 8 0 8 0 1
0 0 0 0 6.175867 8 0
481 0 35 93 1 1 9
81
1 1 0 0 0 0 12
12 34 14 0 12 0 1
0 0 0 0 6.580639 2 0
721 0 42 103 1 1 16
256
1 1 1 1 1 0 11
11 27 11 0 12 0 1
0 0 0 0 5.521461 6 0
250 0 25 88 1 1 10
100
1 1 1 1 1 0 12
12 34 8 0 7 0 1
0 0 0 0 6.591674 8 0
729 0 34 108 1 0 16
256
1 1 1 1 1 0 12
11 26 9 0 12 0 1
0 0 0 0 6.214608 6 0
500 0 38 85 1 1 8
64
1 1 1 1 1 0 18
16 33 14 0 14 0 1
0 0 0 0 6.336826 1 0
565 0 41 119 1 1 9
81
1 1 1 1 1 0 14
13 29 14 0 14 0 1
0 0 0 0 6.410175 1 0
608 0 46 108 1 1 9
81"
Now the code:
inp <- scan(text=txt2, what="numeric")
inmat <- matrix( as.numeric(inp), ncol=29, byrow=TRUE)
dfrm <- as.data.frame(inmat)
scan(text=txt1, what="")
Read 29 items
[1] "smsa66" "smsa76" "nearc2" "nearc4" "nearc4a" "nearc4b" "ed76"
[8] "ed66" "age76" "daded" "nodaded" "momed" "nomomed" "momdad14"
[15] "sinmom14" "step14" "south66" "south76" "lwage76" "famed" "black"
[22] "wage76" "enroll76" "kww" "iqscore" "mar76" "libcrd14" "exp76"
[29] "exp762"
names(dfrm) <- scan(text=txt1, what="")
#Read 29 items
dfrm
#-----------------------
smsa66 smsa76 nearc2 nearc4 nearc4a nearc4b ed76 ed66 age76 daded nodaded momed nomomed
1 1 1 0 0 0 0 7 5 29 9.94 1 10.25 1
2 1 1 0 0 0 0 12 11 27 8 0 8 0
3 1 1 0 0 0 0 12 12
snipped remainder of output
Final result:
str(dfrm)
'data.frame': 8 obs. of 29 variables:
$ smsa66 : num 1 1 1 1 1 1 1 1
$ smsa76 : num 1 1 1 1 1 1 1 1
$ nearc2 : num 0 0 0 1 1 1 1 1
$ nearc4 : num 0 0 0 1 1 1 1 1
$ nearc4a : num 0 0 0 1 1 1 1 1
$ nearc4b : num 0 0 0 0 0 0 0 0
$ ed76 : num 7 12 12 11 12 12 18 14
$ ed66 : num 5 11 12 11 12 11 16 13
$ age76 : num 29 27 34 27 34 26 33 29
$ daded : num 9.94 8 14 11 8 9 14 14
$ nodaded : num 1 0 0 0 0 0 0 0
$ momed : num 10.2 8 12 12 7 ...
$ nomomed : num 1 0 0 0 0 0 0 0
$ momdad14: num 1 1 1 1 1 1 1 1
$ sinmom14: num 0 0 0 0 0 0 0 0
$ step14 : num 0 0 0 0 0 0 0 0
$ south66 : num 0 0 0 0 0 0 0 0
$ south76 : num 0 0 0 0 0 0 0 0
$ lwage76 : num 6.31 6.18 6.58 5.52 6.59 ...
$ famed : num 9 8 2 6 8 6 1 1
$ black : num 1 0 0 0 0 0 0 0
$ wage76 : num 548 481 721 250 729 500 565 608
$ enroll76: num 0 0 0 0 0 0 0 0
$ kww : num 15 35 42 25 34 38 41 46
$ iqscore : num NA 93 103 88 108 85 119 108
$ mar76 : num 1 1 1 1 1 1 1 1
$ libcrd14: num 0 1 1 1 0 1 1 1
$ exp76 : num 16 9 16 10 16 8 9 9
$ exp762 : num 256 81 256 100 256 64 81 81
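
The same approach can be applied to the file itself rather than to pasted text. The sketch below is written under two assumptions taken from the example: the variable names always occupy the first four lines, and "." marks a missing value. It writes the first two records to a temporary file as a stand-in for the real DAT file; path, header_lines, n_vars, vars and vals are names made up for the example.

```r
# Stand-in for the real file: the header lines plus the first two records.
path <- tempfile(fileext = ".dat")
writeLines(c(
  "smsa66 smsa76 nearc2 nearc4 nearc4a nearc4b ed76 ed66 age76 daded",
  "nodaded momed nomomed momdad14 sinmom14 step14 south66 south76",
  "lwage76 famed black wage76 enroll76 kww iqscore mar76 libcrd14",
  "exp76 exp762",
  "1 1 0 0 0 0 7",
  "5 29 9.94 1 10.25 1 1",
  "0 0 0 0 6.306275 9 1",
  "548 0 15 . 1 0 16",
  "256",
  "1 1 0 0 0 0 12",
  "11 27 8 0 8 0 1",
  "0 0 0 0 6.175867 8 0",
  "481 0 35 93 1 1 9",
  "81"
), path)

header_lines <- 4   # assumption: names span exactly the first four lines
n_vars <- 29

# scan() ignores line breaks, so wrapped records are read as one long stream.
vars <- scan(path, what = "", nlines = header_lines, quiet = TRUE)
vals <- scan(path, skip = header_lines, na.strings = ".", quiet = TRUE)

# Cut the stream into rows of 29 values each.
dfrm <- as.data.frame(matrix(vals, ncol = n_vars, byrow = TRUE))
names(dfrm) <- vars
str(dfrm)
```

Using na.strings = "." lets scan() read the values as numbers directly, so the "." entries do not have to be edited to NA by hand first.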

Resources