How do I fix a summarise function error in dplyr package?

I am having a problem with the summarise function in the "dplyr" package.
This is the code.
library("dplyr")
a <- read.csv("Number of subway passengers.csv", header = TRUE, stringsAsFactors = FALSE)
a <- a[, c(-2, -3, -4, -5)]
colnames(a) <- c("Date","4-5","5-6","6-7","7-8","8-9","9-10","10-11","11-12","12-13","13-14","14-15","15-16","16-17","17-18","18-19","19-20","20-21","21-22","22-23","23-24","0-1","1-2","2-3","3-4","Total")
b <- summarise(a,mean_passenger=mean("Total",na.rm=TRUE))
Running the last line produces this warning from summarise:
In mean.default("Total", na.rm = TRUE) : argument is not numeric or logical: returning NA
Why does this happen?
Here is the output of str(a):
> str(a)
'data.frame': 16501 obs. of 26 variables:
$ Date : chr "2019-11-01" "2019-11-01" "2019-11-01" "2019-11-01" ...
$ 4-5 : int 32 2 3 0 5 0 11 1 2 0 ...
$ 5-6 : int 438 353 89 182 143 211 187 127 83 175 ...
$ 6-7 : int 529 2019 152 852 161 1078 154 477 115 622 ...
$ 7-8 : int 1612 4520 289 2926 288 4395 302 1044 219 1817 ...
$ 8-9 : int 3405 9906 435 9348 482 13000 386 3662 366 5234 ...
$ 9-10 : int 2360 6525 481 4124 631 6669 550 3510 494 3292 ...
$ 10-11 : int 2377 3571 716 2064 768 2964 841 2593 843 2292 ...
$ 11-12 : int 2853 2951 1090 1889 1359 2501 1686 2813 1262 2349 ...
$ 12-13 : int 3334 3190 1073 1538 1531 2127 1781 2646 1583 2160 ...
$ 13-14 : int 3545 3348 1367 1751 1937 2108 2059 2718 1868 2159 ...
$ 14-15 : int 2850 3179 1782 1403 2466 1926 2405 2579 2303 2071 ...
$ 15-16 : int 4606 3265 2235 1431 2821 1718 3125 2103 2479 1559 ...
$ 16-17 : int 4915 3575 2345 1218 3403 1778 3241 2010 2656 1777 ...
$ 17-18 : int 7472 4191 3627 1249 5807 2396 3796 2033 3583 1599 ...
$ 18-19 : int 11107 5445 7462 1486 10738 3746 4836 2582 5246 1776 ...
$ 19-20 : int 5754 3882 2943 816 4680 2557 3192 1682 2709 1261 ...
$ 20-21 : int 3920 2596 2249 439 3670 935 2107 675 1782 548 ...
$ 21-22 : int 3799 2177 2199 288 4495 510 2452 512 1565 341 ...
$ 22-23 : int 3369 1624 1460 296 4118 384 2407 380 1094 260 ...
$ 23-24 : int 1678 912 640 202 2366 299 1394 323 596 153 ...
$ 0-1 : int 228 478 62 47 271 75 236 143 66 73 ...
$ 1-2 : int 2 39 0 1 1 0 6 10 1 1 ...
$ 2-3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ 3-4 : int 0 0 0 0 0 0 0 0 0 0 ...
$ Total : int 70185 67748 32699 33550 52141 51377 37154 34623 30915 31519 ...

Inside summarise, "Total" in quotes is just a character string, not a column. The same warning can be reproduced with:
mean("Total")
#[1] NA
Warning message:
In mean.default("Total") : argument is not numeric or logical: returning NA
Use Total without quotes so that it is interpreted as a column name:
b <- dplyr::summarise(a, mean_passenger = mean(Total,na.rm=TRUE))
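A small self-contained sketch of the same point, using a hypothetical three-row stand-in for `a` (the values are made up). It also touches two related cases: names such as `4-5` are not syntactic R names, so written unquoted they need backticks; and if the column name is held in a string, dplyr's `.data` pronoun (from rlang) can look it up inside `summarise()`. The dplyr part only runs if the package is installed.

```r
# Hypothetical stand-in for the subway data (values are made up)
a <- data.frame(Date  = c("2019-11-01", "2019-11-01", "2019-11-02"),
                `4-5` = c(32L, 2L, NA),
                Total = c(70185L, 67748L, NA),
                check.names = FALSE)   # keep "4-5" as the column name

# Quoted: mean() receives the string "Total", warns and returns NA
bad <- suppressWarnings(mean("Total"))

# Base R: extract the column by name first
m_base <- mean(a[["Total"]], na.rm = TRUE)

# Non-syntactic names such as 4-5 need backticks when written unquoted
m_45 <- mean(a$`4-5`, na.rm = TRUE)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Bare column name, evaluated inside the data frame
  b <- dplyr::summarise(a, mean_passenger = mean(Total, na.rm = TRUE))

  # Column name held in a string: look it up with the .data pronoun
  col <- "Total"
  b2 <- dplyr::summarise(a, mean_passenger = mean(.data[[col]], na.rm = TRUE))
  print(b2)
}
```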

Related

R find the index of a character in dataframe

I have a dataframe with SAT scores for all states in US.
'data.frame': 51 obs. of 7 variables:
$ X2010.rank : int 1 2 3 4 5 6 7 8 9 10 ...
$ state : chr "Iowa " "Minnesota " "Wisconsin " "Missouri " ...
$ reading : int 603 594 595 593 585 592 585 590 585 580 ...
$ math : int 613 607 604 595 605 603 600 595 593 594 ...
$ writing : int 582 580 579 580 576 571 577 567 568 559 ...
$ combined : int 1798 1781 1778 1768 1766 1766 1762 1752 1746 1733 ...
$ participation: chr "3%" "7%" "4%" "4%" ...
I need to find the index of a particular state. I tried the which command, but it returns integer(0):
> which(sat$state=="California")
integer(0)
However, the same command works for other values and returns the index:
> which(sat$combined==1781)
[1] 2
Where am I going wrong? Please help.
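No answer is included for this question, but the str() output hints at the cause: the state values carry a trailing space ("Iowa ", "Minnesota "), so an exact comparison with "California" matches nothing. A minimal sketch, assuming the California entry has the same trailing space, using a hypothetical three-row version of `sat` and base R's `trimws()`:

```r
# Hypothetical miniature of sat; note the trailing spaces, as in str(sat)
sat <- data.frame(state = c("Iowa ", "Minnesota ", "California "))

# "California " is not equal to "California", so nothing matches
which(sat$state == "California")   # integer(0)

# Strip leading/trailing whitespace before comparing
idx <- which(trimws(sat$state) == "California")

# Or clean the column once so later comparisons work directly
sat$state <- trimws(sat$state)
idx2 <- match("California", sat$state)
```

If the file is read with read.csv(), the `strip.white = TRUE` argument may remove such spaces at import time; according to ?read.table it applies only to unquoted character fields.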

Transform single row into rows and columns

I have a list of 170 items, each with 12 variables. This data is currently organised in one continuous row (1 observation of 2040 variables), e.g.:
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
but I want it to be organised into 170 columns with 12 rows as follows:
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
I have tried the following:
list2=lapply(list1, function(x) t(x))
but this doesn't alter the data in any way. Is there something else I can do to transform the data?

We convert the string to a numeric vector with scan, split the vector by its own values to create a list, and convert that list to a data.frame:
# v1 is the single row held as one character string, e.g. "0 0 0 ... 2 2 2"
v2 <- scan(text = v1, what = numeric(), quiet = TRUE)
data.frame(split(v2, v2))

If your data is already a vector (as @akrun showed using scan), you could also do:
data <- 1:2040 # your data
breaks <- seq(1, 2040, 170)
result <- lapply(breaks, function(x) data[x : (x + 169)])
This gives:
> str(result)
List of 12
$ : int [1:170] 1 2 3 4 5 6 7 8 9 10 ...
$ : int [1:170] 171 172 173 174 175 176 177 178 179 180 ...
$ : int [1:170] 341 342 343 344 345 346 347 348 349 350 ...
$ : int [1:170] 511 512 513 514 515 516 517 518 519 520 ...
$ : int [1:170] 681 682 683 684 685 686 687 688 689 690 ...
$ : int [1:170] 851 852 853 854 855 856 857 858 859 860 ...
$ : int [1:170] 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 ...
$ : int [1:170] 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 ...
$ : int [1:170] 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 ...
$ : int [1:170] 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 ...
$ : int [1:170] 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 ...
$ : int [1:170] 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 ...
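Because each item's 12 values are consecutive, `matrix()` can also do the reshape directly: it fills column by column, so every run of 12 values becomes one column. This is an alternative to the answers above, sketched with hypothetical data following the question's pattern (item 0 twelve times, then item 1, and so on). If the row is a one-row data frame rather than a vector, `unlist()` flattens it first, assuming all columns are numeric.

```r
# Hypothetical data in the question's layout: 170 items x 12 values,
# all in one row (item 0 twelve times, then item 1 twelve times, ...)
v <- rep(0:169, each = 12)
row_df <- as.data.frame(t(v))        # 1 observation of 2040 variables

# Flatten the one-row data frame back to a plain numeric vector
v_flat <- unlist(row_df, use.names = FALSE)

# matrix() fills column-wise, so each block of 12 consecutive values
# becomes one column: 12 rows x 170 columns
m <- matrix(v_flat, nrow = 12)

# Optional: one data frame column per item
result <- as.data.frame(m)
```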

non meaningful operation for factor error when storing new value in data frame: R

I am trying to update a value in a data frame, but I am getting what seems to me a weird error about an operation that I don't think I am using.
Here's a summary of the data:
> str(us.cty2015@data)
'data.frame': 3108 obs. of 15 variables:
$ STATEFP : Factor w/ 52 levels "01","02","04",..: 17 25 33 46 4 14 16 24 36 42 ...
$ COUNTYFP : Factor w/ 325 levels "001","003","005",..: 112 91 67 9 43 81 7 103 72 49 ...
$ COUNTYNS : Factor w/ 3220 levels "00023901","00025441",..: 867 1253 1600 2465 38 577 690 1179 1821 2104 ...
$ AFFGEOID : Factor w/ 3220 levels "0500000US01001",..: 976 1472 1879 2813 144 657 795 1395 2098 2398 ...
$ GEOID : Factor w/ 3220 levels "01001","01003",..: 976 1472 1879 2813 144 657 795 1395 2098 2398 ...
$ NAME : Factor w/ 1910 levels "Abbeville","Acadia",..: 1558 1703 1621 688 856 1075 148 1807 1132 868 ...
$ LSAD : Factor w/ 9 levels "00","03","04",..: 5 5 5 5 5 5 5 5 5 5 ...
$ ALAND : num 1.66e+09 1.10e+09 3.60e+09 2.12e+08 1.50e+09 ...
$ AWATER : num 2.78e+06 5.24e+07 3.50e+07 2.92e+08 8.91e+06 ...
$ t_pop : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_wht : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_free_blk: num 0 0 0 0 0 0 0 0 0 0 ...
$ n_slv : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_blk : num 0 0 0 0 0 0 0 0 0 0 ...
$ n_free : num 0 0 0 0 0 0 0 0 0 0 ...
> str(us.cty1860@data)
'data.frame': 2126 obs. of 29 variables:
$ DECADE : Factor w/ 1 level "1860": 1 1 1 1 1 1 1 1 1 1 ...
$ NHGISNAM : Factor w/ 1236 levels "Abbeville","Accomack",..: 1142 1218 1130 441 812 548 1144 56 50 887 ...
$ NHGISST : Factor w/ 41 levels "010","050","060",..: 32 13 9 36 16 36 16 30 23 39 ...
$ NHGISCTY : Factor w/ 320 levels "0000","0010",..: 142 206 251 187 85 231 131 12 6 161 ...
$ ICPSRST : Factor w/ 37 levels "1","11","12",..: 5 13 21 26 22 26 22 10 15 17 ...
$ ICPSRCTY : Factor w/ 273 levels "10","1010","1015",..: 25 93 146 72 247 122 12 10 228 45 ...
$ ICPSRNAM : Factor w/ 1200 levels "ABBEVILLE","ACCOMACK",..: 1108 1184 1097 432 791 535 1110 55 49 860 ...
$ STATENAM : Factor w/ 41 levels "Alabama","Arkansas",..: 32 13 9 36 16 36 16 30 23 39 ...
$ ICPSRSTI : int 14 31 44 49 45 49 45 24 34 40 ...
$ ICPSRCTYI : int 1210 1970 2910 1810 710 2450 1130 110 50 1450 ...
$ ICPSRFIP : num 0 0 0 0 0 0 0 0 0 0 ...
$ STATE : Factor w/ 41 levels "010","050","060",..: 32 13 9 36 16 36 16 30 23 39 ...
$ COUNTY : Factor w/ 320 levels "0000","0010",..: 142 206 251 187 85 231 131 12 6 161 ...
$ PID : num 1538 735 306 1698 335 ...
$ X_CENTROID : num 1348469 184343 1086494 -62424 585888 ...
$ Y_CENTROID : num 556680 588278 -229809 -433290 -816852 ...
$ GISJOIN : Factor w/ 2126 levels "G0100010","G0100030",..: 1585 627 319 1769 805 1788 823 1425 1079 2006 ...
$ GISJOIN2 : Factor w/ 2126 levels "0100010","0100030",..: 1585 627 319 1769 805 1788 823 1425 1079 2006 ...
$ SHAPE_AREA : num 2.35e+09 1.51e+09 8.52e+08 2.54e+09 6.26e+08 ...
$ SHAPE_LEN : num 235777 155261 166065 242608 260615 ...
$ t_pop : int 25043 653 4413 8184 174491 1995 4324 17187 4649 8392 ...
$ n_wht : int 24974 653 4295 6892 149063 1684 3001 17123 4578 2580 ...
$ n_free_blk : int 69 0 2 0 10939 2 7 64 12 409 ...
$ n_slv : int 0 0 116 1292 14484 309 1316 0 59 5403 ...
$ n_blk : int 69 0 118 1292 25423 311 1323 64 71 5812 ...
$ n_free : num 25043 653 4297 6892 160007 ...
$ frac_free : num 1 1 0.974 0.842 0.917 ...
$ frac_free_blk: num 1 NA 0.0169 0 0.4303 ...
$ frac_slv : num 0 0 0.0263 0.1579 0.083 ...
> str(overlap)
'data.frame': 15266 obs. of 7 variables:
$ cty2015 : Factor w/ 3108 levels "0","1","10","100",..: 1 1 2 2 2 2 2 1082 1082 1082 ...
$ cty1860 : Factor w/ 2126 levels "0","1","10","100",..: 1047 1012 1296 1963 2033 2058 2065 736 1413 1569 ...
$ area_inter : num 1.66e+09 2.32e+05 9.81e+04 1.07e+09 7.67e+07 ...
$ area1860 : num 1.64e+11 1.81e+11 1.54e+09 2.91e+09 2.32e+09 ...
$ frac_1860 : num 1.01e-02 1.28e-06 6.35e-05 3.67e-01 3.30e-02 ...
$ sum_frac_1860 : num 1 1 1 1 1 ...
$ scaled_frac_1860: num 1.01e-02 1.28e-06 6.35e-05 3.67e-01 3.30e-02 ...
I am trying to multiply a vector of variables, vars <- c("t_pop", "n_wht", "n_free_blk", "n_slv", "n_blk", "n_free"), in the us.cty1860@data data frame by a scalar, overlap$scaled_frac_1860[i], then add the result to the same variables in the us.cty2015@data data frame, and finally overwrite those variables in us.cty2015@data.
When I make the following call, I get an error that seems to say I am trying to perform invalid operations on factors (which is not the case, as you can confirm from the str output).
> us.cty2015@data[overlap$cty2015[1], vars] <- us.cty2015@data[overlap$cty2015[1], vars] + (overlap$scaled_frac_1860[1] * us.cty1860@data[overlap$cty1860[1], vars])
Error in Summary.factor(1L, na.rm = FALSE) :
‘max’ not meaningful for factors
In addition: Warning message:
In Ops.factor(i, 0L) : ‘>=’ not meaningful for factors
However, when I don't attempt to overwrite the old value, the operation works fine.
> us.cty2015@data[overlap$cty2015[1], vars] + (overlap$scaled_frac_1860[1] * us.cty1860@data[overlap$cty1860[1], vars])
t_pop n_wht n_free_blk n_slv n_blk n_free
0 118.3889 113.6468 0.1317233 4.610316 4.742039 113.7785
I'm sure there are better ways of accomplishing what I am trying to do but does anyone have any idea what is going on?
Edit:
I am using the following libraries: rgdal, rgeos, and maptools
All the data/objects come from the NHGIS 1860 and 2015 United States county shapefiles.
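No answer is included for this question. One plausible explanation, offered as a sketch rather than a confirmed diagnosis: in the str(overlap) output, cty2015 and cty1860 are factors, and R's ?Extract documentation states that indexing by a factor is equivalent to indexing by its integer codes, not by the labels it prints. So the read-only expression silently selects rows by code (which may only coincidentally look right for the first row), while the replacement method fails on the factor index with the max/>= errors shown. The fix sketched below converts the index with as.character() so rows are matched by row name; this assumes the @data row names are the county IDs stored in the factor levels. The data frames are hypothetical miniatures.

```r
# Hypothetical miniatures of us.cty2015@data, us.cty1860@data and overlap;
# row names play the role of the county IDs
df2015 <- data.frame(t_pop = c(0, 0, 0), n_wht = c(0, 0, 0),
                     row.names = c("0", "1", "10"))
df1860 <- data.frame(t_pop = c(100, 200), n_wht = c(90, 150),
                     row.names = c("0", "1"))
overlap <- data.frame(cty2015 = factor(c("10", "0")),
                      cty1860 = factor(c("1", "0")),
                      scaled_frac_1860 = c(0.5, 0.25))
vars <- c("t_pop", "n_wht")

# The first cty2015 label is "10", but its integer code is 2
# (levels are "0", "10"), so indexing with the factor would pick row 2
code <- as.integer(overlap$cty2015[1])

# Convert the factor labels to character so rows are matched by row name
i2015 <- as.character(overlap$cty2015[1])   # "10"
i1860 <- as.character(overlap$cty1860[1])   # "1"

df2015[i2015, vars] <- df2015[i2015, vars] +
  overlap$scaled_frac_1860[1] * df1860[i1860, vars]
print(df2015)
```

In a loop over the rows of overlap, converting both factor columns once up front (e.g. with as.character()) avoids repeating the conversion on every iteration.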

SummaryBy - "arguments must have same length"

I have a data frame (see below) and I would like to use summaryBy to get the min and max of Time (in seconds) for each CowID. But even though length(CowID) == length(Time), it doesn't work and I get this error:
Error in tapply(currVAR, rh.string.factor, function(x) { :
arguments must have same length
I am also puzzled by the str output of my data: a lot of it seems irrelevant, such as "Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 25852 obs. of 6 variables:" and everything that appears after $Time.
> summary(LMD60)
CowID Date DateHour
2140 : 727 Min. :2014-01-13 Min. :2014-01-13 14:33:05
2019 : 366 1st Qu.:2014-01-20 1st Qu.:2014-01-20 15:33:46
2228 : 366 Median :2014-01-28 Median :2014-01-28 14:48:52
2234 : 366 Mean :2014-01-27 Mean :2014-01-28 04:26:46
2235 : 366 3rd Qu.:2014-02-04 3rd Qu.:2014-02-04 15:57:25
2047 : 365 Max. :2014-02-12 Max. :2014-02-12 16:10:39
(Other):23296
Measure Feeding Time
Min. : 8.0 hoko :15857 Min. : 0.00
1st Qu.: 56.0 strap: 9995 1st Qu.:15.00
Median : 96.0 Median :30.00
Mean : 135.8 Mean :30.34
3rd Qu.: 168.0 3rd Qu.:45.00
Max. :1634.0 Max. :60.00
> str(LMD60)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 25852 obs. of 6 variables:
$ CowID : Factor w/ 71 levels "1921","1923",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Date : Date, format: "2014-01-27" "2014-01-27" ...
$ DateHour: POSIXct, format: "2014-01-27 15:16:35" "2014-01-27 15:16:36" ...
$ Measure : num 53 57 108 75 38 54 148 139 72 94 ...
$ Feeding : Factor w/ 2 levels "hoko","strap": 1 1 1 1 1 1 1 1 1 1 ...
$ Time : num 0 1 1 2 2 3 3 4 4 5 ...
- attr(*, "vars")=List of 2
..$ : symbol CowID
..$ : symbol Date
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 215
..$ : int 0 1 2 3 4 5 6 7 8 9 ...
..$ : int 121 122 123 124 125 126 127 128 129 130 ...
..$ : int 241 242 243 244 245 246 247 248 249 250 ...
..$ : int 362 363 364 365 366 367 368 369 370 371 ...
..$ : int 483 484 485 486 487 488 489 490 491 492 ...
..$ : int 604 605 606 607 608 609 610 611 612 613 ...
..$ : int 726 727 728 729 730 731 732 733 734 735 ...
..$ : int 847 848 849 850 851 852 853 854 855 856 ...
..$ : int 968 969 970 971 972 973 974 975 976 977 ...
..$ : int 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 ...
..$ : int 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 ...
..$ : int 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 ...
..$ : int 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 ...
..$ : int 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 ...
..$ : int 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 ...
..$ : int 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 ...
..$ : int 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 ...
..$ : int 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 ...
..$ : int 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 ...
..$ : int 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 ...
..$ : int 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 ...
..$ : int 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 ...
..$ : int 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 ...
..$ : int 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 ...
..$ : int 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 ...
..$ : int 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 ...
..$ : int 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 ...
..$ : int 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 ...
..$ : int 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 ...
..$ : int 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 ...
..$ : int 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 ...
..$ : int 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 ...
..$ : int 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 ...
..$ : int 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 ...
..$ : int 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 ...
..$ : int 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 ...
..$ : int 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 ...
..$ : int 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 ...
..$ : int 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 ...
..$ : int 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 ...
..$ : int 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 ...
..$ : int 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 ...
..$ : int 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 ...
..$ : int 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 ...
..$ : int 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 ...
..$ : int 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 ...
..$ : int 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 ...
..$ : int 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 ...
..$ : int 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 ...
..$ : int 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 ...
..$ : int 6025 6026 6027 6028 6029 6030 6031 6032 6033 6034 ...
..$ : int 6146 6147 6148 6149 6150 6151 6152 6153 6154 6155 ...
..$ : int 6267 6268 6269 6270 6271 6272 6273 6274 6275 6276 ...
..$ : int 6388 6389 6390 6391 6392 6393 6394 6395 6396 6397 ...
..$ : int 6492 6493 6494 6495 6496 6497 6498 6499 6500 6501 ...
..$ : int 6613 6614 6615 6616 6617 6618 6619 6620 6621 6622 ...
..$ : int 6734 6735 6736 6737 6738 6739 6740 6741 6742 6743 ...
..$ : int 6855 6856 6857 6858 6859 6860 6861 6862 6863 6864 ...
..$ : int 6977 6978 6979 6980 6981 6982 6983 6984 6985 6986 ...
..$ : int 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 ...
..$ : int 7218 7219 7220 7221 7222 7223 7224 7225 7226 7227 ...
..$ : int 7338 7339 7340 7341 7342 7343 7344 7345 7346 7347 ...
..$ : int 7460 7461 7462 7463 7464 7465 7466 7467 7468 7469 ...
..$ : int 7582 7583 7584 7585 7586 7587 7588 7589 7590 7591 ...
..$ : int 7703 7704 7705 7706 7707 7708 7709 7710 7711 7712 ...
..$ : int 7825 7826 7827 7828 7829 7830 7831 7832 7833 7834 ...
..$ : int 7947 7948 7949 7950 7951 7952 7953 7954 7955 7956 ...
..$ : int 8069 8070 8071 8072 8073 8074 8075 8076 8077 8078 ...
..$ : int 8190 8191 8192 8193 8194 8195 8196 8197 8198 8199 ...
..$ : int 8312 8313 8314 8315 8316 8317 8318 8319 8320 8321 ...
..$ : int 8433 8434 8435 8436 8437 8438 8439 8440 8441 8442 ...
..$ : int 8555 8556 8557 8558 8559 8560 8561 8562 8563 8564 ...
..$ : int 8673 8674 8675 8676 8677 8678 8679 8680 8681 8682 ...
..$ : int 8792 8793 8794 8795 8796 8797 8798 8799 8800 8801 ...
..$ : int 8914 8915 8916 8917 8918 8919 8920 8921 8922 8923 ...
..$ : int 9032 9033 9034 9035 9036 9037 9038 9039 9040 9041 ...
..$ : int 9151 9152 9153 9154 9155 9156 9157 9158 9159 9160 ...
..$ : int 9273 9274 9275 9276 9277 9278 9279 9280 9281 9282 ...
..$ : int 9393 9394 9395 9396 9397 9398 9399 9400 9401 9402 ...
..$ : int 9514 9515 9516 9517 9518 9519 9520 9521 9522 9523 ...
..$ : int 9635 9636 9637 9638 9639 9640 9641 9642 9643 9644 ...
..$ : int 9756 9757 9758 9759 9760 9761 9762 9763 9764 9765 ...
..$ : int 9877 9878 9879 9880 9881 9882 9883 9884 9885 9886 ...
..$ : int 9999 10000 10001 10002 10003 10004 10005 10006 10007 10008 ...
..$ : int 10121 10122 10123 10124 10125 10126 10127 10128 10129 10130 ...
..$ : int 10243 10244 10245 10246 10247 10248 10249 10250 10251 10252 ...
..$ : int 10364 10365 10366 10367 10368 10369 10370 10371 10372 10373 ...
..$ : int 10484 10485 10486 10487 10488 10489 10490 10491 10492 10493 ...
..$ : int 10601 10602 10603 10604 10605 10606 10607 10608 10609 10610 ...
..$ : int 10722 10723 10724 10725 10726 10727 10728 10729 10730 10731 ...
..$ : int 10839 10840 10841 10842 10843 10844 10845 10846 10847 10848 ...
..$ : int 10959 10960 10961 10962 10963 10964 10965 10966 10967 10968 ...
..$ : int 11078 11079 11080 11081 11082 11083 11084 11085 11086 11087 ...
..$ : int 11199 11200 11201 11202 11203 11204 11205 11206 11207 11208 ...
..$ : int 11320 11321 11322 11323 11324 11325 11326 11327 11328 11329 ...
..$ : int 11442 11443 11444 11445 11446 11447 11448 11449 11450 11451 ...
..$ : int 11563 11564 11565 11566 11567 11568 11569 11570 11571 11572 ...
..$ : int 11684 11685 11686 11687 11688 11689 11690 11691 11692 11693 ...
..$ : int 11801 11802 11803 11804 11805 11806 11807 11808 11809 11810 ...
.. [list output truncated]
- attr(*, "group_sizes")= int 121 120 121 121 121 122 121 121 121 121 ...
- attr(*, "biggest_group_size")= int 122
- attr(*, "labels")='data.frame': 215 obs. of 2 variables:
..$ CowID: Factor w/ 71 levels "1921","1923",..: 1 1 1 2 2 2 3 3 3 4 ...
..$ Date : Date, format: "2014-01-27" "2014-01-28" ...
..- attr(*, "vars")=List of 2
.. ..$ : symbol CowID
.. ..$ : symbol Date
..- attr(*, "drop")= logi TRUE
> summaryBy(LMD60$Time~LMD60$CowID, data=LMD60, FUN=list(min,max))
Error in tapply(currVAR, rh.string.factor, function(x) { :
arguments must have same length

You have to convert your data from a tibble to a plain data frame (data <- as.data.frame(data)); summaryBy works only on data frames.

I know I'm terribly late to this thread, but I just ran into the same problem as the OP, and hrbrmstr's recommendation of coercing the tibble to a data frame worked perfectly. I'm posting in case anyone else comes across this thread: try df <- as.data.frame(df) first.
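A sketch of the min/max-per-cow summary on a hypothetical five-row stand-in for LMD60. Besides converting to a plain data frame as the answers suggest, the formula is written with bare column names (`Time ~ CowID`) rather than `LMD60$Time ~ LMD60$CowID`, since `data =` already tells the function where to find the columns. Base R's `aggregate()` is shown as a swapped-in alternative that needs no extra package; the `doBy::summaryBy()` call runs only if doBy is installed.

```r
# Hypothetical stand-in for LMD60 (values are made up)
LMD60 <- data.frame(CowID = factor(c("1921", "1921", "1923", "1923", "1923")),
                    Time  = c(0, 45, 3, 60, 12))

# If LMD60 is a (grouped) tibble, this drops the tibble classes
LMD60 <- as.data.frame(LMD60)

# Base R alternative: one row per CowID, with a two-column "Time" matrix
res <- aggregate(Time ~ CowID, data = LMD60,
                 FUN = function(x) c(min = min(x), max = max(x)))
print(res)

if (requireNamespace("doBy", quietly = TRUE)) {
  # Same summary with summaryBy, using bare column names in the formula
  res2 <- doBy::summaryBy(Time ~ CowID, data = LMD60, FUN = list(min, max))
  print(res2)
}
```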

how to limit x axis length based on values in all facets of facet_wrap

I have a load of genomic data (the dput output is far too large to post):
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 7454 obs. of 3 variables:
$ chr : num 1 1 1 1 1 1 1 1 1 1 ...
$ leftPos: num 480000 600000 2520000 2760000 2880000 3000000 3120000 3480000 3600000 4440000 ...
$ Means : num 45.2 58.3 10.7 81.2 16 ...
- attr(*, "vars")=List of 1
..$ : symbol chr
- attr(*, "labels")='data.frame': 22 obs. of 1 variable:
..$ chr: Factor w/ 24 levels "chr1","chr10",..: 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "vars")=List of 1
.. ..$ : symbol chr
- attr(*, "indices")=List of 22
..$ : int 0 1 2 3 4 5 6 7 8 9 ...
..$ : int 559 560 561 562 563 564 565 566 567 568 ...
..$ : int 908 909 910 911 912 913 914 915 916 917 ...
..$ : int 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 ...
..$ : int 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 ...
..$ : int 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 ...
..$ : int 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 ...
..$ : int 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 ...
..$ : int 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 ...
..$ : int 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 ...
..$ : int 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 ...
..$ : int 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 ...
..$ : int 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 ...
..$ : int 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 ...
..$ : int 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 ...
..$ : int 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 ...
..$ : int 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 ...
..$ : int 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 ...
..$ : int 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 ...
..$ : int 6251 6252 6253 6254 6255 6256 6257 6258 6259 6260 ...
..$ : int 6683 6684 6685 6686 6687 6688 6689 6690 6691 6692 ...
..$ : int 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 ...
- attr(*, "group_sizes")= int 559 349 383 370 283 229 177 173 140 222 ...
- attr(*, "biggest_group_size")= int 682
I would like to plot this as a facet plot but limit the x axis of each facet to the maximum leftPos for that chr. At the moment every chr facet is plotted with equal width. When I use scales="free_x", each facet just stretches the plot to fill a predefined width. Is it possible to have facets of different widths?
The code I'm using so far:
ggplot(Zoutliers1,aes(x = leftPos,
y = as.numeric(Means),
group = chr,
xend = leftPos,
yend=0))+
geom_bar(stat="identity",fill = "red", size = 1, colour = "red")+
geom_line()+
geom_segment(linetype= 1, colour = "#919191")+
ggtitle(TBBName)+
ylim(-50,480)+
facet_wrap(~ chr,nrow = 1)+
geom_hline(yintercept = UL1)+
geom_hline(yintercept = LL1)+
theme(panel.margin = unit(0.1, "lines"))+
theme(axis.text.x = element_blank())+
theme(panel.border = element_rect(fill=NA,color="darkred", size=0.5,
linetype="dashed"))
The plot I'm getting (image not included).

You need to set the 'space' parameter to 'free', which can only be done in facet_grid. Here is a demonstration with sample data.
library(gridExtra)
library(ggplot2)
#creating some sample data
set.seed(10001)
dat <-data.frame(chr=1:6,leftPos=seq(100,1000,length.out=6))
dat2 <- dat[sample(seq_len(nrow(dat)), 1000, replace = TRUE), ]
dat2$x <- rnorm(nrow(dat2))*dat2$chr
#basic plot
p1 <- ggplot(dat2, aes(x=x)) +
geom_histogram()
#different scales
p_scales <- p1 + facet_grid(.~chr, scales="free_x") + labs(title="free x, default space")
p_space_scales <- p1 + facet_grid(.~chr, scales = "free_x",space="free") + labs(title="free x and free space")
grid.arrange(p_scales,p_space_scales)
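The same idea applied to data shaped like the question's (chr, leftPos, Means), as a hedged sketch with a hypothetical three-chromosome data set. With `facet_grid(. ~ chr, scales = "free_x", space = "free_x")`, panel widths are proportional to each panel's x range, so chromosomes with a larger maximum leftPos get wider panels; `geom_col()` stands in for `geom_bar(stat = "identity")`. The plotting part runs only if ggplot2 is installed.

```r
# Hypothetical toy data shaped like Zoutliers1 (chr, leftPos, Means)
set.seed(1)
Zoutliers1 <- data.frame(
  chr     = rep(1:3, times = c(30, 15, 5)),
  leftPos = c(seq(0, 3e6, length.out = 30),
              seq(0, 1.5e6, length.out = 15),
              seq(0, 5e5, length.out = 5)),
  Means   = runif(50, 0, 100)
)

# Maximum leftPos per chromosome; with space = "free_x" each panel's width
# follows its x range, which here runs from 0 to this maximum
max_pos <- tapply(Zoutliers1$leftPos, Zoutliers1$chr, max)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Zoutliers1, aes(x = leftPos, y = Means)) +
    geom_col(fill = "red") +
    # facet_grid (not facet_wrap) accepts the space argument here
    facet_grid(. ~ chr, scales = "free_x", space = "free_x")
  print(p)
}
```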
