I have a data frame (see below) and I would like to use summaryBy to get the min and max of Time (in seconds) for each CowID. But even though length(CowID) == length(Time), it doesn't work and I get this error:
Error in tapply(currVAR, rh.string.factor, function(x) { :
arguments must have same length
I wonder whether the structure of my data is the cause: the str output contains a lot that does not look useful, such as "Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 25852 obs. of 6 variables:" and everything that appears after $Time.
> summary(LMD60)
CowID Date DateHour
2140 : 727 Min. :2014-01-13 Min. :2014-01-13 14:33:05
2019 : 366 1st Qu.:2014-01-20 1st Qu.:2014-01-20 15:33:46
2228 : 366 Median :2014-01-28 Median :2014-01-28 14:48:52
2234 : 366 Mean :2014-01-27 Mean :2014-01-28 04:26:46
2235 : 366 3rd Qu.:2014-02-04 3rd Qu.:2014-02-04 15:57:25
2047 : 365 Max. :2014-02-12 Max. :2014-02-12 16:10:39
(Other):23296
Measure Feeding Time
Min. : 8.0 hoko :15857 Min. : 0.00
1st Qu.: 56.0 strap: 9995 1st Qu.:15.00
Median : 96.0 Median :30.00
Mean : 135.8 Mean :30.34
3rd Qu.: 168.0 3rd Qu.:45.00
Max. :1634.0 Max. :60.00
> str(LMD60)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 25852 obs. of 6 variables:
$ CowID : Factor w/ 71 levels "1921","1923",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Date : Date, format: "2014-01-27" "2014-01-27" ...
$ DateHour: POSIXct, format: "2014-01-27 15:16:35" "2014-01-27 15:16:36" ...
$ Measure : num 53 57 108 75 38 54 148 139 72 94 ...
$ Feeding : Factor w/ 2 levels "hoko","strap": 1 1 1 1 1 1 1 1 1 1 ...
$ Time : num 0 1 1 2 2 3 3 4 4 5 ...
- attr(*, "vars")=List of 2
..$ : symbol CowID
..$ : symbol Date
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 215
..$ : int 0 1 2 3 4 5 6 7 8 9 ...
..$ : int 121 122 123 124 125 126 127 128 129 130 ...
..$ : int 241 242 243 244 245 246 247 248 249 250 ...
..$ : int 362 363 364 365 366 367 368 369 370 371 ...
..$ : int 483 484 485 486 487 488 489 490 491 492 ...
..$ : int 604 605 606 607 608 609 610 611 612 613 ...
..$ : int 726 727 728 729 730 731 732 733 734 735 ...
..$ : int 847 848 849 850 851 852 853 854 855 856 ...
..$ : int 968 969 970 971 972 973 974 975 976 977 ...
..$ : int 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 ...
..$ : int 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 ...
..$ : int 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 ...
..$ : int 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 ...
..$ : int 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 ...
..$ : int 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 ...
..$ : int 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 ...
..$ : int 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 ...
..$ : int 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 ...
..$ : int 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 ...
..$ : int 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 ...
..$ : int 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 ...
..$ : int 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 ...
..$ : int 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 ...
..$ : int 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 ...
..$ : int 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 ...
..$ : int 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 ...
..$ : int 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 ...
..$ : int 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 ...
..$ : int 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 ...
..$ : int 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 ...
..$ : int 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 ...
..$ : int 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 ...
..$ : int 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 ...
..$ : int 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 ...
..$ : int 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 ...
..$ : int 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 ...
..$ : int 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 ...
..$ : int 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 ...
..$ : int 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 ...
..$ : int 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 ...
..$ : int 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 ...
..$ : int 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 ...
..$ : int 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 ...
..$ : int 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 ...
..$ : int 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 ...
..$ : int 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 ...
..$ : int 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 ...
..$ : int 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 ...
..$ : int 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 ...
..$ : int 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 ...
..$ : int 6025 6026 6027 6028 6029 6030 6031 6032 6033 6034 ...
..$ : int 6146 6147 6148 6149 6150 6151 6152 6153 6154 6155 ...
..$ : int 6267 6268 6269 6270 6271 6272 6273 6274 6275 6276 ...
..$ : int 6388 6389 6390 6391 6392 6393 6394 6395 6396 6397 ...
..$ : int 6492 6493 6494 6495 6496 6497 6498 6499 6500 6501 ...
..$ : int 6613 6614 6615 6616 6617 6618 6619 6620 6621 6622 ...
..$ : int 6734 6735 6736 6737 6738 6739 6740 6741 6742 6743 ...
..$ : int 6855 6856 6857 6858 6859 6860 6861 6862 6863 6864 ...
..$ : int 6977 6978 6979 6980 6981 6982 6983 6984 6985 6986 ...
..$ : int 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 ...
..$ : int 7218 7219 7220 7221 7222 7223 7224 7225 7226 7227 ...
..$ : int 7338 7339 7340 7341 7342 7343 7344 7345 7346 7347 ...
..$ : int 7460 7461 7462 7463 7464 7465 7466 7467 7468 7469 ...
..$ : int 7582 7583 7584 7585 7586 7587 7588 7589 7590 7591 ...
..$ : int 7703 7704 7705 7706 7707 7708 7709 7710 7711 7712 ...
..$ : int 7825 7826 7827 7828 7829 7830 7831 7832 7833 7834 ...
..$ : int 7947 7948 7949 7950 7951 7952 7953 7954 7955 7956 ...
..$ : int 8069 8070 8071 8072 8073 8074 8075 8076 8077 8078 ...
..$ : int 8190 8191 8192 8193 8194 8195 8196 8197 8198 8199 ...
..$ : int 8312 8313 8314 8315 8316 8317 8318 8319 8320 8321 ...
..$ : int 8433 8434 8435 8436 8437 8438 8439 8440 8441 8442 ...
..$ : int 8555 8556 8557 8558 8559 8560 8561 8562 8563 8564 ...
..$ : int 8673 8674 8675 8676 8677 8678 8679 8680 8681 8682 ...
..$ : int 8792 8793 8794 8795 8796 8797 8798 8799 8800 8801 ...
..$ : int 8914 8915 8916 8917 8918 8919 8920 8921 8922 8923 ...
..$ : int 9032 9033 9034 9035 9036 9037 9038 9039 9040 9041 ...
..$ : int 9151 9152 9153 9154 9155 9156 9157 9158 9159 9160 ...
..$ : int 9273 9274 9275 9276 9277 9278 9279 9280 9281 9282 ...
..$ : int 9393 9394 9395 9396 9397 9398 9399 9400 9401 9402 ...
..$ : int 9514 9515 9516 9517 9518 9519 9520 9521 9522 9523 ...
..$ : int 9635 9636 9637 9638 9639 9640 9641 9642 9643 9644 ...
..$ : int 9756 9757 9758 9759 9760 9761 9762 9763 9764 9765 ...
..$ : int 9877 9878 9879 9880 9881 9882 9883 9884 9885 9886 ...
..$ : int 9999 10000 10001 10002 10003 10004 10005 10006 10007 10008 ...
..$ : int 10121 10122 10123 10124 10125 10126 10127 10128 10129 10130 ...
..$ : int 10243 10244 10245 10246 10247 10248 10249 10250 10251 10252 ...
..$ : int 10364 10365 10366 10367 10368 10369 10370 10371 10372 10373 ...
..$ : int 10484 10485 10486 10487 10488 10489 10490 10491 10492 10493 ...
..$ : int 10601 10602 10603 10604 10605 10606 10607 10608 10609 10610 ...
..$ : int 10722 10723 10724 10725 10726 10727 10728 10729 10730 10731 ...
..$ : int 10839 10840 10841 10842 10843 10844 10845 10846 10847 10848 ...
..$ : int 10959 10960 10961 10962 10963 10964 10965 10966 10967 10968 ...
..$ : int 11078 11079 11080 11081 11082 11083 11084 11085 11086 11087 ...
..$ : int 11199 11200 11201 11202 11203 11204 11205 11206 11207 11208 ...
..$ : int 11320 11321 11322 11323 11324 11325 11326 11327 11328 11329 ...
..$ : int 11442 11443 11444 11445 11446 11447 11448 11449 11450 11451 ...
..$ : int 11563 11564 11565 11566 11567 11568 11569 11570 11571 11572 ...
..$ : int 11684 11685 11686 11687 11688 11689 11690 11691 11692 11693 ...
..$ : int 11801 11802 11803 11804 11805 11806 11807 11808 11809 11810 ...
.. [list output truncated]
- attr(*, "group_sizes")= int 121 120 121 121 121 122 121 121 121 121 ...
- attr(*, "biggest_group_size")= int 122
- attr(*, "labels")='data.frame': 215 obs. of 2 variables:
..$ CowID: Factor w/ 71 levels "1921","1923",..: 1 1 1 2 2 2 3 3 3 4 ...
..$ Date : Date, format: "2014-01-27" "2014-01-28" ...
..- attr(*, "vars")=List of 2
.. ..$ : symbol CowID
.. ..$ : symbol Date
..- attr(*, "drop")= logi TRUE
> summaryBy(LMD60$Time~LMD60$CowID, data=LMD60, FUN=list(min,max))
Error in tapply(currVAR, rh.string.factor, function(x) { :
arguments must have same length
You have to convert your data from a tibble to a plain data frame (data = as.data.frame(data)). summaryBy works only on plain data frames.
I know I'm terribly late to this thread, but I just ran into the same problem as OP, and hrbrmstr's recommendation of coercing it into a data frame worked perfectly. I'm posting in case anyone else comes across this thread: try df <- as.data.frame(df) first.
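A minimal sketch of that fix, using a small made-up data frame in place of LMD60 (the values are illustrative, not the asker's data). It assumes the formula is written with bare column names (Time ~ CowID) rather than LMD60$Time ~ LMD60$CowID, since data= already supplies the columns. The aggregate() call is a base R alternative that does not need doBy; the summaryBy() call runs only if doBy is installed.

```r
# Toy stand-in for LMD60 (values are made up for illustration)
LMD60 <- data.frame(
  CowID = factor(c("1921", "1921", "1923", "1923")),
  Time  = c(0, 60, 5, 45)
)

# If LMD60 is a grouped tibble, this drops the grouped_df/tbl_df classes
# and leaves a plain data.frame
LMD60_df <- as.data.frame(LMD60)

# Base R: min and max of Time for each CowID
res <- aggregate(Time ~ CowID, data = LMD60_df,
                 FUN = function(x) c(min = min(x), max = max(x)))
print(res)

# Same summary with doBy, using bare column names in the formula
if (requireNamespace("doBy", quietly = TRUE)) {
  print(doBy::summaryBy(Time ~ CowID, data = LMD60_df, FUN = c(min, max)))
}
```

In the aggregate() result, res$Time is a two-column matrix with columns min and max, one row per cow.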
Related
I have a problem with the summarise function in the "dplyr" package.
This is the code:
library("dplyr")
a <- read.csv("Number of subway passengers.csv",header = T, stringsAsFactor = F)
a <- a[,c(-2,-3,-4,-5)]
colnames(a)=c("Date","4-5","5-6","6-7","7-8","8-9","9-10","10-11","11-12","12-13","13-14","14-15","15-16",
"16-17","17-18","18-19","19-20","20-21","21-22","22-23","23-24","0-1","1-2","2-3","3-4","Total")
b <- summarise(a,mean_passenger=mean("Total",na.rm=TRUE))
After running the last line I get the following warning from summarise:
In mean.default("Total", na.rm = TRUE) : argument is not numeric or logical: returning NA
Why does this error occur?
I attach the result of using the function str.
> str(a)
'data.frame': 16501 obs. of 26 variables:
$ Date : chr "2019-11-01" "2019-11-01" "2019-11-01" "2019-11-01" ...
$ 4-5 : int 32 2 3 0 5 0 11 1 2 0 ...
$ 5-6 : int 438 353 89 182 143 211 187 127 83 175 ...
$ 6-7 : int 529 2019 152 852 161 1078 154 477 115 622 ...
$ 7-8 : int 1612 4520 289 2926 288 4395 302 1044 219 1817 ...
$ 8-9 : int 3405 9906 435 9348 482 13000 386 3662 366 5234 ...
$ 9-10 : int 2360 6525 481 4124 631 6669 550 3510 494 3292 ...
$ 10-11 : int 2377 3571 716 2064 768 2964 841 2593 843 2292 ...
$ 11-12 : int 2853 2951 1090 1889 1359 2501 1686 2813 1262 2349 ...
$ 12-13 : int 3334 3190 1073 1538 1531 2127 1781 2646 1583 2160 ...
$ 13-14 : int 3545 3348 1367 1751 1937 2108 2059 2718 1868 2159 ...
$ 14-15 : int 2850 3179 1782 1403 2466 1926 2405 2579 2303 2071 ...
$ 15-16 : int 4606 3265 2235 1431 2821 1718 3125 2103 2479 1559 ...
$ 16-17 : int 4915 3575 2345 1218 3403 1778 3241 2010 2656 1777 ...
$ 17-18 : int 7472 4191 3627 1249 5807 2396 3796 2033 3583 1599 ...
$ 18-19 : int 11107 5445 7462 1486 10738 3746 4836 2582 5246 1776 ...
$ 19-20 : int 5754 3882 2943 816 4680 2557 3192 1682 2709 1261 ...
$ 20-21 : int 3920 2596 2249 439 3670 935 2107 675 1782 548 ...
$ 21-22 : int 3799 2177 2199 288 4495 510 2452 512 1565 341 ...
$ 22-23 : int 3369 1624 1460 296 4118 384 2407 380 1094 260 ...
$ 23-24 : int 1678 912 640 202 2366 299 1394 323 596 153 ...
$ 0-1 : int 228 478 62 47 271 75 236 143 66 73 ...
$ 1-2 : int 2 39 0 1 1 0 6 10 1 1 ...
$ 2-3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ 3-4 : int 0 0 0 0 0 0 0 0 0 0 ...
$ Total : int 70185 67748 32699 33550 52141 51377 37154 34623 30915 31519 ...
"Total" is interpreted as a string. We can reproduce the same error with
mean("Total")
#[1] NA
Warning message:
In mean.default("Total") : argument is not numeric or logical: returning NA
We need to use Total without quotes so that it is interpreted as a column:
b <- dplyr::summarise(a, mean_passenger = mean(Total,na.rm=TRUE))
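As a small sketch of the same point, the example below uses a made-up three-row data frame in place of the subway data. It also shows two related cases that are not in the original answer: column names such as 4-5 are not valid R names and need backticks, and a column name stored in a string can be looked up through dplyr's .data pronoun.

```r
library(dplyr)

# Toy stand-in for the subway data (values are made up)
a <- data.frame(
  Date  = c("2019-11-01", "2019-11-01", "2019-11-02"),
  `4-5` = c(32L, 2L, NA),
  Total = c(10L, 20L, 30L),
  check.names = FALSE  # keep "4-5" as the column name
)

# Unquoted name: refers to the column, not to the string "Total"
b <- summarise(a, mean_passenger = mean(Total, na.rm = TRUE))

# Non-syntactic names such as 4-5 need backticks, not quotes
b2 <- summarise(a, mean_4_5 = mean(`4-5`, na.rm = TRUE))

# When the column name is held in a string, index .data with it
col <- "Total"
b3 <- summarise(a, mean_passenger = mean(.data[[col]], na.rm = TRUE))
```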
I have a data frame with SAT scores for all states in the US.
'data.frame': 51 obs. of 7 variables:
$ X2010.rank : int 1 2 3 4 5 6 7 8 9 10 ...
$ state : chr "Iowa " "Minnesota " "Wisconsin " "Missouri " ...
$ reading : int 603 594 595 593 585 592 585 590 585 580 ...
$ math : int 613 607 604 595 605 603 600 595 593 594 ...
$ writing : int 582 580 579 580 576 571 577 567 568 559 ...
$ combined : int 1798 1781 1778 1768 1766 1766 1762 1752 1746 1733 ...
$ participation: chr "3%" "7%" "4%" "4%" ...
I need to find the index of a particular state. I tried the which command, but it returns integer(0):
> which(sat$state=="California")
integer(0)
However, the same command works on another column and returns the index:
> which(sat$combined==1781)
[1] 2
Where am I going wrong? Please help.
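One thing visible in the str output above is that the state values carry a trailing space ("Iowa ", "Minnesota "), so an exact comparison with "California" cannot match. The sketch below assumes that this is the cause and uses a made-up three-row stand-in for sat (the California score is invented); trimws() is base R.

```r
# Toy stand-in for sat; note the trailing space in each state name
sat <- data.frame(
  state    = c("Iowa ", "Minnesota ", "California "),
  combined = c(1798L, 1781L, 1517L),  # last value is made up
  stringsAsFactors = FALSE
)

# Exact match fails because "California " != "California"
no_match <- which(sat$state == "California")

# Strip leading/trailing whitespace before comparing
idx <- which(trimws(sat$state) == "California")

# Or clean the column once so later comparisons work as expected
sat$state <- trimws(sat$state)
```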
I'm running a straightforward linear regression model fit on the following dataframe:
> str(model_data_rev)
'data.frame': 128857 obs. of 12 variables:
$ ENTRY_4 : num 186 218 208 235 256 447 471 191 207 250 ...
$ ENTRY_8 : num 724 769 791 777 707 237 236 726 773 773 ...
$ ENTRY_12: num 2853 2989 3174 3027 3028 ...
$ ENTRY_16: num 2858 3028 3075 2992 3419 ...
$ ENTRY_20: num 7260 7188 7587 7560 7165 ...
$ EXIT_4 : num 70 82 105 114 118 204 202 99 73 95 ...
$ EXIT_8 : num 1501 1631 1594 1576 1536 ...
$ EXIT_12 : num 3862 3923 4158 3970 3895 ...
$ EXIT_16 : num 1559 1539 1737 1681 1795 ...
$ EXIT_20 : num 2145 2310 2217 2330 2291 ...
$ DAY : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 3 4 5 6 7 1 2 3 4 ...
$ MONTH : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 3 3 3 3 3 3 3 3 3 3 ...
I split the data into training and test sets as follows, using the caret package:
split<-createDataPartition(y = model_data_rev$EXIT_20, p = 0.7, list = FALSE)
d_training = model_data_rev[split,]
d_test = model_data_rev[-split,]
I train the model using the train function in the caret package:
ctrl<-trainControl(method = 'cv',number = 5)
lmCVFit<-train(EXIT_20 ~ ., data = d_training, method = 'lm', trControl = ctrl, metric='Rsquared')
summary(lmCVFit)
When I run summary(lmCVFit) I get the following error:
Error in summary.lm(object$finalModel, ...) :
length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In cbind(est, se, tval, 2 * pt(abs(tval), rdf, lower.tail = FALSE)) :
number of rows of result is not a multiple of vector length (arg 1)
I thought it might be related to my initial data frame above. Specifically, I thought it could have to do with the factor variables. So I removed them (not shown), ran everything again, and got the same error.
I also ran the regression without CV using the 'lm' function in R and got the same error when I ran summary().
Has anyone seen this, and can anyone help? I can't find anything online that speaks to this error in the context of regression.
Thanks in advance.
EDIT
I changed the ordinal variables to standard (unordered) factors. The structure now looks like this:
> str(model_data_rev)
'data.frame': 128857 obs. of 12 variables:
$ ENTRY_4 : num 186 218 208 235 256 447 471 191 207 250 ...
$ ENTRY_8 : num 724 769 791 777 707 237 236 726 773 773 ...
$ ENTRY_12: num 2853 2989 3174 3027 3028 ...
$ ENTRY_16: num 2858 3028 3075 2992 3419 ...
$ ENTRY_20: num 7260 7188 7587 7560 7165 ...
$ EXIT_4 : num 70 82 105 114 118 204 202 99 73 95 ...
$ EXIT_8 : num 1501 1631 1594 1576 1536 ...
$ EXIT_12 : num 3862 3923 4158 3970 3895 ...
$ EXIT_16 : num 1559 1539 1737 1681 1795 ...
$ EXIT_20 : num 2145 2310 2217 2330 2291 ...
$ DAY : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 4 2 6 7 ...
$ MONTH : Factor w/ 12 levels "April","August",..: 8 8 8 8 8 8 8 8 8 8 ...
I still get the error when running summary after fitting the model.
It is also important to emphasize that the model fitting itself works without throwing an error. It is summary() that throws the error.
Thanks.
I have a load of genomic data (dput way too large)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 7454 obs. of 3 variables:
$ chr : num 1 1 1 1 1 1 1 1 1 1 ...
$ leftPos: num 480000 600000 2520000 2760000 2880000 3000000 3120000 3480000 3600000 4440000 ...
$ Means : num 45.2 58.3 10.7 81.2 16 ...
- attr(*, "vars")=List of 1
..$ : symbol chr
- attr(*, "labels")='data.frame': 22 obs. of 1 variable:
..$ chr: Factor w/ 24 levels "chr1","chr10",..: 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "vars")=List of 1
.. ..$ : symbol chr
- attr(*, "indices")=List of 22
..$ : int 0 1 2 3 4 5 6 7 8 9 ...
..$ : int 559 560 561 562 563 564 565 566 567 568 ...
..$ : int 908 909 910 911 912 913 914 915 916 917 ...
..$ : int 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 ...
..$ : int 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 ...
..$ : int 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 ...
..$ : int 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 ...
..$ : int 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 ...
..$ : int 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 ...
..$ : int 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 ...
..$ : int 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 ...
..$ : int 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 ...
..$ : int 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 ...
..$ : int 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 ...
..$ : int 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 ...
..$ : int 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 ...
..$ : int 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 ...
..$ : int 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 ...
..$ : int 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 ...
..$ : int 6251 6252 6253 6254 6255 6256 6257 6258 6259 6260 ...
..$ : int 6683 6684 6685 6686 6687 6688 6689 6690 6691 6692 ...
..$ : int 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 ...
- attr(*, "group_sizes")= int 559 349 383 370 283 229 177 173 140 222 ...
- attr(*, "biggest_group_size")= int 682
I would like to plot this as a facet plot but limit the x-axis of each facet to the maximum leftPos for that chr. At the moment each chr facet is plotted with equal width. When I use scales="free_x", the facet just stretches the plot to fill a pre-defined width. Is it possible to have facets of different widths?
The code I'm using so far:
ggplot(Zoutliers1,aes(x = leftPos,
y = as.numeric(Means),
group = chr,
xend = leftPos,
yend=0))+
geom_bar(stat="identity",fill = "red", size = 1, colour = "red")+
geom_line()+
geom_segment(linetype= 1, colour = "#919191")+
ggtitle(TBBName)+
ylim(-50,480)+
facet_wrap(~ chr,nrow = 1)+
geom_hline(yintercept = UL1)+
geom_hline(yintercept = LL1)+
theme(panel.margin = unit(0.1, "lines"))+
theme(axis.text.x = element_blank())+
theme(panel.border = element_rect(fill=NA,color="darkred", size=0.5,
linetype="dashed"))
The plot I'm getting:
You need to set the 'space' parameter to 'free', which can only be done in facet_grid. Here is a demonstration with sample data.
library(gridExtra)
library(ggplot2)
#creating some sample data
set.seed(10001)
dat <-data.frame(chr=1:6,leftPos=seq(100,1000,length.out=6))
dat2 <- dat[sample(1:nrow(dat),1000,T),]
dat2$x <- rnorm(nrow(dat2))*dat2$chr
#basic plot
p1 <- ggplot(dat2, aes(x=x)) +
geom_histogram()
#different scales
p_scales <- p1 + facet_grid(.~chr, scales="free_x") + labs(title="free x, default space")
p_space_scales <- p1 + facet_grid(.~chr, scales = "free_x",space="free") + labs(title="free x and free space")
grid.arrange(p_scales,p_space_scales)
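Applied to the variable names from the question, the idea might look like the sketch below. The data are a made-up stand-in for Zoutliers1, and geom_col() replaces the original geom_bar(stat="identity") layers for brevity, so this is an illustration of the facet_grid arguments rather than a drop-in replacement for the asker's plot.

```r
library(ggplot2)

# Toy stand-in for Zoutliers1 (made-up values):
# chr 1 covers a wider leftPos range than chr 2
set.seed(1)
Zoutliers1 <- data.frame(
  chr     = rep(c(1, 2), times = c(20, 8)),
  leftPos = c(seq(0, 19) * 120000, seq(0, 7) * 120000),
  Means   = runif(28, 0, 100)
)

p <- ggplot(Zoutliers1, aes(x = leftPos, y = Means)) +
  geom_col(fill = "red") +
  # scales = "free_x": each panel gets its own x range
  # space  = "free_x": panel width is proportional to that range
  facet_grid(. ~ chr, scales = "free_x", space = "free_x")

print(p)
```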
I need to convert my data frame into a numeric matrix. However, when I use the data.matrix function, the decimals get converted to different numbers and I have no idea why. Can someone fill me in on what's happening?
> head(x[,1:5])
TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2 -0.840833333333333 0.018 0.354916666666667 -0.203750 0.6890000
CREB3L1 1.333 0.7625 0.13475 2.498750 1.1572500
RPS11 1.4755 0.3245 0.634 0.483125 0.9526250
PNMA1 -1.39075 -1.48725 -0.8305 -0.463250 -2.2230000
MMP2 0.0278333333333333 -0.2065 0.0666666666666666 2.156000 0.1501667
C10orf90 -2.5495 -2.76575 -2.76375 -2.482250 -2.1107500
> head(data.matrix(x[,1:5]))
TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2 3323 94 1701 -0.203750 0.6890000
CREB3L1 4307 3022 654 2.498750 1.1572500
RPS11 4485 1458 2786 0.483125 0.9526250
PNMA1 4379 4438 3397 -0.463250 -2.2230000
MMP2 155 932 328 2.156000 0.1501667
C10orf90 5139 5193 5230 -2.482250 -2.1107500
> class(x)
[1] "data.frame"
> str(x)
'data.frame': 6150 obs. of 174 variables:
$ TCGA-AA-3520-01A-01R-0821-07: Factor w/ 5538 levels "","0","0.000166666666666662",..: 3323 4307 4485 4379 155 5139 4177 1400 4735 3363 ...
$ TCGA-AA-3532-01A-01R-0821-07: Factor w/ 5597 levels "","0.000499999999999968",..: 94 3022 1458 4438 932 5193 1374 2757 4671 2503 ...
$ TCGA-AA-3553-01A-01R-0821-07: Factor w/ 5550 levels "","0.000249999999999995",..: 1701 654 2786 3397 328 5230 65 194 4900 3966 ...
$ TCGA-A6-2674-01A-02R-0821-07: num -0.204 2.499 0.483 -0.463 2.156 ...
$ TCGA-AA-3521-01A-01R-0821-07: num 0.689 1.157 0.953 -2.223 0.15 ...
$ TCGA-AA-3534-01A-01R-0821-07: num -0.6789 -0.0877 1.5736 -1.6678 -0.7148 ...
$ TCGA-AA-3555-01A-01R-0821-07: Factor w/ 5580 levels "","-0.00012499999999999",..: 373 4970 2076 519 1344 5084 3882 1285 4760 2778 ...
$ TCGA-A6-2670-01A-02R-0821-07: num 0.588 0.569 0.808 -1.661 1.073 ...
$ TCGA-A6-2683-01A-01R-0821-07: num -0.77 0.741 1.564 -2.984 -1.569 ...
$ TCGA-AA-3526-01A-02R-0821-07: num -0.824 2.215 0.819 -1.846 -0.862 ...
$ TCGA-A6-2677-01A-01R-0821-07: num -0.733 0.526 0.892 -1.598 -1.69 ...
$ TCGA-AA-3522-01A-01R-0821-07: num -0.981 2.094 0.818 -1.048 -1.452 ...
$ TCGA-AA-3538-01A-01R-0821-07: num -0.144 0.631 0.794 -1.523 -0.198 ...
$ TCGA-AA-3556-01A-01R-0821-07: Factor w/ 5556 levels "","-0.000125000000000014",..: 2256 4772 3446 4253 4040 4927 3026 316 3766 3221 ...
$ TCGA-A6-2678-01A-01R-0821-07: num -1.38 1.706 1.103 -2.725 -0.918 ...
$ TCGA-AA-3524-01A-02R-0821-07: Factor w/ 5611 levels "","-0.0005","0.000500000000000006",..: 4062 3671 4749 4751 4051 5226 2623 1227 4252 1489 ...
$ TCGA-AA-3542-01A-02R-0821-07: num -1.195 0.641 1.952 -1.63 -1.264 ...
$ TCGA-AA-3558-01A-01R-0821-07: Factor w/ 5580 levels "","0.000375000000000007",..: 4245 3920 4277 4910 4766 5126 1450 3350 4898 1915 ...
$ TCGA-AA-3544-01A-01R-0821-07: num -0.157 0.649 0.937 -1.941 -1.417 ...
$ TCGA-AA-3560-01A-01R-0821-07: num -0.146 0.554 0.581 -2.503 -0.438 ...
$ TCGA-AA-3514-01A-02R-0821-07: Factor w/ 5678 levels "","0","0.000375000000000028",..: 3800 2056 2422 1158 1507 4620 3564 1877 5480 4076 ...
$ TCGA-AA-3527-01A-01R-0821-07: num -0.3973 -0.0915 1.4019 -2.5513 -0.395 ...
$ TCGA-AA-3548-01A-01R-0821-07: Factor w/ 5470 levels "","0.000100000000000011",..: 2590 3817 3388 4531 2770 4922 2715 406 4473 2711 ...
$ TCGA-AA-3561-01A-01R-0821-07: num -1.115 1.01 1.266 -1.419 -0.537 ...
$ TCGA-AA-3517-01A-01R-0821-07: Factor w/ 5604 levels "","-0.000333333333333335",..: 479 1182 4514 5003 4005 4799 1499 4796 849 3079 ...
$ TCGA-AA-3529-01A-02R-0821-07: Factor w/ 5583 levels "","-0.000124999999999978",..: 2912 3970 4073 4555 4257 5238 3242 2668 899 3508 ...
$ TCGA-AA-3549-01A-02R-0821-07: Factor w/ 5538 levels "","0.000166666666666671",..: 1378 4762 4356 4857 519 4739 1254 4777 350 444 ...
$ TCGA-AA-3562-01A-02R-0821-07: Factor w/ 5628 levels "","0","0.000249999999999993",..: 2453 3556 3523 4987 2236 5148 1681 1854 2249 4096 ...
The data.matrix() function converts factors to numbers by using their internal codes. That's why they're listed as factors in the data frame and have different values after using data.matrix(). To create a numeric matrix in this situation, try this:
y <- apply(as.matrix(x[, 1:5]), 2, as.numeric)
When using as.matrix(), factors become strings. Using apply() will convert everything to numeric without losing the matrix structure.
As Stephen Henderson mentioned in his comment, it's a good idea to try to figure out why the numeric values stored in your data frame are being treated as factors.
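The difference between a factor's internal codes and its labels can be seen on a small made-up data frame. This sketch converts each column through as.character() first, which is a column-by-column variant of the approach above rather than the answer's exact code; the empty string level (also present in the str output above) becomes NA.

```r
# Toy data frame: column s1 was read in as a factor (made-up values)
x <- data.frame(
  s1 = factor(c("-0.84", "1.333", "", "0.0278")),
  s2 = c(0.689, 1.157, 0.953, 0.150)
)

# data.matrix() returns the factor's internal integer codes for s1,
# not the numbers shown in its labels
codes <- data.matrix(x)

# Going through character first recovers the numeric values;
# the empty string "" becomes NA
y <- sapply(x, function(col) as.numeric(as.character(col)))

print(codes)
print(y)
```

A common reason for such columns is that non-numeric entries (here, empty strings) were present when the file was read, so checking the levels of one affected column with levels() is a quick way to see what was not parsed as a number.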