create matrix from raw data - r
My data looks like this:
> head(data, 20)
# A tibble: 20 x 2
hosp zip
<chr> <chr>
1 010001 14843
2 010001 36303
3 010016 13320
4 010021 10468
5 010023 36040
6 010023 36116
7 010023 36116
8 010023 36116
9 010024 36401
10 010029 10025
11 010029 11412
12 010029 11733
13 010033 14086
14 010033 14701
15 010033 35244
16 010034 12308
17 010038 11413
18 010039 10011
19 010039 11704
20 010039 35749
hosp is the hospital ID and zip is the zip code. Patients in each hospital came from multiple zip codes. How can I create a matrix showing, for each hospital, how many patients came from each zip code?
Ideal matrix would be like this:
zip 010001 010016 010021 ... hosp
14843 1 0 0
36303 1 0 0
13320 0 1 0
10468 0 0 1
Thanks!!
As was stated in the comments, you can use table(). Wrapping the result in t() puts the zip codes on the left:
t(as.matrix(table(data)))
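A quick way to see what that call produces, on a small made-up subset shaped like the data above (the values are illustrative, not the asker's full data):

```r
# Made-up subset of the hosp/zip data from the question:
data <- data.frame(
  hosp = c("010001", "010001", "010016", "010023", "010023"),
  zip  = c("14843", "36303", "13320", "36116", "36116"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates column 1 (hosp) against column 2 (zip);
# t() flips the result so zip codes label the rows, as in the desired output.
m <- t(as.matrix(table(data)))
m["36116", "010023"]  # patients from zip 36116 seen at hospital 010023 -> 2
```

Each cell is a count, so repeated (hosp, zip) rows accumulate automatically.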
R saccades analysis
I have eye-tracking gaze data in the form of x/y coordinates and timestamps, and I want to plot the saccades using the R package saccades. Unfortunately, it doesn't work. I guess it's a matter of having the data in the wrong format. My data:

> View(EUFKDCDL_Q09AS_saccades_2)
> head(EUFKDCDL_Q09AS_saccades)
# A tibble: 6 x 4
time x y trial
<dbl> <dbl> <dbl> <dbl>
1 1550093577941 732 391 1
2 1550093577962 706 320 1
3 1550093577980 666 352 1
4 1550093578000 886 288 1
5 1550093578017 787 221 1
6 1550093578037 729 302 1

The code that didn't work:

> fixations <- detect.fixations(EUFKDCDL_Q09AS_saccades)
Error in detect.fixations(EUFKDCDL_Q09AS_saccades) :
  No saccades were detected. Something went wrong.

The full code that should work according to GitHub (it's with the sample data):

> library(saccades)
> data(samples)
> head(samples)
time x y trial
1 0 53.18 375.73 1
2 4 53.20 375.79 1
3 8 53.35 376.14 1
4 12 53.92 376.39 1
5 16 54.14 376.52 1
6 20 54.46 376.74 1
> fixations <- detect.fixations(samples)
> head(fixations[c(1,4,5,10)])
trial x y dur
0 1 53.81296 377.40741 71
1 1 39.68156 379.58711 184
2 1 59.99267 379.92467 79
3 1 18.97898 56.94046 147
4 1 40.28365 39.03599 980
5 1 47.36547 35.39441 1310
> diagnostic.plot(samples, fixations)

So there must be a problem with how my data is structured, I guess? What does the error mean? I hope that one of you can help me create this saccade plot as in the screenshot attached. I am an R newbie as well... please be patient with me. :D
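One difference visible between the two listings: samples$time counts milliseconds from 0, while the time column here holds epoch-style timestamps around 1.55e12. A sketch (a guess at the format mismatch, not a confirmed fix) that shifts each recording to start at 0, using a stand-in for the asker's tibble:

```r
# Stand-in for EUFKDCDL_Q09AS_saccades (time, x, y, trial), first three rows:
dat <- data.frame(time  = c(1550093577941, 1550093577962, 1550093577980),
                  x     = c(732, 706, 666),
                  y     = c(391, 320, 352),
                  trial = c(1, 1, 1))

# Rebase the timestamps so they count milliseconds from the recording start,
# matching the shape of data(samples) in the saccades package.
dat$time <- dat$time - min(dat$time)
dat$time  # 0 21 39
```

Whether detect.fixations() then finds saccades still depends on the sampling rate and noise of the data; this only removes the obvious structural difference.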
Check and count conditions for following value
I have a dataframe with 18 rows and 25 variables. The values are between 0 and 1. For each row, I want to count the number of times a high value (> 0.7) is immediately followed by a low value (< 0.4), and store that count in a new column. So far I have been using:

df$n_calls <- rowSums(df > 0.7)

I know it is possible to use different conditions, but in my case it is very important to check that the low value comes right after the high value. Here is an example of my df:

   1 2 3 4 5 6 7 8 9 10 11
1  0.186158072 0.27738592 0.42165043 0.43501515 0.10918095 0.09976244 0.09571536 0.08674526 0.09239877 0.07523392 0.043679510
2  0.773469188 0.75381254 0.20389633 0.46444408 0.30433377 0.68334244 0.42105103 0.66224478 0.32412056 0.30951402 0.616658953
3  0.201245200 0.26873094 0.25892904 0.38605874 0.68438397 0.30236790 0.51493090 0.66314468 0.68910974 0.59134860 0.625550641
4  0.033746517 0.06388212 0.06978669 0.05517553 0.06032239 0.06736223 0.06514233 0.05133860 0.06034266 0.05702451 0.011144861
5  0.590297759 0.40352955 0.08106493 0.06063485 0.07780428 0.09633069 0.10882515 0.11468680 0.28375374 0.63941033 0.629284574
6  0.165001648 0.31174739 0.36955514 0.47581249 0.65349233 0.66471913 0.58004314 0.50790858 0.51298260 0.18651107 0.501195655
7  0.033164989 0.05678890 0.05941058 0.04139692 0.04660761 0.05452679 0.04939543 0.02780824 0.03680599 0.04645522 0.018496662
8  0.080893779 0.07228276 0.07473865 0.05536056 0.05732153 0.06403365 0.06139970 0.05142047 0.05698089 0.06998986 0.032598440
9  0.557273680 0.49226191 0.63900601 0.37497255 0.72114277 0.37557355 0.34360391 0.37502000 0.41622472 0.46852220 0.410656260
10 -0.004010143 0.03051558 0.04403711 0.02749514 0.04770637 0.05800898 0.05603494 0.04163723 0.04622024 0.04677767 0.007736933
11 0.280273472 0.59839662 0.74167893 0.75352655 0.75108785 0.72345468 0.65395063 0.32957749 0.08357061 0.33165070 0.731228429
12 0.107398713 0.10983041 0.13630594 0.19905651 0.47014034 0.72519345 0.69545405 0.62194265 0.49873996 0.16549282 0.087689371
13 0.164520925 0.22763832 0.50824238 0.59686660 0.68419908 0.66837348 0.62380175 0.20226234 0.11425066 0.09725765 0.078701134
14 0.076934267 0.09684586 0.10703672 0.08436558 0.10789735 0.24130640 0.36615645 0.42805115 0.42937392 0.51390288 0.584757257
15 0.055565174 0.06796064 0.07519020 0.05498454 0.05754891 0.06377643 0.06537049 0.05152625 0.05783594 0.05963775 0.022556411
16 0.126975964 0.19394191 0.53324900 0.60905758 0.67072084 0.61613836 0.55415573 0.18317823 0.13453799 0.09835233 0.067080267
17 0.730333357 0.65759923 0.59045925 0.63148539 0.36305458 0.40829673 0.48734552 0.58647457 0.66968986 0.48312152 0.453863785
18 0.196450179 0.33968393 0.51538678 0.44868341 0.22221050 0.18934329 0.19179838 0.18764290 0.22423578 0.27524872 0.608625015
   12 13 14 15 16 17 18 19 20 21 22
1  0.038553121 0.040081485 0.05358118 0.07403555 0.05091901 0.042299806 0.04322122 0.05587749 0.06881493 0.09753878 0.10462942
2  0.618447812 0.048885425 0.06231155 0.08228801 0.05963307 0.022666894 0.09384802 0.07914030 0.08549148 0.08373159 0.07404309
3  0.179434300 0.679981042 0.69176338 0.74453573 0.70937271 0.289762839 0.17956945 0.68770664 0.73864122 0.73187173 0.34604987
4  0.005094105 0.007952117 0.02076629 0.04174891 0.02129751 0.010066515 0.01454399 0.04337116 0.05259742 0.05795045 0.04533231
5  0.554122074 0.322792638 0.21839661 0.18322419 0.05764354 0.041600287 0.04692187 0.04305403 0.05762126 0.06212474 0.05289008
6  0.719147265 0.481543275 0.20168371 0.19885731 0.27223662 0.587549079 0.66694312 0.76974309 0.45266122 0.23338301 0.09435850
7  0.019041585 0.005380972 0.01856521 0.03947278 0.01221314 0.004858193 0.01322566 0.02001854 0.02755861 0.03889634 0.03102918
8  0.031368415 0.024535386 0.04031225 0.06011198 0.03558484 0.027890723 0.04100022 0.04572906 0.05465957 0.06437218 0.06308497
9  0.290487995 0.109253389 0.09076971 0.11177720 0.08365271 0.074780381 0.07845467 0.08843678 0.12696256 0.15252180 0.16108674
10 0.004599971 0.004843833 0.02327683 0.05022203 0.02867540 0.013674600 0.02376855 0.03408261 0.04563785 0.04991278 0.04216682
11 0.702763718 0.204497547 0.05554607 0.07056242 0.04561622 0.027652748 0.05185238 0.03544719 0.04735368 0.05194280 0.05193089
12 0.087884047 0.068055513 0.07587232 0.09912338 0.09637278 0.085378227 0.09348430 0.09237792 0.10785289 0.22242136 0.28522539
13 0.050134608 0.060945434 0.07203437 0.09687331 0.07316602 0.067771770 0.07634787 0.08154630 0.09157153 0.08930093 0.09904561
14 0.255098748 0.323642069 0.34568802 0.42105224 0.41797424 0.434900416 0.39764147 0.30798058 0.31269146 0.42912436 0.52562571
15 0.015262751 0.027712972 0.03813722 0.07103989 0.05202094 0.040513502 0.04066496 0.23360454 0.34666910 0.62701471 0.61683636
16 0.052436966 0.080045644 0.11447572 0.10672800 0.07924541 0.064626998 0.07234429 0.06744468 0.07878329 0.08901864 0.07953835
17 0.422132751 0.127518376 0.13062324 0.15104667 0.12490013 0.110841862 0.10892834 0.07984952 0.09097741 0.15193027 0.18654107
18 0.662904286 0.247251060 0.20583902 0.32290931 0.47391488 0.574805088 0.64776018 0.73091902 0.27798841 0.35922799 0.36333131
   23 24 n_calls
1  0.23100480 0.30027592 0
2  0.07209460 0.06670631 1
3  0.30800154 0.27452357 2
4  0.04148986 0.03842700 0
5  0.05362370 0.05018294 0
6  0.08703911 0.08242964 0
7  0.03186000 0.03233006 0
8  0.05789078 0.05637648 0
9  0.25593446 0.29909342 1
10 0.03615961 0.03356159 0
11 0.05754763 0.06368048 1
12 0.45794999 0.56138753 0
13 0.16676533 0.22718405 0
14 0.63646856 0.29169414 0
15 0.64039251 0.60901138 0
16 0.08805636 0.09688941 0
17 0.36883747 0.41561690 1
18 0.37085132 0.36292634

Any idea how to proceed?
We can use rowSums on two subsets of the dataset: dropping the last column from one copy and the first column from the other keeps the dimensions the same, so each column is compared with the column immediately after it:

rowSums(df[-length(df)] > 0.7 & df[-1] < 0.4)
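To see the alignment trick on a toy example (made-up numbers; the column names a/b/c are mine, not from the question):

```r
# Two rows, three columns of made-up values:
df <- data.frame(a = c(0.8, 0.2), b = c(0.3, 0.9), c = c(0.75, 0.35))

# df[-length(df)] drops the last column (leaving a, b) and df[-1] drops the
# first (leaving b, c), so column i is lined up against column i + 1.
n_calls <- rowSums(df[-length(df)] > 0.7 & df[-1] < 0.4)
n_calls
# Row 1: a = 0.8 > 0.7 is followed by b = 0.3 < 0.4 -> one hit
#        (c = 0.75 > 0.7 has no successor, so it cannot count).
# Row 2: b = 0.9 > 0.7 is followed by c = 0.35 < 0.4 -> one hit.
```

Note that if df already contains the n_calls column, it should be excluded from both subsets before applying this.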
Extracting data from dataframe using different dataframe without headers (R)
I have gridded data as a data frame of daily temperatures (in K) for 30 years. I need to extract the data for the days that match another data frame, keeping the first and second columns (lon and lat). Data example: the gridded data, from which I need to remove the days that do not match the days in the second data frame (df2$Dates):

>head(Daily.df)
  lon lat 1991-05-01 1991-05-02 1991-05-03 1991-05-04 1991-05-05 1991-05-06 1991-05-07 1991-05-08 1991-05-09
1 5.000 60 278.2488 280.1225 280.3909 279.4138 276.6809 276.2085 276.6250 277.7930 276.9693
2 5.125 60 278.2514 280.1049 280.3789 279.4395 276.7141 276.2467 276.6571 277.8264 277.0225
3 5.250 60 278.2529 280.0871 280.3648 279.4634 276.7437 276.2849 276.6918 277.8608 277.0740
4 5.375 60 278.2537 280.0687 280.3488 279.4858 276.7691 276.3238 276.7289 277.8960 277.1232
5 5.500 60 278.2537 280.0493 280.3319 279.5066 276.7909 276.3633 276.7688 277.9313 277.1701
6 5.625 60 278.2539 280.0294 280.3143 279.5264 276.8090 276.4042 276.8111 277.9666 277.2147
  1991-05-10 1991-05-11 1991-05-12 1991-05-13 1991-05-14 1991-05-15 1991-05-16 1991-05-17 1991-05-18 1991-05-19
1 276.9616 277.3436 273.3149 274.4931 274.6967 275.6298 272.2511 271.5413 271.7289 271.7964
2 276.9689 277.2988 273.3689 274.5399 274.6801 275.6307 272.2214 271.4445 271.6410 271.7023
3 276.9720 277.2533 273.4225 274.5811 274.6646 275.6241 272.1858 271.3391 271.5424 271.5989
4 276.9716 277.2080 273.4726 274.6146 274.6507 275.6109 272.1456 271.2274 271.4340 271.4872
5 276.9689 277.1632 273.5163 274.6382 274.6380 275.5917 272.1022 271.1121 271.3168 271.3693
6 276.9645 277.1190 273.5507 274.6501 274.6263 275.5672 272.0571 270.9955 271.1919 271.2469
  1991-05-20 1991-05-21 1991-05-22 1991-05-23 1991-05-24 1991-05-25 1991-05-26 1991-05-27 1991-05-28 1991-05-29
1 272.2633 268.0039 268.5981 269.4139 267.7836 265.8771 263.5669 266.1666 269.7285 272.5083
2 272.2543 268.0218 268.5847 269.4107 267.7886 265.8743 263.5125 266.1031 269.6471 272.4676
3 272.2434 268.0369 268.5716 269.4089 267.7910 265.8669 263.4592 266.0332 269.5697 272.4217
4 272.2308 268.0507 268.5597 269.4090 267.7925 265.8559 263.4066 265.9581 269.4987 272.3714
5 272.2164 268.0642 268.5505 269.4112 267.7936 265.8425 263.3546 265.8797 269.4355 272.3175
6 272.2005 268.0793 268.5451 269.4154 267.7962 265.8276 263.3039 265.7997 269.3818 272.2614
  1991-05-30 1991-05-31 1991-06-01 1991-06-02 1991-06-03 1991-06-04 1991-06-05 1991-06-06 1991-06-07 1991-06-08
1 274.2950 273.4715 274.5197 274.7548 273.8259 272.4433 274.1811 274.4135 274.3999 276.0327
2 274.2205 273.4638 274.5292 274.8316 273.8658 272.4700 274.1992 274.4426 274.4650 276.0698
3 274.1421 273.4549 274.5373 274.9027 273.9028 272.4980 274.2160 274.4781 274.5309 276.1012
4 274.0609 273.4452 274.5438 274.9665 273.9365 272.5273 274.2322 274.5211 274.5969 276.1255
5 273.9784 273.4353 274.5482 275.0216 273.9660 272.5576 274.2481 274.5725 274.6617 276.1417
6 273.8960 273.4253 274.5508 275.0668 273.9912 272.5887 274.2649 274.6334 274.7239 276.1487
  1991-06-09 1991-06-10 1991-06-11 1991-06-12 1991-06-13 1991-06-14 1991-06-15 1991-06-16 1991-06-17 1991-06-18
1 276.5216 277.1812 277.8093 278.3013 278.5323 278.5403 277.9563 278.3461 275.8296 273.8277
2 276.5531 277.1925 277.8261 278.3409 278.4956 278.5317 277.9148 278.3234 275.8167 273.8302
3 276.5861 277.2065 277.8457 278.3748 278.4503 278.5181 277.8654 278.2939 275.8057 273.8358
4 276.6204 277.2239 277.8684 278.4029 278.3988 278.4996 277.8080 278.2583 275.7966 273.8427
5 276.6564 277.2466 277.8945 278.4253 278.3423 278.4759 277.7429 278.2171 275.7888 273.8504
6 276.6938 277.2753 277.9242 278.4414 278.2834 278.4472 277.6715 278.1714 275.7819 273.8570
  1991-06-19 1991-06-20 1991-06-21 1991-06-22 1991-06-23 1991-06-24 1991-06-25 1991-06-26 1991-06-27 1991-06-28
1 275.1738 274.6805 275.6100 274.8936 273.5818 273.2099 273.1788 271.2747 273.2458 276.9931
2 275.1808 274.7123 275.7043 274.9494 273.5861 273.1770 273.2280 271.2435 273.2662 276.9822
3 275.1859 274.7478 275.7993 275.0009 273.5956 273.1439 273.2730 271.2133 273.2803 276.9678
4 275.1891 274.7879 275.8941 275.0467 273.6107 273.1106 273.3130 271.1840 273.2886 276.9502
5 275.1902 274.8337 275.9870 275.0857 273.6318 273.0777 273.3472 271.1556 273.2918 276.9307
6 275.1891 274.8864 276.0776 275.1168 273.6589 273.0454 273.3752 271.1285 273.2905 276.9101
  1991-06-29 1991-06-30
1 272.0784 273.5677
2 272.0577 273.5973
3 272.0339 273.6237
4 272.0075 273.6476
5 271.9794 273.6701
6 271.9500 273.6925

The second data frame I'm using for the extraction (via the Dates variable):

>head(df2)
  Dates Temp Wind.S Wind.D
1 5/1/1991 18 4 238
2 5/2/1991 18 8 93
3 5/4/1991 22 8 229
4 5/6/1991 21 4 81
5 5/7/1991 21 8 192
6 5/9/1991 17 8 32
7 5/13/1991 22 8 229
8 5/18/1991 21 4 81
9 6/2/1991 21 8 192
10 6/7/1991 17 8 32

The header of the final data I'm looking for is something like this:

>head(df3)
  lon lat 1991-05-01 1991-05-02 1991-05-04 1991-05-06 1991-05-09 1991-05-13
Example data following the format of yours:

Daily.df <- data.frame(lon=1:5,lat=1:5,A=1:5,B=1:5,C=1:5,D=1:5)
colnames(Daily.df) <- c("lon","lat","1991-05-01","1991-05-02","1991-05-03","1991-05-04")

  lon lat 1991-05-01 1991-05-02 1991-05-03 1991-05-04
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5

df2 <- data.frame(Dates = c("5/1/1991","5/2/1991","5/4/1991"))

  Dates
1 5/1/1991
2 5/2/1991
3 5/4/1991

Use lubridate to convert df2$Dates into the right format, make a vector (thesedates) of the column names you want to keep, including lon and lat, then use select_at to keep those columns.

library(lubridate)
library(dplyr)

thesedates <- c("lon","lat",as.character(mdy(df2$Dates)))
new.df <- Daily.df %>% select_at(vars(thesedates))

Output:

  lon lat 1991-05-01 1991-05-02 1991-05-04
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
If you want a long data set to match against, I think you first need to convert the dates in df2 into the proper Date format and then wrangle the data into wide format.

Step 1 - convert the dates into the correct format:

df2$Dates <- as.Date(df2$Dates, format = "%m/%d/%Y")

Step 2 - convert to wide format:

library(tidyr)
spread(df2, Dates, data)
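For what it's worth, the same column selection can also be done in base R, with no lubridate or dplyr. A sketch reusing the toy Daily.df/df2 from the first answer:

```r
# Toy data in the same shape as the question (colnames set after creation
# so the date-like names are not mangled by check.names):
Daily.df <- data.frame(lon = 1:5, lat = 1:5, A = 1:5, B = 1:5, C = 1:5, D = 1:5)
colnames(Daily.df) <- c("lon", "lat", "1991-05-01", "1991-05-02",
                        "1991-05-03", "1991-05-04")
df2 <- data.frame(Dates = c("5/1/1991", "5/2/1991", "5/4/1991"))

# Convert the m/d/Y strings to the yyyy-mm-dd form used by the column names,
# then subset the columns by name.
keep <- c("lon", "lat",
          format(as.Date(df2$Dates, format = "%m/%d/%Y"), "%Y-%m-%d"))
new.df <- Daily.df[, keep]
colnames(new.df)  # "lon" "lat" "1991-05-01" "1991-05-02" "1991-05-04"
```

This relies only on the column names matching the formatted dates exactly.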
extract rows with common characters in a column by comparing two data.frame
How can I compare two data frames (df1 and df2) and extract the rows with common gene names?

df1 =
logp chr start end CNA Genes No.of.genes
25.714.697 1 90100868 90212160 gain Iqca,Ackr3 2
2.213.423 1 175422136 176019087 loss Rgs7,Fh1,Kmo,Opn3,Chml,Wdr64,Gm25560,Exo1,Gm23805,Pld5,B020018G12Rik 11
5.607.005 2 145619035 147312698 gain Slc24a3,Rin2,Naa20,Crnkl1,4930529M08Rik,Insm1,Ralgapa2,Xrn2,Nkx2-4,Nkx2-2,Gm22261 11
3.756.075 2 141246149 141653989 loss Macrod2 1
4.852.608 2 41586450 41739605 loss Lrp1b 1
590.684 2 86729423 86860061 loss Olfr1089,Olfr1090,Olfr1093,Olfr1093,Olfr141,Olfr1094,Olfr1094,Olfr1095 8
5.721.239 3 25408115 25519319 gain Nlgn1 1
4.295.527 3 92005564 92134972 gain Pglyrp3,Prr9 2
4.257.749 3 15244004 15897870 gain Gm9733,Gm9733,Gm9733,Gm9733,Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1c,Sirpb1c,Sirpb1c,Sirpb1c 16
418.259 3 154861710 155490219 loss Tnni3k,Tnni3k,Fpgt,Gm26456,Lrriq3 5
2.284.327 4 134885344 137474898 gain Rhd,Rhd,Tmem50a,D4Wsu53e,Syf2,Runx3,Clic4,Srrm1,Ncmap,Rcan3,Nipal3,Stpg1,Gm25317,Grhl3,Gm23106,Ifnlr1,Il22ra1,Myom3,Srsf10,Pnrc2,Pnrc2,Cnr2,Fuca1,Hmgcl,Gale,Lypla2,Pithd1,Tceb3,Rpl11,Gm26001,Id3,E2f2,Asap3,Tcea3,Zfp46,Hnrnpr,Htr1d,Luzp1,Kdm1a,4930549C01Rik,Lactbl1,Ephb2,C1qb,C1qc,C1qa,Epha8,Zbtb40,Gm23834,Gm23834,Wnt4,Cdc42,Gm13011,Gm13011,Cela3b,Cela3b,Hspg2 56
1.017.899 4 108176679 108417038 gain Echdc2,Zyg11a,Zyg11b,Selrc1,Fam159a,Gpx7 6
2.229.929 4 80406963 83998058 gain Tyrp1,Lurap1l,Mpdz,n-R5s187,Nfib,Zdhhc21,Cer1,Frem1,Ttc39b,Gm23412,Snapc3,Psip1,Ccdc171,Gm25899,Gm25899 15
279.458 4 110534756 110628705 gain Agbl4 1
1.103.167 4 121565222 124833802 gain Ppt1,Cap1,Mfsd2a,Mycl,Trit1,Bmp8b,Bmp8b,Oxct2b,Ppie,Hpcal4,Nt5c1a,Heyl,Pabpc4,Gm25788,Gm22154,Bmp8a,Bmp8a,Oxct2a,Macf1,Ndufs5,Akirin1,Rhbdl2,Mycbp,Rragc,Gm22983,Pou3f1,Utp11l,Gm24480,Fhl3,Sf3a3,Inpp5b,Mtf1,n-R5s192 33
1.781.441 4 139917291 140083763 loss Klhdc7a,Igsf21 2
6.829.744 6 147086557 147179673 gain Mansc4,Klhl42 2
1.070.905 6 63350920 64077379 loss Grid2 1
3.132.886 7 17188025 18205037 gain Psg29,Ceacam5,Ceacam14,Gm5155,Ceacam11,Ceacam13,Ceacam12,Igfl3,Igfl3 9
591.926 7 26773232 26976928 gain Cyp2a5,Cyp2a5,Cyp2a5,Cyp2a22,Cyp2a22,Cyp2a22 6
4.170.656 7 20654493 24128503 gain Nlrp4e,Nlrp5,Gm10175,Zfp180,Zfp112 5
2.494.001 7 38898625 38991306 loss Gm21142,Gm25671 2
13.222.294 7 67330026 67943164 loss Mef2a,Lrrc28,Gm23233,Ttc23,Synm 5
1.330.269 7 7171339 10865583 loss Zfp418,Clcn4-2,Zik1,Nlrp4b 4
3.414.431 8 49942996 51497632 loss Gm23986 1
3.059.542 9 21959210 22072123 gain Epor,Rgl3,Ccdc151,Prkcsh,Elavl3,Zfp653 6
5.277.845 10 80335500 80575991 gain Reep6,Adamtsl5,Plk5,Mex3d,Mbd3,Uqcr11,Uqcr11,Tcf3,Gm25044,Gm25044,Gm25044,Gm25044,Onecut3,Atp8b3,Rexo1,Klf16 16
26.812.338 10 100597718 100692256 loss 1700017N19Rik 1
6.998.267 11 60393963 60504695 gain Lrrc48,Atpaf2,Gid4,Drg2,Myo15 5
2.624.723 11 75676344 76212635 gain Crk,Ywhae,Doc2b,Rph3al,1700016K19Rik,Fam101b,Vps53,Glod4,Fam57a,Gemin4 10
11.851.916 11 97742687 97853778 gain Pip4k2b,Cwc25,1700001P01Rik,Rpl23,SNORA21,Snora21,Lasp1 7
3.553.325 11 74899198 75121318 loss Tsr1,Srr,Smg6,Gm22733 4
309.751 11 105624215 107309569 loss Tanc2,Cyb561,Ace,Ace,Kcnh6,Dcaf7,Taco1,Map3k3,Limd2,Strada,Ccdc47,Ddx42,Ftsj3,Psmc5,Gm23645,Smarcd2,Tcam1,Gh,Gh,Gh,Gh,Gh,Cd79b,Scn4a,2310007L24Rik,Icam2,Ern1,Snord104,Gm22711,Tex2,Milr1,Gm25889,Polg2,Ddx5,Cep95,Smurf2,Bptf,Nol11,Pitpnc1 39
2.642.471 11 30118384 30155192 loss Sptbn1 1
10.304.184 12 114641806 116183315 gain Ighv1-73,Ighv1-83,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Vipr2 11
1.414.343 12 116239354 117192837 loss Wdr60,Esyt2,Ncapg2,Gm25112,Gm24354,Ptprn2 6
2.875.469 14 10676764 10768859 loss Fhit 1
7.743.121 14 52237972 52331429 loss Rab2b,Gm23758,Tox4,Mettl3,Sall2 5
2.689.596 14 43932587 45325020 loss Ang5,Ang6,Ear2,Ear2,Ptgdr,Ptger2,Txndc16,Gpr137c,Ero1l 9
1.912.962 14 119385279 119496386 loss Hs6st3 1
950.029 14 118589508 118681878 loss Abcc4 1
4.105.345 14 3004822 8437757 loss Flnb,Dnase1l3,Abhd6,Rpp14,Rpp14,Pxk,Pdhb,Kctd6,Acox2,Fam107a,Oit1,4930452B06Rik 12
1.870.555 16 33446020 33668062 loss Zfp148,Slc12a8 2
3.148.258 17 5087550 8333690 gain Arid1b,Tmem242,Zdhhc14,Snx9,Synj2,Serac1,Gtf2h5,Tulp4,n-R5s26,Tmem181a,Dynlt1a,Dynlt1b,Tmem181b-ps,Tmem181b-ps,Dynlt1c,Tmem181c-ps,Dynlt1f,Sytl3,Ezr,Rsph3b,Tagap1,Rnaset2b,Rnaset2b,Gm25119,Rps6ka2,Ttll2,Gm9992,Gm26057,Fndc1,Tagap,Rsph3a,Gm22416,Rnaset2a,Rnaset2a,Fgfr1op,Ccr6,Mpc1,Sft2d1 38
50.819.398 17 40052632 40331607 gain Gm7148,Pgk2,Crisp3,Crisp1 4
4.099.936 17 14074943 15508274 loss Dact2,Smoc2,Thbs2,Gm23352,Wdr27,1600012H06Rik,Phf10,Gm3417,9030025P20Rik,Gm3448,Gm3435,Tcte3,Ermard,Dll1,Fam120b,Psmb1,Tbp 17
12.022.555 17 30590875 31053645 loss Glo1,Dnah8,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Glp1r,Umodl1 12
5.135.466 17 36160573 36277761 loss Gm22453,Rpp21,Trim39 3
4.254.769 17 27372278 27593833 loss Grm4,Hmga1,Nudt3 3
5.565.997 18 87905985 87999255 loss Gm24987,Gm24987 2

df2 =
Recursive_level logp chr start end CNA Genes No.of.Gene
1 1.416.541 1 68580000 68640000 loss Erbb4 1
1 7.876.897 1 173840000 174010000 loss Mndal,Mnda,Ifi203,Ifi202b 4
1 6.280.751 1 173500000 173660000 loss BC094916,Pydc4,Pyhin1 3
1 7.369.317 1 115900000 116280000 loss Cntnap5a 1
2 128.766 2 146170000 146660000 gain 4930529M08Rik,Insm1,Ralgapa2 3
1 5.777.222 2 76720000 76800000 loss Ttn 1
2 1.448.913 3 15360000 16000000 loss Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1c,Sirpb1c,Sirpb1c,Sirpb1c 12
1 3.845.977 4 119500000 125160000 gain AA415398,AA415398,AA415398,AA415398,AA415398,Foxj3,Guca2a,Guca2b,Hivep3,Edn2,Foxo6,Scmh1,Slfnl1,Ctps,Cited4,Kcnq4,Nfyc,Mir30c-1,Mir30e,Rims3,Exo5,Zfp69,Smap2,Col9a2,Zmpste24,Tmco2,Rlf,Ppt1,Cap1,Mfsd2a,Mycl,Trit1,Bmp8b,Bmp8b,Oxct2b,Ppie,Hpcal4,Nt5c1a,Heyl,Pabpc4,Bmp8a,Bmp8a,Oxct2a,Macf1,Ndufs5,Akirin1,Rhbdl2,Mycbp,Rragc,Pou3f1,Utp11l,Fhl3,Sf3a3,Inpp5b,Mtf1,n-R5s192,1110065P20Rik,Yrdc,Maneal,Cdca8,Rspo1,Gnl2,Dnali1,Snip1,Meaf6,Zc3h12a 66
1 1.446.699 4 73900000 74180000 gain Frmd3 1
1 2.262.305 4 72740000 72880000 gain Aldoart1 1
1 1.234.215 4 80820000 84340000 gain Tyrp1,Lurap1l,Mpdz,n-R5s187,Nfib,Zdhhc21,Cer1,Frem1,Ttc39b,Snapc3,Psip1,Ccdc171,Bnc2 13
1 123.671 4 108480000 108760000 gain Zcchc11,Prpf38a,Orc1,Cc2d1b,Zfyve9 5
1 1.418.261 4 139400000 147600000 loss Ubr4,Iffo2,Aldh4a1,Tas1r2,Pax7,Klhdc7a,Igsf21,Arhgef10l,Rcc2,Padi4,Padi3,Padi1,Padi2,Sdhb,Atp13a2,Mfap2,Crocc,Necap2,Spata21,Szrd1,Fbxo42,Rsg1,Arhgef19,Epha2,Fam131c,Clcnka,Clcnka,Clcnkb,Clcnkb,Hspb7,Zbtb17,Spen,Fblim1,Tmem82,Slc25a34,Plekhm2,Ddi2,Rsc1a1,Agmat,Dnajc16,Casp9,Cela2a,Cela2a,Ctrc,Efhd2,Fhad1,Tmem51,Kazn,Prdm2,Pdpn,Lrrc38,1700012P22Rik,Aadacl3,9430007A20Rik,Dhrs3,Vps13d,Tnfrsf1b,Tnfrsf8,Zfp600,Zfp600,Rex2 61
1 8.113.817 6 129740000 129800000 gain Klri2 1
1 15.569.108 6 41360000 41480000 loss Prss3,Prss3,Prss1,Prss1 4
1 2.037.683 6 63480000 63700000 loss Grid2 1
2 14.694 7 38260000 38280000 gain Pop4 1
1 14.946 7 35780000 38280000 gain Zfp507,Tshz3,Zfp536,Uri1,Ccne1,1600014C10Rik,Plekhf1,Pop4 8
1 7.192.011 7 47500000 47620000 loss Mrgpra2b,Mrgpra3 2
1 1.722.108 7 26000000 26200000 loss Cyp2b13,Cyp2b9 2
1 12.683.495 7 11350000 11680000 loss Zscan4f 1
1 1.360.954 10 80900000 81100000 gain Timm13,Lmnb2,Gadd45b,Gng7,Diras1,Slc39a3,Sgta,Thop1,Creb3l3 9
1 267.959 11 97880000 98000000 gain Fbxo47,Plxdc1,Arl5c 3
1 1.872.174 11 75860000 76420000 gain Rph3al,1700016K19Rik,Fam101b,Vps53,Glod4,Fam57a,Gemin4,Rnmtl1,Nxn,Timm22,Abr 11
1 2.811.352 12 113560000 114920000 gain Ighv14-3,Ighv13-1,Ighv13-1,Ighv13-1,Ighv13-1,Ighv13-1,Ighv6-4,Ighv6-4,Ighv6-4,Ighv6-4,Ighv6-4,Ighv6-5,Ighv6-5,Ighv6-5,Ighv6-5,Ighv6-5 16
1 1.979.667 12 115860000 115980000 loss Ighv1-83 1
1 2.098.521 12 17420000 21160000 loss Nol10,Odc1,Hpcal1,5730507C01Rik,Asap2 5
1 21.864.853 13 12580000 12650000 loss Ero1lb 1
1 3.233.185 13 61500000 62780000 loss Ctsm,Cts3,Zfp808 3
1 5.640.895 14 53540000 53780000 gain Trav12-2,Trav12-3,Trav13-2,Trav14-2,Trav15-2-dv6-2,Trav3-3,Trav9-4,Trav9-4,Trav9-4,Trav9-4,Trav4-4-dv10,Trav5-4,Trav6-7-dv9,Trav7-6,Trav7-6,Trav7-6,Trav16,Trav13-4-dv7,Trav14-3,Trav3-4 20
1 2.942.081 14 86300000 97240000 gain Diap3,Tdrd3,Rps3a2,Pcdh20,Pcdh9,Klhl1 6
1 4.662.806 14 9840000 9880000 loss Fhit 1
1 3.638.346 14 43740000 44640000 loss Ear1,Ear1,Ear10,Ear10,Ang5,Ang6,Ear2,Ear2 8
1 1.709.546 14 35320000 37400000 loss Grid1,n-R5s46,Ccser2,Rgr,Lrit1,Lrit2,Cdhr1,2610528A11Rik,Ghitm 9
2 3.387.282 14 84060000 85740000 loss Pcdh17 1
1 2.140.909 14 68280000 86300000 loss Adam7,Adamdec1,Adam28,Stc1,Nkx2-6,Nkx3-1,Slc25a37,Synb,Entpd4,SYNB,Loxl2,R3hcc1,Chmp7,Tnfrsf10b,Tnfrsf10b,Tnfrsf10b,Tnfrsf10b,Rhobtb2,Pebp4,Egr3,Bin3,Ccar2,9930012K11Rik,9930012K11Rik,Pdlim2,Sorbs3,Ppp3cc,Slc39a14,Piwil2,Polr3d,Mir320,Phyhip,Bmp1,Sftpc,Lgi3,Reep4,Hr,Nudt18,Fam160b2,Dmtn,Fgf17,Npm2,Xpo7,Dok2,Gfra2,Fndc3a,Cysltr2,Rcbtb2,Rb1,Lpar6,Itm2b,Med4,Nudt15,Sucla2,Htr2a,Esd,Lrch1,5031414D18Rik,Lrrc63,Lcp1,Cpb2,Zc3h13,Siah3,Spert,Cog3,Slc25a30,Tpt1,Snora31,Gtf2f2,Kctd4,Gpalpp1,Nufip1,Rps2-ps6,Tsc22d1,Serp2,Lacc1,Ccdc122,Enox1,n-R5s48,Dnajc15,Epsti1,Fam216b,Tnfsf11,Akap11,Dgkh,Vwa8,Zfp957,Rgcc,Naa16,Mtrf1,Kbtbd7,Zbtbd6,Wbp4,Elf1,Sugt1,Lect1,Pcdh8,Olfm4,Pcdh17 99
1 3.810.267 14 109680000 111240000 loss n-R5s50,Slitrk6 2
1 3.924.724 15 77460000 77560000 loss Apol10a,Apol10a,Apol10a,Apol10a,Apol11a,Apol11a,Apol11a,Apol11a,Apol7c 9
1 7.728.161 16 44780000 44920000 gain Cd200r1,Cd200r1,Cd200r4,Cd200r4,Cd200r2,Cd200r2 6
1 348.511 17 73500000 76640000 gain Galnt14,Ehd3,Xdh,Memo1,Dpy30,Spast,Slc30a6,Nlrc4,Yipf4,Birc6,Ttc27,Ltbp1,Rasgrp3,Fam98a 14
1 1.052.043 17 36120000 36540000 gain Rpp21,Trim39 2
1 1.325.386 17 90420000 90540000 loss Nrxn1 1
1 4.438.061 17 38300000 38360000 loss Olfr137,Olfr137 2
1 125.062 17 30380000 30920000 loss Btbd9,Glo1,Dnah8,Glp1r 4
1 2.998.359 19 13860000 13900000 gain Olfr1502 1
2 3.307.524 19 30910000 30970000 loss Prkg1 1

When I tried

df2[mapply(function(x, y) length(intersect(x,y))>0, strsplit(df1$Gene, ','), strsplit(df2$Gene, ',')),]

I got only

logp chr start end CNA Genes No.of.genes
39 2.689.596 14 43932587 45325020 loss Ang5,Ang6,Ear2,Ear2,Ptgdr,Ptger2,Txndc16,Gpr137c,Ero1l 9

but I can find many rows with at least one common Gene.
We could split up the "Genes" column in each dataset with strsplit, then compare the corresponding list elements with mapply, check whether there is any intersect, and use that index to subset df2:

df2[mapply(function(x,y) any(x %in% y), strsplit(df1$Gene, ','), strsplit(df2$Gene, ",")),]
# chr start end Gene
#1 1179 3360 gain Recl,Bcl,Trim3,Pop4
#3 7180 9229 loss Sox1
#4 8159 8360 loss Sox1
#5 9154 10588 loss Pekg

Or use intersect and length:

df2[mapply(function(x, y) length(intersect(x,y))>0, strsplit(df1$Gene, ','), strsplit(df2$Gene, ',')),]

Update

If we need to find whether a single "Gene" of the first dataset is found in any of the rows of the second dataset (using the updated data):

df2[sapply(strsplit(df2$Gene, ','), function(x) any(sapply(strsplit(df1$Gene, ','), function(y) any(x %in% y)))),]
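A toy illustration (gene names borrowed from the question's tables, rows made up) of why the pairwise mapply() version can miss rows that the any-row version in the update keeps:

```r
df1 <- data.frame(Gene = c("Iqca,Ackr3", "Grid2", "Fhit"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Gene = c("Grid2,Nlgn1", "Pop4", "Fhit,Hs6st3"),
                  stringsAsFactors = FALSE)

# mapply() pairs row 1 of df1 with row 1 of df2, row 2 with row 2, ...
# so it only detects overlaps at matching row positions:
pairwise <- mapply(function(x, y) any(x %in% y),
                   strsplit(df1$Gene, ","), strsplit(df2$Gene, ","))
df2$Gene[pairwise]  # only "Fhit,Hs6st3" (Grid2 sits at different positions)

# To keep every df2 row sharing a gene with ANY df1 row, compare each
# row against the pooled df1 genes instead:
pool <- unique(unlist(strsplit(df1$Gene, ",")))
hits <- sapply(strsplit(df2$Gene, ","), function(x) any(x %in% pool))
df2$Gene[hits]      # "Grid2,Nlgn1" "Fhit,Hs6st3"
```

This is the same distinction the Update draws, just on a small example where the difference is easy to see.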
How to prepare my data for a factorial repeated measures analysis?
Currently, my dataframe is in wide format and I want to do a factorial repeated measures analysis with two between-subject factors (sex & org) and a within-subject factor (tasktype). Below I've illustrated how my data looks with a sample (the actual dataset has a lot more variables). The variables starting with '1_' and '2_' belong to measurements during task 1 and task 2 respectively; this means that 1_FD_H_org and 2_FD_H_org are the same measurement, but for tasks 1 and 2 respectively.

id sex org task1 task2 1_FD_H_org 1_FD_H_text 2_FD_H_org 2_FD_H_text 1_apv 2_apv
2 F T Correct 2 69.97 68.9 116.12 296.02 10 27
6 M T Correct 2 53.08 107.91 73.73 333.15 16 21
7 M T Correct 2 13.82 30.9 31.8 78.07 4 9
8 M T Correct 2 42.96 50.01 88.81 302.07 4 24
9 F H Correct 3 60.35 102.9 39.81 96.6 15 10
10 F T Incorrect 3 78.61 80.42 55.16 117.57 20 17

I want to analyze whether there is a difference between the two tasks on e.g. FD_H_org for the different groups/conditions (sex & org). How do I reshape my data so I can analyze it with a model like this?

ezANOVA(data=df, dv=.(FD_H_org), wid=.(id), between=.(sex, org), within=.(task))

I think the correct format of my data should look like this:

id sex org task outcome FD_H_org FD_H_text apv
2 F T 1 Correct 69.97 68.9 10
2 F T 2 2 116.12 296.02 27
6 M T 1 Correct 53.08 107.91 16
6 M T 2 2 73.73 333.15 21

But I'm not sure. I tried to achieve this with the reshape2 package but couldn't figure out how to do it. Anybody who can help?
I think you probably need to rebuild it by binding the two subsets of columns together with rbind(). The only issue here was that your outcomes implied different data types, so I forced them both to text:

require(plyr)
dt <- read.table(file="dt.txt", header=TRUE, sep=" ") # this was to bring in your data
newtab = rbind(
  ddply(dt,.(id,sex,org),summarize, task=1, outcome=as.character(task1), FD_H_org=X1_FD_H_org, FD_H_text=X1_FD_H_text, apv=X1_apv),
  ddply(dt,.(id,sex,org),summarize, task=2, outcome=as.character(task2), FD_H_org=X2_FD_H_org, FD_H_text=X2_FD_H_text, apv=X2_apv)
)
newtab[order(newtab$id),]

   id sex org task outcome FD_H_org FD_H_text apv
1   2 F T 1 Correct 69.97 68.90 10
7   2 F T 2 2 116.12 296.02 27
2   6 M T 1 Correct 53.08 107.91 16
8   6 M T 2 2 73.73 333.15 21
3   7 M T 1 Correct 13.82 30.90 4
9   7 M T 2 2 31.80 78.07 9
4   8 M T 1 Correct 42.96 50.01 4
10  8 M T 2 2 88.81 302.07 24
5   9 F H 1 Correct 60.35 102.90 15
11  9 F H 2 3 39.81 96.60 10
6  10 F T 1 Incorrect 78.61 80.42 20
12 10 F T 2 3 55.16 117.57 17

EDIT - obviously you don't need plyr for this (and it may slow things down) unless you're doing further transformations. This is the code with no non-standard dependencies:

newcolnames <- c("id","sex","org","task","outcome","FD_H_org","FD_H_text","apv")
t1 <- dt[,c(1,2,3,3,4,6,8,10)]
t1$org.1 <- 1
colnames(t1) <- newcolnames
t2 <- dt[,c(1,2,3,3,5,7,9,11)]
t2$org.1 <- 2
t2$task2 <- as.character(t2$task2)
colnames(t2) <- newcolnames
newt <- rbind(t1,t2)
newt[order(newt$id),]
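The same wide-to-long step can also be done with base stats::reshape(). A sketch on a made-up two-row subset of the data (the X1_/X2_ names match what read.table makes of the 1_/2_ headers; only three of the measurement columns are shown):

```r
# Two-row stand-in for the wide data above:
dt <- data.frame(id = c(2, 6), sex = c("F", "M"), org = c("T", "T"),
                 task1 = c("Correct", "Correct"), task2 = c("2", "2"),
                 X1_FD_H_org = c(69.97, 53.08), X2_FD_H_org = c(116.12, 73.73),
                 X1_apv = c(10, 16), X2_apv = c(27, 21),
                 stringsAsFactors = FALSE)

# Each element of 'varying' lists the task-1 and task-2 versions of one
# variable; v.names gives the single long-format name for each pair.
long <- reshape(dt,
                direction = "long",
                idvar     = c("id", "sex", "org"),
                varying   = list(c("task1", "task2"),
                                 c("X1_FD_H_org", "X2_FD_H_org"),
                                 c("X1_apv", "X2_apv")),
                v.names   = c("outcome", "FD_H_org", "apv"),
                timevar   = "task", times = 1:2)
long[order(long$id), ]  # one row per id x task, as in the rbind() answer
```

Note that task1/task2 were created as character columns here, which sidesteps the mixed-type issue the rbind() solution handles with as.character().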