cummeRbund csHeatmap column user-defined order - r

I am using the R package cummeRbund (from Bioconductor) to visualize RNA-seq data. I created a CuffGeneSet instance called "DEG_genes" that contains 662 genes that are significantly differentially expressed between males and females. My goal is to create a heatmap using csHeatmap() in which the male and female samples (replicates) are separated, but with a specific user-defined order within each sex category.
I used:
> DEG<-diffData(genes(cuff)) # take differentially expressed genes
> DEG_significant<-subset(DEG,significant=='yes') # retain only significant changes
> DEG_sign_IDs <- DEG_significant$gene_id # retrieve IDs
> DEG_genes<-getGenes(cuff,DEG_sign_IDs) # get CuffGeneSet instance
> hmap<-csHeatmap(DEG_genes,clustering='none',labRow=F,replicates=T)
This gives me ALMOST what I want: the heatmap shows females on the left and males on the right, but they are ordered alphabetically (Female_0, Female_1, Female_10, Female_11, Female_12 ... Female_19, Female_2, Female_20, Female_21 ... Female_29 on the left, and similarly Male_0, Male_1, Male_10 ... Male_19, Male_2, Male_20 ... on the right), and I want them in a specific order (clusterReps). I created a test vector with the replicate names in a specific order (males on the left, with 0 and 6 exchanged, and females on the right) as follows:
clusterReps<-c("Male_6","Male_1","Male_2","Male_3","Male_4","Male_5","Male_0","Male_7","Male_8","Male_9","Male_10","Male_11","Male_12","Male_13","Male_14","Male_15","Male_16","Male_17","Male_18","Male_19","Male_20","Male_21","Male_22","Male_23","Male_24","Male_25","Male_26","Male_27","Male_28","Male_29","Male_30","Male_31","Male_32","Male_33","Female_0","Female_1","Female_2","Female_3","Female_4","Female_5","Female_6","Female_7","Female_8","Female_9","Female_10","Female_11","Female_12","Female_13","Female_14","Female_15","Female_16","Female_17","Female_18","Female_19","Female_20","Female_21","Female_22","Female_23","Female_24","Female_25","Female_26","Female_27","Female_28")
I would like the data to be exactly the same except for the order of the columns, which must follow the order of the "clusterReps" vector. Knowing that the heatmap is a ggplot object, I have looked everywhere for a solution for the last 2 days, but with no success. (There is a closely resembling problem on Stack Overflow that uses heatmap.2() instead of csHeatmap(). I tried to get a replicate FPKM matrix and use heatmap.2, but could only use heatmap_2, and some options were not accepted.)
Using:
> hmap<-hmap+scale_x_discrete(limits=clusterReps)
Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
only changes the x-axis labels but not the actual data (the heatmap remains identical).
Is there a similar function that rearranges the columns and not just labels?
Thanks in advance for your help, I'm not familiar with handling ggplot objects, and in particular heatmaps from cummeRbund.
EDIT:
Here is what I can give as further information:
> DEG_genes
CuffGeneSet instance for 662 genes
Slots:
annotation
fpkm
repFpkm
diff
count
isoforms CuffFeatureSet instance of size 930
TSS CuffFeatureSet instance of size 785
CDS CuffFeatureSet instance of size 230
promoters CuffFeatureSet instance of size 662
splicing CuffFeatureSet instance of size 785
relCDS CuffFeatureSet instance of size 662
> summary(DEG_genes)
Length Class Mode
662 CuffGeneSet S4
I am afraid I can't give more information for the moment, please let me know if you want me to execute a command and report the output if it can help.
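One way to see the difference between relabelling an axis and actually reordering the data is to work on the replicate FPKM matrix directly. The sketch below is a minimal base-R illustration on a made-up matrix, not the csHeatmap() internals. With real data, I would assume the matrix can be obtained with something like repFpkmMatrix(DEG_genes), but that call is not tested here. The names fpkm, fpkm_ordered and long are hypothetical, and the toy clusterReps is a shortened version of the vector in the question.

```r
# Toy stand-in for a replicate FPKM matrix (genes x replicates).
# Columns start in alphabetical order, as in the heatmap from the question.
set.seed(1)
reps <- sort(c(paste0("Male_", 0:11), paste0("Female_", 0:11)))
fpkm <- matrix(rexp(5 * length(reps)), nrow = 5,
               dimnames = list(paste0("gene", 1:5), reps))

# Desired order: males first (0 and 6 swapped), then females in numeric order.
clusterReps <- c(paste0("Male_", c(6, 1:5, 0, 7:11)), paste0("Female_", 0:11))

# Guard against typos: both name sets must match exactly.
stopifnot(setequal(clusterReps, colnames(fpkm)))

# Indexing by name moves the data together with the labels.
fpkm_ordered <- fpkm[, clusterReps]

# Long format with an ordered factor: a tile-based plot follows the
# factor levels, so the x axis ends up in clusterReps order.
long <- as.data.frame(as.table(log10(fpkm_ordered + 1)))
names(long) <- c("gene", "replicate", "log10_fpkm")
long$replicate <- factor(long$replicate, levels = clusterReps)
```

The reordered matrix can then be passed to any heatmap function with column clustering switched off, or the long data frame can be plotted as tiles with replicate on the x axis.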

I am not very fluent in R, but I was having the same problem. To solve it, I wrote a script that renames all my sample names, in all the files inside the cuffdiff folder, to names that give the right order when sorted alphabetically, and then I rebuilt the database.
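As a rough illustration of the renaming idea, zero-padding the replicate index makes alphabetical order agree with numeric order. This is only a sketch of the naming rule. It does not touch the cuffdiff output files or rebuild the database, and pad_replicate is a hypothetical helper name.

```r
# Hypothetical helper: pad the numeric suffix so "Male_2" becomes "Male_02".
pad_replicate <- function(x, width = 2) {
  prefix <- sub("_[0-9]+$", "", x)            # part before the last underscore
  index  <- as.integer(sub("^.*_", "", x))    # numeric suffix
  paste0(prefix, "_", formatC(index, width = width, flag = "0"))
}

old_names <- c("Male_0", "Male_1", "Male_10", "Male_2", "Male_20")
new_names <- pad_replicate(old_names)

# After padding, alphabetical sorting follows the numeric index.
sorted_names <- sort(new_names)
```

A custom order, such as swapping replicates 0 and 6, would need the new names to encode the desired position rather than the original index.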

Related

Memory management in R ComplexUpset Package

I'm trying to plot a stacked barplot inside an upset plot using the ComplexUpset package. The plot I'd like to get looks something like this (where mpaa would be component in my example):
I have a dataframe of size 57244 by 21, where one column is the ID, another is the type of recording, and the other 19 columns are components 1 to 19:
ID  component1  component2  ...  component19  type
1   1           0           ...  1            a
2   0           0           ...  1            b
3   1           1           ...  0            b
Ones and zeros indicate affiliation with a certain component. As shown in the example in the docs, I first convert these ones and zeros to logical, and then try to plot the basic upset plot. Here's the code:
df <- df %>% mutate(across(where(is.numeric), as.logical))
components <- colnames(df)[2:20]
upset(df, components, name='protein', width_ratio = 0.1)
Unfortunately, after working on the last line for a while, R stops with an error message like this:
Error: cannot allocate vector of size 176.2 Mb
My machine has 32 GB of RAM, so I'm sure I can't have filled the memory so much that 176.2 Mb can't be allocated. My guess is that I am somehow managing memory in R incorrectly. Could you please explain what's faulty in my code, if possible?
I also know that the UpSetR package can plot the same data, but as far as I know it provides no way to add the stacked barplot.
Somehow, it works if you:
Tweak the min_size parameter so that the plot is not overloaded and makes a better impression.
Pass a sample of the data as the first argument of the ComplexUpset call, even if your sample is the whole dataset.
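To choose a sensible min_size, it can help to count how many distinct intersections the data contain and how large each one is. The sketch below does this in base R on made-up data. The names key, sizes and kept are hypothetical, and the threshold of 20 is arbitrary.

```r
# Toy data: 1000 rows, 6 logical membership columns.
set.seed(42)
n <- 1000
components <- paste0("component", 1:6)
df <- as.data.frame(matrix(runif(n * length(components)) < 0.3, nrow = n,
                           dimnames = list(NULL, components)))

# Each row belongs to exactly one exclusive intersection:
# the combination of components that are TRUE for that row.
key <- apply(df[components], 1,
             function(r) paste(components[r], collapse = "&"))

# Size of every observed intersection, largest first.
sizes <- sort(table(key), decreasing = TRUE)

# Intersections that would survive a given min_size threshold.
min_size <- 20
kept <- sizes[sizes >= min_size]

# With ComplexUpset installed, the threshold could then be passed on, e.g.:
# ComplexUpset::upset(df, components, name = "protein",
#                     width_ratio = 0.1, min_size = min_size)
```

Comparing length(sizes) with length(kept) shows how many bars the threshold removes from the plot.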

Issue with Boxplot formula or variable definition

I have a csv file with 4 columns labeled AGE, DIASTOLIC, BMI and EVER.PREGNANT, and 700 rows. The last column contains only yes or no. I wish to plot BMI vs EVER.PREGNANT, with the intent of comparing the BMI of those with yes in the fourth column against those with no. What code should I write to get the required boxplot?
I have tried the following code:
Sheet=read.csv(/Downloads/1739230_1284354330_PIMA.csv - 1739230_1284354330_PIMA.csv.csv, sep=",")
boxplot(BMI~EVER.PREGNANT,data=sheet, main="BMI vs PREG",xlab="BMI",ylab="PREGNANT")
The error that I get is
Error in eval(expr,envr,enclos): object 'Sheet' not found
Similarly, what modifications are needed to plot AGE vs DIASTOLIC, where both columns are numeric? Will all 700-odd values be displayed clearly?
I answer here because it tells me not to extend the discussion :-).
I think you haven't loaded your data set correctly. The file path has to be a quoted string. Adding header = T tells the program that your first row holds the names of the variables (for read.csv this is already the default, so stating it explicitly is harmless).
Sheet=read.csv("/Downloads/1739230_1284354330_PIMA.csv", sep=",", header = T)
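Once the data frame exists, note that R is case-sensitive: an object created as Sheet must also be passed as data = Sheet, not data = sheet. The sketch below uses a made-up data frame with the same four column names in place of the CSV file, so the values are random and purely illustrative.

```r
# Made-up stand-in for the CSV: same column names, 700 rows, random values.
set.seed(7)
n <- 700
Sheet <- data.frame(
  AGE = sample(21:80, n, replace = TRUE),
  DIASTOLIC = round(rnorm(n, mean = 70, sd = 12)),
  BMI = round(rnorm(n, mean = 32, sd = 7), 1),
  EVER.PREGNANT = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Boxplot of BMI split by the yes/no column; the object name matches "Sheet".
bp <- boxplot(BMI ~ EVER.PREGNANT, data = Sheet,
              main = "BMI vs EVER.PREGNANT",
              xlab = "Ever pregnant", ylab = "BMI")

# Two numeric columns: a scatter plot shows every row as one point.
plot(AGE ~ DIASTOLIC, data = Sheet,
     main = "AGE vs DIASTOLIC", xlab = "DIASTOLIC", ylab = "AGE")
```

With 700 points some overplotting is likely, so a smaller point size (cex) or a semi-transparent colour may make the scatter plot easier to read.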

Questions associated with "Error: Aesthetics must be either length 1 or the same as the data"

I understand that the subject "Error: Aesthetics must be either length 1 or the same as the data" has been covered a lot (plenty of reading is available online); however, I still have some unresolved questions.
I am working with a dataset regarding all calls made to the Seattle Police Department in 2015. After I am done cleaning the data into an acceptable format I wind up with a dataset that is 62,092 rows and 13 columns (dataset name is SPD_2015). I would add a portion of the dataset to this question but I'm not entirely sure how to do it in a clean and legible format.
I used the lubridate package to extract the times associated with my data set. I then created a bar graph showing at what time the crimes occur:
ggplot(SPD_2015, aes(hour(date.reported.time))) +
geom_bar(width = 0.7)
and that works perfectly.
Since car prowls were the most frequently reported crime, I wanted to graph what time these car prowls occurred. This is when I came across the error "Error: Aesthetics must be either length 1 or the same as the data".
I read that ggplot2 does not like it when you subset within the ggplot code, so I subsetted my data by creating a separate data frame.
car.prowl <- filter(SPD_2015, summarized.offense.description == "CAR PROWL")
So here is my question. When I look at the dimensions of my newly created dataset "car.prowl", I see that it has 11,539 rows and 13 columns. Why is it, then, that when I examine the length of the hours in the occured.time column (the time that the crime occurred) I get a length of 62,092, which is the length of the original dataset?
In my mind I am picturing that the following code would work:
ggplot(car.prowl, aes(hour(occured.time))) +
geom_bar()
The length of the car.prowl$occured.time is correct:
> length(car.prowl$occured.time)
[1] 11539
but when I apply the hour function I get the length of the original dataset:
> length(hour(car.prowl$occured.time))
[1] 62092
when it should be 11,539.
Thank you. Please let me know what I can do to make my question more clear.
It could be a caching issue as Jeremy said above. I'm not sure this would work, but you could try the below, chaining things together.
SPD_2015%>%
filter(summarized.offense.description == "CAR PROWL")%>%
ggplot(aes(hour(occured.time)))+
geom_bar()
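One way to rule out a stale object in the workspace is to compute the hour as a column of the filtered data frame and check its length before plotting. The sketch below uses base R and made-up data in place of the real SPD_2015 table. The column occured.hour is a hypothetical name, and as.POSIXlt(...)$hour stands in for lubridate's hour().

```r
# Made-up stand-in for SPD_2015: random offenses and timestamps in 2015.
set.seed(3)
n <- 500
SPD_2015 <- data.frame(
  summarized.offense.description =
    sample(c("CAR PROWL", "BURGLARY", "ASSAULT"), n, replace = TRUE),
  occured.time = as.POSIXct("2015-01-01 00:00:00", tz = "UTC") +
    sample(0:(365 * 24 * 3600 - 1), n, replace = TRUE)
)

# Filter first, then derive the hour inside the filtered data frame.
car.prowl <- SPD_2015[SPD_2015$summarized.offense.description == "CAR PROWL", ]
car.prowl$occured.hour <- as.POSIXlt(car.prowl$occured.time)$hour

# The derived column must have one value per remaining row.
stopifnot(length(car.prowl$occured.hour) == nrow(car.prowl))

# Count per hour, keeping empty hours so the axis always runs 0 to 23.
hour_counts <- table(factor(car.prowl$occured.hour, levels = 0:23))
barplot(hour_counts, xlab = "Hour of day", ylab = "Car prowls")
```

If the length check fails on the real data, a leftover variable or an attached data frame with the same column name would be the first thing to look for, and restarting the R session is a cheap way to test that.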

Dropping small sample size from a factor plot

I'm new to R, so forgive my ignorance. I'm playing around with a dataset that reflects the mileage my car has achieved since I got it. Here's the data formatted as .csv. (Note: I have this data in Excel, and when I saved it as space-delimited .txt there was an issue where one line kept throwing an error in read.table saying there weren't the right number of columns, so I switched to .csv and it worked fine.)
Date,Miles,Gallons,Price.per.Gallon,Total.Cost,Grade,MPG,Price.per.Mile,Cumulative.Miles,Cumulative.Gallons,Cumulative.Cost,Cumulative.MPG,Cumulative.Price.per.Mile,Gas.Source,Car.Said,Delta,Average.Price.of.Gas,Avg.Temp
6/8/2011,391.8,9.751,3.749,36.556499,Regular,40.18049431,0.093303979,570.4,9.751,36.56,40.18049431,0.064095372,Dealer,41.18,1,3.74935904,82.8
6/22/2011,441.2,9.566,3.359,32.132194,Regular,46.12168095,0.072829089,1011.6,19.317,68.692194,43.14334524,0.067904502,Speedway,47,0.878319047,3.556048765,73.2
7/7/2011,460.6,9.594,3.599,34.528806,Regular,48.0091724,0.074964842,1472.2,28.911,103.221,44.75805057,0.070113436,BP,49.4,1.390827601,3.570301961,79.5
7/18/2011,397.4,8.178,3.319,27.142782,Regular,48.59378821,0.068300911,1869.6,37.089,130.363782,45.60381784,0.069728168,Shell,45.7,2.893788212,3.514890722,83.1
7/26/2011,368.7,8.959,3.359,30.093281,Regular,41.15414667,0.081619965,2238.3,46.048,160.457063,44.73809937,0.071687023,Kroger,42.9,1.745853332,3.484560958,79.1
8/8/2011,436.3,9.845,3.559,35.038355,Regular,44.31691214,0.080307942,2674.6,55.893,195.495418,44.6639114,0.073093329,Kroger,48,3.683087862,3.49767266,76
8/9/2011,262.2,4.986,3.479,17.346294,Regular,52.58724428,0.066156728,2936.8,60.879,212.841712,45.31283365,0.072474023,Shell,46.9,5.687244284,3.496143366,74.5
8/13/2011,250.1,5.887,3.369,19.833303,Regular,42.48343808,0.079301491,3186.9,66.766,232.675015,45.0633556,0.073009826,mobil,45.5,3.016561916,3.484932675,74.1
8/14/2011,424.4,8.699,3.759,32.699541,Regular,48.78721692,0.077048871,3611.3,75.465,265.374556,45.49261247,0.073484495,Speedway,49,0.212783079,3.516524959,68
8/18/2011,437,9.594,3.399,32.610006,regular,45.54930165,0.074622439,4048.3,85.059,297.984562,45.49900657,0.073607332,Speedway,47.6,2.050698353,3.503269049,77.1
8/30/2011,407.3,9.244,3.429,31.697676,Regular,44.06101255,0.077823904,4455.6,94.303,329.682238,45.35804799,0.073992782,Shell,48.6,4.538987451,3.495988866,66.6
9/10/2011,347.3,7.992,3.549,28.363608,Regular,43.45595596,0.081668897,4802.9,102.295,358.045846,45.20944328,0.074547845,Meijer,49.6,6.144044044,3.500130466,65
9/21/2011,375,8.874,3.369,29.896506,Regular,42.25828262,0.079724016,5177.9,111.169,387.942352,44.97386861,0.07492272,Meijer,44.9,2.641717377,3.489663054,67.5
10/5/2011,404.8,9.243,3.079,28.459197,Regular,43.79530455,0.07030434,5582.7,120.412,416.401549,44.88340033,0.074587843,UDF,45.4,1.604695445,3.458139961,61.5
10/14/2011,376.5,8.715,3.249,28.315035,Regular,43.20137694,0.075205936,5959.2,129.127,444.716584,44.76987772,0.074626894,UDF,46.4,3.198623064,3.444024751,56.4
10/23/2011,382.8,8.953,3.199,28.640647,Regular,42.75661789,0.074818827,6342,138.08,473.357231,44.63933951,0.074638479,Speedway,43.8,1.043382107,3.428137536,50.3
10/31/2011,403.4,9.517,3.299,31.396583,Regular,42.38730692,0.077829903,6745.4,147.597,504.753814,44.49412928,0.074829338,Kroger,45.7,3.312693076,3.419810796,47.5
11/15/2011,402.8,9.146,3.249,29.715354,Regular,44.04111087,0.073771981,7148.2,156.743,534.469168,44.46769553,0.074769756,UDF,45.1,1.058889132,3.409843936,54.4
11/29/2011,361.1,9.209,3.149,28.999141,Regular,39.21164079,0.080307785,7509.3,165.952,563.468309,44.1760268,0.075036063,BP,41.7,2.488359214,3.395369197,42.8
12/10/2011,354.2,9.23,3.199,29.52677,Regular,38.37486457,0.083361858,7863.5,175.182,592.995079,43.87037481,0.075411087,Shell,40.3,1.925135428,3.385022885,22.8
12/19/2011,357.4,8.957,2.999,26.862043,Regular,39.90175282,0.075159605,8220.9,184.139,619.857122,43.67733071,0.075400154,UDF,41.3,1.398247181,3.366245727,41.8
1/5/2012,322.6,8.549,3.459,29.570991,Regular,37.73540765,0.091664572,8543.5,192.688,649.428113,43.41370506,0.076014293,Speedway,41,3.26459235,3.370360962,32.3
1/14/2012,370,9.148,3.319,30.362212,Regular,40.44599913,0.082060032,8913.5,201.836,679.790325,43.27919697,0.076265252,Shell,42,1.554000875,3.368033081,17.9
1/28/2012,327.3,9.108,3.329,30.320532,Regular,35.93544137,0.09263835,9240.8,210.944,710.110857,42.96211317,0.076845171,BP,37.5,1.56455863,3.366347737,32.1
2/9/2012,307,7.971,3.399,27.093429,Regular,38.51461548,0.088252212,9547.8,218.915,737.204286,42.80017358,0.077211953,Shell,41.1,2.585384519,3.367536651,28.8
2/16/2012,370.5,10.057,3.229,32.474053,Regular,36.84001193,0.087649266,9918.3,228.972,769.678339,42.53838897,0.077601841,Speedway,42.2,5.359988068,3.361451789,44
2/29/2012,406.3,9.518,3.759,35.778162,Regular,42.6875394,0.088058484,10324.6,238.49,805.456501,42.54434148,0.078013337,Shell,42.9,0.212460601,3.377317711,54.1
3/14/2012,370.6,9.812,3.699,36.294588,Regular,37.77007746,0.097934668,10695.2,248.302,841.751089,42.35567978,0.078703632,UDF,40.5,2.729922544,3.390029436,63.6
3/23/2012,357.6,7.999,3.929,31.428071,Regular,44.7055882,0.087886105,11052.8,256.301,873.17916,42.429019,0.07900072,Shell,43.1,1.605588199,3.406850383,66
4/3/2012,252.5,4.57,3.849,17.58993,Regular,55.25164114,0.069663089,11305.3,260.871,890.76909,42.65364874,0.078792167,Meijer,41.9,13.35164114,3.414596065,58.6
4/13/2012,382.3,9.416,3.629,34.170664,Regular,40.6011045,0.089381805,11687.6,270.287,924.939754,42.58214417,0.079138553,Shell,44.3,3.698895497,3.422065264,51.2
4/24/2012,393.7,9.018,3.659,32.996862,Regular,43.65713018,0.083812197,12081.3,279.305,957.936616,42.61685254,0.079290856,UDF,43.3,0.357130184,3.429715243,49.2
5/7/2012,354.7,9.203,3.729,34.317987,Regular,38.54177985,0.096752148,12436,288.508,992.254603,42.48686345,0.079788887,Speedway,40.6,2.058220146,3.439262007,70.3
5/18/2012,378,9.505,3.699,35.158995,Regular,39.76854287,0.093013214,12814,298.013,1027.413598,42.40016375,0.080178992,Speedway,42.2,2.431457128,3.447546241,62.2
6/1/2012,381.5,9.781,3.699,36.179919,Regular,39.0041918,0.094835961,13195.5,307.794,1063.593517,42.29224741,0.080602745,Sunoco,41,1.9958082,3.455536875,61.4
6/12/2012,386.8,8.976,3.649,32.753424,Regular,43.09269162,0.084677932,13582.3,316.77,1096.346941,42.31492881,0.080718799,Meijer,44.1,1.007308378,3.46101885,75.5
6/23/2012,379.9,9.168,3.339,30.611952,Regular,41.43760908,0.080578973,13962.2,325.938,1126.958893,42.29025152,0.080714994,Kroger,41.8,0.362390925,3.457586697,74.4
7/8/2012,321.9,8.285,3.549,29.403465,Regular,38.85334943,0.091343476,14284.1,334.223,1156.362358,42.20505471,0.080954513,Shell,40.9,2.046650573,3.459852727,84.1
7/21/2012,369.5,8.88,3.479,30.89352,Regular,41.61036036,0.083608985,14653.6,343.103,1187.255878,42.18966316,0.081021447,Meijer,42.6,0.98963964,3.460348286,70.1
7/21/2012,385,7.808,3.499,27.320192,Regular,49.30840164,0.070961538,15038.6,350.911,1214.57607,42.34805976,0.080763906,Speedway,48.5,0.808401639,3.461208312,70.1
7/26/2012,367.1,9.644,3.479,33.551476,Regular,38.06511821,0.091396012,15405.7,360.555,1248.127546,42.23350113,0.081017256,BP,44.2,6.134881792,3.461684198,82.5
8/12/2012,376.6,9.287,3.769,35.002703,Regular,40.55130828,0.09294398,15782.3,369.842,1283.130249,42.19126005,0.081301854,BP,42.3,1.74869172,3.46940112,66.4
8/24/2012,414.9,9.22,3.859,35.57998,Regular,45,0.085755556,16197.2,379.062,1318.710229,42.25957759,0.081415938,Speedway,44.6,0.4,3.478877411,76.5
9/9/2012,373.3,8.984,3.799,34.130216,Regular,41.55164737,0.091428385,16570.5,388.046,1352.840445,42.24318766,0.081641498,Speedway,42.8,1.248352627,3.486288855,62.1
9/19/2012,408.1,9.123,3.799,34.658277,Regular,44.73309218,0.084925942,16978.6,397.169,1387.498722,42.30038095,0.081720443,BP,46.5,1.766907815,3.493471852,53.5
9/28/2012,408.3,9.281,3.659,33.959179,Regular,43.99310419,0.083172126,17386.9,406.45,1421.457901,42.33903309,0.081754534,BP,45.6,1.606895809,3.497251571,59
10/7/2012,393.1,8.942,3.699,33.076458,Regular,43.96108253,0.084142605,17780,415.392,1454.534359,42.37395039,0.081807332,Speedway,46.3,2.338917468,3.50159454,45
10/15/2012,402.9,9.075,3.549,32.207175,Regular,44.39669421,0.079938384,18182.9,424.467,1486.741534,42.41719615,0.081765919,Speedway,46.1,1.703305785,3.502608057,54.6
10/24/2012,365.7,8.264,3.299,27.262936,Regular,44.25217812,0.074550003,18548.6,432.731,1514.00447,42.45223938,0.081623652,Speedway,46.8,2.547821878,3.49871969,68.5
11/4/2012,363.3,9.561,3.259,31.159299,Regular,37.99811735,0.085767407,18911.9,442.292,1545.163769,42.35595489,0.081703254,Meijer,42,4.001882648,3.493537683,37.3
11/15/2012,391.9,10.224,3.499,35.773776,Regular,38.33137715,0.091282919,19303.8,452.516,1580.937545,42.26502488,0.081897737,Speedway,44.1,5.768622848,3.493661097,33.7
11/24/2012,430.2,9.068,3.579,32.454372,Regular,47.44155271,0.075440195,19734,461.584,1613.391917,42.36671982,0.081756963,BP,44.3,3.141552713,3.495337614,29.5
12/2/2012,394.5,9.146,3.239,29.623894,Regular,43.13361032,0.075092253,20128.5,470.73,1643.015811,42.38162004,0.081626341,Sunoco,45.8,2.666389679,3.490357128,55.1
12/12/2012,386.1,9.312,3.169,29.509728,Regular,41.46262887,0.076430272,20514.6,480.042,1672.525539,42.36379317,0.081528547,Speedway,43.4,1.937371134,3.484123345,31
12/23/2012,359.8,8.642,3.199,27.645758,Regular,41.63388105,0.076836459,20874.4,488.684,1700.171297,42.35088523,0.081447673,Speedway,42.4,0.766118954,3.479081159,30.7
1/6/2013,336.4,8.878,3.079,27.335362,Regular,37.89141699,0.081258508,21210.8,497.562,1727.506659,42.27131493,0.081444672,Meijer,41,3.108583014,3.47194251,33.2
1/21/2013,350,9.257,3.259,30.168563,Regular,37.80922545,0.086195894,21560.8,506.819,1757.675222,42.1898153,0.0815218,Meijer,40.6,2.790774549,3.468053135,20.8
2/1/2013,335.7,9.058,3.499,31.693942,Regular,37.0611614,0.094411504,21896.5,515.877,1789.369164,42.09976409,0.081719415,Meijer,38.7,1.638838596,3.468596514,12.1
2/13/2013,360.9,9.42,3.759,35.40978,Regular,38.31210191,0.098115212,22257.4,525.297,1824.778944,42.03184103,0.08198527,Speedway,41.4,3.087898089,3.473804236,31
2/26/2013,371.3,9.081,3.899,35.406819,Regular,40.88756745,0.09535906,22628.7,534.378,1860.185763,42.01239572,0.082204712,Meijer,42.2,1.312432551,3.481029838,36.9
3/9/2013,362.6,8.952,3.439,30.785928,Regular,40.5049151,0.084903276,22991.3,543.33,1890.971691,41.98755821,0.082247271,BP,42.7,2.195084897,3.480337347,36.5
3/21/2013,375.3,8.991,3.859,34.696269,Regular,41.74174174,0.092449424,23366.6,552.321,1925.66796,41.98355666,0.082411132,Kroger,44,2.258258258,3.486501437,23.8
4/8/2013,361.7,9,3.299,29.691,Regular,40.18888889,0.082087365,23728.3,561.321,1955.35896,41.95478167,0.082406197,Speedway,43.4,3.211111111,3.483495112,61.8
4/20/2013,362.3,8.036,3.699,29.725164,Regular,45.08461921,0.082045719,24090.6,569.357,1985.084124,41.99895672,0.082400776,BP,45.6,0.515380786,3.486536784,39
4/30/2013,382.3,8.246,3.539,29.182594,Regular,46.36187242,0.076334277,24472.9,577.603,2014.266718,42.06124276,0.082306009,Speedway,48.7,2.338127577,3.487285762,60.2
5/9/2013,397.3,8.722,3.339,29.122758,Regular,45.55147902,0.073301681,24870.2,586.325,2043.389476,42.1131625,0.082162165,Pilot,47.4,1.848520981,3.485079906,65.8
5/18/2013,399,9.051,3.899,35.289849,Regular,44.08352668,0.088445737,25269.2,595.376,2078.679325,42.14311628,0.082261382,Kroger,45.7,1.616473318,3.491372385,68.3
5/30/2013,380.2,9.04,3.659,33.07736,Regular,42.05752212,0.086999895,25649.4,604.416,2111.756685,42.14183609,0.082331621,Sunoco,44.4,2.342477876,3.493879522,78.2
6/14/2013,395.3,9.095,3.759,34.188105,Regular,43.46344145,0.086486479,26044.7,613.511,2145.94479,42.16142824,0.082394683,Meijer,45,1.536558549,3.497809803,67.6
6/22/2013,390.3,9.008,3.559,32.059472,Regular,43.32815275,0.082140589,26435,622.519,2178.004262,42.17831102,0.082390931,BP,44.3,0.971847247,3.49869524,78.2
7/4/2013,388.9,9.501,3.399,32.293899,Regular,40.93253342,0.083039082,26823.9,632.02,2210.298161,42.15958356,0.082400328,BP,43.7,2.767466582,3.497196546,71.6
7/18/2013,399.8,9.06,3.299,29.88894,Regular,44.12803532,0.07475973,27223.7,641.08,2240.187101,42.18740251,0.08228812,Speedway,45.2,1.07196468,3.494395553,83.9
8/25/2013,394.3,9.114,3.529,32.163306,Regular,43.2631117,0.081570647,27618,650.194,2272.350407,42.20248111,0.082277877,Kroger,45.8,2.536888304,3.494880616,74.6
9/5/2013,413.7,9.507,3.519,33.455133,Regular,43.51530451,0.0808681,28031.7,659.701,2305.80554,42.2214003,0.082257071,Speedway,46,2.484695488,3.495228202,70.2
9/14/2013,431.2,9.272,3.299,30.588328,Regular,46.50560828,0.070937681,28462.9,668.973,2336.393868,42.28077964,0.082085587,UDF,46.7,0.194391717,3.492508469,55.1
9/25/2013,417.6,9.685,3.159,30.594915,Regular,43.11822406,0.073263685,28880.5,678.658,2366.988783,42.29273065,0.081958026,Meijer,48.1,4.981775942,3.487749033,61.3
10/11/2013,421.9,9.202,3.299,30.357398,Regular,45.84872854,0.071954013,29302.4,687.86,2397.346181,42.34030181,0.081813987,Kroger,45.7,0.148728537,3.485224001,62.7
10/23/2013,389,8.975,3.259,29.249525,Regular,43.34261838,0.075191581,29691.4,696.835,2426.595706,42.35321131,0.081727224,Meijer,45.9,2.557381616,3.482310312,39.6
11/2/2013,392.8,8.852,3.299,29.202748,Regular,44.37415273,0.074345081,30084.2,705.687,2455.798454,42.3785616,0.081630838,Meijer,44.8,0.425847266,3.480010903,49.7
11/12/2013,363.5,9.114,2.959,26.968326,Regular,39.88369541,0.074190718,30447.7,714.801,2482.76678,42.34675105,0.081542014,Valero,44,4.116304586,3.473367804,31.4
11/24/2013,375.5,9.123,3.199,29.184477,Regular,41.15970624,0.077721643,30823.2,723.924,2511.951257,42.33179174,0.081495473,UDF,42.6,1.440293763,3.46991018,21.1
12/2/2013,364,9.006,2.999,27.008994,Regular,40.41749944,0.074200533,31187.2,732.93,2538.960251,42.30826955,0.08141033,Meijer,41.1,0.682500555,3.464123792,38.9
12/12/2013,325.8,8.576,2.979,25.547904,Regular,37.98973881,0.078415912,31513,741.506,2564.508155,42.25832293,0.081379372,Murphy,39.5,1.510261194,3.458513019,13.8
1/7/2014,317.1,8.915,3.199,28.519085,Regular,35.56926528,0.089937196,31830.1,750.421,2593.02724,42.17885693,0.081464628,Kroger,38.5,2.930734717,3.455430005,-3.6
1/15/2014,359.5,9.252,3.299,30.522348,Regular,38.85646347,0.08490222,32189.6,759.673,2623.549588,42.13839376,0.081503019,Meijer,41.1,2.243536533,3.453524856,28.6
1/27/2014,302.7,8.89,3.249,28.88361,Regular,34.04949381,0.095419921,32492.3,768.563,2652.433198,42.04482912,0.08163267,BP,35.8,1.750506187,3.451159109,37.1
2/4/2014,346.7,8.983,3.279,29.455257,Regular,38.59512412,0.084958918,32839,777.546,2681.888455,42.00497463,0.081667787,UDF,40,1.404875877,3.449170152,22.4
2/16/2014,310.1,8.773,3.459,30.345807,Regular,35.34708766,0.097858133,33149.1,786.319,2712.234262,41.93069225,0.081819243,Speedway,37.7,2.352912345,3.449279824,23.3
3/1/2014,361.8,9.065,3.599,32.624935,Regular,39.91174848,0.09017395,33510.9,795.384,2744.859197,41.90768233,0.081909444,Speedway,42.2,2.288251517,3.450986187,35.1
3/17/2014,354.2,9.356,3.579,33.485124,Regular,37.858059,0.094537335,33865.1,804.74,2778.344321,41.86060094,0.082041521,Speedway,41.9,4.041941,3.45247449,26.3
3/28/2014,354.1,9.165,3.579,32.801535,Regular,38.63611566,0.092633536,34219.2,813.905,2811.145856,41.82429153,0.082151127,UDF,39.8,1.163884343,3.453899234,51.9
4/8/2014,371.5,9.164,3.549,32.523036,Regular,40.53906591,0.087545184,34590.7,823.069,2843.668892,41.80998191,0.082209059,UDF,41.7,1.16093409,3.45495808,49.7
4/21/2014,373.8,9.216,3.679,33.905664,Regular,40.55989583,0.090705361,34964.5,832.285,2877.574556,41.79613954,0.082299891,Shell,42.2,1.640104167,3.457438925,64.1
5/2/2014,391.9,8.834,3.599,31.793566,Regular,44.36268961,0.081126731,35356.4,841.119,2909.368122,41.82309519,0.082286888,Speedway,44.8,0.437310392,3.458925695,50.9
5/10/2014,375.1,8.854,3.659,32.396786,Regular,42.36503275,0.086368398,35731.5,849.973,2941.764908,41.82874044,0.082329734,Speedway,45.8,3.434967246,3.46100983,65.5
5/21/2014,401.1,9.094,3.659,33.274946,Regular,44.10600396,0.082959227,36132.6,859.067,2975.039854,41.85284733,0.082336722,Speedway,45.6,1.493996041,3.463105734,72.3
6/6/2014,435.3,9.487,3.599,34.143713,Regular,45.88384105,0.0784372,36567.9,868.554,3009.183567,41.89687688,0.082290303,Speedway,50.5,4.616158954,3.464590074,67.5
6/21/2014,458.4,9.286,3.799,35.277514,Regular,49.36463493,0.076957928,37026.3,877.84,3044.461081,41.9758726,0.082224286,Kroger,49.6,0.235365066,3.468127541,73.8
7/5/2014,386.8,9.292,3.029,28.145468,Regular,41.6272062,0.072764912,37413.1,887.132,3072.606549,41.97222059,0.082126489,Speedway,44.5,2.872793801,3.463528031,69.2
7/19/2014,433.1,8.961,3.499,31.354539,Regular,48.33165941,0.072395611,37846.2,896.093,3103.961088,42.03581548,0.082015132,Kroger,48.3,0.031659413,3.463882753,66.7
8/6/2014,401.4,9.055,3.399,30.777945,Regular,44.32909994,0.076676495,38247.6,905.148,3134.739033,42.05875724,0.081959104,Speedway,47.6,3.270900055,3.463233673,73.1
8/25/2014,414.1,9.001,3.039,27.354039,Regular,46.00599933,0.066056602,38661.7,914.149,3162.093072,42.09762304,0.081788775,Speedway,46.9,0.894000667,3.459056535,78.2
9/15/2014,406.2,9.094,2.959,26.909146,Regular,44.66681328,0.066246051,39067.9,923.243,3189.002218,42.12292972,0.081627173,Kroger,47.1,2.433186717,3.454130947,59.5
9/30/2014,396.3,9.129,3.189,29.112381,Regular,43.41110746,0.073460462,39464.2,932.372,3218.114599,42.13554247,0.081545162,Kroger,46.7,3.28889254,3.451535009,62
10/22/2014,397.7,9.328,2.859,26.668752,Regular,42.63507719,0.06705746,39861.9,941.7,3244.783351,42.1404906,0.081400619,UDF,45.1,2.464922813,3.445665659,46.9
11/5/2014,413.2,9.262,2.879,26.665298,Regular,44.61239473,0.064533635,40275.1,950.962,3271.448649,42.16456599,0.081227574,UDF,46,1.387605269,3.440146556,50
11/17/2014,398.9,9.081,2.899,26.325819,Regular,43.9268803,0.065996037,40674,960.043,3297.774468,42.18123563,0.081078194,Speedway,45.2,1.2731197,3.435027877,28.6
11/25/2014,345.8,9.003,2.899,26.099697,Regular,38.40941908,0.075476278,41019.8,969.046,3323.874165,42.14619327,0.08103097,UDF,40.7,2.290580917,3.430047867,36.7
12/7/2014,345.6,8.738,2.139,18.690582,Regular,39.55138476,0.054081545,41365.4,977.784,3342.564747,42.12300467,0.080805812,Speedway,41.6,2.048615244,3.418510373,33
12/30/2014,360.8,9.013,1.869,16.845297,Regular,40.03106624,0.046688739,41726.2,986.797,3359.410044,42.10389776,0.080510807,Kroger,42.2,2.168933762,3.40435778,25.4
2/2/2015,338.8,8.725,2.059,17.964775,Regular,38.83094556,0.05302472,42065,995.522,3377.374819,42.0752128,0.080289429,Speedway,41.1,2.269054441,3.392566733,25.9
2/12/2015,321.7,8.765,2.359,20.676635,Regular,36.70279521,0.064273034,42386.7,1004.287,3398.051454,42.02832457,0.08016787,Speedway,39.2,2.497204792,3.383546191,26.3
3/3/2015,310.7,9.93,2.039,20.24727,Regular,31.28902316,0.065166624,42697.4,1014.217,3418.298724,41.92317818,0.080058709,AAFES,37.4,6.110976838,3.370382003,26.4
3/13/2015,408.5,9.404,2.199,20.679396,Regular,43.43896214,0.050622756,43105.9,1023.621,3438.97812,41.93710367,0.079779755,Kroger,42.7,0.738962144,3.359620524,46.1
3/22/2015,396.5,9.051,2.339,21.170289,Regular,43.80731411,0.05339291,43502.4,1032.672,3460.148409,41.9534954,0.079539253,Speedway,45.9,2.092685891,3.35067515,40
3/30/2015,386.7,8.931,1.999,17.853069,Regular,43.29862277,0.04616775,43889.1,1041.603,3478.001478,41.9650289,0.079245222,Meijer,44.4,1.101377225,3.339085504,46.4
4/10/2015,414,8.905,2.399,21.363095,Regular,46.49073554,0.051601679,44303.1,1050.508,3499.364573,42.00339264,0.078986901,Kroger,48.3,1.809264458,3.331116539,61
4/19/2015,368.7,7.84,2.419,18.96496,Regular,47.02806122,0.051437375,44671.8,1058.348,3518.329533,42.04061424,0.07875952,Shell,48.4,1.371938776,3.324359788,62.5
4/28/2015,407.9,9.18,2.179,20.00322,Regular,44.4335512,0.049039519,45079.7,1067.528,3538.332753,42.06119184,0.078490601,Speedway,47.5,3.066448802,3.314510489,49.3
5/10/2015,425.1,9.235,2.499,23.078265,Regular,46.03140227,0.054289026,45504.8,1076.763,3561.411018,42.09524287,0.078264513,Kroger,47.7,1.668597726,3.307516155,74.9
5/19/2015,436.6,9.161,2.629,24.084269,Regular,47.65855256,0.055163236,45941.4,1085.924,3585.495287,42.1421757,0.078044972,BP,49.1,1.44144744,3.301792102,62.9
5/28/2015,399.1,8.503,2.299,19.548397,Regular,46.9363754,0.0489812,46340.5,1094.427,3605.043684,42.17942357,0.077794665,UDF,49,2.063624603,3.294001047,72.9
6/9/2015,416.6,8.858,2.639,23.376262,Regular,47.03093249,0.056112007,46757.1,1103.285,3628.419946,42.21837513,0.077601475,Kroger,48.4,1.36906751,3.288742207,65.5
7/9/2015,419.6,8.917,2.389,21.302713,Regular,47.05618482,0.050769097,47176.7,1112.202,3649.722659,42.25716192,0.077362822,BP,49.4,2.343815184,3.281528588,73.1
7/30/2015,433.9,9.361,2.499,23.393139,Regular,46.35188548,0.053913664,47610.6,1121.563,3673.115798,42.29133807,0.077149118,UDF,48.9,2.548114518,3.274997301,76.2
8/12/2015,410.8,8.774,2.699,23.681026,Regular,46.82015044,0.05764612,48021.4,1130.337,3696.796824,42.32649201,0.076982279,UDF,47.5,0.679849556,3.270526245,68.3
8/23/2015,397,8.841,2.059,18.203619,Regular,44.90442258,0.045852945,48418.4,1139.178,3715.000443,42.34649897,0.076727039,UDF,48.8,3.895577423,3.26112376,72.1
9/1/2015,435.8,9.6,1.999,19.1904,Regular,45.39583333,0.044034878,48854.2,1148.778,3734.190843,42.37198136,0.076435411,Kroger,49.6,4.204166667,3.250576563,75.5
9/12/2015,422.5,8.493,2.269,19.270617,Regular,49.74685035,0.045610928,49276.7,1157.271,3753.46146,42.42610417,0.076171121,Kroger,45.3,4.446850347,3.243372952,58.8
9/22/2015,391.3,8.491,1.799,15.275309,Regular,46.08408904,0.039037335,49668,1165.762,3768.736769,42.45274764,0.075878569,Speedway,48.3,2.215910965,3.232852648,63.3
10/1/2015,421.3,8.961,2.459,22.035099,Regular,47.01484209,0.052302632,50089.3,1174.723,3790.771868,42.48754813,0.075680272,Kroger,50.2,3.185157906,3.22694956,55.9
10/25/2015,412.4,10.057,1.079,10.851503,Regular,41.00626429,0.026313053,50501.7,1184.78,3801.623371,42.47497426,0.075277137,UDF,45.8,4.793735706,3.208716699,55.2
11/14/2015,445.4,9.047,1.979,17.904013,Regular,49.23178954,0.040197604,50947.1,1193.827,3819.527384,42.52617842,0.074970457,Kroger,45.5,3.731789543,3.199397722,38.2
11/24/2015,395.3,9.451,1.899,17.947449,Regular,41.82626177,0.045402097,51342.4,1203.278,3837.474833,42.52068101,0.074742802,Meijer,44.4,2.573738229,3.189183907,37.7
12/9/2015,381.4,9.291,1.469,13.648479,Regular,41.05047896,0.03578521,51723.8,1212.569,3851.123312,42.50941596,0.074455537,Speedway,43.8,2.749521042,3.176003437,46.7
12/18/2015,391,8.715,1.839,16.026885,Regular,44.86517499,0.040989476,52114.8,1221.284,3867.150197,42.5262265,0.074204452,Kroger,46.1,1.234825014,3.166462671,30.8
12/31/2015,356.6,8.754,1.999,17.499246,Regular,40.7356637,0.049072479,52471.4,1230.038,3884.649443,42.51348332,0.074033653,Speedway,43.2,2.464336303,3.158154011,33.4
1/8/2016,375.7,10.531,1.099,11.573569,Regular,35.67562435,0.030805347,52847.1,1240.569,3896.223012,42.45543779,0.073726335,UDF,43.2,7.524375653,3.140674168,38.4
1/17/2016,408.8,8.996,1.199,10.786204,Regular,45.44241885,0.026385039,53255.9,1249.565,3907.009216,42.47694198,0.073362937,Kroger,41.1,4.342418853,3.126695463,24
1/26/2016,326.8,8.83,1.799,15.88517,Regular,37.01019253,0.048608231,53582.7,1258.395,3922.894386,42.43858248,0.073211958,Kroger,39.9,2.889807475,3.11737919,39.6
2/3/2016,338.2,7.974,1.599,12.750426,Regular,42.41284174,0.037700846,53920.9,1266.369,3935.644812,42.4384204,0.072989227,UDF,44.1,1.687158264,3.107818347,53.7
2/10/2016,355.1,8.88,1.349,11.97912,Regular,39.98873874,0.033734497,54276,1275.249,3947.623932,42.42136242,0.072732403,UDF,43.3,3.311261261,3.095571086,16.5
2/17/2016,334.9,8.703,1.559,13.567977,Regular,38.48098357,0.040513517,54610.9,1283.952,3961.191909,42.39465338,0.072534822,UDF,39.6,1.119016431,3.08515576,31.7
2/26/2016,375.8,8.959,1.879,16.833961,Regular,41.94664583,0.044795,54986.7,1292.911,3978.02587,42.39154899,0.072345237,UDF,44.4,2.453354169,3.076797916,29.9
3/13/2016,385.7,8.732,1.959,17.105988,Regular,44.17086578,0.0443505,55372.4,1301.643,3995.131858,42.40348544,0.072150238,UDF,45,0.829134219,3.06929923,54.5
4/5/2016,402.6,9.241,1.959,18.103119,Regular,43.56671356,0.044965522,55775,1310.884,4013.234977,42.41168555,0.071954011,Kroger,45.9,2.333286441,3.061472241,30.9
4/14/2016,370.8,8.674,2.1139,18.3359686,Regular,42.74844362,0.049449754,56145.8,1319.558,4031.570946,42.4138992,0.071805388,UDF,44.5,1.751556375,3.055243457,47.7
4/28/2016,397.6,9.2,2.399,22.0708,Regular,43.2173913,0.05551006,56543.4,1328.758,4053.641746,42.41946239,0.071690803,UDF,44.7,1.482608696,3.050699786,55.8
5/7/2016,377,8.884,1.669,14.827396,Regular,42.43583971,0.039329963,56920.4,1337.642,4068.469142,42.41957116,0.071476468,Speedway,44.1,1.664160288,3.041523174,62.3
5/18/2016,389.2,9.253,2.459,22.753127,Regular,42.06203393,0.058461272,57309.6,1346.895,4091.222269,42.41711492,0.071388079,Kroger,44.8,2.737966065,3.037521313,56.2
5/24/2016,410.5,8.846,2.579,22.813834,Regular,46.40515487,0.055575722,57720.1,1355.741,4114.036103,42.44313626,0.071275623,UDF,47.1,0.694845128,3.034529532,68
6/28/2016,376.6,8.994,2.349,21.126906,Regular,41.87235935,0.05609906,58096.7,1364.735,4135.163009,42.43937468,0.071177244,UDF,42.9,1.027640649,3.030011694,74.6
7/13/2016,357.2,9.138,1.579,14.428902,Regular,39.08951631,0.040394462,58453.9,1373.873,4149.591911,42.41709387,0.070989137,Meijer,41.6,2.510483694,3.020360623,79
8/4/2016,358.8,9.236,1.919,17.723884,Regular,38.84798614,0.04939767,58812.7,1383.109,4167.315795,42.3932604,0.070857413,Kroger,40.7,1.852013859,3.013006057,79.6
8/12/2016,386.7,8.98,2.239,20.10622,Regular,43.0623608,0.051994363,59199.4,1392.089,4187.422015,42.39757659,0.070734197,Kroger,44.7,1.637639198,3.008013148,82.9
8/22/2016,367.9,8.752,2.339,20.470928,Regular,42.03610603,0.055642642,59567.3,1400.841,4207.892943,42.39531824,0.070640988,Meijer,43.3,1.263893967,3.003833371,66.6
8/30/2016,360.1,9.337,2.139,19.971843,Regular,38.56699154,0.055461936,59927.4,1410.178,4227.864786,42.36997032,0.070549778,UDF,41.1,2.533008461,2.998107179,76
9/12/2016,410.1,9.475,2.159,20.456525,Regular,43.2823219,0.049881797,60337.5,1419.653,4248.321311,42.3760595,0.070409303,Marathon,45.1,1.8176781,2.992506838,66
9/22/2016,395.8,9.273,2.189,20.298597,Regular,42.68305834,0.051284985,60733.3,1428.926,4268.619908,42.37805177,0.070284669,UDF,44.4,1.716941659,2.987292489,73.6
10/5/2016,379.5,9.097,1.699,15.455803,Regular,41.71704958,0.040726754,61112.8,1438.023,4284.075711,42.37387024,0.07010112,Speedway,43.7,1.982950423,2.979142691,67.3
10/7/2016,129.6,2.722,2.309,6.285098,Regular,47.61204996,0.048496127,61242.4,1440.745,4290.360809,42.38376673,0.0700554,Kroger,47.2,0.412049963,2.977876591,67.4
10/10/2016,400.1,8.569,2.219,19.014611,Regular,46.69156261,0.047524646,61642.5,1449.314,4309.37542,42.40923637,0.06990916,Kroger,48.8,2.108437391,2.973389769,53.2
10/20/2016,395.7,8.947,1.949,17.437703,Regular,44.22711523,0.044067988,62038.2,1458.261,4326.813123,42.42038977,0.069744337,Wal Mart,46.8,2.572884766,2.967104738,59.5
10/31/2016,395.9,9.247,2.099,19.409453,Regular,42.81388558,0.049026151,62434.1,1467.508,4346.222576,42.42286925,0.069612961,Meijer,45.6,2.786114415,2.961634673,51.2
11/10/2016,414.6,8.899,1.999,17.789101,Regular,46.58950444,0.042906659,62848.7,1476.407,4364.011677,42.44798352,0.069436785,Meijer,48.3,1.710495561,2.955832421,45
11/22/2016,366,9.225,2.599,23.975775,Premium,39.67479675,0.065507582,63214.7,1485.632,4387.987452,42.43076347,0.069414036,Meijer,43.6,3.925203252,2.953616677,30.8
12/6/2016,393.2,9.229,1.989,18.356481,Regular,42.60483259,0.046684845,63607.9,1494.861,4406.343933,42.43183814,0.069273533,BP,44.8,2.195167407,2.947661309,N/A
12/21/2016,334,8.855,2.259,20.003445,Regular,37.71880294,0.059890554,63941.9,1503.716,4426.347378,42.40408428,0.069224521,UDF,39.2,1.481197064,2.943605959,N/A
1/9/2017,332,8.847,2.429,21.489363,Regular,37.52684526,0.064726997,64273.9,1512.563,4447.836741,42.37555725,0.069201289,Speedway,39.6,2.073154742,2.940596022,N/A
One of the things I tried was
plot(factor(Gas.Source),MPG)
It's exactly what you'd expect. Some of the factor levels have very few (or one) observations, so rather than a box and whisker you just get a black line.
I understand this is exactly what I asked it to do, as some of those sources had very few observations. So what I'd like to do is efficiently remove the measurements associated with factor levels where there aren't enough observations to really produce a box and whisker...
I'm guessing I could do this by creating a new dataframe where I've used logical subscripting to select only those rows corresponding to a factor level that has a count that's greater than X....but I'm not sophisticated enough to figure that out yet.
Found what I was looking for here
Given that the original data was in a dataframe called mileage
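# count the observations at each factor level
tbl <- table(mileage$Gas.Source)
# keep rows whose level has more than 10 observations, then drop the unused levels
new.Mileage <- droplevels(mileage[mileage$Gas.Source %in% names(tbl)[tbl > 10], , drop = FALSE])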
new.Mileage now has only those rows where there were more than 10 observations at that factor level (i.e. from that gas source), and droplevels() removes the factor levels that are left empty.
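The same filtering can be tried on a small, self-contained example. This is only a sketch: the toy `mileage` data frame below is made up for illustration (it is not the data from the question), and the threshold of 10 is the one used above.

```r
# Toy stand-in for the mileage data frame (values are invented for illustration)
mileage <- data.frame(
  Gas.Source = factor(c(rep("Kroger", 12), rep("UDF", 11),
                        rep("BP", 1), rep("Marathon", 2))),
  MPG = c(seq(40, 45.5, by = 0.5), seq(41, 46, by = 0.5), 44.8, 45.1, 43.9)
)

# Count observations per gas source
counts <- table(mileage$Gas.Source)

# Names of the sources that have more than 10 observations
keep <- names(counts)[counts > 10]

# Subset the rows, then drop the factor levels that no longer occur
new.Mileage <- droplevels(mileage[mileage$Gas.Source %in% keep, , drop = FALSE])

levels(new.Mileage$Gas.Source)
nrow(new.Mileage)
```

Without droplevels(), the empty levels would stay in the factor and still show up as empty slots on the plot axis. With the filtered data, boxplot(MPG ~ Gas.Source, data = new.Mileage) draws one box per remaining source.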

R storing different columns in different vectors to compute conditional probabilities

I am completely new to R. I tried reading the reference and a couple of good introductions, but I am still quite confused.
I am hoping to do the following:
I have produced a .txt file that looks like the following:
area,energy
1.41155882174e-05,1.0914586287e-11
1.46893363946e-05,5.25011714434e-11
1.39244046855e-05,1.57904991488e-10
1.64155121046e-05,9.0815757601e-12
1.85202830392e-05,8.3207522281e-11
1.5256036289e-05,4.24756620609e-10
1.82107587343e-05,0.0
I have the following command to read the file in R:
tbl <- read.csv("foo.txt", header = TRUE)
producing:
> tbl
area energy
1 1.411559e-05 1.091459e-11
2 1.468934e-05 5.250117e-11
3 1.392440e-05 1.579050e-10
4 1.641551e-05 9.081576e-12
5 1.852028e-05 8.320752e-11
6 1.525604e-05 4.247566e-10
7 1.821076e-05 0.000000e+00
Now I want to store the two columns in two separate vectors, area and energy respectively.
I tried:
area <- c(tbl$first)
energy <- c(tbl$second)
but it does not seem to work.
I need two different vectors (which must include only the numerical data of each column) in order to do this:
> prob(energy, given = area), i.e. the conditional probability P(energy|area).
And then plot it. Can you help me please?
As @Ananda Mahto alluded to, the problem is in the way you are referring to columns.
To 'get' a column of a data frame in R, you have several options:
DataFrameName$ColumnName
DataFrameName[,ColumnNumber]
DataFrameName[["ColumnName"]]
So to get area, you would do:
tbl$area #or
tbl[,1] #or
tbl[["area"]]
With the first option generally being preferred (from what I've seen).
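As a small sketch of why the original attempt failed: the columns are named area and energy, so tbl$first and tbl$second refer to columns that do not exist. The snippet below rebuilds the first three rows of the example file inline (via the text argument of read.csv) so that it runs without foo.txt.

```r
# Rebuild part of the example table inline so the snippet is self-contained
tbl <- read.csv(text = "area,energy
1.41155882174e-05,1.0914586287e-11
1.46893363946e-05,5.25011714434e-11
1.39244046855e-05,1.57904991488e-10", header = TRUE)

# Refer to columns by their actual names
area <- tbl$area
energy <- tbl[["energy"]]

is.null(tbl$first)          # TRUE: there is no column called "first"
is.numeric(area)            # TRUE: a plain numeric vector
identical(area, tbl[, 1])   # TRUE: same column, selected by position
```

Wrapping the result in c() is not needed, since each of these already returns a plain numeric vector.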
Incidentally, for your 'end goal', you don't need to do any of this:
with(tbl, prob(energy, given = area))
does the trick.
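To illustrate what with() does, here is a minimal sketch using only base R. The prob call itself is not reproduced, because the post does not say which package it comes from; the ratio computed below is just a stand-in expression, and the data frame holds the first three rows of the example.

```r
# First three rows of the example data, entered directly
tbl <- data.frame(
  area   = c(1.41155882174e-05, 1.46893363946e-05, 1.39244046855e-05),
  energy = c(1.0914586287e-11, 5.25011714434e-11, 1.57904991488e-10)
)

# with() evaluates the expression using the data frame's columns as variables
ratio_with <- with(tbl, energy / area)

# Equivalent form with explicit column references
ratio_dollar <- tbl$energy / tbl$area

identical(ratio_with, ratio_dollar)  # TRUE
```

Any expression that uses the column names can go in the second argument, for example with(tbl, plot(area, energy)).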
