Multiple boxplots in R created sequentially (preferably with base)

My data is (sort of) like this:
thing value
a 100
a 101
. .
a 99
b 201
b 202
. .
b 199
I want to compare the median of the values by thing. Normally I would make a boxplot with something like boxplot(value ~ thing, data = table), but there is a catch.
The problem
The number of values of each thing is about 100,000. Consequently I have had to process my table in sections. I process all values for thing a, then all values for thing b, etc.
Is there a way to make a boxplot and then add more boxes to it, the way you can draw a plot and then add points to it? I want to use the base plotting system if possible (to keep the plots consistent).
However, if that isn't possible, I guess ggplot2 might be the way to go.
I should add, what I'm hoping to achieve is something like the image here:
http://www.statmethods.net/graphs/images/boxplot1.jpg
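One possible base-R approach, sketched under assumptions: summarize each chunk with boxplot.stats() as it is processed, keep only those summaries, and draw all boxes at the end with bxp(). The helper get_values() and the simulated numbers are hypothetical stand-ins for whatever loads one thing at a time.

```r
# Hypothetical stand-in for "process all values for one thing":
# returns simulated values centred on 100, 200, 300 for a, b, c.
get_values <- function(thing) {
  idx <- match(thing, letters)
  set.seed(idx)
  rnorm(1000, mean = 100 * idx)
}

things <- c("a", "b", "c")

# Summarize one chunk at a time; only the summaries stay in memory.
stats <- lapply(things, function(th) boxplot.stats(get_values(th)))
outliers <- lapply(stats, `[[`, "out")

# Assemble the list structure that bxp() expects (same shape boxplot() returns).
z <- list(
  stats = sapply(stats, `[[`, "stats"),   # 5 x k matrix of box statistics
  n     = sapply(stats, `[[`, "n"),
  conf  = sapply(stats, `[[`, "conf"),    # 2 x k matrix of notch limits
  out   = unlist(outliers),
  group = rep(seq_along(stats), lengths(outliers)),
  names = things
)

bxp(z, xlab = "thing", ylab = "value")
```

Because only five numbers plus the outliers are kept per thing, the full table never has to be in memory at once.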

Related

Give different color distribution for different columns in a data.frame

I tried to build a heat map for the cluster result of my data.frame. My data.frame has 5 columns with corresponding row names. I want to know whether I can give each column its own color scale, since the ranges of my 5 variables are so different: if I don't scale them, the "pheatmap" function in R produces a heat map with only one or two colors. I really don't want to scale the data, because I need the positive or negative sign of each data point to stay as it is. Here is the head of my data.frame, with the row names omitted.
r.Square_gamma_logLink cof_glm.gamma_logLink int_glm.gamma_logLink estimated_shape_logLink estimated_dispersion_logLink
0.2524970 0.002357581 8.685446 3.558583 0.2810107
0.5932941 0.002651972 9.486916 8.085618 0.1236764
0.3615135 -0.001646538 10.071672 6.195176 0.1614159
0.4131553 -0.002218262 10.563557 8.671028 0.1153266
0.3529775 -0.002336544 10.984005 4.569396 0.2188473
0.4169932 0.002213259 9.602592 5.216084 0.1917147
I did try pheatmap and the heatmap function, but neither was much use; the result looks pretty much like this.

R Plot Multiple Lines According to Choice of User in function()

I want to plot my data using function(). My data consists of 4 vectors, say a, b, c and d.
I have to plot whichever vectors are chosen at the time.
For example:
if I want to plot vectors a and c, the graph must have 2 lines;
if I want to plot all 4 vectors, there must be 4 lines in the graph.
So far I have tried switch(), but I don't think it suits what I'm doing.
Is it even possible to write such code in an anonymous function?
If yes, what is the right way, and if not, is there any workaround?
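A minimal sketch of one way this could be done, assuming the four vectors have equal length and are kept in a named list. The name plot_chosen() and the sample values are made up for illustration; the selection is done by indexing the list rather than with switch().

```r
# Hypothetical example vectors; all four are assumed to have the same length.
vectors <- list(
  a = c(1, 3, 2, 5),
  b = c(2, 2, 4, 4),
  c = c(5, 4, 3, 1),
  d = c(0, 1, 1, 2)
)

# plot_chosen() is an illustrative name, not an existing function.
plot_chosen <- function(vectors, choice) {
  chosen <- do.call(cbind, vectors[choice])  # one column per selected vector
  cols <- seq_len(ncol(chosen))
  matplot(chosen, type = "l", lty = 1, col = cols,
          xlab = "index", ylab = "value")
  legend("topleft", legend = colnames(chosen), col = cols, lty = 1)
  invisible(chosen)
}

plot_chosen(vectors, c("a", "c"))     # graph with two lines
plot_chosen(vectors, names(vectors))  # graph with four lines
```

The same body would also work as an anonymous function, since nothing in it depends on the function having a name.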

Making a histogram

This sounds pretty basic, but every time I try to make a histogram, R says x needs to be numeric. I've been looking everywhere but can't find anything relating to my problem. I have data with 240 observations of 5 variables:
Nipper length
Number of Whiskers
Crab Carapace
Sex
Estuary location
There are 3 locations, and I'm trying to make a histogram of nipper length.
I've tried making new factors and levels, with the 80 observations in each location, but it's not working.
Crabs.data <-read.table(pipe("pbpaste"),header = FALSE)##Mac
names(Crabs.data)<-c("Crab Identification","Estuary Location","Sex","Crab Carapace","Length of Nipper","Number of Whiskers")
Crabs.data<-Crabs.data[,-1]
attach(Crabs.data)
hist(`Length of Nipper`~`Estuary Location`)
Error in hist.default(Length of Nipper ~ Estuary Location) :
'x' must be numeric
I get this error instead of the histogram I expected.
hist() does not accept a formula; it needs a single numeric vector.
I think you'd have the best luck subsetting the data, that is, making a vector of nipper lengths for all crabs in a given estuary.
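crabs.data <- read.table("whatever you're calling it")  # placeholder: read your table as before
# set names(crabs.data) and drop the ID column as in your own code
# "Location" below is a placeholder for one of your three estuary names
Estuary1 <- as.vector(unlist(subset(crabs.data, `Estuary Location` == "Location", select = `Length of Nipper`)))
hist(Estuary1)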
Repeat the last two lines for your other two estuaries. You may not need the unlist() command, depending on your table. I've tended to need it for Excel files, but I don't know what format your table is in (that would've been helpful).
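As an alternative to repeating the subset by hand, split() can produce one vector per estuary in a single step. This is only a sketch: the data frame below is simulated (3 estuaries, 80 crabs each, invented values), and the short column names estuary and nipper are assumptions, not the names in the real table.

```r
# Simulated stand-in for the crab data; values and column names are invented.
set.seed(1)
crabs <- data.frame(
  estuary = rep(c("E1", "E2", "E3"), each = 80),
  nipper  = c(rnorm(80, 10), rnorm(80, 12), rnorm(80, 11))
)

# split() returns one numeric vector of nipper lengths per estuary.
by_estuary <- split(crabs$nipper, crabs$estuary)

# Draw the three histograms side by side on a shared x-axis range.
old_par <- par(mfrow = c(1, 3))
for (loc in names(by_estuary)) {
  hist(by_estuary[[loc]], main = loc, xlab = "Length of nipper",
       xlim = range(crabs$nipper))
}
par(old_par)
```

If hist() still complains that x must be numeric, it may be worth checking with str() whether the nipper column was read in as text rather than numbers.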

Making multiple graphs from individual lines in R, iterating through a dataset

I have a dataset that looks like this:
> averages
compound control.0 control.30 surgery.0 surgery.30
1. A 3.609958 3.578200086 3.556325 3.669107598
2. B 4.984090 4.798330495 4.965342 4.812247664
I want to make a graph for only compound A that plots two lines: one connecting (0, control.0) to (30, control.30) and one connecting (0, surgery.0) to (30, surgery.30). I also have 200 compounds, so ideally the program would go down the list and produce a graph for each compound without me manually changing the row each time. How would I go about doing this?
For two line segments per graph, use this:
with(subset(averages, compound=="A"), plot(c(0,30,NA,0,30),c(control.0,control.30,NA,surgery.0,surgery.30), type="l"))
Then loop over the compounds, changing the subset condition on each iteration.
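A sketch of what such a loop might look like, using only the two rows shown in the question. The output file name, the colors, and the axis labels are assumptions; each compound gets its own page in a single PDF.

```r
# The two rows shown in the question, typed in as a data frame.
averages <- data.frame(
  compound   = c("A", "B"),
  control.0  = c(3.609958, 4.984090),
  control.30 = c(3.578200086, 4.798330495),
  surgery.0  = c(3.556325, 4.965342),
  surgery.30 = c(3.669107598, 4.812247664)
)

out_file <- file.path(tempdir(), "compounds.pdf")  # hypothetical output file
pdf(out_file)                                      # one page per compound
for (i in seq_len(nrow(averages))) {
  row <- averages[i, ]
  y <- c(row$control.0, row$control.30, row$surgery.0, row$surgery.30)
  # Empty frame sized to this compound's values, then one line per condition.
  plot(c(0, 30), range(y), type = "n", xlab = "x (0 or 30)", ylab = "average",
       main = paste("Compound", row$compound))
  lines(c(0, 30), c(row$control.0, row$control.30), col = "blue")
  lines(c(0, 30), c(row$surgery.0, row$surgery.30), col = "red")
  legend("topright", legend = c("control", "surgery"),
         col = c("blue", "red"), lty = 1)
}
dev.off()
```

With 200 compounds the loop is unchanged; only the number of rows in averages differs.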

dealing with data table with redundant rows

The title is not very precise, but I could not come up with better words to summarize what I am going to ask.
I have a table of the following form:
value (0<v<1) # of events
0.5677 100000
0.5688 5000
0.1111 6000
... ...
0.5688 200000
0.1111 35000
Here are some of the things I would like to do with this table: draw the histogram, compute the mean value, fit the distribution, etc. So far, I have only figured out how to do this with vectors like
v=(0.5677,...,0.5688,...,0.1111,...)
but not with tables.
Since the number of possible values is huge (they are almost continuous), I guess building a new table would not be very efficient, so I would much prefer to do this without modifying the original table or creating another one. But if it has to be done that way, that's okay. Thanks in advance.
Appendix: What I want to figure out is how to treat this table as a usual data vector:
If I had the following vector representing the exact same data as above:
v= (0.5677, ...,0.5677 , 0.5688, ... 0.5688, 0.1111,....,0.1111,....)
------------------ ------------------ ------------------
(100000 times) (5000+200000 times) (6000+35000) times
then we would just need to apply basic functions like plot, mean, etc. to get what I want. I hope this makes my question clearer.
Your data consist of a value and a count for that value, so you are looking for functions that will use the count to weight the value. Type ?weighted.mean to get information on a function that will compute the mean for weighted (grouped) data. For density plots, use the weights= argument of the density() function, with the counts normalized so that the weights sum to 1. For the histogram, you just need to use cut() to combine values into a small number of groups and then use aggregate() to sum the counts for all the values in each group. You will find a variety of weighted statistical measures in package Hmisc (wtd.mean, wtd.var, wtd.quantile, etc).
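The three suggestions above might look roughly like this in base R. It is a sketch using only the rows visible in the question (the elided "..." rows are left out); the column names value and count and the bin width of 0.1 are assumptions.

```r
# The rows visible in the question; column names are assumed.
tab <- data.frame(
  value = c(0.5677, 0.5688, 0.1111, 0.5688, 0.1111),
  count = c(100000, 5000, 6000, 200000, 35000)
)

# Weighted mean: the same result as mean() of the fully expanded vector.
m <- weighted.mean(tab$value, tab$count)

# Weighted kernel density; the counts are normalized so the weights sum to 1.
d <- density(tab$value, weights = tab$count / sum(tab$count))
plot(d, main = "Weighted density")

# Histogram: bin the values with cut(), then sum the counts per bin.
tab$bin <- cut(tab$value, breaks = seq(0, 1, by = 0.1))
binned <- aggregate(count ~ bin, data = tab, FUN = sum)
barplot(binned$count, names.arg = as.character(binned$bin), las = 2,
        ylab = "events")
```

Repeated values such as 0.5688 need no special handling here, because every step sums or weights by count rather than assuming one row per value.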
