Making a histogram - r

This sounds pretty basic, but every time I try to make a histogram, R says that 'x' must be numeric. I've been looking everywhere but can't find an answer that relates to my problem. I have data with 240 observations of 5 variables:
Nipper length
Number of Whiskers
Crab Carapace
Sex
Estuary location
There are 3 locations, and I'm trying to make a histogram of nipper length for each of them.
I've tried making new factors and levels, with the 80 observations in each location, but it's not working.
Crabs.data <-read.table(pipe("pbpaste"),header = FALSE)##Mac
names(Crabs.data)<-c("Crab Identification","Estuary Location","Sex","Crab Carapace","Length of Nipper","Number of Whiskers")
Crabs.data<-Crabs.data[,-1]
attach(Crabs.data)
hist(`Length of Nipper`~`Estuary Location`)
Error in hist.default(Length of Nipper ~ Estuary Location) :
'x' must be numeric
I get this error instead of the histogram I expected.

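hist() doesn't accept a formula. It expects a single numeric vector, which is why it complains that 'x' must be numeric.
I think you'd have the best luck subsetting the data, that is, making a vector of nipper lengths for all crabs in a given estuary.
crabs.data <- read.table("whatever you're calling it")  # your own file or connection
names(crabs.data) <- c("Crab Identification", "Estuary Location", "Sex", "Crab Carapace", "Length of Nipper", "Number of Whiskers")  # as you have it
# "Location" stands in for the name of one of your three estuaries
Estuary1 <- as.vector(unlist(subset(crabs.data, `Estuary Location` == "Location", select = `Length of Nipper`)))
hist(Estuary1)
Repeat the last two lines for your other two estuaries. You may not need the unlist() command, depending on your table. I've tended to need it for data imported from Excel files, but I don't know what format your table is in (that would have been helpful to know).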
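If repeating the subsetting by hand for each estuary gets tedious, split() can produce all the vectors in one step. The sketch below is only an illustration: the data frame crabs and its columns estuary and nipper are made-up stand-ins, not the asker's actual table.

```r
# Hypothetical stand-in for the crab data: 3 estuaries, 80 crabs each.
set.seed(1)
crabs <- data.frame(
  estuary = rep(c("A", "B", "C"), each = 80),
  nipper  = rnorm(240, mean = 10, sd = 2)
)

# split() returns a named list with one numeric vector per estuary.
nipper_by_estuary <- split(crabs$nipper, crabs$estuary)

# Draw the three histograms side by side, one per estuary.
op <- par(mfrow = c(1, 3))
for (loc in names(nipper_by_estuary)) {
  hist(nipper_by_estuary[[loc]], main = loc, xlab = "Length of nipper")
}
par(op)  # restore the previous plotting layout
```

Each call to hist() receives a plain numeric vector, so the 'x' must be numeric error does not come up.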

Related

Get next level from a given level factor

I am currently making my first steps using R with RStudio and right now I am struggling with the following problem:
I got some test data about a marathon with four columns, where the third column is a factor with 15 levels representing different age classes.
One age class randomAgeClass will be randomly selected at the beginning, and an object is created holding the data that matches this age class.
set.seed(12345678)
attach(marathon)
randomAgeClass <- sample(levels(marathon[,3]), 1)
filteredMara <- subset(marathon, AgeClass == randomAgeClass)
My goal is to store a second object that holds the data matching the next higher level, meaning that if age class 'kids' was randomly selected, I now want to access the data relating to 'teenagers', which is the next higher level. Looking something like this:
nextAgeClass <- .... randomAgeClass+1 .... ?
filteredMaraAgeClass <- subset(marathon, AgeClass == nextAgeClass)
Note that I already found this StackOverflow question, which seems to partially match my situation, but the accepted answer is not understandable to me, thus I wasn't able to apply it to my needs.
Thanks a lot for any patient help!

First you have to make sure that the levels of your factor are ordered by age, and store the result back in the data frame:
marathon$AgeClass <- factor(marathon$AgeClass, levels = c("kids", "teenagers", etc.))  # list all 15 levels in age order
After that, you were almost there in your example:
next_pos <- which(levels(marathon$AgeClass) == randomAgeClass) + 1  # position of the next level in the level vector
nextAgeClass <- levels(marathon$AgeClass)[next_pos]
filteredMaraAgeClass <- subset(marathon, AgeClass == nextAgeClass)
You will have a problem if randomAgeClass is the last level: next_pos then points past the end of the level vector, nextAgeClass becomes NA, and subset() returns no rows. Make sure to handle that case.
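A minimal sketch of guarding against the last-level case follows. The marathon data frame here is a small made-up example with three age classes, and next_level() is an illustrative helper written for this sketch, not a base R function.

```r
# Hypothetical marathon data with the age classes already ordered by age.
marathon <- data.frame(
  AgeClass = factor(c("kids", "teenagers", "adults", "kids", "adults"),
                    levels = c("kids", "teenagers", "adults")),
  Time = c(310, 250, 205, 330, 198)
)

# Illustrative helper: return the level after `current`,
# or NA if `current` is unknown or already the last level.
next_level <- function(f, current) {
  lv  <- levels(f)
  pos <- match(current, lv)
  if (is.na(pos) || pos == length(lv)) {
    return(NA_character_)
  }
  lv[pos + 1]
}

nextAgeClass <- next_level(marathon$AgeClass, "kids")
if (!is.na(nextAgeClass)) {
  filteredMaraAgeClass <- subset(marathon, AgeClass == nextAgeClass)
}
```

Returning NA rather than stopping with an error leaves it to the caller to decide what "no next class" should mean.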

Create a simple range/values in range table

I am working on a simple project to help me get to know R, coming from JavaScript.
I have imported a list of numbers, and all I want to do is export a table that looks like the following:
"range","number"
"0.000-0.510",863
"0.510-1.020",21
"1.020-1.530",2
"1.530-2.040",2
"2.040-2.550",0
"2.550-3.059",2
"3.059-3.569",0
"3.569-4.079",3
"4.079-4.589",0
"4.589-5.099",1
where the values are split into 10 equal-width ranges from the smallest to the largest value, "range" and "number" are the column headers, and each row gives one range and the number of occurrences in that range.
This is my attempt so far:
list <- read.csv(file = "results/solarSystem.data")
table(list)
range <- (max(list) - min(list)) / 10
a1<-as.data.frame(table(cut(list,breaks=c(min(list),min(list)+1*range,min(list)+2*range,min(list)+3*range,min(list)+4*range,min(list)+5*range,min(list)+6*range,min(list)+7*range,min(list)+8*range,min(list)+9*range,max(list)))))
colnames(a1)<-c("range","freq")
a1
However, I get this error:
'Error in cut.default(list, breaks = c(min(list), min(list) + 1 * range...
'x' must be numeric'
This is the file I am importing. It looks like just a simple list of numbers, so I don't understand how it can fail to be numeric.
https://gyazo.com/8fd00ce45c1c033f9dc9bf6c829195eb
Any advice on this would be appreciated!
Peter
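One likely cause of the error is that read.csv() returns a data frame rather than a numeric vector, and cut() only accepts a numeric vector. The sketch below shows one possible way to build the table once a numeric vector is available. The vector values is made-up sample data standing in for the imported column, and the three-decimal label format is an assumption based on the desired output above.

```r
# Hypothetical values standing in for the single imported column,
# e.g. obtained with read.csv(...)[[1]].
values <- c(0.01, 0.02, 0.3, 0.45, 0.6, 1.1, 1.9, 2.7, 3.8, 5.099)

# Eleven break points give ten equal-width bins from min to max.
breaks <- seq(min(values), max(values), length.out = 11)

# include.lowest = TRUE keeps the minimum value inside the first bin.
bins <- cut(values, breaks = breaks, include.lowest = TRUE)

# Build "low-high" labels from consecutive break points.
labels <- sprintf("%.3f-%.3f", head(breaks, -1), tail(breaks, -1))

result <- data.frame(range = labels, number = as.vector(table(bins)))

# With no file argument, write.csv() prints the CSV text to the console.
write.csv(result, row.names = FALSE)
```

Using seq() with length.out avoids typing every break point by hand, and naming the vector values instead of list avoids masking the base function list().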

How to use Chao1? (BEGINNER)

I am using R for the first time. I am trying to use the Chao1 function to estimate the diversity of my dataset. I have 20 columns, one for each species, and 8 rows (nine if you include the header), one for each plot. Each cell has a number, which is the number of individuals of that species found in that plot. For example, in my Excel file, cell A2 has the value "8", which means that 8 individuals of Species1 were found in the first plot.
I have installed the fossil and vegan packages, where I believe the chao1 function is located. They are active in my library. I have imported my dataset as "speciesabund". I am now trying to run chao1. According to the description (https://artax.karlin.mff.cuni.cz/r-help/library/fossil/html/chao1.html) I'm supposed to type
chao1(x, taxa.row = TRUE)
I assumed "x" was meant to represent my dataset, so I tried
chao1(speciesabund, taxa.row = TRUE)
instead. It did not work and returned "Error: Unsupported use of matrix or array for column indexing." I assume this means that I need to do something more to my data before trying to use the chao1 function, is that correct? If so, how do I do this?
Thank you so much for your help! I am using this for the first time, so I'm sorry if my question is dumb.

How to compute for the mean and sd

I need help on 4b please
‘Warpbreaks’ is a built-in dataset in R. Load it using the function data(warpbreaks). It consists of the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. It has three variables namely, breaks, wool, and tension.
b. For the ‘AM.warpbreaks’ dataset, compute for the mean and the standard deviation of the breaks variable for those observations with breaks value not exceeding 30.
data(warpbreaks)
warpbreaks <- data.frame(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool=="A" & tension=="M")
mean(AM.warpbreaks<=30)
sd(AM.warpbreaks<=30)
This is how I understood the problem, and I typed the code in the last two lines accordingly. However, the last two lines fail while the first three run successfully. Can anybody tell me what the error is here?
Thanks! :)

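Another way to go about it is to index the breaks column directly instead of creating another subset.
This way you aren't generating a bunch of datasets and then trying to remember which is which. This is more a matter of personal preference, though.
data(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool == "A" & tension == "M")  # as in the question
mean(AM.warpbreaks[which(AM.warpbreaks$breaks <= 30), "breaks"])
sd(AM.warpbreaks[which(AM.warpbreaks$breaks <= 30), "breaks"])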

There are two problems with your code. The first is that you are comparing to 30, but you're comparing the entire data frame rather than just the "breaks" column.
AM.warpbreaks$breaks <= 30
is an expression that tests whether each breaks value is at most 30.
The second is that mean(AM.warpbreaks$breaks <= 30) will not give the answer you want either, because R evaluates the inner expression to a logical vector of TRUE/FALSE values, one per observation. Taking the mean of that vector gives the proportion of observations with breaks not exceeding 30, not the mean of their breaks.
Generally, you just want to take another subset for an analysis like this.
AM.lt.30 <- subset(AM.warpbreaks, breaks <= 30)
mean(AM.lt.30$breaks)
sd(AM.lt.30$breaks)
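The difference between the two calls can be seen side by side. This is a small sketch on the built-in warpbreaks data; the variable names am, share_le_30, kept, mean_le_30 and sd_le_30 are chosen here for illustration only.

```r
data(warpbreaks)
am <- subset(warpbreaks, wool == "A" & tension == "M")

# Mean of a logical vector: the share of rows with breaks <= 30,
# always a number between 0 and 1.
share_le_30 <- mean(am$breaks <= 30)

# Logical indexing keeps only the qualifying values before summarising.
kept <- am$breaks[am$breaks <= 30]
mean_le_30 <- mean(kept)
sd_le_30 <- sd(kept)

print(c(share = share_le_30, mean = mean_le_30, sd = sd_le_30))
```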

How to manage factors with mixed data types

I'm afraid this question has two sub parts. My project is to determine which insurance carrier has the lowest cost based on CPT Codes. Since there are so many CPT Codes I wanted to group them using cut like this:
uCPTCode<- unique(data$CPTCode)
uCPTCode <- cut(uCPTCode,
breaks = c(-Inf, "01999", "69979", "79999", "89398", "99091", "99499", Inf),
labels = c("NA","Anesthesia", "Surgery", "Radiology", "Pathology&Laboratory", "Medicine","Evaluation&Management", "Temp"),
right = FALSE)
I'm not sure unique() is required or wise, but it seemed to make sense to me. The issue is that some codes have leading zeros and trailing letters, like this:
2608 Levels: 0014F 0159T 0164T 0191T 0195T 0232T 0319T 0326T 0513F 0517F 0518F
So question 1 is: what is the process to convert these codes into integers that fall into the ranges and labels I have in the cut function, so that I can graph the grouped results on the x axis?
Question 2 is that I expected the ranges to be continuous, but they are not. How do I manage what happens around code 99000 through 99216, where previous groups (Medicine, Anesthesiology and Evaluation and Management) get combined? Here is a link to the CPT grouper file https://www.dropbox.com/s/wm55n17pufoacww/CPTGrouper.xlsx?dl=0
Here is a smattering of results to see where I am going with it
https://www.dropbox.com/s/h6sdnvm9yew6jdg/SampleStudyResults.xlsx?dl=0
Thanks very much for your time and attention
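For question 1, one possible approach is sketched below under several assumptions: the codes vector is made-up sample data, codes that are not purely five digits are simply set to NA, and the break points are the ones from the cut() call above, written as numbers. The groups keep cut()'s default interval labels, because the label vector in the question has eight entries while eight break points define only seven intervals, so it would need adjusting first.

```r
# Hypothetical sample of CPT codes stored as a factor, as in the question.
codes <- factor(c("00100", "0014F", "0159T", "27447", "71020", "99213"))

# Convert via character: as.integer() on a factor returns level indices.
code_chr <- as.character(codes)

# Only purely numeric five-digit codes are converted; the rest stay NA.
is_numeric_code <- grepl("^[0-9]{5}$", code_chr)
code_num <- rep(NA_integer_, length(code_chr))
code_num[is_numeric_code] <- as.integer(code_chr[is_numeric_code])

# Numeric break points taken from the question's cut() call.
group <- cut(code_num,
             breaks = c(-Inf, 1999, 69979, 79999, 89398, 99091, 99499, Inf),
             right = FALSE)

# Counts per group, with the alphanumeric codes reported as NA.
table(group, useNA = "ifany")
```

The alphanumeric codes would need a separate rule, which is not attempted here.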
