I'm a total newbie with R, and I'm trying to create a histogram (with value and frequency as the axises) from a csv file (just one row of values). Any idea how I can do this?
I'm also an R newbie, and I ran into the same thing. I made two separate mistakes, actually, so I'll describe them both here.
Mistake 1: Passing a frequency table to hist(). Originally I was trying to pass a frequency table to hist() instead of passing in the raw data. One way to fix this is to use the rep() ("replicate") function to explode your frequency table back into a raw dataset, as described here:
Creating a histogram using aggregated data
Simple R (histogram) from counted csv file
Instead of that, though, I just decided to read in my original dataset instead of the frequency table.
Mistake 2: Wrong data type. My raw data CSV file contains two columns: hostname and bookings (idea is to count the number of bookings each host generated during some given time period). I read it into a table.
> tbl <- read.csv('bookingsdata.csv')
Then when I tried to generate a histogram off the second column, I did this:
> hist(tbl[2])
This gave me the "'x' must be numeric" error you mention in a comment. (It was trying to read the "bookings" column header in as a data value.)
This fixed it:
> hist(tbl$bookings)
You should really start to read some basic R manual...
CRAN offers a lot of them (look into the Manuals and Contributed sections)
In any case:
setwd("path/to/csv/file")
myvalues <- read.csv("filename.csv")
hist(myvalues, 100) # Example: 100 breaks, but you can specify them at will
See the manual pages for those functions for more help (accessible through ?read.table, ?read.csv and ?hist).
To plot the histogram, the values must be of numeric class i.e the data must be of numeric value. Here the value of x seems to be of some other class.
Run the following command and see:
sapply(myvalues[1,],class)
Related
So I'm trying to format my xls data in a way that the first row will be seen in R, but it won't be analysed as in this example: http://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData
When you open the exprs(bm) expression data in this the first row gives you the gene names, but these aren't e.g. being log transformed.
I formatted my own data into a similar table, but cannot figure out how to omit the first table from showing up in R and more importantly being used in calculations, which of course results in error codes all the way.
Hope that makes sense?
Cheers
I'm new with R, but doing my best..
I'm trying to create a histogram from data I got in a .csv file. Just imagine one column with 10.000 random numbers with a range from 1 to 5. I want to create a histogram that shows how many times 1 occurs, how many times 2 occurs, how many times 3 occurs, etc. (Up to 5).
Is this possible in any way? Or should I do this in Excel and then get the results from there into R to create the histogram? I don't seem to get any wiser from any of the video tutorials so far or any of the other questions asked on here..
Import data from csv into R first:
dat = read.csv("c:\\documents\\file.csv")
Assuming you have a column called "col" in your csv file that has your data, run this:
hist(dat$col)
If you need to know how many times each value occurs, a more precise way is to make a table:
table(dat$col)
I have this csv with 4000+ entries and I am trying to create a histogram of one of the variables. Because of the way the data was collected, there was a possibility that if data was uncollectable for that entry, it was coded as a period (.). I still want to create a histogram and just ignore that specific entry.
What would be the best or easiest way to go about this?
I tried making it so that the histogram would only use the data for every entry except the one with the period by doing
newlist <- data1$var[1:3722]+data1$var[3724:4282]
where 3723 is the entry with the period, but R said that + is not meaningful for factors. I'm not sure if I went about this the right way, my intention was to create a vector or list or table conjoining those two subsets above into one bigger list called newlist.
Your problem is deeper that you realize. When R read in the data and saw the lone . it interpreted that column as a factor (categorical variable).
You need to either convert the factor back to a numeric variable (this is FAQ 7.10) or reread the data forcing it to read that column as numeric, if you are using read.table or one of the functions that calls read.table then you can set the colClasses argument to specify a numeric column.
Once the column of data is a numeric variable then a negative subscript or !is.na will work (or some functions will automatically ignore the missing value).
I have a data frame loaded using the CSV Library in R, like
mySheet <- read.csv("Table.csv", sep=";")
I now can print a summary on that mySheet object
summary(mySheet)
and it will show me a summary for each column, for example, one column named Diagnose has the unique values RCM, UCM, HCM and it shows the number of occurences of each of these values.
I now filter by a diagnose, like
subSheet <- mySheet[mySheet$Diagnose=='UCM',]
which seems to be working, when I just type subSheet in the console it will print only the rows where the value has been matched with 'UCM'
However, if I do a summary on that subSheet, like
summary(subSheet)
it still 'knows' about the other two possibilities RCM and HCM and prints those having a value of 0. However, I expected that the new created object will NOT know about the possible values of the original mySheet I initially loaded.
Is there any way to get rid of those other possible values after filtering? I also tried subset but this one just seems to be some kind of shortcut to '[' for the interactive mode... I also tried DROP=TRUE as option, but this one didn't change the game.
Totally mind squeezing :D Any help is highly appreciated!
What you are dealing with here are factors from reading the csv file. You can get subSheet to forget the missing factors with
subSheet$Diagnose <- droplevels(subSheet$Diagnose)
or
subSheet$Diagnose <- subSheet$Diagnose[ , drop=TRUE]
just before you do summary(subSheet).
Personally I dislike factors, as they cause me too many problems, and I only convert strings to factors when I really need to. So I would have started with something like
mySheet <- read.csv("Table.csv", sep=";", stringsAsFactors=FALSE)
I'm new at R, and I have to write commands to read a file containing real values and then compute and plot a histogram of distributions, using 100 subintervals.
I've been havin' some problems in using hist() function...
This is what I do for readin' data:
values = read.table("filepath.txt");
filepath.txt contains real values (2509.92, 615.41, 417.031, ... , 0.0516073, 0.023377, 0.00681471).
Then I've tried to follow these instructions ( http://msenux.redwoods.edu/math/R/hist.php ), but it did not work, because using method as.numeric(), the system thinks it's managin' integer data and all the values are set to 1.0
How could I do?
Thanks a lot!
If your "filepath.txt" is exactly as you show, it is a comma-separated file, and you need to specify such appropriately in your read.table call. That may be all you need to do.
The info on your referenced page has nothing to do with reading or converting data, so I'm not sure why you are asking about histogram generation when you know your source data is bad.
However, I'm not sure because your question is a little imprecise: there's no such thing as "the system." If you can provide the exact R code you are using to read the data file, and clarify whether "all values are set to 1.0" means the values in your variable values or all the data in the output of hist we can guide you further.