I have data in a csv file which I have imported to R. The data is in a test file available at http://www.cyclismo.org/tutorial/R/_static/trees91.csv
I have imported this using:
tree<-read.csv(file="trees91.csv",header=TRUE,sep=",");
I can then extract two rows as follows
m<-tree[1,4:28]
n<-tree[2,4:28]
I would then like to plot these two sets of data as a scatter graph. I am using the command:
plot(m,n)
However, this doesn't give me a scatter graph. Instead I get a 25x25 grid of small panels, each containing a small circle. The panels on the diagonal contain a number. The left-hand y-axis and top x-axis have the same labels (0.25, 0.20, 0.25, 25, 25, 0.10, 0.5, 0.4, 0.08, 0.15, 0.10, 0.10), whilst the other two axes have the labels (0.6, 0.08, 1.5, 0.6, 12, 0.15, 0.1, 0.8, 0.08, 0.04, 0.20, 0.08, 0.08). I have tried this both with and without a header row in the csv file (setting header=FALSE in the input command) and get the same problem.
Using the same approach but extracting two columns, I am able to plot a scatter graph, so I have no idea why R won't plot a scatter graph from rows in a csv file. This seems like a fairly basic thing to want to do.
Are you after this:
plot(unlist(m),unlist(n))
tree is a data frame, and so are m and n, since they are subsets of it. The default plot method for data frames plots each column against every other column, so you get the 25x25 grid of plots you saw. unlist() converts the one-row data frame to a vector, so you get the plotting behaviour you were expecting.
See:
?plot.default for what you want.
?plot.data.frame for what you're getting.
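A minimal sketch of the difference, using a small made-up data frame (tree_demo and its values are assumptions for illustration, not the trees91 data):

```r
# Hypothetical stand-in for the imported data: 3 rows, 5 numeric columns
tree_demo <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9),
                        d = c(10, 11, 12), e = c(13, 14, 15))

# Selecting a single row still returns a data frame, not a vector
m_demo <- tree_demo[1, 2:5]
n_demo <- tree_demo[2, 2:5]
is.data.frame(m_demo)

# unlist() flattens each one-row data frame into a (named) numeric vector
x_vals <- unlist(m_demo)
y_vals <- unlist(n_demo)

# With two plain vectors, plot() dispatches to plot.default: one scatter plot
plot(x_vals, y_vals)
```

The same idea should carry over to the real data by replacing the demo rows with tree[1,4:28] and tree[2,4:28].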
Related
My aim is to create a heatmap with a continuous legend. I tried a few approaches and ended up with pheatmap. The legend is great and the column names are OK, but the row names don't show. I'd also like the cells plotted as squares, not rectangles. Any ideas?
I'm trying to make a plot in R. My x-axis is a week number converted to factor and my y-axis is an amount.
When I run plot() instead of dots I get horizontal lines.
Why does this happen?
Here is a sample dataset:
df <- data.frame(fin_week=as.factor(seq(1,20, by =1)), amount=(rnorm(20)^2)*100)
plot(df)
Looking at the documentation, it's because the first column is a factor. plot(df) dispatches to plot.data.frame, which for a two-column data frame plots the second column against the first, so the method is chosen by the type of the first column, i.e. a factor. Hence it ends up in plot.factor(), which draws box plots when y is numeric. With only one value per week, each box collapses to a single horizontal line.
Try plot.default(df) instead and you should get the scatter plot.
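An alternative sketch, assuming the week numbers should be treated as numbers rather than categories: convert the factor back to numeric before plotting. The names df_demo and week_num are made up for illustration, and set.seed() is only there to make the random sample repeatable.

```r
set.seed(1)
df_demo <- data.frame(fin_week = as.factor(seq(1, 20, by = 1)),
                      amount = (rnorm(20)^2) * 100)

# Go through as.character() so the level labels are converted,
# not the internal integer codes of the factor
week_num <- as.numeric(as.character(df_demo$fin_week))

# Two numeric vectors give an ordinary scatter plot
plot(week_num, df_demo$amount, xlab = "fin_week", ylab = "amount")
```

Going through as.character() matters when the factor levels are not simply 1, 2, 3, ... in order, because as.numeric() on a factor returns the level codes rather than the labels.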
For my thesis I want to create a histogram of standardized earnings. This histogram should ideally have the following properties:
It should be possible to adjust the intervals (bins) of the data.
Since my data is in a spreadsheet, is it possible to consider more than one column?
It should also be possible to set the range of the data included in the histogram, for example from -50 mio. to 200 mio. (but I could do this in my input).
Sadly, I was not able to accomplish this on my own.
I downloaded the data from Orbis as a spreadsheet (xlsx). Afterwards I cleaned my data of symbols that R can't read, saved everything as a tab-separated .txt file and imported it into RStudio:
setwd("/path")
getwd()
df<- read.table("importFile", header = TRUE)
View(df)
This worked nicely.
I then tried creating the histogram:
library(ggplot2)
myplot=ggplot(df, aes(JuStandartisiert2007))
myplot+ stat_count(width = 1000)
Then I received the following warning:
position_stack requires non-overlapping x intervals
My histogram looks horrible:
This perplexes me: I tried making a histogram of the airquality dataset and it works without problems.
Also note that I have to use stat_count for my histogram, whereas in a YouTube video I saw, they did it the following way:
myplot+ geom_histogram(binwidth = 10)
My questions are now:
What is wrong with my data, and why do I have overlapping x values? To my naked eye my data looks the same as R's airquality dataset.
How can I separate my x values?
Can I set max and min values for the data that enters my histogram?
Can I consider more than one column in my dataset?
Here is my dataset as a tab-separated txt file.
https://www.dropbox.com/sh/jbscj6cftpcqaxh/AADglvv_xnG2wWN-o2SIrTwpa?dl=0
I would rather begin with base plotting, such as:
hist(df$JuStandartisiert2007,breaks=1000,xlim=c(-2,2))
Note that xlim sets the limits of the x-axis.
To plot two columns against each other, try:
plot(df$JuStandartisiert2007,df$BilanzsummeAktiva2007,xlim = c(-5,5),ylim=c(-1,1000))
Once again, note the x and y limits, set with xlim and ylim.
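As a rough sketch of the three wishes from the question (adjustable bins, a restricted data range, more than one column), still in base graphics: the data frame earnings_demo, its columns, the helper in_range and the chosen limits are all made up for illustration and would need to be swapped for the real columns and sensible limits.

```r
set.seed(42)
# Hypothetical stand-in for two columns of the imported data
earnings_demo <- data.frame(col_a = rnorm(500, sd = 2),
                            col_b = rnorm(500, mean = 1, sd = 3))

lower <- -5        # smallest value allowed into the histogram
upper <- 5         # largest value allowed into the histogram
bin_width <- 0.5   # change this to play with the bins

# Keep only non-missing values inside [lower, upper];
# hist() stops with an error if data fall outside explicit breaks
in_range <- function(v) v[!is.na(v) & v >= lower & v <= upper]

# Explicit break points give full control over the bins
bins <- seq(lower, upper, by = bin_width)

# One histogram per column, side by side
op <- par(mfrow = c(1, 2))
h_a <- hist(in_range(earnings_demo$col_a), breaks = bins,
            main = "col_a", xlab = "value")
h_b <- hist(in_range(earnings_demo$col_b), breaks = bins,
            main = "col_b", xlab = "value")
par(op)
```

Unlike xlim, which only zooms the axis, filtering the values first means the bins are computed only from the data inside the chosen range.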
I've successfully processed my data A and B and displayed them as barplots by factor using ggplot2. The problem occurs when I want to export my plots: the same data B produce an empty plot in basic R plotting and when exporting through .png (right); however, they produce "lines" when exporting to .pdf (right). I understand that the reason may be that my B data are not actually equal to 0, and the .pdf shows this by drawing "lines" instead of bars in the plots. However, I don't want to report them in this way. Equally, I don't want to use the free_y_scale, as the aim of the plot is mostly to communicate the difference in range of my values for A and B at a first glance.
How can I get rid of these "lines" added by .pdf? Thank you!
# export data
png("plot1.png")
pdf("plot1.pdf", height = 6, width = 7, family = "Helvetica")
dev.off()
I want to combine a time series of in situ values (line) with boxplots of estimated values for particular dates. I tried to follow the question "Add a line from different result to boxplot graph in ggplot2", but my dates are driving me crazy. Sometimes I only have in situ values for a date, sometimes only estimated values, and sometimes both.
I uploaded a sample of my data here:
http://www.file-upload.net/download-9942494/estimated.txt.html
http://www.file-upload.net/download-9942495/insitu.txt.html
How can I create a plot with both data sets that looks like this http://www.file-upload.net/download-9942496/desired_outputplot.png.html
in the end?
I got help and have a solution now:
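library(lattice)       # xyplot, panel.bwplot
library(latticeExtra)  # provides "+" for overlaying trellis objects
insitu <- read.table("insitu.txt",header=TRUE,colClasses=c("Date","numeric"))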
est <- read.table("estimated.txt",header=TRUE,colClasses=c("Date","numeric"))
insitu.plot <- xyplot(insitu~date_fname,data=insitu,type="l",
panel=function(x,y,...){panel.grid(); panel.xyplot(x,y,...)},xlab=list(label="Date",cex=2))
est.plot <- xyplot(estimated~date,data=est,panel=panel.bwplot,horizontal=FALSE)
both <- insitu.plot+est.plot
update(both,xlim=range(c(est$date,insitu$date_fname))+c(-1,1),ylim=range(c(est$estimated,insitu$insitu)))