:)
I would like to ask about plotting in R. I would be very happy if someone could help!
I wrote an implementation of the Heston model in R. It produces a vector of option prices; let's call it H1.
Each option price (each element of H1) corresponds to one day. The vector is very long (3556 elements). I wanted to plot it and analyse the graph, so I used plot(.....). I then wanted dates on the x axis and option prices on the y axis, so I called axis(1, z) (where z is the vector containing all 3556 dates) and axis(2, H1) (where H1 contains all 3556 option prices).
The problem is that every date and every option price is drawn on the graph :/ It looks very bad, and nobody can read anything because of the huge number of labels on both axes. How can I reduce the number of dates and option prices shown? I mean, how can I label the axes at some interval?
If this is not clear, please let me know and I will send the whole code.
Thank you very much! :)
How about using the ggplot2 library for plotting along with plyr::ddply() and cut() to reduce it to 20 (or whatever) intervals?
prices<-10+runif(3000)*3+(1:3000)/3000
dates<-as.Date(1:3000,origin="1980-01-01")
df<-data.frame(prices,dates)
#required libraries
require(plyr) # library for ddply call
require(ggplot2)# plotting library
#call explained below
plotdata<-ddply(df,.(date_group=cut(dates,20)),summarise,avg_price=mean(prices))
ggplot(plotdata) + # base plot
geom_line(aes(date_group,avg_price,group=1)) + # line
geom_smooth(aes(date_group,avg_price,group=1)) + # smooth fit with CI
theme(axis.text.x = element_text(angle = 90, hjust = 1)) # rotate x axis labels
#explanation of ddply function
ddply( #call ddply function, takes a dataframe, returns a dataframe
df, #input data data.frame 'df'
.(date_group=cut(dates,20)), #summarise by cut() - cuts the date into 20 blocks
summarise, #tell to summarise
avg_price=mean(prices) #for each value of the cut (each group), average the prices
)
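If you would rather stay in base graphics, as in the question, one option is to suppress the default axes and draw ticks only at a chosen interval. This is a minimal sketch: z and H1 below are made-up stand-ins, since the real vectors are not shown in the question.

```r
# Hypothetical stand-ins for the question's dates (z) and option prices (H1)
set.seed(1)
z  <- as.Date("2000-01-01") + 0:3555
H1 <- 10 + cumsum(rnorm(3556, sd = 0.1))

# Draw the series without any axes, then add sparse ticks by hand
plot(z, H1, type = "l", xaxt = "n", yaxt = "n",
     xlab = "Date", ylab = "Option price")

x_ticks <- seq(min(z), max(z), by = "year")   # one tick per year
axis.Date(1, at = x_ticks, format = "%Y")

y_ticks <- pretty(H1, n = 8)                  # roughly 8 round price values
axis(2, at = y_ticks)
```

The key point is that axis() and axis.Date() draw a label for every value passed in `at`, so passing a short sequence instead of the full vectors keeps the axes readable.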
Related
I have a dataset that looks like this one, with the month (mese) in one column and the corresponding value in the other. I'm trying to create a heatmap with the months on the x axis, value intervals on the y axis (e.g. 0 to 10, 10 to 20, 20 to 30, etc.), and each cell showing how many values from that month fall in that interval.
I tried using the cut function on both x and y to create the ranges, then putting everything into a table and plotting it with this code:
x_c <- cut(x, 12)
y_c <- cut(y, 50)
z <- table(x_c, y_c)
image2D(z=z, border="black")
but it doesn't seem to work: the axis scale always runs from 0 to 1, and I need the actual values. Is there an easier solution?
Essentially, I need the end result to look something like this (sorry for my very poor Paint skills): the level of sulphate is higher in winter than in summer, and most of the data follow a "curve" that reflects this tendency.
You can use geom_bin2d from ggplot2. You can define the number of bins:
ggplot(data, aes(mese, nnso4)) +
geom_bin2d(bins=c(12,50)) +
scale_fill_gradient(low="yellow", high="red")
You can change the fill scale; for instance, the viridis package provides several alternatives.
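If you also need the counts themselves, a base R sketch is to cut the values with explicit breaks and tabulate them against the month. The column names mese and nnso4 are taken from the answer above; the simulated values and the 0 to 50 range are assumptions for illustration only.

```r
# Hypothetical data: month number and a sulphate-like measurement
set.seed(42)
mese  <- sample(1:12, 500, replace = TRUE)
nnso4 <- runif(500, 0, 50)

# Explicit breaks keep the real value ranges as bin labels
y_bin  <- cut(nnso4, breaks = seq(0, 50, by = 10), include.lowest = TRUE)
counts <- table(month = factor(mese, levels = 1:12), range = y_bin)

# image() with explicit x and y coordinates shows the actual axis values
z <- matrix(as.vector(counts), nrow = 12)
image(1:12, seq(5, 45, by = 10), z,
      xlab = "Month", ylab = "Value (bin midpoint)")
```

Passing the x and y coordinates to image() explicitly is what avoids the 0 to 1 scale, which is the default when only a matrix is supplied.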
Data:
data = data.frame(rnorm(250, 90, sd = 30))
I want to create a histogram with bins of fixed width, except that all observations above one arbitrary number or below another are grouped into their own bins. Taking the data above as an example, I want binwidth = 10, but with all values above 100 together in one bin and all values below 20 together in another.
I looked at some answers, but they make no sense to me since they are mostly code. I would greatly appreciate it if somebody could explain the steps.
The examples below show how to create the desired histogram in base graphics and with ggplot2. Note that the resulting histogram will be quite distorted compared to one with a constant break size.
Base Graphics
The R function hist creates the histogram and allows us to set whatever bins we want using the breaks argument:
# Fake data
set.seed(1049)
dat = data.frame(value=rnorm(250, 90, 30))
hist(dat$value, breaks=c(min(dat$value), seq(20,100,10), max(dat$value)))
In the code above c(min(dat$value), seq(20,100,10), max(dat$value)) sets breaks that start at the lowest data value and end at the highest data value. In between we use seq to create a sequence of breaks that goes from 20 to 100 by increments of 10. Here's what the plot looks like:
ggplot2
library(ggplot2)
ggplot(dat, aes(value)) +
geom_histogram(breaks=c(min(dat$value), seq(20,100,10), max(dat$value)),
aes(y=after_stat(density)), color="grey30", fill=hcl(240,100,65)) + # ..density.. is deprecated since ggplot2 3.4.0
theme_light()
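As a step-by-step alternative, you can count the observations per bin yourself and draw the counts as bars of equal width. This is only a sketch and not the method used above; it assumes open-ended outer bins expressed with -Inf and Inf.

```r
set.seed(1049)
value <- rnorm(250, 90, 30)

# Step 1: define the breaks; the outer bins are open-ended
breaks <- c(-Inf, seq(20, 100, by = 10), Inf)

# Step 2: assign every observation to a bin ([a, b) intervals)
bins <- cut(value, breaks = breaks, right = FALSE)

# Step 3: count observations per bin and plot the counts
counts <- table(bins)
barplot(counts, las = 2, ylab = "Count")
```

Because each bar has the same width, the two catch-all bins are not stretched across the plot, at the cost of no longer being a true histogram on a continuous axis.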
I have observations in the form of ranges.
For example: A 13-20, B 15-30, C 23-40, D 2-11
I want to plot them in R by their start and end values (e.g. 13 and 20 for A, the lower and upper limits, if you like) so that I can visualize which ranges are common to certain combinations of observations. Is there a quick way to do this in R? I think this is a trivial problem, but I can't think of a way to do it right now.
Here is a solution using ggplot. It's not clear at all what format your data is in, so this assumes a data frame with columns id (A-D), min, and max.
df <- data.frame(id=LETTERS[1:4], min=c(13,15,23,2), max=c(20,30,40,11))
library(ggplot2)
ggplot(df, aes(x=id))+
geom_linerange(aes(ymin=min,ymax=max),linetype=2,color="blue")+
geom_point(aes(y=min),size=3,color="red")+
geom_point(aes(y=max),size=3,color="red")+
theme_bw()
I've added a lot of customization just to give you an idea of how it's done. You use the aes(...) function to tell ggplot which columns in df map to various aesthetics of the graph. So for instance aes(x=id) tells ggplot that the values for the x-axis are to be found in the id column of df, etc.
EDIT: Response to OP's comment.
To change the size of axis text, use the theme(...) function, as in:
ggplot(df, aes(x=id))+
geom_linerange(aes(ymin=min,ymax=max),linetype=2,color="blue")+
geom_point(aes(y=min),size=3,color="red")+
geom_point(aes(y=max),size=3,color="red")+
theme_bw()+
theme(axis.text.x=element_text(size=15))
Here I made the x-axis text bigger. Play around with size=... to get it the way you want. Also read the documentation (?theme) for a list of other formatting options.
It is not clear whether the dataset has the range column as a string (i.e. '13-20', '15-30', etc.) or as two numeric columns, as in the example above. With two numeric columns (the matrix m1, defined under "data" below):
matplot(m1, xaxt='n', pch=1, ylab='range')
axis(1, at=seq_len(nrow(m1)), labels=row.names(m1))
s1 <- seq_len(nrow(m1))
arrows(s1, m1[,1], s1, m1[,2], angle=90, length=0.1)
If the data has a string column (d1, also defined under "data" below):
library(splitstackshape)
d2 <- as.data.frame(cSplit(d1, 'range', '-')) # split 'range' into two numeric columns
matplot(d2[,-1], xaxt='n', pch=1, ylab='range')
axis(1, at=seq_len(nrow(d2)), labels=d2$Col1)
arrows(s1, d2[,2], s1, d2[,3], angle=90, length=0.1)
data
m1 <- matrix(c(13,20, 15,30, 23,40, 2,11),
byrow=TRUE,dimnames=list(LETTERS[1:4],NULL), ncol=2)
d1 <- data.frame(Col1=LETTERS[1:4],
range=c('13-20', '15-30', '23-40', '2-11'), stringsAsFactors=FALSE)
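To go beyond the visual check and list which ranges actually overlap, a small base R sketch is shown here. It assumes the two-column numeric layout (id, min, max) used in the ggplot answer, and treats ranges that merely touch at an endpoint as overlapping.

```r
df <- data.frame(id = LETTERS[1:4],
                 min = c(13, 15, 23, 2),
                 max = c(20, 30, 40, 11))

idx <- seq_len(nrow(df))

# Two ranges overlap when the larger of the two minima
# does not exceed the smaller of the two maxima
overlap <- outer(idx, idx, function(i, j) {
  pmax(df$min[i], df$min[j]) <= pmin(df$max[i], df$max[j])
})
dimnames(overlap) <- list(df$id, df$id)

print(overlap)
```

For this data, A overlaps B, B overlaps C, and D overlaps none of the others.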
Warning: still new to R.
I'm trying to construct some charts (specifically, a bubble chart) in R that show political donations to a campaign. The idea is that the x-axis will show the amount of each contribution, the y-axis the number of contributions, and the area of the circles the total amount contributed at that level.
The data looks like this:
CTRIB_NAML CTRIB_NAMF CTRIB_AMT FILER_ID
John Smith $49 123456789
The FILER_ID field is used to filter the data for a particular candidate.
I've used the following functions to convert this data frame into a bubble chart (thanks to help here and here).
vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)
symbols(vals,counts, circles=sums, fg="white", bg="red", xlab="Amount of Contribution", ylab="Number of Contributions")
text(vals, counts, sums, cex=0.75)
However, this results in way too many intervals on the x-axis. There are several million records all told, and divided up for some candidates could still result in an overwhelming amount of data. How can I convert the absolute contributions into ranges? For instance, how can I group the vals into ranges, e.g., 0-10, 11-20, 21-30, etc.?
----EDIT----
Following comments, I can convert vals to numeric and then slice into intervals, but I'm not sure then how I combine that back into the bubble chart syntax.
new_vals <- as.numeric(as.character(sub("\\$","",vals)))
new_vals <- cut(new_vals,100)
But regraphing:
symbols(new_vals,counts, circles=sums)
Is nonsensical -- all the values line up at zero on the x-axis.
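Once the amounts are binned into a factor with cut, you can use tapply again to find the counts and the sums for the new breaks. The factor must have one entry per row of dfr, so convert and cut the full CTRIB_AMT column rather than the unique vals. For example:
amt = as.numeric(sub("\\$", "", dfr$CTRIB_AMT))
bins = cut(amt, 100)
counts = tapply(amt, bins, length)
sums = tapply(amt, bins, sum)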
For this type of thing, though, you might find the plyr and ggplot2 packages helpful. Here is a complete reproducible example:
require(plyr) # provides ddply
require(ggplot2)
# Options
n = 1000
breaks = 10
# Generate data
set.seed(12345)
CTRIB_NAML = replicate(n, paste(letters[sample(10)], collapse=''))
CTRIB_NAMF = replicate(n, paste(letters[sample(10)], collapse=''))
CTRIB_AMT = paste('$', round(runif(n, 0, 100), 2), sep='')
FILER_ID = replicate(10, paste(as.character((0:9)[sample(9)]), collapse=''))[sample(10, n, replace=T)]
dfr = data.frame(CTRIB_NAML, CTRIB_NAMF, CTRIB_AMT, FILER_ID)
# Format data
dfr$CTRIB_AMT = as.numeric(sub('\\$', '', dfr$CTRIB_AMT))
dfr$CTRIB_AMT_cut = cut(dfr$CTRIB_AMT, breaks)
# Summarize data for plotting
plot_data = ddply(dfr, 'CTRIB_AMT_cut', function(x) data.frame(count=nrow(x), total=sum(x$CTRIB_AMT)))
# Make plot
dev.new(width=4, height=4)
qplot(CTRIB_AMT_cut, count, data=plot_data, geom='point', size=total) + theme(axis.text.x=element_text(angle=90, hjust=1)) # opts()/theme_text() were removed from ggplot2
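If you want the fixed-width ranges from the question (0-10, 10-20, and so on) rather than a number of equal intervals, you can pass explicit breaks to cut and keep the base symbols() call. This is a sketch on a handful of made-up amounts; the 0 to 100 range and the sample values are assumptions.

```r
# Hypothetical contribution amounts in the "$49" string format from the question
amt_chr <- c("$49", "$5.50", "$12", "$49", "$73.25", "$8")
amt <- as.numeric(sub("\\$", "", amt_chr))

# Fixed-width bins of 10, with bin midpoints used as x positions
breaks <- seq(0, 100, by = 10)
mids   <- breaks[-1] - 5
bins   <- cut(amt, breaks = breaks, include.lowest = TRUE)

counts <- tapply(amt, bins, length)   # NA for empty bins
sums   <- tapply(amt, bins, sum)
keep   <- !is.na(counts)

# 'circles' gives radii, so take sqrt to make the AREA proportional to the total
symbols(mids[keep], as.vector(counts[keep]),
        circles = sqrt(as.vector(sums[keep]) / pi), inches = 0.3,
        fg = "white", bg = "red",
        xlab = "Amount of Contribution", ylab = "Number of Contributions")
text(mids[keep], as.vector(counts[keep]), as.vector(sums[keep]), cex = 0.75)
```

Using the bin midpoints as numeric x values avoids the problem from the question, where passing the factor itself to symbols() collapsed everything onto one position.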
The data are a series of dates and times.
date time
2010-01-01 09:04:43
2010-01-01 10:53:59
2010-01-01 10:57:18
2010-01-01 10:59:30
2010-01-01 11:00:44
…
My goal is to draw a scatterplot with the date on the horizontal axis (x) and the time on the vertical axis (y). I guess I could also add colour intensity if there is more than one time for the same date.
It was quite easy to create a histogram of dates.
mydata <- read.table("mydata.txt", header=TRUE, sep=" ")
mydatahist <- hist(as.Date(mydata$date), breaks = "weeks", freq=TRUE, plot=FALSE) # column is named 'date' in the file header
barplot(mydatahist$counts, border=NA, col="#ccaaaa")
I haven't yet figured out how to create a scatterplot where the axes are date and/or time.
I would also like to be able to use axes that are not necessarily linear dates (YYYY-MM-DD), but based on the day of the year, such as MM-DD (so that different years accumulate), or even cycling over the days of the week.
Any help, RTFM URI slapping or hints is welcome.
The ggplot2 package handles dates and times quite easily.
Create some date and time data:
dates <- as.POSIXct(as.Date("2011/01/01") + sample(0:365, 100, replace=TRUE))
times <- as.POSIXct(runif(100, 0, 24*60*60), origin="2011/01/01")
df <- data.frame(
dates = dates,
times = times
)
Then get some ggplot2 magic. ggplot will automatically deal with dates, but to get the time axis formatted properly use scale_y_datetime():
library(ggplot2)
library(scales)
ggplot(df, aes(x=dates, y=times)) +
geom_point() +
scale_y_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) +
theme(axis.text.x=element_text(angle=90))
Regarding the last part of your question, on grouping by week, etc.: to achieve this you may have to pre-summarize the data into the buckets that you want. You could use plyr for this and then pass the resulting data to ggplot.
I'd start by reading about as.POSIXct, strptime, strftime, and difftime. These and related functions should allow you to extract the desired subsets of your data. The formatting is a little tricky, so play with the examples in the help files.
And, once your dates are converted to a POSIX class, as.numeric() will convert them all to numeric values, hence easy to sort, plot, etc.
Edit: Andre's suggestion to play w/ ggplot to simplify your axis specifications is a good one.
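As a sketch of the format()-based approach for the MM-DD and weekday axes asked about in the question: the first two timestamps come from the sample data, the 2011 one is a hypothetical extra row added to show how different years collapse onto the same day, and the UTC time zone is an assumption.

```r
stamps <- as.POSIXct(c("2010-01-01 09:04:43",
                       "2010-01-01 10:53:59",
                       "2011-01-01 11:00:44"),   # hypothetical extra row
                     tz = "UTC")

# Day of the year without the year, so different years accumulate
month_day <- format(stamps, "%m-%d")

# Time of day as a plain number of hours, convenient for a y axis
hour_of_day <- as.numeric(format(stamps, "%H")) +
  as.numeric(format(stamps, "%M")) / 60

# Weekday name, for an axis that cycles over the week
weekday <- weekdays(stamps)

plot(as.numeric(factor(month_day)), hour_of_day,
     xaxt = "n", xlab = "Day (MM-DD)", ylab = "Hour of day")
axis(1, at = seq_along(unique(month_day)), labels = sort(unique(month_day)))
```

The same derived columns can be added to a data frame and mapped to aesthetics in ggplot instead of using base plot().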