Creating vectors of equal length from R datasets and plotting them - r

R has some built-in datasets; I'm using "lynx" and "LakeHuron", which are of different lengths. "lynx" contains annual lynx trappings from 1821-1934, and "LakeHuron" contains annual water levels from 1875-1972.
I need to plot LakeHuron data on the y-axis, and lynx data on the x-axis, but only for the years 1875-1934 inclusive. I created two vectors:
lynx.years = c(lynx)
huron.years = c(LakeHuron)
I am stuck at the point of trying to only make a plot for the specified year range. Can someone help me figure out how to plot the data from the two vectors for only the years 1875-1934?
Thank you!

1) The question does not say what sort of plot is wanted, so first assume a two-panel plot with one series in each panel, years on the x-axis, and only the requested range of years shown. That range is exactly the intersection of the two series' years, so dropping the years where either series is NA gives it:
plot(na.omit(cbind(LakeHuron, lynx)))
Drop the na.omit if you want to plot the entirety of the two series.
2) If what is wanted is to rescale the two series so that their shapes can be shown on a single panel despite vastly different ranges:
ts.plot(scale(na.omit(cbind(LakeHuron, lynx))), col = 1:2)
Again, we could drop the na.omit if the entire series were desired.
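3) If what is wanted is to plot one against the other, with lynx on the x-axis and LakeHuron on the y-axis as asked, put lynx first in cbind. Years where either series is NA (outside 1875-1934) are simply not drawn:
plot(unclass(cbind(lynx, LakeHuron)))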
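If you prefer to stay with the plain vectors from the question (c() drops the time-series attributes, so the year of each element is only implied by its position), the elements to keep follow from simple arithmetic on each series' first and last year. The sketch below is written in Python purely to illustrate that arithmetic; overlap_slices is a hypothetical helper name, and the year ranges are those documented for R's lynx (1821-1934) and LakeHuron (1875-1972).

```python
def overlap_slices(first_a: int, last_a: int, first_b: int, last_b: int):
    """Return the overlapping years of two annual series, given each series'
    first and last year, plus 0-based, end-exclusive slices that select those
    years from vectors starting at first_a and first_b respectively."""
    first = max(first_a, first_b)
    last = min(last_a, last_b)
    if first > last:
        raise ValueError("the two series do not overlap")
    n = last - first + 1  # number of overlapping years, inclusive
    slice_a = slice(first - first_a, first - first_a + n)
    slice_b = slice(first - first_b, first - first_b + n)
    return first, last, slice_a, slice_b


# Year ranges as documented for R's lynx and LakeHuron datasets.
print(overlap_slices(1821, 1934, 1875, 1972))
# → (1875, 1934, slice(54, 114, None), slice(0, 60, None))
```

In R's 1-based indexing those slices are lynx.years[55:114] and huron.years[1:60], so plot(lynx.years[55:114], huron.years[1:60]) puts lynx on the x-axis and LakeHuron on the y-axis. On the original ts objects, window(lynx, 1875, 1934) and window(LakeHuron, 1875, 1934) make the same selection by year directly.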

Related

Counts, bars, bins for each pandas DataFrame histogram subplot

I am making separate histograms of travel distance per departure hour. However, for further calculations I'd like to have the value of each bin, for every histogram.
Up until now, I have the following:
df['Distance'].hist(by=df['Departuretime'], color = 'red',
edgecolor = 'black',figsize=(15,15),sharex=True,density=True)
In my case this creates a figure with 21 small histograms.
For a single histogram I'd put counts, bins, bars = in front of the whole line and the variable counts would contain the data I'm looking for. In this case, however, it does not work.
Ideally I'd like a dataframe or list of some sort for each histogram, containing the density values of the bins. I hope someone can help me out! Thanks in advance!
Edit:
Data I'm using: about 2500 rows like this; Distance is float64 and Departuretime is str.
Histogram output I'm receiving
For all these histograms I want to know the y-axis value of each bar, preferably in a dataframe with the distance bins as rows and the hours as columns.
Using pandas' cut function, you can extract the requested data directly from your dataframe instead of from the graph, which is less error-prone.
df['DistanceBin'] = pd.cut(df['Distance'], bins=10)
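Then use pivot_table to get a table with the count for each combination, with DistanceBin as rows and Departuretime as columns, as you asked. Note that these are counts, not the densities drawn with density=True; dividing each column by its total and by the bin width converts counts to densities.
df.pivot_table(index='DistanceBin', columns='Departuretime', values='Distance', aggfunc='count')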
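If you want densities directly rather than counts, NumPy's histogram function can compute them per group. This is a minimal sketch, assuming one shared set of equal-width bin edges for all hours so that the rows line up across columns; hist(by=...) chooses bin edges separately for each subplot, so these values will not necessarily match the drawn bars. density_table is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def density_table(df: pd.DataFrame, value_col: str, group_col: str,
                  bins: int = 10) -> pd.DataFrame:
    """Histogram density per bin (rows) for each group (columns).

    Assumption: one shared set of equal-width bin edges over the whole
    column, so every group is binned the same way.
    """
    edges = np.histogram_bin_edges(df[value_col], bins=bins)
    columns = {}
    for key, values in df.groupby(group_col)[value_col]:
        # density=True gives count / (group size * bin width), so each
        # column integrates to 1 over the bins.
        dens, _ = np.histogram(values, bins=edges, density=True)
        columns[key] = dens
    # Label each row by the left edge of its bin.
    index = pd.Index(edges[:-1], name=f"{value_col}_bin_start")
    return pd.DataFrame(columns, index=index)
```

With the question's columns this would be called as density_table(df, 'Distance', 'Departuretime'), giving distance bins as rows and departure hours as columns.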

How to create a 100% stacked bar chart in R by counting data?

I am trying to create a bar chart with ggplot that counts difference scores, groups them into positive and negative values, and then plots the percentages. I can't figure out the right code to do this, however, and could use some guidance.
I have two columns I am focusing on: one for the grade level and then another column with the difference score. I tried summing up the values of positive and negative for an aggregate total, but kept running into errors manipulating that data.
I ended up making a new column indicating whether each row's value was greater than 0 and merging it into the data frame. I was able to graph this, but I'm struggling to create a 100% stacked bar chart.
Ideally I'd like a stacked bar chart with grades 6-10 on the x-axis and, on the y-axis, the percentage of students in each grade with a positive difference score versus the percentage with a negative one.
library(ggplot2)
# Attempting to create a new column of logical values (TRUE = positive difference) for the chart
Pos_Neg_df <- Fall_Math_Data$RITDifference > 0
Percentage_Math_Data <- cbind(Fall_Math_Data, Pos_Neg_df)
# Plotted this
ggplot(Percentage_Math_Data) + geom_bar(aes(x = Grade, fill = Pos_Neg_df))
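Can you provide some sample data? It's difficult to see exactly what you're trying to do. That said, geom_bar already stacks by default (position = "stack"); for a 100% stacked bar, where every bar has the same height and the pieces show proportions, use position = "fill" instead, e.g. geom_bar(aes(x = Grade, fill = Pos_Neg_df), position = "fill"), optionally adding scale_y_continuous(labels = scales::percent) to label the y-axis as percentages (see the ggplot2 documentation).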
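A 100% stacked bar shows, for each grade, the share of students in each group, so each bar's pieces sum to 100%. The arithmetic behind that can be sketched in pandas (Python, for illustration only; the chart itself is still drawn with ggplot). grade_percentages is a hypothetical helper, the column names Grade and RITDifference come from the question, and, like the question's RITDifference > 0, a score of exactly 0 counts as not positive.

```python
import pandas as pd


def grade_percentages(df: pd.DataFrame, grade_col: str = "Grade",
                      diff_col: str = "RITDifference") -> pd.DataFrame:
    """Percentage of students per grade (rows) with a positive vs. a
    non-positive difference score (columns); each row sums to 100."""
    # Same split as the question's `RITDifference > 0`: zero is not positive.
    sign = (df[diff_col] > 0).map({True: "Positive", False: "Not positive"})
    # normalize="index" divides each row's counts by that row's total.
    return pd.crosstab(df[grade_col], sign, normalize="index") * 100
```

Each row of the result corresponds to one bar of the 100% stacked chart, and each column to one fill colour.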

Excel scatter plot x axis displays only sequential numbers but not real data selected for x axis

I have an Excel scatter plot with 5 different data series on a single chart. The first 4 series work well. When I add a new series with similar x-axis data (0.0, 0.4, 0.9, ...), it is plotted against x values 1, 2, 3, ... instead of the data I specified.
Changing the chart type did not help. I'm not sure how to get the x-axis to use the data rather than sequential numbers. Any help is appreciated. Thanks.
I added a screenshot of the chart and its x-axis data. The values are formatted as numbers, just like the data for the other series. Every time I add a new series, it starts one number later: the first at (1, 2, 3, ...), the next at (2, 3, 4, ...), rather than at the real x values I selected.
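Solved it myself. The problem was that the x-axis range spans 18 cells, and all of them contained a formula with an IF condition. When I removed the IF condition, the x-axis worked correctly with the numbers.
The formula I used was =IF(A10<>"",B10=A10-A4,""). The Excel chart treated these results as non-numeric and plotted the x-axis as 1, 2, 3, ... instead of the specified values. (Two parts of the formula produce non-numeric results: the empty string "" is text, and inside a formula B10=A10-A4 is a comparison that returns TRUE or FALSE rather than a number. A formula such as =IF(A10<>"",A10-A4,NA()) keeps the cells numeric, and scatter charts do not plot #N/A values.)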

Adding multiple lines to plot, without ggplot

I would like to plot multiple lines on the same plot, without using ggplot.
I have scores for different individuals across a set time period and wish to plot a line between yearly scores for each individual. Data is organised with each row representing an individual and each column an observed value in a given year.
Currently I am using a for loop, but I am aware that this is often not efficient in R, and would like to know whether there are more suitable approaches available in base R.
I will be working with up to 100,000 individuals.
Thanks.
Code:
df=data.frame(runif(10,0,100),runif(10,0,100),runif(10,0,100),runif(10,0,100))
df=data.frame(t(df))
Years=seq(1,10,1)
plot(1,type="n",xlab="Year",ylab="Score", xlim=c(1,10), ylim=c(0,100))
for(x in 1:4){lines(Years,df[x,])}
Efficiency is not much of a consideration when plotting since plotting to a device is a slow operation in itself. You can use matplot (which uses a loop internally). It's basically a more sophisticated version of your code wrapped in a function.
matplot(Years, t(df), xlab="Year", ylab="Score", type = "l")

Plot boxplots and line of time series data in R

I want to combine a time series of in situ values (line) with boxplots of estimated values for specific dates. I tried to understand the question "Add a line from different result to boxplot graph in ggplot2", but my dates are driving me crazy. Sometimes I only have in situ values for a date, sometimes only estimated values, and sometimes both.
I uploaded a sample of my data here:
http://www.file-upload.net/download-9942494/estimated.txt.html
http://www.file-upload.net/download-9942495/insitu.txt.html
How can I create a plot with both data sets that, in the end, looks like this?
http://www.file-upload.net/download-9942496/desired_outputplot.png.html
I got help and have a solution now:
library(lattice)
library(latticeExtra)  # provides `+` for overlaying one trellis plot on another
insitu <- read.table("insitu.txt",header=TRUE,colClasses=c("Date","numeric"))
est <- read.table("estimated.txt",header=TRUE,colClasses=c("Date","numeric"))
insitu.plot <- xyplot(insitu~date_fname,data=insitu,type="l",
panel=function(x,y,...){panel.grid(); panel.xyplot(x,y,...)},xlab=list(label="Date",cex=2))
est.plot <- xyplot(estimated~date,data=est,panel=panel.bwplot,horizontal=FALSE)
both <- insitu.plot+est.plot
update(both,xlim=range(c(est$date,insitu$date_fname))+c(-1,1),ylim=range(c(est$estimated,insitu$insitu)))
