R: Odd Behavior with Width on Layout

I'm trying to make my bottom graphs have widths proportional to the length of each month.
However, I end up with a layout where the widths look wrong.
I have 2 rows of graphs: the first row holds 1 larger plot that spans the entire row, and the second row holds 12 plots that together span the entire row.
For the 2nd-row plots, I want the widths to be proportional to the length of the month, so I do
layout(matrix(c(rep(1,12),2:13),nrow=2,byrow=T),widths=c(1,months))
months
[1] 31 28 31 30 31 30 31 31 30 31 30 31
The odd thing is that when I manually adjust the numbers in the widths vector (using 1s and 2s), the sizes do change accordingly. With the month lengths, however, this does not seem to be the case.
January is much too short. Am I missing something in my logic?

You have only 12 plots in the bottom row, but you provided 13 values for the widths= argument (the number 1 plus all the month values). Just use widths=months to get the result you need. The number of values passed to widths= should equal the number of plots in the longest row (that is, the number of columns of the layout matrix), not the total number of plots.
months<-c( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
layout(matrix(c(rep(1,12),2:13),nrow=2,byrow=T),widths=months)
layout.show(n = 13)
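The rule in the answer above can be checked before calling layout(). This is only a minimal sketch, assuming the same 2 x 12 layout matrix as in the question; the name m is introduced here purely for illustration.

```r
months <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Same layout matrix as above: plot 1 spans the top row,
# plots 2..13 fill the bottom row (m is a hypothetical helper name)
m <- matrix(c(rep(1, 12), 2:13), nrow = 2, byrow = TRUE)

# widths= needs one value per column of the matrix, not one per plot
stopifnot(length(months) == ncol(m))

layout(m, widths = months)
layout.show(n = 13)
```

If the check fails, the widths vector has an extra or missing entry, which is what happened with c(1, months).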

The answer by Didzis Elferts is a good one. I just want to note that when you deal with time series you can use the xts package, especially since its plots are drawn with base graphics.
Here is an example:
I use apply.monthly to split my object.
I use plot.xts to plot my time series.
First, I generate some random data
library(xts)
days <- seq.Date( as.Date("2011-01-01"), as.Date("2011-12-31") ,1)
dat <- xts(rnorm(365),days)
## I use apply.monthly to compute the month widths,
## so there is no need to type them in by hand.
widths <- coredata(apply.monthly(dat,length))
par(bg="lightyellow", mar=c(2,2,2,0))
layout(matrix(c(rep(1,12),2:13),nrow=2,byrow=T),widths=widths*2)
mon <- months(days,abbreviate=T)
plot(dat,main = 'my year time series')
apply.monthly(dat,function(x) {
if(unique(format((index(x)),'%m')) =='01') {#JAN
par(mar=c(2,2,2,0)) ## special case for JAN because it contains the y axis
plot(x,main='')
}
else{
par( mar=c(2,0,2,0))
plot(x,main='',ylab='')
}
})
Note that the JAN panel is not narrower than the FEB one; it also has to hold the y axis.

Related

Annex data in r

I am trying to carry out following operation in R
I have different series of data,
series 1: 75, 56, 100, 23, 38, 40
series 2: 60, 18, 86, 100, 44
I would like to annex these data. To do so, I have to multiply series 1 by 1.5 so that the last value of series 1 (40) matches the first value of series 2 (60): 40*1.5=60.
In the same way, I would like to match many different series, but other series will need other multipliers. For example, with Series 1: ...20 and Series 2: 80..., I would have to multiply by 4.
How can I carry out such an operation on many series in many data frames?
Thanks in advance,
Given two vectors x and y, the function f(x,y) below will convert x the way you desire.
f <- function(x,y) x*(y[1]/x[length(x)])
Usage:
x = c(75,56,100,23,38,40)
y = c(60,18,86,100,44)
f(x,y)
Output:
[1] 112.5 84.0 150.0 34.5 57.0 60.0
However, how this approach gets applied to "many series in many data frames" depends on the actual structure you have, and what type of output you want.
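As one possible illustration of the "many series" case, here is a sketch that assumes the series are held in a plain list, in order. The third series and the names series and scaled are made up for the example; the real structure may differ.

```r
f <- function(x, y) x * (y[1] / x[length(x)])

# Hypothetical input: the two series from the question plus an invented third one
series <- list(
  c(75, 56, 100, 23, 38, 40),
  c(60, 18, 86, 100, 44),
  c(11, 30, 25)
)

# Work backwards so that each series is rescaled against the
# already-rescaled series that follows it; the last one is left unchanged
scaled <- series
for (i in rev(seq_len(length(series) - 1))) {
  scaled[[i]] <- f(scaled[[i]], scaled[[i + 1]])
}
scaled
```

Going from the last series to the first matters: if series 2 is rescaled after series 1 has already been matched to it, the two no longer line up.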

R: locate element previous in vector within for loop and report in new column

I've looked through many older posts but nothing is really hitting the answer I need. In short: I have a data frame that contains observation data and the time of observation in days.
My goal is to add a column for weeks. I have already subsetted the data so that I only have the time vector at intervals of 7 (t == 7, 14, 21, etc). I just need to make a for loop that creates a new vector of "weeks" that I can then cbind to my data. I'd prefer it to be a character string so I can use it more easily in ggplot geom_histogram, but that isn't as necessary as just creating the new vector successfully.
The tricky part of the data is that there is not an equal number of observations per time: t = 28 has maybe 5x as many observations as t = 7, etc.
I want to create code that evaluates what t is, then checks to see if it is greater than the last element in the t vector. If it isn't, then populate the week vector with the last value it did, and if so, then increase it by 1.
I know this is bad from a like, computer science/R perspective in a lot of ways, but any help would be useful:
#fake data (in reality this is a huge data set with many observations at intervals of 1 for t)
L = rnorm(50, mean=10, sd=2)
t = c((rep.int(7,3)), (rep.int(14,6)), rep.int(21,8), rep.int(28,12), (rep.int(31, 5)), (rep.int(36,16)))
fake = cbind(L,t)
#create df that has only the observations that are at weekly time points
dayofweek = seq(7,120,7)
df = subset(fake, t %in% dayofweek)
#create empty week vector
week = c()
#for loop with if-else statement nested to populate the week vector
for (i in 1:length(dayofweek)){
if (t = t[t-1]){
week = i
} else if (t > t[t-1]{
week = i+1
}
}
Thanks!!
I'm not sure I can follow what you want to do. If you want to determine which week the data fall within, why not:
set.seed(1)
L = rnorm(50, mean=10, sd=2)
...
fake <- data.frame(L=L, t=t)
fake$week <- floor(fake$t/7) + 1 # drop the "+ 1" so that t==7 becomes week==1
head(fake)
# L t week
# 1 8.747092 7 2
# 2 10.367287 7 2
# 3 8.328743 7 2
# 4 13.190562 14 3
# 5 10.659016 14 3
# 6 8.359063 14 3
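For the counter the question describes (keep the current week while t stays the same, add 1 when t increases), a loop is not strictly needed. A minimal sketch, assuming t is already sorted in increasing order; the names week and week_label are illustrative only.

```r
# Weekly time points with unequal numbers of observations, as in the question
t <- c(rep(7, 3), rep(14, 6), rep(21, 8), rep(28, 12))

# diff(t) > 0 is TRUE wherever t increases; the leading TRUE starts the count at 1
week <- cumsum(c(TRUE, diff(t) > 0))

# Character version, e.g. for use as a grouping variable in ggplot2
week_label <- paste("week", week)

table(week)
```

Unlike floor(t/7), this numbers the weeks by order of appearance, so a missing week in the data would not leave a gap in the numbering.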

How do I convert a variable of more than 2 vectors into a dummy variable?

Right now I am trying to create a new dummy variable in a dataset out of a variable that has more than two vectors. More specifically, my dataset has a "State" variable, and I want to make a dummy where 1 = states in the North, and 0 = all other states. Here's a portion of the dataset (it's an extremely large set so I'll only include the essential data):
Year StateICP
1 1940 71
2 1940 21
3 1940 22
4 1940 32
5 1940 18
6 1940 22
7 1940 45
8 1940 40
9 1940 33
So what I would want to do is create a new Column (called "North") where if the StateICP = 21, 22, 40, or 45, then the new variable would = 1, and otherwise would be 0. Like I said, this is a very large dataset (over 1000000 observations), so I can't enter it row by row manually. I tried an ifelse function, but that only gave me errors.
I'm sure this isn't that complicated, but I am fairly new to R. I know how to create a dummy variable normally, but I am getting stuck here. Any help would be greatly appreciated! Thank you!
So, creating simple dataset to replicate what you have above:
df <- data.frame(Year = rep(1940,500), StateICP = sample(1:100, 500, TRUE))
This will create a data.frame with columns like you describe and 500 records. The StateICP values are randomly generated integers between 1 and 100. If we want to code a boolean we could simply add a new column:
df$boolean <- df$StateICP %in% c(21, 22, 40, 45)
If we want to code them specifically as 0,1 as you describe then you can use ifelse:
df$dummy <- ifelse(df$StateICP %in% c(21, 22, 40, 45), 1, 0)
You have to make sure you pass a vector to ifelse (it does not accept a data argument), which is why the column is written out as df$StateICP.
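The same idea applied to the nine rows shown in the question, as a small sketch. The column name North comes from the question; the helper vector north_codes is introduced here for illustration, and as.integer() is used as an alternative to ifelse for turning TRUE/FALSE into 1/0.

```r
# The sample rows from the question
df <- data.frame(Year = rep(1940, 9),
                 StateICP = c(71, 21, 22, 32, 18, 22, 45, 40, 33))

# Codes the question treats as "North" (north_codes is a hypothetical name)
north_codes <- c(21, 22, 40, 45)

# %in% is vectorised, so this works the same way on a million rows
df$North <- as.integer(df$StateICP %in% north_codes)
df
```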

R: graph multiple columns on one line

This seems simple, but I've tried multiple variations of matplot, ggplot2, regular old plot...I can't get any to do what I need.
I have a gigantic dataframe of years, months, and observations. I simplified it down to number of observations per month, per year, see below. I'm not sure why it read in with the "X" in front of each column heading, but if it's not going to affect the code, right now I don't care.
head(storms)
X Month X1992 X1993 X1994
1 1 1 2 1
2 2 2 4 1
3 3 3 26 10
4 4 4 47 26
5 5 5 969 615
The full (simplified) set is 10 columns of years (1992-2001), each with 12 months/rows of totals (1 storm in Jan 1992, 26 storms in March 1993...). I need simply to plot these all on an x-axis 120 months long, # of observations per month on the y-axis. It could be a line or bars or vertical lines. I've seen many ways to plot 20 lines with 12 months on the x-axis; that is not what I'm going for. I also need to label the years every 12 months, but I think I can figure that out after I get this block out of the way.
In other words (I hope this is more clear if the previous is not):
y axis: # of storms, ylim=c(0-1000)
x axis: 10 sets of months (Jan-Dec, 1992-2001, 120 months total). The only labels will be the years, every 12 months of course.
I know I'm just thinking about it wrong, could someone please set my head straight?
(first post; please also tell me if I'm not formatting or inquiring properly!)
Is this something like what you are looking for? If I am not mistaken, you may want to rearrange your data frame: you want to make it longer rather than wider. Then you can draw a figure. The thing is that you have 120 months, so you may need to think about plot space. But at least this example should let you move forward. I hope this helps.
library(tidyr)
library(ggplot2)
# Create sample data
month <- rep(c(1:12), each = 1, times = 2)
nintytwo <- runif(24, 0, 20)
nintythree <- runif(24, 0, 20)
# Create a data frame
ana <- data.frame(month, nintytwo, nintythree)
# Make the data longer rather than wider.
bob <- gather(ana, year, value, -month)
bob$month <- as.factor(bob$month)
# Draw a figure
cathy <- ggplot(bob, aes(x= year,y = value, fill = month)) + geom_bar(stat="identity", position="dodge")
cathy
Here's an example using base R :
# create an example data
set.seed(123)
df <- data.frame(Month=1:12)
for(y in 1992:2001){
tmp <- data.frame(X=as.integer(abs(rnorm(12,mean=2,sd=10))))
colnames(tmp) <- paste("X",y,sep="")
df <- cbind(df,tmp)
}
# reshape to long format (one column with n.of storms, and period columns)
long <- reshape(df[,-1], idvar="Month", ids=df$Month,
times=names(df[,-1]), timevar="Year",
varying = list(names(df[,-1])),
direction = "long",v.names="Storms")
# remove the "X" from the year
long$Year <- substr(long$Year,2,nchar(long$Year))
nYears <- length(unique(long$Year))
# plot the line
plot(x=1:nrow(long),y=long$Storms,type="l",
xaxt="n",main="Monthly Storms",
xlab="Period",ylab="Storms",col="RoyalBlue")
# add custom labels
axis(1,at=((1:nYears)*12)-6,labels=unique(long$Year))
# add vertical lines
abline(v=c(0.5,((1:nYears)*12)+0.5),col="Gray80",lty=2)
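Result: a single line running across all the months, with one year label centered under each 12-month block and dashed vertical lines separating the years.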
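If reshaping feels heavy, the year columns can also be stacked directly. This is only a sketch on made-up data: the storms data frame below is a hypothetical stand-in with three year columns, and the names year_cols, counts and years are introduced for illustration.

```r
# Hypothetical stand-in for the real `storms` data frame
set.seed(42)
storms <- data.frame(Month = 1:12,
                     X1992 = rpois(12, 20),
                     X1993 = rpois(12, 20),
                     X1994 = rpois(12, 20))

# Pick the year columns by their "X" + four digits naming pattern
year_cols <- grep("^X[0-9]{4}$", names(storms), value = TRUE)

# unlist() stacks the columns one after another, so the vector runs
# Jan..Dec of the first year, then Jan..Dec of the next, and so on
counts <- unlist(storms[year_cols], use.names = FALSE)
years <- sub("^X", "", year_cols)

# One vertical line per month, one year label per 12-month block
plot(counts, type = "h", xaxt = "n", xlab = "Year", ylab = "Storms")
axis(1, at = seq(6.5, by = 12, length.out = length(years)), labels = years)
```

This relies on the rows being in month order and the columns being in year order.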

R Searching for elements and their index in an array

I have a matrix with 2 columns as described below:
TIME PRICE
10 45
11 89
13 89
15 12
16 09
17 34
19 89
20 90
23 21
26 09
In the above matrix, I need to iterate through the TIME column in steps of 5 seconds and access the PRICE in the matching row.
For example: I start with 10, so I need to access 15 (10+5). I would have been able to get to 15 easily if the numbers in the column were continuous, but they are not. So at 15 seconds I need to get hold of the corresponding price, and this goes on until the end of the data set. The next element that needs to be accessed is 20, with its corresponding price; then I add 5 seconds again, and so on. In case the element is not present, the one immediately greater than it must be used to obtain the corresponding price.
If the rows you want to extract are m[1,1]+5, m[1,1]+10, m[1,1]+15 etc then:
m <- cbind(TIME=c(10,11,13,15,16,17,19,20,23,26),
PRICE=c(45,89,89,12,9,34,89,90,21,9))
r <- range(m[,1]) # 10,26
r <- seq(r[1]+5, r[2], 5) # 15,20,25
r <- findInterval(r-1, m[,1])+1 # 4,8,10 (values 15,20,26)
m[r,2] # 12,90,9
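findInterval returns the index of the last value that is less than or equal to the given value, so I give it a slightly smaller value (r-1) and then add 1 to the index. This lands on the first row whose TIME is at least r, provided the times are whole numbers.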
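A variant of the same lookup that does not depend on the times being whole numbers is sketched below. It assumes TIME is sorted in increasing order and relies on the left.open argument of findInterval; the names targets and idx are introduced for illustration.

```r
m <- cbind(TIME  = c(10, 11, 13, 15, 16, 17, 19, 20, 23, 26),
           PRICE = c(45, 89, 89, 12, 9, 34, 89, 90, 21, 9))

# Target times: first time + 5, + 10, ... up to the last recorded time
targets <- seq(m[1, "TIME"] + 5, max(m[, "TIME"]), by = 5)

# left.open = TRUE counts the times strictly below each target,
# so adding 1 gives the first row whose TIME is >= the target
idx <- findInterval(targets, m[, "TIME"], left.open = TRUE) + 1

m[idx, "PRICE"]
```

Because the targets never exceed the last recorded time, idx always points at an existing row; a target beyond the data would need separate handling.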
Breaking the question apart into sub-pieces...
Getting the row with value 15:
Call your Matrix, say, DATA, and
[1] extract the row of interest:
DATA[DATA[,1] == 15, ]
Then snag the second column.
[2] Adding 5 to the first column ( I'm pretty sure you can just do this ):
DATA[,1] = DATA[,1] + 5
This should get you started. The rest seems to just be some funky iteration, incrementing by 5, using [1] to get the price you want each time, swapping 15 for some variable.
I leave the rest of the solution as an exercise to the reader. For tips on looping in R, and more, see the below tutorial ( I don't expect it to be taken down any time soon, but may want to keep a local copy. Good luck :) )
http://www.stat.berkeley.edu/users/vigre/undergrad/reports/VIGRERintro.pdf
As @Tommy commented above, it is not clear exactly which TIME values you want to get. To me, it seems like you want the PRICE for the sequence 10,15,20,25,... If so, you could do that easily using the modulo operator (%%):
TIME <- c(10,11,13,15,16,17,19,20,23,26) # Your times
PRICE <- c(45,89,89,12,9,34,89,90,21,9) # your prices
PRICE[TIME %% 5 == 0] # Get prices from times in sequence 10, 15, 20, ...
