ggplot2 + Date structure using scale X - r

I really need help here because I am way beyond lost.
I am trying to create a line chart showing several teams' performance over a year. I divided the year into quarters (1/1/12, 4/1/12, 8/1/12 and 12/1/12) and loaded the CSV into R as a data frame.
Month Team Position
1 1/1/12 South Africa 56
2 1/1/12 Angola 85
3 1/1/12 Morocco 61
4 1/1/12 Cape Verde Islands 58
5 4/1/12 South Africa 71
6 4/1/12 Angola 78
7 4/1/12 Morocco 62
8 4/1/12 Cape Verde Islands 76
9 8/1/12 South Africa 67
10 8/1/12 Angola 85
11 8/1/12 Morocco 68
12 8/1/12 Cape Verde Islands 78
13 12/1/12 South Africa 87
14 12/1/12 Angola 84
15 12/1/12 Morocco 72
16 12/1/12 Cape Verde Islands 69
When I try using ggplot2 to generate the graph the fourth quarter 12/1/12 inexplicably moves to the second spot.
ggplot(groupA, aes(x=Month, y=Position, colour=Team, group=Team)) + geom_line()
I then put this plot into a variable GA in order to try to use scale_x to format the date:
GA + scale_x_date(labels = date_format("%m/%d"))
But I keep getting this Error:
Error in structure(list(call = match.call(), aesthetics = aesthetics, :
could not find function "date_format"
And if I run this code:
GA + scale_x_date()
I get this error:
Error: Invalid input: date_trans works with objects of class Date only
I am using a Mac OS X running R 2.15.2
Please help.

It's because df$Month (assuming your data.frame is df) is a factor, and its levels are sorted as text, in this order:
> levels(df$Month)
# [1] "1/1/12" "12/1/12" "4/1/12" "8/1/12"
The solution is to re-order the levels of your factor.
df$Month <- factor(df$Month, levels=df$Month[!duplicated(df$Month)])
> levels(df$Month)
# [1] "1/1/12" "4/1/12" "8/1/12" "12/1/12"
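Edit: Alternate solution using strptime
# You could convert Month to a Date first (strptime on its own returns POSIXlt, which scale_x_date rejects):
df$Month <- as.Date(strptime(as.character(df$Month), '%m/%d/%y'))
Then your code should work, provided the scales package is loaded for date_format().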
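For the two error messages in the question: as far as I know, date_format() comes from the scales package (so it has to be loaded, or called as scales::date_format), and scale_x_date() only accepts a column of class Date. A minimal sketch, using a cut-down copy of the question's data for one team and assuming ggplot2 and scales are installed:

```r
# Cut-down copy of the question's data (South Africa rows only)
df <- data.frame(
  Month    = factor(c("1/1/12", "4/1/12", "8/1/12", "12/1/12")),
  Team     = "South Africa",
  Position = c(56, 71, 67, 87)
)

# Convert the factor to a real Date so the axis is ordered chronologically
df$Month <- as.Date(as.character(df$Month), format = "%m/%d/%y")

# Build the plot only if the packages are available
# (assumption: ggplot2 and scales are installed)
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  GA <- ggplot(df, aes(x = Month, y = Position, colour = Team, group = Team)) +
    geom_line()
  # date_format() is taken from scales explicitly
  GA <- GA + scale_x_date(labels = scales::date_format("%m/%d"))
}
```

With a Date column there is no need to reorder factor levels, since the axis is continuous and sorted by time.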

Related

Plotting time series ggplot month-year, xaxis only show month with value?

I have an 8-year time series. I am able to plot my data, but I want the x axis to show only the months I have data for.
My problem is that the x axis shows January, but I only have data for June, July and August of each year.
I would also like to add a vertical line to separate each year.
Here is how my script looks like so far:
ggplot(data=CMRB, aes(x=D, y=Densite, group = habitat)) + geom_line() + scale_x_date(date_labels ="%b%Y")+ geom_point( aes(shape=habitat),size=4, fill="white")
And my data looks like:
Annee Grille Periode Densite SE Methode espece notes notes_2
82 2004 LG1 PP2 1.8888330 0.3990163 secr brun NA
83 2004 LG1 PP3 3.8880450 0.7570719 secr brun NA
84 2004 LG1 PP4 3.3281370 0.5573953 secr brun NA
85 2005 LG1 PP1 0.2367488 NA secr brun mnka NA
86 2005 LG1 PP2 0.4791649 0.2105729 secr brun NA
87 2005 LG1 PP3 0.1597214 0.1302571 secr brun NA
habitat Mois Date D
82 humid 07 07/1/2004 2004-07-01
83 humid 08 08/1/2004 2004-08-01
84 humid 08 08/1/2004 2004-08-01
85 humid 06 06/1/2005 2005-06-01
86 humid 07 07/1/2005 2005-07-01
87 humid 08 08/1/2005 2005-08-01
D is a column I created to transform Date (which is a character column) into a date format.
Does anybody know how to do that? If possible, I would also like the months without data to take up less space in the graph, to leave more room for the data from June to August.
Cheers
Nico
This should convert D into a date column (assign to the column, not to the whole data frame):
CMRB$D <- as.Date(CMRB$D, format = "%Y-%m-%d")
If you want to plot time-series data, I suggest using dygraphs
For example,
library(dygraphs)
library(xts)
ts_object <- as.xts(CMRB$Densite, CMRB$D)
dygraph(ts_object)
Here's the holy grail of websites to guide you through dygraphs.
https://rstudio.github.io/dygraphs/
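Staying within ggplot2, one possible way to show only the months that have data is to give each year its own panel with a free x scale. This is only a sketch of that idea, not something from the original answer: the toy data frame below copies a few rows from the question, and it assumes ggplot2 is installed. The panel borders also act as a visual separator between years.

```r
# Toy data shaped like the question's CMRB (values copied from the post)
CMRB <- data.frame(
  Annee   = c(2004, 2004, 2004, 2005, 2005, 2005),
  Densite = c(1.8888330, 3.8880450, 3.3281370, 0.2367488, 0.4791649, 0.1597214),
  habitat = "humid",
  Date    = c("07/1/2004", "08/1/2004", "08/1/2004",
              "06/1/2005", "07/1/2005", "08/1/2005")
)

# Convert the character column to class Date
CMRB$D <- as.Date(CMRB$Date, format = "%m/%d/%Y")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(CMRB, aes(x = D, y = Densite, group = habitat)) +
    geom_line() +
    geom_point(aes(shape = habitat), size = 4) +
    # one tick per month, labelled with the abbreviated month name
    scale_x_date(date_breaks = "1 month", date_labels = "%b") +
    # one panel per year; free_x lets each panel span only its own dates
    facet_wrap(~ Annee, nrow = 1, scales = "free_x")
  # print(p) draws the plot
}
```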

How to Convert Numeric Data into Currency in R?

Searched Google and SO and couldn't find a good answer. I have the following table:
Country Value
23 Bolivia 2575.684
71 Guyana 3584.693
125 Paraguay 3878.150
49 Ecuador 5647.638
126 Peru 6825.461
38 Colombia 7752.168
151 Suriname 9376.495
25 Brazil 11346.796
7 Argentina 11610.220
171 Venezuela 12766.725
168 Uruguay 14702.505
37 Chile 15363.098
All values are in US dollars - I'd like to add in the dollar signs and the commas. Bolivia's value should therefore read $2,575.684. Also, is there any real need to change row names to 1 through 12? If so, an easy way to do so?
Thanks in advance.
paste0('$', formatC(df$Value, big.mark=',', format='f', digits=3))
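The question also asks about the row names. There is no real need to renumber them, but if you want 1 through 12, setting them to NULL resets them. A small sketch on the first three rows of the table (the Formatted column name is just an illustration):

```r
# First three rows of the question's table, keeping the original row names
df <- data.frame(
  Country = c("Bolivia", "Guyana", "Paraguay"),
  Value   = c(2575.684, 3584.693, 3878.150),
  row.names = c("23", "71", "125")
)

# paste0() avoids the space that paste() would put after the dollar sign;
# digits = 3 keeps three decimals, big.mark = "," adds the thousands separator
df$Formatted <- paste0("$", formatC(df$Value, format = "f", digits = 3, big.mark = ","))

# Reset row names to 1, 2, 3, ...
rownames(df) <- NULL
```

Note that the formatted column is character, so keep the numeric Value column around if you still need to sort or calculate.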

R Merging Boxplots

I am trying to use R to show a merged boxplot. I am sure this is easy and I am just missing something:
boxplot(WHO$Male, WHO$Female, ylim=c(0,100))
boxplot(WHO$Female ~ WHO$Year, ylim=c(0,100))
boxplot(WHO$Male ~ WHO$Year, ylim=c(0,100))
All three work, but when I try:
boxplot(WHO$Male ~ WHO$Year, WHO$Female ~ WHO$Year, ylim=c(0,100))
It returns:
Error in as.data.frame.default(data) :
cannot coerce class "formula" to a data.frame
Note that Year contains only three values: 1990, 2000 and 2010.
> head(WHO)
Year WHO.region Country Male Female
1 1990 Africa Algeria 66 68
2 1990 Africa Angola 39 43
3 1990 Africa Benin 45 50
4 1990 Africa Botswana 63 66
5 1990 Africa Burkina Faso 45 49
6 1990 Africa Burundi 47 50
The reshape2 package can help here: melt Male and Female into a single column, then plot that column by sex and year. There was also a very similar question, "Plot multiple boxplot in one graph", which may be helpful.
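The error comes from passing two formulas to boxplot(), which takes the second one as the data argument. The idea is to put Male and Female in one long column and use a single formula. Here is a rough base-R sketch of that reshaping (roughly what reshape2's melt would give you); the toy WHO data frame uses made-up numbers for illustration only, and the Sex and Value column names are my own:

```r
# Toy stand-in for the WHO data (made-up values, two countries per year)
WHO <- data.frame(
  Year   = rep(c(1990, 2000, 2010), each = 2),
  Male   = c(66, 39, 68, 44, 71, 50),
  Female = c(68, 43, 71, 47, 74, 53)
)

# Stack Male and Female into one long column with a Sex label
long <- data.frame(
  Year  = rep(WHO$Year, times = 2),
  Sex   = rep(c("Male", "Female"), each = nrow(WHO)),
  Value = c(WHO$Male, WHO$Female)
)

# One formula, one call: a box for every Sex-by-Year combination
b <- boxplot(Value ~ Sex + Year, data = long, ylim = c(0, 100))
```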

How to reverse coordinates on a line graph ggplot2 R

I'm working on a data visualization project and am making some line graphs. This is my data set:
groupA <- read.csv("afcongroupA.csv", header=T, row.names=NULL)
groupA
Date Team Position
1 1/12 South Africa 56
2 1/12 Angola 85
3 1/12 Morocco 61
4 1/12 Cape Verde Islands 58
5 4/12 South Africa 71
6 4/12 Angola 78
7 4/12 Morocco 62
8 4/12 Cape Verde Islands 76
9 8/12 South Africa 67
10 8/12 Angola 85
11 8/12 Morocco 68
12 8/12 Cape Verde Islands 78
13 12/12 South Africa 87
14 12/12 Angola 84
15 12/12 Morocco 72
16 12/12 Cape Verde Islands 69
I then plotted them on a line graph to show the rise or decline in position standings:
groupA$Date <- factor(groupA$Date, levels=groupA$Date[!duplicated(groupA$Date)])
ggplot(groupA, aes(x=Date, y=Position, colour=Team, group=Team)) + geom_line()
What I want to do is reverse the y-axis so that the largest number is at the bottom. I tried this bit of code:
groupA <- coord_flip() + scale_x_reverse()
But I get this error message:
Error in coord_flip() + scale_x_reverse() :
non-numeric argument to binary operator
I'm using R 2.15.2 on a Mac running OS X.
As your Date column is a factor, scale_x_reverse() won't work on it. One solution is to reverse the order of the factor levels in the data frame:
groupA$Date <- factor(groupA$Date, levels=rev(unique(groupA$Date)))
Then use your code to make the plot and flip the axes:
ggplot(groupA, aes(x=Date, y=Position, colour=Team, group=Team)) +
geom_line()+coord_flip()
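The error in the question appears to come from adding coord_flip() and scale_x_reverse() to each other (and assigning the result to groupA) instead of adding them to a plot. If the goal is simply to put the largest Position at the bottom, then since Position is numeric, scale_y_reverse() may be a more direct route. A minimal sketch with the South Africa rows only, assuming ggplot2 is installed:

```r
# Cut-down copy of the question's data (South Africa rows only)
groupA <- data.frame(
  Date     = c("1/12", "4/12", "8/12", "12/12"),
  Team     = "South Africa",
  Position = c(56, 71, 67, 87)
)

# Keep the dates in their original order rather than alphabetical order
groupA$Date <- factor(groupA$Date, levels = unique(groupA$Date))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(groupA, aes(x = Date, y = Position, colour = Team, group = Team)) +
    geom_line() +
    scale_y_reverse()  # largest Position at the bottom, axes not flipped
}
```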

Calculate Concentration Index by Region and Year (panel data)

This is my first post, and I am stuck trying to build my first function, which should calculate Herfindahl measures of firm gross output. I have panel data in which each observation is a firm, indexed by year (1998-2007) and region ("West", "Central", "East", "NE"), and I am having problems passing arguments to the function. I think I need two loops (one for time and one for region). Any help would be useful. I really don't want to have to subset my data 400+ times to get the Herfindahl measures one at a time. Thanks in advance!
Below I provide: 1) my starter code (it only returns one value); 2) the desired output (two tables containing the Herfindahl measures by (A) year and (B) year-region); and 3) the original data.
1) My starter Code
myherf<- function (x, time, region){
  time = year # variable is defined in my data and includes c(1998:2007)
  region = region # Variable is defined in my data, c("West", "Central","East","NE")
  for (i in 1:length(time)) {
    for (j in 1:length(region)) {
      herf[i,j] <- x/sum(x)
      herf[i,j] <- herf[i,j]^2
      herf[i,j] <- sum(herf[i,j])^1/2
    }
  }
  return(herf[i,j])
}
myherf(extractiveoutput$x, i, j)
Error in herf[i, j] <- x/sum(x) : object 'herf' not found
2) My desired outcome is the following two vectors:
A. (1x10 vector)
Year herfindahl(yr)
1998 x
1999 x
...
2007 x
B. (1x40 vector)
Year Region herfindahl(yr-region)
1998 West x
1998 Central x
1998 East x
1998 NE x
...
2007 West x
2007 Central x
2007 East x
2007 northeast x
3) Original Data
Obs. industry year region grossoutput
1 06 1998 Central 0.048804830
2 07 1998 Central 0.011222478
3 08 1998 Central 0.002851575
4 09 1998 Central 0.009515881
5 10 1998 Central 0.0067931
...
12 06 1999 Central 0.050861447
13 07 1999 Central 0.008421093
14 08 1999 Central 0.002034649
15 09 1999 Central 0.010651283
16 10 1999 Central 0.007766118
...
111 06 1998 East 0.036787413
112 07 1998 East 0.054958377
113 08 1998 East 0.007390260
114 09 1998 East 0.010766598
115 10 1998 East 0.015843418
...
436 31 2007 West 0.166044176
437 32 2007 West 0.400031011
438 33 2007 West 0.133472059
439 34 2007 West 0.043669662
440 45 2007 West 0.017904620
You can use the conc function from the ineq package. The solution gets really simple and fast using data.table.
library(ineq)
library(data.table)
# convert your data.frame into a data.table
setDT(df)
# calculate inequality of grossoutput by region and year
df[, .(inequality = conc(grossoutput, type = "Herfindahl")), by=.(region, year) ]
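If you would rather avoid extra packages, the same grouping can be sketched in base R with aggregate(), with no loops needed. This assumes the usual definition of the Herfindahl index as the sum of squared shares, which I believe is what conc(..., type = "Herfindahl") computes with its default parameter. The toy data frame uses made-up values, and herfindahl() is a helper defined here. (As an aside, ^1/2 in the question's code is parsed as (x^1)/2, not as a square root.)

```r
# Toy panel with made-up values, same column names as the question
df <- data.frame(
  year        = c(1998, 1998, 1998, 1998, 1999, 1999),
  region      = c("Central", "Central", "East", "East", "Central", "Central"),
  grossoutput = c(1, 1, 3, 1, 2, 2)
)

# Herfindahl index: sum of squared shares within a group
herfindahl <- function(x) sum((x / sum(x))^2)

# A. One value per year
by_year <- aggregate(grossoutput ~ year, data = df, FUN = herfindahl)

# B. One value per year-region combination
by_year_region <- aggregate(grossoutput ~ year + region, data = df, FUN = herfindahl)
```

The grossoutput column of each result holds the index for that group, so you may want to rename it afterwards.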

Resources