How to Convert Numeric Data into Currency in R?

Searched Google and SO and couldn't find a good answer. I have the following table:
Country Value
23 Bolivia 2575.684
71 Guyana 3584.693
125 Paraguay 3878.150
49 Ecuador 5647.638
126 Peru 6825.461
38 Colombia 7752.168
151 Suriname 9376.495
25 Brazil 11346.796
7 Argentina 11610.220
171 Venezuela 12766.725
168 Uruguay 14702.505
37 Chile 15363.098
All values are in US dollars - I'd like to add in the dollar signs and the commas. Bolivia's value should therefore read $2,575.684. Also, is there any real need to change row names to 1 through 12? If so, an easy way to do so?
Thanks in advance.

paste0('$', formatC(df$Value, big.mark = ',', format = 'f', digits = 3))
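A minimal sketch of the formatting on a toy copy of the first three rows of the question's table. paste0() is used because paste() would insert a space after the dollar sign, and digits = 3 keeps the three decimals shown in the question. Resetting the row names is optional; the column name Formatted is an assumption made for illustration.

```r
# Toy data: the first three rows from the question, with their original row names
df <- data.frame(
  Country = c("Bolivia", "Guyana", "Paraguay"),
  Value = c(2575.684, 3584.693, 3878.150),
  row.names = c("23", "71", "125"),
  stringsAsFactors = FALSE
)

# Add "$" and thousands separators; the result is a character column
df$Formatted <- paste0("$", formatC(df$Value, format = "f", digits = 3, big.mark = ","))

# Optional: reset the row names to 1..n
rownames(df) <- NULL

print(df)
```

Note that the formatted column is text, so keep the numeric Value column if you still need to sort or do arithmetic.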

Related

How to filter values within a threshold in R

I have a data set that looks like this with the first 10 rows
country freq
Albania 2
Argentina 4
Australia 26
Austria 14
Belgium 22
Brazil 46
Bulgaria 2
Cambodia 2
Canada 37
Chile 19
I want to filter out counts (frequency) that are less than 30.
I tried this code:
dd %>%
group_by(freq) %>%
filter(n()<30)
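The output was the same as the dataset. I did not get what I want.
How do I resolve this?
Thanks in advance.
Use simple indexing. There is no need to group: after group_by(freq), n() counts the rows in each group instead of testing the freq values, so every group passes n() < 30.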
dd <- dd[dd$freq >= 30, ]
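As a sketch, here is the indexing approach applied to the ten rows shown in the question, in base R only. The object names kept and kept2 are assumptions made for illustration.

```r
# The ten rows shown in the question
dd <- data.frame(
  country = c("Albania", "Argentina", "Australia", "Austria", "Belgium",
              "Brazil", "Bulgaria", "Cambodia", "Canada", "Chile"),
  freq = c(2, 4, 26, 14, 22, 46, 2, 2, 37, 19),
  stringsAsFactors = FALSE
)

# Keep rows whose freq is at least 30 (drops everything below 30)
kept <- dd[dd$freq >= 30, ]

# subset() expresses the same condition without repeating dd$
kept2 <- subset(dd, freq >= 30)

print(kept)
```

If you prefer dplyr, the equivalent is filter(dd, freq >= 30) with no group_by() step.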

Add columns to other columns

I would like to take two columns and add them to two other columns. For example, I have the data below:
EU.Member.States X. Other.countries..continued. X..1
Austria 122 Cameroon 203
Belgium 150 Canada 156
Denmark 179 Canary Islands 132
Finland 156 Cape Verde 147
France 130 Cayman Islands 213
How can I take the rows under "Other.countries..continued." and "X..1" and add them directly under "EU.Member.States" and "X." respectively?
I have tried using unite() from tidyr with no success.
Your question is almost identical to this one. Using the pipe from the dplyr package, I can suggest a solution: first duplicate your column names, then apply a classic rbind. I used only the first 2 lines of your example:
df %>% setNames(names(df)[c(1,2,1,2)]) %>% {rbind(.[,1:2], .[,3:4])}
#### EU.Member.States X.
#### 1 Austria 122
#### 2 Belgium 150
#### 3 Cameroon 203
#### 4 Canada 156
Note: the brackets are here to tell the piping not to take the . as an implicit first argument.
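The same stacking can be sketched in base R without the pipe, assuming the column names from the question. The names left, right and stacked are made up for illustration.

```r
# First two rows of the example data
df <- data.frame(
  EU.Member.States = c("Austria", "Belgium"),
  X. = c(122, 150),
  Other.countries..continued. = c("Cameroon", "Canada"),
  X..1 = c(203, 156),
  stringsAsFactors = FALSE
)

# Split into the two column pairs and give the second pair the first pair's names,
# because rbind() on data frames requires matching column names
left <- df[, 1:2]
right <- setNames(df[, 3:4], names(left))

# Stack the second pair under the first
stacked <- rbind(left, right)
print(stacked)
```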

R Merging Boxplots

I am trying to use R to show a merged boxplot. I am sure this is easy and I am just missing something:
boxplot(WHO$Male, WHO$Female, ylim=c(0,100))
boxplot(WHO$Female ~ WHO$Year, ylim=c(0,100))
boxplot(WHO$Male ~ WHO$Year, ylim=c(0,100))
All three work, but when I try:
boxplot(WHO$Male ~ WHO$Year, WHO$Female ~ WHO$Year, ylim=c(0,100))
It returns:
Error in as.data.frame.default(data) :
cannot coerce class "formula" to a data.frame
Note: Year contains only three values: 1990, 2000, and 2010.
> head(WHO)
Year WHO.region Country Male Female
1 1990 Africa Algeria 66 68
2 1990 Africa Angola 39 43
3 1990 Africa Benin 45 50
4 1990 Africa Botswana 63 66
5 1990 Africa Burkina Faso 45 49
6 1990 Africa Burundi 47 50
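boxplot() accepts only one formula, so reshape the data to long format first (one value column plus a column saying whether it is Male or Female); the reshape2 package can do this. There was also a quite similar question, "Plot multiple boxplot in one graph", which may be helpful.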
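A minimal sketch of the long-format idea, using base R rather than reshape2 so that no extra package is needed. It uses only the six rows shown by head(WHO), which are all from 1990; with the full data the same call would draw a Male and a Female box for each year. The names long, Sex and Value are assumptions made for illustration.

```r
# The six rows shown by head(WHO) in the question
WHO <- data.frame(
  Year = rep(1990, 6),
  Country = c("Algeria", "Angola", "Benin", "Botswana", "Burkina Faso", "Burundi"),
  Male = c(66, 39, 45, 63, 45, 47),
  Female = c(68, 43, 50, 66, 49, 50),
  stringsAsFactors = FALSE
)

# Long format: one row per (country, sex) observation
long <- data.frame(
  Year = rep(WHO$Year, times = 2),
  Sex = rep(c("Male", "Female"), each = nrow(WHO)),
  Value = c(WHO$Male, WHO$Female),
  stringsAsFactors = FALSE
)

# A single formula with two grouping terms draws one box per Sex/Year combination
boxplot(Value ~ Sex + Year, data = long, ylim = c(0, 100))
```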

Coding for the onset of an event in panel data in R

I was wondering if you could help me devise an effortless way to code this country-year event data that I'm using.
In the example below, each row corresponds with an ongoing event (that I will eventually fold into a broader panel data set, which is why it looks bare now). So, for example, country 29 had the onset of an event in 1920, which continued (and ended) in 1921. Country 23 had the onset of the event in 1921, which lasted until 1923. Country 35 had the onset of an event that occurred in 1921 and only in 1921, et cetera.
country year
29 1920
29 1921
23 1921
23 1922
23 1923
35 1921
64 1926
135 1928
135 1929
135 1930
135 1931
135 1932
135 1933
135 1934
120 1930
70 1932
What I want to do is create "onset" and "ongoing" variables. The "ongoing" variable in this sample data frame would be easy. Basically: Data$ongoing <- 1
I'm more interested in creating the "onset" variable. It would be coded as 1 if it marks the onset of the event for the given country. Basically, I want to create a variable that looks like this, given this example data.
country year onset
29 1920 1
29 1921 0
23 1921 1
23 1922 0
23 1923 0
35 1921 1
64 1926 1
135 1928 1
135 1929 0
135 1930 0
135 1931 0
135 1932 0
135 1933 0
135 1934 0
120 1930 1
70 1932 1
If you can think of effortless ways to do this in R (that minimizes the chances of human error when working with it in a spreadsheet program like Excel), I'd appreciate it. I did see this related question, but this person's data set doesn't look like mine and it may require a different approach.
Thanks. Reproducible code for this example data is below.
country <- c(29,29,23,23,23,35,64,135,135,135,135,135,135,135,120,70)
year <- c(1920,1921,1921,1922,1923,1921,1926,1928,1929,1930,1931,1932,1933,1934,1930,1932)
Data=data.frame(country=country,year=year)
summary(Data)
Data
This should work, even with multiple onsets per country:
Data$onset <- with(Data, ave(year, country, FUN = function(x)
as.integer(c(TRUE, tail(x, -1L) != head(x, -1L) + 1L))))
You could also do this, assuming each country has only one onset (it flags just the earliest year per country):
library(data.table)
setDT(Data)[, onset := as.integer(year == min(year)), by = country]
This could be very fast when you have a larger dataset.
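To see how the two ideas differ, here is a small sketch on made-up data in which country 29 has a second, hypothetical event starting in 1925. The gap-based rule flags a row whenever the year does not directly follow the previous row's year for that country, while the earliest-year rule flags only the first year. Both are written with base R's ave() here. The column names onset_gap and onset_first are assumptions, and rows are assumed to be sorted by year within each country.

```r
# Made-up data: country 29 has events in 1920-1921 and again in 1925 (hypothetical)
Data <- data.frame(
  country = c(29, 29, 29, 23, 23),
  year = c(1920, 1921, 1925, 1921, 1922)
)

# Gap-based rule: onset when the year does not directly follow the previous row's year
Data$onset_gap <- with(Data, ave(year, country, FUN = function(x)
  as.integer(c(TRUE, tail(x, -1L) != head(x, -1L) + 1L))))

# Earliest-year rule: onset only in each country's first year
Data$onset_first <- with(Data, ave(year, country, FUN = function(x)
  as.integer(x == min(x))))

print(Data)
```

The two columns disagree only on the 1925 row, so the choice matters only when a country can have more than one event.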

ggplot2 + Date structure using scale X

I really need help here because I am way beyond lost.
I am trying to create a line chart showing several teams' performance over a year. I divided the year into four dates (1/1/12, 4/1/12, 8/1/12, 12/1/12) and loaded the csv data frame into R.
Month Team Position
1 1/1/12 South Africa 56
2 1/1/12 Angola 85
3 1/1/12 Morocco 61
4 1/1/12 Cape Verde Islands 58
5 4/1/12 South Africa 71
6 4/1/12 Angola 78
7 4/1/12 Morocco 62
8 4/1/12 Cape Verde Islands 76
9 8/1/12 South Africa 67
10 8/1/12 Angola 85
11 8/1/12 Morocco 68
12 8/1/12 Cape Verde Islands 78
13 12/1/12 South Africa 87
14 12/1/12 Angola 84
15 12/1/12 Morocco 72
16 12/1/12 Cape Verde Islands 69
When I try using ggplot2 to generate the graph the fourth quarter 12/1/12 inexplicably moves to the second spot.
ggplot(groupA, aes(x=Month, y=Position, colour=Team, group=Team)) + geom_line()
I then put this plot into a variable GA in order to try to use scale_x to format the date:
GA + scale_x_date(labels = date_format("%m/%d"))
But I keep getting this Error:
Error in structure(list(call = match.call(), aesthetics = aesthetics, :
could not find function "date_format"
And if I run this code:
GA + scale_x_date()
I get this error:
Error: Invalid input: date_trans works with objects of class Date only
I am using a Mac OS X running R 2.15.2
Please help.
It's because df$Month (assuming your data.frame is df) is a factor, and its levels are sorted alphabetically, in this order:
> levels(df$Month)
# [1] "1/1/12" "12/1/12" "4/1/12" "8/1/12"
The solution is to re-order the levels of your factor.
df$Month <- factor(df$Month, levels=df$Month[!duplicated(df$Month)])
> levels(df$Month)
# [1] "1/1/12" "4/1/12" "8/1/12" "12/1/12"
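Edit: Alternate solution using as.Date
# You could convert Month to class Date first (strptime would return POSIXlt, which scale_x_date() rejects):
df$Month <- as.Date(df$Month, '%m/%d/%y')
Then your code should work. Note that date_format() comes from the scales package, so run library(scales) before calling scale_x_date(labels = date_format("%m/%d")).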
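A hedged sketch of the whole flow for one team from the question's table. scale_x_date() only accepts a column of class Date, so the conversion here uses as.Date(). The plotting part runs only if ggplot2 is installed, and date_labels is used as an alternative to date_format() so the scales package does not need to be attached (this assumes a reasonably recent ggplot2).

```r
# Angola's four rows from the question's table
df <- data.frame(
  Month = c("1/1/12", "4/1/12", "8/1/12", "12/1/12"),
  Team = "Angola",
  Position = c(85, 78, 85, 84),
  stringsAsFactors = FALSE
)

# Parse month/day/two-digit-year strings into Date objects
df$Month <- as.Date(df$Month, format = "%m/%d/%y")

# Dates now sort chronologically, so 12/1/12 comes last
print(df$Month)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Month, y = Position, colour = Team, group = Team)) +
    geom_line() +
    scale_x_date(date_labels = "%m/%d")
  print(p)
}
```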
