How to remove duplicate values in specific column without removing related row - r

I want to remove duplicate values in a specific column without deleting the rows that contain them, as in the example below:
Input
-----
Date Market Quantity
4/2/2018 Indonesia 1000
4/2/2018 Australia 500
4/2/2018 India 300
4/2/2018 USA 500
4/2/2018 Germany 200
5/2/2018 India 400
5/2/2018 Japan 400
5/2/2018 Russia 457
6/2/2018 Austria 260
6/2/2018 Swiss 700
6/2/2018 USA 1200
6/2/2018 Indonesia 400
output
------
Date Market Quantity
4/2/2018 Indonesia 1000
Australia 500
India 300
USA 500
Germany 200
5/2/2018 India 400
Japan 400
Russia 457
6/2/2018 Austria 260
Swiss 700
USA 1200
Indonesia 400
Also, if possible, how can I plot a bar/column graph of the same output (something like the sample below)?
Sample Graph

I would add this to comments but I don't have rights yet...
I don't think you actually want to change the data, but as a few people mentioned in the comments, there are easy ways to do that.
If you're just trying to show the multi-dimensional data in plotly and aren't familiar with the library's syntax, try the code below:
# Requires the ggplot2, plotly and ggthemes packages to be installed
df <- data.frame(Date = c('2018/04/02','2018/04/02','2018/04/02','2018/04/02','2018/04/02','2018/05/02','2018/05/02','2018/05/02','2018/06/02','2018/06/02','2018/06/02','2018/06/02'),
                 Market = c('Indonesia','Australia','India','USA','Germany','India','Japan','Russia','Austria','Swiss','USA','Indonesia'),
                 Quantity = c(1000,500,300,500,200,400,400,457,260,700,1200,400),
                 stringsAsFactors = FALSE)
# One panel per Date; each panel shows only the markets present on that date
plotly::ggplotly(
  ggplot2::ggplot(df, ggplot2::aes(x = Market, y = Quantity)) +
    ggplot2::geom_col(ggplot2::aes(fill = Market)) +
    ggplot2::facet_grid(~Date, scales = 'free_x') +
    ggthemes::theme_tufte()
)
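For the first part of the question (blanking the repeated dates while keeping every row), one possible base R sketch is shown here. It works on a copy named `display` (a name chosen for illustration) so the original data stays intact for plotting, and it assumes the rows are already grouped by date, because `duplicated()` flags every later repeat of a value, not only consecutive ones.

```r
# Shortened version of the question's data
df <- data.frame(
  Date = c("4/2/2018", "4/2/2018", "4/2/2018", "5/2/2018", "5/2/2018", "6/2/2018"),
  Market = c("Indonesia", "Australia", "India", "India", "Japan", "Austria"),
  Quantity = c(1000, 500, 300, 400, 400, 260),
  stringsAsFactors = FALSE
)

# Work on a copy so the full dates remain available in df
display <- df

# duplicated() is TRUE for every repeat after the first occurrence;
# replace those entries with an empty string
display$Date[duplicated(display$Date)] <- ""

print(display, row.names = FALSE)
```

Keeping the complete `Date` column in `df` matters for the plot: blank dates could no longer be used to facet or group the bars.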

Related

Count the number of repetitive values separated by commas

I have a data set that looks something like this:
colA
Nepal , India , USA
USA
India
USA
Nepal , India
USA
USA, Nepal
Nepal
Japan
so I want the counts as:
colB Count
Nepal 4
India 3
USA 5
Japan 1
Is there a way to do this directly in Tableau Reader, using calculated fields or something similar, without going into Tableau Prep?

How to refer to specific row in R by row name?

I have just loaded the built-in R data set 'emissions'.
I would like to remove the first row, 'United States', from the data set.
Apparently I can do it by position like this:
data2 <- data[-1, ]
But what if I know the name of the row but not its position in the data set?
How can I remove it by name alone, knowing that the row is named 'United States'?
Here is how the data set looks:
GDP perCapita CO2
UnitedStates 8083000 29647 6750
Japan 3080000 24409 1320
Germany 1740000 21197 1740
France 1320000 22381 550
UnitedKingdom 1242000 21010 675
Italy 1240000 21856 540
Russia 692000 4727 2000
Canada 658000 21221 700
Spain 642400 16401 370
Australia 394000 20976 480
Netherlands 343900 21755 240
Poland 280700 7270 400
Belgium 236300 23208 145
Sweden 176200 19773 75
I have only tried referring to it by row position. That works fine, but I guess that in bigger data sets I won't want to scroll through the rows and count them...
You could filter your dataframe by row.names using the following code:
data2[!(row.names(data2) %in% "UnitedStates"),]
#> GDP perCapita CO2
#> Japan 3080000 24409 1320
#> Germany 1740000 21197 1740
#> France 1320000 22381 550
#> UnitedKingdom 1242000 21010 675
#> Italy 1240000 21856 540
#> Russia 692000 4727 2000
#> Canada 658000 21221 700
#> Spain 642400 16401 370
#> Australia 394000 20976 480
#> Netherlands 343900 21755 240
#> Poland 280700 7270 400
#> Belgium 236300 23208 145
#> Sweden 176200 19773 75
Created on 2022-12-26 with reprex v2.0.2
Make sure you spelled the row name right.
Data:
data2 <- read.table(text = ' GDP perCapita CO2
UnitedStates 8083000 29647 6750
Japan 3080000 24409 1320
Germany 1740000 21197 1740
France 1320000 22381 550
UnitedKingdom 1242000 21010 675
Italy 1240000 21856 540
Russia 692000 4727 2000
Canada 658000 21221 700
Spain 642400 16401 370
Australia 394000 20976 480
Netherlands 343900 21755 240
Poland 280700 7270 400
Belgium 236300 23208 145
Sweden 176200 19773 75', header = TRUE)
Yet another approach (the %>% pipe comes from the magrittr package):
library(magrittr)
setdiff(rownames(data2),
        c('UnitedStates', 'SkipThis', 'OmitThatToo')
) %>%
  data2[., ]
Using which:
mtcars[which(rownames(mtcars)!='Mazda RX4'),]
As has been said before (the name must match the row name exactly as it appears in your data):
df[row.names(df) != "United States", ]
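All of the approaches above silently return the full data set when the name is misspelled (for example "United States" versus "UnitedStates"). A minimal sketch of a wrapper that warns in that case is shown below; `drop_rows` and `emis` are hypothetical names used only for illustration, and the data is a three-row excerpt of the table in the question.

```r
# Three rows from the question's table, with country names as row names
emis <- data.frame(
  GDP = c(8083000, 3080000, 1740000),
  perCapita = c(29647, 24409, 21197),
  CO2 = c(6750, 1320, 1740),
  row.names = c("UnitedStates", "Japan", "Germany")
)

# Hypothetical helper: drop rows by name and warn about names that do not exist
drop_rows <- function(d, names_to_drop) {
  not_found <- setdiff(names_to_drop, rownames(d))
  if (length(not_found) > 0) {
    warning("row name(s) not found: ", paste(not_found, collapse = ", "))
  }
  # drop = FALSE keeps the result a data frame even with a single column
  d[!(rownames(d) %in% names_to_drop), , drop = FALSE]
}

result <- drop_rows(emis, "UnitedStates")
print(result)
```

Calling `drop_rows(emis, "United States")` would return all three rows and raise a warning, which makes the spelling mismatch visible.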

R output each data frame by a list of data

I have a list of data frames and I want to split them by name into individual data frames.
list:
[1]
Name Year Wage
John 2000 500
Paul 2000 600
Peter 2000 800
Mary 2000 700
Kai 2000 800
[2]
Name Year Wage
John 2005 600
Paul 2005 700
Peter 2005 1000
Mary 2005 750
Kai 2005 850
[3]
Name Year Wage
John 2010 1600
Paul 2010 900
Peter 2010 1200
Mary 2010 950
Kai 2010 950
[n]
Name Year Wage
John 2011 1800
Paul 2011 1000
Peter 2011 1600
Mary 2011 850
Kai 2011 1050
Desired data frame 1:
Name Year Wage
John 2000 500
John 2005 600
John 2010 1600
John 2011 1800
Desired data frame 2:
Name Year Wage
Paul 2000 600
Paul 2005 700
Paul 2010 900
Paul 2011 1000
and every name has its own .csv output.
I tried
listy <- list.files(path = "./", pattern = "_output\\.csv$", full.names = FALSE, recursive = TRUE)  # pattern is a regular expression, not a glob
lapply(listy, read.csv)
After that I have no idea how to continue. Thank you for your help.
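We can rbind the list of data.frames into a single dataset and then split it by Name:
library(dplyr)
# Read every file into a list of data frames
lst <- lapply(listy, read.csv, stringsAsFactors = FALSE)
# Stack them into one data frame, then split into one data frame per name
lstN <- bind_rows(lst) %>%
  split(., .$Name)
# Write one CSV per name (John.csv, Paul.csv, ...)
invisible(lapply(names(lstN), function(nm)
  write.csv(lstN[[nm]], paste0(nm, ".csv"), row.names = FALSE, quote = FALSE)))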
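The same combine-then-split idea can be sketched in base R without dplyr. The example below builds a small in-memory list instead of reading files, and writes to a temporary directory; both choices are assumptions made so that the snippet runs on its own.

```r
# Two small data frames standing in for the files read from disk
lst <- list(
  data.frame(Name = c("John", "Paul"), Year = 2000, Wage = c(500, 600)),
  data.frame(Name = c("John", "Paul"), Year = 2005, Wage = c(600, 700))
)

# Stack all data frames into one (they must share the same columns)
combined <- do.call(rbind, lst)

# Split into a named list with one data frame per Name
by_name <- split(combined, combined$Name)

# Write one CSV per name; tempdir() is used here only to avoid cluttering
# the working directory
out_dir <- tempdir()
for (nm in names(by_name)) {
  write.csv(by_name[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}

print(by_name$John)
```

`do.call(rbind, lst)` requires identical column names in every data frame, whereas `bind_rows()` fills missing columns with NA, so the dplyr version is more forgiving when the files differ.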

How to Convert Numeric Data into Currency in R?

Searched Google and SO and couldn't find a good answer. I have the following table:
Country Value
23 Bolivia 2575.684
71 Guyana 3584.693
125 Paraguay 3878.150
49 Ecuador 5647.638
126 Peru 6825.461
38 Colombia 7752.168
151 Suriname 9376.495
25 Brazil 11346.796
7 Argentina 11610.220
171 Venezuela 12766.725
168 Uruguay 14702.505
37 Chile 15363.098
All values are in US dollars - I'd like to add in the dollar signs and the commas. Bolivia's value should therefore read $2,575.684. Also, is there any real need to change row names to 1 through 12? If so, an easy way to do so?
Thanks in advance.
# paste0() avoids a space after the dollar sign; digits = 3 keeps three decimals
paste0('$', formatC(df$Value, big.mark = ',', format = 'f', digits = 3))
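As a worked illustration covering both parts of the question, the sketch below formats three of the values and also resets the row names to 1, 2, 3. The column name `Formatted` is an assumption added for the example; renumbering the rows is optional and purely cosmetic.

```r
# Three rows from the question, keeping the original (non-sequential) row names
df <- data.frame(
  Country = c("Bolivia", "Guyana", "Brazil"),
  Value = c(2575.684, 3584.693, 11346.796),
  row.names = c("23", "71", "25")
)

# Fixed notation, three decimals, comma as thousands separator, "$" prefix.
# The result is a character vector, so keep the numeric column for arithmetic.
df$Formatted <- paste0("$", formatC(df$Value, format = "f", digits = 3, big.mark = ","))

# Setting row names to NULL resets them to 1, 2, 3, ...
rownames(df) <- NULL

print(df)
```

Because the formatted column is text, sorting or summing should still be done on `Value`.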

Tips on differencing values in R data frame by group [duplicate]

Possible Duplicate:
Beginner tips on using plyr to calculate year-over-year change across groups
What is a good way to calculate a year-on-year difference (as a new variable) for an existing data frame variable (e.g. Sales) across multiple grouping variables (e.g. Region and Type)?
Below is an example of the data frame structure:
Date Region Type Sales
1/1/2001 East Food 120
1/1/2001 West Housing 130
1/1/2001 North Food 130
1/2/2001 East Food 133
1/3/2001 West Housing 140
1/4/2001 North Food 150
….
….
1/29/2013 East Food 125
1/29/2013 West Housing 137
1/29/2013 North Food 1350
In addition to differencing the data, I would also like to calculate a trailing (say 7-day) moving average.
Any guidance would be greatly appreciated.
Here is something to get you started. data.table is a great package for this kind of task: once you are past the learning curve, its syntax is concise and easy to use.
library(data.table)
Create a reproducible example
set.seed(128)
regions = c("East", "West", "North", "South")
types = c("Food", "Housing")
dates <- seq(as.Date('2009-01-01'), as.Date('2011-12-31'), by = 1)
n <- length(dates)
dt <- data.table(Date = dates,
Region = sample(regions, n, replace = TRUE),
Type = sample(types, n, replace = TRUE),
Sales = round(rnorm(n, mean = 100, sd = 10)))
Add Year column
dt[, Year := year(Date)]
> dt
Date Region Type Sales Year
1: 2009-01-01 West Food 119 2009
2: 2009-01-02 North Housing 102 2009
3: 2009-01-03 North Housing 102 2009
4: 2009-01-04 North Food 101 2009
5: 2009-01-05 West Food 101 2009
---
1091: 2011-12-27 East Housing 122 2011
1092: 2011-12-28 East Housing 88 2011
1093: 2011-12-29 North Food 115 2011
1094: 2011-12-30 West Housing 96 2011
1095: 2011-12-31 East Food 101 2011
Calculate summary by year
summary <- dt[, list(Sales = sum(Sales)), by = 'Year,Region,Type']
setkey(summary, 'Year')
> head(summary)
Year Region Type Sales
1: 2009 West Food 4791
2: 2009 North Housing 3517
3: 2009 North Food 6774
4: 2009 South Housing 4380
5: 2009 East Food 4144
6: 2009 West Housing 4275
Function to create year-on-year diffs for each region/product combo.
YoYdiff <- function(dt) {
# Calculate year-on-year difference for Sales column
data.table(Sales.Diff = diff(dt$Sales), Year = dt$Year[-1])
}
Calculate the year-on-year difference for each Region/Type group. This works for my example because setkey(summary, 'Year') sorts the summary table by Year, but if your data is missing some years for some products/regions you have to be more careful, since diff() simply subtracts consecutive rows.
> summary[, YoYdiff(.SD), by = 'Region,Type']
Region Type Sales.Diff Year
1: West Food -412 2010
2: West Food 121 2011
3: North Housing 1907 2010
4: North Housing -1457 2011
5: North Food -3087 2010
6: North Food 369 2011
7: South Housing -539 2010
8: South Housing 575 2011
9: East Food 1264 2010
10: East Food -1732 2011
11: West Housing 298 2010
12: West Housing -410 2011
13: South Food -889 2010
14: South Food 1045 2011
15: East Housing 1146 2010
16: East Housing 1169 2011
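The answer above does not cover the trailing moving average that the question also asks about. One possible base R sketch follows; it is not part of the original answer. `trailing_mean` is a hypothetical helper, the data is a toy example, and the code assumes exactly one row per day per group, sorted by date, so a 7-row window equals a 7-day window.

```r
# Hypothetical helper: mean of the current value and the k - 1 values before it.
# Positions without a full window are returned as NA.
trailing_mean <- function(x, k = 7) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  # sides = 1 makes the filter use only current and past values
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Toy data: 10 consecutive days for each of two regions
sales <- data.frame(
  Date = rep(seq(as.Date("2001-01-01"), by = "day", length.out = 10), times = 2),
  Region = rep(c("East", "West"), each = 10),
  Sales = c(1:10, seq(10, 100, by = 10))
)

# Sort within group so that "previous rows" means "previous days"
sales <- sales[order(sales$Region, sales$Date), ]

# ave() applies the helper separately within each Region
sales$MA7 <- ave(sales$Sales, sales$Region, FUN = trailing_mean)

print(sales[sales$Region == "East", ])
```

If some days are missing within a group, a row-based window no longer corresponds to seven calendar days; in that case the series would first need to be expanded to a complete daily sequence.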

Resources