Count the number of repeated values separated by commas

I have a data set that looks something like this:
colA
Nepal , India , USA
USA
India
USA
Nepal , India
USA
USA, Nepal
Nepal
Japan
so I want the count as:
ColB Count
Nepal 4
India 3
USA 5
Japan 1
Is there a way to do this directly in Tableau Reader, using calculated fields or something similar, without going into Tableau Prep?
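This is not a Tableau calculated field. As a check on the intended counts, the same logic can be sketched in R: split each cell on commas, trim the surrounding spaces, and tabulate. The vector name colA mirrors the column above; everything else is an illustrative assumption.

```r
# Sample column from the question, typed in by hand.
colA <- c("Nepal , India , USA", "USA", "India", "USA", "Nepal , India",
          "USA", "USA, Nepal", "Nepal", "Japan")

# Split every cell on commas, flatten, and strip surrounding whitespace.
values <- trimws(unlist(strsplit(colA, ",", fixed = TRUE)))

# Tabulate how often each country name occurs.
counts <- table(values)
print(counts)
```

On this sample Japan occurs in a single cell, so its count is 1.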

Related

Find string, if does not exist, find another string

I have many files from OECD that have data available for different regional granularities. An example would be:
File A
REG_ID Region
AUS Australia
AU1GS Sydney
AU1 New South Wales
AU2 Victoria
AU2GM Melbourne
File B
REG_ID Region
AUS Australia
AU1GS Sydney
AU2GM Melbourne
File C
REG_ID Region
AUS Australia
AU1 New South Wales
AU1GS Sydney
AU2 Victoria
I want to extract the most granular region, in this case Sydney only, and not New South Wales. However, if Sydney is unavailable, I want to extract New South Wales.
How do I write code that is generalisable to all these files?
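One possible reading of the requirement, sketched in R: assume that a finer region's REG_ID starts with the REG_ID of its parent (as AU1GS does with AU1 in the samples), and drop every row whose REG_ID is a proper prefix of another REG_ID in the same file. The helper name most_granular is hypothetical, and the prefix rule is an assumption inferred from the three sample files, not a documented OECD convention. Under this rule the country row AUS is kept, because no other code starts with "AUS".

```r
# Hypothetical helper: keep only rows that have no finer-grained child,
# where "child" means another REG_ID that starts with this REG_ID.
most_granular <- function(regions) {
  ids <- regions$REG_ID
  has_finer <- vapply(ids, function(id) {
    any(ids != id & startsWith(ids, id))
  }, logical(1))
  regions[!has_finer, , drop = FALSE]
}

file_a <- data.frame(
  REG_ID = c("AUS", "AU1GS", "AU1", "AU2", "AU2GM"),
  Region = c("Australia", "Sydney", "New South Wales", "Victoria", "Melbourne"),
  stringsAsFactors = FALSE
)
file_c <- data.frame(
  REG_ID = c("AUS", "AU1", "AU1GS", "AU2"),
  Region = c("Australia", "New South Wales", "Sydney", "Victoria"),
  stringsAsFactors = FALSE
)

most_granular(file_a)  # Sydney and Melbourne replace their states
most_granular(file_c)  # Victoria stays because Melbourne is absent
```

The same function can be applied to each file after reading it in, so nothing is hard-coded per file.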

Summarize data using doBy package at region level

I have a dataset Data as below,
Region Country Market Price
EUROPE France France 30.4502
EUROPE Israel Israel 5.14110965
EUROPE France France 8.99665
APAC CHINA CHINA 2.6877232
APAC INDIA INDIA 60.9004
AFME SL SL 54.1729685
LA BRAZIL BRAZIL 56.8606917
EUROPE RUSSIA RUSSIA 11.6843732
APAC BURMA BURMA 63.5881232
AFME SA SA 115.0733685
I would like to summarize the data at the Region level and get the sum of Price for every Region.
I want the output to look like the one below.
Data Output
Region Country Price
EUROPE France 30.4502
EUROPE Israel 5.14110965
EUROPE France 8.99665
EUROPE RUSSIA 11.6843732
Europe 56.27233285
APAC BURMA 63.5881232
APAC CHINA 2.6877232
APAC INDIA 60.9004
Apac 127.1762464
AFME BAHARAIN 54.1729685
AFME SA 115.0733685
AFME 169.246337
LA BRAZIL 56.8606917
LA 56.8606917
I have used the summaryBy function of the doBy package and tried the code below.
library(doBy)
myfun1 <- function(x){c(s=sum(x))}
DB= summaryBy(Price ~ Region + Country, data=Data, FUN=myfun1)
Any help in this regard is very much appreciated.
You can do this by using dplyr to generate a summary table:
library(dplyr)
totals <- data %>% group_by(Region) %>% summarise(Country="",Price=sum(Price))
And then merging the summary with the rest of the data:
summary <- rbind(data[-3], totals)
Then you can sort by Region to put the summary with the region:
summary <- summary %>% arrange(Region)
Output:
Region Country Price
1 AFME SL 54.1730
2 AFME SA 115.0734
3 AFME 169.2463
4 APAC CHINA 2.6877
5 APAC INDIA 60.9004
6 APAC BURMA 63.5881
7 APAC 127.1762
8 EUROPE France 30.4502
9 EUROPE Israel 5.1411
10 EUROPE France 8.9967
11 EUROPE RUSSIA 11.6844
12 EUROPE 56.2723
13 LA BRAZIL 56.8607
14 LA 56.8607
You have to split the data by the Region factor and sum Price within each level:
lapply(split(data, data$Region), function(x) sum(x$Price))
Or, if you need to present result as you have shown:
totals = lapply(split(data, data$Region), function(x)
  rbind(x, data.frame(Region=unique(x$Region), Country="", Market="", Price=sum(x$Price))))
do.call(rbind, totals)
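For reference, here is a self-contained sketch of the split-and-rbind approach in base R, with the sample rows from the question typed in by hand (the Market column is left out for brevity). The names with_totals and region_totals are illustrative only.

```r
# Sample data from the question, without the Market column.
Data <- data.frame(
  Region  = c("EUROPE", "EUROPE", "EUROPE", "APAC", "APAC",
              "AFME", "LA", "EUROPE", "APAC", "AFME"),
  Country = c("France", "Israel", "France", "CHINA", "INDIA",
              "SL", "BRAZIL", "RUSSIA", "BURMA", "SA"),
  Price   = c(30.4502, 5.14110965, 8.99665, 2.6877232, 60.9004,
              54.1729685, 56.8606917, 11.6843732, 63.5881232, 115.0733685),
  stringsAsFactors = FALSE
)

# For each region, append one row holding the regional sum of Price.
with_totals <- do.call(rbind, lapply(split(Data, Data$Region), function(x) {
  rbind(x, data.frame(Region = x$Region[1], Country = "",
                      Price = sum(x$Price), stringsAsFactors = FALSE))
}))
rownames(with_totals) <- NULL

# The total rows are the ones with an empty Country.
region_totals <- with_totals[with_totals$Country == "", ]
print(with_totals)
```

Because split() orders the groups by region name, the total row for each region lands directly under that region's detail rows.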

How can I count the number of instances a value occurs within a subgroup in R?

I have a data frame that I'm working with in R, and am trying to check how many times a value occurs within its larger, associated group. Specifically, I'm trying to count the number of cities that are listed for each particular country.
My data look something like this:
City Country
=========================
New York US
San Francisco US
Los Angeles US
Paris France
Nantes France
Berlin Germany
It seems that table() is the way to go, but I can't quite figure it out — how can I find out how many cities are listed for each country? That is to say, how can I find out how many fields in one column are associated with a particular value in another column?
EDIT:
I'm hoping for something along the lines of
3 US
2 France
1 Germany
I guess you can try table.
table(df$Country)
# France Germany US
# 2 1 3
Or using data.table
library(data.table)
setDT(df)[, .N, by=Country]
# Country N
#1: US 3
#2: France 2
#3: Germany 1
Or
library(plyr)
count(df$Country)
# x freq
#1 France 2
#2 Germany 1
#3 US 3
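If the counts should come out largest first, as in the edit to the question, a base R sketch is to sort the table in decreasing order. The sample data frame is typed in from the question; the name result is illustrative only.

```r
# Sample data from the question.
df <- data.frame(
  City    = c("New York", "San Francisco", "Los Angeles", "Paris", "Nantes", "Berlin"),
  Country = c("US", "US", "US", "France", "France", "Germany"),
  stringsAsFactors = FALSE
)

# Count cities per country, then order from most to fewest.
counts <- sort(table(df$Country), decreasing = TRUE)

# Turn the sorted table into a plain two-column data frame.
result <- data.frame(Count = as.vector(counts), Country = names(counts),
                     stringsAsFactors = FALSE)
print(result)
```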

Mean of time - hh:mm:ss - group by a variable

Need to calculate the mean of Time by Country. Time is a Date variable - hh:mm:ss.
This command with(df,tapply(as.numeric(times(df$Time)),Country,mean))
is not returning the correct mean in hh:mm:ss.
Country Time
1 Germany 2:26:21
2 Germany 2:19:19
3 Brazil 2:06:34
4 USA 2:06:17
5 Eth 2:18:58
6 Japan 2:08:35
7 Morocco 2:05:27
8 Germany 2:13:57
9 Romania 2:21:30
10 Spain 2:07:23
Output:
>with(df,tapply(as.numeric(times(df$Time)),Country,mean))
Andorra Australia Brazil Canada China
0.09334491 0.09634259 0.09578125 0.09634645 0.09481192
Eritrea Ethiopia France Germany Great Britain
0.09709491 0.09010031 0.10025463 0.09713349 0.09524306
Ireland Italy Japan Kenya Morocco
0.09593750 0.09520255 0.09579630 0.08934854 0.09400463
New Zeland Peru Poland Romania Russia
0.09664931 0.09809606 0.09638889 0.09875000 0.09327932
Spain Switzerland Uganda United States Zimbabwe
0.09314236 0.09620949 0.10068287 0.09399016 0.09892940
I see you've discovered the agony of working with date and time values in R...
Is this what you had in mind?
df$nTime <- difftime(strptime(df$Time,"%H:%M:%S"),
strptime("00:00:00","%H:%M:%S"),
units="secs")
df.means <- aggregate(df$nTime,by=list(df$Country),mean)
df.means$Time <- format(.POSIXct(df.means$x,tz="GMT"), "%H:%M:%S")
df.means
# Group.1 x Time
# 1 Brazil 7594.000 02:06:34
# 2 Eth 8338.000 02:18:58
# 3 Germany 8392.333 02:19:52
# 4 Japan 7715.000 02:08:35
# 5 Morocco 7527.000 02:05:27
# 6 Romania 8490.000 02:21:30
# 7 Spain 7643.000 02:07:23
# 8 USA 7577.000 02:06:17
The first line adds a column nTime which is the time, in seconds, since midnight.
The second line calculates the means.
The third line converts back to H:M:S.
The problem you were having is that strptime(...), when forced to convert to numeric, returns the number of seconds between 1970-01-01 and the indicated time today. So, a really big number. This code just subtracts out the number of seconds between 1970-01-01 and 00:00:00 today.
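The same three steps (convert to seconds, average per country, format back) can also be sketched without any date classes, by treating h:mm:ss as plain arithmetic. The helper names to_seconds and to_hms are hypothetical, and the sketch assumes every value has exactly three colon-separated fields. As a side note, the values in the question's output look like fractions of a day: 0.09713349 × 86400 is about 8392 seconds, which matches the Germany mean.

```r
# Sample data from the question.
df <- data.frame(
  Country = c("Germany", "Germany", "Brazil", "USA", "Eth",
              "Japan", "Morocco", "Germany", "Romania", "Spain"),
  Time = c("2:26:21", "2:19:19", "2:06:34", "2:06:17", "2:18:58",
           "2:08:35", "2:05:27", "2:13:57", "2:21:30", "2:07:23"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "h:mm:ss" strings to seconds.
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: seconds (rounded to whole seconds) back to "hh:mm:ss".
to_hms <- function(secs) {
  secs <- as.integer(round(secs))
  sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

# Mean per country in seconds, then formatted.
mean_secs <- tapply(to_seconds(df$Time), df$Country, mean)
mean_time <- setNames(to_hms(mean_secs), names(mean_secs))
print(mean_time)
```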
Are you trying to do this -
dades$Time <- strptime(dades$Time,'%H:%M:%S')
by(dades$Time, dades$Country, mean)
If I have misunderstood your question, can you please post sample output?

R data.frame transformation?

I have a R data frame that looks like this:
Country Property Value
Canada Capital Ottawa
Canada Population 38
Canada Language1 French
Canada Language2 English
United States Capital Washington
United States Population 280
United States Language1 English
United States Language2 NA
I want to re-arrange this so that it looks like this:
Country Capital Population Language1 Language2
Canada Ottawa 38 French English
United States Washington 280 English NA
Is there any way to do this transformation ?
Thanks.
As per Paul Hiemstra's comment, the reshape2 package's dcast will do this nicely:
library(reshape2)
dcast(data=yourdataframe, Country~Property, value.var='Value')
If you've got duplicated values in there though it will try to aggregate them using length as a default, which isn't what you want!
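If reshape2 is not available, a similar long-to-wide transformation can be sketched with base R's reshape(). The sample data frame is typed in from the question and the names long and wide are illustrative. Note that Value is kept as character here, so Population is not converted to a number.

```r
# Sample data from the question, in long form.
long <- data.frame(
  Country  = rep(c("Canada", "United States"), each = 4),
  Property = rep(c("Capital", "Population", "Language1", "Language2"), times = 2),
  Value    = c("Ottawa", "38", "French", "English",
               "Washington", "280", "English", NA),
  stringsAsFactors = FALSE
)

# One row per Country, one column per Property.
wide <- reshape(long, idvar = "Country", timevar = "Property", direction = "wide")

# reshape() names the new columns "Value.<Property>"; strip that prefix.
names(wide) <- sub("^Value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```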
