Create multiple columns from the values in the second column and fill a new data frame with the number of occurrences according to the first column [duplicate] - r

I am new to Stack Overflow, so apologies if I am not asking this question properly.
I have two columns, Country and Year:
INDIA 1970
USA 1970
USA 1971
INDIA 1970
.
.
UK 1972
I want a new data frame like this, filled with the number of occurrences:
      1970 1971 1972 ...
INDIA    2
USA      1    1
UK                 1

An option could be to use reshape2::dcast with fun.aggregate argument set as length:
library(reshape2)
dcast(df, Country~Year, length)
# Country 1970 1971 1972
# 1 INDIA 2 0 0
# 2 UK 0 0 1
# 3 USA 1 1 0
Data:
df <- read.table(text =
"Country Year
INDIA 1970
USA 1970
USA 1971
INDIA 1970
UK 1972",
header = TRUE, stringsAsFactors = FALSE)
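For comparison, the same count table can be sketched in base R without any extra packages. This is only an alternative under the assumption that df has the Country and Year columns from the Data block above; the name wide is made up for illustration.

```r
# Sample data copied from the answer above
df <- read.table(text =
"Country Year
INDIA 1970
USA 1970
USA 1971
INDIA 1970
UK 1972",
header = TRUE, stringsAsFactors = FALSE)

# table() counts every Country/Year combination; absent pairs get 0
tab <- table(df$Country, df$Year)

# Turn the contingency table into a wide data frame (one column per year)
wide <- as.data.frame.matrix(tab)
print(wide)
```

Unlike dcast, this keeps the countries as row names rather than as a Country column.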

Related

Aggregates by group and including counts across rows [duplicate]

I have this data frame:
YEAR NATION VOTE
2015 NOR 1
2015 USA 0
2015 CAN 1
2015 RUS 1
2014 USA 1
2014 USA 1
2014 USA 0
2014 NOR 1
2014 NOR 0
2014 CAN 1
...and it goes on with more years, nations and votes. VOTE is binary: yes (1) or no (0). I am trying to build an output table that aggregates by year and nation and also reports the total number of votes for each nation (the count of all 0s and 1s) together with the number of 1s, like the one sketched below (sumVOTES being the total number of votes for that nation that year, i.e. the count of all 1s and 0s):
YEAR NATION VOTE-1 sumVOTES %-1s
2015 USA 8 17 47.1
2015 NOR 7 13 53.8
2015 CAN 3 11 27.3
2014 etc.
etc.
You have not provided your data.frame in a reproducible form, but this should work:
library(data.table)
# assuming 'df' is your data.frame
setDT(df)[, .('VOTE-1'   = sum(VOTE == 1),
              'sumVOTES' = .N,
              '%-1s'     = 1e2 * sum(VOTE == 1) / .N),
          by = .(YEAR, NATION)]
setDT converts the data.frame to a data.table by reference, i.e. without copying it.
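If data.table is not available, roughly the same table can be sketched in base R. This is a minimal sketch that assumes only the ten sample rows shown in the question; the names votes, ones, total, res and pct_ones are made up for illustration.

```r
# The ten sample rows from the question
votes <- read.table(text =
"YEAR NATION VOTE
2015 NOR 1
2015 USA 0
2015 CAN 1
2015 RUS 1
2014 USA 1
2014 USA 1
2014 USA 0
2014 NOR 1
2014 NOR 0
2014 CAN 1",
header = TRUE, stringsAsFactors = FALSE)

# Number of 1s per year and nation (VOTE is 0/1, so sum() counts the 1s)
ones  <- aggregate(VOTE ~ YEAR + NATION, data = votes, FUN = sum)
# Total number of votes per year and nation
total <- aggregate(VOTE ~ YEAR + NATION, data = votes, FUN = length)

# Join the two summaries and add the percentage of 1s
res <- merge(ones, total, by = c("YEAR", "NATION"),
             suffixes = c("_ones", "_total"))
res$pct_ones <- round(100 * res$VOTE_ones / res$VOTE_total, 1)
print(res)
```

Two aggregate calls plus a merge are more verbose than the single data.table expression, but they need no extra packages.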

How do I melt or reshape binned data in R? [duplicate]

I have binned data reflecting the width of rivers across each continent. Below is a sample dataset. I pretty much just want to get the data into the form I have shown.
dat <- read.table(text =
"width continent bin
5.32 Africa 10
6.38 Africa 10
10.80 Asia 20
9.45 Africa 10
22.66 Africa 30
9.45 Asia 10",header = TRUE)
How do I melt the above toy dataset to create this dataframe?
Bin Count Continent
10 3 Africa
10 1 Asia
20 1 Asia
30 1 Africa
We could use any of the aggregate-by-group approaches.
The data.table option is to convert the 'data.frame' to a 'data.table' (setDT(dat)), group by the 'continent' and 'bin' variables, and get the number of rows per group (.N).
library(data.table)
setDT(dat)[,list(Count=.N) ,.(continent,bin)]
# continent bin Count
#1: Africa 10 3
#2: Asia 20 1
#3: Africa 30 1
#4: Asia 10 1
Or a similar option with dplyr: group by the variables and then use n() instead of .N to get the count.
library(dplyr)
dat %>%
group_by(continent, bin) %>%
summarise(Count=n())
Or we can use aggregate from base R with the formula method and take the length of each group.
aggregate(cbind(Count=width)~., dat, FUN=length)
# continent bin Count
#1 Africa 10 3
#2 Asia 10 1
#3 Asia 20 1
#4 Africa 30 1
From @Frank's and @David Arenburg's comments, some additional options using data.table and dplyr. We convert the dataset to a data.table (setDT(dat)), cast it to 'wide' format with dcast, melt it back to 'long' format, and keep the rows with value > 0:
library(data.table)
melt(dcast(setDT(dat),continent~bin))[value>0]
Using count from dplyr
library(dplyr)
count(dat, bin, continent)
With sqldf:
library(sqldf)
sqldf("SELECT bin, continent, COUNT(continent) AS count
FROM dat
GROUP BY bin, continent")
Output:
bin continent count
1 10 Africa 3
2 10 Asia 1
3 20 Asia 1
4 30 Africa 1
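One more package-free variant, offered only as a sketch: table() followed by as.data.frame() gives the long format directly, after which the empty combinations are dropped. The name counts and the column labels Bin, Continent and Count are chosen here to match the desired output in the question.

```r
# Toy dataset from the question
dat <- read.table(text =
"width continent bin
5.32 Africa 10
6.38 Africa 10
10.80 Asia 20
9.45 Africa 10
22.66 Africa 30
9.45 Asia 10", header = TRUE)

# Cross-tabulate bin by continent, then unfold the table into long format
counts <- as.data.frame(table(Bin = dat$bin, Continent = dat$continent),
                        responseName = "Count", stringsAsFactors = FALSE)

# table() also reports combinations that never occur; keep only observed ones
counts <- counts[counts$Count > 0, ]
print(counts)
```

Note that Bin comes back as character here, so convert it with as.numeric() if it is needed as a number.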

R aggregating on date then character

I have a table that looks like the following:
Year Country Variable 1 Variable 2
1970 UK 1 3
1970 USA 1 3
1971 UK 2 5
1971 UK 2 3
1971 UK 1 5
1971 USA 2 2
1972 USA 1 1
1972 USA 2 5
I'd be grateful if someone could tell me how I can aggregate the data to group it first by year, then country with the sum of variable 1 and variable 2 coming afterwards so the output would be:
Year Country Sum Variable 1 Sum Variable 2
1970 UK 1 3
1970 USA 1 3
1971 UK 5 13
1971 USA 2 2
1972 USA 3 6
This is the code I've tried to no avail (the real dataframe is 125,000 rows by 30+ columns hence the subset. Please be kind, I'm new to R!)
#making subset from data
GT2 <- subset(GT1, select = c("iyear", "country_txt", "V1", "V2"))
#making sure data types are correct
GT2[,2]=as.character(GT2[,2])
GT2[,3] <- as.numeric(as.character( GT2[,3] ))
GT2[,4] <- as.numeric(as.character( GT2[,4] ))
#removing NA values
GT2Omit <- na.omit(GT2)
#trying to aggregate - i.e. group by year, then country with the sum of Variable 1 and Variable 2 being shown
aggGT2 <-aggregate(GT2Omit, by=list(GT2Omit$iyear, GT2Omit$country_txt), FUN=sum, na.rm=TRUE)
Your aggregate is almost correct:
> aggGT2 <-aggregate(GT2Omit[3:4], by=GT2Omit[c("country_txt", "iyear")], FUN=sum, na.rm=TRUE)
> aggGT2
country_txt iyear V1 V2
1 UK 1970 1 3
2 USA 1970 1 3
3 UK 1971 5 13
4 USA 1971 2 2
5 USA 1972 3 6
dplyr is almost always the answer nowadays.
library(dplyr)
aggGT1 <- GT1 %>% group_by(iyear, country_txt) %>% summarize(sv1=sum(V1), sv2=sum(V2))
Having said that, it is good to learn basic R functions like aggregate and by.
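As a self-contained illustration, the formula interface of aggregate can be run on the sample table from the question. This is a sketch under assumptions: the data frame gt is hypothetical, and the column names V1 and V2 are shortened stand-ins for "Variable 1" and "Variable 2".

```r
# Sample table from the question, with shortened column names
gt <- read.table(text =
"Year Country V1 V2
1970 UK 1 3
1970 USA 1 3
1971 UK 2 5
1971 UK 2 3
1971 UK 1 5
1971 USA 2 2
1972 USA 1 1
1972 USA 2 5",
header = TRUE, stringsAsFactors = FALSE)

# cbind() on the left-hand side sums both variables in one call;
# the right-hand side lists the grouping columns
sums <- aggregate(cbind(V1, V2) ~ Year + Country, data = gt, FUN = sum)

# Order by year first, then by country, as requested in the question
sums <- sums[order(sums$Year, sums$Country), ]
print(sums)
```

The formula method drops rows with NA in any of the listed columns by default, so a separate na.omit step is not needed for this form.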

Combine lists of different lengths

I am new to R and started learning two weeks ago. I want to take a list of tropical cyclone counts for various years (where some years are absent, because there were no tropical cyclones) and create a list with a column of every year from 1907-2013 and a column of the number of tropical cyclones.
In the example I include the list of occurrences to 1973 (before 1912 there were none).
Year Count
1 1912 1
2 1913 1
3 1921 1
4 1940 1
5 1953 1
6 1958 1
7 1959 1
8 1960 1
9 1966 1
10 1969 1
11 1971 1
12 1973 2
I tried using a for loop and if/else statement, but it does not work. I get the message "longer object length is not a multiple of shorter object length" and "the condition has length > 1 and only the first element will be used."
tc.SP=matrix(0,len.tc.yr,2)
tc.SP[,1]=tc.year.list
for (i in 1:len.tc.yr) #107 yrs (1907-2013)
{
if (tc.SP5.count[,1] == tc.SP[,1]) #tc.SP5.count is various years of TC occ.
{tc.SP[,2]= tc.SP5.count[,2]}
else
{tc.SP[,2]= 0}
}
Thank you for any help in advance.
When you say list, I'm going to assume you want to create a data.frame. Let's say the data above is in a data.frame called cyclone. The easiest way to create a data.frame with a row for every year is to merge it with a complete list of years. For example:
cyclone.full <- merge(cyclone, data.frame(Year=1907:2013), all=TRUE)
Here the data.frames will automatically merge on the Year column because both sets have that column. This will put NA values in all the missing years. If you want the default to be 0, you can do
cyclone.full$Count[is.na(cyclone.full$Count)] <- 0
Then you get
head(cyclone.full)
# Year Count
# 1 1907 0
# 2 1908 0
# 3 1909 0
# 4 1910 0
# 5 1911 0
# 6 1912 1
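The same fill-in can also be sketched without merge, using a vectorised lookup with match(). This is an alternative offered under assumptions: the cyclone data frame is rebuilt here from the twelve rows shown in the question, and the names years, idx and full are made up for illustration.

```r
# Observed years and counts, copied from the question
cyclone <- data.frame(
  Year  = c(1912, 1913, 1921, 1940, 1953, 1958,
            1959, 1960, 1966, 1969, 1971, 1973),
  Count = c(rep(1, 11), 2)
)

# Every year in the target range
years <- 1907:2013

# Position of each target year in cyclone$Year (NA where the year is absent)
idx <- match(years, cyclone$Year)

# Take the observed count where there is one, otherwise 0
full <- data.frame(Year  = years,
                   Count = ifelse(is.na(idx), 0, cyclone$Count[idx]))
head(full)
```

Because match() handles the whole vector of years at once, no for loop or if/else over individual years is needed.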

How to count levels of a factor in a data.frame, grouped by another value of that data.frame [GNU R] [duplicate]

This question already has answers here:
simple data.frame reshape
(3 answers)
Reshaping data frame with duplicates
(4 answers)
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 9 years ago.
I have a data.frame like this
VAR1 VAR2
1999 USA
1999 USA
1999 UK
2000 GER
2000 USA
2000 GER
2000 USA
2001 USA
How do I count any level of VAR2 for each year?
What I want is a plot where the x-axis is the year and the y-axis is the count of each level of VAR2.
The data.table solution
library(data.table)
# column names are case-sensitive: the data uses VAR1 and VAR2
new.dat = data.table(dat)[,length(unique(VAR2)),by=VAR1]
new.dat=as.matrix(new.dat)
plot(x=new.dat[,1],y=new.dat[,2])
The simplest way I can think of, with dat being your data frame:
with(dat,table(VAR1,VAR2))
The output will look something like this:
VAR2
VAR1 GER UK USA
1999 0 1 2
2000 2 0 2
2001 0 0 1
Hope this helps.
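Since the question asks for a plot, here is a minimal sketch of how the table could be drawn with base graphics. It assumes dat holds the eight rows from the question; the choice of a grouped bar chart is just one possible reading of "count of each level per year".

```r
# Sample data from the question
dat <- read.table(text =
"VAR1 VAR2
1999 USA
1999 USA
1999 UK
2000 GER
2000 USA
2000 GER
2000 USA
2001 USA",
header = TRUE, stringsAsFactors = FALSE)

# Counts of each VAR2 level per year
tab <- with(dat, table(VAR1, VAR2))

# Transpose so that each year forms one group of bars on the x-axis,
# with one bar per level of VAR2
barplot(t(tab), beside = TRUE, legend.text = TRUE,
        xlab = "Year", ylab = "Count")
```

Dropping beside = TRUE would stack the levels within each year instead of placing them side by side.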
There are many ways to do this, and the question is undoubtedly a duplicate. What have you tried? You can use dcast in the reshape2 package (with dat being your data frame and its columns VAR1 and VAR2).
require(reshape2)
dcast( dat , VAR2 ~ VAR1 , length )
# VAR2 1999 2000 2001
#1 GER 0 2 0
#2 UK 1 0 0
#3 USA 2 2 1

Resources