Combine lists of different lengths - r

I am new to R and started learning two weeks ago. I want to take a list of tropical cyclone counts for various years (where some years are absent, because there were no tropical cyclones) and create a list with a column of every year from 1907-2013 and a column of the number of tropical cyclones.
In the example I include the list of occurrences to 1973 (before 1912 there were none).
Year Count
1 1912 1
2 1913 1
3 1921 1
4 1940 1
5 1953 1
6 1958 1
7 1959 1
8 1960 1
9 1966 1
10 1969 1
11 1971 1
12 1973 2
I tried using a for loop and if/else statement, but it does not work. I get the message "longer object length is not a multiple of shorter object length" and "the condition has length > 1 and only the first element will be used."
tc.SP=matrix(0,len.tc.yr,2)
tc.SP[,1]=tc.year.list
for (i in 1:len.tc.yr) #107 yrs (1907-2013)
{
if (tc.SP5.count[,1] == tc.SP[,1]) #tc.SP5.count is various years of TC occ.
{tc.SP[,2]= tc.SP5.count[,2]}
else
{tc.SP[,2]= 0}
}
Thank you for any help in advance.

When you say list, I'm going to assume you want to create a data.frame. Let's say the data above is in a data.frame called cyclone. The easiest way to create a data.frame with a row for every year is to merge it with a complete list of years. For example:
cyclone.full <- merge(cyclone, data.frame(Year=1907:2013), all=TRUE)
Here the data.frames will automatically merge on the Year column because both sets have that column. This will put NA values in all the missing years. If you want the default to be 0, you can do
cyclone.full$Count[is.na(cyclone.full$Count)] <- 0
Then you get
head(cyclone.full)
# Year Count
# 1 1907 0
# 2 1908 0
# 3 1909 0
# 4 1910 0
# 5 1911 0
# 6 1912 1
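The loop in the question fails because if expects a single TRUE or FALSE, while tc.SP5.count[,1] == tc.SP[,1] compares two whole vectors of different lengths. As a minimal sketch of a vectorised alternative to merge, assuming the counts are in a data.frame called cyclone as above, match() can look up the position of each observed year directly:

```r
# Sample data from the question (years with at least one cyclone).
cyclone <- data.frame(
  Year  = c(1912, 1913, 1921, 1940, 1953, 1958, 1959, 1960, 1966, 1969, 1971, 1973),
  Count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
)

# Every year of interest, with a default count of 0.
all_years <- 1907:2013
cyclone.full <- data.frame(Year = all_years, Count = 0)

# match() gives, for each observed year, its row position in all_years.
idx <- match(cyclone$Year, all_years)
cyclone.full$Count[idx] <- cyclone$Count
```

This avoids the separate NA replacement step, at the cost of assuming every year in cyclone falls inside 1907:2013.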

Related

r - Fill in missing years in Data frame [duplicate]

I have some data in R that looks like this.
year freq
<int> <int>
1902 2
1903 2
1905 1
1906 4
1907 1
1908 1
1909 1
1912 1
1914 1
1915 1
The data was read in using the following code.
data = read.csv("earthquakes.csv")
my_var <- c('year')
new_data <- data[my_var]
counts <- count(data, 'year')
This is 1 page of a 7 page table. I need to fill in the missing years with a count of 0 from 1900-1999. How would I go about this? I haven't been able to find an example online where year is the primary column.
We may use complete on the 'counts' data, giving it the full range of years asked for in the question:
library(tidyr)
complete(counts, year = 1900:1999, fill = list(freq = 0))
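A self-contained sketch of the complete() approach, using only the ten sample rows shown in the question in place of the real counts table (the rest of that table is not shown, so the result here is illustrative). The range 1900:1999 is the one the question asks for, and 0L is used so the fill value has the same integer type as freq:

```r
library(tidyr)

# The ten sample rows from the question, standing in for the full table.
counts <- data.frame(
  year = c(1902L, 1903L, 1905L, 1906L, 1907L, 1908L, 1909L, 1912L, 1914L, 1915L),
  freq = c(2L, 2L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L)
)

# Add a row for every year in 1900:1999 that is absent and set its freq to 0.
filled <- complete(counts, year = 1900:1999, fill = list(freq = 0L))
```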
1) Convert the input, shown in the Note, to zoo class and then to ts class. The latter will fill in the missing years with NA. Replace the NAs with 0, convert back to a data frame and set the names to the original names.
If a ts series is ok as output then omit the last two lines. If in addition it is ok to use NA rather than 0 then omit the last three lines.
library(zoo)
DF |>
read.zoo() |>
as.ts() |>
na.fill(0) |>
fortify.zoo() |>
setNames(names(DF))
giving:
year freq
1 1902 2
2 1903 2
3 1904 0
4 1905 1
5 1906 4
6 1907 1
7 1908 1
8 1909 1
9 1910 0
10 1911 0
11 1912 1
12 1913 0
13 1914 1
14 1915 1
2) For a base solution use merge. Omit the last line if NA is ok instead of 0.
m <- merge(DF, data.frame(year = min(DF$year):max(DF$year)), all = TRUE)
transform(m, freq = replace(freq, is.na(freq), 0))
Note
Lines <- "year freq
1902 2
1903 2
1905 1
1906 4
1907 1
1908 1
1909 1
1912 1
1914 1
1915 1"
DF <- read.table(text = Lines, header = TRUE)
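Both solutions span only the years between the first and last observation (1902 to 1915 here). If the full 1900-1999 range from the question is wanted, a sketch of the base variant with an explicit range, rebuilding DF from the Note so that it runs on its own, might look like this:

```r
# Same sample data as in the Note.
DF <- data.frame(
  year = c(1902, 1903, 1905, 1906, 1907, 1908, 1909, 1912, 1914, 1915),
  freq = c(2, 2, 1, 4, 1, 1, 1, 1, 1, 1)
)

# Merge against every year in the requested range; absent years get NA.
full <- merge(DF, data.frame(year = 1900:1999), all = TRUE)

# Replace the NA counts with 0.
full$freq[is.na(full$freq)] <- 0
```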

rowwise multiplication of two different dataframes dplyr

I have two dataframes and I want to multiply one column of one dataframe (pop$Population) with parts of the other dataframe, sometimes with the mean of one column or a subset (here e.g.: multiplication with mean of df$energy).
As I want to have my results per Year I need to additionally multiply it by 365 (days).
I need the results for each Year.
age<-c("6 Months","9 Months", "12 Months")
energy<-c(2.5, NA, 2.9)
Df<-data.frame(age,energy)
Age<-1
Year<-c(1990,1991,1993, 1994)
Population<-c(200,300,400, 250)
pop<-data.frame(Age, Year,Population)
pop:
Age Year Population
1 1 1990 200
2 1 1991 300
3 1 1993 400
4 1 1994 250
Df:
age energy
1 6 Months 2.5
2 9 Months NA
3 12 Months 2.9
This was my attempt, but I got an error:
pop$energy<-pop$Population%>%
rowwise()%>%
transmute("energy_year"= .%*% mean(Df$energy, na.rm = T)%*%365)
Error in UseMethod("rowwise") :
no applicable method for 'rowwise' applied to an object of class "c('double', 'numeric')"
I would like to end up with a dataframe like this:
Age Year Population energy_year
1 1 1990 200 197100
2 1 1991 300 295650
3 1 1993 400 394200
4 1 1994 250 246375
pop$Population is a vector and not a data frame, hence the error: rowwise() only has methods for data frames.
For your use case the simplest thing to do would be:
pop %>% mutate(energy_year= Population * mean(Df$energy, na.rm = T) * 365)
This will give you the output:
Age Year Population energy_year
1 1 1990 200 197100
2 1 1991 300 295650
3 1 1993 400 394200
4 1 1994 250 246375
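The same calculation can be sketched without dplyr, using the question's pop and Df. Arithmetic on a column is already element-wise, so neither rowwise() nor %*% (which is matrix multiplication) is needed:

```r
Df <- data.frame(
  age    = c("6 Months", "9 Months", "12 Months"),
  energy = c(2.5, NA, 2.9)
)
pop <- data.frame(
  Age        = 1,
  Year       = c(1990, 1991, 1993, 1994),
  Population = c(200, 300, 400, 250)
)

# Mean daily energy over the non-missing rows: (2.5 + 2.9) / 2
daily_mean <- mean(Df$energy, na.rm = TRUE)

# Element-wise multiplication gives one value per row of pop.
pop$energy_year <- pop$Population * daily_mean * 365
```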

Importing .csv file with tidydata

I am having difficulty importing my data in the way I would like to from a .csv file to tidydata.
My data set is made up of descriptive data (age, country, etc.) and then 15 condition columns that I would like to have in just one column (long format). I have previously tried 'melting' the data in a few ways, but it does not turn out the way I intended it to. These are a few things I have tried, I know it is kind of messy. There are quite a few NAs in the data, which seem to be causing an issue. I am trying to create this specific column "Vignette" which will serve as the collective column for the 15 vignette columns I would like in long format.
head(dat)
ID Frequency Gender Country Continent Age
1 5129615189 At least weekly female France Europe 30-50 years
2 5128877943 At least daily female Spain Europe > 50 years
3 5126775994 At least weekly female Spain Europe 30-50 years
4 5126598863 At least daily male Albania Europe 30-50 years
5 5124909744 At least daily female Ireland Europe > 50 years
6 5122047758 At least weekly female Denmark Europe 30-50 years
Practice Specialty Seniority AMS
1 University public hospital centre Infectious diseases 6-10 years Yes
2 Other public hospital Infectious diseases > 10 years Yes
3 University public hospital centre Intensive care > 10 years Yes
4 University public hospital centre Infectious diseases > 10 years No
5 Private hospial/clinic Clinical microbiology > 10 years Yes
6 University public hospital centre Infectious diseases 0-5 years Yes
Durations V01 V02 V03 V04 V05 V06 V07 V08 V09 V10 V11 V12 V13 V14 V15
1 range 7 2 7 7 7 5 7 14 7 42 42 90 7 NA 5
2 range 7 10 10 5 14 5 7 14 10 42 21 42 14 14 14
3 range 7 5 5 7 14 5 5 13 10 42 42 42 5 0 7
4 range 10 7 7 5 7 10 7 5 7 28 14 42 10 10 7
5 range 7 5 7 7 14 7 7 14 10 42 42 90 10 0 7
6 fixed duration 7 3 3 7 10 10 7 14 7 90 90 90 10 7 7
dat_long %>%
gather(Days, Age, -Vignette)
dat$new_sp = NULL
names(dat) <- gsub("new_sp", "", names(dat))
dat_tidy<-melt(
data=dat,
id=0:180,
variable.name="Vignette",
value.name="Days",
na.rm=TRUE
)
dat_tidy<- mutate(dat_tidy,
Days= sub("^V", "", Days)
)
It keeps saying "Error: id variables not found in data: NA"
I have tried to get rid of NAs but it doesn't seem to do anything.
I am guessing you are loading the melt function from reshape2. I recommend that you try tidyr, which is basically the next generation of reshape2.
Your error is presumably caused by the argument id=0:180. This asks melt to keep columns 0-180 as "identifier" columns and melt the rest (i.e. create a new row for each value in each remaining column).
When you ask for more column indices than there are columns in a data.frame, the non-existing columns come back as NA - you asked for them, so you get them!
tidyr has newer verbs that are more intuitive, but I'll give you a solution with the older semantics:
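A small sketch makes this visible, using a hypothetical three-column data frame in place of dat. Indexing the column names past the end yields NA, which is presumably where the NA in the error message comes from (how melt resolves numeric ids internally is an assumption here):

```r
# Hypothetical small data frame standing in for dat.
small <- data.frame(ID = 1:2, V01 = c(7, 10), V02 = c(2, 7))

# Ask for the column names at positions 0 to 5, although only 3 columns exist.
# Position 0 is silently dropped; positions 4 and 5 come back as NA.
requested <- names(small)[0:5]
requested
```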
library(tidyr)
dat_tidy <- dat %>% gather('Vignette', 'Days', starts_with('V'))
# or a bit more verbose
dat_tidy <- dat %>% gather('Vignette', 'Days', V01, V02, V03, V04)
And check out the comment by heck1 for tips on asking even better questions.
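One of the newer tidyr verbs is pivot_longer(), available from tidyr 1.0.0. A sketch on a hypothetical cut-down version of dat (two respondents, three vignette columns), dropping missing durations as the question's na.rm=TRUE intended:

```r
library(tidyr)

# Hypothetical cut-down stand-in for dat; values copied from the first two rows.
dat_small <- data.frame(
  ID  = c(5129615189, 5128877943),
  Age = c("30-50 years", "> 50 years"),
  V01 = c(7, 7),
  V02 = c(2, 10),
  V14 = c(NA, 14)
)

# Stack every column whose name starts with "V" into Vignette/Days pairs,
# dropping rows where the duration is missing.
dat_tidy <- pivot_longer(
  dat_small,
  cols = starts_with("V"),
  names_to = "Vignette",
  values_to = "Days",
  values_drop_na = TRUE
)
```

Each remaining row holds one respondent, one vignette name and one duration; the descriptive columns are repeated as needed.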

R aggregating on date then character

I have a table that looks like the following:
Year Country Variable 1 Variable 2
1970 UK 1 3
1970 USA 1 3
1971 UK 2 5
1971 UK 2 3
1971 UK 1 5
1971 USA 2 2
1972 USA 1 1
1972 USA 2 5
I'd be grateful if someone could tell me how I can aggregate the data to group it first by year, then country with the sum of variable 1 and variable 2 coming afterwards so the output would be:
Year Country Sum Variable 1 Sum Variable 2
1970 UK 1 3
1970 USA 1 3
1971 UK 5 13
1971 USA 2 2
1972 USA 3 6
This is the code I've tried, to no avail. The real dataframe is 125,000 rows by 30+ columns, hence the subset. Please be kind, I'm new to R!
#making subset from data
GT2 <- subset(GT1, select = c("iyear", "country_txt", "V1", "V2"))
#making sure data types are correct
GT2[,2]=as.character(GT2[,2])
GT2[,3] <- as.numeric(as.character( GT2[,3] ))
GT2[,4] <- as.numeric(as.character( GT2[,4] ))
#removing NA values
GT2Omit <- na.omit(GT2)
#trying to aggregate - i.e. group by year, then country with the sum of Variable 1 and Variable 2 being shown
aggGT2 <-aggregate(GT2Omit, by=list(GT2Omit$iyear, GT2Omit$country_txt), FUN=sum, na.rm=TRUE)
Your aggregate is almost correct:
> aggGT2 <-aggregate(GT2Omit[3:4], by=GT2Omit[c("country_txt", "iyear")], FUN=sum, na.rm=TRUE)
> aggGT2
country_txt iyear V1 V2
1 UK 1970 1 3
2 USA 1970 1 3
3 UK 1971 5 13
4 USA 1971 2 2
5 USA 1972 3 6
dplyr is almost always the answer nowadays.
library(dplyr)
aggGT1 <- GT1 %>% group_by(iyear, country_txt) %>% summarize(sv1=sum(V1), sv2=sum(V2))
Having said that, it is good to learn basic R functions like aggregate and by.
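For reference, a self-contained sketch of the same grouping with aggregate's formula interface, using the question's sample table. The column names iyear, country_txt, V1 and V2 follow the question's code, and GT2 here is a stand-in built by hand rather than the real data:

```r
# Sample table from the question.
GT2 <- data.frame(
  iyear       = c(1970, 1970, 1971, 1971, 1971, 1971, 1972, 1972),
  country_txt = c("UK", "USA", "UK", "UK", "UK", "USA", "USA", "USA"),
  V1          = c(1, 1, 2, 2, 1, 2, 1, 2),
  V2          = c(3, 3, 5, 3, 5, 2, 1, 5)
)

# Sum V1 and V2 within each year/country combination.
aggGT2 <- aggregate(cbind(V1, V2) ~ iyear + country_txt, data = GT2, FUN = sum)

# Order by year, then country, to match the layout asked for.
aggGT2 <- aggGT2[order(aggGT2$iyear, aggGT2$country_txt), ]
```

With the formula interface, rows containing NA in any of the variables are dropped by default, so a separate na.omit step is not needed.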

count unique values in one column for specific values in another column

I have a data frame on bills that has (among other variables) a column for 'year', a column for 'issue', and a column for 'sub issue.' A simplified example df looks like this:
year issue sub issue
1970 4 20
1970 3 21
1970 4 22
1970 2 8
1971 5 31
1971 4 22
1971 9 10
1971 3 21
1971 4 22
Etc., for about 60 years. I want to count the unique values in the issue and sub issue columns for each year, and use those to create a new df- dat2. Using the df above, dat2 would look like this:
year issues sub issues
1970 3 4
1971 4 4
Wary of factors, I confirmed that the values in all columns are integers, if that makes a difference. I am new at R (obviously), and I haven't been able to find relevant code for this specific purpose online. Thanks for any help!!
That's a one-liner, with aggregate:
with(d,aggregate(cbind(issue,subissue) ~ year,FUN=function(x){length(unique(x))}))
returning:
year issue subissue
1 1970 3 4
2 1971 4 4
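A self-contained sketch of that call on the example rows. The data frame name d and the column name subissue are taken from the answer; the question does not give the real names, so treat them as assumptions:

```r
# Example rows from the question.
d <- data.frame(
  year     = c(1970, 1970, 1970, 1970, 1971, 1971, 1971, 1971, 1971),
  issue    = c(4, 3, 4, 2, 5, 4, 9, 3, 4),
  subissue = c(20, 21, 22, 8, 31, 22, 10, 21, 22)
)

# Number of distinct values in a vector.
n_unique <- function(x) length(unique(x))

# Apply it to both columns within each year.
dat2 <- aggregate(cbind(issue, subissue) ~ year, data = d, FUN = n_unique)
```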
