I have the following dataset
structure(list(Year = c("Oranges", "Cherrys", "Apples", "Bananas"
), `42461` = c(0, NA, 12, NA), `42491` = c(1, 12, NA, NA), `42522` = c(1,
12, 7, NA), `42552` = c(NA, 12, 6, NA), `42583` = c(2, NA, 8,
NA), `42614` = c(NA, 12, 5, NA), `42644` = c(NA, NA, 4, NA),
`42675` = c(NA, 12, NA, NA), `42705` = c(NA, 3, NA, NA),
`42736` = c(NA, NA, 12, NA), `42767` = c(NA, NA, 12, NA),
`42795` = c(NA, 12, NA, NA), Total = c(0, 0, 0, 0)), .Names = c("Year",
"42461", "42491", "42522", "42552", "42583", "42614", "42644",
"42675", "42705", "42736", "42767", "42795", "Total"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
I would like to pivot it to look like:
Category-Values-Year
I tried the following:
datdat %>% gather(Cat,Var)
but the problem is that the year is the name of each column.
I removed the "Total" column; I'm not sure if this is what you're asking for:
library (data.table)
dat = data.table (structure(list(Year = c("Oranges", "Cherrys", "Apples",
"Bananas"
), `42461` = c(0, NA, 12, NA), `42491` = c(1, 12, NA, NA), `42522` = c(1,
12, 7, NA), `42552` = c(NA, 12, 6, NA), `42583` = c(2, NA, 8,
NA), `42614` = c(NA, 12, 5, NA), `42644` = c(NA, NA, 4, NA),
`42675` = c(NA, 12, NA, NA), `42705` = c(NA, 3, NA, NA),
`42736` = c(NA, NA, 12, NA), `42767` = c(NA, NA, 12, NA),
`42795` = c(NA, 12, NA, NA), Total = c(0, 0, 0, 0)), .Names = c("Year",
"42461", "42491", "42522", "42552", "42583", "42614", "42644",
"42675", "42705", "42736", "42767", "42795", "Total"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L)))
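names(dat)[1] = "Category"   # the first column holds fruit names, not years
dat[, "Total" := NULL]       # drop the Total column by reference
# one row per Category/Year pair; the old column names end up in "Year"
melt.dat = melt(dat, id.vars = c("Category"), variable.name = "Year")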
melt.dat gives you:
> head (melt.dat)
Category Year value
1: Oranges 42461 0
2: Cherrys 42461 NA
3: Apples 42461 12
4: Bananas 42461 NA
5: Oranges 42491 1
6: Cherrys 42491 12
Also note that the result is a data.table rather than a plain data.frame :)
If you don't have the package yet, run install.packages("data.table") first.
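For comparison, the same wide-to-long reshape can be sketched in base R without any package. This is only a sketch: it uses a cut-down stand-in for the data above (two of the month columns), and the names value_cols and long are made up for illustration.

```r
# Cut-down stand-in for the question's data (two of the month columns only)
dat <- data.frame(
  Category = c("Oranges", "Cherrys", "Apples", "Bananas"),
  `42461` = c(0, NA, 12, NA),
  `42491` = c(1, 12, NA, NA),
  check.names = FALSE,        # keep the numeric-looking column names as they are
  stringsAsFactors = FALSE
)

# Every column except the identifier is stacked into one value column
value_cols <- setdiff(names(dat), "Category")

long <- data.frame(
  Category = rep(dat$Category, times = length(value_cols)),
  Year     = rep(value_cols, each = nrow(dat)),
  value    = unlist(dat[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```

The row order matches the melt output above: all categories for the first column, then all categories for the next one.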
I want to give names to the individual star plots in R. This is what I tried:
stars(norm_datas[, 1:12], full = TRUE, radius = TRUE, len = 1.0, key.loc = c(14, 1), labels = abbreviate(case.names(norm_datas)), main = "Provision of Ecosystem services", draw.segments = TRUE, lwd = 0.25, lty = par("lty"), xpd = TRUE)
but it just labels each star plot as 1, 2, 3.
Kindly help me resolve this.
structure(list(Type_Garden = c("AG", "AG", "AG"), Pollinators = c(10,
6, 5.5), Flower_abundance = c(384, 435, 499), Climate_regulation = c(1,
7, 2), Crop_area = c(34, 25, 10), Plant_diversity = c(22, 53,
41), Nitrogen_balance = c(0.95, 0.26, NA), Phosphorus_balance = c(0.24,
0.04, NA), Habitat_provision = c(1, 2, 0), Recreation_covid = c(1,
NA, NA), Aesthetic_appreciation = c(3, NA, NA), Reconnection_nature = c(4,
NA, NA), Mental_health = c(1, NA, NA), Physical_health = c(1,
NA, NA)), class = "data.frame", row.names = c(NA, -3L))
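One possible reading of the 1, 2, 3 labels: case.names() returns the row names, and this data frame only has the default ones. The sketch below uses a cut-down stand-in for the data (three of the numeric columns) and builds hypothetical labels from Type_Garden plus the row number; garden_labels is an invented name, and the real garden names would have to come from elsewhere.

```r
# Cut-down stand-in for the data above (three of the numeric columns)
norm_datas <- data.frame(
  Type_Garden      = c("AG", "AG", "AG"),
  Pollinators      = c(10, 6, 5.5),
  Flower_abundance = c(384, 435, 499),
  Crop_area        = c(34, 25, 10),
  stringsAsFactors = FALSE
)

# The default row names are "1", "2", "3", which is what the plot showed
default_labels <- case.names(norm_datas)

# Hypothetical labels, one per row; replace with the real garden names
garden_labels <- paste(norm_datas$Type_Garden, seq_len(nrow(norm_datas)))

# Pass only the numeric columns to stars() and supply the labels explicitly
stars(norm_datas[, -1], full = TRUE, draw.segments = TRUE,
      labels = garden_labels, main = "Provision of Ecosystem services")
```

Setting rownames(norm_datas) to the desired names before the original call should have the same effect, since the labels are taken from the row names there.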
I have a data frame like this:
structure(list(id = c(1, 1, 2, 3), datein_1 = c("1998/01/09",
"2006/03/03", "2015/03/10", "2007/04/10"), dateout_1 = c("1998/01/16",
"2006/06/21", "2015/03/25", "2007/04/11"), datein_2 = c(NA, NA,
"2011/09/19", "2006/06/01"), date2_out = c(NA, NA, "2015/03/09",
"2007/04/09"), date3_in = c(NA, NA, "2015/03/09", NA), date3_out = c(NA,
NA, "2015/03/26", NA)), class = "data.frame", row.names = c(NA,
-4L))
and I want to restructure the data as follows
structure(list(id2 = c(1, 1, 2, 2, 3), datein_1 = c("1998/01/09",
"2006/03/03", "2015/03/10", NA, "2007/04/10"), dateout_1 = c("1998/01/16",
"2006/06/21", "2015/03/25", NA, "2007/04/11"), datein_2_3 = c(NA,
NA, "2011/09/19", "2015/03/09", "2006/06/01"), dateout_2_3 = c(NA,
NA, "2015/03/09", "2015/03/26", "2007/04/09")), class = "data.frame", row.names = c(NA,
-5L))
I want the columns datein_2 and date3_in to be combined into one column, and date2_out and date3_out into another. This requires inserting an extra row for id 2, with NA in datein_1 and dateout_1.
A plyr::rbind.fill solution should work for you:
library(dplyr)
plyr::rbind.fill(
  # first block: id, the first date pair and the second date pair
  df[, 1:5] %>%
    rename(datein_2_3 = datein_2,
           dateout_2_3 = date2_out),
  # second block: id and the third date pair, kept only where a date is present
  df[, c(1, 6:7)] %>%
    filter(!is.na(date3_in) | !is.na(date3_out)) %>%
    rename(datein_2_3 = date3_in,
           dateout_2_3 = date3_out)
) %>%
  arrange(id, datein_1, datein_2_3)  # assuming that this is the proper order
output:
id datein_1 dateout_1 datein_2_3 dateout_2_3
1 1 1998/01/09 1998/01/16 <NA> <NA>
2 1 2006/03/03 2006/06/21 <NA> <NA>
3 2 2015/03/10 2015/03/25 2011/09/19 2015/03/09
4 2 <NA> <NA> 2015/03/09 2015/03/26
5 3 2007/04/10 2007/04/11 2006/06/01 2007/04/09
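The same stacking idea can be sketched in base R, in case plyr is not available. This is a sketch rather than the answerer's method: the two blocks are built by hand, the missing columns of the second block are filled with NA, and the names first, second and out are made up for illustration.

```r
# Data frame from the question
df <- data.frame(
  id        = c(1, 1, 2, 3),
  datein_1  = c("1998/01/09", "2006/03/03", "2015/03/10", "2007/04/10"),
  dateout_1 = c("1998/01/16", "2006/06/21", "2015/03/25", "2007/04/11"),
  datein_2  = c(NA, NA, "2011/09/19", "2006/06/01"),
  date2_out = c(NA, NA, "2015/03/09", "2007/04/09"),
  date3_in  = c(NA, NA, "2015/03/09", NA),
  date3_out = c(NA, NA, "2015/03/26", NA),
  stringsAsFactors = FALSE
)

# First block: id, first date pair, second date pair (renamed)
first <- df[, c("id", "datein_1", "dateout_1", "datein_2", "date2_out")]
names(first)[4:5] <- c("datein_2_3", "dateout_2_3")

# Second block: rows that actually have a third date pair
has_third <- !is.na(df$date3_in) | !is.na(df$date3_out)
second <- df[has_third, c("id", "date3_in", "date3_out")]
names(second)[2:3] <- c("datein_2_3", "dateout_2_3")
second$datein_1  <- NA_character_   # these rows have no first date pair
second$dateout_1 <- NA_character_

# Stack the blocks and sort; order() places NA last by default
out <- rbind(first, second[names(first)])
out <- out[order(out$id, out$datein_1, out$datein_2_3), ]
rownames(out) <- NULL
print(out)
```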
I would like to calculate the mean of a data frame that has some missing values. The sum of the data frame is 500 and the number of non-missing cells is 28, so the mean should be 17.8571. However, when calculating in R I need to mark the missing cells with 0, which changes the mean value.
Sample data:
df<-structure(list(`10` = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10), `20` = c(20, 20, 20, 20, 20, 20, 20, 20, NA,
NA, NA, NA, NA, NA), `30` = c(30, 30, 30, 30, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), `40` = c(40, 40, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -14L), class = c("tbl_df",
"tbl", "data.frame"))
Sample code:
Where is my mistake?
df1<-rowMeans(df, na.rm=TRUE) # I also tried colMeans
df2<-mean(df1)
sum(df,na.rm = TRUE)/sum(!is.na(df))
You can convert your data.frame to a vector using unlist and then calculate the mean with the argument na.rm=TRUE to skip the NA values.
mean(unlist(df), na.rm=TRUE)
#[1] 17.85714
Another option is to convert the data.frame to a matrix.
mean(as.matrix(df), na.rm=TRUE)
#[1] 17.85714
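To match the mean from Excel, you can repeat each time value df times; this assumes a long-format table with a value column time and a count column df, which the sample data above does not contain.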
mean(rep(df$time, df$df))
#[1] 17.85714
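To see why the rowMeans attempt gives a different number, the sketch below rebuilds the sample data and compares the two calculations. Averaging the row means gives every row the same weight, even though the rows contain different numbers of non-missing cells. The variable names are made up for illustration.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  `10` = rep(10, 14),
  `20` = c(rep(20, 8), rep(NA, 6)),
  `30` = c(rep(30, 4), rep(NA, 10)),
  `40` = c(rep(40, 2), rep(NA, 12)),
  check.names = FALSE
)

# Mean of the row means: each row counts once, however many cells it has
row_means <- rowMeans(df, na.rm = TRUE)
mean_of_row_means <- mean(row_means)              # 15

# Pooled mean over all non-missing cells: 500 / 28
pooled_mean <- mean(as.matrix(df), na.rm = TRUE)

# Weighting each row mean by its number of non-missing cells recovers the pooled mean
cells_per_row <- rowSums(!is.na(df))
weighted_mean <- weighted.mean(row_means, cells_per_row)

# The rep() idea applied to this data: column names are the values, counts come from the cells
values <- as.numeric(names(df))
counts <- colSums(!is.na(df))
repeated_mean <- mean(rep(values, counts))

print(c(mean_of_row_means, pooled_mean, weighted_mean, repeated_mean))
```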
In the interest of learning better coding practices, can anyone show me a more efficient way of solving my problem? Maybe one that doesn't require new columns...
Problem: I have two data frames: one is my main data table (t) and the other contains changes I need to replace in the main table (Manual_changes). Example: Sometimes the CaseID is matched with the wrong EmployeeID in the file.
I can't provide the main data table, but the Manual_changes file looks like this:
Manual_changes = structure(list(`Case ID` = c(46605, 25321, 61790, 43047, 12157,
16173, 94764, 38700, 41798, 56198, 79467, 61907, 89057, 34232,
100189), `Employee ID` = c(NA, NA, NA, NA, NA, NA, NA, NA, 906572,
164978, 145724, 874472, 654830, 846333, 256403), `Age in Days` = c(3,
3, 3, 12, 0, 0, 5, 0, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-15L), class = c("tbl_df", "tbl", "data.frame"))
temp = merge(t, Manual_changes, by = "Case ID", all.x = TRUE)
temp$`Employee ID.y` = ifelse(is.na(temp$`Employee ID.y`), temp$`Employee ID.x`, temp$`Employee ID.y`)
temp$`Age in Days.y`= ifelse(is.na(temp$`Age in Days.y`), temp$`Age in Days.x`, temp$`Age in Days.y`)
temp$`Age in Days.x` = NULL
temp$`Employee ID.x` = NULL
colnames(temp) = colnames(t)
t = temp
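We could use coalesce, listing the Manual_changes column (.y) first so that a non-missing manual value takes precedence over the original one:
library(dplyr)
left_join(t, Manual_changes, by = "Case ID") %>%
  mutate(`Employee ID.y` = coalesce(`Employee ID.y`, `Employee ID.x`),
         `Age in Days.y` = coalesce(`Age in Days.y`, `Age in Days.x`))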
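Or with data.table, using an update join in which the i. prefix refers to the columns of Manual_changes and the unprefixed names to the columns of t:
library(data.table)
setDT(t)[Manual_changes,
         c('Employee ID', 'Age in Days') :=
           .(fcoalesce(`i.Employee ID`, `Employee ID`),
             fcoalesce(`i.Age in Days`, `Age in Days`)),
         on = "Case ID"]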
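A base R sketch that avoids the merge and the temporary .x/.y columns altogether: look up each case with match() and overwrite only where a manual value exists. The main table was not provided, so main below is a hypothetical stand-in with invented values (it is also named main rather than t, to avoid masking the transpose function); Manual_changes is cut down to two rows from the question.

```r
# Hypothetical stand-in for the main table; values are invented for illustration
main <- data.frame(
  `Case ID`     = c(46605, 41798, 99999),
  `Employee ID` = c(111111, 222222, 333333),
  `Age in Days` = c(1, 2, 4),
  check.names = FALSE
)

# Two rows taken from the Manual_changes table in the question
Manual_changes <- data.frame(
  `Case ID`     = c(46605, 41798),
  `Employee ID` = c(NA, 906572),
  `Age in Days` = c(3, NA),
  check.names = FALSE
)

# Position of each main-table case in Manual_changes (NA if there is no change)
idx <- match(main$`Case ID`, Manual_changes$`Case ID`)

# For each column, take the manual value where present, else keep the original
for (col in c("Employee ID", "Age in Days")) {
  new_vals <- Manual_changes[[col]][idx]
  main[[col]] <- ifelse(is.na(new_vals), main[[col]], new_vals)
}

print(main)
```

This assumes each Case ID appears at most once in Manual_changes; match() only returns the first hit.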
I'm currently working on a final-year project on stallion fertility.
I have several factors whose effect on stallion fertility I would like to test.
I have a large table with 54 columns and about 300 rows; each column is a factor, either quantitative or qualitative. Fertility is recorded as "yes" or "no" ("oui" or "non") in the column "DG".
To test all factors, and possibly their interactions, I would like to run an FAMD (factor analysis of mixed data, AFDM in French), but before that I have to run the missMDA functions because I have missing values.
However, when I run the missMDA functions, I always get error messages, for example:
> res.impute<-imputeFAMD(Tableau_analyse_juments_finies, ncp = 3)
Error in eigen(crossprod(X, X), symmetric = TRUE) : 0 x 0 matrix
> res.impute<-estim_ncpFAMD(Tableau_analyse_juments_finies)
Error in `[.data.frame`(jeu, , (nbquanti + 1):ncol(jeu), drop = F) :
undefined columns selected
I'm not very good at statistics, and the statisticians at my school don't have time to help me, so I'm stuck. Could someone help me?
PS: I'm French, so if someone else is French as well, feel free to answer in French; it will be easier for me :)
When I run dput, I get this:
structure(list(Jument = c("Darling-de-Courcy", "Darling-de-Courcy",
"Doublette", "Undoctra-d-Helby", "Unfee-du-Clos-Marman", "Hadelle-de-Padoue",
"Prunelle-de-la-Vallee", "Prunelle-de-la-Vallee", "Quelle-dame-du-Mesnil",
"Quiara-de-Saint-A"), Etalon = c("ARMITAGES", "ARMITAGES", "ARMITAGES",
"ARMITAGES", "ARMITAGES", "BY-CERA", "BY-CERA", "BY-CERA", "CANTURANO",
"CANTURANO"), Age = c(7, 7, 17, 12, 12, 3, 17, 17, 16, 16), Historique = c("S",
"S", "P", "TE-2019", "MPNS", "P", "MPNS", "MPNS", "P", "MPNS"
), `NEC-1` = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3), Antecedents = c("Poulin",
"Poulin", "Compet", "Compet", "Poulin", "Elevage", "Poulin",
"Poulin", "Compet", "Poulin"), Habitat = c("PT", "PT", "PI",
"Box", "PT", "PT", "PT", "PT", "PT", "PT"), `Nb-chaleurs` = c(2,
2, 1, 1, 1, 1, 2, 2, 1, 2), Lait = c("oui", "oui", "non", "non",
"non", "non", "non", "non", "non", "non"), `TM-1` = c("IAR",
"IAR", "IAR", "IAR", "IAC", "IAC", "IAR-12", "IAR-12", "IAR",
"IAR"), `N.-inse-1` = c(1, 2, 1, 1, 1, 1, 2, 2, 1, 1), D1 = structure(c(1586390400,
1588204800, 1586390400, 1585958400, 1588896000, 1587772800, 1584489600,
1586304000, 1587600000, 1584144000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), H1 = c("MA", "MA", "AM", "MA", "M", "M", "AM", "AM",
"AM", "MA"), `N.-paillettes-1` = c(NA, NA, NA, NA, 2, 3, NA,
NA, NA, NA), I1 = c(6, 6, 12, 24, 0, 24, 48, 24, 24, 6), Mob1 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0.8), CC1 = c(NA, NA, NA, NA,
NA, NA, NA, NA, 303, 234), `N.-Saut-1` = c(NA, NA, NA, NA, NA,
NA, NA, NA, 1, 1), `Gel-1` = c(NA, NA, NA, NA, NA, NA, NA, NA,
0, 1), `DG-1` = c("non", "oui", "oui", "oui", "oui", "oui", "non",
"oui", "oui", "non")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
But I have the impression that R treats each stallion ("Etalon" in my dataset) as a unique individual, whereas, for example, the stallion "ARMITAGES" was bred to many mares ("Jument" in my dataset).
When I try your function for missing values, my dataset becomes empty and no data are left.
I had the same issue and found out that all character variables need to be converted to factors. Convert each character variable using as.factor().
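A minimal sketch of that conversion, applied to all character columns at once. It uses a four-row stand-in built from a few columns of the dput output above; tab and is_chr are names made up for illustration, and the imputation call is left as a comment because it needs the missMDA package and the full table.

```r
# Four rows and five columns taken from the dput output above
tab <- data.frame(
  Etalon = c("ARMITAGES", "ARMITAGES", "BY-CERA", "CANTURANO"),
  Age    = c(7, 17, 3, 16),
  Lait   = c("oui", "non", "non", "non"),
  CC1    = c(NA, NA, NA, 303),
  `DG-1` = c("non", "oui", "oui", "oui"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor
is_chr <- vapply(tab, is.character, logical(1))
tab[is_chr] <- lapply(tab[is_chr], as.factor)

str(tab)

# The call from the question would then be run on the converted table:
# res.impute <- missMDA::imputeFAMD(tab, ncp = 3)
```

Numeric columns such as Age and CC1 are left untouched, so the table still contains a mix of quantitative and qualitative variables.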