merge two data frames to single object of lists - r

I would like to create specific object based of 2 data frames. First contains basic information about students, second information how many points each student get in each day.
students <- data.frame(
studentId = c(1,2,3),
name = c('Sophia', 'Mike', 'John'),
age = c(13,12,15)
)
studentPoints <- data.frame(
studentId = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3),
date = rep(c(Sys.Date()+c(1:5)),3),
point = c(5,1,3,9,9,9,5,2,4,5,8,9,5,8,4)
)
As a result I would like to get object:
result <- list(
list(
studentId = 1,
name = 'Sophia',
age = 13,
details = list(
date = c("2021-04-11", "2021-04-12", "2021-04-13", "2021-04-14", "2021-04-15"),
point = c(5,1,3,9,9)
)
),
list(
studentId = 2,
name = 'Mike',
age = 12,
details = list(
date = c("2021-04-11", "2021-04-12", "2021-04-13", "2021-04-14", "2021-04-15"),
point = c(9,5,2,4,5)
)
),
list(
studentId = 3,
name = 'John',
age = 15,
details = list(
date = c("2021-04-11", "2021-04-12", "2021-04-13", "2021-04-14", "2021-04-15"),
point = c(8,9,5,8,4)
)
)
)
I created it manually, any idea how to create it automatically? because my database contains few thousands of students

Split the lists by studentId, convert each column to it's own list and combine the datasets.
result <- Map(function(x, y) list(x, details = y),
lapply(split(students, students$studentId), as.list),
lapply(split(studentPoints, studentPoints$studentId), as.list))

Related

How to loop dataframe in R

I want to get data from IMF.However the API data is limited
Therefor I get the data by continent.
How to loop the dateframe? (The data can get from "Before loop part",load data from api)
The reference cannot work.https://stackoverflow.com/questions/25284539/loop-over-a-string-variable-in-r
Before the loop
library(imfr)
library(countrycode)
data(codelist)
country_set <- codelist
country_set<- country_set %>%
select(country.name.en , iso2c, iso3c, imf, continent, region) %>% filter(!is.na(imf) & !is.na(iso2c))
africa_iso2<- country_set$iso2c[country_set$continent=="Africa"]
asia_iso2<- country_set$iso2c[country_set$continent=="Asia"]
americas_iso2<- country_set$iso2c[country_set$continent=="Americas"]
europe_iso2<- country_set$iso2c[country_set$continent=="Europe"]
oceania_iso2<- country_set$iso2c[country_set$continent=="Oceania"]
loop part
continent <- c("africa", "asia", "americas","europe","oceania")
for(i in 1:length(continent)){
var <- paste0("gdp_nsa_xdc_", continent[i])
var1 <- paste0(continent[i],"_iso2")
[[var]]<- imf_data(database_id = "IFS" , indicator = c("NGDP_NSA_XDC"),country =[[var1]],start = 2010, end = 2022,return_raw = TRUE)
[[var]]<- [[var]]$CompactData$DataSet$Series
}
data sample is
list(CompactData = list(`#xmlns:xsi` = "http://www.w3.org/2001/XMLSchema-instance",
`#xmlns:xsd` = "http://www.w3.org/2001/XMLSchema", `#xsi:schemaLocation` = "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message https://registry.sdmx.org/schemas/v2_0/SDMXMessage.xsd http://dataservices.imf.org/compact/IFS http://dataservices.imf.org/compact/IFS.xsd",
`#xmlns` = "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message",
Header = list(ID = "18e0aeae-09ec-4dfe-ab72-60aa16aaea84",
Test = "false", Prepared = "2022-10-19T12:02:28", Sender = list(
`#id` = "1C0", Name = list(`#xml:lang` = "en", `#text` = "IMF"),
Contact = list(URI = "http://www.imf.org", Telephone = "+ 1 (202) 623-6220")),
Receiver = list(`#id` = "ZZZ"), DataSetID = "IFS"), DataSet = list(
`#xmlns` = "http://dataservices.imf.org/compact/IFS",
Series = list(`#FREQ` = "Q", `#REF_AREA` = "US", `#INDICATOR` = "NGDP_NSA_XDC",
`#UNIT_MULT` = "6", `#TIME_FORMAT` = "P3M", Obs = structure(list(
`#TIME_PERIOD` = c("2020-Q1", "2020-Q2", "2020-Q3",
"2020-Q4", "2021-Q1", "2021-Q2", "2021-Q3", "2021-Q4",
"2022-Q1", "2022-Q2"), `#OBS_VALUE` = c("5254152",
"4930197", "5349433", "5539370", "5444406", "5784816",
"5883177", "6203369", "6010733", "6352982")), class = "data.frame", row.names = c(NA,
10L))))))
I suggest you create a list first, to which you will assign the value you want your loop to create. The following code creates a named list, and then at the end of the loop, assigns the value of each iteration to that named list:
continent <-
sapply(c("africa", "asia", "americas","europe","oceania"),
c, simplify = FALSE, USE.NAMES = TRUE)
for(i in seq_len(length(continent))) {
var <- paste0("gdp_nsa_xdc_", continent[i])
var1 <- get(paste0(continent[i],"_iso2"))
var <- imf_data(database_id = "IFS" , indicator = c("NGDP_NSA_XDC"),
country = var1, start = 2010, end = 2022,
return_raw = TRUE)
continent[[i]] <- var$CompactData$DataSet$Series
}
I don't necessarily understand the double brackets around [[var]]. Let me know if my answer does not correspond to what you were looking for!
We could use assign to create objects in the global env
for(i in 1:length(continent)){
var <- paste0("gdp_nsa_xdc_", continent[i])
var1 <- paste0(continent[i],"_iso2")
assign(var, imf_data(database_id = "IFS" , indicator = c("NGDP_NSA_XDC"),country =[[var1]],start = 2010, end = 2022,
return_raw = TRUE))
assign(var, get(var)$CompactData$DataSet$Series)
}

In R--Change name of list of list of lists when lists have differing lengths

I've tried everything I can...I have a large list of lists. They are of varying depths, but they all have a variable name that I need to rename. I tried breaking apart the list into data frames, it just seems unpractical and doesn't even do what I want.
Here's a toy example:
list1 = list(changethis = c("1", "2"))
list2 = list(varname1 = c("1,2,3,4"), changethis = c("5,6,7,8"), varname2 = c("9, 10, 11"))
list3 = list(varname3 = list(varname4 = c("first", "second", "third", list(changethis = c("15, 16, 19"), varname5 = "cat", "dog", "fish"))))
list4 = list(varname6 = list(varname7 = list2, varname8 = list2))
list5a = list(varname13 = c("hat", "key"), changethis = c("5"))
list5 = list(varname9 = list(varname10 = list5a, varname11 = list5a))
list6 = list(varname12 = list5)
list7 = list(first = list1)
listbig = list(sublist1 = list3, sublist2 = list4, sublist3 = list5, sublist4= list6, sublist5=list7, sublist6 = list5a)
Here's a toy code that produces what I want it to look like. The 'changethis' var is renamed to 'change'
sollist1 = list(changed= c("1", "2"))
sollist2 = list(varname1 = c("1,2,3,4"), changed = c("5,6,7,8"), varname2 = c("9, 10, 11"))
sollist3 = list(varname3 = list(varname4 = c("first", "second", "third", list(changed = c("15, 16, 19"), varname5 = "cat", "dog", "fish"))))
sollist4 = list(varname6 = list(varname7 = sollist2, varname8 = sollist2))
sollist5a = list(varname13 = c("hat", "key"), changed = c("5"))
sollist5 = list(varname9 = list(varname10 = sollist7, varname11 = sollist5a))
sollist6 = list(varname12 = sollist5)
sollist7 = list(first = sollist1)
solution_list = list(sublist1 = sollist3, sublist2 = sollist4, sublist3 = sollist5, sublist4= sollist6, sublist5=sollist7, sublist6=sollist7)
Here is one of my many attempts to do this. I extracted sublist1 from the big list and tried to just change the name for it, but nothing gets changed.
extr_sublist1 <- listbig[1]
names(extr_sublist1[[1]][[1]][[1]][4]) <- "changed" #does not do it...
Another failed attempt:
In this I extract a differently hierarchial sublist and create a Var name which I hope to loop over so I can change the name of the 'changethis' var. Also does not work.
extr_sublist4 <- listbig[4]
numvars <- length(extr_sublist4[[1]][[1]][[1]])
for (i in 1: numvars){
varname<-paste("Var",numvar, sep = "")
paste('Var',[i]) <- extr_sublist4[[1]][[1]][[1]][[1]][2]
namespaste('Var',[i])[paste('Var',[i]) == 'changethis'] <- 'changed'
}
I'm sure there's a simple and elegant solution to this...but have no idea what. Thanks in advance for your help.
You can use a recursive method to do the transformation:
changefun <- function(x, change_name, new_name){
idx <- names(x) == change_name
if(any(idx)) names(x)[idx] <- new_name
if (is.list(x)) lapply(x, changefun, change_name, new_name)
else x
}
Now just call
changefun(listbig, 'changethis', 'changed')

Is it possible to make it more readable? treemap

I just wanna know how can I make it more readable.
marketcap <- data.frame(Marketcap = c(641899161594, 30552518424, 271028619181,
9277626785, 3986737880, 1202315485,
6049985280, 30722840711),
id = c('Bitcoin', 'Dogecoin', 'Ethereum', 'Litecoin', 'Monero', 'Nem', 'Stellar', 'xrp'),
row.names = c('Bitcoin', 'Dogecoin', 'Ethereum', 'Litecoin', 'Monero', 'Nem',
'Stellar', 'xrp')); df
#install.packages('treemap')
library(treemap)
df1 <- na.omit(marketcap[,c('id','Marketcap')])
df1$Marketcap <- as.numeric(round(df1$Marketcap, 0))
df1$formatted_market_cap = paste0(df1$id, '\n', '$', formatC(c("642","30.5","271","9.3","4","1.2","6.044","30.7"), format = "e", digits = 2))
treemap(df1, index = 'formatted_market_cap', vSize = 'Marketcap', title = 'Cryptocurrency Market Cap (bn)', fontsize.labels=c(15, 4), palette='Set3')
For example, Nem is looking poor

How to use gather function to select 0 column if there is no column name of a data frame

I have big data frame. In the "0" column my variable names are recorded (i.e. gene names / AGI) however, there's no column name for this column (this df derived from DESeq).
I want to use the gather function to summarize data. How can I drop "0" column when there is no column name?
For an e.g.
If there is a column name (e.g. gene names / AGI) I know how to do this like below)
df1 <- structure(list(AGI = c("ATCG01240", "ATCG01310", "ATMG00070"), aox2_0h__1 = c(15.79105291, 14.82652303, 14.70630068), aox2_0h__2 = c(16.06494674, 14.50610036, 14.52189807), aox2_0h__3 = c(14.64596287, 14.73266459, 13.07143141), aox2_0h__4 = c(15.71713641, 15.15430026, 16.32190068 ), aox2_12h__1 = c(14.99030606, 15.08046949, 15.8317372), aox2_12h__2 = c(15.15569857, 14.98996474, 14.64862254), aox2_12h__3 = c(15.12144791, 14.90111092, 14.59618842), aox2_12h__4 = c(14.25648197, 15.09832061, 14.64442686), aox2_24h__1 = c(15.23997241, 14.80968391, 14.22573239 ), aox2_24h__2 = c(15.57551513, 14.94861669, 15.18808897), aox2_24h__3 = c(15.04928714, 14.83758685, 13.06948037), aox2_24h__4 = c(14.79035385, 14.93873234, 14.70402827), aox5_0h__1 = c(15.8245918, 14.9351844, 14.67678306), aox5_0h__2 = c(15.75108628, 14.85867002, 14.45704948 ), aox5_0h__3 = c(14.36545859, 14.79296855, 14.82177912), aox5_0h__4 = c(14.80626019, 13.43330964, 16.33482718), aox5_12h__1 = c(14.66327372, 15.22571466, 16.17761867), aox5_12h__2 = c(14.58089039, 14.98545497, 14.4331578), aox5_12h__3 = c(14.58091828, 14.86139511, 15.83898617 ), aox5_12h__4 = c(14.48097297, 15.1420725, 13.39369381), aox5_24h__1 = c(15.41855602, 14.9890092, 13.92629626), aox5_24h__2 = c(15.78386057, 15.19372889, 14.63254456), aox5_24h__3 = c(15.55321382, 14.82013321, 15.74324956), aox5_24h__4 = c(14.53085803, 15.12196994, 14.81028556 ), WT_0h__1 = c(14.0535031, 12.45484834, 14.89102226), WT_0h__2 = c(13.64720361, 15.07144643, 14.99836235), WT_0h__3 = c(14.28295759, 13.75283646, 14.98220861), WT_0h__4 = c(14.79637443, 15.1108037, 15.21711524 ), WT_12h__1 = c(15.05711898, 13.33689777, 14.81064042), WT_12h__2 = c(14.83846779, 13.62497318, 14.76356308), WT_12h__3 = c(14.77215863, 14.72814995, 13.0835214), WT_12h__4 = c(14.70685445, 14.98527337, 16.12727292), WT_24h__1 = c(15.43813077, 14.56918572, 14.92146565 ), WT_24h__2 = c(16.05986898, 14.70583866, 15.64566505), WT_24h__3 = c(14.87721853, 13.22461859, 16.34119942), WT_24h__4 = c(14.92822133, 14.74382383, 12.79146694)), class = "data.frame", row.names = c(NA, -3L))
sdf1 <- gather(df1, "group", "Expression",-AGI) %>% separate(group, c("sample", "time", "r")) %>% unite(tgroup, c("sample", "time")) %>% group_by(AGI, tgroup) %>% summarize(expression_mean = mean(Expression)) %>% spread(tgroup, expression_mean) %>% column_to_rownames(colnames(.)[1])
Above sample AGI is in the first column, my current working df's this column is in "0" column and it has not named as "AGI"
Q1- How can I drop "0" column for gather?
Q2 - How can I give a column name to "0" column (i.e. AGI)?

Highcharter Unlimited Drill down using R

I looked at this SO question, that SO question. I followed this github method check out ISSUES to help get ideas on how to solve this.
Step 1: Copy the following into text editor and save as test.csv file:
My datasource is NVD JSON FEEDS which I processed and cleaned and created this dataset with the first 25 rows shown here.
Year,Vendor name,Product name,CVE,Major,Minor,Build,Revision
1988,eric_allman,sendmail,CVE-1999-0095,5,5.58,,
1988,ftp,ftp,CVE-1999-0082,*,,,
1988,ftpcd,ftpcd,CVE-1999-0082,*,,,
1989,bsd,bsd,CVE-1999-1471,4,4.2,,
1989,bsd,bsd,CVE-1999-1471,4,4.3,,
1989,sun,sunos,CVE-1999-1122,4,4.0,,
1989,sun,sunos,CVE-1999-1122,4,4.0,4.0.1,
1989,sun,sunos,CVE-1999-1122,4,4.0,4.0.3,
1989,sun,sunos,CVE-1999-1467,4,4.0,,
1989,sun,sunos,CVE-1999-1467,4,4.0,4.0.1,
1989,sun,sunos,CVE-1999-1467,4,4.0,4.0.2,
1989,sun,sunos,CVE-1999-1467,4,4.0,4.0.3,
1989,sun,sunos,CVE-1999-1467,4,4.0,4.0.3c,
1990,digital,vms,CVE-1999-1057,5,5.3,,
1990,freebsd,freebsd,CVE-2000-0388,3,3.0,,
1990,freebsd,freebsd,CVE-2000-0388,3,3.1,,
1990,freebsd,freebsd,CVE-2000-0388,3,3.2,,
1990,freebsd,freebsd,CVE-2000-0388,3,3.3,,
1990,freebsd,freebsd,CVE-2000-0388,3,3.4,,
1990,hp,apollo_domain_os,CVE-1999-1115,sr10,sr10.2,,
1990,hp,apollo_domain_os,CVE-1999-1115,sr10,sr10.3,,
1990,next,nex,CVE-1999-1392,1,1.0a,,
1990,next,next,CVE-1999-1198,2,2.0,,
1990,next,next,CVE-1999-1391,1,1.0,,
1990,next,next,CVE-1999-1391,1,1.0a,,
Step 2: Copy and paste following code into R and run it:
My codes follow this SO question especially the discussion by #NinjaElvis. I think it is possible to create 3 levels or more. Just doing more research and figuring out.
############################
suppressPackageStartupMessages(library("highcharter"))
library("dplyr")
library("purrr")
library("data.table")
second_el_to_numeric <- function(ls){
map(ls, function(x){
x[[2]] <- as.numeric(x[[2]])
x
})
}
cve_affected_product <- fread("test.csv")
# LAYER ONE YEAR DRILLDOWN VIEW #########################
Year <- cve_affected_product[,c(1:2)]
Year <- unique(Year[,list(Year,`Vendor name`)])
Year <- Year[,c(1)][,count:=1]
Year <- setDT(aggregate(.~ Year ,data=Year,FUN=sum))
#setorder(Year, Year, `Vendor name`)
years_df <- tibble(
name = c(Year$Year),
y = c(Year$count),
drilldown = tolower(paste(name,'id'))
)
ds <- list_parse(years_df)
names(ds) <- NULL
# Vendor View HC ###########
hc <- highchart() %>%
hc_chart(type = "column") %>%
hc_title(text = "Basic drilldown") %>%
hc_xAxis(type = "category") %>%
hc_yAxis(visible = FALSE,reversed = FALSE) %>%
hc_legend(enabled = FALSE) %>%
hc_plotOptions(
series = list(
boderWidth = 0,
dataLabels = list(enabled = TRUE)
)
) %>%
hc_add_series(
name = "Vendors",
colorByPoint = TRUE,
data = ds
)
# LAYER TWO VENDOR DRILLDOWN VIEW #########################
Vendor <- cve_affected_product[,c(1:3)]
Vendor <- unique(Vendor[,list(Year,`Vendor name`,`Product name`)])
Vendor <- Vendor[,c(1:2)][,count:=1]
Vendor <- setDT(aggregate(.~ Year+`Vendor name` ,data=Vendor,FUN=sum))
setorder(Vendor, Year, `Vendor name`)
years <- as.character(unique(Vendor$Year))
for(i in 1:length(years)){
tempdf <- Vendor[Vendor$Year==years[i],]
dfname <- paste("df",years[i],sep="")
dsname <- paste("ds",years[i],sep="")
X <- tibble(
name = c(tempdf$`Vendor name`),
y = c(tempdf$count),
drilldown = tolower(paste(name,'id'))
)
Y <- second_el_to_numeric(list_parse2(assign(dfname,X)))
assign(dsname,Y)
}
# Vendor View HC ###########
hc <- hc %>%
hc_drilldown(
allowPointDrilldown = TRUE,
series = list(
list(
id = "1988 id",
data = ds1988,
colorByPoint = TRUE,
keys = list('name','y','drilldown')
),
list(
id = "1989 id",
data = ds1989,
keys = list('name','y','drilldown')
),
list(
id = "1990 id",
data = ds1990,
keys = list('name','y','drilldown')
)#,
)
)
# LAYER THREE PRODUCT DRILLDOWN VIEW #########################
Product <- cve_affected_product[,c(1:4)]
Product <- unique(Product[,list(Year,`Vendor name`,`Product name`, CVE)])
Product <- Product[,c(1:3)][,count:=1]
Product <- setDT(aggregate(.~ Year+`Vendor name`+`Product name` ,data=Product,FUN=sum))
setorder(Product, Year, `Vendor name`,`Product name`)
vendors <- as.character(unique(Product$`Vendor name`))
for(i in 1:length(vendors)){
tempdf <- Product[Product$`Vendor name`==vendors[i],]
dfname <- paste("df",vendors[i],sep="")
dsname <- paste("ds",vendors[i],sep="")
X <- tibble(
name = c(tempdf$`Product name`),
y = c(tempdf$count),
drilldown = tolower(paste(name,'id'))
)
Y <- second_el_to_numeric(list_parse2(assign(dfname,X)))
assign(dsname,Y)
}
# Product View HC ###########
hc <- hc %>%
hc_drilldown(
allowPointDrilldown = TRUE,
series = list(
list(
id = "eric_allman id",
data = dseric_allman,
keys = list('name','y','drilldown')
),
list(
id = "ftp id",
data = dsftp,
keys = list('name','y','drilldown')
),
list(
id = "ftpcd id",
data = dsftpcd,
keys = list('name','y','drilldown')
),
list(
id = "bsd id",
data = dsbsd,
keys = list('name','y','drilldown')
),
list(
id = "sun id",
data = dssun,
keys = list('name','y','drilldown')
),
list(
id = "digital id",
data = dsdigital,
keys = list('name','y','drilldown')
),
list(
id = "freebsd id",
data = dsfreebsd,
keys = list('name','y','drilldown')
),
list(
id = "hp id",
data = dshp,
keys = list('name','y','drilldown')
),
list(
id = "next id",
data = dsnext,
keys = list('name','y','drilldown')
)#,
)
)
# LAYER FOUR CVE DRILLDOWN VIEW #########################
Product_CVE <- cve_affected_product[,c(1:5)]
Product_CVE <- unique(Product_CVE[,list(Year,`Vendor name`,`Product name`, CVE, Major)])
Product_CVE <- Product_CVE[,c(1:4)][,count:=1]
Product_CVE <- setDT(aggregate(.~ Year+`Vendor name`+`Product name`+CVE ,data=Product_CVE,FUN=sum))
setorder(Product_CVE, Year, `Vendor name`,`Product name`, CVE)
products <- as.character(unique(Product_CVE$`Product name`))
for(i in 1:length(products)){
tempdf <- Product_CVE[Product_CVE$`Product name`==products[i],]
ifelse(tempdf$`Vendor name`==tempdf$`Product name`,
dfname <- paste("df_",products[i],sep=""),
dfname <- paste("df",products[i],sep=""))
ifelse(tempdf$`Vendor name`==tempdf$`Product name`,
dsname <- paste("ds_",products[i],sep=""),
dsname <- paste("ds",products[i],sep=""))
X <- tibble(
name = gsub("-", "", c(tempdf$CVE)),
y = c(tempdf$count),
drilldown = tolower(paste(name,'id'))
)
Y <- second_el_to_numeric(list_parse2(assign(dfname,X)))
assign(dsname,Y)
}
# CVE View HC ###########
hc <- hc %>%
hc_drilldown(
allowPointDrilldown = TRUE,
series = list(
list(
id = "sendmail id",
data = dssendmail,
keys = list('name','y','drilldown')
),
list(
id = "ftp id",
data = ds_ftp,
keys = list('name','y','drilldown')
),
list(
id = "ftpcd id",
data = ds_ftpcd,
keys = list('name','y','drilldown')
),
list(
id = "bsd id",
data = ds_bsd,
keys = list('name','y','drilldown')
),
list(
id = "sunos id",
data = dssunos,
keys = list('name','y','drilldown')
),
list(
id = "vms id",
data = dsvms,
keys = list('name','y','drilldown')
),
list(
id = "freebsd id",
data = ds_freebsd,
keys = list('name','y','drilldown')
),
list(
id = "apollo_domain_os id",
data = dsapollo_domain_os,
keys = list('name','y','drilldown')
),
list(
id = "nex id",
data = dsnex,
keys = list('name','y','drilldown')
),
list(
id = "next id",
data = ds_next,
keys = list('name','y','drilldown')
)
)
)
Output First Level:
Output Second Level Clicking on 1988:
ISSUES:
I should be able to click on vendor eric_allman and get a drill down, however I am not. I want to be able to drill down all the way to Revision if it exists. This is just prototypying new functionality for my app to get it working. However, highcharter does not make this easy or efficient. My dataset has almost 4 million observations. That will be the next struggle how to handle that.
I am even considering using D3 by creating a JSON file in python if I cannot do it using R. However, creating the JSON file in python is not trivial but doable. I am currently working on python code as a back up.
Thank you for the help and any suggestions
Interestingly enough after posting this question, i decided to do more research, since most of the research was pointing me to example of how to do it in javascript. Google search took me to this SO response by #K. Rohde
This post is from 2015 and I really appreciate how he explained the two different approaches. I ended up using hybrid approach borrowing from both.
For those interested in seeing how the drill down is working, go to my shiny app. Once you on the page Click on Visualizations, then "Drill Down For Vendor CVE Affected Products Versions and Revisions"and try it out. Again I would not have done it without #K. Rohde write up.

Resources