Unexpected error when manipulating a list of data.frames in R

I have a list of data.frames as the output of a custom function, and I want to split each data.frame by its last column at a given threshold. I combined the two lists into a single table without trouble, but I get an error when manipulating that new table, and I can't figure out where the issue comes from. How can I fix it? If it can be fixed, I want to wrap the whole thing in a function. What is an easy way to manipulate a list of data.frames, and is there a better way to debug the error?
A minimal example:
savedDF <- list(
  bar = data.frame(.start = c(12, 21, 37), .stop = c(14, 29, 45), .score = c(5, 9, 4)),
  cat = data.frame(.start = c(18, 42, 18, 42, 81), .stop = c(27, 46, 27, 46, 114), .score = c(10, 5, 10, 5, 34)),
  foo = data.frame(.start = c(3, 3, 33, 3, 33, 91), .stop = c(24, 24, 10, 24, 10, 17), .score = c(22, 22, 6, 22, 6, 7))
)
discardedDF <- list(
  bar = data.frame(.start = c(16, 29), .stop = c(20, 37), .score = c(2, 11)),
  cat = data.frame(.start = c(21, 31), .stop = c(23, 43), .score = c(1, 9)),
  foo = data.frame(.start = c(54, 79), .stop = c(71, 93), .score = c(3, 8))
)
I can manipulate it this way:
both <- do.call("rbind", c(savedDF, discardedDF))
cn <- c("letter", "seq")
# FIXME :
DF <- cbind(
  read.table(text = chartr("_", ".", rownames(both)), header = TRUE,
             sep = ".", col.names = cn),
  both)
DF <- transform(DF, isPassed = ifelse(.score > 8, "Pass", "Fail"))
by(DF, DF[c("letter", "isPassed")],
   function(x) write.csv(x[-(1:length(savedDF))],
                         sprintf("%s_%s_%s.csv", x$letter[1], x$isPassed[1])))
But I get an error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 15 did not have 2 elements
Why do I get this error, and how can I fix it?
My desired output is a set of CSV files like this:
bar.saved.Pass.csv
bar.saved.Fail.csv
bar.discarded.Pass.csv
bar.discarded.Fail.csv
cat.saved.Pass.csv
cat.saved.Fail.csv
cat.discarded.Pass.csv
cat.discarded.Fail.csv
foo.saved.Pass.csv
foo.saved.Fail.csv
foo.discarded.Pass.csv
foo.discarded.Fail.csv
But I don't think the way the exported CSV files are controlled is quite right yet. How can I improve this wrapper? I'd like to let the user choose the output directory, and anything more dynamic would be nice. Any ideas? Thanks a lot.

Is this what you are looking for?
library(tidyverse)
library(magrittr)
both <- do.call("rbind", c(savedDF, discardedDF))
both %<>% rownames_to_column(var = "cn")
both %<>% separate(cn, c("letters", "seq"), sep = "\\.")
both %<>% mutate(isPassed = ifelse(.score > 8, "Passed", "Failed"),
                 isDiscard = ifelse(is.na(seq), "Saved", "Discarded"))
list_of_dfs <- both %>% split(list(.$letters, .$isPassed, .$isDiscard))
csv_names <- paste0("/Users/nathanday/Desktop/", names(list_of_dfs), ".csv") # change this path
mapply(write.csv, list_of_dfs, csv_names)
The %<>% operator is shorthand, so both %<>% rownames_to_column(var = "cn") is identical to both <- rownames_to_column(both, var = "cn").
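A minimal illustration of that shorthand on a throwaway data frame (the names here are made up for the example):
library(magrittr)
library(tibble)
toy <- data.frame(x = 1:3, row.names = c("a", "b", "c"))
toy %<>% rownames_to_column(var = "cn")   # pipe and reassign in one step
# identical to: toy <- rownames_to_column(toy, var = "cn")
toy
#>   cn x
#> 1  a 1
#> 2  b 2
#> 3  c 3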
To make it more "dynamic" by allowing an output path as input, you could wrap this in the function structure you already have, like this:
output_where <- function(output_path, list1, list2) {
  if (!dir.exists(output_path)) {
    dir.create(file.path(output_path))
  }
  both <- do.call(rbind, c(list1, list2))
  both %<>% rownames_to_column(var = "cn")
  both %<>% separate(cn, c("letters", "seq"), sep = "\\.")
  both %<>% mutate(isPassed = ifelse(.score > 8, "Passed", "Failed"),
                   isDiscard = ifelse(is.na(seq), "Saved", "Discarded"))
  list_of_dfs <- both %>% split(list(.$letters, .$isPassed, .$isDiscard))
  csv_names <- paste0(output_path, names(list_of_dfs), ".csv")
  return(mapply(write.csv, list_of_dfs, csv_names))
}
output_where("~/Desktop/", savedDF, discardedDF)
To make it even more dynamic:
output_where <- function(output_path, list1, list2) {
  if (!dir.exists(output_path)) {
    dir.create(file.path(output_path))
  }
  names(list1) <- paste("list1", names(list1), sep = ".")
  names(list2) <- paste("list2", names(list2), sep = ".")
  both <- do.call(rbind, c(list1, list2))
  both %<>% rownames_to_column(var = "cn")
  both %<>% separate(cn, c("original_list", "letters", "seq"), sep = "\\.")
  both %<>% mutate(isPassed = ifelse(.score > 8, "Passed", "Failed"))
  list_of_dfs <- both %>% split(list(.$letters, .$isPassed, .$original_list))
  csv_names <- paste0(output_path, names(list_of_dfs), ".csv")
  return(mapply(write.csv, list_of_dfs, csv_names))
}
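It is called the same way as before. One small note: output_path is pasted directly onto the file names, so it needs a trailing slash (or swap the paste0() for file.path()). For example:
output_where("~/Desktop/", savedDF, discardedDF)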

Related

Function for looping over common variable names with different suffixes in R

I have some code which I'm looking to replicate many times, each time with a different country as the suffix.
Assuming 3 countries as a simple example:
country_list <- c('ALB', 'ARE', 'ARG')
I'm trying to create a series of variables called a_m5_ALB, a_m5_ARE, a_m5_ARG, etc., which have various functions (e.g. addcol or round_df) applied to reg_math_ALB, reg_math_ARE, reg_math_ARG, etc.
for (i in country_list) {
paste("a_m5", i , sep = "_") <- addcol(paste("reg_math", i , sep = "_"))
}
for (i in country_list) {
paste("a_m5", i , sep = "_") <- round_df(paste("reg_math", i , sep = "_"))
}
where addcol and round_df are defined as:
addcol <- function(y) {
  dat1 <- mutate(y, p.value = ((1 - pt(q = abs(reg.t.value), df = dof)) * 2))
  return(dat1)
}
round_df <- function(x, digits) {
  numeric_columns <- sapply(x, mode) == 'numeric'
  x[numeric_columns] <- round(x[numeric_columns], digits)
  x
}
The loop errors whenever one of the functions is wrapped around the paste() expression, but it works if I do it manually, e.g.
a_m5_ALB <- addcol(reg_math_ALB)
Please could you help? I think it's the application of the function inside a loop that I'm getting wrong.
Errors:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "character"
Error in round(x[numeric_columns], digits) :
non-numeric argument to mathematical function
Thank you
From your examples, you're really in a case where everything should be in a single dataframe; keeping separate variables for each country is not the right tool for the job. Say you have your per-country dataframes saved as CSV files; then you can rewrite everything as:
library(tidyverse)
country_list <- c('ALB', 'ARE', 'ARG')
read_data <- function(ctry){
  read_csv(paste0("/path/to/file/", "reg_math_", ctry)) %>%
    add_column(country = ctry)
}
total_df <- map_dfr(country_list, read_data)
total_df %>%
  mutate(p.value = (1 - pt(q = abs(reg.t.value), df = dof)) * 2) %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = digits)))
# (reg.t.value, dof and digits are taken from your addcol()/round_df() definitions)
And it gives you immediate access to all other dplyr functions that are great for this kind of manipulation.
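For example, once everything sits in total_df, per-country summaries come straight out of group_by() (a sketch that reuses the reg.t.value column assumed in the question):
total_df %>%
  group_by(country) %>%
  summarise(mean_abs_t = mean(abs(reg.t.value), na.rm = TRUE),
            n_rows     = n())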

Issues with iterating over f(x, y) using map2_

I am using a wrapper for the LastFM API to search for track tags.
The wrapper function is...
devtools::install_github("juyeongkim/lastfmr")
track_getInfo("track", "artist", api_key= lastkey)
I defined my own function as
INFOLM <- function(x = track, y = artist) {
  output <- track_getInfo(x, y, api_key = lastkey)
  output <- flatten(output)
  output_1 <- output[["tag"]][["name"]]
  return(output_1)
}
Then I prepared my list elements from my larger data frame:
artist4lf <- c(small_descriptive[1:10,2])
track4lf <- c(small_descriptive[1:10,3])
x<- vector("list", length = length(track4lf))
y<- vector("list", length = length(artist4lf))
names(x) <- track4lf
names(y) <- artist4lf
Then...
map2_df(track4lf, artist4lf, INFOLM)
I get a 0x0 tibble back every time... does anyone have a suggestion?
I think your INFOLM function will work if you just delete the api_key argument from the track_getInfo function.
Also, I'm not sure you need purrr::map2 here; you should be able to use your small_descriptive dataframe with rowwise() and mutate() to add the column(s) you want.
Here's a go, using testdf as if it's your small_descriptive dataframe with only the track and artist columns.
library(lastfmr)
library(dplyr)
library(tidyr)
testdf <- tribble(
  ~Artist, ~Track,
  "SmashMouth", "All Star",
  "Garth Brooks", "The Dance"
)
INFOLM <- function(x = track, y = artist) {
  output <- track_getInfo(x, y)
  output <- flatten(output)
  output_1 <- output[["tag"]][["name"]]
  return(paste(output_1, collapse = ","))
}
testdf %>% rowwise %>%
mutate(stuff = INFOLM(Track, Artist)) %>%
tidyr::separate(stuff, c("Tag1", "Tag2", "Tag3", "Tag4", "Tag5"), sep = ",")
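If a track comes back with more or fewer than five tags, that separate() call will warn (or drop tags); tidyr's extra and fill arguments let you say what should happen instead. A sketch of the same pipeline with those set:
testdf %>%
  rowwise() %>%
  mutate(stuff = INFOLM(Track, Artist)) %>%
  tidyr::separate(stuff, paste0("Tag", 1:5), sep = ",",
                  extra = "merge",  # any surplus tags are kept, merged into Tag5
                  fill = "right")   # fewer than five tags pads with NA on the right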

Data Harvesting in R: Get nested lists, unlist, make edits, re-nest them back

The following code harvests data from a website. I retrieve a list of lists; I want to unlist one of the lists, edit it, then re-nest it back into the form the data was originally received in. My code is below; it fails on the re-nesting step.
library(jsonlite)
library(plyr)
library(ckanr)
library(purrr)
library(dplyr)
ckanr_setup(url = "https://energydata.info/")
package_search(q = 'organization:world-bank-grou')$count
json_data2 <- fromJSON("https://energydata.info/api/3/action/package_search?q=organization:world-bank-grou", flatten = TRUE)
dat2 <- json_data2$result
str(dat2)
###########
#Get the datasets and unlist metadata
###########
df <- as.data.frame(json_data2$result$results)
Tags <- select(df, id, topic)
#Make some edits
Tags$topic <- tolower(Tags$topic)
res <- rbind.fill(lapply(Tags,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))
res$V1 = paste0("Some edit:",res$V1)
res$V2 = paste0("Some edits:", res$V2)
res$V3 = paste0("Some edit:", res$V3)
res[res=="Some edit:NA"]<-NA
res$V1 <- gsub(" ", "_", res$V1)
res$V2 <- gsub(" ", "_", res$V2)
res$V3 <- sub(" ", "_", res$V3)
res
###########
#Re-nest
###########
#turning res df back into list of lists
nestedList <- flatten(by_row(res, ..f = function(x) flatten_chr(x), .labels = FALSE)) #FAILS HERE
ERROR: Error in flatten(by_row(res, ..f = function(x) flatten_chr(x),
.labels = FALSE)) : could not find function "by_row"
Unclear from the question wording exactly what kind of list of lists you want to end up with, but maybe this is what you're looking for?
res %>%
rowwise() %>%
as.list()
or
res %>%
t() %>%
as.data.frame() %>%
rowwise() %>%
as.list()
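As an aside, the error itself is just that by_row() was moved out of purrr into the purrrlyr package. And if what you want is one list element per row of res, a plain split() on the row index is another option (a sketch; whether this matches your original nesting depends on the shape the harvested JSON came in with):
# one list element per row of res
res_list <- split(res, seq_len(nrow(res)))
# or drop the data.frame structure and keep a plain character vector per row
res_list_chr <- lapply(res_list, function(row) unname(unlist(row)))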

read_excel won't trim whitespace

I am using the readxl package to load an Excel file. By default it should strip the whitespace, however it is not doing so.
The file can be downloaded directly from the link below, or alternatively through the website, where it is Appendix B:
http://www2.nationalgrid.com/UK/Industry-information/Future-of-Energy/Electricity-Ten-Year-Statement/
http://www2.nationalgrid.com/WorkArea/DownloadAsset.aspx?id=8589937799
require(readxl); require(tidyverse)
test <- read_excel("ETYS 2016 Appendix B.xlsx", skip = 1, sheet = 22, trim_ws = TRUE)
print(test$`MVAr Generation`)
test$`MVAr Generation` %>% str_count(pattern = "\\s")
test$`MVAr Generation` %>% table  # all are numeric
test$`MVAr Generation` %>% class  # however the class is character
test$`MVAr Generation` %>% str_count(pattern = "\\s") %>%
  sum(na.rm = TRUE)  # it should be 0, however it is 2
This is causing problems in the analysis, as can be seen in this example where a numeric column ends up as character.
Help would be appreciated.
library(readxl)
readxl::excel_sheets('ETYS 2016 Appendix B.xlsx')[22]
test <- read_excel("ETYS 2016 Appendix B.xlsx", skip = 1, sheet = 22,
trim_ws = FALSE)
test$`MVAr Generation` <- as.numeric(gsub('^\\s', "", test$`MVAr Generation`))
The error is probably due to character encoding. I get this error when I force numeric interpretation of the column:
Expecting numeric in D9 / R9C4: got 'Â 225'
You can manually avoid this by substituting leading spaces with gsub.
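The 'Â ' in that message is typically a UTF-8 non-breaking space (U+00A0) displayed through a Latin-1 lens, so another option is to target that character directly; a minimal sketch on the column read above:
x <- test$`MVAr Generation`
x <- gsub("\u00a0", "", x, fixed = TRUE)  # drop non-breaking spaces
x <- trimws(x)                            # drop ordinary leading/trailing whitespace
test$`MVAr Generation` <- as.numeric(x)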
Maybe this is what you want:
library(xlsx)
test <- read.xlsx("ETYS 2016 Appendix B.xlsx", sheetIndex = 22,
                  colIndex = 1:7, startRow = 2, header = TRUE,
                  stringsAsFactors = FALSE)
# remove the mis-encoded character and any leading whitespace
test <- data.frame(lapply(test, function(y) {
  y <- gsub("Â", "", y)
  y <- gsub("^\\s+", "", y)
  y
}))
# set tidy cols to numeric
cols <- c(3, 4, 5, 7)
test[, cols] <- apply(test[, cols], 2, as.numeric)
# test
class(test$Unit.Number)
test$MVAr.Absorption
@troh's insight about the character encoding got me thinking about using regex, and @jaySF's application across the whole dataframe was a good way to process all the columns at the same time. The two suggestions led me to the answer below.
require(dplyr); require(purrr); require(readr)
RemoveSymbols <- function(df) {
  df %>%
    mutate_all(funs(gsub("^[^A-Z0-9]", "", ., ignore.case = FALSE))) %>%
    map_df(parse_guess)
}
test2 <- RemoveSymbols(test)
sapply(test2,class)

Recognizing items in a Values list in R in a for loop

I am having some trouble getting R to recognize items in my Values list (in RStudio) in a function call (just referring to it as a generic function here). Here's an example...the following works just fine if I type it in directly:
result <- function(cnv.chr1.S1, cnv.chr1.S2, cnv.chr1.S3)
because cnv.chr1.S1, cnv.chr1.S2, and cnv.chr1.S3 are objects (specifically GRanges objects) that I've created previously.
But as I'm looping over different chromosomes and there are really many more than 3 samples (S1, S2, S3), I've tried the following (simplified here)
chrom <- paste("chr", 1:1, sep = "")
sample.names <- paste("S", 1:3, sep = "")
for (thischrom in chrom)
{
  for (sample in sample.names)
  {
    a <- function(list(paste(paste("cnv", thischrom, sep = "."), sample.names, sep = ".")))
  }
}
However, it doesn't work because
paste(paste("cnv", thischrom, sep = "."), sample.names, sep = ".")
just creates a character vector of names that match the items in my Values list. How do I get R to access the corresponding objects in my Values list?
Thanks for any thoughts you might have!
Steve
Are you looking for something like this?
library(dplyr)
chrom <- paste("chr", 1:1, sep = "")
sample.names <- paste("S", 1:2, sep = "")
cnv.chr1.S1 <- c(1, 2)
cnv.chr1.S2 <- c(2, 3)
result <-
  data_frame(chrom = chrom) %>%
  merge(data_frame(sample.names = sample.names)) %>%
  rowwise %>%
  mutate(object =
           paste("cnv", chrom, sample.names, sep = ".") %>%
           parse(text = .) %>%
           eval %>%
           list)
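As a side note, if you only need to fetch existing objects by their constructed names (rather than build a table like this), base R's get() does that lookup directly, and mget() returns several at once as a named list. A minimal sketch along the lines of your loop:
for (thischrom in chrom) {
  for (sample in sample.names) {
    obj <- get(paste("cnv", thischrom, sample, sep = "."))  # look up the object by its name
    # ... pass obj to your real function here ...
  }
}
# or all samples for one chromosome at once:
# cnv_list <- mget(paste("cnv", thischrom, sample.names, sep = "."))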

Resources