Using apply family and multiple functions on lists in R


I have a follow-up question to my answer to this question:
Matching vertex attributes across a list of edgelists R
My solution was to use for loops, but we should always try to optimize (vectorize) when we can.
What I'm trying to understand is how I would vectorize the solution I made in the post.
My solution was
for(i in 1:length(graph_list)){
graph_list[[i]]=set_vertex_attr(graph_list[[i]],"gender", value=attribute_df$gender[match(V(graph_list[[i]])$name, attribute_df$names)])
}
Ideally we could vectorize this with lapply but I'm having some trouble conceiving how to do that. Here's what I've got
graph_lists_new=lapply(graph_list, set_vertex_attr, value=attribute_df$gender[match(V(??????????)$name, attribute_df$names)])
What I'm unclear about is what I'd put in the part with the ??????. The thing inside the V() function should be each item in the list, but what I don't get is what I'd put inside when I'm using lapply.
All data can be found in the link I posted, but here's the data anyway
attribute_df<- structure(list(names = structure(c(6L, 7L, 5L, 2L, 1L, 8L, 3L,
4L), .Label = c("Andy", "Angela", "Eric", "Jamie", "Jeff", "Jim",
"Pam", "Tim"), class = "factor"), gender = structure(c(3L, 2L,
3L, 2L, 3L, 1L, 1L, 2L), .Label = c("", "F", "M"), class = "factor"),
happiness = c(8, 9, 4.5, 5.7, 5, 6, 7, 8)), class = "data.frame", row.names = c(NA,
-8L))
edgelist<-list(structure(list(nominator1 = structure(c(3L, 4L, 1L, 2L), .Label = c("Angela",
"Jeff", "Jim", "Pam"), class = "factor"), nominee1 = structure(c(1L,
2L, 3L, 2L), .Label = c("Andy", "Angela", "Jeff"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L)), structure(list(nominator2 = structure(c(4L, 1L, 2L, 3L
), .Label = c("Eric", "Jamie", "Oscar", "Tim"), class = "factor"),
nominee2 = structure(c(1L, 3L, 2L, 3L), .Label = c("Eric",
"Oscar", "Tim"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L)))
library(igraph) # provides graph_from_data_frame, V and set_vertex_attr
graph_list<- lapply(edgelist, graph_from_data_frame)

Since you need to use graph_list[[i]] multiple times in your call, to use lapply you need to write a custom function, such as this anonymous function. (It's the same code as your loop, I just wrapped it in function(x) and replaced all instances of graph_list[[i]] with x.)
graph_list = lapply(graph_list, function(x)
set_vertex_attr(x, "gender", value = attribute_df$gender[match(V(x)$name, attribute_df$names)])
)
(Note that I didn't test this, but it should work unless I made a typo.)
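If you want to see the pattern without igraph, here is a minimal base R sketch of the same idea. The names lookup and name_list are made up for illustration (they stand in for attribute_df and graph_list), so treat it as an analogy rather than the original data:

```r
# Hypothetical lookup table and list, standing in for attribute_df and graph_list
lookup <- data.frame(names = c("Jim", "Pam", "Jeff"),
                     gender = c("M", "F", "M"),
                     stringsAsFactors = FALSE)
name_list <- list(c("Pam", "Jim"), c("Jeff", "Oscar"))

# x plays the role of graph_list[[i]]; inside the anonymous function
# it can be used as many times as the call needs
gender_list <- lapply(name_list, function(x) {
  lookup$gender[match(x, lookup$names)]
})
```

Names with no match in the lookup table (here "Oscar") come back as NA, which is also what match() gives you in the graph version.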
lapply isn't vectorization; it's just "loop hiding". In this case, I think your for loop is a nicer way to do things than lapply. Especially since you are modifying existing objects, your simple for loop will probably be more efficient than an lapply solution, as well as more readable.
When we talk about vectorization for efficiency, we almost always mean atomic vectors, not lists. (It's vectorization, after all, not listization.) The reason to use lapply and related functions (sapply, vapply, Map, most of the purrr package) isn't computer efficiency, it's readability, and human-efficiency to write.
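To make the distinction concrete, here is a small sketch with made-up numbers. The first line is vectorized in the sense above (one call on an atomic vector, with the loop running in compiled code), while the vapply version still calls an R function once per element:

```r
x <- c(1.5, 2, 3.25)

# vectorized: a single call operating on the whole atomic vector
squared_vec <- x^2

# "loop hiding": the anonymous function is called once for each element
squared_loop <- vapply(x, function(v) v^2, numeric(1))
```

Both give the same result; only the first is what people usually mean by vectorization.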
Let's say you have a list of data frames, my_list = list(iris, mtcars, CO2). If you want to get the number of rows for each of the data frames in the list and store it in a variable, you could use sapply or a for loop:
# easy to write, easy to read
rows_apply = sapply(my_list, nrow)
# annoying to read and write
rows_for = integer(length(my_list))
for (i in seq_along(my_list)) rows_for[i] = nrow(my_list[[i]])
But the more complex your task gets, the more readable a for loop becomes compared to alternatives like these. In your case, I'd prefer the for loop.
For more reading on this, see the old question Is apply more than syntactic sugar?. Since those answers were written, R has been upgraded to include a just-in-time compiler, which further speeds up for loops relative to apply. In the nearly 10-year-old answers there, you'll see that sometimes *apply is slightly faster than a for loop. Since the JIT compiler, I think you'll find the opposite: most of the time a for loop is slightly faster than *apply.
But in both of those cases, unless you're doing something absolutely trivial inside the for/apply, whatever you do inside for/apply will dominate the timings.

Related

Distribution of files by folder from the specified path in R

I have a csv file that lists paths to jpg files in their folders. The column names are the folders the jpg files must be copied to, and the rows hold the paths to the jpg files in their original folder (from which they must be copied). Here is an example via dput():
mydata=structure(list(x1 = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 1L, 1L),
.Label = c("", "C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\17992279.png", "C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44245909_10_173_201907311705.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44253326_03_61_201907311507.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44249755_10_191_201907311444.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44253009_10_935_201907311358.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44254483_01_241_201907311457.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44537611_10_71_201908281506.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44548452_10_973_201908291551.jpg"),
class = "factor"), x2 = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 1L, 1L),
.Label = c("", "C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44243943_10_916_201907311338.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44245909_10_173_201907311705.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44299011_10_52_201908281735.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44305733_10_845_201908261634.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44249755_10_191_201907311444.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44254483_01_241_201907311457.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44537550_10_155_201908310857.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\DKRBP18729589_08_881_201907311205.jpg"),
class = "factor"), x3 = structure(1:11, .Label = c("C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44243943_10_916_201907311338.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44245909_10_173_201907311705.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44265269_10_52_201908280944.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44299011_10_52_201908281735.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44305733_10_845_201908261634.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCI44540448_10_973_201908291524.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44249755_10_191_201907311444.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44254483_01_241_201907311457.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44537550_10_155_201908310857.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\CCSRBP44537577_10_890_201908271624.jpg",
"C:\\Users\\OCR\\Downloads\\OCR pass 2\\input\\DKRBP18729589_08_881_201907311205.jpg"),
class = "factor")), .Names = c("x1", "x2", "x3"), class = "data.frame", row.names = c(NA,
-11L))
So all the jpg files whose paths are listed in the x1 column must be copied to C:\\X1\,
all the jpg files whose paths are listed in the x2 column must be copied to C:\\X2\,
and all the jpg files whose paths are listed in the x3 column must be copied to C:\\X3\.
How can I do this in R?

max,
it looks like you didn't set the option stringsAsFactors=FALSE when you read in your csv, which leads to problems with the functions used afterwards.
You can convert x1 etc. through
library(dplyr) #for %>%, mutate_all and na_if
mydata=mydata %>% mutate_all(as.character) #sets all columns to character
mydata=mydata %>% mutate_all(na_if,"") #sets the empty entries to NA
mydata=lapply(mydata, na.exclude) #removes the NAs, empty elements would throw errors.
file.copy(from=mydata$x1,to=file.path("C:/X1",basename(mydata$x1))) #copies for the first "column".
I never do things like copying with lapply, because if one does it wrong it can get messy. Depending on how big your data frame is, you could try to do that too.
But I recommend just rewriting the last line for each column, which will also give you more control over where the files are going.
file.copy(from=mydata$x2,to=file.path("C:/X2",basename(mydata$x2)))
file.copy(from=mydata$x3,to=file.path("C:/X3",basename(mydata$x3)))
Note: in R it's far more convenient to use / as the separator in file paths; this also works on Windows.
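If you have many columns, a plain for loop over the column names is one way to avoid repeating the line. This is only a sketch under assumptions: it uses temporary files and a temporary output folder instead of your C:/ paths, and the names paths, src_dir and out_root are invented for the example.

```r
# Hypothetical stand-in for mydata: temp files instead of the real C:/ paths
src_dir <- file.path(tempdir(), "input")
dir.create(src_dir, showWarnings = FALSE)
files <- file.path(src_dir, c("a.jpg", "b.jpg", "c.jpg"))
file.create(files)

paths <- data.frame(x1 = c(files[1], files[2], ""),
                    x2 = c(files[2], files[3], ""),
                    stringsAsFactors = FALSE)

out_root <- file.path(tempdir(), "output")  # plays the role of C:/
copied <- list()
for (col in names(paths)) {
  from <- paths[[col]]
  from <- from[!is.na(from) & nzchar(from)]   # drop NA and empty entries
  dest <- file.path(out_root, toupper(col))   # e.g. <out_root>/X1
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # one logical per file: TRUE if the copy succeeded
  copied[[col]] <- file.copy(from, file.path(dest, basename(from)),
                             overwrite = TRUE)
}
```

Checking copied afterwards tells you which files, if any, failed to copy.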

Converting init into date?

I have been working with a dataset that comes with a date column. When I run typeof(headlineDat$Date) I get a type integer.
I've tried pasting in a few things I found off google but none have seemed to work. I've tried running this piece of code
as.POSIXct(strptime(headlineDat$Time.read,format= "%Y-%m-%d"))
My aim is to have the same format as the year column below. The reason why I want to do this is that I want to be able to create a unique identifier so I can easily match dates when I merge the two data frames.
Any help on this would be greatly appreciated !
This is my dput output:
dput(droplevels(headlineDat[1:5, ]))
structure(list(Date = structure(c(1L, 3L, 3L, 2L, 4L), .Label = c("2018-04-26T11:31:02+00:00",
"2018-05-02T21:10:20+00:00", "2018-05-03T15:30:59+00:00", "2018-05-03T18:00:39+00:00"
), class = "factor"), Headline = structure(c(5L, 2L, 4L, 3L,
1L), .Label = c("Bitcoin Futures Trading Questioned By Chinese National Media",
"Daily Volatility Decline? Bitcoin Has Seen $1K Range 43 Times In 2018",
"Reddit to Relaunch Bitcoin Payments (And Add More Cryptos)",
"Sell In May and Go Away? Not for Bitcoin Bulls", "Square Books Small Profit for First Quarter of Bitcoin Sales"
), class = "factor")), row.names = c(NA, 5L), class = "data.frame")

You are starting with a standard format, so as.Date does the conversion just fine.
headlineDat$Date = as.Date(headlineDat$Date)
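As a small illustration (using a made-up two-row vector rather than your full data): typeof() reports integer because the column is a factor, and as.Date converts the factor via its labels. If you ever need the time of day as well, as.POSIXct with an explicit format is one option; the tz = "UTC" below is my assumption based on the +00:00 offset in your sample.

```r
# Hypothetical two-row version of the Date column, stored as a factor
date_fac <- factor(c("2018-04-26T11:31:02+00:00", "2018-05-03T15:30:59+00:00"))

typeof(date_fac)               # "integer", because factors are integer codes
date_only <- as.Date(date_fac) # keeps only the calendar date

# Optional: keep the time too (assumes all timestamps are UTC)
date_time <- as.POSIXct(as.character(date_fac),
                        format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
```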

read.table from write.table in R

I'm trying to do a qdap::multigsub in order to fix some typos, misspelled names, variant expressions and some other "aberrations" in a list of climatic event types (yes, it's the NOAA's data set on storms that belongs to an assignment in a coursera class on reproducible research; although this fixing is neither required nor expected in the assignment: it's me trying my best!).
So I have events named "flash flood", "flash flooding", "flash floods" and the like, and I'd like to group them all in a level called "flash flood". So what I did first was:
expr <- c("^flash.*floo.*","thun.*")
repl <- c("flash flood","thunderstorm")
Length of each vector is 51 and this is a knitr assignment, so in order to keep it readable (margin column=80), I had to go with something like
expr <- c(expr,"new_expr_1","new_expr_2")
repl <- c(repl,"new_repl_1","new_repl_2") # repeated many, many times
Which makes the code kind of messy. Of course, I have the complete expr and repl vectors, so I would like to have each pair (expr and repl) of corresponding values in a row, so the reader of the code would have an easy time (that's why dput won't work here: it doesn't align each pair of values).
I tried this:
a <- data.frame(expr=expr,repl=repl)
print(a,row.names=FALSE)
# copying the output, and then
b <- read.table(header=TRUE,text="paste_text_here")
but it failed (I think because print throws the output without quotation marks and there are some two-word expr or repl). I also tried
write.table(a,row.names=FALSE)
# copying the output, and then
b <- read.table(header=TRUE,text="paste_text_here")
but it doesn't work either (I think because write.table outputs each item between quotes, and read.table finds too many quotation marks to handle).
I'd like to have in my Rmarkdown file something like this:
exprRepl <- read.table(header=TRUE,text="expr repl
expr_1 repl_1
expr_2 repl_2")
How can I achieve this from the data I have now?
dput of the first 5 rows of data frame follow:
> dput(a[1:5,])
structure(list(expr = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("^BLIZZARD.*",
"^FLASH.*FLOOD.*", "^HAIL.*", "^HEAVY.*RAIN.*", "^HURRICANE.*"
), class = "factor"), repl = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("BLIZZARD",
"FLASH FLOOD", "HAIL", "HEAVY RAIN", "HURRICANE"), class = "factor")), .Names = c("expr",
"repl"), row.names = c(NA, 5L), class = "data.frame")
If there's any other approach to replace the wrong/variant names, I'd be very happy to hear from it and give it a try!

One solution is to use a single quote ' around the pasted text (this works as long as there are no ' in your data):
d <- structure(list(expr = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("^BLIZZARD.*",
"^FLASH.*FLOOD.*", "^HAIL.*", "^HEAVY.*RAIN.*", "^HURRICANE.*"
), class = "factor"), repl = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("BLIZZARD",
"FLASH FLOOD", "HAIL", "HEAVY RAIN", "HURRICANE"), class = "factor")), .Names = c("expr",
"repl"), row.names = c(NA, 5L), class = "data.frame")
write.table(d, row.names=FALSE)
# copy paste output of write.table in text field below:
read.table(header = TRUE, text='"expr" "repl"
"^HURRICANE.*" "HURRICANE"
"^BLIZZARD.*" "BLIZZARD"
"^FLASH.*FLOOD.*" "FLASH FLOOD"
"^HAIL.*" "HAIL"
"^HEAVY.*RAIN.*" "HEAVY RAIN"')
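If you want to check the round trip before pasting, something like the following sketch may help. It uses a made-up two-row data frame d_small and captures the write.table output with capture.output instead of copying it by hand:

```r
# Hypothetical small version of the expr/repl table
d_small <- data.frame(expr = c("^HURRICANE.*", "^FLASH.*FLOOD.*"),
                      repl = c("HURRICANE", "FLASH FLOOD"),
                      stringsAsFactors = FALSE)

# The same text you would otherwise copy from the console
txt <- capture.output(write.table(d_small, row.names = FALSE))

# Read it back; the quotes keep two-word values such as "FLASH FLOOD" together
d_back <- read.table(header = TRUE, text = txt, stringsAsFactors = FALSE)
```

Comparing d_back with d_small column by column confirms that nothing was split or lost.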

Error in charToDate(x) : character string is not in a standard unambiguous format

I am writing because I have nowhere else to go to get an answer. I am trying to shrink my existing table a bit. It has the following form:
Živilec; Proizvodnja; Kariera d.o.o.; 18.11.2014 hh.mm.ss; Ljubljana
Živilec; Prehrambena industrija; Kariera d.o.o.; 18.11.2014 hh.mm.ss; Ljubljana
Vodja; Strojništvo; Adecco; 18.11.2014 hh.mm.ss; Maribor
Vodja; Tehnične storitve; Adecco; 18.11.2014 hh.mm.ss; Maribor
Vodja; Elektrotehnika; Adecco; 18.11.2014 hh.mm.ss; Celje
The dates are actually stored as 18.11.2014 8:35:59, but I don't need the time, just the date.
What I wish to get is this:
Živilec; Proizvodnja,Preh. industrija; Kariera d.o.o.; 18.11.2014; Ljubljana
Vodja; Stroj.,Teh. stor., Elektro.; Adecco; 18.11.2014; Maribor, Celje
I have tried to get this with the help of this R code:
matrik<-matrix(0,600,30)
for (i in 1:dim(a)[1]){
if (is.element(a[i,3],matrik[,15])==TRUE & is.element(a[i,1],matrik[,1])==TRUE){
katero<-which(a[i,1]==matrik[,1])
kdo<-which(a[i,15]==matrik[,15])
kje<-min(intersect(kdo,katero))
if (kje!=0){
prosto<-min(which(matrik[kje,2:14]==0))
matrik[kje,prosto]<-as.character(a[i,2])
prosti<-min(which(matrik[kje,17:30]==0))
matrik[kje,prosti]<-as.character(a[i,5])
}
if (kje==0){
povrsti<-min(which(matrik[,1]==0))
matrik[povrsti,1]<-as.character(a[i,1])
prosto<-min(which(matrik[povrsti,2:14]==0))+1
matrik[povrsti,prosto]<-as.character(a[i,2])
matrik[povrsti,15]<-as.character(a[i,3])
matrik[povrsti,16]<-as.character(a[i,4])
prosti<-min(which(matrik[povrsti,17:30]==0))+1
matrik[povrsti,prosti]<-as.character(a[i,5])
}
}
else {
povrsti<-min(which(matrik[,1]==0))
matrik[povrsti,1]<-as.character(a[i,1])
prosto<-min(which(matrik[povrsti,2:14]==0))+1
matrik[povrsti,prosto]<-as.character(a[i,2])
matrik[povrsti,15]<-as.character(a[i,3])
matrik[povrsti,16]<-as.character(a[i,4])
prosti<-min(which(matrik[povrsti,17:30]==0))+16
matrik[povrsti,prosti]<-as.character(a[i,5])
}
}
Basically I create a new matrix in which to store the values. Because I cannot store several categories (such as teh. storitve, strojništvo, elektro) in one cell and just two values in another cell of the same column, I decided to take the maximum number of categories and make that many cells. If this problem can be solved another way, please let me know as well. So, after making a zero matrix, I check whether the first element (so "Živilec") and the third element (so "Kariera d.o.o.") are the same as in an existing row. If they are, I would like to just add values to the second and fifth (last) columns. If not, I must add a new row to the matrix with all the values from the table. When I run this code I get the error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
What to do? Any solutions?
Thank you for your time.

In order to parse the dates, you can do it like below:
library(lubridate)
x <- c("18.11.2014 8:35:59")
as.Date(dmy_hms(x))
Otherwise, you should give the community some sample data...use
dput(your_data)
people will show you the way in no time.
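For the date parsing at the top of this answer, a base R alternative is also possible if you don't want to load lubridate. This sketch relies on the fact that as.Date with an explicit format ignores whatever follows the matched part of the string:

```r
x <- c("18.11.2014 8:35:59")

# %d.%m.%Y matches the day.month.year part; the trailing time is ignored
d <- as.Date(x, format = "%d.%m.%Y")
format(d)  # "2014-11-18"
```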
UPDATE
Here is a solution:
Load some useful libraries...
library(stringr)
library(dplyr)
your data...
toy_data <-
structure(list(V1 = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("Vodja",
"Živilec"), class = "factor"), V2 = structure(c(5L, 4L, 2L, 3L,
1L), .Label = c(" Elektrotehnika", " Strojništvo",
" Tehnične storitve", " Prehrambena industrija", " Proizvodnja"
), class = "factor"), V3 = structure(c(2L, 5L, 1L, 4L, 3L), .Label = c(" Adecco",
" Kariera d.o.o.", " Adecco", " Adecco",
" Kariera d.o.o."), class = "factor"), V4 = structure(c(2L, 2L,
1L, 1L, 1L), .Label = c(" 18.11.2014", " 18.11.2014"
), class = "factor"), V5 = structure(c(2L, 2L, 3L, 3L, 1L), .Label = c(" Celje",
" Ljubljana", " Maribor"), class = "factor")), .Names = c("V1",
"V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-5L))
a useful function...
my_str_c <- function(x){str_c(unique(x), collapse = ";")}
and the code for your desired output (mutate_all and summarise_all replace the deprecated mutate_each(funs()) and summarise_each(funs()))...
toy_data %>%
mutate_all(str_trim) %>%
group_by(V1) %>%
summarise_all(my_str_c)
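If you would rather stay in base R, the same grouping can be sketched with aggregate. The data frame jobs below is a made-up, already trimmed version of your table, and I use "," as the separator to match the output you asked for:

```r
# Hypothetical trimmed version of the table
jobs <- data.frame(
  V1 = c("Zivilec", "Zivilec", "Vodja", "Vodja", "Vodja"),
  V2 = c("Proizvodnja", "Prehrambena industrija",
         "Strojnistvo", "Tehnicne storitve", "Elektrotehnika"),
  V3 = c("Kariera d.o.o.", "Kariera d.o.o.", "Adecco", "Adecco", "Adecco"),
  V5 = c("Ljubljana", "Ljubljana", "Maribor", "Maribor", "Celje"),
  stringsAsFactors = FALSE
)

# Collapse the unique values of every other column within each V1 group
collapse_unique <- function(x) paste(unique(x), collapse = ",")
res <- aggregate(jobs[c("V2", "V3", "V5")],
                 by = list(V1 = jobs$V1),
                 FUN = collapse_unique)
```

Note that this groups by V1 only, like the dplyr code above; add V3 to the by list if the company should be part of the key.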

Include a text representation of an object (like dput) in a function call for reproducible research

I have created a shiny app in which a user can load a file and use the object as a function argument. I also print the code to run the function locally (so that I or anyone else could copy and paste to reproduce the result).
What I would like to do is use something like dput, but save the text representation of the loaded object to an object rather than print it to the console. dput outputs to the console, but simply returns a copy of its first argument. I can use deparse, but it fails when the length of the object exceeds width.cutoff (default 60 and max 500).
The following hacky reproducible example illustrates. In it I use image as the example function. In my case I have other functions with more arguments.
#create example matrices
m2 <- matrix(1:4,2,2)
m4 <- matrix(1:4,4,4)
#this is what I want to recreate
image(z=m2,col=rainbow(4))
image(z=m4,col=rainbow(4))
#convert the matrices to their text representation
txtm2 <- deparse(m2)
txtm4 <- deparse(m4)
#create a list of arguments
lArgs2 <- list( z=txtm2, col=rainbow(4) )
lArgs4 <- list( z=txtm4, col=rainbow(4) )
#construct arguments list
vArgs2 <- paste0(names(lArgs2),"=",lArgs2,", ")
vArgs4 <- paste0(names(lArgs4),"=",lArgs4,", ")
#remove final comma and space
vArgs2[length(vArgs2)] <- substr(vArgs2[length(vArgs2)],0,nchar(vArgs2[length(vArgs2)])-2)
vArgs4[length(vArgs4)] <- substr(vArgs4[length(vArgs4)],0,nchar(vArgs4[length(vArgs4)])-2)
#create the text function call
cat("image(",vArgs2,")")
cat("image(",vArgs4,")")
#the 1st one when pasted works
image( z=structure(1:4, .Dim = c(2L, 2L)), col=c("#FF0000FF", "#80FF00FF", "#00FFFFFF", "#8000FFFF") )
#the 2nd one gives an error because the object has been split across multiple lines
image( z=c("structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, ", "2L, 3L, 4L), .Dim = c(4L, 4L))"), col=c("#FF0000FF", "#80FF00FF", "#00FFFFFF", "#8000FFFF") )
#In an ideal world I would also like it to work when I did this, but maybe that's asking too much
image(z=txtm2,col=rainbow(4))
I realise that the way I construct the function call is a hack, but when I looked at it a while ago I couldn't find a better way of doing it. Open to any suggestions. Thanks.

You can do something like :
## an object that you want to recreate
m2 <- matrix(1:4,2,2)
## use capture.output to save structure as a string in a variable
xx <- capture.output(dput(m2))
## recreate the object
m2_ <- eval(parse(text=xx))
image(z=m2_,col=rainbow(4))
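For larger objects the captured or deparsed text comes back as several strings, which is what broke the pasted call in the question. One possible way around it, sketched here with the question's m4 and an invented variable name call_txt, is to collapse the pieces into a single string before building the call:

```r
m4 <- matrix(1:4, 4, 4)

# deparse() may return several pieces; paste them into one string
txt_m4 <- paste(deparse(m4), collapse = "")
txt_col <- paste(deparse(rainbow(4)), collapse = "")

# text of a call that can be printed for the user and pasted elsewhere
call_txt <- paste0("image(z=", txt_m4, ", col=", txt_col, ")")

# the collapsed text recreates the original object
m4_ <- eval(parse(text = txt_m4))
```

cat(call_txt) then prints a single-line call, and eval(parse(text = call_txt)) would draw the plot.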
