I am trying to expand some input data that comes in character format. In the example below:
jv17 <- list(4.2017,5.2017,6.2017,7.2017,8.2017,9.2017,10.2017)
jv18 <- list(4.2018,5.2018,6.2018,7.2018,8.2018,9.2018,10.2018)
mylist1 <- list("jv17")
mylist2 <- list("jv17", "jv18")
This does what I want:
eval(parse(text = mylist1))
This does not:
eval(parse(text = mylist2))
The output I am looking for:
list(jv17,jv18)
I feel very confident this could be done with lapply but I'm not as good as I should be with it. Here is my best guess but I get an error that is not intuitive (at least to me).
lapply(x = mylist2, FUN = eval(parse(text = .)))
Any help is greatly appreciated, thank you.
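One possible approach, offered as a sketch rather than a definitive answer: eval() on an expression with several elements returns only the value of the last one, which is likely why the second call gives back just jv18. Looking each name up with get(), or all of them at once with mget(), avoids eval(parse()) entirely. The shortened jv17/jv18 values and the helper variable env are assumptions made for illustration.

```r
# Shortened versions of the lists from the question (illustrative values)
jv17 <- list(4.2017, 5.2017)
jv18 <- list(4.2018, 5.2018)
mylist2 <- list("jv17", "jv18")

# Environment in which the objects live (here: wherever this code runs)
env <- environment()

# Look up each name; the result is an unnamed list, like list(jv17, jv18)
res <- lapply(mylist2, function(nm) get(nm, envir = env))

# mget() does the same in one call but returns a list named by the variables
res_named <- mget(unlist(mylist2), envir = env)
```

If the names are needed later, the mget() result is probably the more convenient of the two.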
I am trying to automatically spell-check a string column of a data.table/data.frame.
Looking around, I found several approaches that all give an "out of bounds" error when hunspell_suggest returns no suggestions (that is, an empty list, e.g. for "pippasnjfjsfiadjg"); see the approaches here (the accepted answer here yields NA, so it does work in principle) and here.
We seem to need unlist in order to identify these empty suggestions and then exclude them from the part of the code that picks the first suggestion, but I cannot figure out how.
library(dplyr)
library(stringi)
library(hunspell)
df1 <- data.frame("Index" = 1:7, "Text" = c("pippasnjfjsfiadjg came to dinner with us tonigh.",
"Wuld you like to trave with me?",
"There is so muh to undestand.",
"Sentences cone in many shaes and sizes.",
"Learnin R is fun",
"yesterday was Friday",
"bing search engine"),
stringsAsFactors = FALSE)
# Get bad words.
badwords <- hunspell(df1$Text) %>% unlist
# Extract the first suggestion for each bad word.
suggestions <- sapply(hunspell_suggest(badwords), "[[", 1)
mutate(df1, Text = stri_replace_all_fixed(str = Text,
pattern = badwords,
replacement = suggestions,
vectorize_all = FALSE)) -> out
You'll want to filter the bad words and their suggestions to get rid of the words that have no suggestions:
badwords <- hunspell(df1$Text) %>% unlist()
# note use of '[' rather than '[['
suggestions <- sapply(hunspell_suggest(badwords), '[', 1)
badwords <- badwords[!is.na(suggestions)]
suggestions <- suggestions[!is.na(suggestions)]
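To see why '[' helps here, a small base-R sketch may be useful. The list sugg below is made up to stand in for the output of hunspell_suggest(), with an empty element for a word that has no suggestions.

```r
# Hypothetical stand-in for hunspell_suggest(badwords)
sugg <- list(c("tonight", "tonic"), character(0), "would")

# '[[' fails on the empty element (subscript out of bounds) ...
first_strict <- tryCatch(sapply(sugg, "[[", 1), error = function(e) "error")

# ... whereas '[' returns NA for it, which can then be filtered out
first <- sapply(sugg, "[", 1)
keep  <- !is.na(first)
first[keep]
```

The same logical index (keep here) has to be applied to both the bad words and the suggestions so that the two vectors stay aligned.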
Suppose I have the following:
format.string <- "#AB#-#BC#/#DF#" #wanted to use $ but it is problematic
value.list <- c(AB="a", BC="bcd", DF="def")
I would like to apply the value.list to the format.string so that each named value is substituted. So in this example I should end up with the string: a-bcd/def
I tried to do it like the following:
resolved.string <- lapply(names(value.list),
function(x) {
sub(x = format.string,
pattern = paste0(c("#",x,"#"), collapse=""),
replacement = value.list[x]) })
But it doesn't seem to be working correctly. Where am I going wrong?
The glue package is designed for this. You can change the opening and closing delimiters using .open and .close, but they have to be different. Also note that value.list has to be either a list or a dataframe:
library(glue)
format.string <- "{AB}-{BC}/{DF}"
value.list <- list(AB="a", BC="bcd", DF="def")
glue_data(value.list, format.string)
# a-bcd/def
To answer your actual question, by using lapply over names(value.list) you, as your output shows, take each of the elements of value.list and perform the replacement. However, all this happens independently, i.e., the replacements aren't ultimately combined to a single result.
To make something very similar to your approach work, we can use Reduce, which does exactly this combining:
Reduce(function(x, y) sub(paste0(c("#", y, "#"), collapse = ""), value.list[y], x),
init = format.string, names(value.list))
# [1] "a-bcd/def"
If we call the anonymous function f, then the result is
f(f(f(format.string, "AB"), "BC"), "DF")
exactly as you intended, I believe.
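To make the chaining visible, here is a hedged sketch that asks Reduce to keep the intermediate results. The function name f and the use of fixed = TRUE are choices made for this illustration, not part of the answer above.

```r
format.string <- "#AB#-#BC#/#DF#"
value.list <- c(AB = "a", BC = "bcd", DF = "def")

# One substitution step: replace the placeholder #key# in the template
f <- function(template, key) {
  sub(paste0("#", key, "#"), value.list[[key]], template, fixed = TRUE)
}

# accumulate = TRUE keeps every intermediate string, not just the last one
steps <- Reduce(f, names(value.list), init = format.string, accumulate = TRUE)

# steps[1] is the untouched template, steps[4] the fully resolved string
steps[length(steps)]
# [1] "a-bcd/def"
```

Each element of steps is the previous one with one more placeholder filled in, which is the combining that lapply on its own does not do.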
We can use gsubfn that can take a key/value pair as replacement to change the pattern with the 'value'
library(gsubfn)
gsub("#", "", gsubfn("[^#]+", as.list(value.list), format.string))
#[1] "a-bcd/def"
NOTE: 'value.list' is a vector and not a list
As a part of the Recommender systems course at Coursera, I am doing assignments in R (https://github.com/eponkratova/projects-recommender-system/blob/master/recommender_knit.Rmd) and so far I got a N result.
Is there a way to rename col (var renamed_mean_1) more elegantly during the step where I calculate the average by a column (var dataset_mean_1)?
install.packages('gsheet', repos="http://cran.rstudio.com/")
library('gsheet')
url <- 'https://docs.google.com/spreadsheets/d/1XDBRCYFTxsw27AivxJ5pWxDHN0WA6GqSP46PVe2BCQ4/edit?usp=sharing'
dataset <- gsheet2tbl(url)
dataset_mean_1 <- data.frame(colMeans(dataset, na.rm = TRUE))
install.packages('plyr', repos="https://cran.r-project.org")
library('plyr')
renamed_mean_1 <- rename(dataset_mean_1,c('colMeans.dataset..na.rm...TRUE.'='Mean'))
ordered_mean_1 <- head(renamed_mean_1[order(-renamed_mean_1$Mean),,drop=FALSE],n=4)
I don't have much experience with R, and for this reason, my code is a bit bulky.
Could you please help me?
Try this:
dataset_mean_1 <- data.frame(colMeans(dataset, na.rm = TRUE))
colnames(dataset_mean_1) <- "renamed_mean_1"
Or in just one call:
dataset_mean_1 <- data.frame(renamed_mean_1 = colMeans(dataset, na.rm = TRUE))
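Putting this together with the ordering step from the question, a minimal sketch could look like the following. The small dataset here is invented to stand in for the Google Sheet, and the column name Mean is taken from the question's rename call.

```r
# Hypothetical ratings table standing in for the Google Sheet in the question
dataset <- data.frame(m1 = c(5, 3, NA), m2 = c(2, 4, 4), m3 = c(1, NA, 2))

# Name the column at creation time, so no separate rename step is needed
dataset_mean_1 <- data.frame(Mean = colMeans(dataset, na.rm = TRUE))

# Top rows by mean; drop = FALSE keeps the result a one-column data frame
ordered_mean_1 <- head(dataset_mean_1[order(-dataset_mean_1$Mean), , drop = FALSE], n = 2)
```

This also removes the need to install plyr just for rename.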
I have the following data frame
id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33
Note, they are already sorted by value within each (id,category). What I would like to be able to do is to get the top from each (id,category) and make a string, followed by the second in each (id,category) and so on. So for the above example it would look like
A,D,G,B,E,C,F
Is there a way to do it easily in R? Or am I better off relying on a Perl script to do it?
Thanks much in advance
This appears to work, but I'm certain we could simplify it somewhat, particularly if you are able to relax your ordering requirements:
library(plyr)
d <- read.table(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33",sep = ',',header = TRUE)
d <- ddply(d,.(category),transform,r = seq_along(category))
d <- arrange(d,id)
paste(d$id[order(d$r)],collapse = ",")
# [1] "A,D,G,B,E,C,F"
This version is probably more robust to ordering, and avoids plyr:
# sequence() numbers the rows within each run of category; unlike sapply(), it never simplifies to a matrix
d$r <- sequence(rle(d$category)$lengths)
d$s <- 1:nrow(d)
with(d,paste(id[order(r,s)],collapse = ","))
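For comparison, a self-contained base-R sketch of the same idea using ave(), which is an alternative technique and not the method from the answer above. It assumes, as the question states, that the rows are already sorted by value within each category.

```r
d <- read.csv(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33")

# Rank of each row within its category (1 = top value, given the input order)
d$r <- ave(seq_along(d$category), d$category, FUN = seq_along)

# order() keeps the original row order among ties, so rank-1 rows come out
# as A, D, G, then the rank-2 rows, and so on
result <- paste(d$id[order(d$r)], collapse = ",")
result
# [1] "A,D,G,B,E,C,F"
```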
This works:
onion$yearone$id %in% mask$yearone
This doesn't:
onion[1][1] %in% mask[1]
onion[1]['id'] %in% mask[1]
Why? Short of an obvious way to vectorize over parallel columns in DF and memberids (so that I only get the rows within each year whose ids are present in both DF and memberids), I'm using a for loop, but I'm not having any luck finding the right way to express the index. Help?
Example data:
yearone <- data.frame(id=c("b","b","c","a","a"),v=rnorm(5))
onion <- list()
onion[[1]] <- yearone
names(onion) <- 'yearone'
mask <- list()
mask[[1]] <- c('a','c')
names(mask) <- 'yearone'
The '$' operator is not the same as the '[' operator. If 'yearone' and 'ids' are in fact the first items in those lists, you should see that this gives the same results as the first call:
DF[[1]][[1]] %in% memberids[[1]]
Why we should think that accessing yearpathall should give the same results is entirely unclear at this point, but using the "[[" operator will possibly give an atomic vector, whereas using "[" will certainly not. The "[" operator always returns a result of the same class as its first argument, so in this case it would be a list rather than a vector, for both 'DF' and 'memberids'. The %in% operator is just an infix version of match and needs an atomic vector as both of its arguments.
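A short sketch of the difference, using the onion and mask names from the question's example data; the v column is replaced by fixed values here so the result is reproducible.

```r
yearone <- data.frame(id = c("b", "b", "c", "a", "a"), v = 1:5)
onion <- list(yearone = yearone)
mask  <- list(yearone = c("a", "c"))

# '[' keeps the container: both of these are still lists
class(onion[1])
class(mask[1])

# '[[' extracts the element itself, so %in% gets atomic vectors
hits <- onion[[1]][["id"]] %in% mask[[1]]
hits
# [1] FALSE FALSE  TRUE  TRUE  TRUE

# Same rows as onion$yearone$id %in% mask$yearone
onion[[1]][hits, ]
```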
Here is an approach using Map
# some data
onion <- replicate(5, data.frame(id = sample(letters[1:3], 5, TRUE), v = 1:5),
simplify = FALSE)
mask <- replicate(5, sample(letters[1:3], 2), simplify = FALSE)
names(onion) <- names(mask) <- paste0('year', seq_along(onion))
A function that will do the matching
get_matches <- function(data, id, mask){
rows <- data[[id]] %in% mask
data[rows,]
}
Map(get_matches , data = onion, mask = mask, MoreArgs = list(id = 'id'))
This seems to be the answer I was seeking:
merge(mask[1],onion[[1]], by.x = names(mask[1]), by.y = names(onion[[1]][1]))
And applied to parallel lists of dataframes:
result <- list()
for (i in seq_along(onion)) {
result[[i]] <- merge(mask[i],onion[[i]], by.x = names(mask[i]), by.y = names(onion[[i]][1]))
}
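One thing worth knowing about the merge() route, shown as a sketch on the question's example data with fixed v values: the one-element list mask[1] is coerced to a data frame, so the key column in the result carries the name of the mask element rather than "id".

```r
yearone <- data.frame(id = c("b", "b", "c", "a", "a"), v = 1:5)
onion <- list(yearone = yearone)
mask  <- list(yearone = c("a", "c"))

# mask[1] is a one-element list; merge() coerces it to a data frame
# with a single column called "yearone"
res <- merge(mask[1], onion[[1]], by.x = names(mask[1]), by.y = names(onion[[1]][1]))

names(res)   # the key column takes its name from the mask side
nrow(res)    # only rows whose id appears in the mask are kept
```

If the original column name and row order matter, subsetting with %in% as in the Map answer may be the safer choice.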