I have the following code which takes keywords from the user so that I can search a database of word_vectors:
theme_terms = lemmatize_words(scan(strip.white = TRUE, sep = '\n', what = character()))
I want to use the words provided in the following assignment, but the user can enter as many words as they like, and my current assignment is unfortunately static.
result = word_vectors[theme_terms[1], , drop = F] + word_vectors[theme_terms[2], , drop = F]
I want to recreate the above statement in a for loop that will take as many words as the user inputs.
I have tried the following:
a) This doesn't work because it fails with "non-numeric argument to binary operator":
for(i in 1:length(theme_terms)){
temp = word_vectors[theme_terms[i], , drop = F]
result = result + temp
}
b) This doesn't work because the result is a string, and I would also have to remove the quotation marks and the extra + at the end:
for(i in 1:length(theme_terms)){
temp = paste0(paste0("word_vectors[", theme_terms[i],", , drop = F] + "))
result = paste(result, temp)
}
Any suggestions? Thank you.
Edit:
word_vectors is a matrix of values which can be reproduced as follows:
terms = c("cancer", "blood", "machine")
v1 = c(0.002, 0.313, 0.1313)
v2 = c(0.23, 0.14, 0.155)
v3 = c(0.141, 0.41, 0.125)
word_vectors = as.matrix(data.frame(terms, v1, v2, v3))
Use sapply:
result <- sapply(theme_terms, function(x) word_vectors[x, , drop = FALSE])
With a for loop, we can initialize a list to store the output:
out <- vector('list', length(theme_terms))
for(i in seq_along(theme_terms)) {
  out[[i]] <- word_vectors[theme_terms[i], , drop = FALSE]
}
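If the goal is the sum of the selected vectors, as in the static assignment from the question, the rows can also be added up directly. The following is only a sketch: it assumes word_vectors is a numeric matrix whose row names are the terms (the reproduction in the question builds a character matrix without row names, which cannot be added), and the theme_terms value is a hypothetical stand-in for the scan() input.

```r
# Assumed layout: numeric matrix with one row per term
terms <- c("cancer", "blood", "machine")
word_vectors <- cbind(
  v1 = c(0.002, 0.313, 0.1313),
  v2 = c(0.23, 0.14, 0.155),
  v3 = c(0.141, 0.41, 0.125)
)
rownames(word_vectors) <- terms

# Hypothetical user input; in the question this comes from scan()
theme_terms <- c("cancer", "blood")

# Sum the selected rows in one step; drop = FALSE keeps a matrix
# even when only one term is supplied
result <- colSums(word_vectors[theme_terms, , drop = FALSE])

# Equivalent loop form, starting from a zero vector so that the
# first addition has something numeric to add to
result_loop <- numeric(ncol(word_vectors))
for (term in theme_terms) {
  result_loop <- result_loop + word_vectors[term, ]
}
```

Note that colSums returns a plain named vector rather than the one-row matrix produced by the original assignment.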
I want to perform a set of operations (in R) on a number of data frames located within a list. In particular, for each one I create a "library" column, which is then used to determine which kind of filtering operation to perform. This is the actual code:
sampleList <- list(RNA1 = "data/not_processed/dedup.Bp1R4T2_S2.txt",
RNA2 = "data/not_processed/dedup.Bp1R4T3_S4.txt",
RNA3 = "data/not_processed/dedup.Bp1R5T2_S1.txt",
RNA4 = "data/not_processed/dedup.Bp1R5T3_S2.txt",
RNA5 = "data/not_processed/dedup.Bp1R14T5_S1.txt",
RNA6 = "data/not_processed/dedup.Bp1R14T6_S1.txt",
RNA7 = "data/not_processed/dedup.Bp1R14T6_S2.txt",
RNA8 = "data/not_processed/dedup.Bp1R14T7_S2.txt",
RNA9 = "data/not_processed/dedup.Bp1R14T8_S3.txt",
RNA10 = "data/not_processed/dedup.Bp1R14T9_S3.txt",
RNA11 = "data/not_processed/dedup.Bp1R14T9_S4.txt",
DNA1 = "data/not_processed/dedup.dna10_1_S4.txt",
DNA2 = "data/not_processed/dedup.dna10_2_S5.txt",
DNA3 = "data/not_processed/dedup.dna10_3_S6.txt",
DNA4 = "data/not_processed/dedup.dna50_1_S1.txt",
DNA5 = "data/not_processed/dedup.dna50_2_S2.txt",
DNA6 = "data/not_processed/dedup.dna50_3_S3.txt",
DNA7 = "data/not_processed/dedup.dna50_pcrcocktail_S7.txt")
batch <- lapply(names(sampleList),function(mysample){
aux <- read.table(sampleList[[mysample]], col.names=c(column1, column2, ..., ID, library, column4, etc...))
aux %>% mutate(library = mysample, R = Fw_ref + Rv_ref, A = Fw_alt + Rv_alt) %>% distinct(ID, .keep_all=T)
if (grepl("DNA", aux$library)){
aux %>% filter(aux$R>1 & aux$A>1)
} else {
aux %>% filter((aux$R+aux$A)>7 & aux$Fw_ref>=1 & aux$Rv_ref>=1 & aux$Fw_alt>=1 & aux$Rv_alt>=1)
}
aux
})
batch_file <- do.call(rbind, batch)
write.table(batch_file, "data/batch_file.txt", col.names = T, sep = "\t")
The possible values of the library column are DNA1 to DNA7 and RNA1 to RNA11. I also tried "char" %in%, but it gives the same problem:
Error in if (grepl("DNA", aux$library)) { : argument is of length zero
It seems the if condition is not able to identify the value in library. However, when I applied the if/else condition to batch_file (not filtered, basically obtained with this code without the if/else part), it worked perfectly.
Many thanks in advance.
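One possible cause, assuming the library column is only created by the mutate() call (the col.names list in the question is abbreviated, so this cannot be confirmed): the result of the pipe is never assigned back to aux, so aux$library is NULL when the if condition runs, and grepl() returns a zero-length result. The sketch below illustrates the idea with base R (transform() and logical indexing are swapped in for mutate(), distinct() and filter() to keep it self-contained); the data frame is hypothetical.

```r
# Hypothetical stand-in for one file read by read.table()
aux <- data.frame(
  ID     = c(1, 1, 2, 3),
  Fw_ref = c(2, 2, 0, 5),
  Rv_ref = c(1, 1, 1, 4),
  Fw_alt = c(3, 3, 1, 0),
  Rv_alt = c(2, 2, 1, 2)
)
mysample <- "DNA1"

# transform() (like mutate()) returns a modified copy; the result here is
# printed and then discarded, so aux itself is unchanged
transform(aux, library = mysample)
n_before <- length(grepl("DNA", aux$library))  # zero-length condition

# Assign each step back so that later lines see the new columns
aux <- transform(aux, library = mysample,
                 R = Fw_ref + Rv_ref, A = Fw_alt + Rv_alt)
aux <- aux[!duplicated(aux$ID), ]

# Test the scalar sample name inside if(), not the whole column
if (grepl("DNA", mysample)) {
  aux <- aux[aux$R > 1 & aux$A > 1, ]
} else {
  aux <- aux[(aux$R + aux$A) > 7 & aux$Fw_ref >= 1 & aux$Rv_ref >= 1 &
               aux$Fw_alt >= 1 & aux$Rv_alt >= 1, ]
}
```

Testing mysample rather than aux$library also avoids passing a whole column to if(), which expects a single TRUE or FALSE.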
I am new to R and have a problem I don't know how to solve. Maybe you can help?
I have a semicolon-separated name/value input string: param1=test;param2=3;param3=140;
I would like to access a value via its name in R.
Something like using
myParams["param1"]
I already tried something like:
input = "param1=test;param2=3;param3=140;"
output1 = strsplit(input,";")[[1]]
output2 = do.call(rbind, strsplit(output1, "="))
to get a matrix, but I am missing the rest.
You could define a custom function myParams:
# Your sample data
input = "param1=test;param2=3;param3=140;"
output1 = strsplit(input,";")[[1]]
output2 = do.call(rbind, strsplit(output1, "="))
# Define function
myParams <- function(par, df = output2) {
  return(df[which(df[, 1] == par), 2])
}
myParams("param1")
#[1] "test"
myParams("param2")
#[1] "3"
A simple way would be to create a data frame out of that matrix first and then access the value via row names:
input = "param1=test;param2=3;param3=140;"
output1 = strsplit(input,";")[[1]]
output2 = do.call(rbind, strsplit(output1, "="))
# Use the first column as row names and keep the values as character
temp = data.frame(output2, row.names = 1, stringsAsFactors = FALSE)
#         X2
#param1 test
#param2    3
#param3  140
# Row names select rows, so the name goes before the comma
temp["param1", "X2"]
#[1] "test"
temp["param2", "X2"]
#[1] "3"
temp["param3", "X2"]
#[1] "140"
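Another option, closer to the myParams["param1"] syntax asked for in the question, is a named character vector. This is a minimal sketch using only base R; the variable name pairs is introduced here for illustration.

```r
input <- "param1=test;param2=3;param3=140;"

# Split into "name=value" pieces, then split each piece at "="
pairs <- strsplit(strsplit(input, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Values become the vector elements, names become the element names
myParams <- setNames(
  vapply(pairs, `[`, character(1), 2),
  vapply(pairs, `[`, character(1), 1)
)

myParams["param1"]                 # named element
myParams[["param2"]]               # bare value, still a string
as.numeric(myParams[["param3"]])   # convert when a number is needed
```

All values stay character strings, so numeric parameters need an explicit as.numeric() at the point of use.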
I'm working in R & using a for loop to iterate through my data:
pos = c(1256:1301,6542:6598)
sd_all = NULL
for (i in pos){
nameA = paste("A", i, sep = "_")
nameC = paste("C", i, sep = "_")
resA = assign(nameA, unlist(lapply(files, function(x) x$percentageA[x$position==i])))
resC = assign(nameC, unlist(lapply(files, function(x) x$percentageC[x$position==i])))
sd_A = sd(resA)
sd_C = sd(resC)
sd_all = ?
}
Now I want to generate a vector called 'sd_all' that contains the standard deviations of resA and resC. I cannot just do 'sd_all = c(sd(resA), sd(resC))', because then I only keep the values for one element of 'pos'. I want to do it for all values in 'pos', of course.
It looks like you'd be best served with sd_all as a list object. That way you can insert each of your 2 values ( sd(resA) and sd(resC) ).
Initialising a list is simple (this would replace the second line of your code):
sd_all <- list()
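Then you can insert both values into a single list element like so (this would replace the last line in your for loop). Indexing by the position's name rather than by i itself avoids creating over a thousand empty elements, since i starts at 1256:
sd_all[[ as.character(i) ]] <- c( sd( resA ), sd( resC ) )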
After your loop, you can then insert this list as a column in a data.frame if that's what you'd like to do.
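As a rough sketch of the whole idea, the loop can also be written with lapply so that each position yields one small named vector, and the pieces are bound into a data frame at the end. The files list below is hypothetical (the question does not show it); it is assumed to be a list of data frames with position, percentageA and percentageC columns.

```r
# Hypothetical input: three data frames covering the same positions
files <- list(
  data.frame(position = 1256:1258, percentageA = c(10, 20, 30), percentageC = c(1, 2, 3)),
  data.frame(position = 1256:1258, percentageA = c(12, 24, 36), percentageC = c(3, 2, 1)),
  data.frame(position = 1256:1258, percentageA = c(14, 22, 33), percentageC = c(2, 2, 2))
)
pos <- 1256:1258

# One named vector per position: the position and both standard deviations
sd_all <- lapply(pos, function(i) {
  resA <- unlist(lapply(files, function(x) x$percentageA[x$position == i]))
  resC <- unlist(lapply(files, function(x) x$percentageC[x$position == i]))
  c(position = i, sd_A = sd(resA), sd_C = sd(resC))
})

# Stack the per-position vectors into one row per position
sd_table <- as.data.frame(do.call(rbind, sd_all))
```

This drops the assign() calls from the question, which are only needed if the intermediate A_<i> and C_<i> variables are used elsewhere.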
I am running into a strange problem in R.
In the attached script, I have two data frames containing the same data, just in reversed order (data_asc, data_desc). I then apply the same function (fnDoIt) with the same parameters to both data frames to create a new column in each (cost).
The function splits the strings by "|", then creates a data frame and returns its "cost" element. parameter_desc contains the parameter names separated by "|", while parameter_value contains the parameter values separated by "|" in the same order.
However, when I run the script, it returns different values depending on the order of my data frame. It seems to return the result for the first set of parameters.
What I expect to see is:
A-----price|cost--------------10|7----------7
B-----price|cost|tax_rate-----12|6|0.10-----6
But what I get (depending on the order of the data frame) is either:
A-----price|cost--------------10|7----------7
B-----price|cost|tax_rate-----12|6|0.10-----7
or
B-----price|cost|tax_rate-----12|6|0.10-----6
A-----price|cost--------------10|7----------6
I am not sure how to get around this. I would really appreciate any help or insight.
Thanks
stringsAsFactors=FALSE
fnDoIt = function(model
, parameter_desc
, parameter_value) {
#process parameters
#split string, then unlist
parameter_desc = unlist(strsplit(parameter_desc
, split = '|'
, fixed = TRUE))
#split string, then unlist, then convert to number
parameter_value = as.numeric(unlist(strsplit(parameter_value
, split = '|'
, fixed = TRUE)))
#build dataframe for parameters
parameter = as.data.frame(t(parameter_value)) #transpose vector to horizontal
names(parameter) = parameter_desc #rename columns
fnDoIt = parameter$cost
}
data = data.frame(model = c('A','B')
, parameter_desc = c('price|cost','price|cost|tax_rate')
, parameter_value = c('10|7','12|6|0.10'))
data_asc = data
data_desc = data[order(data$model, decreasing = TRUE),]
data_asc$cost = fnDoIt(data_asc$model
, data_asc$parameter_desc
, data_asc$parameter_value)
data_desc$cost = fnDoIt(data_desc$model
, data_desc$parameter_desc
, data_desc$parameter_value)
UPDATED:
options(stringsAsFactors = FALSE)
fnDoIt = function(model
, production
, parameter_desc
, parameter_value) {
#process parameters
#split string, then unlist
parameter_desc = unlist(strsplit(parameter_desc
, split = '|'
, fixed = TRUE))
#split string, then unlist, then convert to number
parameter_value = as.numeric(unlist(strsplit(parameter_value
, split = '|'
, fixed = TRUE)))
if (model == 'A') {
temp = parameter_value[parameter_desc == 'cost']
} else if (model == 'B') {
temp = parameter_value[parameter_desc == 'tax_rate']
}
fnDoIt = temp * production
}
data = data.frame(model = c('A','B','B')
, production = c(100,185,210)
, parameter_desc = c('price|cost','price|cost|tax_rate','price|cost|tax_rate')
, parameter_value = c('10|7','14|9|0.20','12|6|0.10'))
data$cost = ifelse(data$model == 'A'
, fnDoIt('A'
, data$production
, data$parameter_desc
, data$parameter_value)
, fnDoIt('B'
, data$production
, data$parameter_desc
, data$parameter_value))
I received the error:
In temp * production: longer object length is not a multiple of shorter object length
I think this is what you are looking for:
fnGetCost <- function(df){
  apply(df, 1,
        function(r){
          parms <- unlist(strsplit(r[2], split="\\|"))
          costIX <- which(parms == "cost")
          as.numeric(unlist(strsplit(r[3], split="\\|"))[costIX])
        })
}
data_asc$cost = fnGetCost(data_asc)
data_desc$cost = fnGetCost(data_desc)
Your original solution is considering all the rows at once. Check the output of
unlist(strsplit(as.character(data_asc$parameter_desc)
, split = '|'
, fixed = TRUE))
So in the end you have multiple columns named cost, but you return only one of them. If you really want to use your function, replace its last three lines with
parameter_value[parameter_desc == "cost"]
Additionally, note that the original function throws an error because the columns of data are coerced to factor.
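The updated function in the question runs into the same issue: it receives whole columns at once, so temp and production end up with different lengths. One way around this, sketched here under the assumption that each row should be processed independently, is a function that handles a single row and is applied row by row with mapply. The name fnDoItRow is hypothetical.

```r
data <- data.frame(
  model = c("A", "B", "B"),
  production = c(100, 185, 210),
  parameter_desc = c("price|cost", "price|cost|tax_rate", "price|cost|tax_rate"),
  parameter_value = c("10|7", "14|9|0.20", "12|6|0.10"),
  stringsAsFactors = FALSE
)

# Works on ONE row: every argument is a single value
fnDoItRow <- function(model, production, parameter_desc, parameter_value) {
  desc  <- strsplit(parameter_desc, split = "|", fixed = TRUE)[[1]]
  value <- as.numeric(strsplit(parameter_value, split = "|", fixed = TRUE)[[1]])
  # Model A uses the cost parameter, every other model uses tax_rate
  wanted <- if (model == "A") "cost" else "tax_rate"
  value[desc == wanted] * production
}

# mapply walks the four columns in parallel, one row per call
data$cost <- mapply(fnDoItRow, data$model, data$production,
                    data$parameter_desc, data$parameter_value,
                    USE.NAMES = FALSE)
```

With the sample data this gives 7 * 100 for the first row and 0.20 * 185 and 0.10 * 210 for the two model B rows, and the ifelse() wrapper from the question is no longer needed.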
I need help determining how I can use the input to the function below as an input for another R file.
Hotel <- function(hotel) {
require(data.table)
dat <- read.csv("demo.csv", header = TRUE)
dat$Date <- as.Date(paste0(format(strptime(as.character(dat$Date),
"%m/%d/%y"),
"%Y/%m"),"/1"))
library(data.table)
table <- setDT(dat)[, list(Revenue = sum(Revenues),
Hours = sum(Hours),
Index = mean(Index)),
by = list(Hotel, Date)]
answer <- na.omit(table[table$Hotel == hotel, ])
if (nrow(answer) == 0) {
stop("invalid hotel")
}
return(answer)
}
I would input Hotel("Hotel Name")
Here's the other R file using the Hotel name I inputted above.
#Reads the dataframe from the Hotel Function
star <- (Hotel("Hotel Name"))
#Calculates the Revpolu and Index
Revpolu <- star$Revenue / star$Hours
Index <- star$Index
png(filename = "~/Desktop/result.png", width = 480, height= 480)
plot(Index, Revpolu, main = "Hotel Name", col = "green", pch = 20)
testing <- cor.test(Index, Revpolu)
write.table(testing[["p.value"]], file = "output.csv", sep = ";", row.names = FALSE, col.names = FALSE)
dev.off()
I would like this part to be automated, instead of having to copy and paste an input from the first file and then store it as a variable. Or, if it's easier, all of this could become one function.
Also, instead of having to input one hotel name for the function, is it possible to make the first file read all the hotel names (if they are identified as row names in the .csv file) and pass them to the second file?
Since your example is not reproducible and your code has some bugs (using the column "Rooms" which is not produced by your function), I can't give you a tested answer, but here's how you can structure your code to produce the statistics you want for all hotels without having to copy and paste hotel names:
library(data.table)
# Use fread instead of read.csv, it's faster
dat <- fread("demo.csv", header = TRUE)
dat[, Date := as.Date(paste0(format(strptime(as.character(Date), "%m/%d/%y"), "%Y/%m"),"/1"))]
table <- dat[, list(
Revenue = sum(Revenues),
Hours = sum(Hours),
Index = mean(Index)
), by = list(Hotel, Date)]
# cor.test already drops incomplete pairs, so na.omit may be unnecessary,
# but I kept it here to keep the result similar.
table <- na.omit(table)
# Calculate Revpolu inside the data.table
table[, Revpolu := Revenue / Hours]
# You can compute a p-value for all hotels using a group by
testing <- table[, list(p.value = cor.test(Index, Revpolu)[["p.value"]]), by=Hotel]
write.table(testing, file = "output.csv", sep = ";", row.names = FALSE, col.names = FALSE)
# You can get individual plots for each hotel with a for loop
hotels <- unique(table$Hotel)
for (h in hotels) {
  # Use one file per hotel so that each plot is not overwritten by the next
  png(filename = paste0("~/Desktop/result_", h, ".png"), width = 480, height = 480)
plot(table[Hotel == h, Index], table[Hotel == h, Revpolu], main = h, col = "green", pch = 20)
dev.off()
}
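For checking the grouped p-values without the real demo.csv, the same per-hotel computation can be sketched in base R with split() and sapply() in place of the data.table group-by. Everything about the data below is hypothetical: the hotel names, the value ranges and the hotel_data name are made up for illustration.

```r
set.seed(1)

# Hypothetical aggregated data: twelve monthly rows for each of two hotels
hotel_data <- data.frame(
  Hotel   = rep(c("Hotel A", "Hotel B"), each = 12),
  Index   = runif(24, 80, 120),
  Revenue = runif(24, 1000, 5000),
  Hours   = runif(24, 100, 400)
)
hotel_data$Revpolu <- hotel_data$Revenue / hotel_data$Hours

# One correlation test per hotel, keeping only the p-value
p_values <- sapply(split(hotel_data, hotel_data$Hotel), function(d) {
  cor.test(d$Index, d$Revpolu)$p.value
})

# Same shape as the data.table result: one row per hotel
testing <- data.frame(Hotel = names(p_values), p.value = unname(p_values))
```

The resulting testing data frame can be passed to write.table in the same way as in the code above.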