This is what I'm trying to do:
I have a large excel sheet I'm importing to R.
The data needs to be cleaned so one of the procedures is to test for character length.
Once the program finds a string that is too long, it needs to prompt the operator for a replacement
The operator inputs an alternative, and the program replaces the original with the input text.
The code I have seems to work procedurally, but the variable I have is not overwriting the original value.
library(tidyr)
library(dplyr)
library(janitor)
library(readxl)
fileToOpen <-read_excel(file.choose(),sheet="Data")
MasterFile <- fileToOpen
#This line checks the remaining bad strings in the column
CPNErrors <- nrow(filter(MasterFile,nchar(Field_to_Check) > 26))
#This line selects the bad field from the first in the list of strings to exceed the limit
TEST <- select(filter(MasterFile,nchar(Field_to_Check) > 26),Field_to_Check)[1,]
#This is the loop -- prompts the operator for a replacement, assigns a variable to the input and then replaces the bad value in the data frame
while (CPNErrors >= 1) {
  message("Replace ", TEST, " with what?")
  var <- readline()
  MasterFile$Field_to_Check[MasterFile$Field_to_Check == TEST] <- var
  print(var)
}
The prompt works and assigns the readline() to the var, but the code will not replace the original string as a variable. When I run the code separately outside the loop, it will replace as long as I input an exact string (no variable assignment), so there's some syntactical thing I'm missing.
I've been searching for hours, and am just starting out in R, so if anyone can offer any assistance I'd greatly appreciate it.
EDIT -- ok... I think I found the source of the problem, but I don't know how to fix it. When I run
MasterFile$Field_to_Check[MasterFile$Field_to_Check == TEST]
It comes with a null result, but if I run
MasterFile$Field_to_Check[MasterFile$Field_to_Check == "Some Text that's in the data frame"]
It comes out with a result. Any idea on why I can't filter this list by the variable? The TEST variable comes out as expected.
Try this approach with a for loop:
CPNErrors <- which(nchar(MasterFile$Field_to_Check) > 26)
for(i in CPNErrors){
var=readline(paste0("Replace ",MasterFile$Field_to_Check[i]," with what? "))
MasterFile$Field_to_Check[i] <- var
}
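A likely reason the original comparison found nothing is that select() returns a data frame (a tibble here) even when it holds a single cell, so TEST is not a plain character string. The sketch below uses a made-up MasterFile in base R to show the difference; the data and the names TEST_df and TEST_chr are assumptions for illustration only. With dplyr, pull() is the usual way to get a column as a vector.

```r
# Hypothetical stand-in for the imported sheet
MasterFile <- data.frame(
  Field_to_Check = c("short", "this string is definitely longer than 26 chars", "ok"),
  stringsAsFactors = FALSE
)
too_long <- nchar(MasterFile$Field_to_Check) > 26

# Keeping the data frame structure, as select() does: a 1 x 1 data frame
TEST_df <- MasterFile[too_long, "Field_to_Check", drop = FALSE][1, , drop = FALSE]

# Extracting the column as a plain character vector instead
TEST_chr <- MasterFile$Field_to_Check[too_long][1]

is.data.frame(TEST_df)   # TRUE: not a bare string
is.character(TEST_chr)   # TRUE: safe to compare with ==

# The replacement works once the comparison value is a character string
MasterFile$Field_to_Check[MasterFile$Field_to_Check == TEST_chr] <- "replacement"
```

Indexing by position, as the for loop above does, avoids the comparison entirely.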
I'm trying to use a for loop to write a function that extracts one column from two CSV files and computes the mean of those data. I'm wondering why the result vector has to be initialized as an empty vector, c(), before the loop for the function to work. I printed the intermediate output inside the for loop to try to figure out the reason. When I left out means<-c(), I got an unexpected 1,2,3,4,5,6 in "means" on each pass through the loop. Could anyone kindly explain this?
Thank you very much.
pollutantmean <- function(directory, pollutant, id = 1:332) {
  means <- c()
  for (monitor in id) {
    path <- paste(getwd(), "/", directory, "/", sprintf("%03d", monitor), ".csv", sep = "")
    monitor_data <- read.csv(path)
    interested_data <- monitor_data[pollutant]
    means <- c(means, interested_data[!is.na(interested_data)])
    print(monitor_data)
    print(means)
    print(interested_data)
  }
  mean(means)
}
pollutantmean("specdata", "sulfate", 1:2)
correct output with assigning means<-c()
wrong output without means<-c() before "for loop"
I think I found the essence of the question, which is that I don't understand the logic behind:
> x=c(x,1)
> x
[1] 9 1
> x=c()
> x=c(x,1)
> x
[1] 1
This can stand in for one pass of the for loop, such as the first iteration. I'm confused about how x gets its value in each of the two runs above: why does the first one give 9 1 while the second gives the expected result? Can anyone explain what happens behind those two runs? I'd be really grateful for an answer.
In the case that doesn't work, you still have the line
means <- c(means, interested_data[!is.na(interested_data)])
that refers to means on the right hand side. If you didn't set means <- c(), then you'll get different results depending on whatever happened to be stored in that variable before executing this code. If there's no such variable, you'll get an error.
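A minimal sketch of that effect, assuming x was left holding 9 from earlier work in the session (the 9 is taken from the console output in the question; nothing else about the session is known):

```r
# x still holds a value from earlier in the session
x <- 9
x <- c(x, 1)   # appends to the stale value
first_run <- x

# Resetting x to an empty vector first gives a clean start
x <- c()
x <- c(x, 1)   # c(NULL, 1) is just 1
second_run <- x

print(first_run)
print(second_run)
```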
By the way, this isn't a great way to write R code, even with the means <- c() line. The problem is that every time you execute the line above you need to make a slightly longer vector to hold means. In your case you don't know how many values you'll be adding, so it's excusable, but in the more common case where you will always be adding one more entry, it's a lot more efficient to set up the result vector to the right length in advance and assign values using something like
means[i] <- newValue
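As a rough illustration of preallocating, the sketch below fills a result vector whose length is known in advance. The loop body (the mean of 1..i) is a made-up placeholder for whatever value each iteration would really produce.

```r
n <- 5

# Allocate the full result vector once, instead of growing it with c()
means <- numeric(n)

for (i in seq_len(n)) {
  # Placeholder computation; each iteration writes into its own slot
  means[i] <- mean(seq_len(i))
}

print(means)
```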
I'm having issues with a specific problem. I have a dataset of a ton of matrices that all have V1 as their column names, essentially NULL. I'm trying to write a loop to replace all of these with column names from a list, but I'm running into some issues.
To break this down to the most simple form, this code isn't functioning as I'd expect it to.
nameofmatrix <- paste('column_', i, sep = "")
colnames(eval(as.name(nameofmatrix))) <- c("test")
I would expect this to take the value of column_1 for example, and replace (in the 2nd line) with "test" as the column name.
I tried to break this down further. For example, if I run print(eval(as.name(nameofmatrix))) I get the object's columns/rows printed as expected, and if I run print(colnames(eval(as.name(nameofmatrix)))) I get NULL as expected for the column header (since it was set as V1).
I've even tried typing in the object name manually, as in colnames(column_1) <- c("test"), and this successfully renames the column. But once the variable is put in place of the literal name, as shown above, it does not work the same way. I'm having difficulty finding a way to rename several matrix columns after they have been created with this method. Does anyone have any advice or suggestions?
Note, the error I'm receiving on trying to run this is
Error in eval(as.name(nameofmatrix)) <- `*vtmp*` : could not find function "eval<-"
We could return the values of the objects in a list with get (if there are multiple objects, use mget), then rename the columns of the objects in the list and update those objects in the global env with list2env
list2env(lapply(mget(nameofmatrix), function(x) {colnames(x) <- newnames
x}), .GlobalEnv)
It can also be done with assign
data(mtcars)
nameofobject <- 'mtcars'
assign(nameofobject, `colnames<-`(get(nameofobject),
c('mpg1', names(mtcars)[-1])))
Now, check the names of 'mtcars'
names(mtcars)[1]
#[1] "mpg1"
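For the several-matrices case, the mget/list2env idea might look like the sketch below. The matrices column_1 and column_2, their contents, and the replacement names in new_names are all made up for illustration; environment() is used so the objects are written back to wherever the code runs (the global environment at top level).

```r
# Two hypothetical single-column matrices without column names
column_1 <- matrix(1:3, ncol = 1)
column_2 <- matrix(4:6, ncol = 1)

matrix_names <- paste0("column_", 1:2)
new_names <- c("first", "second")   # hypothetical replacement names

env <- environment()

# Fetch the objects by name, rename each one's column, keep the list names
renamed <- Map(
  function(m, nm) {
    colnames(m) <- nm
    m
  },
  mget(matrix_names, envir = env),
  new_names
)

# Write the renamed matrices back over the originals
list2env(renamed, envir = env)

print(colnames(column_1))
print(colnames(column_2))
```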
Given some sample data for reference:
sn,fail_type,dt
V12001,broken ego,2018-12-07 15:58:33
V12002,batt overheat,2018-10-11 22:33:51
V12003,batt overheat,2018-10-26 15:02:51
V12004,broken ego,2018-09-28 15:44:46
V12005,cognitive meltdown,2018-12-31 02:30:04
V12006,won't turn on,2018-12-14 02:05:41
V12007,won't turn on,2018-12-02 21:14:29
V12008,bad system board,2018-11-02 16:30:57
V12009,petulant child operator,2018-09-06 14:53:25
V12010,leaky pump,2018-11-05 14:41:48
V12011,leaky pump,2018-11-04 18:05:11
V12012,petulant child operator,2018-11-23 16:34:54
V12013,cognitive meltdown,2018-09-11 18:07:50
V12014,cognitive meltdown,2018-10-26 22:55:32
V12015,leaky pump,2018-09-19 14:05:29
V12016,no alarm,2018-11-05 23:44:08
V12017,petulant child operator,2018-12-18 14:02:34
V12018,leaky pump,2018-10-08 04:13:41
V12019,bad system board,2018-09-03 02:28:16
V12020,leaky pump,2018-11-10 07:10:50
I create a data.table called ts_vars from the above.
I then want to isolate the unique list of fail_types and get time-series event data based for each unique fail_type.
# get unique list
ft_list <- unique(ts_vars$fail_type)
# clean up unnecessary punctuation
ft_list <- gsub("[[:punct:]]", " ", ft_list)
The next thing I wish to do is create a list of expressions that can be executed row-by-row, with the assignments stored in memory, as I will use them for plotting (yep, a lot of them). I know, I know, I'm using a for loop, and apply/plyr'ish methods are better, but I'm putting this out there as a quick/dirty MWE.
cmdvec <- function() {
  for (i in (1:length(ft_list))) {
    # name a variable, ts_var, with a numeric suffix
    nam <- paste("ts_var",i, sep="")
    # stitch together an assignment statement which will store a vector of
    # events by fail_type, allowing a separate plot for each
    sub <- paste("subset(ts_vars,ts_vars$fail_type==",ft_list[i], sep="'", ")")
    ts_cmd[i] <- paste(nam,sub,sep=" <- ")
    # parse each statement to be evaluated and store in a vector for execution
    ts_cmd2 <- as.vector(eval(parse(text = ts_cmd[i]), envir = new.env()))
    ts_cmd2
    # print(ts_cmd2)
  }
}
cmdvec()
As-is, nothing really happens. I see no results from execution nor new stored vectors (ts_var1 through ts_var187). If I substitute ts_cmd for print(ts_cmd), the statements are evaluated and I get results in the console, but none of the assignments are stored.
I've tried eval'ing the last statement, calling it (but I can't fathom what parameters I would add), converting the character list to expressions - but I'm missing some critical points here, and I think I've hit all docs in base R on this and picked some ideas from other tangential SO questions. I'm stumped now. In sum, I just can't seem to pass a simple list of statements as bona-fide commands to be processed one-by-one AND have the assignment variable be stored for downstream use (plots, other independent analyses, etc).
Any thoughts?
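One thing that may explain the missing variables: an assignment evaluated with envir = new.env() is stored in that throwaway environment, not in the workspace. The sketch below shows this, and then one possible alternative that avoids eval/parse by keeping the subsets in a named list via split(). It uses a plain data.frame with four made-up rows in the style of the sample data (not the poster's data.table), and the names scratch and ts_by_type are hypothetical.

```r
# Made-up rows in the style of the sample data above
ts_vars <- data.frame(
  sn = c("V12001", "V12002", "V12003", "V12004"),
  fail_type = c("broken ego", "batt overheat", "batt overheat", "broken ego"),
  stringsAsFactors = FALSE
)

# An assignment evaluated in a fresh environment stays in that environment
scratch <- new.env()
eval(parse(text = "ts_var1 <- 1"), envir = scratch)
in_scratch <- exists("ts_var1", envir = scratch, inherits = FALSE)
in_current <- exists("ts_var1", envir = environment(), inherits = FALSE)

# Alternative: one list element per fail_type, named by the fail_type value
ts_by_type <- split(ts_vars, ts_vars$fail_type)

print(names(ts_by_type))
print(ts_by_type[["batt overheat"]])
```

Each element of ts_by_type can then be passed to a plotting function, for example with lapply(), without creating numbered variables.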
I am brand new to R, so please excuse anything that may seem overly obvious.
I am using apriori to evaluate frequent item sets. When I execute the code below and my subset call returns items, everything works great. The problem is when there is nothing returned on the subset (the criteria returns no subset). When it does this, I am receiving "object 'rulesMatchLHS' not found" when trying to construct a data frame for output. Can you please tell me what I am doing wrong when checking the validity of rulesMatchLHS on the ifelse line?
rules <- apriori(trnew, parameter=list(supp=0.01, conf=0.5, minlen=2, maxlen=2))
rulesMatchLHS <- subset(rules, lhs %ain% dataset1)
ifelse(exists(rulesMatchLHS),
OutputClient <- data.frame(lhs=labels(lhs(rulesMatchLHS))$elements, rhs=labels(rhs(rulesMatchLHS))$elements,rulesMatchLHS@quality),
OutputClient <- data.frame())
View(OutputClient)
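subset() returns an empty object when nothing matches, so rulesMatchLHS does exist. Also, exists() requires its argument to be a character string (the object's name in quotes). You might want to replace the exists() check with a test on the number of rows, such as nrow(), in your ifelse. Here is a simple example to demonstrate with a data frame: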
test <- subset(iris, Species == "Fake")
typeof(test)
exists("test")
nrow(test) == 0
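Building on that example, a scalar condition like this is usually written with if/else rather than ifelse(), which is meant for vectors. The sketch below uses the built-in iris data; the name output is made up. For an arules rules object the analogous check would presumably be length(rulesMatchLHS) > 0, which is an assumption and is not exercised here.

```r
# A subset with no matching rows still exists as an object
test <- subset(iris, Species == "Fake")

# Branch on whether anything was returned, not on whether the object exists
output <- if (nrow(test) > 0) {
  test
} else {
  data.frame()
}

print(nrow(test))
print(nrow(output))
```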
DF <- data.frame(CpGId, tframe$t, tframe$p, q)
dimnames(DF)[[2]] <- c("CpGId", "t_value", "p_value", "q_value")
DFhyper <- DF[with(DF, q_value < 0.05 & t_value> 0), ]
DFhyper <- data.frame(DFhyper, row.names = NULL)
DFhyper <- DFhyper [order(p_value), ]
The first four lines work fine, but why does R then give an error stating that object p_value was not found?
R executes the bracketed expression first, without paying any attention to how it is going to be used. When you type
DFhyper[order(p_value),]
R will look for p_value in the current scope (probably the global scope), however, as this is bound into the dataframe, it will not be able to find it. You need to do something to tell it where this is located.
Either
DFhyper[order(DFhyper$p_value),]
or
DFhyper[with(DFhyper,order(p_value)),]
(or, nearly equivalently, with(DFhyper,DFhyper[order(p_value),])) will work. The first command tells R explicitly that you are referencing the column in the data frame, and the second tells R to look for the variable in the data frame first, before the enclosing scope.
Finally, you can just bind the dataframe into the scope as well, executing
attach(DFhyper)
DFhyper[order(p_value),]
The attach command adds the dataframe columns to the current scope. It can be useful for when you have many operations on the dataframe columns, but don't want to keep referencing it. You can then detach it with detach(DFhyper) when you are done.
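A small check that the $ and with() forms give the same ordering, using a made-up three-row DFhyper (the CpGId labels and p-values are invented for illustration):

```r
# Hypothetical data frame with the same column names as in the question
DFhyper <- data.frame(
  CpGId = c("cg1", "cg2", "cg3"),
  p_value = c(0.03, 0.001, 0.02),
  stringsAsFactors = FALSE
)

# Reference the column explicitly
by_dollar <- DFhyper[order(DFhyper$p_value), ]

# Let with() resolve p_value inside the data frame
by_with <- DFhyper[with(DFhyper, order(p_value)), ]

print(by_dollar)
print(identical(by_dollar, by_with))
```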
It needs to be
DFhyper <- DFhyper[order(DFhyper$p_value), ]