How to solve a KeyError with this Python code? - jupyter-notebook

# creating the prediction results for the image classification and shifting the predicted images to another folder
# with renamed filename having the class name predicted for that image using model
with open(os.path.join(JSON_DIR, 'class_name_map.json')) as secret_input:
    info = json.load(secret_input)
for i in range(data_test.shape[0]):
    new_name = data_test.iloc[i, 0].split("/")[-1].split(".")[0] + "_" + info[data_test.iloc[i, 1]] + ".jpg"
    shutil.copy(data_test.iloc[i, 0], os.path.join(PREDICT_DIR, new_name))
# saving the model predicted results into a csv file
data_test.to_csv(os.path.join(os.getcwd(), "csv_files", "short_test_result.csv"), index=False)
KeyError Traceback (most recent call last)
in
7
8 for i in range(data_test.shape[0]):
----> 9 new_name = data_test.iloc[i,0].split("/")[-1].split(".")[0]+"_"+info[data_test.iloc[i,1]]+".jpg"
10 shutil.copy(data_test.iloc[i,0],os.path.join(PREDICT_DIR,new_name))
11
KeyError: 'c2 '
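The key in the message is 'c2 ' with a trailing space, so one plausible cause is stray whitespace in the predicted-label column of data_test. The sketch below shows one way to normalise the label before the lookup. It assumes the keys in class_name_map.json are clean (for example "c2"); the helper name build_new_name and the sample mapping are hypothetical and not part of the original code.

```python
import os


def build_new_name(image_path, predicted_label, class_name_map):
    """Return '<image stem>_<class name>.jpg' for one prediction row."""
    # Normalise the label so 'c2 ' and 'c2' hit the same dictionary key.
    key = str(predicted_label).strip()
    if key not in class_name_map:
        # repr() (via !r) makes any remaining stray whitespace visible.
        raise KeyError(
            f"label {predicted_label!r} not found; known keys: {sorted(class_name_map)}"
        )
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return f"{stem}_{class_name_map[key]}.jpg"


# Hypothetical stand-in for the contents of class_name_map.json.
info = {"c1": "cat", "c2": "dog"}
print(build_new_name("images/img_001.jpg", "c2 ", info))  # img_001_dog.jpg
```

Inside the loop this would be called as build_new_name(data_test.iloc[i, 0], data_test.iloc[i, 1], info). Note that os.path.splitext keeps everything before the last dot, whereas the original split(".")[0] keeps only the text before the first dot, so the two differ for file names containing several dots. If the whitespace turns out to be in the JSON keys instead, the same strip() would need to be applied to the keys when the mapping is loaded.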

Related

Error in Rscript: "Error in system("tail -n1010 EpisodeIV_dialogues.txt | cut -f2", intern = TRUE) : 'tail' not found"

I'm trying to run the following script on R, but I get an error that I do not understand. The script is supposed to parse a movie script which is in txt format.
setwd("C:/Users/name/Desktop/star wars")
# read episode IV script in R (this is a character vector)
sw = readLines("StarWars_EpisodeIV_script.txt")
# inspect first 70 lines
# you'll see that the first dialogue is from THREEPIO in line 52
sw[1:70]
# command to extract character name (just for demo purposes)
substr(sw[52], 21, nchar(sw[52]))
# command to extract dialogue text (just for demo purposes)
substr(sw[53], 11, nchar(sw[53]))
# we need these auxiliary strings to help us
# extract character names and their dialogues
b10 = "          " # 10 spaces
b20 = "                    " # 20 spaces
# how many lines in input file
nlines = length(sw)
# let's parse the entire script while extracting only the names of the
# characters and their dialogues. The output file is EpisodeIV_dialogues.txt
# write first line in output file
writeLines("STAR WARS - EPISODE 4: STAR WARS", "EpisodeIV_dialogues.txt")
# the first 50 lines don't contain dialogues
# start reading at line 50
i = 50
# while loop to extract character and dialogues
# you may get some errors, just ignore them and re-run
# the while loop as many times as needed
while (i <= nlines)
{
  # if empty line
  if (sw[i] == "") i = i + 1 # next line
  # if text line
  if (sw[i] != "")
  {
    # if script description
    if (substr(sw[i], 1, 1) != " ") i = i + 1 # next line
    if (nchar(sw[i]) < 10) i = i + 1 # next line
    # if character name
    if (substr(sw[i], 1, 20) == b20)
    {
      if (substr(sw[i], 21, 21) != " ")
      {
        tmp_name = substr(sw[i], 21, nchar(sw[i], "bytes"))
        cat("\n", file="EpisodeIV_dialogues.txt", append=TRUE)
        cat(tmp_name, "", file="EpisodeIV_dialogues.txt", sep="\t", append=TRUE)
        i = i + 1
      } else {
        i = i + 1
      }
    }
    # if dialogue
    if (substr(sw[i], 1, 10) == b10)
    {
      if (substr(sw[i], 11, 11) != " ")
      {
        tmp_diag = substr(sw[i], 11, nchar(sw[i], "bytes"))
        cat(tmp_diag, file="EpisodeIV_dialogues.txt", append=TRUE)
        i = i + 1
      } else {
        i = i + 1
      }
    }
  }
}
# =====================================================================
# Creating data table "SW_EpisodeIV.txt"
# =====================================================================
# how many lines in output file
system("wc -l EpisodeIV_dialogues.txt")
# get vector of character names
SW4_chars = system("tail -n1010 EpisodeIV_dialogues.txt | cut -f1", intern=TRUE)
# get vector of dialogue lines
SW4_diags = system("tail -n1010 EpisodeIV_dialogues.txt | cut -f2", intern=TRUE)
# check character names
table(SW4_chars)
# remove voices
SW4_chars = gsub("'S VOICE", "", SW4_chars)
# join characters and dialogues in one table
SW4 = cbind(character=SW4_chars, dialogue=SW4_diags)
# save SW4 in file 'SW_EpisodeIV.txt'
write.table(SW4, file="SW_EpisodeIV.txt")
# if you want to check the data table
A = read.table("SW_EpisodeIV.txt")
head(A)
tail(A)
The error comes up when I run the following lines
SW4_chars = system("tail -n1010 EpisodeIV_dialogues.txt | cut -f1", intern=TRUE)
# get vector of dialogue lines
SW4_diags = system("tail -n1010 EpisodeIV_dialogues.txt | cut -f2", intern=TRUE)
The error says
Error in system("tail -n1010 EpisodeIV_dialogues.txt | cut -f2", intern = TRUE) :
'tail' not found
I'm not sure what the error means.

R: trouble assigning values to a dynamic variable in a dataframe

I am trying to assign values to a dataframe variable defined by the user. The user specifies the name of the variable, let's call this x, in the dataframe df. For simplicity I want to assign a value of 3 to everything in the column the user specifies. The simplified code is:
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
But I get an error:
Error in file(filename, "r") : cannot open the connection
In addition: Warning message:
In file(filename, "r") :
cannot open file 'df$x': No such file or directory
I've tried all kinds of remedies to no avail. If I simply try to print the values of the column with
eval(parse(text=variableName))
I get no errors and it prints out ok. It's only when I try to give that column a value that I get the error. Any help would be appreciated.
I believe the issue is that there is no way to use the result of eval() on the LHS of an assignment.
df = data.frame(foo = 1:5,
                bar = -3)
x = "bar"
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
#> Warning in file(filename, "r"): cannot open file 'df$bar': No such file or
#> directory
#> Error in file(filename, "r"): cannot open the connection
## This error is a bit misleading. Breaking it apart I get a different error.
eval(expression(df$bar)) <- 3
#> Error in eval(expression(df$bar)) <- 3: could not find function "eval<-"
## And it works if you put it all in the string to be parsed.
ex1 <- paste0("df$", x, "<-3")
eval(parse(text=ex1))
df
#> foo bar
#> 1 1 3
#> 2 2 3
#> 3 3 3
#> 4 4 3
#> 5 5 3
## But I doubt that's the best way to do it!

Error when using cv.tree

Hi I tried using the function cv.tree from the package tree. I have a binary categorical response (called Label) and 30 predictors. I fit a tree object using all predictors.
I got the following error message that I don't understand:
Error in as.data.frame.default(data, optional = TRUE) :
cannot coerce class ""function"" to a data.frame
The data is the file 'training' taken from this site.
This is what I did:
x <- read.csv("training.csv")
attach(x)
library(tree)
Tree <- tree(Label~., x, subset=sample(1:nrow(x), nrow(x)/2))
CV <- cv.tree(Tree,FUN=prune.misclass)
The error occurs once cv.tree calls model.frame. The 'call' element of the tree object must contain a reference to a data frame whose name is also not the name of a loaded function.
Thus, not only will subsetting in the call to tree generate the error when cv.tree later uses the 'call' element of the tree object; using a data frame with a name like "df" would give an error as well, because model.frame will take this to be the name of an existing function (i.e. the 'density of F distribution' from the stats package).
I think the problem is in the dependent variable list. The following works, but I think you need to read the problem description more carefully. First, setup the formula without weight.
x <- read.csv("training.csv")
vars<-setdiff(names(x),c("EventId","Label","Weight"))
fmla <- paste("Label", "~", vars[1], "+",
paste(vars[-c(1)], collapse=" + "))
Here's what you've been running
Tree <- tree(fmla, x, subset=sample(1:nrow(x), nrow(x)/2))
plot(Tree)
$size
[1] 6 5 4 3 1
$dev
[1] 25859 25859 27510 30075 42725
$k
[1] -Inf 0.0 1929.0 2791.0 6188.5
$method
[1] "misclass"
attr(,"class")
[1] "prune" "tree.sequence"
You may want to consider package rpart also
urows = sample(1:nrow(x), nrow(x)/2)
x_sub <- x[urows,]
Tree <- tree(fmla, x_sub)
plot(Tree)
CV <- cv.tree(Tree,FUN=prune.misclass)
CV
library(rpart)
tr <- rpart(fmla, data=x_sub, method="class")
printcp(tr)
Classification tree:
rpart(formula = fmla, data = x_sub, method = "class")
Variables actually used in tree construction:
[1] DER_mass_MMC DER_mass_transverse_met_lep
[3] DER_mass_vis
Root node error: 42616/125000 = 0.34093
n= 125000
CP nsplit rel error xerror xstd
1 0.153733 0 1.00000 1.00000 0.0039326
2 0.059274 2 0.69253 0.69479 0.0035273
3 0.020016 3 0.63326 0.63582 0.0034184
4 0.010000 5 0.59323 0.59651 0.0033393
If you include weight, then that is the only split.
vars<-setdiff(names(x),c("EventId","Label"))

Importing unstructured software log file in R?

Below is a sample of our software's log file. I would like to analyze this data in R to gain some insight.
30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f
30-Mar-14 17:59:58.1254 (6628 6452) Module1.exe:Program1.cpp,v:880: ERROR: group 7 failed on its 3 retry
30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1
30-Mar-14 18:00:08.6213 ( -1 1376 13900) Module2.exe:Execute:603: Information - command 1 completed.
30-Mar-14 18:00:08.6273 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 2
Each log file contains 20k lines, and we have many log files.
I need to split each line into fields as follows:
| 30-Mar-14 | 17:59:58.1244 | (6628 6452) | Module1.exe:Program1.cpp,v | :854: | ERROR: group 7 failed with error = 0x8004000f |
I tried to import this dataset using "Import Dataset" --> "From File" in RStudio, with the different options available there, but it was unable to recognize the fields. Is there an option to split based on patterns or a regular expression?
Software environment:
R language v3.0.3
R studio
Windows 7
Note: I have edited the log file to remove real module names.
There is no such option in the GUI itself (unlike Excel or SPSS, for instance, which might have more powerful GUI import options). You need a script for that.
You can construct a regular expression with placeholders that matches all lines, and call gsub to extract the values in the placeholders. For instance:
text <- readLines("log.log")
rx <- "^([0-9]+-[^-]+[0-9]+) +([0-9]+:[0-9]+:[0-9]+[.][0-9]+) +.*$"
stopifnot(grepl(rx, text))
And then:
date <- gsub(rx, "\\1", text)
time <- gsub(rx, "\\2", text)
date.time.df <- data.frame(date, time)
Or:
date.time <- gsub(rx, "\\1\n\\2", text)
date.time.l <- strsplit(date.time, "\n")
do.call(rbind, date.time.l)
Enhance rx to match the other fields.
Here is a script that will do it:
x <- scan(text = "30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f
30-Mar-14 17:59:58.1254 (6628 6452) Module1.exe:Program1.cpp,v:880: ERROR: group 7 failed on its 3 retry
30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1
30-Mar-14 18:00:08.6213 ( -1 1376 13900) Module2.exe:Execute:603: Information - command 1 completed.
30-Mar-14 18:00:08.6273 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 2",
what = '', sep = '\n')
# pull off date/time
dateTime <- sapply(strsplit(x, ' '), '[', 1:2)
# piece together with "|"
dateTime <- apply(dateTime, 2, paste, collapse = "|")
newX <- sub("^[^ ]+ [^(]+", "", x)
# extract the data in parentheses
par1 <- sub("(\\([^)]+\\)).*", "\\1", newX)
newX <- sub("[^)]+\\)", "", newX) # remove data just matched
# parse the rest of the data
x <- strsplit(newX, ":")
y <- sapply(x, function(.line){
paste(c(paste(c(.line[1], .line[2]), collapse = ":")
, paste0(":", .line[3], ":")
, paste(.line[-(1:3)], collapse = ":")
), collapse = "|")
})
# put it all back together
paste0("|"
, dateTime
, "|"
, par1
, "|"
, y
, "|"
)
Here is the output of the script:
[1] "|30-Mar-14|17:59:58.1244|(6628 6452)| Module1.exe:Program1.cpp,v|:854:| ERROR: group 7 failed with error = 0x8004000f|"
[2] "|30-Mar-14|17:59:58.1254|(6628 6452)| Module1.exe:Program1.cpp,v|:880:| ERROR: group 7 failed on its 3 retry|"
[3] "|30-Mar-14|18:00:04.8491|( -1 1376 13900)| Module2.exe:Execute|:803:| Information - Executing command 1|"
[4] "|30-Mar-14|18:00:08.6213|( -1 1376 13900)| Module2.exe:Execute|:603:| Information - command 1 completed.|"
[5] "|30-Mar-14|18:00:08.6273|( -1 1376 13900)| Module2.exe:Execute|:803:| Information - Executing command 2|"
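For comparison, the same field split can be sketched in Python (the language of the question at the top of this page) with a single regular expression using named groups. This is only an illustration built from the five sample lines above: the group names and the parse_line helper are hypothetical, and real log files may contain line shapes this pattern does not cover.

```python
import re

# One named group per requested field; the lazy 'source' group stops at the
# first ':<digits>:' marker, which is taken to be the source line number.
LOG_LINE = re.compile(
    r"^(?P<date>\S+)\s+"
    r"(?P<time>\S+)\s+"
    r"(?P<ids>\([^)]*\))\s+"
    r"(?P<source>.+?)"
    r"(?P<line>:\d+:)\s*"
    r"(?P<message>.*)$"
)


def parse_line(text):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(text)
    return match.groupdict() if match else None


sample = ("30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: "
          "ERROR: group 7 failed with error = 0x8004000f")
fields = parse_line(sample)
print("| " + " | ".join(fields.values()) + " |")
```

Returning None for non-matching lines, rather than raising, makes it easy to count and inspect the lines that do not fit the pattern when working through many 20k-line files.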

R - Exact String Match - Revisited

I have the below test input in a file called Input
Exploratory objectives :
This is Exp objective 1
This is Exp objective 2
3.3 Exploratory objective(s)
This is Exp objective 1
This is Exp objective 2
From this text file, I'm trying to grep for "Exploratory objective(s)" using the code below. The line number I expect is 7.
However, when I run it, I get line number 1. Can anyone point out what is wrong with my grep and why it doesn't return 7? Also, how can I fix this?
key_str <-"Exploratory objective(s)"
key_str
key_pat <- paste0("(", key_str, ")", "(?![[:alpha:]])")
line_number<-grep(key_pat,Input,perl=TRUE)
line_number
Expected line_number: 7
Output line_number using above: 1 (Incorrect)
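You have to escape the parentheses. Unescaped, they form a regex group, so the pattern matches the text "Exploratory objectives" on line 1 instead of the literal "(s)":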
key_str <- "Exploratory objective\\(s\\)"
If the string is dynamically generated or read from a file, use this:
key_str <- gsub("([\\(\\)])", "\\\\\\1", string)
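The effect of escaping can be reproduced in Python (the language of the question at the top of this page), where re.escape plays the role of the gsub call for dynamically generated strings. This is a minimal sketch: the list input_lines is a hypothetical six-line stand-in for the file, so the reported line numbers refer to that list and not to the original file, and [A-Za-z] stands in for [[:alpha:]].

```python
import re


def matching_line_numbers(lines, pattern):
    """Return the 1-based numbers of the lines in which pattern is found."""
    regex = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if regex.search(line)]


# Hypothetical stand-in for the contents of Input (blank lines omitted).
input_lines = [
    "Exploratory objectives :",
    "This is Exp objective 1",
    "This is Exp objective 2",
    "3.3 Exploratory objective(s)",
    "This is Exp objective 1",
    "This is Exp objective 2",
]

key_str = "Exploratory objective(s)"
# Unescaped: '(s)' is a group that matches a plain 's'.
naive = "(" + key_str + ")(?![A-Za-z])"
# Escaped: the parentheses are matched literally.
literal = "(" + re.escape(key_str) + ")(?![A-Za-z])"

print(matching_line_numbers(input_lines, naive))    # [1]
print(matching_line_numbers(input_lines, literal))  # [4]
```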
