I want to read multiple images which are in a folder in Scilab. My code is:
I1=dir('G:\SCI\FRAME\*.jpg');
n=length(I1);
disp(n);
for i=1:n
I2=strcat('G:\SCI\FRAME\',I1(i).name);
I=imread(I2);
figure(),imshow(I);
end
But it does not work. It shows error "invalid index".
There are two mistakes to correct:
1.) length gives the number of characters (=length) of a string, but you want to get the number of elements (=size) in a vector (the filenames), hence you should use size.
2.) I1 is a list structure returned by dir. You can extract its content with the . operator, e.g. I1.name, I1.date, I1.bytes, I1.isdir. Type these into the console to see the contents! Since I1.name already contains the full path + filename + extension as a string vector, you don't have to construct it with strcat. Anyway, if you want to "glue" two strings together, it's easier to use +, e.g. S="first_string"+"second_string".
So the revised code:
I1=dir('G:\SCI\FRAME\*.jpg');
n=size(I1.name,"*"); //size of the I1.name vector
disp(n);
for i=1:n
I=imread(I1.name(i)); //I1.name is a string vector
figure();
imshow(I);
end
I want to extract information from downloaded HTML code. The HTML code is given as a string. The required information is stored in between specific HTML expressions. For example, if I want to have every headline in the string, I have to search for "H1>" and "/H1>" and the text between these HTML expressions.
So far, I used substr(), but I had to calculate the position of "H1>" and "/H1>" first.
htmlcode = " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
startposition = c(21,55) # calculated with gregexpr
stopposition = c(28, 63) # calculated with gregexpr
substr(htmlcode, startposition[1], stopposition[1])
substr(htmlcode, startposition[2], stopposition[2])
The output is correct, but calculating every single start and stop position is a lot of work. Instead, I am looking for a function similar to substr(), where you can use start and stop words instead of positions. For example, like this:
function(htmlcode, startword = "H1>", stopword = "/H1>")
I'd agree that using a package built for html processing is probably the best way to handle the example you give. However, one potential way to sub-string a string based on character values would be to do the following.
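Step 1: Define a simple function to return the positions of a substring in a string. In this example I am only using fixed character strings (no regular expressions).
strpos_fixed <- function(string, char) {
  # gregexpr() returns a list with one element per input string;
  # [[1]] holds the start positions of all matches (-1 if there is none)
  a <- gregexpr(char, string, fixed = TRUE)
  b <- as.integer(a[[1]])
  return(b)
}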
Step 2: Define your new sub-string function using the strpos_fixed() function you just defined
char_substr<-function(string,start,stop){
x<-strpos_fixed(string,start)+nchar(start)
y<-strpos_fixed(string,stop)-1
z<-cbind(x,y)
apply(z,1,function(x){substr(string,x[1],x[2])})
}
Step 3: Test
htmlcode = " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
htmlcode2 = " some html code <H1>baa dee ya</H1> some other code <H1>say do you remember?</H1>"
htmlcode3<- "<x>baa dee ya</x> skdjalhgfjafha <x>dancing in september</x>"
char_substr(htmlcode,"<H1>","</H1>")
char_substr(htmlcode2,"<H1>","</H1>")
char_substr(htmlcode3,"<x>","</x>")
You have two options here. First, use a package that has been developed explicitly for the parsing of HTML structures, e.g., rvest. There are a number of tutorials online.
Second, for edge cases where you may need to extract from strings that are not necessarily well-formatted HTML you should use regular expressions. One of the simpler implementations for this comes from stringr::str_match:
# 1. the parentheses define regex groups
# 2. ".*?" means any characters, non-greedy
# 3. so together we are matching the expression <H1>some text or characters of any length</H1>
stringr::str_match(htmlcode, "(<H1>)(.*?)(</H1>)")
This yields a matrix whose columns are, in order, the fully matched string followed by each regex group we specified. To get the text between the <H1> tags you want the second group, which is the 3rd column. Note that str_match returns only the first match in each string; use str_match_all to get every match.
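If you would rather not depend on stringr, the same idea can be sketched in base R. This is only a minimal illustration under the assumption that the tags are written exactly as <H1> and </H1> (no attributes, no nesting); the variable names m and headlines are made up for the example.

```r
htmlcode <- " some html code <H1>headline</H1> some other code <H1>headline2</H1> "

# Find every non-greedy <H1>...</H1> match in the string
m <- regmatches(htmlcode, gregexpr("<H1>.*?</H1>", htmlcode, perl = TRUE))[[1]]

# Strip the opening and closing tags, leaving only the text in between
headlines <- gsub("</?H1>", "", m)
headlines
```

For real-world HTML a parser such as rvest remains the safer choice, since a regular expression like this one breaks as soon as the tags carry attributes.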
attach.files = c(paste("/users/joesmith/nosection_", currentDate,".csv",sep=""),
paste("/users/joesmith/withsection_", currentDate,".csv",sep=""))
Basically, if I did it like
c("nosection_051418.csv", "withsection_051418.csv")
And I did that manually it would work fine but since I'm automating this to run every day I can't do that.
I'm trying to attach files in an automated email but when I structure it like this, it doesn't work. How can I recreate this so that the character vector accepts it?
I thought your example implied the need for "parallel" inputs to the path stem, the first portion of the file name, and the date portion of those full paths. Consider this illustration of using a two-item vector and a one-item vector (produced by Sys.Date, replacing your "currentDate") to populate the %s positions in the sprintf string (suggested by @Gregor):
sprintf("/users/joesmith/%s_%s.csv", c("nosection", "withsection"), Sys.Date() )
[1] "/users/joesmith/nosection_2018-05-14.csv" "/users/joesmith/withsection_2018-05-14.csv"
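Note that Sys.Date() prints as 2018-05-14, whereas the file names in the question use the form 051418. A possible way to get that form, assuming the stamp is month, day and two-digit year, is to format the date first. The name date_stamp is made up for this sketch, and a fixed date is used here only so the result is reproducible; in the daily job you would pass Sys.Date() instead.

```r
# Fixed date for illustration; use Sys.Date() in the automated job
date_stamp <- format(as.Date("2018-05-14"), "%m%d%y")

# sprintf() recycles the single date stamp over both file name stems
attach.files <- sprintf("/users/joesmith/%s_%s.csv",
                        c("nosection", "withsection"),
                        date_stamp)
attach.files
```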
I've got some kind of logfile I'd like to read and analyse. Unfortunately the files are saved in a pretty "ugly" way (with lots of special characters in between), so I'm not able to read in just the lines with each one being an entry. The only way to separate the different entries is using regular expressions, since the beginning of each entry follows a specified pattern.
My first approach was to identify the pattern in the character vector (I use read_file from the readr-package) and use the corresponding positions to split the vector with strsplit. Unfortunately the positions seem not always to match, since the result doesn't always correspond to the entries (I'd guess that there's a problem with the special characters).
A typical line of the file looks as follows:
16/10/2017, 21:51 - George: This is a typical entry here
The corresponding regular expression looks as follows:
([[:digit:]]{2})/([[:digit:]]{2})/([[:digit:]]{4}), ([[:digit:]]{2}):([[:digit:]]{2}) - ([[:alpha:]]+):
The first thing I want is a data.frame with each line corresponding to a specific entry (in a next step I'd split the pattern into its different parts).
What I tried so far was the following:
regex.log = "([[:digit:]]{2})/([[:digit:]]{2})/([[:digit:]]{4}), ([[:digit:]]{2}):([[:digit:]]{2}) - ([[:alpha:]]+):"
log.regex = gregexpr(regex.log, file.log)[[1]]
log.splitted = substring(file.log, log.regex, log.regex[2:355]-1)
As can be seen, this logfile has 355 entries. The first ones are separated correctly. How can I separate the character vector using a regular expression without losing the information of the regular expression/pattern?
Use capturing and non-capturing groups to identify the parts you want to keep, and be sure to use anchors:
file.log = "16/10/2017, 21:51 - George: This is a typical entry here"
regex.log = "^((?:[[:digit:]]{2})\\/(?:[[:digit:]]{2})\\/(?:[[:digit:]]{4}), (?:[[:digit:]]{2}):(?:[[:digit:]]{2}) - (?:[[:alpha:]]+)): (.*)$"
gsub(regex.log,"\\1",file.log)
>> "16/10/2017, 21:51 - George"
gsub(regex.log,"\\2",file.log)
>> "This is a typical entry here"
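For a file that has been read into one long string, a possible way to get one row per entry is to use the match positions as start points and the character before the next match as stop points. The sketch below uses a made-up two-entry log (the second name and the continuation line are invented for illustration). Note that in the question's substring() call the stop vector is one element shorter than the start vector, so the last entry is cut off by recycling; appending nchar(file.log) as the final stop avoids that.

```r
# Hypothetical log with two entries; the first one spans two lines
file.log <- paste0(
  "16/10/2017, 21:51 - George: first entry\n",
  "continued on a second line\n",
  "16/10/2017, 21:53 - Anna: second entry\n"
)

regex.log <- "[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}, [[:digit:]]{2}:[[:digit:]]{2} - [[:alpha:]]+:"

# Start position of every entry header (this sketch assumes at least one match)
starts <- as.integer(gregexpr(regex.log, file.log)[[1]])

# Each entry ends one character before the next header; the last one ends at the end of the string
stops <- c(starts[-1] - 1L, nchar(file.log))

# One entry per row, header included; trimws() drops the trailing newline
entries <- trimws(substring(file.log, starts, stops))
log.df <- data.frame(entry = entries, stringsAsFactors = FALSE)
```

The header and message parts of each row can then be separated with the capturing-group pattern shown above.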
Many functions in R will take a vector as an argument, evaluate the function for each element of the argument vector, and then return a vector containing the results. For example, if I create the following function
myFunction <- function(x) {
x <- (x+1)/2
print(x)
}
and then evaluate myFunction(1:5), I get a vector result: 1.0 1.5 2.0 2.5 3.0. No loop is required.
The other day, however, I was using the dir.create() function in order to make a bunch of directories. The dir.create() function takes as an argument the path of the folder that you want to create. Since I wanted to make many folders, I tried to use a character vector with each element being the path of a folder that I wanted to create:
dir.create(c("folder 1 path", "folder 2 path", "folder 3 path"))
On doing this, I get an error that says, "invalid 'path' argument." Sure enough, if you look at the documentation for dir.create(), it specifies that the path argument must be, "a character vector containing a single path name."
The only way to make dir.create() accept a vector of path names seems to be to write a loop or an apply function:
sapply(c("folder 1 path", "folder 2 path", "folder 3 path"), dir.create)
Although I can't remember specific examples, I think I've run into other functions like this. It seems inconsistent that some functions automatically loop over a vector of inputs while others behave like the dir.create() function. Writing a loop is easy enough but I'd really like to understand why it is that I expect some functions to operate over the length of a vector and yet they do not.
Is there any way to tell ahead of time whether a function is happy to take a vector as an input or whether it will only accept a single value?
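As a small illustration of the workaround described in the question, the sketch below creates three directories under a temporary location and collects the logical return value of each dir.create() call. The folder names and the variables base, paths and created are made up for the example; vapply is used instead of sapply only so that the result type is checked.

```r
# Work inside a fresh temporary directory so nothing is left behind in the project
base <- tempfile("dirs_")
dir.create(base)

paths <- file.path(base, c("folder_1", "folder_2", "folder_3"))

# dir.create() accepts a single path, so call it once per element
created <- vapply(paths, dir.create, logical(1), USE.NAMES = FALSE)

all(dir.exists(paths))
```

In general the documentation of the argument is the place to look: wording such as "a single path name" or "a character string" usually signals that the function is not vectorized over that argument.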
Setting:
I have (simple) .csv and .dat files created by laboratory devices and other programs, storing information on measurements or calculations. I have found solutions for this in other languages, but not for R.
Problem:
Using R, I am trying to extract values to quickly display results without opening the created files. There are two typical settings:
a) I need to read a priori unknown values after known key words
b) I need to read lines after known key words or lines
I can't make functions such as scan() and grep() work.
c) Finally, I would like to loop over dozens of files in a folder and produce a summary (to make the picture complete: I will manage this part)
I would appreciate any form of help.
OK, it works for the key value (although perhaps not very nicely):
variable<-scan("file.csv", what=character(),sep="")
returns a character vector of everything in the file.
variable[grep("keyword", variable)+2] # + 2 as the actual value is stored two places ahead
returns the sought values as characters.
as.numeric(lapply(variable, gsub, patt=",", replace="."))
For completeness: the data had to be converted to numbers, and the "," versus "." decimal separator problem needed to be solved.
In one line (here the scanned vector is called ks and the keyword is "Ks_Boden"):
data=as.numeric(lapply(ks[grep("Ks_Boden", ks)+2], gsub, patt=",", replace="."))
Perseverance is not too bad of an asset ;-)
The rest isn't finished yet; I will post once it is.
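The steps above can be put together in a self-contained sketch. The file content below is invented for illustration (the real device files will look different); it only assumes that the value sits two whitespace-separated tokens after the keyword and uses a decimal comma. For setting b), readLines() plus grep() is one possible way to pick the line that follows a keyword line.

```r
# Hypothetical measurement file written to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Device X100", "Ks_Boden = 1,25", "Other = 3,5"), tmp)

# a) value after a known keyword: split the file into whitespace-separated tokens
tokens <- scan(tmp, what = character(), sep = "", quiet = TRUE)
hit <- grep("Ks_Boden", tokens, fixed = TRUE)

# The value is two tokens after the keyword; convert decimal comma to decimal point
value <- as.numeric(sub(",", ".", tokens[hit + 2], fixed = TRUE))

# b) line after a known keyword line
lines <- readLines(tmp)
next_line <- lines[grep("Ks_Boden", lines, fixed = TRUE) + 1]
```

Wrapping these steps in a function and calling it via lapply() over list.files() would be one way to approach the summary over many files.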