Line-by-line parsing of text file containing data with Julia? - julia

I'm trying to read tensor elements written to a text file. The first line of the file defines the tensor dimensions. The next lines give the tensor values. In Matlab syntax, I was able to achieve this with the following lines of code, but I am having a difficult time coding an equivalent function in Julia. Any help is greatly appreciated.
fid = fopen(fname);
shape = sscanf(fgetl(fid), '%i');
for j = 1:shape(3)
    for i = 1:shape(1)
        A(i,:,j) = str2num(fgets(fid));
    end
end
fclose(fid);
The first lines of a typical file are reproduced below:
4 4 48
1.00000 0.00000 0.00000 0.00000
0.00000 1.00000 0.00000 0.00000
0.00000 0.00000 1.00000 0.00000
0.00000 0.00000 0.00000 1.00000
-1.00000 0.00000 0.00000 0.00000
0.00000 1.00000 0.00000 0.00000
0.00000 0.00000 -1.00000 0.00000
0.00000 0.00000 0.00000 1.00000
-1.00000 0.00000 0.00000 0.00000
...

As @colin said in his comment, such a file can be easily read into Julia with this:
julia> data, heading = readdlm("/tmp/data.txt", header=true)
(
9x4 Array{Float64,2}:
1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0
-1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 -1.0 0.0
0.0 0.0 0.0 1.0
-1.0 0.0 0.0 0.0,
1x4 Array{AbstractString,2}:
"4" "4" "48" "")
The two values returned are the array of Float64s and the header row as an array of strings.
Any use?
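If you need the full tensor rather than the flat array, here is a minimal sketch of one way to use that header (an addition with assumptions: Julia 1.x, where readdlm lives in the DelimitedFiles standard library, and a header giving the dimensions in the order rows, columns, slices, as in the Matlab code):
using DelimitedFiles
data, heading = readdlm("/tmp/data.txt", header=true)
dims = parse.(Int, filter(!isempty, vec(heading)))     # e.g. [4, 4, 48]
A = Array{Float64}(undef, dims...)                     # pre-allocate the tensor
for j in 1:dims[3]
    # each slice is dims[1] consecutive rows of the flat array
    A[:, :, j] = data[(j-1)*dims[1]+1 : j*dims[1], :]
end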

If you do want to read line by line, you can use the following:
a = open("/path/to/data.txt", "r")
for line in eachline(a)
    print(line)   ## or whatever else you want to do with the line
end
close(a)
In particular, a syntax like this:
LineArray = split(replace(line, "\n" => ""), "\t")
Might be useful to you. It will (a) remove the line break at the end of the line and (b) then split it up into an indexed array so that you can then pull elements out of it based on predictable positions they occupy in the line.
You could also put:
Header = readline(a);
right after you open the file if you want to specifically pull out the header, and then run the above loop. Alternatively, you could use enumerate() over eachline(a) and then perform logic on the index of the enumeration (e.g. define the header when the index = 1).
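For instance, a small sketch of that enumerate() approach (an addition, not part of the original answer; the path is hypothetical and the values are assumed to be whitespace-separated):
open("/path/to/data.txt", "r") do io
    for (i, line) in enumerate(eachline(io))
        fields = split(line)                        # split on whitespace
        if i == 1
            println("shape: ", parse.(Int, fields))     # header line, e.g. [4, 4, 48]
        else
            println(parse.(Float64, fields))            # one row of tensor values
        end
    end
end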
Note though that this will be slower than the answer from daycaster, so it's only worthwhile if you really need the extra flexibility.

Related

Problems when coding NLS in R with dims

I'm trying to code a non-linear regression in R to fit data I have on the relationship between temperature and type of precipitation.
I first created 2 vectors with my data:
vec_temp_num
[1] -8.5 -8.0 -6.5 -6.1 -5.9 -5.8 -5.6 -5.4 -5.3 -5.1 -4.9 -4.8 -4.7 -4.5 -4.3 -4.2 -4.1
[18] -4.0 -3.9 -3.8 -3.7 -3.6 -3.5 -3.4 -3.3 -3.2 -3.1 -3.0 -2.9 -2.8 -2.6 -2.5 -2.4 -2.3
vec_rain
[1] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[9] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 1.00000000 0.00000000 0.00000000
[17] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[25] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.33333333 0.00000000 0.00000000
vec_temp_num contains a list of temperatures, and vec_rain has, for each of them, the percentage of a given precipitation type observed, basically rain or snow (I chose to start with one of them to simplify the process). Both vectors contain 300 values and are numeric (as.numeric).
The function is the following:
func_rain <- function(x,b){(1/(1+exp(-b*x)))}
I then tested my function and got a plot that looks the way it should, so up to this step everything seems to be OK.
But when I try to write the nls formula:
Rain_fit<-nls(vec_rain~func_rain(vec_temp_num,b), start=c(vec_temp_num=2.6, b=1))
I get an error message saying:
Error in qr(.swts * gr) :
dims [product 2] do not match the length of object [300]
It seems that I should have my data as a matrix as opposed to a vector (which I don't get, because some forums advise creating vectors), so I then tried to use the data directly from my data frame (df = dp.w2, rain and temperature columns):
Rain_fit <- nls(dp.w2$rain ~ func_rain(dp.w2$temperature, b),
                start = list(temperature = 2.6, rain = 1, b = 1))
and got another error message:
Error in parse(text = x, keep.source = FALSE) :
:2:0: unexpected end of input
1: ~
^
I've read a lot of questions and answers about the nls function, but it's been some days now and I just can't find the right way to fit my data, so thanks a lot in advance for your help!
PS: I'm a total beginner so if you could provide a "step by step" or detailed (for dummies) answer it would be awesome!
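A hedged sketch of the kind of call that avoids the first error (not the poster's actual fix, and with dummy stand-in data): the only unknown in func_rain is b, so start should list b alone, and the data vectors should not appear in start at all.
func_rain <- function(x, b) { 1 / (1 + exp(-b * x)) }
## dummy stand-ins for the real 300-value vectors
set.seed(1)
vec_temp_num <- seq(-8.5, 6.5, length.out = 300)
vec_rain     <- func_rain(vec_temp_num, b = 1.5) + rnorm(300, sd = 0.05)
## only the parameter b gets a starting value
Rain_fit <- nls(vec_rain ~ func_rain(vec_temp_num, b), start = list(b = 1))
summary(Rain_fit)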

How to remove the decimals in the rownames of a matrix?

I have a matrix like this:
12Q_S12 14Q_S14 16Q_S16 18Q_S2 22Q_S6 28Q_S12
ENSG00000000003.14 1.18007 0.0000 1.20602 2.24477 1.27663 1.12392
ENSG00000000005.5 0.00000 0.0000 0.00000 0.00000 0.00000 0.00000
and I would like to remove the decimal part only for the rownames (ENSG00000000003.14, ENSG00000000005.5, ...). Any help?
Expected:
12Q_S12 14Q_S14 16Q_S16 18Q_S2 22Q_S6 28Q_S12
ENSG00000000003 1.18007 0.0000 1.20602 2.24477 1.27663 1.12392
ENSG00000000005 0.00000 0.0000 0.00000 0.00000 0.00000 0.00000
You need to reassign the rownames and eliminate the part after the point; you can do it with gsub.
rownames(tab) <- gsub("\\..*","",rownames(tab))
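For example, on a small dummy matrix (tab stands in for whatever your matrix is called):
tab <- matrix(0, nrow = 2, ncol = 2)
rownames(tab) <- c("ENSG00000000003.14", "ENSG00000000005.5")
rownames(tab) <- gsub("\\..*", "", rownames(tab))   ## "\\." matches the literal dot, ".*" everything after it
rownames(tab)
[1] "ENSG00000000003" "ENSG00000000005"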

Sector-wise mean wind speed and direction in R

I am trying to get the mean wind speeds of a data set based on the mean direction within each sector. It is fairly simple, and the program below does the trick. However, I am unable to automate it, meaning I have to manually input the values of fsector and esector every time. Also, the output is not in the form I would like. Please tell me a better way or help me improve this one.
##Dummy Wind Speed and Directional Data.
ws<-c(seq(1,25,by=0.5))
wd<-c(seq(0,360,by=7.346939))
fsector<-22.5 ##Starting point
esector<-45 ##End point
wind <- as.data.frame(cbind(ws,wd))
wind$test<- ifelse(wind$wd > fsector & wind$wd < esector,'mean','greater')
mean<-rbind(aggregate(wind$wd,by=list(wind$test),mean))
meanws<-rbind(aggregate(wind$ws,by=list(wind$test),mean))
mean<-cbind(meanws[2,2],mean[2,2])
mean
It would be great if I could choose the number of sectors and automatically generate the list of mean wind speeds and mean directions. Thanks.
Actually, I'm working with the same kind of data.
First I make a wind rose, and then, depending on the direction, I select the data:
max(Windspeed[direc >=11.25 & direc <= 33.75])
min(Windspeed[direc >=11.25 & direc <= 33.75])
mean(Windspeed[direc >=11.25 & direc <= 33.75])
The direction is in degrees.
If that's not what you're looking for, I'll be here to help you.
Okay, working from the idea by @monse-aleman and a similar question of hers, I was able to automate the program to give the required answer. The helper function is:
in_interval <- function(x, interval){
    stopifnot(length(interval) == 2L)
    interval[1] < x & x < interval[2]
}
Using the above function on the data set, we get:
##Consider a dummy Wind Speed and Direction Data.
ws<-c(seq(1,25,by=0.5))
wd<-c(seq(0,360,by=7.346939))
## Determine the sector starting and end points.
a<-rbind(0.0 ,22.5 ,45.0 ,67.5 ,90.0 ,112.5 ,135.0 ,157.5 ,180.0 ,202.5 ,225.0 ,247.5 ,270.0 ,292.5 ,315.0,337.5)
b<-rbind(22.5 ,45.0 ,67.5 ,90.0 ,112.5 ,135.0 ,157.5 ,180.0 ,202.5 ,225.0 ,247.5 ,270.0 ,292.5 ,315.0,337.5,360)
sectors<-cbind(a,b)
sectors
## See the table of sectors.
[,1] [,2]
[1,] 0.0 22.5
[2,] 22.5 45.0
[3,] 45.0 67.5
[4,] 67.5 90.0
[5,] 90.0 112.5
[6,] 112.5 135.0
[7,] 135.0 157.5
[8,] 157.5 180.0
[9,] 180.0 202.5
[10,] 202.5 225.0
[11,] 225.0 247.5
[12,] 247.5 270.0
[13,] 270.0 292.5
[14,] 292.5 315.0
[15,] 315.0 337.5
[16,] 337.5 360.0
mean <- numeric(16)   ## initialise the output vector first (calls to mean() below still resolve to the function)
for(o in 1:16){
    mean[o] <- mean(ws[in_interval(wd, c(sectors[o,1], sectors[o,2]))])
}
mean
[1] 2.0 3.5 5.0 6.5 8.0 9.5 11.0 12.5 14.0 15.5 17.0 18.5 20.0 21.5 23.0 24.5
This is the result. Works quite well.
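The question also asked for the mean direction in each sector; the loop above only returns the mean speeds, but a sketch along the same lines (an addition, using a plain arithmetic mean of the directions) would be:
meandir <- numeric(16)   ## per-sector mean wind direction
for (o in 1:16) {
    idx <- in_interval(wd, c(sectors[o, 1], sectors[o, 2]))
    meandir[o] <- mean(wd[idx])   ## fine for sectors that do not straddle the 0/360 boundary
}
meandir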

How to create an average curve from a data frame in R where the data contain { }?

I have a data frame in the following format in R and I would like to calculate the average curve of all 'readings' with standard error bars, but I get errors probably due to the format of the readings ({ }). How can I fix this?
Note (update): the df has over 9 million entries (with many readings for each installnr). Do you have a proposal that would run efficiently on a data frame this large?
installnr readdate readings
1 002345 2014-08-17 {0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,2,0}
2 002345 2014-08-18 {0,0,0,0,0,0,4,1,0,0,0,0,1,1,1,1,0,1,1,1,1,1,0,1}
3 002345 2014-08-19 {0,1,2,1,0,1,1,1,2,0,1,0,1,0,1,0,1,0,1,0,2,1,1,0}
4 013459 2014-08-17 {1,0,0,1,0,1,1,1,1,0,1,0,1,0,1,0,0,1,3,1,0,0,1,1}
5 127465 2014-08-19 {0,1,0,0,1,0,1,1,1,0,0,1,2,0,1,0,0,1,1,0,0,1,1,1}
Updated to reflect that the request is for column means.
You need to convert the readings to character, remove the {}, split on the commas, and convert to numeric. The easiest way to get the column means is to form the result into a matrix, then just use colMeans.
df$readings = gsub("[{}]", "", as.character(df$readings))   ## drop the braces
Read1 = strsplit(df$readings, ",")                          ## one character vector per reading
## byrow = TRUE so that each reading becomes one row of the matrix
Readings = matrix(as.numeric(unlist(Read1)), nrow=length(Read1), byrow=TRUE)
colMeans(Readings)
 [1] 0.2 0.4 0.4 0.6 0.2 0.4 1.4 0.8 0.8 0.2 0.6 0.2 1.0 0.2 0.8 0.2 0.2 0.6 1.2
[20] 0.6 0.6 0.6 1.0 0.6
Since you want error bounds, I will mention that you can get the standard deviations for the columns from
apply(Readings, 2, sd)
 [1] 0.4472136 0.5477226 0.8944272 0.5477226 0.4472136 0.5477226 1.5165751
 [8] 0.4472136 0.8366600 0.4472136 0.5477226 0.4472136 0.7071068 0.4472136
[15] 0.4472136 0.4472136 0.4472136 0.5477226 1.0954451 0.5477226 0.8944272
[22] 0.5477226 0.7071068 0.5477226
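If the error bars are meant to show the standard error of the mean rather than the standard deviation, a small sketch (dividing the column SDs by the square root of the number of readings):
StdErr <- apply(Readings, 2, sd) / sqrt(nrow(Readings))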

What are the Closeness and shortest.paths functions definition in igraph package calculating?

I found a weird result in some data I am working on and decided to test closeness and shortest.paths functions with the following matrix.
test<-c(0,0.3,0.7,0.9,0.3,0,0,0,0.7,0,0,0.5,0.9,0,0.5,0)
test<-matrix(test,nrow=4)
colnames(test)<-c("A","B","C","D")
rownames(test)<-c("A","B","C","D")
test
A B C D
A 0.0 0.3 0.7 0.9
B 0.3 0.0 0.0 0.0
C 0.7 0.0 0.0 0.5
D 0.9 0.0 0.5 0.0
grafo=graph.adjacency(abs(test),mode="undirected",weighted=TRUE,diag=FALSE)
When I measure closeness() I get this:
> closeness(grafo)
A B C D
0.5263158 0.4000000 0.4545455 0.3846154
This is merely one over the sum of the weights, NOT over the distances (1 - weights):
> 1/(0.7+(0.7+0.3)+0.5)
[1] 0.4545455
When I define distance as 1-weight, I get this
> 1/((1-0.7)+((1-0.7)+(1-0.3))+(1-0.5))
[1] 0.5555556
In the igraph manual the formula says it is the sum of distances. My question is: does the function actually treat the weights themselves as distances (and is this therefore a bug), or should we convert our graphs' edge weights into distances before running this function?
The SAME issue occurs with the shortest.paths function, by the way. It gives me the sum of the weights, NOT of the distances.
> shortest.paths(grafo)
A B C D
A 0.0 0.3 0.7 0.9
B 0.3 0.0 1.0 1.2
C 0.7 1.0 0.0 0.5
D 0.9 1.2 0.5 0.0
Thanks.
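A hedged sketch of the second option, using the grafo built above (igraph interprets edge weights as lengths/costs, so similarity weights can be turned into distances before calling the functions):
grafo_dist <- grafo
E(grafo_dist)$weight <- 1 - E(grafo_dist)$weight   ## distance = 1 - weight
closeness(grafo_dist)        ## now 1 / (sum of 1 - weight distances), e.g. 0.5555556 for C as computed above
shortest.paths(grafo_dist)   ## likewise sums the 1 - weight distances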
