Is IDL 8.5 able to read tables with strings, dates and values in it?

I am very new to IDL and have been struggling with a problem for hours; I hope you can help me.
I am trying to read data from a table ("file.txt"), and I would like every column to be saved in its own variable (I thought about using STRARR).
I found this tutorial: http://www.idlcoyote.com/tips/ascii_column_data.html
It is very useful when you want to read numbers from every column, and that works fine.
This is the table from the tutorial above:
Tutorial table:
Experiment 01-14-97-2b9c
No. of Data Rows: 5
Temperature Pressure Relative Humidity
20.43 0.1654 0.243
16.48 0.2398 0.254
17.21 0.3985 0.265
18.40 0.1852 0.236
21.39 0.2998 0.293
Code:
OPENR, lun, "tutorial.txt", /GET_LUN
header = STRARR(3)
READF, lun, header
data = FLTARR(3, 5)
READF, lun, data
temperature = data(0,*)
print, data
print, temperature
Output data:
20.4300 0.165400 0.243000
16.4800 0.239800 0.254000
17.2100 0.398500 0.265000
18.4000 0.185200 0.236000
21.3900 0.299800 0.293000
Output temperature:
20.4300
16.4800
17.2100
18.4000
21.3900
That looks quite good, for numbers.
But what about a table that contains strings with dates and times as well as numbers, like this:
My table:
Experiment 01-14-97-2b9c
No. of Data Rows: 5
Date Start time End time Value
12-Feb-2002 05:08:10 06:08:30 20
08-Mar-2002 07:35:38 09:25:59 100
20-Jun-2002 12:30:35 16:15:18 5536
25-Jul-2002 04:02:06 07:02:58 5822
02-Aug-2002 23:30:25 23:55:22 456
The code above no longer works. When I use my_var = data(0,*), the whole line ends up in my_var, because the data are no longer split into columns but read as one whole row.
With FLTARR, this line
12-Feb-2002 05:08:10 06:08:30 20
is read as (of course, because of FLTARR)
12.0000
5.00000
6.00000
20.0000
STRARR keeps the data intact in my_var, but without separating the columns.
What I want:
I would like to have every column in its own variable, so that I can work with these variables later in other code.
dates = data(0,*)
starts = data(1,*)
ends = data(2,*)
values = data(3,*)
print, starts
Output:
Start time
05:08:10
07:35:38
12:30:35
04:02:06
23:30:25
(and also the rest of my variables)
I hope you can help here.
Maybe I have misunderstood something; if so, please let me know.
I would be grateful for any other suggestion or solution.
Thanks in advance!

My suggestion would be to use STRSPLIT in some manner, either on each line as you read it or on all lines at the end.
Here is an example of doing it all at the end. First, read the three header lines into a header array (which is ignored afterwards) and the five data lines into a string array:
IDL> openr, lun, 'file.txt', /get_lun
IDL> header = strarr(3)
IDL> readf, lun, header
IDL> data = strarr(5)
IDL> readf, lun, data
IDL> free_lun, lun
Then split on whitespace:
IDL> tokens = strsplit(data, /extract)
And, finally, extract elements by position:
IDL> dates = (tokens.map(lambda(x: x[0]))).toarray()
IDL> starts = (tokens.map(lambda(x: x[1]))).toarray()
IDL> ends = (tokens.map(lambda(x: x[2]))).toarray()
IDL> values = (tokens.map(lambda(x: long(x[3])))).toarray()
You've got your values now:
IDL> help, dates, starts, ends, values
DATES STRING = Array[5]
STARTS STRING = Array[5]
ENDS STRING = Array[5]
VALUES LONG = Array[5]
IDL> print, values
20 100 5536 5822 456
UPDATE: Make sure you have done
IDL> compile_opt strictarr
in scope before these commands.
UPDATE: To sort these arrays by value:
IDL> ind = sort(values)
IDL> values = values[ind]
IDL> dates = dates[ind]
IDL> starts = starts[ind]
IDL> ends = ends[ind]

Related

How to efficiently send a dataframe with multiple rows via httr::PUT

Probably due to my limited knowledge of communicating with APIs (which I am trying to remedy :) ), I seem to be unable to execute a PUT request for more than one row of a data frame at a time. For example, if df_final consists of 1 row, the following code works. If there are multiple rows, it fails and I get a 400 status.
reqBody <- list(provName = df_final$Provider,site = df_final$Site,
monthJuly = df_final$July, monthAugust = df_final$August,
monthSeptember = df_final$September, monthOctober =df_final$October,
monthNovember = df_final$November ,
monthDecember = df_final$December, monthJanuary = df_final$January, monthFebruary = df_final$February,
monthMarch = df_final$March, monthApril = df_final$April, monthMay = df_final$May,
monthJune = df_final$June,
assumptions = paste("Monthly Volume:", input$Average, "; Baseline Seasonality:", input$Year, "; Trend:", input$Year_slopes),
rationale = as.character(input$Comments), fiscalYear = FY_SET, updateDtm = Sys.time())
r <- PUT(fullURL, body = reqBody, encode = "json", content_type_json())
Using with_verbose() I am able to see that the JSON being sent is formatted differently in the 2 cases. I haven't found anything in the documentation (https://cran.r-project.org/web/packages/httr/httr.pdf) that has been particularly helpful in overcoming this.
The format it appears to be sending out in the first instance (1 row in the data frame) looks like this:
{"provName":"Name","site":"site","monthJuly":56,"monthAugust":71,"monthSeptember":65,"monthOctober":78,"monthNovember":75,"monthDecember":98,"monthJanuary":23,"monthFebruary":39,"monthMarch":38,"monthApril":42,"monthMay":57,"monthJune":54,"assumptions":"Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017","rationale":"","fiscalYear":2022,"updateDtm":"2023-02-03 15:19:40"}
and again, it works sans issues.
With 2 rows I get the following format:
{"provName":["Name","Name"],"site":["site","site"],"monthJuly":[56,56],"monthAugust": [71,71],"monthSeptember":[65,65],"monthOctober":[78,78],"monthNovember":[75,75],"monthDecember": [98,98],"monthJanuary":[23,23],"monthFebruary":[39,39],"monthMarch":[38,38],"monthApril": [42,42],"monthMay":[57,57],"monthJune":[54,54],"assumptions":["Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017","Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017"],"rationale":["",""],"17":2,"18":2}
And it fails with status 400.
I suppose I could use lapply and PUT for each row, however with thousands of rows in a dataframe, I think that would be less than ideal.
Anyone have any light to share on this?
Any help would be greatly appreciated!
PS: this didn't really answer my question
R httr put requets
and as I mentioned, Doing something like this is not ideal:
Convert each data frame row to httr body parameter list without enumeration
Looks like you are using a list as the request body. Use a data frame instead.
Lists and data frames get serialized to JSON differently:
jsonlite::toJSON(list(x = 1:2, y = 3:4))
#> {"x":[1,2],"y":[3,4]}
jsonlite::toJSON(data.frame(x = 1:2, y = 3:4))
#> [{"x":1,"y":3},{"x":2,"y":4}]

Substring (variable length) values in entire column of dataframe

I have looked for this tirelessly with no luck. I am coming from a Java background and new to R. (On a side note, I am loving R, but disliking string operations in it as well as the documentation - maybe that's just a Java bias.)
Anyhow, I have a data frame with a single column. Each value is composed of latitude and longitude numbers separated by colons, e.g. ROAD:_:-87.4968190989999:38.7414455360001
I would like to create 2 new data frames, each holding the separate lat and long numbers.
I have successfully written a piece of code that uses for loops (but I know this is inefficient, and that there has to be another way).
Here is a snippet of the inefficient code:
length <- length(fromLatLong)
for (i in 1:length){
fromLat[i] <- strsplit(fromLatLong[i] ,":")[[1]][4]
}
for (i in 1:length){
fromLong[i] <- strsplit(fromLatLong[i] ,":")[[1]][3]
}
for (i in 1:length){
toLat[i] <- strsplit(toLatLong[i] ,":")[[1]][4]
}
for (i in 1:length){
toLong[i] <- strsplit(toLatLong[i] ,":")[[1]][3]
}
Here is how I tried to optimize it using mutate, but I only get the first value copied over to all rows as such:
fromLat = mutate(fromLatLong, FROM_NODE_ID = (strsplit(as.character(fromLatLong$FROM_NODE_ID),":")[[1]][4]))
fromLong = mutate(fromLatLong, FROM_NODE_ID = (strsplit(fromLatLong$FROM_NODE_ID,":")[[1]][3]))
toLat = mutate(toLatLong, TO_NODE_ID = (strsplit(toLatLong$TO_NODE_ID,":")[[1]][4]))
toLong = mutate(toLatLong, TO_NODE_ID = (strsplit(toLatLong$TO_NODE_ID,":")[[1]][3]))
And here is the result:
  FROM_NODE_ID
1 38.7414455360001
2 38.7414455360001
3 38.7414455360001
4 38.7414455360001
5 38.7414455360001
6 38.7414455360001
7 38.7414455360001
8 38.7414455360001
9 38.7414455360001
I would appreciate your help on this. Thanks
You can use the map_chr function of the purrr package. For instance:
fromLat = mutate(fromLatLong, FROM_NODE_ID = map_chr(FROM_NODE_ID, ~ strsplit(as.character(.x),":")[[1]][4]))
The following expression will produce a data frame with each of the colon-delimited components as a separate column. You can then break this up into separate data frames or do whatever else you want with it.
as.data.frame(t(matrix(unlist(strsplit(fromLatLong$coords, ":", fixed=TRUE), recursive=FALSE), nrow=4)),stringsAsFactors=FALSE)
(Assuming the column name of your values in the data frame is coords.)
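The mutate attempt in the question repeats one value because [[1]] picks only the split of the first row, which is then recycled down the column. A minimal base-R sketch of a vectorised alternative, assuming made-up sample strings in the same shape as the example (the names coords, parts and result are only illustrative):

```r
# Hypothetical sample values shaped like the question's example
coords <- c("ROAD:_:-87.4968190989999:38.7414455360001",
            "ROAD:_:-86.1234:39.5678")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(coords, ":", fixed = TRUE)

# Take the same position from every list element;
# vapply() checks that each result is a single string
long <- as.numeric(vapply(parts, `[`, character(1), 3))
lat  <- as.numeric(vapply(parts, `[`, character(1), 4))

result <- data.frame(lat = lat, long = long)
print(result)
```

This keeps the per-row split but avoids both the explicit loops and the extra package; whether the values should be converted with as.numeric depends on what you do with them later.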

How to create columns with defined width -R

I want to create an empty table / matrix which will be filled with values later on.
The data columns (example below for "Prec01 (p)" and "Prec04 (p)") should have a fixed width of 11 characters (this will be a program-specific ASCII format).
!!Date Prec01 (p) Prec04 (p)
1992 10 02 00:00 0.4 0.0
Any ideas?
You can use formatC, for example:
paste0("Prec", formatC(1:10, width = 3, flag = "0"), "(p)")
flag = "0" pads the number with leading zeros up to the given width.
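For the 11-character data columns themselves, formatC can pad both the header labels and the values. A rough sketch, assuming a small made-up data frame (prec and its values are hypothetical, and the date column from the question is left out):

```r
# Hypothetical data: two precipitation columns
prec <- data.frame(Prec01 = c(0.4, 12.3), Prec04 = c(0.0, 7.5))

# Right-align each header label in an 11-character field
header <- paste0(formatC(paste(names(prec), "(p)"), width = 11), collapse = "")

# Format every column to width 11 with one decimal, then glue the columns row by row
cols <- lapply(prec, formatC, width = 11, format = "f", digits = 1)
rows <- do.call(paste0, unname(cols))

lines <- c(header, rows)
writeLines(lines)
```

Each line is then exactly 22 characters long (two fields of 11). Adding flag = "-" to the formatC calls would left-align the fields instead.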
thanks!
I did it like this: create an empty vector (unfortunately it is a character vector; I hope that works, but I don't know how to specify it any other way)
col1 <- character(length(dwd[,2]))
for(i in seq_along(col1)){
col1[i] <- formatC(x=dwd[i,2], width = 11)
}
it does the job.
Best regards
Jochen

Writing and reading a zoo object - errors

I have a zoo object, prices; when I type class(prices), it returns "zoo". I then create a file using:
write.zoo(prices, file = "foo", index.name = "time")
The resulting file looks like this:
"time" "AAPL.Adjusted" "SHY.Adjusted"
2013-05-01 60.31 84.12
2013-05-02 61.16 84.11
2013-05-03 61.77 84.08
I then try and read this file with this statement:
myData <- read.zoo("foo")
and I get this error:
Error in read.zoo("foo") :
index has bad entries at data rows: 1 2 3 4
I’ve tried a number of parameter settings and nothing seems to work. Help much appreciated.
Newbie
The file has a header line so try:
z <- read.zoo("foo", header = TRUE, check.names = FALSE)
The check.names part gives nicer looking column names but you could leave it out if that were not important.

Inputting one column of info into a R data frame.

I am currently using this code to input data from numerous files into R:
library(foreign)
setwd("/Users/ericbrotto/Desktop/A_Intel/")
filelist <-list.files()
#assuming semicolon-separated values with a header
datalist = lapply(filelist, function(x)read.table(x, header=T, sep=";", comment.char=""))
#assuming the same header/columns for all files
datafr = do.call("rbind", datalist)
The headers look like this:
TIME ;POWER SOURCE ;qty MONITORS ;NUM PROCESSORS ;freq of CPU Mhz ;SCREEN SIZE ;CPU LOAD ;BATTERY LEVEL ; KEYBOARD MVT ; MOUSE MVT ;BATTERY MWH ;HARD DISK SPACE ;NUMBER PROCESSES ;RAM ;RUNNING APPS ;FOCUS APP ;BYTES IN ;BYTES OUT ;ACTIVE NETWORKS ; IP ADDRESS ; NAMES OF FILES ;
and an example of the data looks like this:
2010-09-11-19:28:34.680 ; BA ; 1 ; 2 ; 2000 ; 1440 : 900 ; 0.224121 ; 92 ; NO ; NO ; NULL ; 92.581558 ; 57 ; 196.1484375 ; +NULL ; loginwindow-#35 ; 5259 ; 4506 ; en1 : ; 192.168.1.3 ; NULL ;
Rather than input all of the columns into a data frame, I would like to just grab one, say, FOCUS APP.
If you just want to read in a particular column from your files, then colClasses is the way to go. For example, suppose your data looked like this:
a,b
1,2
3,4
Then
## Use colClasses to select columns
## "NULL" means skip the column
## "numeric" means that the column is numeric
## Other options are Date, factor - see ?read.table for more
## Use NA to let R decide
data = read.table("/tmp/tmp.csv", sep=",",
colClasses=c("NULL", "numeric"),
header=TRUE)
gives just the second column.
> data
b
1 2
2 4
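To pick the column by name rather than by position, one option is to read only the header line first and build colClasses from it. A minimal sketch, assuming a small made-up semicolon-separated file (the file contents and the names path, classes and focus are illustrative, not taken from the question's data):

```r
# Hypothetical sample file standing in for one of the log files
path <- tempfile(fileext = ".txt")
writeLines(c("TIME;FOCUS APP;BYTES IN",
             "2010-09-11-19:28:34.680;loginwindow-#35;5259",
             "2010-09-11-19:28:35.680;Finder;6000"), path)

# Read just the header line and split it into trimmed column names
header <- trimws(strsplit(readLines(path, n = 1), ";", fixed = TRUE)[[1]])

# "NULL" skips a column; only the wanted one is read, as character
classes <- rep("NULL", length(header))
classes[header == "FOCUS APP"] <- "character"

# comment.char = "" keeps the "#" in the data from starting a comment;
# check.names = FALSE keeps the space in the column name
focus <- read.table(path, header = TRUE, sep = ";", comment.char = "",
                    colClasses = classes, check.names = FALSE)
print(focus)
```

The same read.table call could be dropped into the lapply over filelist, as long as every file has the same header.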
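Maybe just selecting the column by name in your read.table line is enough (check.names = FALSE keeps the space in the column name instead of turning it into a dot):
datalist = lapply(filelist, function(x)read.table(x, header=T, sep=";", comment.char="", check.names=FALSE)["FOCUS APP"])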
If you are just doing this once, then the colClasses answer is probably the best (however that still reads in all the data, just only processes the one column). If you are doing things like this often then you may want to use a database instead. Look at the RSQLite, sqldf, and SQLiteDF packages as well as RODBC for some possibilities.
