How to conditionally plot in gnuplot with missing or invalid data? - plot

In gnuplot (I'm using 5.1 CVS) one can specify missing data (set datafile missing '?' for example) and gnuplot also knows invalid data (like NaN or 1/0).
How can I conditionally react to them? If my data has one of them I sometimes (i.e. on some columns, but not on all) want to do something else instead of just skipping them. So, basically I want to say (pseudocode)
plot 'datafile' using 1:($2 == MISSING ? $3+$4 : $2)
I can use strcol(2) to check the column content, but this does not work for the string specified by set datafile missing '?'. That string seems to take precedence: I can't check for it using strcol(), because gnuplot stops handling the data point before it even gets to evaluating strcol().
My data can have missing values in several columns. If a value is missing in column 2, for example, I just want a gap in the data (as if it were invalid), but if it is missing in column 3 I want to plot something else instead and not leave a gap.
For invalid data (like the pre-defined NaN) this works perfectly fine. It is skipped when appearing in the data, but I can also react to it by saying strcol(2) == 'NaN' ? $3+$4 : $2. So for invalid data, gnuplot first evaluates strcol() if (and only if) it is there.
I can simulate this behaviour by using two "missing chars", one that I use for set datafile missing and another one that I use for strcol() checks. But this is an ugly workaround: I would have to edit my datafiles and replace half of the missing chars by hand. Is there a way to also handle missing data conditionally, like one can handle invalid data?

Related

Dealing with points vs. rows

I fear I've missed some crucial point in my education thus far.
I have a table HR and I've performed functions on it.
For example HR$FTE <- HR$'Std Hrs' / 38 gives me a new column for each employee; working as intended.
However, whenever I try to perform a function when creating a new column it doesn't like that. The question that I posted yesterday is similar in nature where the error result was from returning the whole row.
An example function that doesn't work would be HR$FYEnd <- as.Date(paste(HR$FY + 1,"06","30", sep = "-")). In this case, non-numeric argument to binary operator is returned, as HR$FY is not numeric but rather a column of numeric data. What should be outputted is a set of dates on 30/06.
In Excel (which I'm trying to train myself to leave) the equivalent when dealing with tables would be [#[FY Start]] or something to that effect which demonstrates that you're working with the figure on that row rather than the whole row.
Worked it out - couple of days later.
The step that I was missing was using the mapply/sapply commands. Using these has sorted everything out.
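For comparison, here is a minimal sketch of a vectorised alternative that avoids mapply/sapply. It assumes the error came from HR$FY being stored as text rather than numbers, which the question does not confirm. The sample data below is made up for illustration.

```r
# Hypothetical sample data: FY is stored as character, which is one common
# cause of "non-numeric argument to binary operator".
HR <- data.frame(FY = c("2015", "2016", "2017"), stringsAsFactors = FALSE)

# Convert the column once; after that, + and paste() work on the whole
# column and return one value per row.
HR$FY <- as.numeric(HR$FY)
HR$FYEnd <- as.Date(paste(HR$FY + 1, "06", "30", sep = "-"))
```

If FY is a factor instead, as.numeric(as.character(HR$FY)) would be needed, since as.numeric() on a factor returns the level codes.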

Error in col2rgb(d) : invalid color name in tweenr

I'm getting this error a lot when using tweenr in RStudio on a Mac, but I'm unable to replicate it using a dummy dataset. My dataset is a list of data frames to which I want to apply tween_states. It works fine on dummy data, but with real data it always returns Error in col2rgb(d) : invalid color name and recognises my first character column as a 'color'.
Hard to be sure, but I think you are passing too many columns to the tweenr function.
The data you send to the tweenr function should be trimmed column-wise to contain only the columns used as argument names and one additional column of values that will be tweened.
Getting the same issue! I fixed it by making sure the first column only has numbers, no strings. For whatever reason the first column is interpreted as colors if it contains strings. I didn't need to trim any columns down as the other poster suggested.
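Before calling tweenr, it may help to check which columns in each data frame are character columns. The sketch below uses base R only and does not call tweenr itself. The list frames and its columns are made up for illustration, and whether dropping the text columns is acceptable depends on your data.

```r
# Hypothetical list of data frames with one text column each.
frames <- list(
  data.frame(name = c("a", "b"), x = c(1, 2), y = c(3, 4),
             stringsAsFactors = FALSE),
  data.frame(name = c("a", "b"), x = c(2, 3), y = c(5, 6),
             stringsAsFactors = FALSE)
)

# Report the class of every column in every frame.
col_classes <- lapply(frames, function(d) {
  vapply(d, function(col) class(col)[1], character(1))
})

# Keep only the numeric columns of each frame.
numeric_only <- lapply(frames, function(d) {
  d[, vapply(d, is.numeric, logical(1)), drop = FALSE]
})
```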

R: Error in .Primitive, non-numeric argument to binary operator

I did some reading on similar SO questions, but couldn't figure out how to resolve my error.
I have written the following string of code:
points[paste0(score.avail,"_pts")] <-
Map('*', points[score.avail], mget(paste0(score.avail,'_m')) )
Essentially, I have a list of columns in the 'points' data frame, defined by 'score.avail'. I am multiplying each of the columns by a respective constant, defined as the paste0(score.avail, '_m') expression. It appends new fields based on the multiplication, given by paste0(score.avail, "_pts") expression.
I have used this function before in a similar setup with no issues. However, I am now getting the following error:
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
non-numeric argument to binary operator
I'm pretty sure R is telling me that one of the fields I'm trying to multiply is not numeric. However, I have checked all my fields, and they are numeric. I have even tried running a line as.numeric(score.avail) but that doesn't help. I also ran the following to remove NA's in the fields (before the Map function above).
for(col in score.avail){
points[is.na(get(col)) & (data.source == "average" |
data.source == "averageWeighted"), (col) := 0]}
The thing that stumps me is that this expression has worked with no issues before.
Update
I did some more digging by separating out each component of my original function. I'm getting odd output when running points[score.avail]. Previously when I ran this, it would return just the columns for all of my rows. Now, however, I'm getting none of the rows in my original data frame -- rather, it is imputing the column names in the 'score.avail' list as rows and filling in NA's everywhere (this is clearly the source of my problem).
I think this is because the object I'm pointing to is a data.table with key variables set. Previously, with this function, I had been pointing to a data frame.
Off to try a few more things.
Another Update
I was able to solve my problem by copying the 'points' object using as.data.frame(). However, I will leave the question open to see if anyone knows how to reset the data table key vars so that the function I specified above will work.
I was able to solve my problem by copying the 'points' object using as.data.frame(). Apparently classifying the object as a data.table was causing my headaches.
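For anyone who wants to keep the data.table, here is a minimal sketch of the same multiplication in data.table syntax. On a keyed data.table, points[score.avail] is treated as a lookup of those strings in the key rather than a column selection, which would explain the rows of NA's. The sample columns and the multipliers list are made up for illustration; the original code fetched its constants with mget().

```r
library(data.table)

# Hypothetical keyed data.table standing in for 'points'.
points <- data.table(id = c("a", "b"), pass = c(1, 2), rush = c(3, 4))
setkey(points, id)

score.avail <- c("pass", "rush")
# Hypothetical constants, one per column, in the same order as score.avail.
multipliers <- list(pass_m = 0.5, rush_m = 2)

# Select columns with .SDcols instead of points[score.avail], and add the
# new columns by reference with :=.
new_cols <- paste0(score.avail, "_pts")
points[, (new_cols) := Map(`*`, .SD, multipliers), .SDcols = score.avail]
```

To select columns without assigning, points[, score.avail, with = FALSE] returns them regardless of the key, and setkey(points, NULL) removes the key.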

convert period in stata to NA in r

I have a dataset in Stata and I want to take it to R, but there are some missing values in Stata and they are represented using a period. I want to get the data into R, which I do by loading the foreign package and then using the read.table() function. How do I convert the periods in Stata which are genuinely missing to NA in R?
If I understand you correctly, you first load the foreign package in order to read a .dta file, correct?
library("foreign")
Then you would read in your Data by using:
myRFile <- read.dta(file="someStataFile.dta")
You are asking for a way to convert the missing value from Stata, often denoted by a dot (.), to the missing value in R, NA, correct?
One thing to know here is that Stata handles missing values "behind the scenes" in multiple ways. There are actually about 27 different missing values in Stata, which are usually not distinguishable for the user. You do not need to know them for your problem, though, because read.dta() handles them itself.
To learn how you can tackle a simple problem like this yourself in the future, you always need to check the help file for your function first:
help(read.dta)
Here you see, that the function handles the extensive missing-data types from Stata automatically and correctly.
If you want to have information about which type of missing operator was recognized, you can set the argument missing.type=TRUE, by using:
myRFile <- read.dta(file="someStataFile.dta", missing.type=TRUE)
Then, according to the help file, the following will happen:
If missing.type is TRUE a separate list is created with the same
variable names as the loaded data. For string variables the list value
is NULL. For other variables the value is NA where the observation is
not missing and 0–26 when the observation is missing. This is attached
as the "missing" attribute of the returned value.
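The question also mentions read.table(). If the Stata data was exported to a delimited text file first, so that the periods appear literally in the file, the na.strings argument of read.table() can map them to NA. This is a minimal sketch under that assumption. The column names and values are made up.

```r
# Hypothetical text export in which Stata's "." marks missing values.
txt <- "id income region
1 52000 north
2 . south
3 48000 ."

# na.strings tells read.table() which strings to read as NA.
df <- read.table(text = txt, header = TRUE, na.strings = ".",
                 stringsAsFactors = FALSE)
```

Because the periods are converted while the file is read, the income column still comes back as a numeric column rather than text.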

Limiting Window Size and/or Removing Specific Rows of Time Values In R

I'm trying to figure out how to observe just one particular section of the data in the graph below (e.g. 5pm onwards). I know there are basically two methods of doing this:
1) Method 1: Limiting the window size, which requires the following function:
symbols(Data$Times, Data$y, circles=Data$z, xlim=c("5:00pm","10:00pm"))
The problem is, I get an "invalid 'xlim' value" error when I try to input the two time endpoints.
2) Method 2: Clearing out the rows in Data$Times that have values over 5pm.
The problem here is that I'm not sure how to sort the rows by earliest time -> latest time OR how to define a new variable such that TimesPM <- Data$Times>"5pm" (what I typed just now obviously did not work.)
Any ideas? Thanks in advance.
ETA: This is what I plotted:
Times<-strptime(DATA$Time,format="%I:%M%p")
symbols(Times, y, circles=z, xaxt='n', inches=.4, fg="3", bg=(a), xlab="Times", ylab="y")
axis.POSIXct(1, at=Times, format="%I:%M%p")
Both approaches have the problem that in all likelihood your datetime format will not equal the values expressed just as a character vector like "5:00pm", even after coercion with the ">" comparison operator. To get the best advice you need to present str(DATA$Times) or dput(head(DATA$Times)) or class(Data$Times). Generally, plotting functions recognize either valid date or datetime classes or their numeric representation. If the ordering operation is not working, then it raises the question of whether you have a proper class. But you appear to have an axis labeling that suggests a date-time format of some sort, and we just need to figure out what class it really is.
Because you are creating a character vector from your Time column, you probably want to apply the restriction before you send the DATA$Time vector to strptime(). You still have not offered the requested clarifications, so I have no way to give tested or even very specific code, but you might be doing something like
Times<-strptime(DATA$Time[ as.POSIXlt(DATA$Time)$hour >= 17 &
as.POSIXlt(DATA$Time)$hour <= 22 ] ,
format="%I:%M%p")
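Here is a self-contained sketch of the same idea. It assumes DATA$Time holds strings like "5:00pm", which the question never confirmed, and it parses the strings first and filters on the parsed hour. The sample data is made up, and parsing %p relies on a locale that has AM/PM markers.

```r
# Hypothetical sample data in the "%I:%M%p" format used in the question.
DATA <- data.frame(Time = c("3:30pm", "5:00pm", "7:45pm", "11:15pm"),
                   y = c(1, 2, 3, 4),
                   z = c(0.5, 1, 1.5, 2),
                   stringsAsFactors = FALSE)

# Parse the strings to date-times; today's date is filled in automatically.
Times <- as.POSIXct(strptime(DATA$Time, format = "%I:%M%p"))

# Keep the rows whose hour is between 17 and 22 (5pm to 10pm).
hour <- as.POSIXlt(Times)$hour
keep <- hour >= 17 & hour <= 22
evening <- DATA[keep, ]
evening_times <- Times[keep]
```

The filtered evening_times, evening$y and evening$z can then be passed to symbols() in place of the full vectors.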
