Split vector of floats by whole integer value - r

Suppose that I have a vector like the following:
> head(samp)
[1] 1959.000 1959.083 1959.167 1959.250 1959.333 1959.417
> tail(samp)
[1] 1997.500 1997.583 1997.667 1997.750 1997.833 1997.917
This vector represents x-values for a plot that I am constructing. I want to superimpose each year's values on top of one another for my plot. To do so, I figure that I have to split this samp vector by whole integer value.
What is the easiest way to do so?
The only solution I have come up with is taking a sequence of all the years with
years <- seq(floor(min(samp)), ceiling(max(samp)))
and then looping through the years and indexing to find the values belonging to each year. It feels like there should be a way to cut my vector up by year more easily than with an explicit loop, though.

I'll just turn my comment into an answer:
You are looking for the split function (see ?split for some examples).
It takes as arguments your vector and a grouping vector of the same length (a factor; a numeric vector is OK too, since it is converted to a factor) that defines how to group the values. The output of split is a list with one element per group.
samp <- c(1959.000, 1959.083, 1959.167, 1959.250, 1960.000, 1960.083)
split(samp, floor(samp))
#### $`1959`
#### [1] 1959.000 1959.083 1959.167 1959.250
####
#### $`1960`
#### [1] 1960.000 1960.083
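Since the goal is to superimpose the years, the same grouping can split the x- and y-values together, with samp - floor(samp) serving as a shared within-year axis. A minimal sketch: the y-values are placeholders because the question does not show them, and the names y, year, pos, pos_by_year and y_by_year are hypothetical.

```r
# Hypothetical data: two years of monthly x-values (same spacing as samp in
# the question) and placeholder y-values of the same length
samp <- 1959 + (0:23) / 12
y    <- seq_along(samp)

year <- floor(samp)   # grouping vector: the whole year of each value
pos  <- samp - year   # position within the year, in [0, 1)

# Split x-positions and y-values by the same grouping; each result is a
# named list with one element per year ("1959", "1960")
pos_by_year <- split(pos, year)
y_by_year   <- split(y, year)
```

Each year can then be drawn on shared axes, for example with plot() for the first year's pos/y pair and lines() for each remaining year.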

Related

How to remove some values from a 4-dimensional matrix?

I'm working with a 4-dimensional matrix (Year, Simulation, Flow, Time instant: 10x5x20x10) in R. I need to remove some values from the matrix. For example, for year 1 I need to remove simulations number 1 and 2; for year 2 I need to remove simulation number 5.
Can anyone suggest how I can make such changes?
Arrays (which is how R documentation usually refers to higher-dimensional 'matrices') can be indexed with negative values in the same way as matrices or vectors: a negative value removes the corresponding row/column/slice. So if you wanted to remove year 1 completely (for example), you could use a[-1,,,]; to remove simulation 5 completely, a[,-5,,].
However, arrays can't be "ragged": there has to be something in every row/column/slice combination. You could replace the values you want to remove with NAs (and then make sure to account for the NAs appropriately when computing, e.g. by using na.rm = TRUE in sum()/min()/max()/median()/etc.): a[1,1:2,,] <- NA or a[2,5,,] <- NA in your examples.
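A short sketch of both techniques on an array with the question's dimensions; the values are random placeholders (not the asker's data) and the names no_year1 and no_sim5 are hypothetical.

```r
# Array with the question's dimensions: Year x Simulation x Flow x Time
set.seed(1)
a <- array(runif(10 * 5 * 20 * 10), dim = c(10, 5, 20, 10))

# Negative indices drop whole slices, so the dimensions shrink
no_year1 <- a[-1, , , ]   # 9 x 5 x 20 x 10
no_sim5  <- a[, -5, , ]   # 10 x 4 x 20 x 10

# Dropping only some Year/Simulation combinations is impossible in an array,
# so mask them with NA instead
a[1, 1:2, , ] <- NA       # year 1, simulations 1 and 2
a[2, 5, , ]   <- NA       # year 2, simulation 5

sum(is.na(a))             # (2 + 1) * 20 * 10 = 600 masked cells
mean(a, na.rm = TRUE)     # summaries must skip the NAs explicitly
```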
If you knew that all values of Flow and Time would always be present, you could store your data as a list of lists of matrices: e.g.
results <- list(Year1 = list(Simulation1 = matrix(...),
                             Simulation2 = matrix(...),
                             ...),
                Year2 = list(Simulation1 = matrix(...),
                             Simulation2 = matrix(...),
                             ...))
Then you could easily remove years, or simulations within years, by setting them to NULL. However, this would make indexing a little harder (e.g. "retrieve Simulation1 values for all years" would require an lapply or a loop across years).
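A minimal sketch of the list-of-lists layout, assuming 2 years with 3 simulations each and random placeholder matrices; the helper make_sim() and the name sim1_all_years are hypothetical.

```r
# Hypothetical nested structure: 2 years x 3 simulations, each simulation a
# Flow x Time matrix (20 x 10, as in the question) of made-up values
make_sim <- function() matrix(runif(20 * 10), nrow = 20, ncol = 10)
results <- list(
  Year1 = list(Simulation1 = make_sim(), Simulation2 = make_sim(), Simulation3 = make_sim()),
  Year2 = list(Simulation1 = make_sim(), Simulation2 = make_sim(), Simulation3 = make_sim())
)

# Drop individual simulations within a year by assigning NULL
results$Year1$Simulation1 <- NULL
results$Year1$Simulation2 <- NULL
results$Year2$Simulation3 <- NULL

# "Simulation1 for all years" now needs lapply (or a loop) across years;
# [[ returns NULL for a year where that simulation was removed
sim1_all_years <- lapply(results, function(yr) yr[["Simulation1"]])
```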

R: Finding duplicates in a data frame and recording them in vectors

I am trying to create some lines on a graph based on a third coordinate (x, y, temp). I would like to get a vector of indices so I can split them into x and y vectors for each duplicated temperature. To make this clearer, I will include my actual data set:
[DataFrame image]
I am trying to make multiple lines that have the same temp value. For example, I would like to have the following coordinates on the same line [0,14] [0,22] [0,26] [0,28]. They all have the temp value of 5.8. Once I find the duplicates, I will record the indexes in a vector which will allow me to retrieve the x and y coordinates. One other aspect is that I will not always know how many entries are going to be in the data.frame.
My question is: how can I find the duplicates and store their indices in a vector? Once I have the indices for the duplicated temps, I can grab their x and y coordinates and use them to create lines.
If you can answer my question or have any advice on how I can do this better, all help is appreciated.
Consider the following:
df <- data.frame(temp = sample.int(n = 3, size = 5, replace = TRUE))  # random, so your values will differ
df
temp
1 3
2 3
3 1
4 3
5 1
duplicated(df$temp)
[1] FALSE TRUE FALSE TRUE TRUE
which(duplicated(df$temp))
[1] 2 4 5
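Note that duplicated() flags only the second and later occurrences of a value, so in the output above the first 3 (row 1) and the first 1 (row 3) are not flagged. For the asker's goal of collecting all row indices per temperature, here is a sketch on made-up data in the question's (x, y, temp) shape; the first four rows echo the coordinates quoted in the question, the rest are invented, and is_dup and idx_by_temp are hypothetical names.

```r
# Made-up data in the question's shape (x, y, temp)
df <- data.frame(x    = c(0, 0, 0, 0, 1, 1),
                 y    = c(14, 22, 26, 28, 14, 22),
                 temp = c(5.8, 5.8, 5.8, 5.8, 6.1, 7.0))

# OR-ing with fromLast = TRUE also flags the first occurrence of each value
is_dup <- duplicated(df$temp) | duplicated(df$temp, fromLast = TRUE)
which(is_dup)   # 1 2 3 4

# Row indices grouped by temp value, one list element per temperature
idx_by_temp <- split(seq_len(nrow(df)), df$temp)
```

The x/y pairs for one temperature are then df[idx_by_temp[["5.8"]], c("x", "y")]. Grouping this way relies on exact equality of the temp values.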
You've stated in the comments that you're looking to make an isopleth graph. The procedure you have described will not generate anything resembling an isopleth graph. Since it looks like your data is arranged in a regular grid, you should do something like the solutions presented in this question and answer, which use functions specifically designed for extracting contours from a grid of values. Another option is the contourLines function in the grDevices package. If you want higher-resolution, less jagged contours, you might look into using either the interp.surface or Krig functions from the fields package to interpolate your data to the resolution you require.
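A minimal sketch of contourLines() on a regular grid. The temperature field z is invented (a simple plane) and only stands in for the asker's gridded data, which is not reproduced here.

```r
# Regular grid and a made-up "temperature" field (a plane, for illustration)
x <- seq(0, 10, by = 0.5)
y <- seq(0, 30, by = 1)
z <- outer(x, y, function(a, b) a + b / 7)   # z[i, j] is the value at (x[i], y[j])

# Extract the isopleth(s) at temp = 5.8; each element of the result is a
# list with components level, x and y describing one contour line
cl <- contourLines(x, y, z, levels = 5.8)
```

Each piece can then be added to an existing plot with lines(cl[[i]]$x, cl[[i]]$y).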

Retrieving minimum non-numeric value

This might be too simple a question, but I'm still familiarising myself with R syntax.
I have a data frame with 2 columns and 3 rows:
The first column is a numeric vector from 1 to 3.
The second column is a character vector with values: best, good, worst.
Which function should I use to obtain the minimum non-numeric value (i.e. "worst")?
Another solution would be to use an ordered factor for the character variable. This way min will know what to do:
dat <- data.frame(a=1:3, b=c("worst","good","best"))
dat$b <- ordered(dat$b, levels=c("worst","good","best"))
min(dat$b)
Result:
> min(dat$b)
[1] worst
Levels: worst < good < best
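The ordered factor matters because min() on a plain character vector compares strings alphabetically. A short sketch contrasting the two and using the minimum to pick out the matching row; the names b_chr and b_ord are hypothetical.

```r
b_chr <- c("worst", "good", "best")
min(b_chr)                  # "best": plain characters compare alphabetically

b_ord <- ordered(b_chr, levels = c("worst", "good", "best"))
as.character(min(b_ord))    # "worst": the lowest level wins

# Rows whose b equals the minimum level
dat <- data.frame(a = 1:3, b = b_ord)
dat[dat$b == min(dat$b), ]
```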

subset indexing in r

I have a data frame ma.
It has a factor called type, whose levels are: I210, I210plus, I210plusc, KV2c, KV2cplus.
I'd like to put some of these levels in a vector, say selected_types:
selected_types <- c("I210plusc", "KV2c")
and then have this command subset the data frame ma:
ma1 <- subset(ma, type == selected_types)
so that ma1 is a subset of ma consisting only of the observations with type I210plusc or KV2c.
However, when I do this, the number of observations in ma1 is less than the total number of occurrences of the two types in ma.
Any ideas on what I'm doing incorrectly?
Thank you
I originally had this in a comment, but it's a bit lengthy, plus I wanted to add to it. Here are some details on what's happening:
what you're doing with == is recycling your length-two vector, so that every odd row is compared to "I210plusc" and every even row to "KV2c". Your final result is therefore the data frame of odd rows that are "I210plusc" and even rows that are "KV2c".
An alternate solution that might make the issue clear is as follows:
subset(ma, type == selected_types[[1]] | type == selected_types[[2]])
Or, more gracefully:
subset(ma, type %in% selected_types)
The %in% operator returns a logical vector of the same length as type, with TRUE for every position in type that "is in" selected_types (hence the name of the operator).
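A small reproducible illustration of the recycling problem, using a made-up ma with six rows (the asker's real data is not shown); wrong and right are hypothetical names.

```r
# Made-up data with an even number of rows, so == recycles without a warning
ma <- data.frame(type = factor(c("KV2c", "I210plusc", "I210",
                                 "KV2c", "I210plusc", "KV2c")))
selected_types <- c("I210plusc", "KV2c")

# Odd rows are compared with "I210plusc", even rows with "KV2c":
# rows 1-3 give FALSE, rows 4-6 give TRUE
wrong <- subset(ma, type == selected_types)
nrow(wrong)   # 3

# %in% tests each row against every element of selected_types
right <- subset(ma, type %in% selected_types)
nrow(right)   # 5
```

With an odd number of rows, == still recycles, but R additionally warns that the longer object length is not a multiple of the shorter one.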

R: Assigning value to a matrix with variable name

I'm struggling to remove a row in a matrix whose name is "unknown". By "unknown" I mean that there are several matrices, and the last 3 characters of each matrix's name are different.
An example would make this a lot clearer, I think.
Say I have 3 matrices, Trades_ABC, Trades_DEF, Trades_HIJ. Each of these matrices has x rows and 5 columns.
I currently have the following code:
for (k in 1:3)
assign(get(paste0("Trades_",sellLeg))[1,1],y)
next k
where sellLeg is one of "ABC", "DEF", "HIJ".
In this code I am trying to change the value of the first element in each of the three matrices to some number, represented by "1", as an example. In reality, I'm not so much looking to CHANGE a value as I am looking to REMOVE a row, but my main problem is that I don't know how to assign a value to a matrix with an "unknown" name (once I can do this, I should be able to remove a row).
Many thanks!
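A minimal sketch, assuming the three matrices live in the global environment and contain placeholder values. assign() takes a variable name as a string, so it cannot write into an element such as [1,1] directly; one approach is to fetch the matrix with get(), modify the copy, and write it back with assign(). (R's for loop also has no "next k" terminator; next skips to the following iteration, and braces group the loop body.) Keeping the matrices in a named list is often simpler. The names suffix, nm, m and trades are hypothetical.

```r
# Hypothetical matrices named as in the question (4 rows, 5 columns each)
for (suffix in c("ABC", "DEF", "HIJ")) {
  assign(paste0("Trades_", suffix), matrix(1:20, nrow = 4, ncol = 5))
}

# Option 2 setup: collect the matrices into a named list (copies)
trades <- mget(paste0("Trades_", c("ABC", "DEF", "HIJ")))

# Option 1: fetch by name with get(), modify the copy, write back with assign()
for (sellLeg in c("ABC", "DEF", "HIJ")) {
  nm <- paste0("Trades_", sellLeg)
  m  <- get(nm)
  m  <- m[-1, , drop = FALSE]   # remove the first row
  assign(nm, m)
}

# Option 2: work on the list; no name construction needed afterwards
trades <- lapply(trades, function(m) m[-1, , drop = FALSE])
```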
