Looping through items on a list in R - r

this may be a simple question but I'm fairly new to R.
What I want to do is to perform some kind of addition on the indexes of a list, but once I get to a maximum value it goes back to the first value in that list and start over from there.
for example:
x <-2
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
data[x]
1
data[x+12]
1
data[x+13]
3
or something functionaly equivalent. In the end i want to be able to do something like
v=6
x=8
y=9
z=12
values <- c(v,x,y,z)
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
set <- c(data[values[1]],data[values[2]], data[values[3]],data[values[4]])
set
5 7 8 11
values <- values + 8
set
1 3 4 7
I've tried some stuff with additon and substraction to the lenght of my list but it does not work well on the lower numbers.
I hope this was a clear enough explanation,
thanks in advance!

We don't need a loop here as vectors can take vectors of length >= 1 as index
data[values]
#[1] 5 7 8 11
NOTE: Both the objects are vectors and not list
If we need to reset the index
values <- values + 8
ifelse(values > length(data), values - length(data) - 1, values)
#[1] 1 3 4 7

Related

How can I troubleshoot the delete row function

I am attempting to delete a row like this:
data <- data[-1645,]
However, after running the code, the row is still there. I can tell because there is an outlier in that row that is showing up on all my graphs, and when I view the data I can sort a column to easily find the offending outlier. I have had no trouble deleting rows in the past- has anyone run into anything similar? I do understand the limitations of outlier removal and I don't typically remove them however for a number of reasons I would like to see what the data look like without this one (in this case, all other values in the response variable are between -1 and 0, and in this row the value is 10^4).
You really need to provide more information, but there are several ways you can troubleshoot the problem. The first one is to print out the line you are removing:
data[1645, ]
Is that the outlier? You did not tell us how you identified the outlier. If lines have been removed from the data frame, the row names are not changed but the index values are changed, e.g.
set.seed(42)
x <- sample.int(25)
y <- sample.int(25)
data <- data.frame(x, y)
head(data)
# x y
# 1 17 2
# 2 5 8
# 3 1 3
# 4 10 1
# 5 4 10
# 6 18 11
data <- data[-c(5, 10, 15, 20, 25), ]
head(data)
# x y
# 1 17 2
# 2 5 8
# 3 1 3
# 4 10 1
# 6 18 11
# 7 25 15
data[6, ]
# x y
# 7 25 15
data["6", ]
# x y
# 6 18 11
Notice that the 6th row of the data has a row name of "7" but the row with name "6" is the 5th row in the data frame because we deleted the 5th row. The which function will give you the index value, but if you identified the outlier by looking at the printout, you got the row name and that may be different from the index. If we want to remove values in x greater than 24, here is one way to do that:
data[data$x<25, ]
After playing around with the data, I think the best explanation is that the indexing is off. This is in line with what dcarlson was saying- that it could be removing the 1,645th row, it just isn't labelled as such. I think the best solution is to use subset:
data <- subset(data, Yield.Decline < 100)
This is a more robust solution than trying to remove any given row based on its value (the line can be accidentally run multiple times without erroneously removing additional lines).

Create a new vector with no element being in the same position as the original vector?

Suppose I have a vector V1 (with two or more elements):
V1 <- 1:10
I can reorder the original vector with the function sample. This function, however, cannot make sure that none element in the new vector being in the same position as the original vector. For example:
set.seed(4)
V2 <- sample(V1)
This will result in a vector that has two elements being in the same position as the original one:
V1[V1 == V2]
3 5
My question is: Is it possible to generate a random vector to make sure that no element being in the same position between the two vectors?
Your requirement of not having certain indices in the vector not being able to shift means that you don't want a purely random permutation, where that might happen. The best I could come up with is to just loop, using sample until we find a vector where every element shifts:
v1 <- 1:10
v1_perm <- v1
cnt <- 0
while (sum(v1 == v1_perm) > 0) {
v1_perm <- sample(v1)
cnt <- cnt + 1
}
v1
v1_perm
paste0("It took ", cnt, " tries to find a suitable vector")
[1] 1 2 3 4 5 6 7 8 9 10
[1] 3 10 4 7 8 1 6 2 5 9
[1] "It took 3 tries to find a suitable vector"
Demo
Note that I have implemented the requirement of shifting positions with shifting values. This of course isn't strictly true, because two values could be the same. But, assuming all your entries are unique, then checking for zero overlap of values equates with zero overlap of indices.

Apply an operation to some elements of a vector by using indices

I've got a fairly basic question concerning vector operations in R. I want to apply a certain operation (i.e. increment) to specific elements of a vector by using a vector containing the indices of the elements.
For example:
ind <- c(2,5,8)
vec <- seq(1,10)
I want to add 1 to the 2nd, 5th and 8th element of vec. In the end I'd like to have:
vec <- c(1,3,3,4,6,6,7,9,8,10)
I tried vec[ind] + 1
but that returns only the three elements. I could use a for-loop, of course, but knowing R, I'm sure there's a more elegant way.
Any help would be much appreciated.
We have to assign it
vec[ind] <- vec[ind] + 1
vec
#[1] 1 3 3 4 6 6 7 9 9 10

Create vector by given distibution of values

Let's say I have a vector a = (1,3,4).
I want to create new vector with integer numbers in range [1,length(a)]. But the i-th number should appear a[i] times.
For the vector a I want to get:
(1,2,2,2,3,3,3,3)
Would you explain me how to implement this operation without several messy concatenations?
You can try rep
rep(seq_along(a), a)
#[1] 1 2 2 2 3 3 3 3
data
a <- c(1,3,4)

Applying a function on each row of a data frame in R

I would like to apply some function on each row of a dataframe in R.
The function can return a single-row dataframe or nothing (I guess 'return ()' return nothing?).
I would like to apply this function on each of the rows of a given dataframe, and get the resulting dataframe (which is possibly shorter, i.e. has less rows, than the original one).
For example, if the original dataframe is something like:
id size name
1 100 dave
2 200 sarah
3 50 ben
And the function I'm using gets a row n the dataframe (i.e. a single-row dataframe), returns it as-is if the name rhymes with "brave", otherwise returns null, then the result should be:
id size name
1 100 dave
This example actually refers to filtering a dataframe, and I would love to get both an answer specific to this kind of task but also to a more general case when even the result of the helper function (the one that operates on a single row) may be an arbitrary data frame with a single row. Please note than even in the case of filtering, I would like to use some sophisticated logic (not something simple like $size>100, but a more complex condition that is checked by a function, let's say boo(single_row_df).
P.s.
What I have done so far in these cases is to use apply(df, MARGIN=1) then do.call(rbind ...) but I think it give me some trouble when my dataframe only has a single row (I get Error in do.call(rbind, filterd) : second argument must be a list)
UPDATE
Following Stephen reply I did the following:
ranges.filter <- function(ranges,boo) {
subset(x=ranges,subset=!any(boo[start:end]))
}
I then call ranges.filter with some ranges dataframe that looks like this:
start end
100 200
250 400
698 1520
1988 2147
...
and some boolean vector
(TRUE,FALSE,TRUE,TRUE,TRUE,...)
I want to filter out any ranges that contain a TRUE value from the boolean vector. For example, the first range 100 .. 200 will be left in the data frame iff the boolean vector is FALSE in positions 100 .. 200.
This seems to do the work, but I get a warning saying numerical expression has 53 elements: only the first used.
For the more general case of processing a dataframe, get the plyr package from CRAN and look at the ddply function, for example.
install.packages(plyr)
library(plyr)
help(ddply)
Does what you want without masses of fiddling.
For example...
> d
x y z xx
1 1 0.68434946 0.643786918 8
2 2 0.64429292 0.231382912 5
3 3 0.15106083 0.307459540 3
4 4 0.65725669 0.553340712 5
5 5 0.02981373 0.736611949 4
6 6 0.83895251 0.845043443 4
7 7 0.22788855 0.606439470 4
8 8 0.88663285 0.048965094 9
9 9 0.44768780 0.009275935 9
10 10 0.23954606 0.356021488 4
We want to compute the mean and sd of x within groups defined by "xx":
> ddply(d,"xx",function(r){data.frame(mean=mean(r$x),sd=sd(r$x))})
xx mean sd
1 3 3.0 NA
2 4 7.0 2.1602469
3 5 3.0 1.4142136
4 8 1.0 NA
5 9 8.5 0.7071068
And it gracefully handles all the nasty edge cases that sometimes catch you out.
You may have to use lapply instead of apply to force the result to be a list.
> rhymesWithBrave <- function(x) substring(x,nchar(x)-2) =="ave"
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+ if(rhymesWithBrave(dfr[i,"name"])) dfr[i,] else NULL,
+ dfr))
id size name
1 1 100 dave
But in this case, subset would be more appropriate:
> subset(dfr,rhymesWithBrave(name))
id size name
1 1 100 dave
If you want to perform additional transformations before returning the result, you can go back to the lapply approach above:
> add100tosize <- function(x) within(x,size <- size+100)
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+ if(rhymesWithBrave(dfr[i,"name"])) add100tosize(dfr[i,])
+ else NULL,dfr))
id size name
1 1 200 dave
Or, in this simple case, apply the function to the output of subset.
> add100tosize(subset(dfr,rhymesWithBrave(name)))
id size name
1 1 200 dave
UPDATE:
To select rows that do not fall between start and end, you might construct a different function (note: when summing result of boolean/logical vectors, TRUE values are converted to 1s and FALSE values are converted to 0s)
test <- function(x)
rowSums(mapply(function(start,end,x) x >= start & x <= end,
start=c(100,250,698,1988),
end=c(200,400,1520,2147))) == 0
subset(dfr,test(size))
It sounds like you want to use subset:
subset(orig.df,grepl("ave",name))
The second argument evaluates to a logical expression that determines which rows are kept. You can make this expression use values from as many columns as you want, eg grepl("ave",name) & size>50

Resources