What does index do in R?

I am working with some code that contains the following line:
data2 <- apply(data1[,-c(1:(index-1))],2,log)
I understand that this creates a new object from data1, with the values log-transformed column-wise and some columns eliminated, but I don't understand how the columns are removed. What does 1:(index-1) do exactly?

The ":" operator creates an integer sequence. Because 1:(index-1) is numeric and is used in the second position of the extraction operator "[" applied to a data frame, it refers to column numbers. The person writing the code didn't need the c function. It could have been written more economically:
data1[,-(1:(index-1))]
# but the outer "("...")"'s are needed so it starts at 1 rather than -1
So it removes the first index-1 columns from the object passed to apply. (As MrFlick points out, index must have been defined before this line is run. There is no default value or interpretation for index in R.)

Suppose index is 5. Then index-1 is 4, so the sequence runs from 1 to 4. The leading - drops those columns, and because MARGIN = 2, apply loops over the remaining columns, i.e. all columns other than the first 4.
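A small sketch of the behaviour described in the answers above. The names data1 and index mirror the question, but the values are made up for illustration (positive numbers, so that log is defined):

```r
# Hypothetical data: 3 rows, 6 columns named V1..V6
data1 <- as.data.frame(matrix(1:18, nrow = 3))
index <- 5   # assumed value; index must be defined by the user

1:(index - 1)                       # the sequence 1 2 3 4
kept <- data1[, -(1:(index - 1))]   # drops columns 1 to 4, keeps V5 and V6
names(kept)

# apply() over columns (MARGIN = 2) returns a matrix, not a data frame
data2 <- apply(kept, 2, log)
```

Note that the result of apply here is a matrix; wrapping it in as.data.frame would be needed if a data frame is really wanted.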

Related

R programming- adding column in dataset error

cv.uk.df$new.d[2:nrow(cv.uk.df)] <- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1) # this line of code works
I want to know why we use -1 in tail and -1 in head to create this new column.
I made an effort to understand by removing the -1, and R (the code is in RStudio) throws an error.
Could anyone shed some light on this? I can't explain how much I would appreciate it.
Look at what is being done. On the left-hand side of the assignment operator, we have:
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
Let's pick this apart.
cv.uk.df # This is the data.frame
$new.d # a new column to assign or a column to reassign
[2:nrow(cv.uk.df)] # the rows which we are going to assign
Specifically, this line of code will assign a new value to all rows of this column except the first. Why would we want to do that? We don't have your data, but from your example, it looks like you want to calculate the change from one row to the next. That calculation is invalid for the first row (there is no previous row).
Now let's look at the right-hand side.
<- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)
The cv.uk.df$deaths column has the same number of rows as the data.frame. R gets grouchy when the numbers of elements don't follow some rules. For data.frames, the right-hand side needs to have the same number of elements, or a number that can be recycled a whole number of times. For example, if you have 10 rows, you need a replacement of 10 values, or you can supply 5 values that R will recycle twice.
If your data.frame has 100 rows, only 99 are being replaced in this operation. You cannot feed 100 values into an operation that expects 99. We need to trim the data. Let's look at what is happening. The tail() function has the usage tail(x, n), where it returns the last n values of x. If n is a negative integer, tail() returns all values but the first n. The head() function works similarly.
tail(cv.uk.df$deaths, -1) # This returns all values but the first
head(cv.uk.df$deaths, -1) # This returns all values but the last
This makes sense for your calculation. You cannot subtract the number of deaths in the row before the first row from the number in the first row, nor can you subtract the number of deaths in the last row from the number in the row after the last row. There are more intuitive ways to do this thing using functions from other packages, but this gets the job done.
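To make the trimming concrete, here is a minimal sketch with made-up numbers. The column names follow the question, the counts are hypothetical, and new.d2 is a name introduced only for comparison:

```r
# Hypothetical cumulative death counts
cv.uk.df <- data.frame(deaths = c(1, 3, 6, 10, 15))

tail(cv.uk.df$deaths, -1)   # all values but the first: 3 6 10 15
head(cv.uk.df$deaths, -1)   # all values but the last:  1 3 6 10

# Create the column first so that row 1 is explicitly NA
cv.uk.df$new.d <- NA_real_
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
  tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)

# Base R's diff() computes the same lagged differences
cv.uk.df$new.d2 <- c(NA, diff(cv.uk.df$deaths))
```

Both new columns hold NA in the first row followed by the row-to-row changes, so diff is one base R alternative that avoids the manual trimming.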

How to find the length of a list based on a condition in R

The problem
I would like to find a length of a list.
The expected output
I would like to find the length based on a condition.
Example
Suppose that I have a list of 4 elements as follows:
myve <- list(1,2,3,0)
Here I have 4 elements, one of which is zero. How can I find the length while excluding the zero values? Then, if the length is > 1, I would like to subtract one. That is:
If the length is 4 then, I would like to have 4-1=3. So, the output should be 3.
Note
Please note that I am working with a problem where the number of zero values may change from one case to another. For example, the first list may have only one zero value, while the second list may have 2 or 3 zero values.
The values are always positive or zero.
You just need to apply the condition to each element. This produces a logical vector, which you then sum to get the number of TRUE elements (i.e. the elements that satisfy your condition).
In your case:
sum(myve != 0)
In a more complex case, where the condition is expressed by a function f:
sum(sapply(myve, f))
Use sapply to flag the elements that differ from zero, and sum to count them:
sum(sapply(myve, function(x) x!=0))
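The answers stop at the count, so here is a hedged sketch that also covers the "subtract one" step, under one reading of the question (count the non-zero elements, then subtract one only if that count exceeds 1). The names n_nonzero and result are made up for illustration:

```r
myve <- list(1, 2, 3, 0)

# Count the non-zero elements; vapply() guarantees one logical per element
n_nonzero <- sum(vapply(myve, function(x) x != 0, logical(1)))

# Subtract one only when the count exceeds 1
result <- if (n_nonzero > 1) n_nonzero - 1 else n_nonzero
```

With this list, n_nonzero is 3 and result is 2. If the intended rule is different, only the last line needs to change.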

Remove single value from vector leaving other occurrences of the same value

Suppose I have a large vector of integers in which a single integer can occur in the vector multiple times. I do not know the order of the values within the vector. Consider the code below: I have vector and I want to remove a single 1 to get newVector. Since the order within the vector is not known outside this example, I cannot simply use vector[-1].
vector<-c(1,1,2,2,3)
newVector<-c(1,2,2,3)
Some background: I iteratively pick two values from the vector (using sample) and then want to remove the values I picked from the vector.
Of course I could loop through the vector until I find the first occurrence of the value I wish to remove and remove it using the index, however, that is very time consuming. All the other results I found end up removing all occurrences of the value, which I don't want.
I think this would work: which.max returns the index of the first TRUE (the first match), which we can then remove using negative subsetting.
vector[-which.max(vector == 1)]
#[1] 1 2 2 3
Also, match does the same
vector[-match(1, vector)]
#[1] 1 2 2 3
You could use match. This finds the first occurrence of the specified value returning its index
vector<-c(1,1,2,2,3)
vector[-match(1, vector)]
# [1] 1 2 2 3
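One caveat with both approaches is the case where the value does not occur at all: which.max on an all-FALSE vector returns 1, so the first element would be dropped, and match returns NA, so the subset does not give back the original vector. A minimal guarded sketch follows; remove_first is a hypothetical helper name, not a base R function:

```r
vector <- c(1, 1, 2, 2, 3)

# Hypothetical helper: drop only the first occurrence of `value`, if present
remove_first <- function(x, value) {
  i <- match(value, x)        # NA when `value` does not occur in x
  if (is.na(i)) x else x[-i]  # leave x untouched when there is no match
}

remove_first(vector, 1)   # 1 2 2 3
remove_first(vector, 9)   # unchanged: 1 1 2 2 3
```

Inside the sampling loop described in the question, this could be called once per picked value.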

Easy way of creating categories

Suppose we have two categorical variables A and B that can each take 6 values, so there are 36 possible combinations. I want to create a new variable category that enumerates these possibilities based on the values of A and B. Is there a way of doing this without hard coding?
apply(expand.grid(unique(A), unique(B)), 1, paste, collapse="")
From the innermost function to the outermost:
unique returns the unique values of its argument
expand.grid returns a data frame containing the Cartesian product of its arguments
apply applies a given function to the specified matrix/data frame/... along the given dimension (1 = rows, 2 = columns)
paste concatenates strings or vector elements
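A short sketch of how the pieces above could be combined to label each observation. The data are made up (3 and 2 levels rather than 6 and 6), and the names combos, labels and category_id are introduced here for illustration:

```r
# Hypothetical data; A and B stand in for the question's two variables
A <- c("a", "b", "a", "c")
B <- c("x", "y", "y", "x")

# One row per combination of the observed values (first variable varies fastest)
combos <- expand.grid(A = unique(A), B = unique(B), stringsAsFactors = FALSE)
labels <- apply(combos, 1, paste, collapse = "")

# Label each observation directly, then enumerate the labels as integers
category <- paste0(A, B)
category_id <- match(category, labels)
```

Here labels lists all 6 possible combinations, category holds the combination for each observation, and category_id gives its position in the enumeration.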

R Compare each data value of a column to rest of the values in the column?

I would like to create a function that looks at a column of values and, for each value individually, assesses which of the other data points is closest to it.
I'm guessing it could be done by checking the length of the data frame and making a list of the respective length in steps of 1, then using that list to reference which cell is being analysed against the rest of the column. However, I don't know how to implement that.
eg.
data:
20
17
29
33
1) is closest to 2)
2) is closest to 1)
3) is closest to 4)
4) is closest to 3)
I found this example which tests for similarity, but I'd like to know what letter it assigns to.
x=c(1:100)
your.number=5.43
which(abs(x-your.number)==min(abs(x-your.number)))
Also, if you know how I could do this, could you explain the parts of the code and what they mean?
I wrote a quick function that does the same thing as the code you provided.
The code you provided takes the absolute value of the difference between your number and each value in the vector, and compares that to the minimum of those differences. This is essentially what the which.min function that I use below does, except that which.min returns only the first index when there are ties. I go through my steps below. Hope this helps.
Make up some data
a = 1:100
yourNumber = 6
Where Num is your number, and x is a vector
getClosest=function(x, Num){
return(which.min(abs(x-Num)))
}
Then if you run this command, it should return the index for the value of the vector that corresponds to the closest value to your specified number.
getClosest(x=a, Num=yourNumber)
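The function above compares a vector against one outside number. For the original question (each value against the other values in the same column), one possible sketch is below. closest_other is a hypothetical helper name, and the approach assumes a numeric vector with at least two elements:

```r
x <- c(20, 17, 29, 33)   # the example column from the question

# Hypothetical helper: for each element, the index of the closest *other* element
closest_other <- function(x) {
  d <- abs(outer(x, x, "-"))   # matrix of all pairwise absolute differences
  diag(d) <- Inf               # exclude each value's distance to itself
  apply(d, 1, which.min)       # per row, index of the smallest distance
}

closest_other(x)   # 2 1 4 3
```

The result matches the pairing listed in the question: 1 is closest to 2, 2 to 1, 3 to 4, and 4 to 3. Because it builds a full distance matrix, this is only suitable for columns of moderate length.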
