Writing a function in R

I am doing an exercise to practice writing functions.
I'm trying to figure out the general code before writing the function that reproduces the output from the table function. So far, I have the following:
set.seed(111)
vec <- as.integer(runif(10, 5, 20))
x <- sort(unique(vec))
for (i in x) {
c <- length(x[i] == vec[i])
print(c)
}
But this gives me the following output:
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
I don't think I'm subsetting correctly in my loop. I've been watching videos, but I'm not quite sure where I'm going wrong. Would appreciate any insight!
Thanks!

We can sum the logical vector for each value and concatenate the result to count
count <- c()
for(number in x) count <- c(count, sum(vec == number))
count
#[1] 3 1 4 1 5 4 3 2 7
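The OP's for loop iterates over the values of 'x', not over its positions, so x[i] and vec[i] use those values as indices. In addition, length() counts every element of a comparison, not just the TRUE ones.
Subsetting first and then taking the length should work as well:
for(number in x) count <- c(count, length(vec[vec == number]))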
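Since the goal of the exercise is a function, the counting step above can be wrapped into one. This is only a minimal sketch, assuming a vector without NA values; the name count_values and the sample vector are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: count how often each distinct value occurs in v.
count_values <- function(v) {
  keys <- sort(unique(v))
  # sum() of a logical vector counts its TRUE elements
  counts <- vapply(keys, function(k) sum(v == k), integer(1))
  names(counts) <- keys
  counts
}

v <- c(7L, 5L, 7L, 9L, 5L, 7L)
count_values(v)
# 5 7 9
# 2 3 1
```

vapply is used instead of growing a vector with c() so that the result type and length are fixed up front.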

You can try sapply + setNames to get the same result as table, i.e.,
count <- sapply(x, function(k) setNames(sum(k==vec),k))
or
count <- sapply(x, function(k) setNames(length(na.omit(match(vec,k))),k))
such that
> count
1 2 3 4 5 6 7 8 9
3 1 4 1 5 4 3 2 7

Here is a solution that does not use unique and makes a single pass through the vector (if only R were fast with for loops!):
count = list()
for (i in vec) {
  val = as.character(i)
  if (is.null(count[[val]]))
    count[[val]] = 1
  else
    count[[val]] = count[[val]] + 1
}
unlist(count)
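For comparison with the single-pass loop above, here is a loop-free sketch. It does use unique, and relies on match() to map each element to the position of its value among the sorted distinct values and on tabulate() to count those positions. The variable names and example data are illustrative assumptions.

```r
v <- c(7L, 5L, 7L, 9L, 5L, 7L)

keys <- sort(unique(v))
# match() gives, for each element of v, its position in keys;
# tabulate() counts how often each position 1..length(keys) occurs
counts <- tabulate(match(v, keys), nbins = length(keys))
names(counts) <- keys
counts
# 5 7 9
# 2 3 1
```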

Related

Modifying a list based on the length of its shortest string

I have a list of two series that start out the same length. After executing the following code, the second series has one fewer element than the first. Is there a general way of removing the final element of only the series containing n+1 elements, so that all the series in my list have n elements? What if my list contains a combination of series with n, n+1 and n+2 elements? Below is a minimal reproducible example.
#test
library('urca')
tseries <- list("t1" = c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1), "t2" = c(1,2,3,4,5,6,7,8,9,10,9,8,7,8,9));
# apply stationarity test to the list of series
adf <- lapply(tseries, function(x) tseries::adf.test(x)$p.value)
adf
# index only series that need differencing
not_stationary <- tseries[which(adf > 0.05)]
stationary <- tseries[which(adf < 0.05)]
not_stationary <- lapply(not_stationary, diff);
# verify
adf <- lapply(not_stationary, function(x) tseries::adf.test(x)$p.value)
adf
now_stationary <- not_stationary
#combine stationary and now_stationary
tseries_diff <- c(stationary, now_stationary)
tseries_diff
#$t1
#[1] 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1
#$t2
#[1] 1 1 1 1 1 1 1 1 1 -1 -1 -1 1 1
So to summarise, I would like to remove the final element, 1, from t1, but using code that can be applied to a list of series of lengths n and n+1 (and n+2 would be useful).
Thanks!

You can find the minimum length and simply get the series up to that point, i.e.
new_series_list <- lapply(tseries_diff, function(i)i[seq(min(lengths(tseries_diff)))])
so the lengths are now the same
lengths(new_series_list)
#t1 t2
#14 14
This will work for series of any length. It will trim the longer series to match the shortest one.
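The same idea covers the n, n+1 and n+2 case from the question. A minimal sketch, where the function name trim_to_shortest and the sample list are hypothetical:

```r
# Hypothetical helper: cut every element of a list down to the shortest length.
trim_to_shortest <- function(series_list) {
  n <- min(lengths(series_list))   # computed once, not once per element
  lapply(series_list, head, n)     # head(x, n) keeps the first n values
}

series <- list(t1 = 1:5, t2 = 1:6, t3 = 1:7)   # lengths n, n+1, n+2
trimmed <- trim_to_shortest(series)
lengths(trimmed)
# t1 t2 t3
#  5  5  5
```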

Edited for a list instead of a vector -
If you are dealing with a list, you want to cut all of the series to the length of the shortest one:
(I modified the example to avoid using a library)
#test
mylist <- c(1,1,1,1,1)
mylongerlist <- c(1,1,1,1,1,1,1)
length(mylist)
# [1] 5
length(mylongerlist)
# [1] 7
#combine
tseries_diff <- list("t1" = mylist, "t2" = mylongerlist)
tseries_diff
# $t1
# [1] 1 1 1 1 1
#
# $t2
# [1] 1 1 1 1 1 1 1
# on the fly approach to truncate
lapply(tseries_diff, function(x) { length(x) <- min(lengths(tseries_diff)); x })
# $t1
# [1] 1 1 1 1 1
#
# $t2
# [1] 1 1 1 1 1
And as a function:
# As a reusable function for clear code
reduceToShortestLength <- function(toCut) {
  # takes a list and cuts the tail off of any series longer than the shortest
  # (uses the argument toCut, not the global tseries_diff)
  lapply(toCut, function(x) { length(x) <- min(lengths(toCut)); x })
}
reduceToShortestLength(tseries_diff)
# $t1
# [1] 1 1 1 1 1
#
# $t2
# [1] 1 1 1 1 1
Original below (in case anyone thinks of a vector, like I did at first)
I think you are asking how to truncate a vector to the shortest length. The head function does this well in base R.
The on-the-fly approach:
> mylist <- c(1,1,1,1,1)
> mylongerlist <- c(1,1,1,1,1,1,1)
> length(mylist)
[1] 5
> length(mylongerlist)
[1] 7
> x <- head(mylongerlist, length(mylist))
> length(x)
[1] 5
A function can be written like so:
> reduceToShorterLength<- function(toshorten, template) { head(toshorten, length(template))}
> x <- reduceToShorterLength(mylongerlist, mylist)
> length(x)
[1] 5

for loop with decimals and store results in a vector

I'm working on a for loop in R, and I need to store the results in a vector. I know that this is quite a common question, and my problem is not there, but let's proceed in order.
I got those data:
# here the data
alpha <- c(1,2,3,4,5,6)
beta <- c(0.1,0.5,0.3,0.4,0.5,0.6)
data <- data.frame(alpha, beta)
And I wrote a simple function that selects the data at or above a certain threshold:
# here the function
funny <- function(x,k)
{x[x[,2]>=k,]}
# here an example of the function
funny(data,0.5)
  alpha beta
2     2  0.5
5     5  0.5
6     6  0.6
But what I want is the number of the rows that go over the threshold, so:
# here the result wanted
nrow(funny(data,0.5))
[1] 3
So my question is: how does the number of rows over the threshold change as k, the parameter of the function, varies? I would like to have the results in a vector. I created a for loop, looking at
For loop in R with increments
Saving results from for loop as a vector in r
And I created this: first of all let's see if everything is all right:
# here the sequence
s <-seq(0.1,0.6, by = 0.1)
# here the I loop
for(i in s) {print(nrow(funny(data,i)))}
[1] 6
[1] 5
[1] 4
[1] 4
[1] 3
[1] 1
But clearly this is not stored in a vector. The problem is here. I tried:
# already written sequence
s <-seq(0.1,0.6, by = 0.1)
# here the empty vector
vec <- vector("numeric")
# here the II problematic loop
for(i in s) {vec[i]<-(nrow(funny(data,i)))}
vec
And here is the result, which is not what I want; I expected something like [1] 6 5 4 4 3 1:
[1] 0 0 0 0 0 0
Further information:
I also tried something like this:
# sequence * 10
s <-seq(1,6, by = 1)
# here the vector
vec <- vector("numeric")
# and the III loop, which works now
for(i in s) {vec[i]<-(nrow(funny(data,i/10)))}
vec
[1] 6 5 5 4 3 1
But I do not like this, because I do not understand why loop III works and loop II does not.
What am I missing?

We can try with sapply which will return a vector
sapply(s, function(x) nrow(funny(data, x)))
#[1] 6 5 4 4 3 1
As for why your loop II is not working: if you do
for(i in s) {
print(i)
}
You'll get
[1] 0.1
[1] 0.2
[1] 0.3
[1] 0.4
[1] 0.5
[1] 0.6
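So when you store with vec[i] <- in your loop II, you are actually doing vec[0.1] in the first iteration, which is not a valid position: R truncates a fractional index toward zero, so every index from 0.1 to 0.6 becomes 0 and nothing is stored at positions 1 to 6.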
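A small sketch of that index behaviour, using made-up vectors (scores is a hypothetical name) rather than the data from the question:

```r
vec <- vector("numeric")
vec[0.1] <- 6        # index 0.1 is truncated to 0, so nothing is assigned
length(vec)          # 0

scores <- c(10, 20, 30)
scores[2.9]          # 20: index 2.9 is truncated to 2
```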
To correct your loop, try
for(i in seq_along(s)) {vec[i]<-(nrow(funny(data,s[i])))}
vec
#[1] 6 5 4 4 3 1
Where seq_along(s) would return #[1] 1 2 3 4 5 6.
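Note that the results here (6 5 4 4 3 1) and those of loop III in the question (6 5 5 4 3 1) differ in the third position. This looks like floating-point rounding rather than a bug in either loop: the third element of seq(0.1, 0.6, by = 0.1) is slightly larger than 0.3, while 3/10 is the same double as the literal 0.3. A short sketch to check this, reusing the beta values from the question:

```r
s <- seq(0.1, 0.6, by = 0.1)
beta <- c(0.1, 0.5, 0.3, 0.4, 0.5, 0.6)

s[3] == 0.3                    # FALSE: s[3] is 0.1 + 2 * 0.1
3 / 10 == 0.3                  # TRUE
isTRUE(all.equal(s[3], 0.3))   # TRUE: equal within tolerance

sum(beta >= s[3])              # 4: the row with beta = 0.3 is excluded
sum(beta >= 3 / 10)            # 5: the row with beta = 0.3 is included
```

If the boundary case matters, comparing against thresholds built as integers divided by 10, as in loop III, avoids the accumulated error.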

How to append a growing array to itself efficiently

I have a data frame "v" with id and value columns, such as:
set.seed(123)
v <- data.frame(id=sample(1:5),value=sample(1:5))
v
  id value
1  2     1
2  4     3
3  5     4
4  3     2
5  1     5
In the loop, I want to find the row indices of v whose id matches each element of tmp, and then subset v using these indices.
tmp is a sample of v$id drawn with replacement.
Here is my attempt:
iter = 1
df <- vector(mode='list',length = iter)
for (i in 1:iter)
{
  tmp <- sample(v$id, replace=T)
  index.position <- NULL
  for (j in 1:length(tmp)) {index.position <- c(index.position, which(v$id %in% tmp[j]) )}
  df[[i]] <- v[index.position,]
}
tmp
[1] 1 5 3 5 2
df
[[1]]
    id value
5    1     5
3    5     4
4    3     2
3.1  5     4
1    2     1
This works as expected. However, the execution is very slow when both "v" and "iter" are large because growing the index.position array is not memory efficient.
I have also tried to create an empty matrix or list as a placeholder and then assign index.position to it as I loop, but did not really speed up the process.
(reference: Growing a data.frame in a memory-efficient manner)
Edit: id "isn't" unique in v

Try to avoid nested for loops; they are extremely inefficient here. Your code is equivalent to:
for (i in 1:iter)
{
df[[i]] <- v[sample(nrow(v),replace = T),]
}
a more verbose version of Gregor's solution...
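When id is not unique in v, as the edit to the question says, one sampled id can match several rows. A possible sketch for that case builds the id-to-rows lookup once with split() and then indexes it, instead of growing index.position inside a loop. The data frame, the fixed tmp and the name rows_by_id below are made up for illustration; in the real code tmp would come from sample(v$id, replace = TRUE).

```r
# Example data with a repeated id (5 appears in rows 3 and 6)
v <- data.frame(id = c(2, 4, 5, 3, 1, 5), value = c(1, 3, 4, 2, 5, 9))

# Build the lookup once: row numbers grouped by id
rows_by_id <- split(seq_len(nrow(v)), v$id)

tmp <- c(1, 5, 3)   # fixed here so the result is reproducible
idx <- unlist(rows_by_id[as.character(tmp)], use.names = FALSE)
idx
# [1] 5 3 6 4
v[idx, ]
```

The lookup is indexed by the id converted to a character string, which assumes the ids print the same way they were stored by split().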

How to extract a number into digits using R?

Suppose I have a number: 4321
and I want to split it into its digits: 4, 3, 2, 1
How do I do this?

Alternatively, with strsplit:
x <- as.character(4321)
as.numeric(unlist(strsplit(x, "")))
[1] 4 3 2 1

Use substring to extract character at each index and then convert it back to integer:
x <- 4321
as.integer(substring(x, seq(nchar(x)), seq(nchar(x))))
[1] 4 3 2 1

For real fun, here's an absurd method:
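digspl<-function(x){
  x<-trunc(x) # just in case
  mj<-trunc(log10(x))  # number of digits minus one
  y <- trunc(x/10^mj)  # leading digit
  # seq_len(mj) is empty for a single-digit x, unlike 1:mj
  for(j in seq_len(mj)) {
    y[j+1]<- trunc((x-y[j]*10^(mj-j+1))/(10^(mj-j)))
    x<- x - y[j]*10^(mj-j+1)
  }
  return(y)
}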

For fun, here's an alternative:
x <- 4321
read.fwf(textConnection(as.character(x)), rep(1, nchar(x)))
#   V1 V2 V3 V4
# 1  4  3  2  1
The only advantage I can think of is the possibility of exploding your input into varying widths, though I guess you can do that with substring too.

An alternative solution, using modulo operator:
get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}
# for one number
get_all_digit <- function(x) {
get_digit_x <- function(d) get_digit(x,d)
sapply(nchar(x):1, get_digit_x)
}
# for a vector of numbers
digits <- function(x) {
out <- lapply(x, get_all_digit)
names(out) <- x
out
}
Example:
> digits(100:104)
$`100`
[1] 1 0 0
$`101`
[1] 1 0 1
$`102`
[1] 1 0 2
$`103`
[1] 1 0 3
$`104`
[1] 1 0 4
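The integer-division and modulo idea can also be written in vectorised form for a single number. A minimal sketch, assuming x is a non-negative whole number; digits_of is a hypothetical name, and format(x, scientific = FALSE) is used so that large values are not counted in scientific notation:

```r
# Hypothetical helper: digits of one non-negative whole number, left to right.
digits_of <- function(x) {
  n <- nchar(format(x, scientific = FALSE))   # number of digits
  # divide by 1000, 100, 10, 1 (for n = 4) and keep the last digit of each
  (x %/% 10^((n - 1):0)) %% 10
}

digits_of(4321)
# [1] 4 3 2 1
digits_of(100)
# [1] 1 0 0
```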

R: How can I sum across variables, within cases, while counting NA as zero

Fake data for illustration:
df <- data.frame(a=c(1,2,3,4,5), b=c(2,2,2,2,NA),
                 c=c(NA,2,3,4,5))
This would get me the answer I want IF it weren't for the NA values:
df$count <- with(df, (a==1) + (b==2) + (c==3))
Also, would there be an even more elegant way if I was only interested in, e.g. variables==2?
df$count <- with(df, (a==2) + (b==2) + (c==2))
Many thanks!

The following works for your specific example, but I have a suspicion that your real use case is more complicated:
df$count <- apply(df,1,function(x){sum(x == 1:3,na.rm = TRUE)})
> df
  a  b  c count
1 1  2 NA     2
2 2  2  2     1
3 3  2  3     2
4 4  2  4     1
5 5 NA  5     0
but this general approach should work. For instance, your second example would be something like this:
df$count <- apply(df,1,function(x){sum(x == 2,na.rm = TRUE)})
or more generally you could allow yourself to pass in a variable for the comparison:
df$count <- apply(df,1,function(x,compare){sum(x == compare,na.rm = TRUE)},compare = 1:3)

Another way is to subtract your target vector from each row of your data.frame, negate and then do rowSums with na.rm=TRUE:
target <- 1:3
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 2 1 2 1 0
target <- rep(2,3)
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 1 3 1 1 0
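A comparison-based variant of the same rowSums idea, sketched on the data from the question. Comparing with == yields NA wherever the data is NA, and na.rm = TRUE then drops those cells, which amounts to counting them as zero. The names count_two and count_target are illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 2, 2, 2, NA),
                 c = c(NA, 2, 3, 4, 5))

# Same target for every column: compare the whole data frame at once
count_two <- rowSums(df == 2, na.rm = TRUE)
unname(count_two)
# [1] 1 3 1 1 0

# One target per column: sweep() applies "==" column by column
target <- c(1, 2, 3)
count_target <- rowSums(sweep(as.matrix(df), 2, target, "=="), na.rm = TRUE)
unname(count_target)
# [1] 2 1 2 1 0
```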
