Another nested loop in R - r

I have the following data and nested for loop:
x <- c(12, 27, 21, 16, 12, 21, 18, 16, 20, 23, 21, 10, 15, 26, 21, 22, 22, 19, 26, 26)
y <- c(8, 10, 7, 7, 9, 5, 7, 7, 10, 4, 10, 3, 9, 6, 4, 2, 4, 2, 3, 6)
a <- c(20,25)
a.sub <- c()
df <- c()
for(j in 1:length(a)){
a.sub <- which(x >= a[j])
for(i in 1:length(a.sub)){
df[i] <- y[a.sub[i]]
}
print(df)
}
I'd like the loop to return these values for df:
[1] 10 7 5 10 4 10 6 4 2 4 3 6
[1] 10 6 3 6
As I have it, however, the result for a = 20 is correct, but for a = 25 the four new values are followed by leftover values from the a = 20 pass:
[1] 10 7 5 10 4 10 6 4 2 4 3 6
[1] 10 6 3 6 4 10 6 4 2 4 3 6

for(i in 1:length(a.sub)){
df[i] <- y[a.sub[i]]
}
can become
df <- y[a.sub]
Neither a.sub nor df needs to be predefined then, so the whole thing becomes:
x <- c(12, 27, 21, 16, 12, 21, 18, 16, 20, 23, 21, 10, 15, 26, 21, 22, 22, 19, 26, 26)
y <- c(8, 10, 7, 7, 9, 5, 7, 7, 10, 4, 10, 3, 9, 6, 4, 2, 4, 2, 3, 6)
a <- c(20,25)
for(j in 1:length(a)){
a.sub <- which(x >= a[j])
df <- y[a.sub]
print(df)
}
It could be made shorter still. df is unnecessary if you're only printing the subset of y, so print the subset directly. The selector is short enough that putting it all on one line isn't confusing. And rather than looping over an index from 1 to length(a), loop through a directly. So it could be:
a <- c(20,25)
for(ax in a){
print( y[ which(x >= ax) ] )
}
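For reference, here is a minimal sketch of why the original loop printed leftover values: df was created once outside the outer loop, so the shorter second pass overwrote only its first four elements. Resetting df at the start of each pass (and storing the results in a list, here called res, which is a name introduced only for this illustration) gives the expected output:

```r
x <- c(12, 27, 21, 16, 12, 21, 18, 16, 20, 23, 21, 10, 15, 26, 21, 22, 22, 19, 26, 26)
y <- c(8, 10, 7, 7, 9, 5, 7, 7, 10, 4, 10, 3, 9, 6, 4, 2, 4, 2, 3, 6)
a <- c(20, 25)

res <- vector("list", length(a))  # hypothetical container, one slot per threshold
for (j in seq_along(a)) {
  df <- c()                       # reset so values from the previous pass do not linger
  a.sub <- which(x >= a[j])
  for (i in seq_along(a.sub)) {
    df[i] <- y[a.sub[i]]
  }
  res[[j]] <- df
}
print(res[[1]])
print(res[[2]])
```

The vectorised versions above avoid the problem altogether because df is reassigned as a whole on every pass.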

Not sure if this is a simplified version of a more complex problem, but I'd probably solve this using some direct indexing and an apply function. Something like this:
z <- cbind(x,y)
sapply(c(20,25), function(x) z[z[, 1] >= x, 2])
[[1]]
[1] 10 7 5 10 4 10 6 4 2 4 3 6
[[2]]
[1] 10 6 3 6

Related

How to cutoff values in dataframe?

From the following dataframe:
df1 <- data.frame(id=c(1, 2, 3, 4, 5, 6, 7),
revenue=c(34, 1000, 40, 49, 43, 55, 99))
df2 <- data.frame(id=c(1, 2, 3, 4, 5, 6, 7),
expenses=c(22, 26, 31, 40, 20, 2000, 22))
df3 <- data.frame(id=c(1, 2, 3, 4, 5, 6, 7),
profit=c(12, 10, 14, 12, 9, 15, 16))
df_list <- list(df1, df2, df3)
test <- Reduce(function(x, y) merge(x, y, all=TRUE), df_list)
rownames(test) <- test[,1]
test[,1] <- NULL
test
I would like to eliminate extreme values (e.g. 1000 and 2000). I need to cutoff everything that is greater than 100. When I check test<100 I see TRUE and FALSE positions but I would like to replace them with NA or zeroes.
To replace all values in a data frame (df) that are greater than 100 with 0, simply use: df[df > 100] <- 0
We can also use replace():
replace(test, test>100, NA)
revenue expenses profit
1 34 22 12
2 NA 26 10
3 40 31 14
4 49 40 12
5 43 20 9
6 55 NA 15
7 99 22 16
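Both answers rely on the same mechanism: comparing a data frame with a number yields a logical matrix, which can then be used to select the cells to overwrite. A small sketch of that, building the merged table directly rather than through merge() (the name cleaned is introduced only for this illustration):

```r
test <- data.frame(
  revenue  = c(34, 1000, 40, 49, 43, 55, 99),
  expenses = c(22, 26, 31, 40, 20, 2000, 22),
  profit   = c(12, 10, 14, 12, 9, 15, 16)
)

mask <- test > 100        # logical matrix, TRUE where a value exceeds the cutoff
cleaned <- test           # work on a copy so the original stays intact
cleaned[mask] <- NA       # use 0 instead of NA if zeroes are preferred
cleaned
```

Working on a copy keeps the extreme values available in test in case they need to be inspected later.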

How can I recode a variable in R, so that the lowest value will be 1, the second lowest will be 2 etc

Imagine I have a tidy dataset with 1 variable and 10 observations. The values of the variable are e.g. 3, 5, 7, 9, 13, 17, 29, 33, 34, 67. How do I recode it so that the 3 will be 1, the 5 will be 2 (...) and the 67 will be 10?
One possibility is to use rank. In a `dplyr` setting it could look like this:
library(dplyr)
tibble(x = c(3, 5, 7, 9, 13, 17, 29, 33, 34, 67)) %>%
mutate(y = rank(x))
Here is one way -
x <- c(3, 5, 7, 9, 13, 17, 29, 33, 67, 34)
x1 <- sort(x)
y <- match(x1, unique(x1))
y
#[1] 1 2 3 4 5 6 7 8 9 10
Changed the order of last 2 values so that it also works when the data is not in order.
Another way:
x <- c(3, 5, 7, 9, 13, 17, 29, 33, 67, 34)
x <- sort(x)
seq_along(x)
# 1 2 3 4 5 6 7 8 9 10
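Both snippets above sort the data first, so the codes come back in sorted order. If the recoded values should stay aligned with the original positions, one possible sketch (assuming tied values should share a code) is to match each value against the sorted unique values:

```r
x <- c(3, 5, 7, 9, 13, 17, 29, 33, 67, 34)

# Position of each original value among the sorted distinct values
y <- match(x, sort(unique(x)))
y
# [1]  1  2  3  4  5  6  7  8 10  9

# Tied values get the same code
x_ties <- c(10, 3, 10, 7)
y_ties <- match(x_ties, sort(unique(x_ties)))
y_ties
# [1] 3 1 3 2
```

Unlike rank(), which by default averages the ranks of ties, this always returns whole-number codes with no gaps.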

Replace elements of a data frame with special columns (list columns) depending on a threshold value

I have a data frame df with special columns:
df<- data.frame(w= 1:3, x=3:5, y=6:8, z = I(list(1:2, 1:3, 1:4)))
df <- as.data.frame(do.call(cbind, lapply(df[1:3], function(x) Map("*",
df$z, x))))
>df
w x y
1, 2 3, 6 6, 12
2, 4, 6 4, 8, 12 7, 14, 21
3, 6, 9, 12 5, 10, 15, 20 8, 16, 24, 32
I want to replace any number in df which has a value less than 6 with the number 6 and every value greater than 8 with the number 8. I do not want to touch the numbers in between and I want to maintain the data frame structure.
To achieve this, I have written a function transfo
transfo<- function(x){
x <- unlist(x)
if (x < 6){ x <- 6}
if (x > 8){ x <- 8}
x
}
When I run the following code:
transformed <- as.data.frame(sapply(df, transfo))
I get 10 warning messages like this one:
1: In if (x < 6) { :
the condition has length > 1 and only the first element will be used
...and I do not get the required output.
My expected output is
>transformed
w x y
6, 6 6, 6 6, 8
6, 6, 6 6, 8, 8 7, 8, 8
6, 6, 8, 8 6, 8, 8, 8 8, 8, 8, 8
I will be very grateful for a hint on the fastest way to replace all elements of the data frame df with 6 if they are less than 6 and with 8 if they are greater than 8 since I work with a large data set with 3000 rows.
Thanks in advance.
Assuming that the columns are lists of vectors, the OP got the warning because if expects a condition of length 1, while each list element holds more than one value. Instead of if/else we can use ifelse, if_else, or case_when within mutate_all (as we need to change all the columns), looping through each list with map:
library(tidyverse)
out <- df %>%
mutate_all(funs(map(., ~ case_when(.x < 6 ~ 6,
.x > 8 ~ 8,
TRUE ~ as.numeric(.x)))))
out
# w x y
#1 6, 6 6, 6 6, 8
#2 6, 6, 6 6, 8, 8 7, 8, 8
#3 6, 6, 8, 8 6, 8, 8, 8 8, 8, 8, 8
Or using pmin/pmax
df %>%
mutate_all(funs(map(., ~pmax(.x, 6) %>%
pmin(8))))
# w x y
#1 6, 6 6, 6 6, 8
#2 6, 6, 6 6, 8, 8 7, 8, 8
#3 6, 6, 8, 8 6, 8, 8, 8 8, 8, 8, 8
Instead of applying the function on each of the nested list, we could unlist it and later relist back to the original structure
df %>%
mutate_all(funs(relist(pmin(pmax(unlist(.), 6), 8), skeleton = .)))
Or the same logic in base R
df[] <- lapply(df, function(x) relist(pmin(pmax(unlist(x), 6), 8), skeleton = x))
Or in data.table
library(data.table)
setDT(df)[, lapply(.SD, function(x) relist(pmin(pmax(unlist(x), 6), 8),
skeleton = x))]
Benchmarks
Created a slightly bigger dataset by replicating the rows of the 'df'
df1 <- df[rep(seq_len(nrow(df)), 5000),]
system.time({
df1 %>%
mutate_all(funs(map(., ~pmax(.x, 6) %>%
pmin(8))))
})
# user system elapsed
# 6.116 0.017 6.159
system.time({
df1 %>%
mutate_all(funs(relist(pmin(pmax(unlist(.), 6), 8), skeleton = .)))
})
# user system elapsed
# 0.389 0.000 0.389
The data.table and base R (lapply) methods have timings similar to the dplyr version that uses relist.
This also works:
out <- as.data.frame(do.call(cbind, lapply(df, function(i){
lapply(i, function(j){
ifelse((j < 6), 6, ifelse((j > 8), 8, j))
})
})))
out
w x y
1 6, 6 6, 6 6, 8
2 6, 6, 6 6, 8, 8 7, 8, 8
3 6, 6, 8, 8 6, 8, 8, 8 8, 8, 8, 8
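The common idea in these answers is to replace the scalar if tests with a vectorised clamp and apply it to every vector in the list column. A minimal base R sketch of just that step, on a plain list that mirrors the w column (the helper name clamp and its lo/hi arguments are introduced only for this illustration):

```r
# Clamp every element of a numeric vector to the range [lo, hi]
clamp <- function(v, lo = 6, hi = 8) pmin(pmax(v, lo), hi)

# Stand-in for one list column: three vectors of different lengths
z <- list(c(1, 2), c(2, 4, 6), c(3, 6, 9, 12))

clamped <- lapply(z, clamp)
clamped
```

pmax and pmin work element by element, so no condition of length greater than one is ever passed to if.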

How to generate sequence with exclusions in R

I need 4 functions, each of which generates a sequence of numbers.
The first function generates a sequence of odd numbers, each repeated n times, skipping 5, 15, 25, etc.
Example with n=2: 1, 1, 3, 3, 7, 7, 9, 9, 11, 11, 13, 13, 17, 17,...
The second function generates a sequence of even numbers, each repeated n times, skipping 10, 20, 30, etc.
Example with n=2: 2, 2, 4, 4, 6, 6, 8, 8, 12, 12, 14, 14, 16, 16,...
The third function generates a sequence starting at 5 in steps of 10, each number repeated n times.
Example with n=2: 5, 5, 15, 15, 25, 25,...
The fourth function generates a sequence starting at 10 in steps of 10, each number repeated n times.
Example with n=2: 10, 10, 20, 20, 30, 30,...
Each function has to take the vector 1:N and n as inputs.
For example,
f1(1:10, 3)
> 1, 1, 1, 3, 3, 3, 7, 7, 7, 9
f2(1:5, 10)
> 2, 2, 2, 2, 2
f3(1:15, 5)
> 5, 5, 5, 5, 5, 15, 15, 15, 15, 15, 25, 25, 25, 25, 25
f4(1:2, 1)
> 10, 20
I have a partial solution for the first two functions, but I don't know how to exclude some numbers:
f1 <- function(x) 2*((x-1) %/% 10) + 1 # goes 1, 3, 5, etc for n = 10
f2 <- function(x) 2*((x-1) %/% 10 + 1) # goes 2, 4, 6, etc for n = 10
Why not use seq and rep?
n = 25
nrep = 2 # number of repetitions
by5 <- sort(rep(seq(5, n, by = 10), nrep )) # numbers from 5 by 10
by5
by10 <- sort(rep(seq(10, n, by = 10), nrep )) # numbers from 10 by 10
by10
odd <- sort(rep(seq(1, n, by = 2), nrep )) # odd number
odd[!odd %in% by5] # remove all the by5 values
even <- sort(rep(seq(2, n, by = 2), nrep )) # Even numbers
even[!even %in% by10] # remove all the by 10 values
Output:
> [1] 5 5 15 15 25 25
> [1] 10 10 20 20
> [1] 1 1 3 3 7 7 9 9 11 11 13 13 17 17 19 19 21 21 23 23
> [1] 2 2 4 4 6 6 8 8 12 12 14 14 16 16 18 18 22 22 24 24
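The answer above produces the sequences but not the four functions with the requested (1:N, n) signature. One possible sketch, reusing the (x - 1) %/% n idea from the question: turn each position into a block number, then map block numbers onto the allowed last digits. The helpers block_index and pick are names introduced only for this illustration.

```r
# Block number for each position: positions 1..n -> 1, n+1..2n -> 2, ...
block_index <- function(x, n) (x - 1) %/% n + 1

# k-th number whose last digit is one of `digits` (cycling every 10)
pick <- function(k, digits) {
  m <- length(digits)
  10 * ((k - 1) %/% m) + digits[(k - 1) %% m + 1]
}

f1 <- function(x, n) pick(block_index(x, n), c(1, 3, 7, 9))  # odd, skipping 5, 15, ...
f2 <- function(x, n) pick(block_index(x, n), c(2, 4, 6, 8))  # even, skipping 10, 20, ...
f3 <- function(x, n) 10 * block_index(x, n) - 5              # 5, 15, 25, ...
f4 <- function(x, n) 10 * block_index(x, n)                  # 10, 20, 30, ...

f1(1:10, 3)
f2(1:5, 10)
f3(1:15, 5)
f4(1:2, 1)
```

Because everything is computed arithmetically from the position, no upper limit has to be chosen in advance and nothing needs to be filtered out afterwards.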

2 columns into list and sort in R

Let's say we have two lists:
x <- c(1, 3, 4, 2, 6, 5)
y <- c(12, 14, 15, 61, 71, 21)
I want to combine them into a list with two columns, x and y, with the values kept in the same order:
x <- c(1, 3, 4, 2, 6, 5)
y <- c(12, 14, 15, 61, 71, 21)
Once I have that list, I want to sort it on y so that the final list looks like this:
x <- c(1, 3, 4, 5, 2, 6)
y <- c(12, 14, 15, 21, 61, 71)
I am really new to R.
I tried list(x,y) but it seems to make a
list(1, 3, 4, 2, 6, 5, 12, 14, 15, 61, 71, 21)
so I was wondering if someone could help me.
You need to put them in a data.frame first and then use order:
x <- c(1, 3, 4, 2, 6, 5)
y <- c(-12, 14, 15, 61, 71, 21)
DF <- data.frame(x, y)
> DF[order(DF$y),]
x y
1 1 -12
2 3 14
3 4 15
6 5 21
4 2 61
5 6 71
Keeping it as a list, using lapply:
x <- c(1, 3, 4, 2,6,5)
y <- c(12, 14,15,61,71,21)
l <- list(x = x, y = y)
## thelatemail
lapply(l, `[`, order(l$y))
# $x
# [1] 1 3 4 5 2 6
#
# $y
# [1] 12 14 15 21 61 71
A more explicit version of the short one given by @thelatemail above, though it doesn't preserve the names:
lapply(seq_along(l), function(x) l[[x]][order(l$y)])
# [[1]]
# [1] 1 3 4 5 2 6
#
# [[2]]
# [1] 12 14 15 21 61 71
Or rapply:
rapply(l, function(x) x[order(l$y)], how = 'list')
# $x
# [1] 1 3 4 5 2 6
#
# $y
# [1] 12 14 15 21 61 71
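All of the variants above come down to computing the ordering of y once and applying it to both vectors. A minimal sketch of that step on its own, with a descending variant added as an assumption about what might be wanted next (ord, sorted and sorted_desc are names introduced only for this illustration):

```r
x <- c(1, 3, 4, 2, 6, 5)
y <- c(12, 14, 15, 61, 71, 21)

ord <- order(y)                       # positions of y from smallest to largest
sorted <- list(x = x[ord], y = y[ord])
sorted

# Same idea, largest y first
ord_desc <- order(y, decreasing = TRUE)
sorted_desc <- list(x = x[ord_desc], y = y[ord_desc])
sorted_desc
```

Indexing both vectors with the same ord keeps each x paired with its original y.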
