How to fill a list based off of other items in the list in R?

I have a vector that looks like this:
n <- c(1, rep(NA, 9), 2, rep(NA, 9))
I want the 9 observations following the first observation to take the same value as the first observation, and I want this pattern to continue through the whole vector. So ideally, I want my vector to look like this:
c(rep(1, 10), rep(2, 10))
I want to accomplish this without using for loops. Is there a way to do this?

na.locf() ("last observation carried forward") from the zoo package does exactly this:
library(zoo)
na.locf(n)
##[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

If you already know the values and that every run has length 10, you can build the vector directly with the each argument of rep():
rep(1:2, each = 10)
# [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
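rep(..., each = 10) hardcodes the run length. If the runs can differ in length, one base-R sketch is to take each non-NA value and repeat it up to the next non-NA position. It assumes the first element of n is not NA (a leading NA run would be silently dropped), and the names starts, run_lengths and filled are only illustrative.

```r
n <- c(1, rep(NA, 9), 2, rep(NA, 9))

# Positions of the non-NA values that should be carried forward
starts <- which(!is.na(n))

# Each value fills the gap up to the next non-NA value (or the end of n)
run_lengths <- diff(c(starts, length(n) + 1))

filled <- rep(n[starts], times = run_lengths)
filled
# [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
```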

My favorite way without na.locf:
c(NA, n[!is.na(n)])[cumsum(!is.na(n)) + 1]
# [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
Any NAs before the first non-NA value stay NA. If you know there are no NAs at the beginning of the vector, it simplifies to:
not.na <- !is.na(n)
n[not.na][cumsum(not.na)]
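To see the difference between the two versions, here is a small made-up vector m that starts with NAs. In the second version, cumsum(not.na) is 0 at the leading positions, and R silently drops zero indices, so the result comes out shorter than the input instead of keeping the NAs.

```r
m <- c(NA, NA, 5, NA, 7, NA)

# First version: prepending NA keeps the leading NAs in place
keep_leading <- c(NA, m[!is.na(m)])[cumsum(!is.na(m)) + 1]
keep_leading
# [1] NA NA  5  5  7  7

# Second version: the index vector is 0 0 1 1 2 2, and zero indices are dropped
not.na <- !is.na(m)
short <- m[not.na][cumsum(not.na)]
short
# [1] 5 5 7 7
```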

Related

Reorder (collate) vector elements automatically

It's an easy one, but I can't find a simple solution for my problem. I have several vectors that look like this one: rep(1:3, each = 3), and I want to convert them to look like rep(1:3, times = 3).
So each element is repeated multiple times c(1,1,1,2,2,2,3,3,3) and I want to reorder them to c(1,2,3,1,2,3,1,2,3). How can I achieve that?
You can use a matrix transpose:
x <- rep(1:3, each = 3)
as.vector(t(matrix(x, nrow = 3)))
# [1] 1 2 3 1 2 3 1 2 3
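In the transpose answer, nrow = 3 is the number of times each value is repeated (which here happens to equal the number of distinct values). A sketch that computes it from the data instead, assuming every value occurs equally often and its copies sit next to each other:

```r
x <- rep(1:4, each = 2)   # 1 1 2 2 3 3 4 4

# Copies per value; only valid when all values occur equally often
reps <- length(x) / length(unique(x))

# Filling column-wise puts one copy of each value in every row,
# so reading the transpose column-wise interleaves the values
res <- as.vector(t(matrix(x, nrow = reps)))
res
# [1] 1 2 3 4 1 2 3 4
```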
v1 <- c(1,1,1,2,2,2,3,3,3)
o1 <- rle(v1)
rep(o1$values, min(o1$lengths))
[1] 1 2 3 1 2 3 1 2 3
This works for any number of distinct values (numbers or strings), but it expects each value to be present equally often. When some values occur more often than others, it offers only limited flexibility: min() drops the surplus copies and max() adds copies that were not in the input.
Consider:
v2 <- c(1,1,1,2,2,2,3,3,3,3)
o2 <- rle(v2)
rep(o2$values, min(o2$lengths))
[1] 1 2 3 1 2 3 1 2 3
rep(o2$values, max(o2$lengths))
[1] 1 2 3 1 2 3 1 2 3 1 2 3

How to split the data 1 1 2 2 3 3 to 1 2 3 1 2 3 in R? [duplicate]

This question already has an answer here:
Sort vector into repeating sequence when sequential values are missing R
I want to convert a vector:
1 1 2 2 3 3
to
1 2 3 1 2 3
How to do it? Many thanks.
You can use a matrix to lay out the original vector by rows and then convert it back to a vector to get the desired result.
v = c(1,1,2,2,3,3)
v2 = as.vector(matrix(v, nrow = length(unique(v)), byrow = TRUE))
> v2
[1] 1 2 3 1 2 3
The length(unique(v)) computes the number of rows from the data instead of hardcoding 3.
Another example:
v = c(1,1,1,2,2,2,3,3,3,4,4,4)
v2 = as.vector(matrix(v, nrow = length(unique(v)), byrow = TRUE))
v2
[1] 1 2 3 4 1 2 3 4 1 2 3 4
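The matrix trick silently depends on every value occurring equally often: with unequal counts, matrix() may only warn about a length mismatch, or may produce a wrong result with no warning at all. A small guard makes that assumption explicit. interleave() is a hypothetical helper name, and it still assumes that equal values sit next to each other, as in the examples above.

```r
interleave <- function(v) {
  counts <- as.vector(table(v))
  # Refuse input where some values occur more often than others
  stopifnot(length(unique(counts)) == 1)
  # One row per distinct value, read back column by column
  as.vector(matrix(v, nrow = length(counts), byrow = TRUE))
}

interleave(c(1, 1, 2, 2, 3, 3))
# [1] 1 2 3 1 2 3
```

Calling interleave(c(1, 1, 1, 1, 2, 3)) stops with an error instead of returning a scrambled vector.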
We can use rbind/split
c(do.call(rbind, split(v1, v1)))
#[1] 1 2 3 1 2 3
Or, if the elements are replicated an unequal number of times, order by each element's occurrence number from rowid():
library(data.table)
v1[order(rowid(v1))]
#[1] 1 2 3 1 2 3
Or with base R
v1[order(ave(v1, v1, FUN = seq_along))]
#[1] 1 2 3 1 2 3
data
v1 <- c(1, 1, 2, 2, 3, 3)
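The data above has equal counts, so here is the same base-R ordering applied to a made-up vector v3 with unequal counts. ave(..., FUN = seq_along) numbers each copy within its value (1st, 2nd, 3rd occurrence), and order() then takes all first copies, then all second copies, and so on, keeping ties in their original order:

```r
v3 <- c(1, 1, 2, 2, 2, 3)

# Occurrence number of each element within its own value
occ <- ave(v3, v3, FUN = seq_along)
occ
# [1] 1 2 1 2 3 1

v3[order(occ)]
# [1] 1 2 3 1 2 2
```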
vec <- c(1, 1, 2, 2, 3, 3)
rep(unique(vec), 2)
[1] 1 2 3 1 2 3
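The 2 in rep(unique(vec), 2) is the number of copies of each value. A sketch that computes it from the data instead, assuming every value occurs the same number of times (u and res are illustrative names):

```r
vec <- c(1, 1, 2, 2, 3, 3)
u <- unique(vec)

# Copies per value (assumes equal counts, as in the question)
res <- rep(u, times = length(vec) / length(u))
res
# [1] 1 2 3 1 2 3
```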

R write.table takes longer for 2 columns than for the whole dataframe

A dataframe with 40 columns:
This finishes after a few seconds:
write.table(data_2[1:10000,], file = "/Volumes/2018/06_abteilungen/bi/analytics/tools/adobe/adobe_analytics/adobe_analytics_api_rohdaten/api_via_data_feed_auf_ftp/beispiel_datenexporte_data_feed/r_exporte/channel_va_closer.csv", sep = ";", col.names = NA)
This never finishes:
write.table(data_2[1:1000,c(data_2$va_closer_detail,data_2$va_closer_id)], file = "/Volumes/2018/06_abteilungen/bi/analytics/tools/adobe/adobe_analytics/adobe_analytics_api_rohdaten/api_via_data_feed_auf_ftp/beispiel_datenexporte_data_feed/r_exporte/channel_va_closer.csv", sep = ";", col.names = NA)
How can I extract only 2 columns without the slowdown?
You can use [ to subset a data frame either by giving it row/column indices or row/column names. For example:
dd = data.frame(col1 = rep(1:2, 5), col2 = c(rep(1:3, 3), 1), col3 = 'a')
dd
#    col1 col2 col3
# 1     1    1    a
# 2     2    2    a
# 3     1    3    a
# 4     2    1    a
# 5     1    2    a
# 6     2    3    a
# 7     1    1    a
# 8     2    2    a
# 9     1    3    a
# 10    2    1    a
If you wanted the first 5 rows and the first 2 columns, you could do either of these:
# good
dd[1:5, 1:2] # using column indices
dd[1:5, c("col1", "col2")] # using column names
But what you have in your question is this:
# bad
dd[1:5, c(dd$col1, dd$col2)] # using actual values :(
What columns are you asking for? dd$col1 holds the values of the first column (1, 2, 1, 2, ...) and dd$col2 holds the values of the second column (1, 2, 3, 1, 2, 3, ...). c() sticks them together, so we can expand this out to:
c(dd$col1, dd$col2) # these are the columns you are asking for
# [1] 1 2 1 2 1 2 1 2 1 2 1 2 3 1 2 3 1 2 3 1
# these are equivalent for this data
dd[1:5, c(dd$col1, dd$col2)]
dd[1:5, c(1,2,1,2,1,2,1,2,1,2,1,2,3,1,2,3,1,2,3,1)]
# col1 col2 col1.1 col2.1 col1.2 col2.2 col1.3 col2.3 col1.4 col2.4 col1.5 col2.5 col3 col1.6 col2.6 col3.1 col1.7 col2.7
# 1 1 1 1 1 1 1 1 1 1 1 1 1 a 1 1 a 1 1
# 2 2 2 2 2 2 2 2 2 2 2 2 2 a 2 2 a 2 2
# 3 1 3 1 3 1 3 1 3 1 3 1 3 a 1 3 a 1 3
# 4 2 1 2 1 2 1 2 1 2 1 2 1 a 2 1 a 2 1
# 5 1 2 1 2 1 2 1 2 1 2 1 2 a 1 2 a 1 2
# col3.2 col1.8
# 1 a 1
# 2 a 2
# 3 a 1
# 4 a 2
# 5 a 1
This asks for the same columns again and again, with twice as many column indices as there are rows in the original data! I don't know how many rows you have, but it looks like more than 1000, so you are asking not for 2 columns but for more than 2000 columns, maybe a lot more.
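To put a number on it with the small dd example: the index vector has 2 * nrow(dd) entries, so the subset has 20 columns instead of 2.

```r
dd <- data.frame(col1 = rep(1:2, 5), col2 = c(rep(1:3, 3), 1), col3 = 'a')

# The "column" index built from the values of two columns
idx <- c(dd$col1, dd$col2)
length(idx)
# [1] 20

# One output column per index entry, duplicates included
ncol(dd[1:5, idx])
# [1] 20
```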
Two footnotes:
I second the comment recommending data.table::fwrite; it will be much faster.
As a debugging technique, don't forget you can run small pieces of code to isolate the problem. When you try
write.table(data_2[1:1000,c(data_2$va_closer_detail,data_2$va_closer_id)],
file = "/Volumes/2018/06_abteilungen/bi/analytics/tools/adobe/adobe_analytics/adobe_analytics_api_rohdaten/api_via_data_feed_auf_ftp/beispiel_datenexporte_data_feed/r_exporte/channel_va_closer.csv",
sep = ";", col.names = NA)
and it doesn't seem to work, there are two things worth checking: (a) is the file path valid, and (b) is the data valid? If you had run just the data_2[...] part of the line, you would have identified the problem without needing help.
data_2[1:1000,c(data_2$va_closer_detail,data_2$va_closer_id)]
When you run that and see different output than expected, run an even smaller piece of the line:
c(data_2$va_closer_detail,data_2$va_closer_id)
And hopefully the issue is clear.
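Putting this together for the question: select the two columns by their names. A minimal sketch, in which data_2 is a made-up stand-in (only the column names va_closer_detail and va_closer_id come from the question) and the output goes to a temporary file rather than the original path:

```r
# Made-up stand-in for data_2 with the two column names from the question
data_2 <- data.frame(
  va_closer_detail = rep(c("a", "b"), 1000),
  va_closer_id = seq_len(2000),
  other = 0
)

# Select the two columns by name: 1000 rows, 2 columns
subset_2 <- data_2[1:1000, c("va_closer_detail", "va_closer_id")]
dim(subset_2)
# [1] 1000    2

out_file <- tempfile(fileext = ".csv")
write.table(subset_2, file = out_file, sep = ";", col.names = NA)
```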

Count frequency of each element in vector

I'm looking for a way to count the frequency of each element in a vector.
ex <- c(2,2,2,3,4,5)
Desired outcome:
[1] 3 3 3 1 1 1
Is there a simple command for this?
rep(table(ex), table(ex))
# 2 2 2 3 4 5
# 3 3 3 1 1 1
If you don't want the labels, you can wrap the result in as.vector():
as.vector(rep(table(ex), table(ex)))
# [1] 3 3 3 1 1 1
I'll add (because it seems related) that if you only want to count runs of consecutive equal values, you can use rle instead of table:
ex2 = c(2, 2, 2, 3, 4, 2, 2, 3, 4, 4)
rep(rle(ex2)$lengths, rle(ex2)$lengths)
# [1] 3 3 3 1 1 2 2 1 2 2
As pointed out in the comments, calculating a table can be expensive for a large vector, so computing it only once is more efficient:
tab = table(ex)
rep(tab, tab)
# 2 2 2 3 4 5
# 3 3 3 1 1 1
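One caveat: rep(tab, tab) lists the counts in sorted order of the values, so it only lines up with ex because ex is already sorted. For an unsorted vector (ex3 below is made up), a sketch that looks up each element's count by its name in the table:

```r
ex3 <- c(3, 2, 5, 2, 2, 4)
tab <- table(ex3)

# Counts in sorted-value order: does not line up with ex3
as.vector(rep(tab, tab))
# [1] 3 3 3 1 1 1

# Look up each element's count by its name in the table
as.vector(tab[as.character(ex3)])
# [1] 1 3 1 3 3 1
```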
You can use ave():
ex <- c(2,2,2,3,4,5)
outcome <- ave(ex, ex, FUN = length)
outcome
# [1] 3 3 3 1 1 1
This is what thelatemail suggested. It is also similar to the answer to this question.

Create a vector that repeats itself in R

I would like to create a vector that repeats itself (e.g. 1:3 repeated until it reaches 12 elements):
1,2,3,1,2,3,1,2,3,1,2,3
How can I do this in R?
Thanks for your help.
See ?rep. What you want is as easy as
> rep(1:3, times = 4)
[1] 1 2 3 1 2 3 1 2 3 1 2 3
But if you don't know the length of the input vector until run time and you do know the required length of the output, you can use length.out (updated to reflect a comment from @baptiste):
> rep(1:3, length.out = 12)
[1] 1 2 3 1 2 3 1 2 3 1 2 3
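Base R also provides rep_len(), which ?rep describes as a faster, simplified version of rep() for this case. Like length.out, it also handles output lengths that are not a whole number of cycles:

```r
rep_len(1:3, 12)
# [1] 1 2 3 1 2 3 1 2 3 1 2 3

# The last cycle is cut short when 7 is not a multiple of 3
rep_len(1:3, 7)
# [1] 1 2 3 1 2 3 1
```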
