Please simplify my code. The result should be the same. The script works but R shows warning messages:
1: In data$sygnature[seq(first[v], last[v])] <- paste0(n[v], "/", syg) :
number of items to replace is not a multiple of replacement length
etc.
The idea is to assign each sequence in the column the same value.
data <- data.frame(sygnature = c(seq(1:8),seq(1:3),seq(1:11),seq(1:6),seq(1:9),seq(1:5)))
n <- c(44:49)
k <- c()
for (i in 1:nrow(data)) {
  s <- data$sygnature[i]
  z <- data$sygnature[i + 1]
  if (
    if (is.na(z)) {
      z <- 1
      s > z
    } else {
      s > z
    }
  ) {
    k <- c(k, s)
  }
}
last <- cumsum(k)
first <- (last - k) + 1
syg <- data$sygnature
for (v in 1:6)
{
  data$sygnature[seq(first[v], last[v])] <- paste0(n[v], "/", syg)
}
You can do this in one step:
data$res <- paste0(rep(n, aggregate(sygnature ~ cumsum(sygnature == 1), data, length)[[2]]),
                   '/',
                   data$sygnature)
data
sygnature res
1 1 44/1
2 2 44/2
3 3 44/3
4 4 44/4
5 5 44/5
6 6 44/6
7 7 44/7
8 8 44/8
9 1 45/1
10 2 45/2
11 3 45/3
12 1 46/1
13 2 46/2
14 3 46/3
15 4 46/4
16 5 46/5
17 6 46/6
18 7 46/7
19 8 46/8
20 9 46/9
21 10 46/10
22 11 46/11
23 1 47/1
24 2 47/2
25 3 47/3
26 4 47/4
27 5 47/5
28 6 47/6
29 1 48/1
30 2 48/2
31 3 48/3
32 4 48/4
33 5 48/5
34 6 48/6
35 7 48/7
36 8 48/8
37 9 48/9
38 1 49/1
39 2 49/2
40 3 49/3
41 4 49/4
42 5 49/5
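The key step in the answer above is cumsum(sygnature == 1), which increases by one every time a sequence restarts at 1 and therefore labels each run with its position. A minimal sketch of the same idea, assuming every run starts at 1 and n holds exactly one value per run, indexes n directly with that run counter (the name run_id is only illustrative):

```r
# Same input as in the question
sygnature <- c(1:8, 1:3, 1:11, 1:6, 1:9, 1:5)
n <- 44:49

# run_id is 1 for the first run, 2 for the second, and so on,
# because the counter goes up each time the sequence restarts at 1
run_id <- cumsum(sygnature == 1)

# Pick the matching element of n for every row and prepend it
res <- paste0(n[run_id], "/", sygnature)
head(res, 10)
```

This avoids the aggregate() call, at the cost of relying on n being at least as long as the number of runs.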
I'm looking for the optimal way to go from a numeric vector containing duplicate entries, like this one:
a=c(1,3,4,4,4,5,7,9,27,28,28,30,42,43)
to this one, avoiding the duplicates by shifting +1 if appropriate:
b=c(1,3,4,5,6,7,8,9,27,28,29,30,42,43)
Side-by-side comparison:
> data.frame(a=a, b=b)
a b
1 1 1
2 3 3
3 4 4
4 4 5
5 4 6
6 5 7
7 7 8
8 9 9
9 27 27
10 28 28
11 28 29
12 30 30
13 42 42
14 43 43
Is there an easy and quick way to do it? Thanks!
In case you want it to be done only once (there may still be duplicates):
a=c(1,3,4,4,4,5,7,9,27,28,28,30,42,43)
a <- ifelse(duplicated(a),a+1,a)
output:
> a
[1] 1 3 4 5 5 5 7 9 27 28 29 30 42 43
Loop that will lead to a state without any duplicates:
a=c(1,3,4,4,4,5,7,9,27,28,28,30,42,43)
while(length(a[duplicated(a)])) {
  a <- ifelse(duplicated(a),a+1,a)
}
output:
> a
[1] 1 3 4 5 6 7 8 9 27 28 29 30 42 43
An alternative is to use a recursive function:
no_dupes <- function(x) {
  if (anyDuplicated(x) == 0)
    x
  else
    no_dupes(x + duplicated(x))
}
no_dupes(a)
[1] 1 3 4 5 6 7 8 9 27 28 29 30 42 43
A tidyverse option using purrr::accumulate.
library(dplyr)
library(purrr)
accumulate(a, ~ if_else(.y <= .x, .x+1, .y))
# [1] 1 3 4 5 6 7 8 9 27 28 29 30 42 43
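The accumulate() rule above amounts to b[i] = max(a[i], b[i-1] + 1). Subtracting the position i from both sides turns that into a plain running maximum, so the same result can be sketched in base R without a loop or recursion. This is only an illustration of that rewrite, checked here against the example vector from the question:

```r
a <- c(1, 3, 4, 4, 4, 5, 7, 9, 27, 28, 28, 30, 42, 43)

# b[i] - i = max(a[i] - i, b[i-1] - (i-1)), i.e. a cumulative maximum
i <- seq_along(a)
b <- cummax(a - i) + i
b
```

Because cummax() is vectorised, this should scale better on long vectors than repeatedly calling duplicated().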
I have a table in a file. There is a character line before the table starts. The table in a file looks like this
XYZ=1
1 40 3 24 4
2 40 4 16 21
3 40 3 12 16
XYZ=2
1 40 5 27 8
2 40 4 16 21
3 40 2 14 24
I want the output to repeat all rows within each block. For example, rows 1 to 3 should repeat themselves, so that each resulting table has 6 rows in total. The output should look something like this:
XYZ=1
1 40 3 24 4
2 40 4 16 21
3 40 3 12 16
4 40 3 24 4
5 40 4 16 21
6 40 3 12 16
XYZ=2
1 40 5 27 8
2 40 4 16 21
3 40 2 14 24
4 40 5 27 8
5 40 4 16 21
6 40 2 14 24
I am new to R. It would be great if someone could help me with this problem.
This should do it. It definitely isn't the cleanest way, but from a beginner's point of view the functions are pretty basic. test_table.txt contains the same data you gave at the beginning. I don't create any lists, and the output looks like this:
> output
        [,1] [,2] [,3] [,4] [,5]
XYZ = 1    1   40    3   24    4
           2   40    4   16   21
           3   40    3   12   16
XYZ = 2    1   40    3   24    4
           2   40    4   16   21
           3   40    3   12   16
XYZ = 3    1   40    5   27    8
           2   40    4   16   21
           3   40    2   14   24
XYZ = 4    1   40    5   27    8
           2   40    4   16   21
           3   40    2   14   24
data <- readLines("test_table.txt")
## drop the "XYZ=..." header lines and keep only the numeric rows
new_data <- data[!grepl("^XYZ", data)]
output <- matrix(nrow = 1, ncol = 5)
for ( i in 1 : length(new_data) ) {
  temp <- unlist(strsplit(new_data[i], " "))
  if ( temp[1] == 1 && i != 1 ) {
    ## a new block starts: first append a copy of the previous block
    c <- length(output[,1])
    output <- rbind(output, output[(c-output[(c-1),1]):(output[c,1]+1),])
    ##output <- rbind(output, output[(i-output[(i-1),1]):(output[(i),1]+1),])
    output <- rbind(output, as.numeric(temp))
  } else {
    output <- rbind(output, as.numeric(temp))
  }
  if ( i == length(new_data) ) {
    ## end of input: append a copy of the last block
    output <- rbind(output, output[max(which(output[,1] == 1)):length(output[,1]),])
  }
}
## remove the NA row used to initialise the matrix
output <- output[2:length(output[,1]),]
xyz_names <- character(length(output[,1]))
c <- 1
for ( i in 1 : length(output[,1]) ) {
  if ( output[i,1] == 1 ) {
    xyz_names[i] <- paste("XYZ =", c, collapse = "")
    c <- c + 1
  } else {
    xyz_names[i] <- ""
  }
}
rownames(output) <- xyz_names
## the rows where each XYZ = ... block starts
which(output[,1] == 1)
output
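For comparison, here is a minimal sketch that works on the text lines directly and produces the layout requested in the question (each block repeated under its own header, with the first column renumbered). It assumes every block starts with a line beginning with "XYZ=" and that the first field of each row is just a running index; the lines are written inline here instead of being read from test_table.txt, and the names block_id and out are only illustrative.

```r
# Inline copy of the example file; readLines("test_table.txt") would give the same vector
lines <- c("XYZ=1", "1 40 3 24 4", "2 40 4 16 21", "3 40 3 12 16",
           "XYZ=2", "1 40 5 27 8", "2 40 4 16 21", "3 40 2 14 24")

# Each header line opens a new block, so a running count of headers labels the blocks
block_id <- cumsum(grepl("^XYZ=", lines))

out <- unlist(lapply(split(lines, block_id), function(block) {
  header <- block[1]
  rows <- rep(block[-1], times = 2)      # repeat the block's rows once more
  rest <- sub("^[^ ]+", "", rows)        # strip the old index in the first field
  c(header, paste0(seq_along(rest), rest))  # renumber 1..n within the block
}), use.names = FALSE)

writeLines(out)
```

Changing times = 2 to another value repeats each block that many times.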
I have a dataset consisting of two variables, Contents and Time like so:
Time Contents
2017M01 123
2017M02 456
2017M03 789
. .
. .
. .
2018M12 789
Now I want to create a numeric vector that aggregates Contents for six months, that is I want to sum 2017M01 to 2017M06 to one number, 2017M07 to 2017M12 to another number and so on.
I'm able to do this by indexing but I want to be able to write: "From 2017M01 to 2017M06 sum contents corresponding to that sequence" in my code.
I would really appreciate some help!
You can create a grouping variable based on the number of rows and the number of elements per group. In your case you want to group every 6 rows, so the number of rows in your data frame should be divisible by 6. Using iris to demonstrate (it has 150 rows, so 150 / 6 = 25 groups):
rep(seq(nrow(iris)%/%6), each = 6)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 9 10 10 10 10
#[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
There are plenty of ways to handle how you want to call it. Here is a custom function that allows you to do that (i.e. create the grouping variable),
f1 <- function(x, df) {
  v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
  v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
  i1 <- (v2 - v1) + 1
  return(rep(seq(nrow(df)%/%i1), each = i1))
}
f1("2017M01:2017M06", iris)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 9 10 10 10 10
#[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
EDIT: We can easily make the function handle row counts that are not a multiple of the group size: the leftover rows (the remainder of the division) get one extra group, numbered max + 1, which is appended to the result, i.e.
f1 <- function(x, df) {
  v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
  v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
  i1 <- (v2 - v1) + 1
  final_v <- rep(seq(nrow(df) %/% i1), each = i1)
  if (nrow(df) %% i1 == 0) {
    return(final_v)
  } else {
    remainder <- nrow(df) %% i1
    final_v1 <- c(final_v, rep((max(final_v) + 1), remainder))
    return(final_v1)
  }
}
So for a data frame with 20 rows, doing groups of 6, the above function will yield the result:
f1("2017M01:2017M06", df)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4
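Once a grouping variable exists, the six-month totals asked for in the question are one tapply() call away. The sketch below uses made-up data in the question's layout (24 months with Contents equal to 1 to 24), and sum_between() is a hypothetical helper, not something from the answer above, that sums Contents between two Time labels by looking up their row positions.

```r
# Hypothetical data in the question's layout: 2017M01 ... 2018M12
dat <- data.frame(
  Time = sprintf("%dM%02d", rep(2017:2018, each = 12), rep(1:12, times = 2)),
  Contents = 1:24
)

# Half-year totals: group every six consecutive rows and sum within groups
grp <- rep(seq_len(nrow(dat) %/% 6), each = 6)
half_year_sums <- as.vector(tapply(dat$Contents, grp, sum))

# Hypothetical helper: sum Contents from label `from` to label `to`, inclusive
sum_between <- function(df, from, to) {
  idx <- match(c(from, to), df$Time)          # row positions of both labels
  stopifnot(!anyNA(idx), idx[1] <= idx[2])    # both labels must exist, in order
  sum(df$Contents[idx[1]:idx[2]])
}

first_half <- sum_between(dat, "2017M01", "2017M06")
```

Looking the labels up with match() keeps the call close to the wording "from 2017M01 to 2017M06" and, unlike the month-number arithmetic in f1, also works for ranges that cross a year boundary.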
I would like to create groups from a base by matching values.
I have the following data table:
now<-c(1,2,3,4,24,25,26,5,6,21,22,23)
before<-c(0,1,2,3,23,24,25,4,5,0,21,22)
after<-c(2,3,4,5,25,26,0,6,0,22,23,24)
df<-as.data.frame(cbind(now,before,after))
which reproduces the following data:
now before after
1 1 0 2
2 2 1 3
3 3 2 4
4 4 3 5
5 24 23 25
6 25 24 26
7 26 25 0
8 5 4 6
9 6 5 0
10 21 0 22
11 22 21 23
12 23 22 24
I would like to get:
now before after group
1 1 0 2 A
2 2 1 3 A
3 3 2 4 A
4 4 3 5 A
5 5 4 6 A
6 6 5 0 A
7 21 0 22 B
8 22 21 23 B
9 23 22 24 B
10 24 23 25 B
11 25 24 26 B
12 26 25 0 B
I would like to get to the answer without using a "for" loop because the real data is too large.
Any help you could provide will be appreciated.
Here is one way. It is hard to avoid a for-loop as this is quite a tricky algorithm. The objection to them is often on the grounds of elegance rather than speed, but sometimes they are entirely appropriate.
df$group <- seq_len(nrow(df)) #assign each row to its own group
stop <- FALSE #indicates convergence
while(!stop){
  pre <- df$group #group column at start of loop
  for(i in seq_len(nrow(df))){
    matched <- which(df$before==df$now[i] | df$after==df$now[i]) #check matches in before and after columns
    group <- min(df$group[i], df$group[matched]) #identify smallest group no of matching rows
    df$group[i] <- group #set to smallest group
    df$group[matched] <- group #set to smallest group
  }
  if(identical(df$group, pre)) stop <- TRUE #stop if no change
}
df$group <- LETTERS[match(df$group, sort(unique(df$group)))] #convert groups to letters
#(just use match(...) to keep them as integers - e.g. if you have more than 26 groups)
df <- df[order(df$group, df$now),] #reorder as required
df
now before after group
1 1 0 2 A
2 2 1 3 A
3 3 2 4 A
4 4 3 5 A
8 5 4 6 A
9 6 5 0 A
10 21 0 22 B
11 22 21 23 B
12 23 22 24 B
5 24 23 25 B
6 25 24 26 B
7 26 25 0 B
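If the real data looks like the sample, where before is always now - 1 and after is always now + 1 (with 0 marking the end of a chain), the groups are simply runs of consecutive now values. That is a strong assumption which the question does not confirm, but under it a loop-free sketch is possible: sort by now and start a new group wherever the gap to the previous value is not 1.

```r
now    <- c(1, 2, 3, 4, 24, 25, 26, 5, 6, 21, 22, 23)
before <- c(0, 1, 2, 3, 23, 24, 25, 4, 5, 0, 21, 22)
after  <- c(2, 3, 4, 5, 25, 26, 0, 6, 0, 22, 23, 24)
df <- data.frame(now, before, after)

# ASSUMPTION: chains are runs of consecutive integers in `now`
df <- df[order(df$now), ]

# A gap other than 1 between neighbouring values starts a new run
run_id <- cumsum(c(TRUE, diff(df$now) != 1))

# Letters only cover 26 groups; keep run_id itself if there are more
df$group <- LETTERS[run_id]
df
```

If before and after can point to arbitrary values, this shortcut does not apply and an iterative approach like the one above is needed.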
I have a list that stores different data types and objects:
header <- "This is a header."
a <- 10
b <- 20
c <- 30
w <- 1:10
x <- 21:30
y <- 51:60
z <- 0:9
mylist <- list(header = header,
const = list(a = a, b = b, c = c),
data = data.frame(w,x,y,z))
Now I want R to display this list in the following format:
This is a header.
Values: a: 10 b: 20 c: 30
Data: w x y z
1 1 21 51 0
2 2 22 52 1
3 3 23 53 2
4 4 24 54 3
5 5 25 55 4
6 6 26 56 5
7 7 27 57 6
8 8 28 58 7
9 9 29 59 8
10 10 30 60 9
Is there a convenient way to do this?
If you want to use this kind of print regularly, I would use a class as follows:
class(mylist) <- "myclass"
print.myclass <- function(x, ...){
  cat(x$header,"\n\n")
  cat("Values: ", sprintf("%s: %s", names(x$const), x$const), "\n\n")
  cat("Data:\n")
  print(x$data, ...)
}
If you want to learn more about generic functions, have a look at http://adv-r.had.co.nz/OO-essentials.html
Printing now results in:
> mylist #equal to print(mylist). That's why we extended print with print.myclass
This is a header.
Values: a: 10 b: 20 c: 30
Data:
w x y z
1 1 21 51 0
2 2 22 52 1
3 3 23 53 2
4 4 24 54 3
5 5 25 55 4
6 6 26 56 5
7 7 27 57 6
8 8 28 58 7
9 9 29 59 8
10 10 30 60 9
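A common convention for S3 print methods is to return the object invisibly, so that print(x) can be used inside other expressions without printing twice. The variant below is a sketch along those lines; new_myclass() is a hypothetical constructor name introduced only for this example, and the values line is built with paste0() rather than sprintf().

```r
# Hypothetical constructor: bundles the pieces and sets the class attribute
new_myclass <- function(header, const, data) {
  structure(list(header = header, const = const, data = data),
            class = "myclass")
}

print.myclass <- function(x, ...) {
  cat(x$header, "\n\n", sep = "")
  # Build "a: 10 b: 20 c: 30" from the names and values of the constants
  values <- paste0(names(x$const), ": ", unlist(x$const), collapse = " ")
  cat("Values: ", values, "\n\n", sep = "")
  cat("Data:\n")
  print(x$data, ...)
  invisible(x)  # return the object without printing it a second time
}

obj <- new_myclass("This is a header.",
                   list(a = 10, b = 20, c = 30),
                   data.frame(w = 1:10, x = 21:30, y = 51:60, z = 0:9))
print(obj)
```

Using a constructor keeps the class assignment in one place instead of calling class(mylist) <- "myclass" on each new list.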
Thanks to Ananda Mahto and David Arenburg for improving my original answer.