How to rename data within a dataframe - r

I have a data frame, but the numbering of the months is all jumbled. I need to recode the values as listed below, but I'm struggling to see an easy way through. I'm aware that the following code changes the data, but the puzzle is doing all the replacements without them interfering with one another and without adding columns together.
data$column[data$column == "0"] <- "7"
Old -> new:
0 -> 7
1 -> 8
2 -> 9
3 -> 10
4 -> 1
5 -> 2
6 -> 3
7 -> 4
8 -> 5
9 -> 6
Thank you

Maybe plyr::mapvalues() can help you here:
library(plyr)
data$column <- mapvalues(data$column, from = 0:9, to = c(7:10, 1:6))
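If you would rather avoid the plyr dependency, base R's match() can do the same lookup: find each value's position among the old codes and take the new code at that position. Because every value is looked up in one step, a cell changed from 0 to 7 is not changed again by the 7 -> 4 rule, which is the trap when the replacements are done one line at a time. A minimal sketch, assuming a numeric column (the example values are made up):

```r
# Made-up example values standing in for the real month codes
data <- data.frame(column = c(0, 3, 4, 9, 7))

old_codes <- 0:9
new_codes <- c(7:10, 1:6)   # 0->7, 1->8, 2->9, 3->10, 4->1, ..., 9->6

# match() gives each value's position in old_codes; index new_codes with it
data$column <- new_codes[match(data$column, old_codes)]
data$column  # 7 10 1 6 4
```

match() compares after coercing to a common type, so the same line should also work if the codes are stored as strings such as "0".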

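Use modular arithmetic. Note that (data$column + 7) %% 10 would map 3 to 0 instead of 10, so shift by 6 and add 1 afterwards; as.numeric() covers a column stored as text, as in the question:
data$column <- (as.numeric(data$column) + 6) %% 10 + 1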
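Before relying on an arithmetic shortcut like this, it is worth running it over all ten codes and comparing the result with the target mapping. The sketch below checks the shift-by-7 formula (x + 7) %% 10 and a shift-by-6-then-add-1 variant; only the second reproduces 3 -> 10, because (3 + 7) %% 10 wraps around to 0.

```r
codes  <- 0:9
target <- c(7:10, 1:6)

shift7     <- (codes + 7) %% 10       # wraps 3 to 0 instead of 10
shift6plus <- (codes + 6) %% 10 + 1   # maps 0..9 onto 7, 8, 9, 10, 1, ..., 6

all(shift7 == target)         # FALSE
all(shift6plus == target)     # TRUE
which(shift7 != target) - 1   # the code that goes wrong: 3
```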

Related

Reverse order of rows of data.frame in R [duplicate]

I need to change/invert rows in my data frame, not transposing the data but moving the bottom row to the top and so on. If the data frame was:
1 2 3
4 5 6
7 8 9
I need to convert to
7 8 9
4 5 6
1 2 3
I've read about sort(), but either it isn't what I need or I can't work out how to use it here.
There probably are more elegant ways, but this works:
m <- matrix(1:9, ncol=3, byrow=TRUE)
# m[rev(seq_len(nrow(m))), ] # Initial answer
m[nrow(m):1, ]
[,1] [,2] [,3]
[1,] 7 8 9
[2,] 4 5 6
[3,] 1 2 3
This works because you are indexing the matrix with a reversed sequence of integers as the row index. nrow(m):1 results in 3 2 1.
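The commented-out rev(seq_len(nrow(m))) form differs from nrow(m):1 only when the object has zero rows: then nrow(m):1 is 0:1, which asks for a row that does not exist, whereas seq_len(0) is empty. A small helper built on the safer form could look like this; the name reverse_rows is hypothetical, not from any package, and drop = FALSE keeps a one-column result from collapsing to a vector.

```r
# Hypothetical helper: reverse row order, safe for zero-row inputs
reverse_rows <- function(x) {
  x[rev(seq_len(nrow(x))), , drop = FALSE]
}

m <- matrix(1:9, ncol = 3, byrow = TRUE)
reversed <- reverse_rows(m)
reversed                        # rows 7 8 9 / 4 5 6 / 1 2 3

empty <- m[0, , drop = FALSE]   # a 0 x 3 matrix
dim(reverse_rows(empty))        # 0 3, whereas empty[nrow(empty):1, ] throws an error

reverse_rows(data.frame(a = 1:3))  # also works on data frames
```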
You can reverse the order of a data.frame using the dplyr package:
library(dplyr)
iris %>% arrange(-row_number())
Or, without the pipe operator:
arrange(iris, -row_number())
I would reverse the rows by indexing from the number of rows down to 1, along these lines:
revdata <- thedata[dim(thedata)[1L]:1,]
I think this is the simplest way:
MyMatrix = matrix(1:20, ncol = 2)
MyMatrix[ nrow(MyMatrix):1, ]
If you want to reverse the columns, just do
MyMatrix[ , ncol(MyMatrix):1 ]
We can reverse the order of row.names (for data.frame only):
# create data.frame
m <- matrix(1:9, ncol=3, byrow=TRUE)
df_m <- data.frame(m)
#reverse
df_m[rev(rownames(df_m)), ]
# X1 X2 X3
# 3 7 8 9
# 2 4 5 6
# 1 1 2 3
Veeery late, but this seems to be fast, needs no extra packages, and is simple (the object is called m here so it is not confused with the base function matrix()):
for(i in seq_len(ncol(m))) {m[,i] <- rev(m[,i])}
I guess that for frequent use, one would make a function out of it.
Tested with R 3.3.1.
I encountered this problem today, so here is another solution in case it is of interest:
m <- matrix(1:9, ncol=3, byrow=TRUE)
apply(m,2,rev)
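One caveat with apply(): it converts a data frame to a matrix before doing any work, so with mixed column types every column ends up as the same type (character here). Indexing the rows, as in the earlier answers, keeps a data frame with its column types intact. A small made-up example:

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# apply() goes through as.matrix(); mixed types give a character matrix
via_apply <- apply(df, 2, rev)
is.matrix(via_apply)     # TRUE
is.character(via_apply)  # TRUE: id is no longer integer

# Row indexing returns a data frame and keeps id as integer
via_index <- df[nrow(df):1, ]
str(via_index)
```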

R data frame comparison with which(), scaling badly

The idea is to replace each character in df with its position in another reference vector (Compare). Example:
L<-LETTERS[1:25]
A<-c(1:25)
df<-data.frame(L,A)
Compare<-c(LETTERS[sample(1:25, 25)])
df[] <- lapply(df, as.character)
for (i in 1:nrow(df)){
df[i,1]<-which(df[i,1]==Compare)
}
head(df)
L A
1 14 1
2 12 2
3 2 3
This works well but scales very badly, as for loops often do. Any ideas using apply or dplyr?
Thanks
Just use match().
Your data (use set.seed() when providing example data generated with sample(), so others can reproduce it):
df <- data.frame(L = LETTERS[1:25], A = 1:25)
set.seed(1)
Compare <- LETTERS[sample(1:25, 25)]
Solution
df$L <- match(df$L, Compare)
head(df)
# L A
# 1 10 1
# 2 23 2
# 3 12 3
# 4 11 4
# 5 5 5
# 6 21 6
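match() is vectorised, so one call replaces the whole loop. It returns the position of the first match, and NA (or whatever is given as nomatch) for values that are absent from the reference. A small made-up illustration:

```r
compare <- c("C", "A", "B", "A")
x <- c("A", "B", "C", "Z")

pos  <- match(x, compare)                # first match wins for the repeated "A"
pos0 <- match(x, compare, nomatch = 0L)  # absent values become 0 instead of NA

pos   # 2 3 1 NA
pos0  # 2 3 1 0
```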

Split a data frame into a list based on ids

Please note, I'm not a programmer by trade; I'm a literature student. So please bear with me.
I would like to improve my existing procedure, which does work. The split function is probably one option, but I'm not sure how to use it.
Basically, I'm trying to subdivide an existing data frame into a list of subsamples so that the rows sharing one id are never split across two list elements.
Here is a working example together with sample data:
df <- data.frame(id=c(rep(1,3),rep(2,2),rep(3,3),rep(4,2),5,6,7,8,9,rep(10,5)),r1=rep(1,40),r2=rep(2,40))
x <- transform(df, rec=ave(df$id,df$id, FUN=seq_along))
x$cum <- cumsum(x$rec)
x$dif <- diff(c(0,x$cum),1)
x$lab <- ifelse(x$dif!=1,0,1)
x$seq <- seq_along(x$id)
x$subs <- x$lab*x$seq
seqrow <- seq(1,nrow(x),3) # how many rows approx. per part
rw <- x$subs[x$subs %in% seqrow]
start_rw <- c(1,rw[2:length(rw)])
end_rw <- c(start_rw[2:length(start_rw)]-1,nrow(x))
df.lst <- list()
for(i in 1:length(start_rw)){
df.lst[[i]] <- x[(start_rw[i]:end_rw[i]), ]
}
Within each list element, the ids should also be sorted in increasing order.
Reading through your code, I would summarize your procedure as:
1. Compute seqrow, the row numbers at which you would be willing to split the data.
2. Split df only at the positions in seqrow where df$id is new (hasn't appeared above); this list of positions is called start_rw in your code.
You can use duplicated to determine if df$id has appeared above or not, which enables you to grab start_rw more easily:
seqrow <- seq(1,nrow(df),3)
(start_rw <- intersect(which(!duplicated(df$id)), seqrow))
# [1] 1 4 13 16
All that remains is to split df at these positions. You can use diff to compute the number of elements in each grouping:
(groups <- rep(seq(start_rw), times=diff(c(start_rw, nrow(df)+1))))
# [1] 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
df.lst2 <- split(df, groups)
This matches the output of your code:
all.equal(unname(df.lst2), lapply(df.lst, function(x) x[,1:3]))
# [1] TRUE
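For repeated use, the same steps can be wrapped in a function. This is a minimal sketch, assuming the rows are already sorted by id, so each id forms one contiguous block and its first occurrence marks where that block starts; the name split_at_new_ids and the example data are hypothetical.

```r
# Hypothetical helper: split df into chunks of roughly `every` rows,
# starting a new chunk only on a row where the id appears for the first time
split_at_new_ids <- function(df, id_col, every = 3) {
  allowed <- seq(1, nrow(df), by = every)                      # candidate split rows
  starts  <- intersect(which(!duplicated(df[[id_col]])), allowed)
  sizes   <- diff(c(starts, nrow(df) + 1))                     # rows in each chunk
  split(df, rep(seq_along(starts), times = sizes))
}

df <- data.frame(id = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 5), r1 = 1)
parts <- split_at_new_ids(df, "id", every = 3)
sapply(parts, nrow)  # 6 3 1
```

Row 4 is an allowed split point but holds a repeat of id 2, so no split happens there and id 2 stays whole in the first chunk. Row 1 is always both a first occurrence and an allowed split point, so the first chunk always starts at the top.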

Counting unique values across a row

I want to check that columns are consistent for each ID number (they're supposed to be constants, but there may be some doubt in the data, so I want to double-check).
For example, given the following data frame:
test <- data.frame(ID = c("one","two","three"),
a = c(1,1,1),
b = c(1,1,1),
t = c(NA,1,1),
d = c(2,4,1))
I want to check that columns a, b, t and d are all the same, disregarding missing values. I thought I could do this by counting the unique values in the relevant columns and then selecting only the rows where the number of unique values is more than 1. I imagine this is not the best way of doing it, but it was the only way I could think of with my limited knowledge.
I found this question here, which seems to be similar to what I want to do:
Find unique values across a row of a data frame
But I am struggling to apply the answers to my data. I tried the following, which didn't do anything (I've never used a for-loop before, so I've probably done that wrong), although when I run the inside of the function on its own for a single row it does exactly what I hope for:
yeartest <- function(x){
temp <- test[x,2:5]
temp <- as.numeric(temp)
veclength <- length(unique(temp[!is.na(temp)]))
temp2 <- c(temp,veclength)
test[,"thing"] <- NA
test[x,2:6] <- temp2
}
for(i in 1:nrow(test)){
yeartest(i)
}
Then I tried to adapt the accepted answer from that question:
x <- test
# dups <- function(x) x[!duplicated(x)]
yeartest <- function(x){
# x <- 1
temp <- test[x,2:5]
temp <- as.numeric(temp)
veclength <- length(unique(temp[!is.na(temp)]))
temp2 <- c(temp,veclength)
test[,"thing"] <- NA
test[x,2:6] <- temp2
}
new.df <- t(apply(x, 1, function(x) yeartest(x)))
This gives an error, so I have clearly made a mistake in translating the answer to my data.
Apologies if this is a really obvious failing on my part; I am very grateful for any help.
Solution: (thank you for the help!)
test$new <- apply(test[,2:5],1,function(r) length(unique(na.omit(r))))
> df <- data.frame(
a=sample(2,10,replace=TRUE),
b=sample(2,10,replace=TRUE),
c=sample(c("a","b"),10,replace=TRUE),
d=sample(c("a","b"),10,replace=TRUE))
> df[c(3,6,8),1] <- NA
> df
a b c d
1 1 2 a b
2 1 2 a b
3 NA 2 a a
4 2 2 a b
5 1 2 a a
6 NA 1 a b
7 2 1 b b
8 NA 1 a a
9 1 1 b b
10 2 2 b b
> apply(df,1,function(r) length(unique(na.omit(r))))
[1] 3 3 2 4 3 2 4 2 3 3
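The for-loop attempt in the question appears to do nothing because yeartest() assigns to test inside the function, which only changes a local copy; R functions do not modify objects in the calling environment unless the result is returned and reassigned. Building on the accepted one-liner, a short sketch that also pulls out the offending IDs (column names taken from the question's test data) could look like this:

```r
test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   t = c(NA, 1, 1),
                   d = c(2, 4, 1),
                   stringsAsFactors = FALSE)

# Number of distinct non-missing values in each row of the checked columns
test$n_unique <- apply(test[, c("a", "b", "t", "d")], 1,
                       function(r) length(unique(na.omit(r))))

# IDs whose supposedly constant columns disagree
inconsistent <- test$ID[test$n_unique > 1]
inconsistent  # "one" "two"
```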
