Renaming Column Names in R - r

I would like to rename the columns of my data frame, which has duplicated column names: the columns a, b and c repeat.
> df
a b c a b c
1 6 11 1 4 4
2 7 12 2 8 12
3 8 13 3 7 7
4 9 14 5 7 11
5 10 15 44 2 13
I could change the columns name by taking out column 1:3 as df1, but is there a way to loop it if I have 1000 column names to change?
df1 <- df[,1:3]
colnames(df1) <- paste(colnames(df1), "test1" , sep = '_')

If you know that the same 3 column names repeat in the same order, you could just use rep here with the each option (applied to the full data frame df from the Data block below):
namenums <- rep(1:(ncol(df)/3), each=3)
colnames(df) <- paste0(colnames(df), "_test", namenums)
df
a_test1 b_test1 c_test1 a_test2 b_test2 c_test2
1 1 6 11 1 6 11
2 2 7 12 2 7 12
3 3 8 13 3 8 13
4 4 9 14 4 9 14
5 5 10 15 5 10 15
Data:
df <- data.frame(a=c(1:5), b=c(6:10), c=c(11:15), a=c(1:5), b=c(6:10), c=c(11:15))
names(df) <- c("a", "b", "c", "a", "b", "c")
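If the repeated names are not guaranteed to come in a fixed order, the suffix could instead be derived from how many times each name has already appeared. This is only a sketch, assuming the same df as in the Data block above; nm and occurrence are illustrative names that do not come from the original answer.

```r
# Same data as in the answer above
df <- data.frame(a = 1:5, b = 6:10, c = 11:15, a = 1:5, b = 6:10, c = 11:15)
names(df) <- c("a", "b", "c", "a", "b", "c")

nm <- names(df)
# For each column, work out which occurrence of its name this is (1st, 2nd, ...)
occurrence <- ave(seq_along(nm), nm, FUN = seq_along)
names(df) <- paste0(nm, "_test", occurrence)
names(df)
```

Because the counter is computed per name, this should also cope with names that repeat an unequal number of times.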

We can use make.unique
names(df) <- make.unique(names(df))
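To see what make.unique produces before applying it to a real data frame, here is a small sketch on a plain character vector (nm is an illustrative name). The first occurrence of each name is left untouched and later ones get a numeric suffix; the sep argument controls the separator.

```r
nm <- c("a", "b", "c", "a", "b", "c")
default_sep <- make.unique(nm)             # "a" "b" "c" "a.1" "b.1" "c.1"
custom_sep  <- make.unique(nm, sep = "_")  # "a" "b" "c" "a_1" "b_1" "c_1"
default_sep
custom_sep
```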

Related

Subtracting 1 column from multiple columns

df <- data.frame(a=1:3, b=4:6, c=7:9, d=10:12, e=13:15)
a b c d e
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
Is it possible to subtract 'column a' from all of the other columns without doing each calculation individually?
I have a dataset of 1001 columns and would like to know if it is possible to do so without doing 1000 calculations manually.
Many Thanks
Try this:
#Data
df <- data.frame(a=1:3, b=4:6, c=7:9, d=10:12, e=13:15)
#Isolate the first column (drop=FALSE keeps it as a data frame)
df1 <- df[,1,drop=FALSE]
#Subtract column a from each of the other columns
dfr <- cbind(df1,as.data.frame(apply(df[,-1],2,function(x) x-df1)))
names(dfr)<-names(df)
a b c d e
1 1 3 6 9 12
2 2 3 6 9 12
3 3 3 6 9 12
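A shorter variant relies on R's vector recycling instead of apply. This is a sketch using the same example data; res is an illustrative name. Because df$a has exactly one value per row, it is reused for every remaining column.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)
res <- df
# df$a has one value per row, so it is recycled down each of the other columns
res[-1] <- df[-1] - df$a
res
```

Column a itself is left unchanged, and the same line works regardless of how many columns follow it.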

From one vector delete all elements of another vector in r [duplicate]

I have 2 vectors
vec_1
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9
[35] 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14
vec_2
[1] 12 3 13 3 14 4 10 8 9 5 7 5 13 11 6 10 8 8 14 12 6 11 8 5 3 6
I want to delete all elements of vec_2 from vec_1, one occurrence at a time.
The function setdiff does not do this because, for example, vec_2 contains two 10s, and I want to delete only two of the 10s from vec_1 (not all four of them).
EDITED: expected output:
vec_1
[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
How can I do this in R?
Here is one idea via union
unlist(sapply(union(vec_1, vec_2), function(i)
rep(i, each = length(vec_1[vec_1 == i]) - length(vec_2[vec_2 == i]))))
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
Definitely, not the best solution but here is one way.
I created a simplified example.
vec1 <- c(1, 2, 3, 1, 1, 5)
vec2 <- c(1, 3, 5)
#Converting the frequency table to a data frame
x1 <- data.frame(table(vec1))
x2 <- data.frame(table(vec2))
#Assuming your vec1 has all the elements present in vec2
new_df <- merge(x1, x2, by.x = "vec1", by.y = "vec2", all.x = TRUE)
new_df
# vec1 Freq.x Freq.y
#1 1 3 1
#2 2 1 NA
#3 3 1 1
#4 5 1 1
#Replacing NA's by 0
new_df[is.na(new_df)] <- 0
#Subtracting the frequencies of common elements in two vectors
final <- cbind(new_df[1], new_df[2] - new_df[3])
final
# vec1 Freq.x
#1 1 2
#2 2 1
#3 3 0
#4 5 0
#Recreating a new vector based on the final dataframe
rep(final$vec1, times = final$Freq.x)
# [1] 1 1 2
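You can do this using a simple for loop:
for(i in seq_along(vec2)){
# position of the first remaining match of vec2[i] in vec1 (NA if there is none)
pos=match(vec2[i], vec1)
# only drop an element when a match exists; vec1[-NA] would return NA
if(!is.na(pos)) vec1=vec1[-pos]
}
For each element of vec2, this finds the first matching position in vec1 and removes that single element from vec1.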
You can try this too:
for (el in vec2[vec2 %in% intersect(vec1, vec2)])
vec1 <- vec1[-which(vec1==el)[1]]
sort(vec1)
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
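A loop-free alternative is to label repeated values with make.unique so that the k-th copy of a value in one vector can only match the k-th copy in the other. This is a sketch on the simplified vec1 and vec2 from the earlier answer; id1, id2 and remaining are illustrative names. It assumes whole-number values: with decimals, a generated label such as "1.1" could collide with a genuine value.

```r
vec1 <- c(1, 2, 3, 1, 1, 5)
vec2 <- c(1, 3, 5)
# Label repeats so the k-th copy of a value in vec1 only matches the k-th copy in vec2
id1 <- make.unique(as.character(vec1))  # "1" "2" "3" "1.1" "1.2" "5"
id2 <- make.unique(as.character(vec2))  # "1" "3" "5"
# Keep the elements of vec1 whose label has no counterpart in vec2
remaining <- vec1[!(id1 %in% id2)]
remaining
```

Unlike the loops above, this keeps the surviving elements in their original order and simply ignores values of vec2 that never occur in vec1.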

How to delete duplicates but keep most recent data in R

I have the following two data frames:
df1 = data.frame(names=c('a','b','c','c','d'),year=c(11,12,13,14,15), Times=c(1,1,3,5,6))
df2 = data.frame(names=c('a','e','e','c','c','d'),year=c(12,12,13,15,16,16), Times=c(2,2,4,6,7,7))
I would like to know how I could merge the above df but only keeping the most recent Times depending on the year. It should look like this:
Names Year Times
a 12 2
b 12 1
c 16 7
d 16 7
e 13 4
I'm guessing that you do not mean to merge these but rather to combine them by stacking. Your question is ambiguous, since the "duplication" could occur at the data frame level (whole rows) or at the vector level (a single column). Your example does not display any duplication at the data frame level, but it does at the vector level. The best way to describe the problem is that you want the last (or max) Times entry within each group of names values:
> df1
names year Times
1 a 11 1
2 b 12 1
3 c 13 3
4 c 14 5
5 d 15 6
> df2
names year Times
1 a 12 2
2 e 12 2
3 e 13 4
4 c 15 6
5 c 16 7
6 d 16 7
> dfr <- rbind(df1,df2)
> dfr <-dfr[order(dfr$Times),]
> dfr[!duplicated(dfr, fromLast=TRUE) , ]
names year Times
1 a 11 1
2 b 12 1
6 a 12 2
7 e 12 2
3 c 13 3
8 e 13 4
4 c 14 5
5 d 15 6
9 c 15 6
10 c 16 7
11 d 16 7
> dfr[!duplicated(dfr$names, fromLast=TRUE) , ]
names year Times
2 b 12 1
6 a 12 2
8 e 13 4
10 c 16 7
11 d 16 7
This uses base R functions; there are also newer packages (such as plyr) that many feel make the split-apply-combine process more intuitive.
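Since the question asks for the most recent year, a variant that orders by names and year rather than by Times may be safer when Times does not always grow with year. This is a sketch under that reading of the question; latest is an illustrative name.

```r
df1 <- data.frame(names = c('a','b','c','c','d'), year = c(11,12,13,14,15), Times = c(1,1,3,5,6))
df2 <- data.frame(names = c('a','e','e','c','c','d'), year = c(12,12,13,15,16,16), Times = c(2,2,4,6,7,7))

dfr <- rbind(df1, df2)
# Sort so that, within each name, the most recent year comes last
dfr <- dfr[order(dfr$names, dfr$year), ]
# Keep only the last row of each name
latest <- dfr[!duplicated(dfr$names, fromLast = TRUE), ]
latest
```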
df <- rbind(df1, df2)
do.call(rbind, lapply(split(df, df$names), function(x) x[which.max(x$year), ]))
## names year Times
## a a 12 2
## b b 12 1
## c c 16 7
## d d 16 7
## e e 13 4
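We could also use aggregate. Note that this takes the maximum of year and of Times separately within each name, so it matches the most recent row only when Times increases with year, as it does here:
df <- rbind(df1,df2)
aggregate(cbind(year, Times) ~ names, df, max)
# names year Times
# 1 a 12 2
# 2 b 12 1
# 3 c 16 7
# 4 d 16 7
# 5 e 13 4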
In case you wanted to see a data.table solution,
# load library
library(data.table)
# bind by row and convert to data.table (by reference)
df <- setDT(rbind(df1, df2))
# get the result
df[order(names, year), .SD[.N], by=.(names)]
The output is as follows:
names year Times
1: a 12 2
2: b 12 1
3: c 16 7
4: d 16 7
5: e 13 4
The final line orders the row-bound data by names and year, and then chooses the last observation (.SD[.N]) for each name.

How to Deal with Textual Data?

In R, you have a certain data frame with textual data, e.g. the second column has words instead of numbers. How can you remove the rows of the data frame with a certain word (e.g. "total") in the second column? data <- data[-(data[,2] == "total"),] does not work for me.
Besides, is there an easy way to convert these words sequentially into numbers? (I.e., first word becomes 1, second appeared word becomes 2, and so on.) I would rather not use a loop...
You can use ! to negate. For the sequence, use either seq_along or as.numeric(factor(.)) depending on what you are actually looking for.
Here's some sample data:
set.seed(1)
mydf <- data.frame(V1 = 1:15, V2 = sample(LETTERS[1:3], 15, TRUE))
mydf
# V1 V2
# 1 1 A
# 2 2 B
# 3 3 B
# 4 4 C
# 5 5 A
# 6 6 C
# 7 7 C
# 8 8 B
# 9 9 B
# 10 10 A
# 11 11 A
# 12 12 A
# 13 13 C
# 14 14 B
# 15 15 C
Let's remove any rows where there is an "A" in column "V2":
mydf2 <- mydf[!mydf$V2 == "A", ]
mydf2
# V1 V2
# 2 2 B
# 3 3 B
# 4 4 C
# 6 6 C
# 7 7 C
# 8 8 B
# 9 9 B
# 13 13 C
# 14 14 B
# 15 15 C
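It may help to see why the attempt in the question, data[-(data[,2] == "total"),], does not remove the intended rows. The sketch below uses made-up data (dat, is_total, wrong and right are illustrative names): applying a minus sign to a logical vector turns it into the numbers 0 and -1, which R then treats as a numeric index that drops only row 1.

```r
# Hypothetical data for illustration only
dat <- data.frame(id = 1:4, label = c("x", "total", "y", "total"))
is_total <- dat[, 2] == "total"   # FALSE TRUE FALSE TRUE
-is_total                         # 0 -1 0 -1: a numeric index, not a logical one
wrong <- dat[-is_total, ]         # drops row 1 only; both "total" rows remain
right <- dat[!is_total, ]         # keeps only the rows that are not "total"
right
```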
Now, let's create two new columns. The first sequentially counts each occurrence of each "word" in column "V2". The second converts each unique "word" into a number.
mydf2$Seq <- ave(as.character(mydf2$V2), mydf2$V2, FUN = seq_along)
mydf2$WordAsNum <- as.numeric(factor(mydf2$V2))
mydf2
# V1 V2 Seq WordAsNum
# 2 2 B 1 1
# 3 3 B 2 1
# 4 4 C 1 2
# 6 6 C 2 2
# 7 7 C 3 2
# 8 8 B 3 1
# 9 9 B 4 1
# 13 13 C 4 2
# 14 14 B 5 1
# 15 15 C 5 2
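Note that as.numeric(factor(.)) numbers the words by the sorted order of the factor levels. If the numbering should instead follow the order in which each word first appears, as the question seems to ask, one possible sketch is match against unique. The vector words and the result names are made up for illustration.

```r
words <- c("B", "B", "C", "A", "C")            # hypothetical example values
by_appearance <- match(words, unique(words))   # 1 1 2 3 2 (order of first appearance)
by_level <- as.numeric(factor(words))          # 2 2 3 1 3 (sorted order of levels)
by_appearance
by_level
```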

subset dataframe based on conditions in vector

I have two dataframes
#df1
type <- c("A", "B", "C")
day_start <- c(5,8,4)
day_end <- c(12,10,11)
df1 <- cbind.data.frame(type, day_start, day_end)
df1
type day_start day_end
1 A 5 12
2 B 8 10
3 C 4 11
#df2
value <- 1:10
day <- 4:13
df2 <- cbind.data.frame(day, value)
day value
1 4 1
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
9 12 9
10 13 10
I would like to subset df2 such that each level of factor "type" in df1 gets its own dataframe, only including the rows/days between day_start and day_end of this factor level.
The desired outcome for "A" would be:
list_of_dataframes$df_A
day value
1 5 2
2 6 3
3 7 4
4 8 5
5 9 6
6 10 7
7 11 8
8 12 9
I found a question on SO with an answer suggesting mapply(), but I cannot figure out how to adapt the code given there to fit my data and desired outcome. Can someone help me out?
The following solution assumes that you have all integer values for days, but if that assumption is plausible, it's an easy one-liner:
> apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],])
[[1]]
day value
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
9 12 9
[[2]]
day value
5 8 5
6 9 6
7 10 7
[[3]]
day value
1 4 1
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
You can use setNames to name the dataframes in the list:
setNames(apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],]),df1[,1])
Yes, you can use mapply:
Define a function that will do what you want:
fun <- function(x,y) df2[df2$day >= x & df2$day <= y,]
Then use mapply to apply this function with every element of day_start and day_end:
final.output <- mapply(fun,df1$day_start, df1$day_end, SIMPLIFY=FALSE)
This will give you a list with the outputs you want:
final.output
[[1]]
day value
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
9 12 9
[[2]]
day value
5 8 5
6 9 6
7 10 7
[[3]]
day value
1 4 1
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
You can name each data.frame of the list with setNames:
final.output <- setNames(final.output,df1$type)
Or you can also put an attribute type on the data.frames of the list:
fun <- function(x,y, type){
df <- df2[df2$day >= x & df2$day <= y,]
attr(df, "type") <- as.character(type)
df
}
Then each data.frame of final.output will have an attribute so you know which type it is:
final.output <- mapply(fun,df1$day_start, df1$day_end,df1$type, SIMPLIFY=FALSE)
# check which type the first data.frame is
attr(final.output[[1]], "type")
[1] "A"
Finally, if you do not want a list with the 3 data.frames you can create a function that assigns the 3 data.frames to the global environment:
fun <- function(x,y, type){
df <- df2[df2$day >= x & df2$day <= y,]
name <- as.character(type)
assign(name, df, pos=.GlobalEnv)
}
mapply(fun,df1$day_start, df1$day_end, type=df1$type, SIMPLIFY=FALSE)
This will create 3 separate data.frames in the global environment named A, B and C.
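If a named list like the list_of_dataframes$df_A shown in the question is preferred over separate global objects, the subsetting and naming steps can be combined. This is a sketch using Map (a wrapper around mapply that always returns a list); the df_ prefix is taken from the question's desired output, and the argument names start and end are illustrative.

```r
df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11))
df2 <- data.frame(day = 4:13, value = 1:10)

# One subset of df2 per row of df1, keeping days within [day_start, day_end]
list_of_dataframes <- Map(
  function(start, end) df2[df2$day >= start & df2$day <= end, ],
  df1$day_start, df1$day_end
)
# Name the elements df_A, df_B, df_C as in the question
names(list_of_dataframes) <- paste0("df_", df1$type)
list_of_dataframes$df_A
```

Keeping the results in one list tends to be easier to iterate over (for example with lapply) than objects assigned into the global environment.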
