I am trying to rename columns but I do not know if that column will be present in the dataset. I have a large data set and if a certain column name is present I want to rename it. For example:
A B C D E
1 4 5 9 2
3 5 6 9 1
4 4 4 9 1
newNames <- data %>% rename(1=A,2=B,3=C,4=D,5=E)
This works to rename what is in the dataset but I am looking for the flexibility to add more potential name changes, without an error occurring.
newNames2 <- data %>% rename(1=A,2=B,3=C,4=D,5=E,6=F,7=G)
This ^ will not work it give me an error because F and G are not in the data set.
Is there any way to write a code to ignore the column change if the name does not exist?
Thanks!
There can be plenty of ways to do this. One would be to create a named vector with the names and their corresponding 'new name' (as the vector's names) and use that, i.e.
#The below vector v1, uses LETTERS as old names and 1:7 as the new ones
v1 <- setNames(LETTERS[1:7], 1:7)
names(df) <- names(v1)[v1 %in% names(df)]
Related
I am new to R and I am having some trouble.
I am trying to replace some specific values of a data frame according to different values from another df.
For example:
I have this two dataframes:
a <- data.frame(c('a','b','c','d'), c('g','e','p','d'))
1 a g
2 b e
3 c p
4 d d
b <- data.frame(c('a','c'))
1 a
2 c
I want to find out which items that are on df a are also on df b and assign the value of the next column, in this case 'g' and 'p'. I tried with the match function but it has a problem if there are many items with the same name that need to be changed. I would really want an option to do this without checking 1 by 1 with a loop.
Given the below dataframe
df <- data.frame(cbind(seq(1:4),rep(letters[seq(1:3)],4)))
X1 X2
1 a
2 b
3 c
4 a
1 b
2 c
3 a
4 b
1 c
2 a
3 b
4 c
I would like to summarize unique X2s by X1. For example,
1 a,b,c
2 b,c,a
3 c,a,b
4 a,b,c
I am very close. I use the following code:
'summary <- aggregate(df$X2, list(df$X1),FUN=unique)`
which produces
Group.1 X
1 1,2,3
2 2,3,1
3 3,1,2
4 1,2,3
(the index of the list). What is the most efficient way to get my desired result?
I am certain there is an easy solution and I've tried searching, but I must not be using the correct search terms. Thank you in advanced.
We can use toString to paste the elements
aggregate(X2~X1, unique(df), toString )
Or if we need to keep it as list
aggregate(X2~X1, transform(unique(df), X2 = as.character(X2)), list)
As the OP also mentioned the efficient approach
library(data.table)
unique(setDT(df))[, .(X2 = toString(X2)), by = X1]
Regarding the creation of data.frame, it is easier, compact and error-free way to do without using cbind with data.frame. The main reason is that cbind converts to a matrix and matrix can have only a single class. So, if there is a single character column or elements, all the elements are converted to character. With as.data.frame, by default the stringsAsFactors=TRUE, so the columns are converted to factor class.
df <- data.frame(X1= 1:4, X2= rep(letters[1:3],4), stringsAsFactors= FALSE)
The above code gets the intended output. Note that seq is not needed when we use :
I am trying to a simple task, and created a simple example. I would like to add the counts of a taxon recorded in a vector ('introduced',below) to the counts already measured in another vector ('existing'), according to the taxon name. However, when there is a new taxon (present in introduced by not in existing), I would like this taxon and its count to be added as a new entry in the matrix (doesn't matter what order, but name needs to be retained).
For example:
existing<-c(3,4,5,6)
names(existing)<-c("Tax1","Tax2","Tax3","Tax4")
introduced<-c(2,2)
names(introduced)<-c("Tax1","Tax5")
I want new matrix, called "combined" here, to look like this:
#names(combined)= c("Tax1","Tax2","Tax3","Tax4","Tax5")
#combined= c(5,4,5,6,2)
The main thing to see is that "Tax1"'s values are combined (3+2=5), "Tax5" (2) is added on to the end
I have looked around but previous answers similar to this have much more complex data and it is difficult to extract which function I need. I have been trying combinations of match and which, but just cannot get it right.
grp <- c(existing,introduced)
tapply(grp,names(grp),sum)
#Tax1 Tax2 Tax3 Tax4 Tax5
# 5 4 5 6 2
Instead of keeping your data in 'loose' vectors, you may consider collecting them in one data frame. First, put you two sets of vector data in data frames:
existing <- c(3, 4, 5, 6)
taxon <- c("Tax1", "Tax2", "Tax3", "Tax4")
df1 <- data.frame(existing, taxon)
introduced <- c(2, 2)
taxon <- c("Tax1", "Tax5")
df2 <- data.frame(introduced, taxon)
Then merge the two data frames by the common column, 'taxon'. Set all = TRUE to include all rows from both data frames:
df3 <- merge(df1, df2, all = TRUE)
Finally, sum 'existing' and 'introduced' taxon, and add the result to the data frame:
df3$combined <- rowSums(df3[ , c("existing", "introduced")], na.rm = TRUE)
df3
# taxon existing introduced combined
# 1 Tax1 3 2 5
# 2 Tax2 4 NA 4
# 3 Tax3 5 NA 5
# 4 Tax4 6 NA 6
# 5 Tax5 NA 2 2
I am reading in parameter estimates from some results files that I would like to compare side by side in a table. But I cant get the dataframe to the structure that I want to have (Parameter name, Values(file1), Values(file2))
When I read in the files I get a wide dataframe with each parameter in a separate column that I would like to transform to "long" format using melt. But that gives only one column with values. Any idea on how to get several value columns without using a for loop?
paraA <- c(1,2)
paraB <- c(6,8)
paraC <- c(11,9)
Source <- c("File1","File2")
parameters <- data.frame(paraA,paraB,paraC,Source)
wrong_table <- melt(parameters, by="Source")
You can use melt in combination with cast to get what you want. This is in fact the intended pattern of use, which is why the functions have the names they do:
m<-melt(parameters)
dcast(m,variable~Source)
# variable File1 File2
# 1 paraA 1 2
# 2 paraB 6 8
# 3 paraC 11 9
Converting #alexis's comment to an answer, transpose (t()) pretty much does what you want:
setNames(data.frame(t(parameters[1:3])), parameters[, "Source"])
# File1 File2
# paraA 1 2
# paraB 6 8
# paraC 11 9
I've used setNames above to conveniently rename the resulting data.frame in one step.
I have done lot of googling but I didn't find satisfactory solution to my problem.
Say we have data file as:
Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1
The first line is header. The first column is Group id (the data have 3 groups A, B, C) while other column are values.
I want to read this file in R so that I can apply different functions on the data.
For example I tried to read the file and tried to get column mean
dt<-read.table(file_name,head=T) #gives warnings
apply(dt,2,mean) #gives NA NA NA
I want to read this file and want to get column mean. Then I want to separate the data in 3 groups (according to Tag A,B,C) and want to calculate mean(column wise) for each group. Any help
apply(dt,2,mean) doesn't work because apply coerces the first argument to an array via as.matrix (as is stated in the first paragraph of the Details section of ?apply). Since the first column is character, all elements in the coerced matrix object will be character.
Try this instead:
sapply(dt,mean) # works because data.frames are lists
To calculate column means by groups:
# using base functions
grpMeans1 <- t(sapply(split(dt[,c("v1","v2","v3")], dt[,"Tag"]), colMeans))
# using plyr
library(plyr)
grpMeans2 <- ddply(dt, "Tag", function(x) colMeans(x[,c("v1","v2","v3")]))