I'm getting the following error in R:
argument lengths differ.
I have a data set I would like to order on two columns, first on caseID, then on a column that contains a timestamp. I use the following code:
mydata <- mydata[order(mydata[ ,col1], mydata[ ,col2], decreasing = FALSE),]
Col1 and col2 are two variables holding an integer. I have looked at similar questions and tried the solutions that were proposed there, but nothing worked ;).
Could someone please help me?
Kind regards
R thinks that your two columns have different lengths. That sometimes happens when you accidentally access a column that does not exist, so check the values of col1 and col2 to make sure that they are appropriate numbers. Also compare length(mydata[,col1]) and length(mydata[,col2]) to see whether the two values match. Finally, check for a missing comma or other punctuation: if the syntax is not exactly right, you can get a list of length 1 or a single-element vector, which does not match the other vector in length.
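The checks described above can be sketched as follows. The data frame contents and the column positions are invented for illustration; only the names mydata, col1 and col2 come from the question.

```r
# Hypothetical data: caseID in column 1, a timestamp in column 2.
mydata <- data.frame(
  caseID = c(2, 1, 2, 1),
  timestamp = c(40, 30, 10, 20),
  value = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
col1 <- 1  # assumed position of caseID
col2 <- 2  # assumed position of the timestamp

# Both sort keys must be plain vectors with one element per row.
keys <- list(mydata[, col1], mydata[, col2])
stopifnot(all(lengths(keys) == nrow(mydata)))

# Sort by caseID first, then by timestamp within each caseID.
mydata <- mydata[order(mydata[, col1], mydata[, col2]), ]
```

If the stopifnot() line fails, one of the two indices is probably not selecting the column you expect.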
I was having this same problem, but was able to get my code working. Try this code.
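with(mydata, mydata[order(col1, col2), ])
order() sorts in increasing order by default, so adding the argument decreasing = FALSE is not necessary. Note that inside with(), col1 and col2 refer to columns of mydata with those names. Hope that helps.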
It is probably worth checking out this similar post, which uses the dplyr package to solve the problem and helped me: Arrange within a group with dplyr
This might do the trick:
library(dplyr)
mydata <- mydata %>%
  arrange(
    col1,
    col2,
    desc(col3)
  )
Related
I've got a big data frame and would like to remove the duplicate column.
For simplicity, let's pretend this is my data:
df <- data.frame(id1 = c("Aa","Aa","Ba","Ca","Da"), id2 = c(2,1,4,5,10), location=c(351,261,101,91,51), comment=c(35,26,10,9,5), comment=c(5,16,25,14,11), hight=c(15,21,5,19,18), check.names = FALSE)
I can remove the duplicate column name "comment" using:
df <- df[!duplicated(colnames(df))]
However, when I apply the same code to my real data frame, it returns an error:
Error in `[.data.table`(SNV_wild, !duplicated(colnames(SNV_wild))) :
i evaluates to a logical vector length 1883 but there are 60483 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.
Sorry, I can't post the real data since it is quite large, as you can see from the error.
How can I troubleshoot this? I have gone through all the column names, and there are duplicate column names.
Thank you in advance
Your real data frame is of class data.table, while your small example is not. You can try:
df[, !duplicated(colnames(df)), with = FALSE]
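A minimal sketch of the difference, assuming the data.table package is installed. The small table below is made up and only mimics the duplicated "comment" column from the question.

```r
library(data.table)

# Hypothetical table with a duplicated column name.
dt <- data.table(
  id1 = c("Aa", "Ba", "Ca"),
  comment = c(35, 26, 10),
  comment = c(5, 16, 25),
  check.names = FALSE
)

# In a data.table, a lone logical vector inside [ ] is read as a row filter,
# which is why its length is compared with the number of rows.
# Putting the vector in the column position with with = FALSE selects columns.
dedup <- dt[, !duplicated(colnames(dt)), with = FALSE]
```

Checking class(df) on the real object is a quick way to see which of the two behaviours applies.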
I am trying to take df1 (a summary table), and merge it into df2 (master summary table).
This is a snapshot of df2, ignore the random 42, just the answer to the ultimate question.
This is an example of what df1, looks like.
Lastly, I have a vector called Dates. This matches the dates that are the column names for df2.
I am trying to cycle through 20 files and gather the summary statistics of each file. I then want to enter that data into df2 to be stored permanently. I only need to enter the Earned column.
I have tried to use merge but since they do not have shared column names, I am unable to.
My next attempt was to try this. But it gave an error, because of unequal row numbers.
df2[,paste(Dates[i])] <- cbind(df2,df1)
Then I thought that maybe if I specified the exact location, it might work.
df2[1:length(df1$Earned),Dates[i]] <- df1$Earned
But that gave the error "New columns would leave holes after existing columns".
So then I thought of trying that again, but with cbind.
df2[1:length(df1$Earned),Dates[i]] <- cbind(df2, df1$Earned)
##This gave an error for differing row numbers
df2 <- cbind(df2[1:length(df1$Earned),Dates[i]],df1$earned)
## This "worked" but it replaced all of df2 with df1$earned, so I basically lost the rest of the master table
Any ideas would be greatly appreciated. Thank you.
Something like this might work:
df1[df1$TreatyYear %in% df2$TreatyYear, Dates] <- df2$Earned
Example
df <- data.frame(matrix(NA,4,4))
df$X1 <- 1:4
df[df$X1 %in% c(1,2),c("X3","X4")] <- c(1,2)
The only solution that I have found so far is to extract df1$Earned as a vector and pad it with zeros until it has exactly as many elements as df2 has rows. Then I am able to insert the values into df2 under the specific column name.
temp_values <- append(df1$Earned,rep(0,(length(df2$TreatyYear)-length(df1$TreatyYear))),after=length(df1$Earned))
df2[,paste(Dates[i])] <- temp_values
This fixes it in a roundabout way, but it is not very pleasant. Any better ideas would be appreciated.
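One possible alternative is to align the two tables by TreatyYear with match() instead of padding by position. This is only a sketch: the values are invented, Dates is assumed to be a Date vector, and rows without a match are left as NA rather than 0.

```r
# Hypothetical master table and per-file summary (names from the question).
df2 <- data.frame(TreatyYear = 2010:2015)
df1 <- data.frame(TreatyYear = c(2011, 2013, 2014), Earned = c(100, 250, 75))
Dates <- as.Date(c("2020-01-31", "2020-02-29"))  # assumed class
i <- 1

# Use a character column name. A Date is a number underneath, so using it
# directly as a column index would be read as a column position.
col_name <- as.character(Dates[i])

# For each row of df1, find the row of df2 with the same TreatyYear.
rows <- match(df1$TreatyYear, df2$TreatyYear)

df2[[col_name]] <- NA_real_          # create the full-length column first
df2[rows, col_name] <- df1$Earned    # then fill only the matching rows
```

If zeros are preferred for the unmatched years, initialise the column with 0 instead of NA_real_.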
I have a problem concerning sapply in R:
I have a dataframe Test_ALL that I split by (at the moment) one column named activity. The dataframe has about 20 columns with extra long names (e.g. fBodyBodyGyroJerkMag-std()) that I don't want to write down explicitly. From this dataframe I want to get a mean for each column. I tried this and it worked for one named column.
aa<-split(Test_ALL,Test_ALL$activity)
y<-sapply(aa,function(x) colMeans(x [c("fBodyBodyGyroJerkMag-std()")]))
but when I tried to get a mean for more than one column, it didn't work.
aa<-split(Test_ALL,Test_ALL$activity)
y<-sapply(aa,function(x) colMeans(x [c("fBodyBodyGyroJerkMag-std()","fBodyAccMag-std()")]))
I tried this too, but also without success:
namesERG<-names(Test_ALL)
aa<-split(Test_ALL,Test_ALL$activity)
y<-sapply(aa,function(x) colMeans(x[c(namesERG)]))
What am I doing wrong?
Thank you!
Without a reproducible example it is difficult to completely understand your problem. Anyway, I think that part of the issue is that you have some non-numeric columns. Something like this could be a solution:
library(dplyr)
aa <- split(Test_ALL, Test_ALL$activity)
y <- sapply(aa, function(x) colMeans(select_if(x, is.numeric)))
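The same idea can be sketched in base R without dplyr. The data below are invented; only the names Test_ALL and activity and the two long column names come from the question.

```r
# Hypothetical data with one non-numeric column (activity).
Test_ALL <- data.frame(
  activity = c("WALKING", "WALKING", "SITTING", "SITTING"),
  `fBodyAccMag-std()` = c(1, 3, 5, 7),
  `fBodyBodyGyroJerkMag-std()` = c(2, 4, 6, 8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

aa <- split(Test_ALL, Test_ALL$activity)

# Keep only the numeric columns of each piece before calling colMeans().
y <- sapply(aa, function(x) colMeans(x[sapply(x, is.numeric)]))
```

Here y is a matrix with one row per numeric column and one column per activity.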
Suppose I've got a data frame called someMatrix. Now in this matrix I want to replace only the first three rows of the 4th column.
I came up with this idea.
(someMatrix[,4])[1:3] <- replacement
but I get the following error: could not find function "(<-"
Any idea how I could solve this?
Thanks!
You may subset with brackets as many times as you want, without bothering with parentheses:
a <- cbind(rnorm(10), rnorm(10))
a[1:5, ][2:3, ][, 2][1]
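For the assignment itself, one option is a single bracket pair that gives the row and column index together. A minimal sketch, assuming someMatrix is a data frame with at least four columns and replacement has three values (the data below are made up):

```r
# Hypothetical 5 x 4 data frame filled with zeros.
someMatrix <- data.frame(matrix(0, nrow = 5, ncol = 4))
replacement <- c(10, 20, 30)

# Rows 1 to 3 of column 4, assigned in one step.
someMatrix[1:3, 4] <- replacement
```

The same line works unchanged if someMatrix is a real matrix rather than a data frame.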
I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:
row.names class
564028 1
275747 1
601137 0
922930 1
481988 1
...
The row.names attribute tells me which row is which, before I did various operations that scrambled the order of the rows during the process. So far so good.
Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to take this dataframe and reorder it in ascending order according to the row.names attribute. This way, I can compare the observations, row-wise, to the labels, which I already know.
Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.
The documentation implores me to:
use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.
but this leaves me with nothing but NULL.
My question is, how can I use row.names which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?
None of the other solutions would actually work.
It should be:
# Assuming the data frame is called df
df[ order(as.numeric(row.names(df))), ]
Because row names in R are character strings, leaving out the as.numeric part will arrange the data as 1, 10, 11, ... and so on.
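A small made-up example shows the difference between the two orderings:

```r
# Hypothetical predictions with numeric-looking row names.
df <- data.frame(class = c(1, 1, 0, 1), row.names = c("10", "2", "1", "11"))

# Sorting the row names as text compares them character by character.
lexical <- df[order(row.names(df)), , drop = FALSE]

# Converting to numbers first gives the usual numeric order.
numeric <- df[order(as.numeric(row.names(df))), , drop = FALSE]

rownames(lexical)  # "1" "10" "11" "2"
rownames(numeric)  # "1" "2" "10" "11"
```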
This worked for me:
new_df <- df[ order(row.names(df)), ]
If you have only one column in your dataframe, as in my case, you have to add drop = FALSE:
df[order(rownames(df)), , drop = FALSE]
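To illustrate why drop = FALSE matters here, a sketch with an invented single-column data frame:

```r
# Hypothetical single-column data frame.
df <- data.frame(class = c(1, 0, 1), row.names = c("b", "c", "a"))

# Without drop = FALSE the result collapses to a plain vector,
# so the row names are gone.
dropped <- df[order(rownames(df)), ]

# With drop = FALSE the result stays a data frame and keeps its row names.
kept <- df[order(rownames(df)), , drop = FALSE]
```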
For completeness:
@BondedDust's answer works perfectly for the rownames attribute, but your example does not use the rownames attribute. The output provided in your question indicates a column named "row.names", which isn't the same thing (all listed in @BondedDust's comment). Here is the answer if you wish to sort by the "row.names" column in the example given in your question (there is another posting on this, located here). This answer assumes you are using a dataframe named "df" with one column named "row.names":
ordered.df <- df[order(df$row.names),] #this orders the df by the "row.names" column
Alternatively, to order by the first column (same thing if you're still using your example):
ordered.df <- df[order(df[,1]),] #this orders the df by the first column
Hope this is helpful!
This will be done almost automatically since the "[" function will display in lexical order of any vector that can be matched to rownames():
df[ rownames(df) , ]
You might have thought it would be necessary to use:
df[ order(rownames(df)) , ]
But that would have given you an ordering of 1:100 of 1, 10, 100, 12, 13, ..., 2, 20, 21, ..., because row names are character strings and order() therefore sorts them lexically.
Assuming your data frame is named 'df', you can create a new ordered data frame 'ord.df' that contains the row names of df as well as its values with the following line of code:
ord.df <- cbind(rownames(df)[order(rownames(df))], df[order(rownames(df)), ])
new_df <- df[ order(row.names(df)), ]
or something similar won't work. After this statement, new_df no longer has row names. I guess a better solution is to add the row names as a column, sort by it, and then set it as the row names again.
You can simply sort your df using this:
df <- df[sort(rownames(df)),]
and then do what you want!