Other questions here address the same problem, but I can't work out how to solve my case from them. I have 5 data frames whose rows I want to combine into a single data frame using rbind, but it returns this error:
"Error in row.names<-.data.frame(*tmp*, value = value) :
'row.names' duplicated not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘100’, ‘1000’, ‘10000’, ‘100000’, ‘1000000’, ‘1000001 [....]"
The data frames have the same columns but different numbers of rows. I thought rbind took the first column as row.names, so I tried putting a sequential id in the five data frames, but that doesn't work. I've also tried to set sequential row names across the data frames via row.names(), with no success either. I don't think merge is an option, because there are 5 data frames and successive merges would overwrite the previous ones. I've also created a new data frame containing only the ids and tried to join against it, but the resulting data frame doesn't include the columns of the joined data frames.
Here is an extract of df1:
id image power value pol class
1 1 tsx_sm_hh 0.1834515 -7.364787 hh FR
2 2 tsx_sm_hh 0.1834515 -7.364787 hh FR
3 3 tsx_sm_hh 0.1991938 -7.007242 hh FR
4 4 tsx_sm_hh 0.1991938 -7.007242 hh FR
5 5 tsx_sm_hh 0.2079365 -6.820693 hh FR
6 6 tsx_sm_hh 0.2079365 -6.820693 hh FR
[...]
1802124 1802124 tsx_sm_hh 0.1991938 -7.007242 hh FR
The four other data frames have the same structure, except that the 'id' columns have no duplicated numbers among them. The 'pol' and 'image' columns are factors (defined with levels).
Running all.pol <- rbind(df1,df2,df3,df4,df5) returns the duplicated row.names error above.
Any idea?
Thanks in advance
I had the same error recently. In my case the problem turned out to be that one of the attributes (columns) of the data frame was a list. After converting it to a basic type (e.g. numeric), rbind worked just fine.
By the way, the row names are the "row numbers" to the left of the first variable. In your example they are 1, 2, 3, ... (the same as your id variable).
You can see them using rownames(df) and set them using rownames(df) <- name_vector (name_vector must have one element per row of df, and its elements must be unique).
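To make the two points above concrete, here is a minimal sketch on toy data. The data frames and values are made up for illustration and are not the asker's data. It shows reading and resetting row names before rbind, and a quick check for list columns.

```r
# Toy data frames (hypothetical values)
df1 <- data.frame(id = 1:3, value = c(0.18, 0.20, 0.21))
df2 <- data.frame(id = 4:5, value = c(0.19, 0.22))

rownames(df1)                  # the default row names, as character strings
rownames(df2) <- c("x", "y")   # custom row names: one per row, all unique

# Setting row names to NULL restores the default 1..n sequence
rownames(df2) <- NULL

all_df <- rbind(df1, df2)
rownames(all_df)

# Flag any columns that are lists rather than atomic vectors
is_list_col <- sapply(all_df, is.list)
is_list_col
```

If is_list_col is TRUE for any column, that column is a likely candidate for the conversion described above.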
I had the same error.
My problem was that one of the columns in the data frames was itself a data frame, and I couldn't easily find the offending column.
data.table::rbindlist() helped to locate it:
library(data.table)
rbindlist(a)
# Error in rbindlist(a) :
# Column 25 of item 1 is length 2 inconsistent with column 1 which is length 16. Only length-1 columns are recycled.
class(a[[1]][, 25]) # "data.frame": this should obviously be converted to a column or removed
After removing the errant column, do.call(rbind, a) worked as expected.
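If data.table is not available, a base R scan can locate such columns as well. This is only a sketch on a hypothetical list of data frames named a, with made-up column names x, y and u. It relies on the fact that a data frame is itself a list, so is.list() flags both nested data frames and list columns.

```r
# Hypothetical list of data frames; the first one gets a nested data frame column
a <- list(
  data.frame(x = 1:2),
  data.frame(x = 3:4, y = c("c", "d"))
)
a[[1]]$y <- data.frame(u = c("a", "b"))   # column y is itself a data frame

# For each data frame, list the names of columns that are lists or data frames
bad_cols <- lapply(a, function(d) names(d)[sapply(d, is.list)])
bad_cols

# Replace the nested column with the plain vector it contains, then bind
a[[1]]$y <- a[[1]]$y$u
combined <- do.call(rbind, a)
combined
```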
Basically I have 2 tables with the same column names and want to do calculations across tables. Ideally, I would have taken data from the two tables and created a third, but I could only find a way to do that if the data tables are the same dimensions because it would be by cell position. Instead, I'd like to do it by column name after having done a join so that I know that the calculations are taking from the correct values.
I am trying to loop through column names to do calculations between various associated columns in the same data table (I have 2 lists of column names, that I am using to call columns from a table where I've joined the two tables. I've adjusted the column name list to add the "_A" and "_B" which were added during the join as the columns had the same names). I'm trying to call the column names using [[i]] (in this case I am using [[1]] to test it).
Does anyone know why I can't call the column name in the name$colname format? If I replace the variable with the name, it works, and if I take just the variable (colnameslistInf[[1]]) it shows the right column name, but once I put it together it says "Unknown or uninitialised column".
> joininfsup$colnameslistInf[[1]]
NULL
Warning message:
Unknown or uninitialised column: `colnameslistInf`.
> colnameslistInf[[1]]
[1] "newName.x"
> joininfsup$newName.x
[1] 5 5 5 5 5 5 5 5 5 5 5
[12] 5 5 5 5 5 5 5 5 5 5 5
[23] 5 5 5 5 5 5 5 5 5 5 5
[34] 5 5 5 5 5 5 5 5 5 5 5
[45] 5 5 5 5 5 5 5 5 5 5 5
I am also getting this error:
Error in `[[<-.data.frame`(`*tmp*`, col, value = integer(0)) :
replacement has 0 rows, data has 264
The code I am trying to run is here. joininfsup is the joined table, and I use mutate to create new columns with the calculations across each of the 200+ columns and its associated column.
joined_day_inf_numeric <- select_if(joined_day_inf, is.numeric)
joined_day_sup_numeric <- select_if(joined_day_sup, is.numeric)
joininfsup <- left_join(joined_day_inf_numeric, joined_day_sup_numeric, "JOININF", suffix = c("_A", "_B"))
#take colnames from original tables and add _A and _B as those are added during the join
colnameslistInf <- paste0(colnames(joined_day_inf_numeric), "_A")
colnameslistSup <- paste0(colnames(joined_day_sup_numeric), "_B")
for (i in 1:length(colnameslistInf)) { #245 cols, for example
name <- paste0(colnames(joined_day_inf_numeric)[[i]]) #names of new columns as loops through
joininfsup2 <-joininfsup %>%
mutate(!!name := ((joininfsup[[ colnameslistInf[[i]] ]])-joininfsup[[colnameslistSup[[i]] ]]))*joininfsup$proportion_A+joininfsup[[ colnameslistInf[[i]] ]]
write_csv(joininfsup2, paste0("test/finalcalc.csv"))
}
I think this might be the key but am having trouble applying it: Use dynamic name for new column/variable in `dplyr`
UPDATE: I replaced name in the mutate function with !!name := and the code ran! But gave me the same output as the original joined table because I'm still getting the "Unknown or uninitialised column: colnameslistInf." warning.
UPDATE2: added the missing join code, saved the result to a variable in the for loop, and added [[ ]] according to @Parfait's suggestion, but the code still does not work (it does not add any new columns).
UPDATE3:
I tried @Parfait's common_columns method but got an error:
Error: Can't subset columns that don't exist. x Columns 8_, 50_, 51_, 55_, 78_, etc. don't exist.
These columns were removed at the is.numeric step, so I'm not sure why it is pulling from the original dataset. Also, using match deletes a bunch of other columns that have characters as names.
In R, when referencing names with the $ operator, identifiers are interpreted literally: joininfsup$colnameslistInf[[1]] looks for a column literally named colnameslistInf and then takes the first element of the result, which is why you get the "Unknown or uninitialised column" warning. However, the extract operator, [[, can interpret dynamic variables:
joininfsup[[ colnameslistInf[[1]] ]]
Additionally, mutate also takes identifiers literally. Hence, in each iteration of the loop, you were assigning and re-assigning a column literally named name. You resolved that with the double bang operator, !!.
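The difference between $ and [[ can be seen on a small base R example. The data frame df and the vector col_names below are hypothetical stand-ins for joininfsup and colnameslistInf.

```r
# Hypothetical stand-ins for the joined table and the vector of column names
df <- data.frame(newName.x = c(5, 5, 5), other = 1:3)
col_names <- c("newName.x", "other")

# `$` takes the identifier literally: there is no column called "col_names"
df$col_names

# `[[` evaluates its argument first, so the stored name is used
df[[col_names[[1]]]]

# The same works on the assignment side for a dynamically named new column
new_name <- "doubled"
df[[new_name]] <- df[[col_names[[1]]]] * 2
names(df)
```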
However, consider avoiding the loop over columns and instead calculating your formula on a block of columns with matrix-style arithmetic. Specifically, adjust the default suffix in the dplyr join (or the suffixes argument in base::merge), reassign the non-underscored columns, and finally remove the underscored columns. The code below assumes your join operation; adjust the type of join and the by argument as needed.
joined_day_inf_numeric <- select_if(joined_day_inf, is.numeric)
joined_day_sup_numeric <- select_if(joined_day_sup, is.numeric)
common_columns <- intersect(
colnames(joined_day_inf_numeric), colnames(joined_day_sup_numeric)
)
common_columns <- common_columns[common_columns != "JOININF"]
joininfsup <- left_join(
joined_day_inf_numeric, joined_day_sup_numeric, by = "JOININF", suffix = c("", "_")
)
# ASSIGN NON-UNDERSCORED COLUMNS
joininfsup[common_columns] <- (
(
joininfsup[common_columns] - joininfsup[paste0(common_columns, "_")]
) *
joininfsup$proportion + joininfsup[common_columns]
)
# REMOVE UNDERSCORED COLUMNS
joininfsup[paste0(common_columns, "_")] <- NULL
write_csv(joininfsup, paste0("test/finalcalc.csv"))
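As a sanity check of the block arithmetic, the same pattern can be run in base R on a tiny made-up table. The names joined, key, a, b and the values are assumptions for illustration only. The vector proportion is recycled down each column, so every row is scaled by its own proportion.

```r
# Toy joined table: a, b from the first table; a_, b_ from the second
joined <- data.frame(
  key        = 1:3,
  proportion = c(0.5, 0.25, 1),
  a  = c(10, 20, 30), b  = c(1, 2, 3),
  a_ = c(4, 10, 30),  b_ = c(3, 2, 1)
)
cols <- c("a", "b")

# (first - second) * proportion + first, for all common columns at once
joined[cols] <- (joined[cols] - joined[paste0(cols, "_")]) * joined$proportion +
  joined[cols]

# Drop the suffixed helper columns
joined[paste0(cols, "_")] <- NULL
joined
```

For the first row of a this gives (10 - 4) * 0.5 + 10 = 13.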
I'm new to R and I have a very difficult task that I want to complete.
I have two data frames. DF1 consists of 810 observations of 4 variables; DF2 consists of 1707 observations of 51 variables.
Here are some example rows:
DF1:
Chr POS Range_Plus_10 Range_Minus_10
2 47403201 47403211 47403191
2 47403202 47403212 47403192
2 47403210 47403220 47403200
2 47403210 47403220 47403200
2 47403210 47403220 47403200
2 47403211 47403221 47403201
DF2:
Chromosome Position
2 47630258
2 47630263
2 47630263
2 47630269
2 47630271
2 47630275
Note: not all variables are shown for df2. I am not interested in the other variables, but it would be good to keep them in the output data.
What I want is to go through all the positions in df2 and check whether any of them lies within a range in df1 (between Range_Minus_10 and Range_Plus_10, for every single row).
For example, the first position in df2 is 47630258, and I want to know whether 47630258 lies between Range_Minus_10 and Range_Plus_10 in any row of df1. So I want R to give me an output column with all the positions in df2 that correspond to the range in each row of df1.
I tried to use a non-equi join, but I keep getting errors and I'm not sure what went wrong.
Could someone provide code to obtain the data I want, and also tell me why my errors occur?
Here is the script I've used:
library (data.table)
result <- df2[df1, . ("Chromosome", "Position"), on = .(Position < Range_Plus_10, Position >Range_Minus_10), by = .EACHI]
But I keep getting an error message:
Error in [.data.frame(df2, df1, .("Chr", "Position", ...), on = .(Position < :
unused arguments (on = .(Position < Range_Plus_10, Position > Range_Minus_10), by = .EACHI)
Sorry for my formatting
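The error message above shows `[.data.frame` being called, which suggests that df2 was still a plain data.frame when the join ran; the on = and by = .EACHI arguments are only understood by data.table objects (for example after data.table::setDT(df2)). Independently of data.table, the range check itself can be sketched in base R. The rows below are made-up values in the same layout as the question, not the real data, and the strict inequalities mirror the < and > used in the script.

```r
# Hypothetical rows in the same layout as df1 and df2
df1 <- data.frame(Chr = c(2, 2), POS = c(47403201, 47403210))
df1$Range_Plus_10  <- df1$POS + 10
df1$Range_Minus_10 <- df1$POS - 10

df2 <- data.frame(
  Chromosome = c(2, 2, 2),
  Position   = c(47403195, 47403215, 47630258)
)

# For each row of df2: is the position strictly inside any df1 window
# on the same chromosome?
in_range <- mapply(function(chr, pos) {
  any(df1$Chr == chr & pos > df1$Range_Minus_10 & pos < df1$Range_Plus_10)
}, df2$Chromosome, df2$Position)

# Keep the matching rows; all other df2 columns would be kept as well
hits <- df2[in_range, ]
hits
```

This loops over every row of df2, so it is a readable baseline rather than a fast solution; on large tables a non-equi join is likely the better tool.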
I have a data frame "data". I searched for a pattern using the grep function and I would like to put the result back in the data frame so that it lines up with the other rows.
data$CleanDim<-data$RAW_MATERIAL_DIMENSION[grep("^BAC",data$RAW_MATERIAL_DIMENSION)]
I would like to put the result into a new column data$CleanDim, but I get the following error. Can someone please help me?
Error in `$<-.data.frame`(`*tmp*`, CleanDim, value = c(1393L, 1405L, 734L, : replacement has 2035 rows, data has 1881
grep() returns a vector of indices of entries that match the given criteria.
The only way that your code could work here is if the number of rows of data equals a whole multiple of the number of matches grep() finds.
Consider the following reproducible example:
data = data.frame(RAW_MATERIAL_DIMENSION = c("BAC","bBAC","aBAC","BACK","lbd"))
> data
RAW_MATERIAL_DIMENSION
1 BAC
2 bBAC
3 aBAC
4 BACK
5 lbd
> grep("^BAC",data$RAW_MATERIAL_DIMENSION)
[1] 1 4
data$CleanDim <- data$RAW_MATERIAL_DIMENSION[grep("^BAC",data$RAW_MATERIAL_DIMENSION)]
Error in `$<-.data.frame`(`*tmp*`, CleanDim, value = 1:2) :
replacement has 2 rows, data has 5
Note: this would work out ok (though it would be pretty weird) if the original data object just had its first four rows. In that case, you'd just get repeated values populated in your new column.
But, what you want to do here is to look at the results of grep("^BAC",data$RAW_MATERIAL_DIMENSION) and think about what is going to be sensible in your context. Your operation will only work if the length of this result equals that of your data object, or at least if your data object is a whole multiple of that length.
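One option that always matches the length of the data is grepl(), which returns one TRUE/FALSE per row instead of a vector of indices. The sketch below reuses the reproducible example and fills non-matching rows with NA; whether NA is the right filler is an assumption about what is sensible in your context.

```r
data <- data.frame(
  RAW_MATERIAL_DIMENSION = c("BAC", "bBAC", "aBAC", "BACK", "lbd"),
  stringsAsFactors = FALSE
)

# One logical value per row, so the lengths always agree
matches <- grepl("^BAC", data$RAW_MATERIAL_DIMENSION)

# Keep the value where the pattern matches, NA elsewhere
data$CleanDim <- ifelse(matches, data$RAW_MATERIAL_DIMENSION, NA)
data
```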
As shown below, the data frame factorizedss is the factorized version of a source data frame ss.
ss <- data.frame(c('a','b','a'), c(1,2,1)); #There are string columns and number columns.
#So, I factorized them as below.
factorizedss <- data.frame(lapply(ss, as.factor)); #factorized version
indices <- data.frame(c(1,1,2,2), c(1,1,1,2)); #Now, given integer indices
Given these indices and factorizedss, is it possible to get the corresponding elements of the source data frame, as below? (The purpose is to access data frame elements by their integer position in the factor levels.)
a 1
a 1
b 1
b 2
You can access the first column like this
factorizedss[indices[,1],][,1]
and the second in a similar way
factorizedss[indices[,2],][,2]
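It gets more difficult when trying to combine them; you might have to convert them back to native types (going through as.character first, because as.numeric on a factor returns the integer codes rather than the values):
t(rbind(as.character(factorizedss[indices[,1],][,1]),as.numeric(as.character(factorizedss[indices[,2],][,2]))))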
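Since the stated purpose is to look values up by their integer position in the factor levels, a hedged alternative is to index levels() directly rather than the rows. The column names V1 and V2 below are assumptions added for readability; the original columns have auto-generated names.

```r
ss <- data.frame(V1 = c("a", "b", "a"), V2 = c(1, 2, 1))
factorizedss <- data.frame(lapply(ss, as.factor))
indices <- data.frame(V1 = c(1, 1, 2, 2), V2 = c(1, 1, 1, 2))

# levels() gives the distinct values in level order; index them by position
result <- data.frame(
  V1 = levels(factorizedss$V1)[indices$V1],
  V2 = as.numeric(levels(factorizedss$V2)[indices$V2])
)
result
```

Because each column is looked up in its own levels, the result is an ordinary data frame with a character column and a numeric column, with no coercion to a character matrix.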
I have three data frames; we will call them a, b, c.
Within each data frame there are columns called 1, 2, 3, 4, with 54175 rows of data.
Column 1 has ID names that are the same in each data frame but not necessarily in the same order.
Columns 2, 3, 4 are just numeric values.
I want to pull out all the information from column 2 for a, b, c based on the ID from column 1, so that the values for a, b, c line up with the correct ID.
I tried something like
m1 <- merge(A[,'2'], b[,'2'], c[,'2'], by='1')
I get this error:
Error in fix.by(by.x, x) : 'by' must match numbers of columns
Thank you for your help!
A couple of problems:
Merge works two-at-a-time, no more.
You need to have the by column in the data.frames that are merged.
Fix these like this:
m1 <- merge(A[,c("1", "2")], B[,c("1", "2")])
m2 <- merge(m1, C[, c("1", "2")])
Then m2 should be the result you're looking for.
As an aside, it's pretty weird to use column names that are just characters of numbers. If they're in order, just use column indices (no quotes), and otherwise put something in them to indicate that they're names not numbers, e.g., R's default of "V1", "V2", "V3". Of course, the best is a meaningful name, like "id", "MeasureDescription", ...
You can either use merge two times:
merge(merge(a[1:2], b[1:2], by = "1"), c[1:2])
or Reduce with merge:
Reduce(function(...) merge(..., by = "1"), list(a[1:2], b[1:2], c[1:2]))
You have to merge them 2 at a time:
a<-data.frame(sample(1:100,100),100*runif(100),100*runif(100),100*runif(100))
colnames(a)<-1:4
b<-data.frame("C1"=sample(1:100,100),"C2"=100*runif(100),"C3"=100*runif(100),"C4"=100*runif(100))
colnames(b)<-1:4
c<-data.frame("C1"=sample(1:100,100),"C2"=100*runif(100),"C3"=100*runif(100),"C4"=100*runif(100))
colnames(c)<-1:4
f<-merge(a[,1:2],b[,1:2],by=(1))
f<-merge(f,c[,1:2],by=(1))
colnames(f)<-c(1,"A2","B2","C2")
head(f)
1 A2 B2 C2
1 1 54.63326 39.23676 28.10989
2 2 10.10024 56.08021 69.44268
3 3 45.02948 14.69028 22.44243
4 4 90.50883 33.61303 98.00917
5 5 13.80767 80.93382 77.22679
6 6 80.72241 27.22139 51.34516
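I think the easiest way to answer this question is to look at the by argument in:
m1 <- merge(A[,'2'], b[,'2'], c[,'2'], by='1')
The by argument can be given as a column position, by = 1, but merge takes only two data frames at a time, and the by column has to be kept in each of them:
m1 <- merge(a[, 1:2], b[, 1:2], by = 1)
m1 <- merge(m1, c[, 1:2], by = 1)
Quotes are only needed when you merge by a column name, for example:
m1 <- merge(a[, c('ID', '2')], b[, c('ID', '2')], by = 'ID')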