R: "replacement has x rows, data has y rows"

I have a dataframe df with a column id, and a list file_list. I want to create a new column whose value is the element of file_list at the position given by the id column:
df$new <- ""
for (i in df$id) {
df$new[i] <- file_list[as.numeric(df$id[i])]
}
I am getting an error similar to "replacement has x rows, data has y rows". I searched and found the below, and initialized my new column, but am still getting the error message.
R Error - replacement has [x] rows, data has [y]
Alternatively, if I can simply replace the id column with the new value from file_list that would work as well.
I'm sure I'm missing something simple, it has been a few years since I touched R. Thanks in advance for pointers.
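A likely cause, going only by the loop shown: `for (i in df$id)` iterates over the id values rather than over row positions, so `df$new[i]` may write outside the existing rows. A minimal vectorised sketch follows, assuming the ids are 1-based positions into file_list and each element of file_list is a single string. The sample data is made up for illustration.

```r
# Hypothetical sample data standing in for the real df and file_list
df <- data.frame(id = c("2", "1", "3", "2"), stringsAsFactors = FALSE)
file_list <- list("a.csv", "b.csv", "c.csv")

# Index file_list by all ids at once; unlist() flattens the result
# into a character vector with one entry per row of df
df$new <- unlist(file_list[as.numeric(df$id)])
print(df)
```

Assigning to `df$id` instead of `df$new` would overwrite the id column in place, if that is preferred.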

Related

R how to create a dataframe by adding columns

I am very, very new to R. I have been using Python and MATLAB my whole life.
So here is what I would like to do. During each loop, I compute a column that I would like to add to a dataframe.
The problem is that I do not know the length of the column in advance, so I cannot create the dataframe with a specific size. I keep getting an error when I try to add the column to the original empty dataframe.
# extract the data where the column 7 has no data.
df_glm <- data.frame(matrix(ncol = 11, nrow = 0))
for (j in 1:ncol(data_cancer)){
  col_ele <- data_cancer[,j]
  col_filtered <- col_ele[col_bool7]
  # make new dataframe by concatenating the filtered column.
  df_glm[,i] <- col_filtered
}
data_cancer_filter <- data_cancer[,col_bool7]
How can I resolve this issue?
I am getting an error at df_glm[,i] because the column is as long as col_bool7. But I want to learn how to do this without creating dataframe of exact size beforehand.
If I am understanding this correctly, you're looping through columns and taking the rows where col_bool7 is TRUE and putting it in another dataframe. dplyr filter() would be an efficient solution:
library(dplyr)
df_glm <- data_cancer %>%
  filter(col_bool7)
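The same row filter can also be written in base R without a loop or a pre-sized data frame. This is only a sketch: the small data_cancer below and the way col_bool7 is built are assumptions for illustration, with col_bool7 taken to be a logical vector holding one value per row.

```r
# Hypothetical stand-ins for data_cancer and col_bool7
data_cancer <- data.frame(a = 1:4,
                          b = c(10, 20, 30, 40),
                          g = c(NA, 5, NA, 7))
col_bool7 <- is.na(data_cancer$g)  # assumed: one TRUE/FALSE per row

# Keep the rows flagged TRUE, across all columns, in a single step;
# drop = FALSE keeps the result a data frame even with one column
df_glm <- data_cancer[col_bool7, , drop = FALSE]
print(df_glm)
```

Because the logical vector goes in the row position of `[rows, columns]`, the result is sized automatically.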

Merging Dataframes based on values

Apologies if the question lacks information; this is my first time posting here.
I have two data frames, one with 12,000 rows (GPS) and a second with 196 rows (Details).
The GPS dataframe has repeated values in a "names" column.
The Details dataframe has a "position" column and a name column, with a position value for each name.
I need the GPS df to have a "position" column which pulls from Details$position and repeats each time a name appears.
I tried to do this by creating a list of the names and then using a combination of setDT and setDF, based on a line of code given to me by someone trying something similar:
Weigh_in_check <- setDF(setDT(Weigh_in_check)[setDT(Weight_first),
Weight_initial := Weight_first$Weight, on=c("Name")])
However, I cannot adapt it to work for me, as follows:
Name_check <- setDF(setDT(Name_check)[setDT(GPSReview2), Position :=
PlayerDetails$Position, on=c("Player Name")])
New code following comment by Flo.P
GPSReview4[,"Position"] <- NA
GPSReview4$Position <- as.character(GPSReview4$Position)
GPSReview4$Position <- left_join(GPSReview4, PlayerDetails, by ="Position" )
Which gives following error
Error in $<-.data.frame(*tmp*, Position, value = list(Full session = c("Yes", :
replacement has 132235 rows, data has 26447
EDIT:
These are the 2 dataframes
GPS Review4
Detail
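Two things stand out in the left_join attempt: the join key is "Position" rather than the player name, and left_join() returns a whole data frame, so assigning it to a single column triggers the "replacement has ... rows" error. A minimal base R sketch of the lookup follows, using match(). The two small data frames are hypothetical stand-ins for GPSReview4 and PlayerDetails, and the shared "Player Name" column is assumed from the code in the question.

```r
# Hypothetical stand-ins for the GPS and Details data frames
gps <- data.frame(`Player Name` = c("Ann", "Bob", "Ann", "Cy"),
                  Speed = c(5.1, 4.8, 5.3, 4.9),
                  check.names = FALSE, stringsAsFactors = FALSE)
details <- data.frame(`Player Name` = c("Ann", "Bob", "Cy"),
                      Position = c("Forward", "Back", "Wing"),
                      check.names = FALSE, stringsAsFactors = FALSE)

# match() gives, for every GPS row, the row of details with the same name;
# indexing Position by that vector repeats the position for repeated names
gps$Position <- details$Position[match(gps$`Player Name`,
                                       details$`Player Name`)]
print(gps)
```

Names that do not appear in the details table get NA, and the number of rows in the GPS data stays unchanged.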

'row.names' is not a character vector of length

I am simply trying to create a dataframe.
I read in data by doing:
example <- read.csv(choose.files(), header=TRUE, sep=";")
The data contains 2 columns with 8736 rows plus a header.
I then simply want to combine this with the column of a dataframe with the same amount of rows (!) by doing:
data_frame <- as.data.frame(example$x, example$y, otherdata$z)
It produces the following error
Warning message:
In as.data.frame.numeric(example$x, example$y, otherdata$z) :
'row.names' is not a character vector of length 8736 -- omitting it. Will be an error!
I have never had this problem before. It seems so easy to tackle, but I can't work it out at the moment.
Overview
As long as nrow(example) equals length(otherdata$z), use cbind.data.frame() to combine the columns into one data frame. An advantage of cbind.data.frame() is that there is no need to name the individual columns of example when binding them with otherdata$z.
# create a new data frame that adds the 'z' field from another source
df_example <- cbind.data.frame(example, otherdata$z)
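For context on the warning: the second positional argument of as.data.frame() is row.names, so example$y is treated as row names rather than as a column. A minimal sketch using data.frame() with named columns follows. The tiny example and otherdata objects are made up for illustration.

```r
# Hypothetical stand-ins for the imported data and the other data frame
example <- data.frame(x = c(1.5, 2.5, 3.5), y = c(10, 20, 30))
otherdata <- data.frame(z = c("a", "b", "c"), stringsAsFactors = FALSE)

# data.frame() takes each named argument as a column of the result
data_frame <- data.frame(x = example$x, y = example$y, z = otherdata$z)
str(data_frame)
```

All three vectors need the same length here, which matches the equal row counts described in the question.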

integer function converting row names in to numbers

I used to do this:
mydata3 <- data.frame(sapply(mydata2, as.integer))
But now I see that the row names, which are gene names, have been converted to numbers (like 1-200). I should point out that the same command worked well when I used it some time ago. So I thought there was a problem with my file and tried an old file on which this command had worked, but I see the same problem: the gene names are converted to numbers. Here is the full script:
countsTable<-read.table("JW.txt",header=TRUE,stringsAsFactors=TRUE,row.names=1)
mydata2 <- countsTable/1000
mydata3 <- data.frame(sapply(mydata2, as.integer))
str(mydata3)
Please let me know.
sapply works over the columns of your data.frame mydata2 and returns the respective output per column. As such, it does not return the row names of your data.frame, so you either have to re-assign them or assign the new column data back into your original data.frame, like:
mydata2[] <- sapply(mydata2, as.integer)
Thus you can keep all of the original attributes.
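A small sketch of the difference, using a made-up two-gene table in place of JW.txt. It uses lapply() rather than sapply() for the in-place assignment, which is a substitution on my part. lapply() returns a list of columns instead of a matrix, and the `[] <-` assignment works the same way.

```r
# Hypothetical counts table with gene names as row names
counts_table <- data.frame(s1 = c(1500, 2500), s2 = c(3200, 4800),
                           row.names = c("geneA", "geneB"))
scaled <- counts_table / 1000

# Rebuilding the data frame from sapply() output loses the gene names
lost <- data.frame(sapply(scaled, as.integer))
print(rownames(lost))

# Assigning into the existing data frame keeps its row names
kept <- scaled
kept[] <- lapply(kept, as.integer)
print(rownames(kept))
```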

Trying to predict in R

I created a data set using a random row generator:
training_data <- fulldata[sample(nrow(fulldata),100),]
I am under the impression that I can create a second data set of the rest of the data ... rest_data <- fulldata[-training_data] is the code I jotted down in my notes but I am getting
"Error in '[.default'(fulldata, -training_data) :
What part of my code is incorrect?
Assuming that fulldata is a data frame, you need a comma in the subscript to indicate that you want rows of the data frame (i.e. fulldata[rows, columns]). In addition, training_data is itself a data frame rather than a vector of row numbers, so it cannot be used as a negative index; you need some indicator that corresponds between training_data and fulldata to show which rows of fulldata should not be included. What you might do is use the row names, which a row subset carries over from fulldata, something like:
rest_data<-fulldata[-which(rownames(fulldata)%in%rownames(training_data)),]
which should tell R to drop the rows of fulldata whose row names occur in training_data. If you have something like an ID variable that is unique to each row, you could also use this:
rest_data<-fulldata[-which(fulldata$ID%in%training_data$ID),]
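An alternative that avoids matching on row names or IDs is to store the sampled row numbers and reuse them for both subsets. This is a minimal sketch with a made-up fulldata. The name train_rows is introduced here for illustration and does not appear in the question.

```r
set.seed(1)  # only so the illustration is reproducible

# Hypothetical data standing in for fulldata
fulldata <- data.frame(ID = 1:250, value = rnorm(250))

# Draw the row numbers once and keep them
train_rows <- sample(nrow(fulldata), 100)

training_data <- fulldata[train_rows, ]   # the sampled rows
rest_data     <- fulldata[-train_rows, ]  # every other row

print(c(nrow(training_data), nrow(rest_data)))
```

Negative indexing with the stored row numbers drops exactly the sampled rows, so the two subsets do not overlap.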

Resources