Unexpected numeric constant in R after using cast()

I tried to reshape a data frame so that the entries in one column become the column names. I used cast(), but I got the following error when I tried to retrieve data from the new data frame.
Here is the original data frame:
ID Type rating
1 1 3.5
1 2 4.0
2 2 2.5
And the code:
r_mat <- cast(r_data, ID ~ Type)
r_mat$1
Error: unexpected numeric constant in "r_mat$1"
Here is what the new data frame looks like:
ID 1 2
1 3.5 4.0
2 NA 2.5
Can anyone help me deal with this error?
Thanks!

You can use make.names() from {base} to "make syntactically valid names out of character vectors", as follows:
colnames(r_mat) <- make.names(colnames(r_mat), unique = TRUE)
For columns with numeric names, this puts an "X" in front of each number, e.g. X1, X2, ...
For details on the function specification, see:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html
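A minimal sketch of both options, assuming a plain data frame built by hand to mimic the cast() result (the real object returned by cast() may carry extra attributes). It shows that the original names can still be used if they are quoted with backticks or passed as strings, and what make.names() produces:

```r
# Hand-built stand-in for the reshaped data (an assumption, not the cast() output itself)
r_mat <- data.frame(ID = c(1, 2), `1` = c(3.5, NA), `2` = c(4.0, 2.5),
                    check.names = FALSE)

# Option 1: keep the numeric names and quote them
first_type <- r_mat$`1`        # backticks make a non-syntactic name usable with $
same_thing <- r_mat[["1"]]     # a string inside [[ ]] works as well

# Option 2: rename the columns so that $ works without quoting
fixed <- r_mat
colnames(fixed) <- make.names(colnames(fixed), unique = TRUE)
colnames(fixed)                # "ID" "X1" "X2"
```

Quoting is handy for a quick look; renaming is usually more comfortable if the columns are used in formulas or passed on to other functions.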


dealing with blank/missing data with write.table in R

I have a data frame where some of the rows have blank entries, e.g. to use a toy example
Sample Gene RS Chromosome
1 A rs1 10
2 B X
3 C rs4 Y
i.e. sample 2 has no rs#. If I attempt to save this data frame in a file using:
write.table(mydata,file="myfile",quote=FALSE,sep='\t')
and then read.table('myfile',header=TRUE,sep='\t'), I get an error stating that the number of entries in line 2 doesn't have 4 elements. If I set quote=TRUE, then a "" entry appears in the table. I'm trying to figure out a way to create a table using write.table with quote=FALSE while retaining a blank placeholder for rows with missing entries such as 2.
Is there a simple way to do this? I attempted to use the argument NA="" in write.table() but this didn't change anything.
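Whenever a data frame produced by my script contains NA, I replace it. One option is to swap each NA for some other text that tells you the entry was missing in the data frame, which is especially useful when you save the result to a CSV file, a database, or some other non-R environment.
A simple script to do that:
# Replace every NA in a vector with a placeholder string
replace_NA <- function(x, replacement = "N/A") {
  x[is.na(x)] <- replacement
  x  # return the modified vector, not just the assigned value
}
# Assign back column by column so the result stays a data frame
df[] <- lapply(df, replace_NA, replacement = "N/A")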
You are attempting to reinvent the fixed-width file format. Your requested format would have a blank column between every real column. I couldn't find a write.fwf in base R, although the 'utils' package has read.fwf. The simplest method of getting your requested output would be:
capture.output(dat, file='test.dat')
# Result in a text file
Sample Gene RS Chromosome
1 1 A rs1 10
2 2 B X
3 3 C rs4 Y
This essentially uses the print method (at the end of the R REPL) for dataframes to do the spacing for you.
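If a tab-separated file is acceptable, the round trip can be sketched as below. This assumes the blank entry is stored as NA in the data frame; the toy data and the temporary file are made up for illustration. Note that R argument names are case-sensitive and the write.table() argument is na (lower case), not NA:

```r
# Toy data with one missing RS value (assumed to be NA rather than "")
mydata <- data.frame(Sample = 1:3,
                     Gene = c("A", "B", "C"),
                     RS = c("rs1", NA, "rs4"),
                     Chromosome = c("10", "X", "Y"),
                     stringsAsFactors = FALSE)

path <- tempfile(fileext = ".tsv")

# na = "" writes an empty field; row.names = FALSE keeps header and rows aligned
write.table(mydata, file = path, quote = FALSE, sep = "\t",
            na = "", row.names = FALSE)

# sep = "\t" keeps the empty field as a column; na.strings = "" turns it back into NA
back <- read.table(path, header = TRUE, sep = "\t",
                   na.strings = "", stringsAsFactors = FALSE)
```

With an explicit tab separator on both sides, an empty field still counts as a field, so every line has the same number of elements.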

Rbind() doesn't work with character data with different names

I have tried to add a row to an existing dataset which I read into R from a csv file.
The dataset looks like this:
Format PctShare
1 NewsTalk 12.6
2 Country 12.5
3 AdultContemp 8.2
4 PopHit 5.9
5 ClassicRock 4.7
6 ClassicHit 3.9
7 RhythmicHit 3.7
8 UrbanAdult 3.6
9 HotAdult 3.5
10 UrbanContemp 3.3
11 Mexican 2.9
12 AllSports 2.5
After naming the dataset "share", I tried to add a 13th row to it by using this code:
totalshare <- rbind(share, c("Others", 32.7))
--> which didn't work and gave me this warning message:
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Others") : invalid factor level, NA generated
However, when I tried entering a row with an existing character value ("AllSports") in the dataset with this code:
rbind(share, c("AllSports", 32.7))
--> it added the row perfectly
I am wondering whether I need to tell R that there is a new character value in the column "Format" before I bind the new row to the data frame?
Your Format column is a factor variable. Look at str(share), str(share$Format), class(share$Format) and levels(share$Format) for more information. The reason rbind(share, c("AllSports", 32.7)) worked is that "AllSports" is already an existing factor level of the Format variable.
To fix the issue, convert the Format column to character via:
share$Format <- as.character(share$Format)
Do some searches on factor variables and setting factor levels to learn more. Moreover, when you read the file in from csv, you can stop character strings from being converted to factors with the option stringsAsFactors = FALSE, for example share <- read.csv("myfile.csv", stringsAsFactors = FALSE).
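A short sketch of the fix, using a two-row stand-in for the share data (the values are copied from the question, the object itself is made up). It also binds the new row as a one-row data frame rather than a character vector, because c("Others", 32.7) is a character vector and would turn PctShare into text:

```r
# Stand-in for the data as read by an older read.csv(), i.e. with a factor column
share <- data.frame(Format = c("NewsTalk", "Country"),
                    PctShare = c(12.6, 12.5),
                    stringsAsFactors = TRUE)

# Convert the factor to character so new values are allowed
share$Format <- as.character(share$Format)

# Bind a one-row data frame so each column keeps its own type
new_row <- data.frame(Format = "Others", PctShare = 32.7,
                      stringsAsFactors = FALSE)
totalshare <- rbind(share, new_row)
```

Since R 4.0.0 the default is stringsAsFactors = FALSE, so a freshly read CSV will normally not have this problem.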
I have two solutions in mind.
Solution 1:
Before reading the data, set
options(stringsAsFactors = FALSE)
or
Solution 2:
As suggested by @JasonAizkalns.

R rbind error row.names duplicates not allowed

There are other questions here addressing the same problem, but I can't work out how to solve mine based on them. I have 5 data frames whose rows I want to combine into a single data frame using rbind, but it returns the error:
"Error in row.names<-.data.frame(*tmp*, value = value) :
'row.names' duplicated not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘100’, ‘1000’, ‘10000’, ‘100000’, ‘1000000’, ‘1000001 [....]"
The data frames have the same columns but different numbers of rows. I thought rbind took the first column as row.names, so I tried putting a sequential id in the five data frames, but it doesn't work. I've also tried to assign sequential row names across the data frames via row.names(), again with no success. I don't think merge is an option, because there are 5 data frames and successive merges would overwrite the previous ones. I also created a new data frame with only the ids and tried to join it, but the resulting data frame doesn't include the columns of the joined data frames.
Here is an extract of df 1:
id image power value pol class
1 1 tsx_sm_hh 0.1834515 -7.364787 hh FR
2 2 tsx_sm_hh 0.1834515 -7.364787 hh FR
3 3 tsx_sm_hh 0.1991938 -7.007242 hh FR
4 4 tsx_sm_hh 0.1991938 -7.007242 hh FR
5 5 tsx_sm_hh 0.2079365 -6.820693 hh FR
6 6 tsx_sm_hh 0.2079365 -6.820693 hh FR
[...]
1802124 1802124 tsx_sm_hh 0.1991938 -7.007242 hh FR
The four other data frames have the same structure, and the values in their 'id' columns are not duplicated across data frames. The 'pol' and 'image' columns are factors.
all.pol <- rbind(df1,df2,df3,df4,df5) returns this duplicated row.names error.
Any idea?
Thanks in advance
I had the same error recently. The problem in my case turned out to be that one of the columns (attributes) of the data frame was a list. After converting it to a basic type (e.g. numeric), rbind worked just fine.
By the way, the row names are the "row numbers" to the left of the first variable. In your example they are 1, 2, 3, ... (the same as your id variable).
You can see them using rownames(df) and set them using rownames(df) <- name_vector (name_vector must have one element per row of df, and its elements must be unique).
I had the same error.
My problem was that one of the columns in the data frames was itself a data frame, and I couldn't easily find the offending column.
data.table::rbindlist() helped to locate it:
library(data.table)
rbindlist(a)
# Error in rbindlist(a) :
# Column 25 of item 1 is length 2 inconsistent with column 1 which is length 16. Only length-1 columns are recycled.
class(a[[1]][, 25]) # "data.frame"; this should obviously be converted to a column or removed
After removing the errant column, do.call(rbind, a) worked as expected.
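If data.table is not available, the same kind of check can be sketched in base R. The data frame below is a made-up example with one nested column; is.list() is TRUE for both list columns and data-frame columns, so it flags either kind:

```r
# Made-up data frame whose second column is itself a data frame
df <- data.frame(id = 1:2)
df$nested <- data.frame(a = 1:2, b = 3:4)

# Flag every column that is a list or a data frame rather than an atomic vector
is_nested <- vapply(df, is.list, logical(1))
bad_cols <- names(which(is_nested))
bad_cols   # "nested"
```

Running this over each data frame before calling rbind points to the columns that need to be flattened or dropped.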

obtain value from string in column headers in R

I have a text file that looks like the following
DateTime height0.1 height0.2
2009-01-01 00:00 1 1
2009-01-02 00:00 2 4
2009-01-03 00:00 10 1
Obviously this is just an example; the actual file contains a lot more data (about 100 columns), and the numbers in the headers can be decimals. I can read the file into R with the following:
dat <- read.table(file,header = TRUE, sep = "\t")
where file is the path of the table. This creates a data.frame in the workspace called dat. I would now like to generate a variable from this data.frame called 'vars' which is an array made up of the numbers in the column headers (except from DateTime which is the first column).
For example, here I would have vars = 0.1, 0.2
Basically I want to take the number that is in the string of the header and then store this in a separate variable. I realize that this will be extremely easy for some, but any advice would be great.
If all the numbers are at the end of the names (i.e. not something like h984mm19), then you can remove everything except digits and punctuation using gsub and convert the result to a numeric vector as follows:
# just give all names except the first column
my_var <- as.numeric(gsub("[^0-9[:punct:]]", "", names(dat)[-1]))
# [1] 0.1 0.2
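One caveat: the character class above keeps all punctuation, so a header such as height_0.1 would leave "_0.1" and become NA. A slightly stricter sketch, assuming every header is a non-numeric prefix followed by the number, strips the prefix instead (the header vector here is typed in by hand rather than read from the file):

```r
# Hand-typed stand-in for names(dat)
headers <- c("DateTime", "height0.1", "height0.2")

# Drop the first column, then remove everything before the first digit
vars <- as.numeric(sub("^[^0-9]+", "", headers[-1]))
vars
```

Headers that do not follow the prefix-then-number pattern would still need their own handling.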

passing a string as a data frame column name

I have a data frame called data.df with various columns say col1,col2,col3....col15. The data frame does not have a specific class attribute but any attribute could be potentially used as a class variable. I would like to use an R variable called target which points to the column number to be treated as class as follows :
target<-data.df$col3
and then use that field (target) as input to several learners such as PART and J48 (from package RWeka) :
part<-PART(target~.,data=data.df,control=Weka_control(M=200,R=FALSE))
j48<-J48(target~.,data=data.df,control=Weka_control(M=200,R=FALSE))
The idea is to be able to change 'target' only once at the beginning of my R code. How can this be done?
I sometimes manage to get a lot done by using strings to reference columns. It works like this:
> df <- data.frame(numbers=seq(5))
> df
numbers
1 1
2 2
3 3
4 4
5 5
> df$numbers
[1] 1 2 3 4 5
> df[['numbers']]
[1] 1 2 3 4 5
You can then have a variable target hold the name of your desired column as a string. I don't know about RWeka, but many libraries such as ggplot can take string references for columns (e.g. the aes_string function instead of aes).
If you are asking about references (pointers) in R, that is not possible.
However, if you are asking how to get a column whose name is not hard-coded, you can do that with the [ operator, like this:
theNameOfColumnIwantToGetSummaryOf<-"col3"
summary(data.df[,theNameOfColumnIwantToGetSummaryOf])
...or like this:
myIndexOfTheColumnIwantToGetSummaryOf<-3
summary(data.df[,sprintf("col%d",myIndexOfTheColumnIwantToGetSummaryOf)])
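For the formula interface in the question, one possible approach is to build the formula from the column name. The sketch below uses lm() from base R as a stand-in learner and a small made-up data.df; whether PART and J48 accept a formula built this way is an assumption that has not been checked here:

```r
# Made-up data; only the column names matter for the illustration
data.df <- data.frame(col1 = c(1, 2, 3, 4),
                      col2 = c(2, 1, 4, 3),
                      col3 = c(1.1, 1.9, 3.2, 3.8))

# Change the target in one place only
target <- "col3"

# Build "col3 ~ ." from the string and pass it wherever a formula is expected
fml <- as.formula(paste(target, "~ ."))
fit <- lm(fml, data = data.df)
```

Because the formula names the real column, the "." on the right-hand side expands to all the other columns and the target is not duplicated among the predictors.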
