Autocompletion error during data frame column selection in RStudio - r

I used the readxl package to import an Excel file into RStudio. Now I'm trying to access a column in that dataset using the $ operator. However, I keep getting this notification:
(TypeError): null is not an object (evaluating 'a.length')
Even though I've performed this type of operation many times before without issue...
The error I'm getting: [screenshot]
The dataset in the Global Environment pane: [screenshot]

The root of the problem is an NA used as a column name. The error is thrown because RStudio's autocompletion cannot extract the column names.
Here is a reproduction of the problem:
df <- data.frame(a = 1:3, b = 1:3)
names(df)[2] <- NA
If you then try to type df$a, the error shown above is generated.
To avoid this kind of situation you should assign data.frame column names explicitly. You have two options:
assign the names explicitly, e.g. names(df) <- c("a", "b");
delete the spacer columns from the source Excel file so that NA is not used as a column name.
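Continuing the reproduction above, a minimal sketch of the first option that also works when you do not know in advance which names are broken (the placeholder prefix col_ is just an example):

# Repair any NA or empty column names before using $
bad <- is.na(names(df)) | names(df) == ""
names(df)[bad] <- paste0("col_", which(bad))

names(df)   # "a" "col_2" - autocompletion can extract the names again
df$a        # no more TypeError popup from the autocompleter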

Related

R: Read specific columns of a .dta file and convert variable names to lower case without reading the whole file

I have a folder with multiple .dta files and I'm using the read_dta() function of the haven library to bind them. The problem is that some of the files have their column names in lower case and others have them in upper case.
I was wondering if there is a way to read only the specific columns, converting their names to lower case in every case, without reading the whole file and then selecting the columns, since the files are really large and this would take forever.
I was hoping that the .name_repair = argument of the read_dta() function would let me do this, but I really don't know how.
I'm trying something like this:
library(haven)

# Set working directory:
setwd("T:/")
# List of .dta file names to bind:
list_names <- list.files()   # assumes the files sit directly in T:/
list_names <- list_names[grepl("_sdem.dta", list_names)]
# Variable names to select from those files:
vars_select <- c("r_def", "c_res", "ur", "con", "n_hog", "v_sel", "n_pro_viv", "fac", "n_ren", "upm", "eda", "clase1", "clase2", "clase3", "ent", "sex", "e_con", "niv_ins", "eda7c", "tpg_p8a", "emp_ppal", "tue_ppal", "sub_o")
# Read and bind ONLY the selected variables from the list of files
dataset <- data.frame()
for (i in seq_along(list_names)) {
  temp_data <- read_dta(list_names[i], col_select = vars_select)
  dataset <- rbind(dataset, temp_data)
}
The problem is that when some of the files have their variable names in upper case, those variables are not in the vars_select list and therefore the following error appears:
Error: Can't subset columns that don't exist.
x Columns `r_def`, `c_res`, `n_hog`, `v_sel`, `n_pro_viv`, etc. don't exist.
I was trying to use the .name_repair = argument of the read_dta() function to correct this with the tolower() function.
I was trying something like this with a specific file that has upper-case variable names:
example_data <- read_dta("T:/2017_2_sdem.dta", col_select = vars_select, .name_repair = tolower(names()))
But the same error appears:
Error: Can't subset columns that don't exist.
x Columns `r_def`, `c_res`, `n_hog`, `v_sel`, `n_pro_viv`, etc. don't exist.
Thanks so much for your help!
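For what it's worth, one workaround sketch (not a definitive answer): read only the header row first, resolve the selection case-insensitively, and then read the file with the names as they are actually stored. This assumes haven's documented n_max argument (reading zero rows still returns all column names) and uses tidyselect::all_of(), which col_select understands; the file path and vars_select come from the question above.

library(haven)

# Read zero data rows just to get the column names as stored in this file
header <- read_dta("T:/2017_2_sdem.dta", n_max = 0)

# Resolve the lower-case selection against the file's own casing
matched <- names(header)[tolower(names(header)) %in% vars_select]

# Read only those columns, then standardise the names to lower case
example_data <- read_dta("T:/2017_2_sdem.dta",
                         col_select = tidyselect::all_of(matched))
names(example_data) <- tolower(names(example_data))

The same two-step lookup could be wrapped in a small function and reused inside the binding loop.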

How to bind rows in R such that, instead of a type conversion error, binding defaults to filling the value with NA?

I am currently tasked with merging multiple xlsx files into one master R (.rds) data file. Since these files are filled in manually, there are a lot of type conversion errors when using approaches such as dplyr::bind_rows, for example:
Column `XYZ` can't be converted from numeric to character
I very much need the binding to be "smart", so that it happens according to the corresponding column names of the data frames being merged; but when a conversion issue is encountered, I would like the problematic cell contents to be treated as NA rather than raising an error - just a warning, perhaps.
Is there a convenient way/function for doing this in R?
I have used bind_rows from the dplyr package.
My current import procedure:
library(readxl)
library(stringr)
library(dplyr)

files <- list.files("data", pattern = "xlsx", full.names = TRUE)
tmp <- read_excel(files[1], sheet = "data", trim_ws = TRUE)
names(tmp) <- make.names(str_squish(names(tmp)))
for (i in 2:length(files)) {
  print(i)
  tmp2 <- read_excel(files[i], sheet = "data", trim_ws = TRUE)
  names(tmp2) <- make.names(str_squish(names(tmp2)))
  tmp <- bind_rows(tmp, tmp2)
}
It has been pointed out that using a loop here is not efficient, but since the files are messy - many manual mistakes - and relatively few in number, I focused on being able to track the binding process sequentially.
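One hedged workaround, since as far as I know bind_rows has no built-in "coerce to NA" switch: read every sheet with all columns as text so the bind can never hit a type conflict, then let readr guess the types once at the end, where unparseable cells become NA with a parsing warning rather than an error. The file list, sheet name and name cleaning are taken from the code above; map_dfr and type_convert are the assumed helpers.

library(readxl)
library(stringr)
library(dplyr)
library(purrr)
library(readr)

files <- list.files("data", pattern = "xlsx", full.names = TRUE)

read_as_text <- function(f) {
  # Force every column to character so bind_rows never sees a type conflict
  x <- read_excel(f, sheet = "data", trim_ws = TRUE, col_types = "text")
  names(x) <- make.names(str_squish(names(x)))
  x
}

combined <- map_dfr(files, read_as_text)

# Guess the column types once at the end; cells that do not parse become NA
# and are reported as parsing problems instead of stopping the bind
combined <- type_convert(combined)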

Error in is.data.frame(x): object '' not found, how can I fix this?

I'm trying to run correlations on R.
This is my code so far:
library("foreign")
mydata<-read.csv(" ",header=FALSE)
options(max.print=1000000)
attach(mydata)
cor(as.numeric(agree_election),as.numeric(agree_party))
Then it gives me an error saying that object 'agree_election' is not found.
However, agree_election is just one of the column headers in my Excel spreadsheet. How do I fix this?
Check the names in your data frame! Does it contain a variable named agree_election?
Please avoid the attach function. It can be fine with just one data frame, but it can make a mess if you have several data frames attached.
This code should be fine, provided the variable names are correct:
mydata <- read.csv("...", header = TRUE)  # header = TRUE so column names come from the first row of the file
names(mydata)                             # check the actual column names
str(mydata)
cor(as.numeric(mydata$agree_election), as.numeric(mydata$agree_party))

cannot handle matrix/array columns with write.dbf

I hope I can get everything together for this problem; it's my first time asking and it's a little bit tricky to describe.
I want to add some attributes to a dbf file and save it afterwards for use in QGIS. It's about elections, and the data are the votes for the 11 parties in absolute and relative values. I use the shapefiles package for this, but also tried it simply with foreign.
My system: RStudio 0.97.311, R 2.15.2, shapefiles 0.7, foreign 0.8-52, Ubuntu 12.04
try #1 => no problems
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
try #2 => "error in get("write.dbf", "package:foreign")(dbf$dbf, out.name) : cannot handle matrix/array columns"
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDataDistricts <- cbind(shpDataDistricts, votesDistrict[, 2]) # add a new column
names(shpDataDistricts)[5] <- "SPOE"
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
The write function returns "error in get("write.dbf", "package:foreign")(dbf$dbf, out.name) : cannot handle matrix/array columns".
So by simply adding a column (integer) to the data.frame, the write.dbf function can no longer write it out. I have now been debugging this simple issue for 3 hours. I tried it with the shapefiles package, opening the shapefile and the dbf file, and got the same problem every time.
The same happens when I use the foreign package directly (read.dbf).
If I save the dbf file without the voting data (only with the small adaptations from steps 1 and 2), there is no problem. It must have to do with the merge with the voting data.
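For what it's worth, a quick base-R check can show which column write.dbf is rejecting (a small diagnostic sketch; shpDataDistricts is the data frame from try #2 above):

# A column reporting a non-NULL dim() (a matrix/array or nested data frame)
# is the one write.dbf cannot handle
sapply(shpDataDistricts, function(x) class(x)[1])
sapply(shpDataDistricts, function(x) !is.null(dim(x)))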
I got the same error message ("error in get("write.dbf"...)") while working with shapefiles in R using rgdal. I added a column to the shapefile, then tried to save the output and got the error. I had added the column to the shapefile as a data frame; when I converted it to a factor via as.factor(), the error went away.
shapefile$column <- as.factor(additional.column)
writePolyShape(shapefile, filename)
The problem is that write.dbf cannot write a data frame into an attribute table, so I changed it to character data.
My initial, wrong code was:
d1 <- data.frame(as.character(data1))
colnames(d1) <- c("county")   # give both the same column name so rbind works
d2 <- data.frame(as.character(data2))
colnames(d2) <- c("county")
county <- rbind(d1, d2)
dbfdata$county <- county      # this assigns a whole data frame as a single column
write.dbf(dbfdata, "PANY_animals_84.dbf")   ## doesn't work
## Error in write.dbf(...): cannot handle matrix/array columns
Then I changed everything to character, and it works! The right code is:
d1 <- as.character(data1)
d2 <- as.character(data2)
county <- c(d1, d2)
dbfdata$county <- county      # now a plain character vector
write.dbf(dbfdata, "filename")
Hope it helps!
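More generally, a hedged sketch of a fix that keeps the data as-is: let data.frame() flatten any matrix or nested data-frame columns into plain vectors before writing (dbfdata and the output file name are taken from the code above):

library(foreign)

# data.frame() splits matrix and nested data.frame columns into
# ordinary atomic columns, which write.dbf can handle
dbfdata_flat <- do.call(data.frame, dbfdata)

write.dbf(dbfdata_flat, "PANY_animals_84.dbf")

Note that data.frame() may rename the flattened columns, so you may want to fix names() on the result before writing.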

colnames for the intgroup argument of the arrayQualityMetrics package of Biobase

I am using a package from Biobase, arrayQualityMetrics, for creating plots to visualize microarray data.
My data is stored in an ExpressionSet.
One of the column names of phenoData(ExpressionSet) is "Tissue", but when I run the following command:
arrayQualityMetrics(ExpressionSet, intgroup = "Tissue")
it gives me this error:
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
all elements of 'intgroup' should match column names of 'pData(expressionset)'.
I don't understand why I am getting this error, since my ExpressionSet contains a column named "Tissue" in its phenoData.
It's been a while since you asked this question, but this is likely due to arrayQualityMetrics() having to trim the data frame in your pData() slot down to a limited number of fields for display in the metadata table at the beginning of the report.
Try something like:
tmp <- pData(ExpressionSet)
pData(ExpressionSet) <- tmp[, c("Tissue", "SomeOtherInterestingField")]  # swap out
arrayQualityMetrics(ExpressionSet, intgroup = "Tissue")
pData(ExpressionSet) <- tmp  # replace with your original full pData() data frame
