Plotting uneven row sizes in R

I have data in tab delimited rows of uneven length and I want to make a histogram for each row:
1    23    352    4    12    94    0    2
434    13    29
5    93    93    34
(...more rows)
This is what I currently have (no fanciness included):
data = read.delim("file.txt", header = F, sep="\t")
for (j in 1:nrow(data)) { #loop over each row
hist(data[j,])
}
But when I try to make the histogram, I think it tries to include the NA's in the row of the data frame, since R gives me the error message: "Error in hist.default(data[2, ]) : 'x' must be numeric".
When I try to use:
scan("file.txt", sep="\t")
I'm left with something I don't know how to separate by rows. Do I have a better option than splitting the file into one row per file and then reading in each row separately? (I am running into the same problem with uneven column size...)

The error results from the fact that grabbing a row from a data.frame yields an object of class data.frame (and hist() wants class numeric). Just convert it to numeric:
hist(as.numeric(data[j,]))
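If the NA padding from read.delim is unwanted altogether, another option is to read the file line by line and split each line yourself. The following is only a sketch: the temporary file and its three sample rows are stand-ins for file.txt, and the names path and rows are made up for illustration.

```r
# Hypothetical sample file standing in for file.txt
path <- tempfile(fileext = ".txt")
writeLines(c("1\t23\t352\t4\t12\t94\t0\t2",
             "434\t13\t29",
             "5\t93\t93\t34"), path)

# Read each line as text, split on tabs, convert every piece to numeric
rows <- lapply(strsplit(readLines(path), "\t", fixed = TRUE), as.numeric)

# One histogram per row; each list element is a plain numeric vector
for (j in seq_along(rows)) {
  hist(rows[[j]], main = paste("Row", j))
}
```

Because rows is a list, each element keeps its own length, so no padding is needed for the shorter rows.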

Related

Import CSV and plotting ECDF

I'm new to R and I'm having some troubles on how to use Empirical Cumulative Distribution function.
I have a CSV file containing 100k values (exported from excel), which I'm importing like so:
MyData <- read.csv(file="test.csv", header=TRUE, sep=",")
which seems to be okay but as soon when I type
P = ecdf(MyData)
I get the error:
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
undefined columns selected
I've noticed MyData[1] outputs all my values and tried
P = ecdf(MyData[1]) but alas I get the same error.
I've searched around and it seems the error pops up in a lot of scenarios so I can't really find what the exact issue is, any help will be nice as I'm extremely new to this.
You should use either ecdf(MyData[, 1]) or ecdf(MyData[[1]]) because ecdf expects a numeric vector as input. When you use MyData[1], R prints all the values, but the result is still a data frame, not a vector.
The ecdf help file states that x, the input to ecdf, should be a numeric vector.
At least from my reading of ecdf, the input is a vector, so you need to pass a single column of your data frame. You can do this by name with P <- ecdf(MyData$col1), where col1 is the name of that column, or by position with P <- ecdf(MyData[,1]), which selects all rows of column 1.
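A small sketch of the difference between the two kinds of subsetting. The five-value data frame is made up and only stands in for the contents of test.csv; col1 is the same placeholder column name used above.

```r
# Hypothetical stand-in for the data read from test.csv
MyData <- data.frame(col1 = c(3, 1, 4, 1, 5))

class(MyData[1])     # "data.frame": single brackets keep the data frame wrapper
class(MyData[[1]])   # "numeric": double brackets extract the column itself

P <- ecdf(MyData[[1]])   # works, because the input is a numeric vector
P(3)                     # proportion of values <= 3, here 0.6
```

P is itself a function, so it can be evaluated at any point or passed to plot().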

Can't get 'plotweb' in the bipartite package to work (R)

I am trying to visualise a bipartite network using the bipartite package in R. My data consists of 4 columns in a spreadsheet. The columns contain 1) plant species names 2) bee species names 3) site 4) interaction frequency. I first read the data into R from a CSV file, then convert it to a web using the helper function frame2webs. When I then try to visualise the network with plotweb() I get the error message:
Error in web[rind, cind, drop = FALSE] : incorrect number of dimensions
My code looks like this:
library(bipartite)
bee <- read.csv('TestFile.csv')
bees <- as.data.frame(bee)
BeeWeb <- frame2webs(bees, type.out = "array")
plotweb(BeeWeb)
I've also tried:
BeeWeb <- frame2webs(bees,
varnames = c("higher","lower","webID","freq"),
type.out = "array")
Please help! I am new to R and am struggling to make this work. Cheers!
Not sure what your data look like, but this happens to me when I have a single factor level in either the "higher" or "lower" column, type.out is "list", and emptylist is TRUE.
This is due to a problem in empty, a function that frame2webs only calls when type.out is "list" and emptylist is TRUE. empty finds the dimensions of your data using NROW and NCOL, which interpret a single row of input as a vertical vector. When there's only one factor level in "lower" or "higher", the input to empty is a one-row array. empty interprets this row as a column, hence the 'incorrect number of dimensions' error.
Two simple workarounds:
Set type.out to "array"
Set emptylist to FALSE
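The row-versus-column confusion described above can be reproduced in base R without the bipartite package. This is only an illustration of the general mechanism, not the package's actual internals; m and v are made-up names.

```r
m <- matrix(1:3, nrow = 1)   # a one-row matrix
v <- m[1, ]                  # default drop = TRUE turns the row into a plain vector

dim(v)    # NULL: the row/column structure is gone
NROW(v)   # 3: the vector is treated as a column of length 3
NCOL(v)   # 1

# Two-index subsetting on the vector now fails
msg <- tryCatch(v[1, 1, drop = FALSE], error = function(e) conditionMessage(e))
msg       # "incorrect number of dimensions"

kept <- m[1, , drop = FALSE]   # drop = FALSE preserves the 1 x 3 shape
dim(kept)                      # 1 3
```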

Can't complete cases of a data.frame

I don't need help with the exercise itself, but I do need help with an error that I can't fix.
This is the subject:
In R the more appropriate indicator for missing data is “NA” (not available). Therefore, replace each occurrence of “?” with “NA”.
a. For this exercise, create an R data frame for the mammographic data using only datapoints that have no missing values. This can be done using the complete.cases function, which takes a data frame and returns a Boolean vector v, where v[i] equals TRUE iff the i-th data-frame sample is complete (meaning it does not contain an NA). For example, if the data frame is stored in mammogram.frame, then mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),] creates a new data frame called mammogram2.frame that has all the complete mammogram data samples.
So I coded that:
mammogram = read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/mammographic-masses/mammographic_masses.data",
sep=",",
col.names=c("Birads","Age","Shape","Margin","Density","Severity"),
fill=TRUE,
strip.white=TRUE)
#Keep only the rows without missing values
mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),]
#Display data frame
mammogram2
However I get this error:
> mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),]
Error: object 'mammogram.frame' not found
I can't find any solution on the internet. I have tried a lot of things, but the missing values are still '?'.
Thanks
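Two things stand out in the code above: the data is stored in mammogram but then referenced as mammogram.frame, and nothing tells read.table that "?" means missing. A minimal sketch of one way to address both, assuming the same column names; the five inline rows are made up so the example runs without downloading the real file.

```r
# Made-up rows in the same comma-separated layout, with "?" marking missing values
txt <- "4,51,2,1,3,0
5,62,?,4,3,1
3,47,1,1,?,0
5,70,4,5,3,1
4,39,1,1,2,0"

mammogram.frame <- read.table(text = txt, sep = ",",
  col.names = c("Birads", "Age", "Shape", "Margin", "Density", "Severity"),
  na.strings = "?",      # read "?" as NA while importing
  strip.white = TRUE)

# Keep only the rows that contain no NA
mammogram2.frame <- mammogram.frame[complete.cases(mammogram.frame), ]
nrow(mammogram2.frame)   # 3 complete rows out of 5
```

With the real data, the text = txt argument would be replaced by the URL from the question.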

R code debugging and error correction understanding

I have written this code, which reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases. I need help with the error this code produces:
Error in `[.data.frame`(data, i) : undefined columns selected
In addition: Warning messages:
1: In comp[i] <- !is.na(data[i]) :
number of items to replace is not a multiple of replacement length
2: In comp[i] <- !is.na(data[i]) :
number of items to replace is not a multiple of replacement length
3: In comp[i] <- !is.na(data[i]) :
number of items to replace is not a multiple of replacement length
The code is the following:
complete<-function(directory, id=1:332){
files.list<-list.files(directory, full.names=TRUE, pattern=".csv")
comp<-character()
return.data<-data.frame()
nobs<-numeric()
for(i in id){
data<-read.csv(files.list[i])
comp[i]<-!is.na(data[i])
nobs[i]<-nrow(comp[i])
}
return.data<-c(id,nobs)
}
Your problem is that !is.na() returns a logical vector, not a single value, so you cannot insert its multiple elements into the single element comp[i].
R has a function, complete.cases, which does exactly what you attempted. With it, your function would look like this:
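complete <- function(directory, id = 1:332) {
  # the file at position k of the listing is assumed to belong to id k
  files.list <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  nobs <- numeric(length(id))
  for (k in seq_along(id)) {  # index by position so any subset of ids works
    data <- read.csv(files.list[id[k]])
    nobs[k] <- sum(complete.cases(data))  # rows with no NA in any column
  }
  data.frame(id, nobs)  # last expression is the visible return value
}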
That aside, your code has several flaws I want to point out:
why is comp of type character?
allocate the size of a vector if you know it beforehand (nobs <- numeric(length(id)))
do you really want to check only column i of your i-th loaded data.frame for missing values?
if you assign return.data <- c(id,nobs), return.data will be a single numeric vector with the ids at the beginning and nobs at the end.
You also need to provide both a row and a column index to your data so that it selects all rows of column i, e.g.
comp[i]<-!is.na(data[ ,i])
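To see why the original assignment warns, here is a small sketch on a made-up three-row data frame (the names data, ok, comp and w are only for illustration). It shows that !is.na() on a column yields one value per row, and that storing a single summary number avoids the warning.

```r
data <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

ok <- !is.na(data[, 1])   # logical vector with one entry per row
ok                        # TRUE FALSE TRUE

# Assigning that whole vector to a single slot triggers the warning
comp <- logical()
w <- tryCatch({ comp[1] <- ok; NULL },
              warning = function(w) conditionMessage(w))
w

# One number per file is what the function actually needs
sum(ok)                     # 2 non-missing values in column a
sum(complete.cases(data))   # 1 row with no NA in any column
```

The last two lines also show that counting non-missing values in one column is not the same as counting complete rows.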

Error in R "undefined columns selected"

I am trying to run this code using the zoo function:
gld <- zoo(gld[,7], gld_dates)
Unfortunately I get an error message telling me this:
Error in `[.data.frame`(gld, , 7) : undefined columns selected
I want to use the zoo function to create zoo objects from my data.
The function should take two arguments: a vector of data and
a vector of dates.
This is the data I am using[LINK BROKEN].
I believe I have 7 columns in my data set. Any ideas?
The code I am trying to implement is found here[LINK BROKEN].
Is there anything wrong with this code?
You don't say what your gld_dates is exactly, but if gld starts as your original data and you want to make a zoo object of the 7th column ordering by the 1st column (dates), I can do
gld_zoo <- zoo(gld[, 7], gld[, 1])
just fine. Equivalently, but with more readability,
gld_zoo <- zoo(gld$Adj.close, gld$Date)
reminds me what each column is.
Subsetting requires the names of the subset columns to match those in the data frame. This code subsets the dataset french_fries with potatoes instead of potato:
library(reshape2) # french_fries ships with the reshape2 package
data("french_fries")
df_potato <- french_fries[, c("potatoes")]
and it fails with:
Error in `[.data.frame`(french_fries, , c("potatoes")) :
undefined columns selected
but using the right name potato works:
df_potato <- french_fries[, c("potato")]
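Before subsetting, it can help to check how many columns the data frame really has and what they are called. A minimal sketch on a made-up two-column data frame; df and its values are hypothetical and only stand in for gld or french_fries.

```r
df <- data.frame(Date = c("2020-01-02", "2020-01-03"),
                 potato = c(2.9, 14.0))

ncol(df)                    # 2, so df[, 7] cannot work
names(df)                   # "Date" "potato"
"potatoes" %in% names(df)   # FALSE
"potato" %in% names(df)     # TRUE

# Asking for a column that does not exist reproduces the error
msg <- tryCatch(df[, 7], error = function(e) conditionMessage(e))
msg                         # "undefined columns selected"
```

If ncol() reports fewer columns than expected, the separator passed to read.csv or read.table is a likely place to look.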
