Read in only certain rows - r

I have the set of data below. It has a few rows of unwanted characters before the numbers I want to read in, as well as a few unwanted rows after the data. I created a substring that will serve as my first column, which is purely numerical. There is data, when the set is read in, above and below these numericals that were converted to NA. Is there a way, other than skip and nrow, that I can remove the NA rows and read in only those rows that are numerical?
x<-read.csv("..."),
header=FALSE, na.strings="Y")
y<-substr(x$V1,1,8)
y<-as.numeric(y)
x2<-cbind(y,x1)
x2<-as.data.frame(x2)
I have tried:
if (x$y == is.numeric) {
print(x)
} else {
print("")}
But that is clearly wrong as all I get are errors. I have been trying different combinations of the above code, as well as:
x3<-sapply(x$y,is.numeric)
x[x3,]
But nothing I try is working.. I am either completely off or am missing something.
UPDATE: I was able to do this with both methods that were answered below.. but the problem now is, since the rows above the numeric rows contained characters, my columns are factors rather than numeric. Rather than actually deleting the rows, we were just temporarily removing them. Is there a way to permanently remove them so that my columns will be class numeric?

If this is just the case of remove rows containing NAs, have you tried using complete.cases? Perhaps something like:
x2[complete.cases(x2),]
Also if would be great if you could provide a minimal reproducible sample.

Related

Removing NA values from rows, without removing the rows in R

I have a dataset in which the data would look something like this:
a fragment dataframe with the data
So lot's of NA's per row, but also regular answers, that I want in the final version.
Is it possible to remove the NAs, but without removing the rows as a whole?
I thought about pivoting and removing rows with NA, but then it would just remove the occurences that have actual answers as well.
The data is coming form a decision making procedure in qualtrics, in which not every option is displayed to the participants (hence the NAs), but we do not want to exclude people in any step. I also thought about maybe recoding the values, and subsetting them somehow, but that doesn't seem to work out right in my mind when it comes to the actual analysis.
I tried removing the NAs, as well as pivoting the table and removing them later.
I do not yet have the full dataset, but want to experiment on strategies of data analysis before I have the data collected, to not get lost once I have it.

Finding a character variable in a column

I have a huge data frame df, with many columns. One of the columns named id_nm happens to be a character with values such as: aksh123dn.Ins
class(df$id_nm)
returns character
I need to lookup all those values which have the id_nm say aksh123dn.Ins
I used:
new_df<-df[df$id_nm=='aksh123dn.Ins',]
this returns the entire df which isn't the case in reality
also tried:
new_df<-df%filter(id_nm=='aksh123dn.Ins']
still getting the same answer
I think its possibly because it is a character string. Please help me with this. TIA

Why does R think my imported vector of characters are numbers?

This is probably a basic question, but why does R think my vector, which has a bunch of words in it, are numbers when I try to use these vectors as column names?
I imported a data set and it turns out the first row of data are the column headers that I want. The column headers that came with the data set are wrong ones. So I want to replace the column names. I figured this should be easy.
So what I did was I extracted the first row of data into a new object:
names <- data[1,]
Then I deleted the first row of data:
data <- data[-1,]
Then I tried to rename the column headers with the "names" object:
colnames(data) <- names
However, when I do this, instead of changing my column names to the words within the names object, it turns it into a bunch of numbers. I have no idea where these numbers come from.
Thanks
You need to actually show us the data, and the read.csv()/read.table() command you used to import.
If R thinks your numeric column is string, it sounds like that's because it wrongly includes the column name, i.e. you omitted header=TRUE in your read.csv()/read.table() import.
But show us your actual data and commands used.

R data frame issue - non-numeric headers

This is definitely a rookie question but I'm not finding an answer for this (maybe because of my wording) so here goes:
I'm reading a data frame into R studio (csv file) that has 24 columns with headers. There are only numbers in these columns (they're essentially concentrations of several chemicals). It's called all. I need to use them as numeric vectors. When I read them in and type
is.numeric(all[,1])
I get
TRUE
When I type
is.numeric(all[1])
I get
FALSE
I think this is because R interprets the header as a factor. I also tried reading in a table without headers and with headers=FALSE, but R renames it to V1, V2 etc so the result ends up being the same.
I need to work with functions where I invoke something like all[2:24]. How can I go about to make R either "not see" the header or remove it altogether?
Thanks for the answers!
PS: the dataframe I am using (without headers - if it had headers, it would just have names instead of V1, V2, etc) is something like this:
This is a subset from the first column, not the first row.
all[,1]) #subset first column
The following is subset of first row
all[1,]) #subset first row (headers of df not included)
To give columnames
colnames(all) <- c("col1","col2")
Your assumption is wrong. You have a data.frame and all[1] does list subsetting, which results in a data.frame, which is not a vector, and not a numeric vector in particular.
You should study help("[") and An Introduction to R.

NA Values Appear for All Data in Imported .csv File

I imported a set of data into RStudio containing 85 variables and 139 observations. All values are integers except for the last column which is blank and for some reason was imported alongside everything else in the .csv file I created from a .xls file.
As such, this last column is all NA values. The problem is that when I try to run any kind of analysis it seems to be reading that all values are NA values. Despite this, in the data window in RStudio everything seems to be fine. Are there solutions to this problem that don't involve the data? Is it almost certainly the data that's the problem?
It seems strange that when opening the file anywhere else and even viewing it in R
The most likely issue is that the file is being imported as all text rather than as numeric data. If all of the data is numeric you can just use colClasses="numeric" as an argument to the read.csv() function and that should import correctly. You could also change the data class once it is in R, or give colClasses a vector of different classes if you have a variety of different data types (logical, character, numeric etc.) in your file.
Edit
Seeing as colClasses is not working (it is hard to say why without looking at your data), you can try this:
MyDF<-data.frame(sapply(MyDF,FUN=as.numeric))
Where MyDF is your datafraome. That will change all of your columns to numeric. If you have some charcter/factor/logical values in there this may not work as expected. You might want to check your excel file/csv to see why it is importing a NA column. It could be that there is a cell with a space in it that is being pulled in and this is throwing things off. You could always try deleting that empty column and retrying your import.
If you want to omit your last column while reading the data itself, you can try the following code. In this example, I am assuming that your file has 5 columns and the 5th column has NA values. So, you want to skip reading 5th column in your data set.
data <- read.csv (fileName, ....) [,1:4]
or, if you want to use column names, you can use:
data <- read.csv (fileName, ....) [,c('col1','col2','col3','col4')]
This will read all the observations from selected columns within your data set.
Hope this helps.
If you are trying too find the mean and standard deviation you can use
Data<- mean( dataframe$colname , na.rm = TRUE)
Data1<- sd( dataframe$colname , na.rm = TRUE)
This will give u the answer after omitting the na values from the column

Resources