Error message in RStudio while uploading the data - r

Can anyone help me understand why I get this type of error every time I upload the data in R?
Is there any solution for that?

The data you're reading in probably has the columns poison, time and treat. The message means that objects with these names already exist on R's search path, most likely because a data frame with the same column names was attached earlier.
When you attach the table, R makes the names poison, time, and treat refer to the respective columns, which masks any earlier objects with those names. An example with the sleep dataset:
data1 <- as.data.frame(sleep)
names(sleep)
[1] "extra" "group" "ID"
attach(data1)
# Now the bare name ID resolves to data1$ID via R's search path
data2 <- as.data.frame(sleep)
attach(data2)
# The message you're getting
The following objects are masked from data1:
extra, group, ID
To avoid this problem, which can lead to unintended outcomes, make sure to call detach(data1) before attaching further datasets with non-unique column names.
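A sketch of an alternative that sidesteps the masking message altogether, assuming you do not need the columns as bare names at the top level: leave the data frame unattached and reach its columns through with() or a data= argument. The names mean_extra and by_group are only illustrative.

```r
# Work with the data frame directly instead of attaching it.
data1 <- as.data.frame(sleep)

# with() evaluates the expression using data1's columns,
# without changing the search path.
mean_extra <- with(data1, mean(extra))

# Many summary and modelling functions accept a data= argument.
by_group <- aggregate(extra ~ group, data = data1, FUN = mean)

# Nothing was attached, so there is nothing to mask or detach.
"data1" %in% search()
```

Because no data frame is ever attached, loading a second data set with the same column names cannot trigger the message.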

Related

Replacing column values based on related column in R

I'm currently working on a dataset which has an address column and a zip code column. I'm trying to deal with invalid/missing zip codes by finding a different record with the same address and copying its zip code into the record where it is invalid or missing. What would be the best approach to go about doing this?
Step 1. Using the non-missing addresses and zip codes, construct a dictionary
data frame of sorts. For example, in a data frame "df" with an "address"
column and a "zip_code" column, you could get this via:
library(dplyr)
zip_dictionary <- na.omit(select(df, address, zip_code))
zip_dictionary <- distinct(zip_dictionary)
This assumes there is only one unique value of "zip_code" for each "address"
in your data. If not, you need to figure out which value to use and filter or
recode it accordingly.
Step 2. Install the {elucidate} package from GitHub and use the translate()
function to fill in the missing zip codes using the extracted dictionary from
step 1:
remotes::install_github("bcgov/elucidate")
library(elucidate)
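df <- df %>%
  mutate(zip_code = if_else(is.na(zip_code),
                            # look up the zip code by address where it is missing
                            translate(address,
                                      old = zip_dictionary$address,
                                      new = zip_dictionary$zip_code),
                            # otherwise keep the existing zip code
                            zip_code)
  )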
Disclaimer: I am the author of the {elucidate} package.
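If you would rather not install an extra package, the same dictionary lookup can be sketched in base R with match(). This is only an illustration under the same assumption as above (one zip code per address); the sample data frame and the names zip_dictionary, missing_zip and idx are made up for the example.

```r
# Hypothetical sample data: two addresses, each with one missing zip code.
df <- data.frame(
  address  = c("1 Main St", "1 Main St", "2 Oak Ave", "2 Oak Ave"),
  zip_code = c("12345", NA, NA, "67890"),
  stringsAsFactors = FALSE
)

# Dictionary of known address/zip pairs (assumes one zip code per address).
zip_dictionary <- unique(na.omit(df[, c("address", "zip_code")]))

# For each row with a missing zip code, find its address in the dictionary.
missing_zip <- is.na(df$zip_code)
idx <- match(df$address[missing_zip], zip_dictionary$address)

# Fill in the looked-up values; addresses without a match stay NA.
df$zip_code[missing_zip] <- zip_dictionary$zip_code[idx]
df
```

Invalid (as opposed to missing) zip codes would first have to be set to NA for either approach to pick them up.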

Sentiment Analysis Of A Dataset With Multiple NewsPaper Articles

I'm trying to call get_nrc_sentiment in R but getting the following error:
Error in get_nrc_sentiment(Test) : Data must be a character vector.
Can anyone see what I'm doing wrong?
library("RDSTK")
library("readr")
library("qdap")
library("syuzhet")
library("ggplot2")
library(readxl)
Test <- read_excel("Test.xlsx")
View(Test)
scores = get_nrc_sentiment(Test) # throwing error
I suspect that the Test.xlsx file you are reading in has multiple columns. In any case, read_excel() returns a data frame, so the Test object is not a character vector. Passing the data frame to get_nrc_sentiment() causes the error. You can check Test with class(Test) to determine what kind of R object it is.
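A minimal sketch of the fix, assuming the article text sits in a single column: pull that column out as a character vector and pass it on. The column name "article" and the two sample sentences are hypothetical stand-ins for whatever Test.xlsx contains, and the sentiment call only runs if syuzhet is installed.

```r
# Stand-in for the result of read_excel("Test.xlsx"); the column name
# "article" is hypothetical.
Test <- data.frame(
  article = c("I love this sunny day.", "This is a terrible loss."),
  stringsAsFactors = FALSE
)

class(Test)          # a data frame (read_excel gives a tibble), not character

# Extract one column as a plain character vector.
texts <- as.character(Test$article)
is.character(texts)

# Score the texts only when the syuzhet package is available.
if (requireNamespace("syuzhet", quietly = TRUE)) {
  scores <- syuzhet::get_nrc_sentiment(texts)
}
```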

Reading CSV file in R and formatting dates and time while reading and avoiding missing values marked as?

I am trying to read a CSV file in R. How can I format dates and times while reading, and handle missing values that are marked as "?"? The data should be clean once it is loaded.
I tried something like
data <- read.csv("Data.txt")
It worked, but the dates and times were as is.
Also how can I extract a subset of data from specific data range?
For this I tried something like
subdata <- subset(data,
Date== 01/02/2007 & Date==02/02/2007,
select = Date:Sub_metering_3)
I get the error: Error in eval(expr, envir, enclos) : object 'Date' not found
Date is the first column.
The functions read.csv() and read.table() are not set up to do detailed fancy conversion of things like dates that can have many formats. When these functions don't automatically do what's wanted, I find it best to read the data in as text and then convert variables after the fact.
data <- read.csv("Data.txt",colClasses="character",na.strings="?")
data$FixedDate <- as.Date(data$Date,format="%Y/%m/%d")
or whatever your date format is. The variable FixedDate will then be of type Date and you can use equality and other conditions to subset.
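Also, in your example code you are writing 01/02/2007 as a bare expression, which R evaluates as 1 divided by 2 and then by 2007, yielding 0.0002491281, rather than a meaningful date. Consider as.Date("2007-01-02") instead. Note too that Date == a & Date == b can never be true for two different dates; to select a range, combine >= and <=.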
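Putting the pieces together, here is a small sketch on made-up data. It assumes a comma-separated file and day/month/year dates (so the format is "%d/%m/%Y"); the inline text stands in for Data.txt, and you should adjust sep= and the format string to match your actual file.

```r
# Inline stand-in for Data.txt (hypothetical values, "?" marks missing data).
raw <- "Date,Time,Sub_metering_3
31/01/2007,23:59:00,17
01/02/2007,00:00:00,?
02/02/2007,12:30:00,18
03/02/2007,00:00:00,0"

# Read everything as text and turn "?" into NA.
data <- read.csv(text = raw, colClasses = "character", na.strings = "?")

# Convert columns after the fact.
data$FixedDate <- as.Date(data$Date, format = "%d/%m/%Y")
data$Sub_metering_3 <- as.numeric(data$Sub_metering_3)

# Select a date range with >= and <= on real Date values.
subdata <- subset(
  data,
  FixedDate >= as.Date("2007-02-01") & FixedDate <= as.Date("2007-02-02")
)
subdata
```

The subset keeps the rows for 1 and 2 February 2007, with the "?" entry showing up as NA.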

R spCbind error

I have successfully added information to shapefiles before (see my post on http://rusergroup.swansea.ac.uk/Healthmap.ashx?HL=map ).
However, I just tried to do it again with a slightly different shapefile (new local health boards for Wales) and the code fails at spCbind with a "row names not identical" error:
o <- match(wales.lonlat$NEW_LABEL, wds$HB_CD)
wds.xtra <- wds[o,]
wales.ncchd <- spCbind(wales.lonlat, wds.xtra)
My rows did have different names before and that didn't cause any problems. I relabeled the column in wds.xtra to match "NEW_LABEL" and that doesn't help.
The labels and order of labels do match exactly between wales.lonlat and wds.xtra.
(I'm using Revolution R 5.0, which is built on R 2.13.2)
I use match to merge data to the sp data slot based on rownames (or any other common ID). This avoids the necessity of maptools for the spCbind function.
# Based on rownames
sdata@data <- data.frame(sdata@data, new.df[match(rownames(sdata@data), rownames(new.df)), ])
# Based on common ID
sdata@data <- data.frame(sdata@data, new.df[match(sdata@data$ID, new.df$ID), ])
# where sdata is your sp object and new.df is a data.frame object that you want to merge to sdata.
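To see what the match() step does without needing a shapefile, here is a sketch on plain data frames. The object attrs stands in for sdata@data, and all names and values are invented for the illustration.

```r
# attrs plays the role of sdata@data; new.df holds the values to add,
# deliberately stored in a different row order.
attrs  <- data.frame(ID = c("A", "B", "C"), area = c(10, 20, 30))
new.df <- data.frame(ID = c("C", "A", "B"), cases = c(7, 5, 6))

# Position of each attrs$ID within new.df$ID.
idx <- match(attrs$ID, new.df$ID)

# Reorder new.df to the order of attrs, then bind the extra column on.
merged <- data.frame(attrs, new.df[idx, "cases", drop = FALSE])
merged
```

Each row of attrs receives the value from the row of new.df with the same ID, regardless of how new.df was ordered. IDs missing from new.df would produce NA rows.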
I had the same error and resolved it by dropping all the columns that I did not actually want to add. I suppose they confused spCbind because the matching tried to match all row elements, not only the ones given. In my example, I used
xtra2 <- data.frame(xtra$ID_3, xtra$COMPANY)
to extract the relevant fields and fed them to spCbind afterwards
gadm <- spCbind(gadm, xtra2)

colnames intgroup argument of arrayQualityMetrics package of Biobase

I am using a package from Biobase, arrayQualityMetrics, to create plots for visualizing microarray data.
My data is stored in an ExpressionSet.
One of the columns of phenoData(ExpressionSet) is named "Tissue", but when I run the following command:
arrayQualityMetrics(ExpressionSet,intgroup = "Tissue")
it gives me this error:
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
all elements of 'intgroup' should match column names of 'pData(expressionset)'.
I don't understand why I am getting this error, since my ExpressionSet does contain a column named "Tissue" in its phenoData.
It's been a while since you asked this question, but this is likely due to arrayQualityMetrics() having to trim down the data frame in your pData() slot to a limited number of fields for display in the metadata table at the beginning of the report.
Try something like:
tmp <- pData(ExpressionSet)
pData(ExpressionSet) <- tmp[,c("Tissue", "SomeOtherInterestingField")] # swap out
arrayQualityMetrics(ExpressionSet,intgroup="Tissue")
pData(ExpressionSet) <- tmp # replace with your original full pData() data frame
