I have an error that I don't understand.
I have downloaded an Excel file with unemployment rates by country and by year.
Basically, column 1 is Country, column 2 is 1990, column 3 is the following year, and so on.
I am trying to plot a histogram of the unemployment rate in 2005.
I use this code:
qplot(x=2005,y=Country,data=data)
But I always have this error:
Error: unexpected numeric constant in
I have tried to:
- convert all the names to character
- add a "y" before the year
- put brackets
But I still have this error.
Error: unexpected numeric constant in "qplot(y=data$2005"
Error: unexpected numeric constant in "qplot(x=y 2005"
With brackets, I have this error
Error: unexpected '[' in "qplot(x=["
Any idea? Many thanks in advance!
Edit:
Dataset: https://docs.google.com/spreadsheets/d/1frieoKODnD9sX3VCZy5c3QAjBXMY-vN7k_I9gR-gcU8/pub?gid=0
I have downloaded it (xlsx format) and changed the name of the first column.
library(ggplot2)
library(readxl)
file<-"indicator_t 15-24 unemploy.xlsx"
excel_sheets(file)
data<-read_excel(file)
I've tried to plot:
qplot(x=2005,y=Total 15-24 unemployment (%),data=data)
Error: unexpected numeric constant in "qplot(x=2005,y=Total 15"
I have changed the name of the first column, and added a "y" before the years.
names2<-paste("y",names(data[,2:length(data)]))
data2<-c("Country",names2)
colnames(data)<-data2
I still have an error:
qplot(x=y2005,y=Country,data=data)
Error in eval(expr, envir, enclos) : object 'y2005' not found
There are several problems in your code, and you could certainly benefit from reading some basic references on R, such as http://tryr.codeschool.com/
What you are trying to do may be accomplished by
qplot(x = data$"2005", ylab = "Total 15-24 unemployment (%)")
Here, the first argument specifies which data should be plotted, and ylab is used to set the y-axis label. Notice that this label must be enclosed by "quotes".
Edit:
Note also that "2005" may or may not be the name of your column. Check your column names with colnames(data).
Regarding the comment below, if the name of the column is actually 2005, you need to quote it as well. If you don't, R will interpret 2005 as a numerical constant:
> x$2000
Error: unexpected numeric constant in "x$2000"
> x$"2000"
[1] 1 2 4 6
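The quoting rule above can be tried on a small made-up data frame. This is only a sketch: the countries and rates below are invented, and the column layout (Country followed by year columns) is assumed from the question. It also shows why renaming with paste() did not produce y2005: paste() inserts a space by default, so the column became "y 2005", whereas paste0() does not.

```r
# Made-up stand-in for the spreadsheet; check.names = FALSE keeps "2005" as a name
data <- data.frame(
  Country = c("A", "B", "C"),
  `2004`  = c(10.1, 12.3, 8.7),
  `2005`  = c(9.8, 13.0, 8.1),
  check.names = FALSE
)

# A name that starts with a digit must be quoted or wrapped in backticks
v1 <- data$`2005`
v2 <- data[["2005"]]

# paste("y", "2005") gives "y 2005" (with a space); paste0() gives "y2005"
names(data)[-1] <- paste0("y", names(data)[-1])
colnames(data)

# After renaming, the column is an ordinary syntactic name
hist(data$y2005, main = "Unemployment rate, 2005", xlab = "Rate (%)")
```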
I am trying to perform Mendelian randomisation using the R package "TwoSampleMR".
As exposure data, I use instruments from the GWAS catalog (phenotype: sphingolipid levels).
As outcome data, I use the GISCOME ischemic stroke outcome GWAS (http://www.kp4cd.org/index.php/node/391).
I have an error when I do harmonization by the command harmonise_data().
The text of the error is:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0
I have noticed that the error is caused by specific lines in the outcome file. When I make a text file that contains only one line from the original file and use it as outcome data, some lines cause the error and others don't.
As an example this one causes an error:
MarkerName CHR POS Allele1 Allele2 Freq1 Effect StdErr P-value
rs10938494 4 47563448 a g 0.2139 0.0294 0.0519 0.5706
This one doesn’t:
rs1000778 11 61655305 a g 0.2559 0.0939 0.0493 0.05705
Here are all the commands that I use.
library(TwoSampleMR)
library(MRInstruments)
data(gwas_catalog)
exp <- subset(gwas_catalog, grepl("Sphingolipid levels", Phenotype))
exp_dat<-format_data(exp)
exp_dat<-clump_data(exp_dat)
exp_dat
out_dat <- read_outcome_data(
  snps = exp_dat$SNP,
  filename = 'giscome.012vs3456.age-gender-5PC.meta1.txt',
  sep = '\t',
  snp_col = 'MarkerName',
  beta_col = 'Effect',
  se_col = 'StdErr',
  effect_allele_col = 'Allele1',
  other_allele_col = 'Allele2',
  eaf_col = 'Freq1',
  pval_col = 'P-value'
)
dat <- harmonise_data(exposure_dat = exp_dat, outcome_dat = out_dat)
What would be the reason for this problem?
Thank you.
It is difficult to comment without seeing your sample input file, but you might encounter this sort of error when the exposure columns in your data frame are named inconsistently.
Please see this thread on GitHub:
https://github.com/MRCIEU/TwoSampleMR/issues/226
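As a general illustration of the error message itself (not of what TwoSampleMR does internally, which is not shown here), base R raises the same message whenever data.frame() is given one column of length 1 and another of length 0. The sketch below reproduces it and shows one way to spot the empty piece; the list name parts and its contents are hypothetical.

```r
# Reproduce the message: one column has 1 value, the other has 0
err <- tryCatch(
  data.frame(SNP = "rs10938494", beta = numeric(0)),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical check: which pieces are empty before they are combined?
parts <- list(SNP = "rs10938494", beta = numeric(0))
empty <- names(parts)[lengths(parts) == 0]
print(empty)
```

If a lookup on a particular SNP returns nothing for one field, combining it with the non-empty fields fails in this way, which would be consistent with only some lines triggering the error.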
I am using dplyr's mutate() to update one specific row, selected by a condition, by adding the result of nrow() to its current value. I cannot use apply() because only the one row matching the condition should change.
For example, for the row where Year == 2007 and Month == 06, I want to add nrow(df3) to Incoming.Exam, so that the row becomes 698 + nrow(df3).
I get the following error from mutate impl:
Error in mutate_impl(.data, dots) :
Column abberville_LA must be length 96 (the number of rows) or one, not 4
abberville_LA %>%
mutate(abberville_LA, Incoming.Exam = ifelse(abberville_LA$Year == 2007 & abberville_LA$Month == 06, abberville_LA, Incoming.Exam + nrow(abberville_df3), abberville_LA$Incoming.Exam))
head(abberville_LA, 3)
Incoming.Exam Year Month ts_date
1 698 2007 6 2007-06-01
2 NaN 2010 6 2010-06-01
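1. Your question is not entirely clear, so I am answering based on my best understanding of what you want.
2. You are using $ inside mutate(), which is not required, and ifelse() takes exactly three arguments: the test, the value if true, and the value if false. Year and Month are numeric in your data, so compare them with numbers rather than strings. Running the code below should solve the issue.
abberville_LA %>%
  mutate(Incoming.Exam = ifelse(Year == 2007 & Month == 6, Incoming.Exam + nrow(abberville_df3), Incoming.Exam))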
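The same conditional update can be sketched in base R, which makes the three ifelse() arguments easy to see. The two rows below are copied from the head() output in the question; abberville_df3 is a hypothetical four-row stand-in, since its real contents are not shown.

```r
# Two rows taken from the question's head() output
abberville_LA <- data.frame(
  Incoming.Exam = c(698, NaN),
  Year  = c(2007, 2010),
  Month = c(6, 6)
)
# Hypothetical stand-in: only its number of rows matters here
abberville_df3 <- data.frame(id = 1:4)

# Note: comparing a number with the string "06" is FALSE, because 6 is
# converted to "6" before the comparison
6 == "06"

# ifelse(test, value_if_true, value_if_false), evaluated row by row
is_target <- abberville_LA$Year == 2007 & abberville_LA$Month == 6
abberville_LA$Incoming.Exam <- ifelse(
  is_target,
  abberville_LA$Incoming.Exam + nrow(abberville_df3),
  abberville_LA$Incoming.Exam
)
abberville_LA
```

Only the row matching the condition changes; every other row keeps its original value, including the NaN.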
The issue was the dplyr library. I discovered that I had a slightly older version and needed to update it to resolve the error message "Error in mutate_impl(.data, dots) : Evaluation error: as_dictionary() is defunct as of rlang 0.3.0. Please use as_data_pronoun() instead", which indicated that a different version of dplyr was needed. After updating, the code provided in the answers on this forum worked.
I have a CSV file with 4 columns labeled AGE, DIASTOLIC, BMI and EVER.PREGNANT, and 700 rows. The last column contains only yes or no. I wish to plot BMI against EVER.PREGNANT in order to compare the BMI of those with yes in the fourth column against those with no. What code should I write to get the required boxplot?
I have tried the following code:
Sheet=read.csv(/Downloads/1739230_1284354330_PIMA.csv - 1739230_1284354330_PIMA.csv.csv, sep=",")
boxplot(BMI~EVER.PREGNANT,data=sheet, main="BMI vs PREG",xlab="BMI",ylab="PREGNANT")
The error that I get is
Error in eval(expr,envr,enclos): object 'Sheet' not found
Similarly, what modifications can be done to plot AGE vs DIASTOLIC, where both columns are numbers? Will I get the 700 odd values nicely?
I answer here because it tells me not to extend the discussion :-).
I think you haven't loaded your data set correctly. The file path must be a quoted string; without the quotes the read.csv() call fails and Sheet is never created. header = TRUE (already the default for read.csv()) tells R that the first row holds the names of the variables.
Sheet = read.csv("/Downloads/1739230_1284354330_PIMA.csv", sep = ",", header = TRUE)
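Once the data frame exists, both plots asked about in the question can be sketched as follows. The values are randomly generated stand-ins (20 rows instead of 700) with the four column names from the question, so only the plotting calls carry over to the real file.

```r
set.seed(1)
# Invented stand-in for the CSV, using the column names from the question
Sheet <- data.frame(
  AGE           = sample(21:60, 20, replace = TRUE),
  DIASTOLIC     = round(rnorm(20, mean = 72, sd = 10)),
  BMI           = round(rnorm(20, mean = 30, sd = 5), 1),
  EVER.PREGNANT = rep(c("yes", "no"), each = 10)
)

# One box per group: BMI (numeric) split by EVER.PREGNANT (yes/no)
boxplot(BMI ~ EVER.PREGNANT, data = Sheet,
        main = "BMI by EVER.PREGNANT",
        xlab = "EVER.PREGNANT", ylab = "BMI")

# Two numeric columns: a scatter plot draws one point per row
plot(DIASTOLIC ~ AGE, data = Sheet,
     main = "DIASTOLIC vs AGE", xlab = "AGE", ylab = "DIASTOLIC")
```

Note that R is case-sensitive, so the object passed to data = must be spelled exactly as it was assigned (Sheet, not sheet).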
I understand the subject "Error: Aesthetics must be either length 1 or the same as the data" has been done a lot (plenty of reading available online), however, I still have some unresolved questions
I am working with a dataset regarding all calls made to the Seattle Police Department in 2015. After I am done cleaning the data into an acceptable format I wind up with a dataset that is 62,092 rows and 13 columns (dataset name is SPD_2015). I would add a portion of the dataset to this question but I'm not entirely sure how to do it in a clean and legible format.
I used package lubridate to extract the times associated with my data set. I then created a bar graph that showed what time the crimes occur
ggplot(SPD_2015, aes(hour(date.reported.time))) +
geom_bar(width = 0.7)
and that works perfectly.
Since car prowls were the most frequently reported crime, I wanted to graph what time these car prowls occurred. This is when I came across the error "Error: Aesthetics must be either length 1 or the same as the data".
I read that ggplot2 does not like it when you subset within the ggplot code, so I subsetted my data by creating a separate data frame.
car.prowl <- filter(SPD_2015, summarized.offense.description == "CAR PROWL")
So here is my question. Why is it that when I look at the dimensions of my newly created dataset "car.prowl" I see that it has a dimension of 11,539 rows and 13 columns. But when I examine the length of the hours in the occurred.time column (the time that the crime occurred) I get a length of 62,092 which is the length of the original dataset?
In my mind I am picturing that the following code would work:
ggplot(car.prowl, aes(hour(occured.time))) +
geom_bar()
The length of the car.prowl$occured.time is correct:
> length(car.prowl$occured.time)
[1] 11539
but when I apply the hour function I get the length of the original dataset:
> length(hour(car.prowl$occured.time))
[1] 62092
when it should be 11,539.
Thank you. Please let me know what I can do to make my question more clear.
It could be a caching issue as Jeremy said above. I'm not sure this would work, but you could try the below, chaining things together.
SPD_2015%>%
filter(summarized.offense.description == "CAR PROWL")%>%
ggplot(aes(hour(occured.time)))+
geom_bar()
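Another way to rule out a length mismatch is to store the hour as a column of the filtered data frame, so it cannot differ in length from the data being plotted. The sketch below uses base R only and four invented rows; the column names follow the question, occured.hour is a new hypothetical column, and format(..., "%H") stands in for lubridate's hour().

```r
# Invented rows with the column names used in the question
SPD_2015 <- data.frame(
  summarized.offense.description = c("CAR PROWL", "BURGLARY", "CAR PROWL", "ASSAULT"),
  occured.time = as.POSIXct(
    c("2015-01-01 08:15:00", "2015-01-01 13:40:00",
      "2015-01-02 23:05:00", "2015-01-03 02:30:00"),
    tz = "UTC"
  )
)

# Subset first, then derive the hour inside the subset
car.prowl <- SPD_2015[SPD_2015$summarized.offense.description == "CAR PROWL", ]
car.prowl$occured.hour <- as.integer(format(car.prowl$occured.time, "%H", tz = "UTC"))

# The derived column necessarily has one value per row of the subset
length(car.prowl$occured.hour) == nrow(car.prowl)
table(car.prowl$occured.hour)
```

If the lengths still disagree in a fresh R session, the problem lies in the data or in a masked function rather than in ggplot2.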
I have a large data frame (dataset_n) consisting of several columns, each for a different variable.
I am concentrating now on the variables:
q32, i.e. net recalled wages
pgssyear, i.e. the year when a person was asked the question about the wages
I would like to create an additional column that would stand for the CPI in a given year (pgssyear), so that I can calculate real wages later on (dividing the q32 with the CPI column).
I tried the following:
- copying the pgssyear under a different name (creating a new vector with the name "year", but with the same contents) and replacing the first year: 1992 with 1 using "replace" hoping to be able to replace 1993 with e.g. 1.35, etc.:
attach(dataset_n)
dataset_n$year <- pgssyear
detach(dataset_n)
dataset_n$year
None of these options:
replace(dataset_n$year, dataset_n$year == 1992, 1)
replace(dataset_n$year, dataset_n$year == "1992", 1)
with(data.frame(dataset_n), replace(dataset_n$year, dataset_n$year == 1992, 1))
worked for me. Each time I got the message: "object of type 'closure' is not subsettable"
dataset_n$year[dataset_n$year==1992] <- 1
did not work either and I got the message:
Warning message:
In `[<-.factor`(`*tmp*`, dataset_n$year == 1992, value = c(NA, NA, :
invalid factor level, NA generated
I suspect that when creating the new vector the numeric data got treated as factors.
I tried also:
as.numeric(gsub(1992, 1, dataset_n$year))
as.numeric(gsub(1993, 1.35, dataset_n$year))
This time the values got replaced, but I failed to achieve it "all at once", which is what I need.
I have also run out of further ideas, so any help would be appreciated.
The other threads I have seen and which might be related are:
Replace given value in vector
Error in <my code> : object of type 'closure' is not subsettable
To make this line work:
dataset_n$year[dataset_n$year==1992] <- 1
Convert your year vector to numeric from factor like this:
dataset_n$year <- as.numeric(as.character(dataset_n$year))
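To replace every year "all at once", one option is a named lookup vector indexed by the year. The sketch below assumes a tiny made-up dataset_n; the CPI values for 1992 and 1993 are the ones mentioned in the question, the 1994 value is invented, and cpi_by_year, cpi and real_wage are hypothetical names. It also shows why the as.character() step matters when converting a factor.

```r
# Tiny made-up stand-in; pgssyear is a factor, as suspected in the question
dataset_n <- data.frame(
  pgssyear = factor(c(1992, 1993, 1992, 1994)),
  q32      = c(100, 270, 150, 400)
)

# as.numeric() on a factor returns the internal level codes, not the years
codes <- as.numeric(dataset_n$pgssyear)

# Going through as.character() recovers the actual year values
dataset_n$year <- as.numeric(as.character(dataset_n$pgssyear))

# Lookup table: names are years, values are CPI (1994 value is invented)
cpi_by_year <- c("1992" = 1, "1993" = 1.35, "1994" = 1.80)

# Index the lookup by year to fill the whole column in one step
dataset_n$cpi <- unname(cpi_by_year[as.character(dataset_n$year)])
dataset_n$real_wage <- dataset_n$q32 / dataset_n$cpi
dataset_n
```

Any year missing from the lookup vector yields NA in the cpi column, which makes gaps in the table easy to spot.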