R mutate ifelse update conditional row with calculated function value - r

I am using R's mutate() to update one specific (conditional) row with a calculated value: I want to add the result of a function, namely nrow(), to the existing value. I cannot use apply() because I need to update only the one row that matches a specific value.
For example, for the row where Year==2007 and Month==06, the new value should be Incoming.Exam + nrow(df3), so that row becomes 698 + the nrow value.
I get the following error from mutate_impl:
Error in mutate_impl(.data, dots) :
Column abberville_LA must be length 96 (the number of rows) or one, not 4
abberville_LA %>%
mutate(abberville_LA, Incoming.Exam = ifelse(abberville_LA$Year == 2007 & abberville_LA$Month == 06, abberville_LA, Incoming.Exam + nrow(abberville_df3), abberville_LA$Incoming.Exam))
head(abberville_LA, 3)
Incoming.Exam Year Month ts_date
1 698 2007 6 2007-06-01
2 NaN 2010 6 2010-06-01

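1. Your question is not entirely clear, so I am answering based on my best understanding of what you want.
2. You are using $ inside mutate(), which is not required. The data frame is also passed a second time inside mutate() after the pipe, so mutate() treats abberville_LA as a new column of length 4 (its number of columns), which is what the error message complains about. Running the code below should solve the issue.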
abberville_LA %>%
mutate(Incoming.Exam = ifelse(Year == 2007 & Month == 6, Incoming.Exam + nrow(abberville_df3), Incoming.Exam))
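As a minimal sketch of the conditional update described above, here is the same pattern on a small made-up data set. The values in abberville_LA and the 4-row abberville_df3 are hypothetical stand-ins, not the asker's real data. The comparison uses the number 6 rather than the string '06' because head() shows Month printed as 6, and a numeric 6 never equals the string '06'.

```r
library(dplyr)

# Hypothetical stand-ins for the asker's data frames
abberville_LA <- data.frame(
  Incoming.Exam = c(698, NaN, 120),
  Year          = c(2007, 2010, 2007),
  Month         = c(6, 6, 7)
)
abberville_df3 <- data.frame(id = 1:4)  # nrow(abberville_df3) is 4

# Only rows where both conditions hold are changed; the rest keep their value
updated <- abberville_LA %>%
  mutate(Incoming.Exam = ifelse(Year == 2007 & Month == 6,
                                Incoming.Exam + nrow(abberville_df3),
                                Incoming.Exam))

print(updated)
```

Note that the result has to be assigned (here to updated); piping into mutate() alone does not modify abberville_LA.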

The issue was the dplyr library. I discovered that I had a slightly older version and needed to update it to resolve the error message "Error in mutate_impl(.data, dots) : Evaluation error: as_dictionary() is defunct as of rlang 0.3.0. Please use as_data_pronoun() instead", which indicated that a different version of dplyr was needed. After updating, the code provided in the answers on this forum worked.

Related

cleaning data - expanding one column to multiple columns in a dataframe

The textsample below is in one column. Using R, I hope to separate it into 5 columns with the following headings: "Name" , "Location", "Date", "Time", "Warning" . I have tried separate() and strsplit() and haven't succeeded yet. I hope someone here can help.
textsample <- "Name : York-APC-UPS\r\n
Location : York SCATS Zigzag Road\r\n
Contact : Mechanical services\r\n
\r\n
http://York-APC-UPS.domain25.minortracks.wa.gov.au\r\n
http://192.168.70.56\r\n
http://FE81::3C0:B8FF:FE6D:8065\r\n
Serial Number : 5A1149T24253\r\n
Device Serial Number : 5A1149T24253\r\n
Date : 12/06/2018\r\n
Time : 08:45:46\r\n
Code : 0x0125\r\n
\r\n
Warning : A high humidity threshold violation exists for integrated Environmental Monitor TH Sensor
(Port 1 Temp 1 at Port 1) reporting over 50%CD.\r\n"
Here's an approach that should at least get you started:
We can use extract from tidyr to extract the text of interest with regular expressions.
Then we can use mutate_all to apply the same str_replace to get rid of the labels.
library(dplyr)
library(tidyr)
library(stringr)
as.data.frame(textsample) %>%
extract(1, into=c("Name","Location","Date","Time","Warning"),
regex = "(Name : .+)[^$]*(Location : .+)[^$]*(Date : .+)[^$]*(Time : .+)[^$]*(Warning : .+)[^$]*") %>%
mutate_all(list(~str_replace(.,"^\\w+ : ","")))
# Name Location Date Time
#1 York-APC-UPS York SCATS Zigzag Road 12/06/2018 08:45:46
# Warning
#1 A high humidity threshold violation exists for integrated Environmental Monitor TH Sensor
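This relies on capturing groups with (); see help(tidyr::extract) for details. Between the groups we use [^$]*. Inside a character class, $ is a literal dollar sign, so this matches any run of characters other than $, including line breaks, which is what lets the pattern skip over the lines between the fields.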
Note the first argument to extract is 1, which indicates the first (and only) column of the data.frame I made from your example data. Change this as necessary.
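If one long regular expression is hard to maintain, a different approach is to pull out each field on its own. The sketch below is an alternative, not the method from the answer above. It uses only base R, works on a shortened, hypothetical version of the sample text, and assumes every field sits on its own line in the form "Label : value". The helper get_field is a name made up for this example.

```r
# Shortened, hypothetical version of the sample text (CRLF line endings)
textsample <- paste0(
  "Name : York-APC-UPS\r\n",
  "Location : York SCATS Zigzag Road\r\n",
  "Serial Number : 5A1149T24253\r\n",
  "Date : 12/06/2018\r\n",
  "Time : 08:45:46\r\n",
  "Warning : A high humidity threshold violation exists\r\n"
)

fields <- c("Name", "Location", "Date", "Time", "Warning")

# Return the text after "<field> : " on the line that starts with <field>.
# (?m) makes ^ match at the start of every line, not just the whole string.
get_field <- function(text, field) {
  pattern <- paste0("(?m)^", field, " : ([^\r\n]*)")
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))
  m[[1]][2]  # element 1 is the full match, element 2 the captured value
}

values <- sapply(fields, get_field, text = textsample)
parsed <- as.data.frame(as.list(values), stringsAsFactors = FALSE)
print(parsed)
```

A field that is missing from the text comes back as NA instead of making the whole match fail, which is the main reason to prefer one pattern per field.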

Adding additional Rows to DF

I have a complete, working script, but I would like to add additional rows to its output. The same error happens when I add column names.
Adding the new names and writing to a fresh sheet:
colnames(main)<-c("Company","Mapped","Not.Mapped","Pending")
rownames(main)<-c("CompA","CompB","CompC")
write.table(main, file="Main.csv", sep=",", row.names = FALSE)
Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
output should look like the below
Company Mapped Not.Mapped Pending X.Mapped
CompA 190 19 63 90.91%
CompB
CompC 66 9 36 88.00%
I think your question is not clear at all. If you want to add a new row, you can use []. Note that dim(main)[1] returns the number of rows in your current object, so dim(main)[1] + 1 is the index of a new, additional row. The vector you pass must have 4 elements (or as many as your data frame has columns).
main[dim(main)[1] + 1,] <- c(1,2,3,4)
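When the columns have different types, assigning a c(...) vector coerces every value to one type, so a one-row data frame combined with rbind() is often safer. The sketch below assumes main has two rows, CompA and CompC, with the numbers from the desired output in the question; the empty CompB row is filled with NA. It also illustrates the likely reason for the row.names error: the vector given to rownames() must have exactly nrow(main) elements.

```r
# Assumed starting point: two companies, values taken from the question
main <- data.frame(
  Company    = c("CompA", "CompC"),
  Mapped     = c(190, 66),
  Not.Mapped = c(19, 9),
  Pending    = c(63, 36),
  stringsAsFactors = FALSE
)

# A one-row data frame keeps each column's type intact
new_row <- data.frame(
  Company    = "CompB",
  Mapped     = NA_real_,
  Not.Mapped = NA_real_,
  Pending    = NA_real_,
  stringsAsFactors = FALSE
)

main <- rbind(main, new_row)
main <- main[order(main$Company), ]  # put CompB between CompA and CompC

# Row names are only valid when there is one name per row
rownames(main) <- main$Company
print(main)
```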

R: Error: unexpected numeric constant (in names of columns)

I have an error that I don't understand.
I have downloaded an Excel file with unemployment rates by country and by year.
Basically, column 1 is Country, column 2 is 1990, column 3 etc...
I am trying to plot a histogram of the unemployment rate in 2005.
I use this code:
qplot(x=2005,y=Country,data=data)
But I always have this error:
Error: unexpected numeric constant in
I have tried to:
- convert all the names in character
- add a "y" before the year
- put brackets
But I still have this error.
Error: unexpected numeric constant in "qplot(y=data$2005"
Error: unexpected numeric constant in "qplot(x=y 2005"
With brackets, I have this error
Error: unexpected '[' in "qplot(x=["
Any idea? Many thanks in advance!
Edit:
Dataset: https://docs.google.com/spreadsheets/d/1frieoKODnD9sX3VCZy5c3QAjBXMY-vN7k_I9gR-gcU8/pub?gid=0
I have downloaded it (xlsx format) and changed the name of the first column.
library(ggplot2)
library(readxl)
file<-"indicator_t 15-24 unemploy.xlsx"
excel_sheets(file)
data<-read_excel(file)
I've tried to plot:
qplot(x=2005,y=Total 15-24 unemployment (%),data=data)
Error: unexpected numeric constant in "qplot(x=2005,y=Total 15"
I have changed the name of the first column, and added a "y" before the years.
names2<-paste("y",names(data[,2:length(data)]))
data2<-c("Country",names2)
colnames(data)<-data2
I still have an error:
qplot(x=y2005,y=Country,data=data)
Error in eval(expr, envir, enclos) : object 'y2005' not found
There are several problems in your code, and you could certainly benefit from reading some basic references on R, such as http://tryr.codeschool.com/
What you are trying to do may be accomplished by
qplot ( x = data$"2005" , ylab="Total 15-24 unemployment (%)")
Here, the first argument specifies which data should be plotted, and ylab is used to set the y-axis label. Notice that this label must be enclosed by "quotes".
Edit:
Note also that "2005" may or may not be the name of your column. Check what your column names are with colnames(data).
Regarding the comment below, if the name of the column is actually 2005, you need to quote it as well. If you don't, R will interpret 2005 as a numerical constant:
> x$2000
Error: unexpected numeric constant in "x$2000"
> x$"2000"
[1] 1 2 4 6
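To make the quoting rule concrete, here is a small sketch with a made-up data frame whose second column is literally named 2005. Backticks and [[ ]] both reach such a column. The sketch also shows why prefixing the years with paste("y", ...) did not produce a column called y2005: paste() puts a space between its arguments by default, while paste0() does not.

```r
# Hypothetical data with a column literally named "2005"
data <- data.frame(Country = c("A", "B", "C"),
                   `2005`  = c(10.5, 20.1, 15.3),
                   check.names = FALSE)

by_backtick <- data$`2005`      # backticks quote a non-syntactic name
by_string   <- data[["2005"]]   # [[ ]] takes the name as a string

# paste() inserts a space by default, paste0() does not
spaced <- paste("y", "2005")    # "y 2005"
joined <- paste0("y", "2005")   # "y2005"

# After renaming with paste0(), the column has a regular syntactic name
names(data)[2] <- joined
print(data$y2005)
```

The same backtick quoting works inside ggplot2's aes(), for example aes(x = `2005`).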

Error: impossible to replicate vector of size in mutate

I have been using the following code to determine diversity (using vegan package) and it has been going well. In order to calculate diversity using vegan, you have to create a dataframe with only site by species. Then you calculate diversity and then use dplyr's mutate to be able to create a new column to your original dataframe that is your diversity metric.
final_corrected %>% select(eu_density, para_density, bleph_density, colp_density, rot_density, vort_density) -> final_speciesonly
H.protists <- diversity(final_speciesonly)
final_corrected %>% mutate(diversity = H.protists) -> final_diversity
My problem is that I tried to do this analysis again with a summarized dataset, and when I try and mutate, an error pops up:
summary_diversity[3:8] -> summary_speciesonly
H.protists.sum <- diversity(summary_speciesonly)
summary_speciesonly %>% mutate(diversity_sum = H.protists.sum) -> summary_diversity_total
Error: impossible to replicate vector of size in mutate
When I look at the differences between H.protists and H.protists.sum, I find that H.protists is a named num value, whereas H.protists.sum is just a num value. Here is the head of each:
head(H.protists)
1 2 3 4 5 6
0.3144922 0.8980537 0.8740576 0.2771206 0.5701381 0.3502690
head(H.protists.sum)
[1] 1.336860 1.331183 1.193013 1.192450 1.258912 1.412319
I think that this is the reason that I am getting an error message, but I am not sure how to fix it. Help?

In R, how do I select a single value from one column, based upon a value in a second column?

Thank you for the help. I am attempting to write an equation that uses values selected from a .csv file. It looks something like this; let's call it df.
df<-read.csv("SiteTS.csv", header=TRUE,sep=",")
df
Site TS
1 H4A1 -42.75209
2 H4A2 -43.75101
3 H4A3 -41.75318
4 H4C3 -46.76770
5 N1C1 -42.68940
6 N1C2 -36.95200
7 N1C3 -43.16750
8 N2A2 -38.58040
9 S4C1 -35.32000
10 S4C2 -34.52420
My equation requires the value in the TS column for each site. I am attempting to create a new column called SigmaBS with the results of the equation using TS.
df["SigmaBS"]<-10^(subset(df, Site=="H4A1"/10)
This is where I am running into issues, as the subset function returns all columns for the rows where Site equals H4A1:
subset(df, Site =="H4A1")
Site TS
1 2411 -42.75209
But again, I only need the value -42.75209.
I apologize if this is a simple question, but I would very much appreciate any help you may be able to offer.
If you insist on using the subset function, it has a select argument:
subset(df, Site=="H4A1", select="TS")
A better option is to use [] notation:
df[df$Site=="H4A1", "TS"]
Or the $ operator:
subset(df, Site=="H4A1")$TS
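The difference between these options is the type of the result, which matters when the value goes into an equation. The sketch below uses the first three rows of the data shown in the question: subset() with select returns a one-cell data frame, while [] with a column name returns a plain number.

```r
# First three rows of the data from the question
df <- data.frame(Site = c("H4A1", "H4A2", "H4A3"),
                 TS   = c(-42.75209, -43.75101, -41.75318),
                 stringsAsFactors = FALSE)

as_frame <- subset(df, Site == "H4A1", select = "TS")  # 1 x 1 data frame
as_value <- df[df$Site == "H4A1", "TS"]                # plain numeric value

# The plain value can be used directly in the equation
sigma_h4a1 <- 10^(as_value / 10)
print(sigma_h4a1)
```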
You can use this simple command:
df$SigmaBS <- 10 ^ (df$TS / 10)
It sounds like you're trying to create a new column called SigmaBS where the value in each row is 10^(value of TS / 10).
If so, this code should work:
SigmaBS <- sapply(df$TS, function(x) 10^(x/10))
df$SigmaBS <- SigmaBS
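The two answers above compute the same thing, because arithmetic in R is already vectorized over a column. A short check on the first three rows of the question's data:

```r
# First three rows of the data from the question
df <- data.frame(Site = c("H4A1", "H4A2", "H4A3"),
                 TS   = c(-42.75209, -43.75101, -41.75318),
                 stringsAsFactors = FALSE)

# Vectorized: the whole TS column is transformed in one step
df$SigmaBS <- 10^(df$TS / 10)

# Element by element with sapply(), as in the second answer
via_sapply <- sapply(df$TS, function(x) 10^(x / 10))

print(all.equal(df$SigmaBS, via_sapply))
```

The vectorized form is shorter and avoids the function call per element, so it is usually the one to prefer.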
