I need to add a new variable to my data, and I would like to do it with the mutate function. How can I do it? I'm using the ISLR library.
Create a new variable called "HighVol" that has the classes "yes" and "no"
to indicate whether the location sold 10,000 units or more in the past year.
How many stores produced a high volume?
Example below.
carseats.df$HighVol <- factor(carseats.df$HighVol,
levels = c(0,1),
labels = c("No", "Yes"))
If you use mutate, you work on the entire data frame. You'll want the whole data frame here anyway, because whether a row is assigned "yes" or "no" depends on its sales value.
library(tidyverse)
# create carseats.df
set.seed(39582) # make it repeatable
carseats.df <- data.frame(sales = rnorm(100, 10000, 505))
# now create conditional variable
carseats.df <- carseats.df %>%
  mutate(HighVol = ifelse(sales >= 10000, # condition: 10,000 units or more
                          "yes",          # result if true
                          "no") %>%       # result if false
           as.factor())
head(carseats.df)
#       sales HighVol
# 1  9992.190      no
# 2 10077.482     yes
# 3  9507.145      no
# 4 10780.788     yes
# 5 10433.133     yes
# 6 10907.665     yes
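To answer "How many stores produced a high volume?", you can tally the new factor. The sketch below uses base R only: it reuses the simulated sales column from above and builds HighVol with the same factor(levels = c(0, 1), labels = ...) call as the question, driven by the sales condition. With dplyr the equivalent would be mutate(HighVol = factor(as.integer(sales >= 10000), levels = c(0, 1), labels = c("No", "Yes"))) followed by count(carseats.df, HighVol). If you are working with the actual ISLR Carseats data, note that its Sales column is recorded in thousands of units, so the cut-off there would be Sales >= 10 rather than 10000.

```r
# Simulated stand-in for carseats.df, as in the answer above
set.seed(39582)
carseats.df <- data.frame(sales = rnorm(100, 10000, 505))

# Same 0/1 -> "No"/"Yes" factor as in the question, driven by the sales condition
carseats.df$HighVol <- factor(as.integer(carseats.df$sales >= 10000),
                              levels = c(0, 1),
                              labels = c("No", "Yes"))

# How many stores produced a high volume?
table(carseats.df$HighVol)        # counts for "No" and "Yes"
n_high <- sum(carseats.df$HighVol == "Yes")
n_high
```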
It looks like you're fairly new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes sample data, such as the output from dput(head(dataObject)), and any libraries you are using. Check it out: making R reproducible questions.
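Since dput() came up: it prints R code that rebuilds an object, so readers can paste your sample data straight into their session. A minimal illustration with a small hypothetical data frame (sample_df is made up for this example), including a check that the printed code recreates the same data:

```r
# Small hypothetical data frame standing in for your own data
sample_df <- data.frame(sales   = c(9992.19, 10077.48, 9507.15),
                        HighVol = factor(c("no", "yes", "no")))

# Paste the printed output of this into your question
dput(head(sample_df))

# Anyone can evaluate that text to get the same data back
dput_text <- paste(capture.output(dput(head(sample_df))), collapse = "\n")
rebuilt   <- eval(parse(text = dput_text))
```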
The reason you haven't seen any help is most likely the lack of meaningful tags. You only have the tag tree, which isn't meaningful here. At a minimum, you would want to include a tag for the programming language: r. You could also add things like mutate or the library it comes from, dplyr.
I am analyzing six single-cell RNA-seq datasets with the Seurat package.
Each of the 6 datasets was acquired in a different 10X run; I then combined them with batch-effect correction via the Seurat function "FindIntegrationAnchors".
Among the 6 datasets, datasets 1, 2, 3 and 4 belong to the "untreated" group, while datasets 5 and 6 belong to the "treated" group.
I merged all 6 datasets with batch correction, but I also need to compare features of "untreated" vs "treated".
How can I group datasets 1, 2, 3 and 4 into an "untreated" group and datasets 5 and 6 into a "treated" group, and then perform downstream analysis?
Thanks.
One quick and dirty way to do this is to add the information before merging the Seurat objects:
...
# datasets 1-4 are the untreated (control) group
so_samples[[1]]@meta.data$treatment <- "control"
so_samples[[2]]@meta.data$treatment <- "control"
so_samples[[3]]@meta.data$treatment <- "control"
so_samples[[4]]@meta.data$treatment <- "control"
# datasets 5-6 are the treated group
so_samples[[5]]@meta.data$treatment <- "treated"
so_samples[[6]]@meta.data$treatment <- "treated"
...
anchors <- FindIntegrationAnchors(object.list = so_samples, dims = 1:20)
so_all_samples <- IntegrateData(anchorset = anchors, dims = 1:20)
In general, it would be better to load such metadata from a file and join it to the Seurat object instead of writing error-prone copy-paste code like this. Also note that it is generally a bad idea to modify R S4 objects (those whose elements you access with @) directly like this, but the functions the Seurat package provides for modifying Seurat objects are so cumbersome to use that I doubt the developers will ever change the underlying data structure.
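Following up on loading the metadata from a file, here is a sketch assuming a small sample sheet that lists each dataset's position in so_samples and its treatment. It is read from inline text here; in practice it would be something like read.csv("samples.csv"), a hypothetical file name. The group column is set with obj$treatment <- value, which is how Seurat's own integration vignette adds a meta.data column, instead of writing to @meta.data. The downstream comparison then sets the cell identities from that column and calls FindMarkers() on the RNA assay. add_treatment() and compare_treatment() are hypothetical helper names, Seurat must be attached for compare_treatment(), and this follows the v3/v4-style workflow in the question; newer Seurat versions may need extra steps.

```r
# Hypothetical sample sheet: one row per dataset, in so_samples order
sample_info <- read.csv(text = "
sample,treatment
1,control
2,control
3,control
4,control
5,treated
6,treated
", stringsAsFactors = FALSE)

# Tag every cell of each dataset with its group before integration.
# obj$col <- value adds a meta.data column on a Seurat object.
add_treatment <- function(objects, info) {
  for (i in seq_along(objects)) {
    objects[[i]]$treatment <- info$treatment[info$sample == i]
  }
  objects
}

# After IntegrateData(): differential expression, treated vs control.
# Assumes library(Seurat) is attached; DE is run on the uncorrected "RNA" assay.
compare_treatment <- function(integrated) {
  DefaultAssay(integrated) <- "RNA"
  Idents(integrated) <- "treatment"   # use the meta.data column as identities
  FindMarkers(integrated, ident.1 = "treated", ident.2 = "control")
}
```

You would call so_samples <- add_treatment(so_samples, sample_info) before FindIntegrationAnchors(), and markers <- compare_treatment(so_all_samples) after IntegrateData().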
How do I extract the "VarCompContrib" column from the data frame produced by the gageRR function in R?
This is for a gage R&R analysis of a measurement system. I'm trying to make a very user-friendly program where other people can just enter the required information (number of operators, parts, and measurements, as well as the measurements themselves) and get the correct analysis as output. I'm going to use an if-statement later on to do the "analysis" portion, but I am having trouble actually working with the data frame produced by gageRR.
library(MASS)
library(Rsolnp)
library(qualityTools)
design = gageRRDesign(Operators=3, Parts=10, Measurements=2, randomize=FALSE)
response(design) = c(23,22,22,22,22,25,23,22,23,22,20,22,22,22,24,25,27,28,
23,24,23,24,24,22,22,22,24,23,22,24,20,20,25,24,22,24,21,20,21,22,21,22,21,
21,24,27,25,27,23,22,25,23,23,22,22,23,25,21,24,23)
gdo=gageRR(design)
plot(gdo)
I am looking to get the 7-number column vector under VarCompContrib.
For starters, you can look at the structure of gdo with str(gdo). From there, we see that Varcomp is a slot, so we can access it with gdo@Varcomp and convert it to a data.frame:
library(qualityTools)
design <- gageRRDesign(Operators = 3, Parts = 10, Measurements = 2, randomize = FALSE)
response(design) <- c(
23,22,22,22,22,25,23,22,23,22,20,22,22,22,24,25,27,28,23,24,23,24,24,22,22,22,24,23,22,24,
20,20,25,24,22,24,21,20,21,22,21,22,21,21,24,27,25,27,23,22,25,23,23,22,22,23,25,21,24,23
)
gdo <- gageRR(design)
data.frame(gdo@Varcomp)
# totalRR repeatability reproducibility a a_b bTob totalVar
# 1 1.66441 1.209028 0.4553819 0.4553819 0 1.781211 3.445621
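Note that these are the raw variance components. If VarCompContrib is, as in a usual gage R&R table, each component divided by the total variance (an assumption here; compare it with the table printed for gdo), the 7-number vector can be computed directly. The sketch copies the values from the output above so it runs without qualityTools; with the real object you would start from unlist(gdo@Varcomp) instead.

```r
# Values copied from the Varcomp output above.
# With the real object: varcomp <- unlist(gdo@Varcomp)
varcomp <- c(totalRR = 1.66441, repeatability = 1.209028,
             reproducibility = 0.4553819, a = 0.4553819, a_b = 0,
             bTob = 1.781211, totalVar = 3.445621)

# Share of the total variance for each component (7 numbers)
var_comp_contrib <- varcomp / varcomp[["totalVar"]]
round(var_comp_contrib, 4)
```

As a sanity check, totalRR + bTob equals totalVar in the output above, so those two contributions add up to 1.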
I need to perform the following sequence:
1. Open an Excel workbook
2. Read a specific worksheet into an R dataframe
3. Read from a database, updating the dataframe
4. Write the dataframe back to the worksheet
I have steps 1-3 working OK using the BERT tool (the R scripting interface).
For step 2 I use range.to.data.frame from BERT.
Any pointers on how to perform step 4? There is no data.frame.to.range.
I tried range$put_Value(df), but it returns no error and Excel is not updated.
I can update a single cell from R using put_Value, which I cannot find documented.
#
# manipulate status data using R BERT tool
#
wb <- EXCEL$Application$get_ActiveWorkbook()
wbname = wb$get_FullName()
ws <- EXCEL$Application$get_ActiveSheet()
topleft = ws$get_Range( "a1" )
rng = topleft$get_CurrentRegion()
#rngbody = rng$get_Offset(1,0)
ssot = rng$get_Value()
ssotdf = range.to.data.frame( ssot, headers=T )
# emulate data update on 2 columns
ssotdf$ServerStatus = "Disposed"
ssotdf$ServerID = -1
# try to write df back
retcode = rng$put_Value(ssotdf)
This answer doesn't use R Excel BERT.
Try the openxlsx package. You can probably do all the steps with it. For step 4, after installing openxlsx, the following code writes the data frame to a file:
openxlsx::write.xlsx(ssotdf, 'Dataframe.xlsx', asTable = TRUE)
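write.xlsx() creates a new file. To write back into a specific worksheet of an existing workbook instead, openxlsx can load the workbook, overwrite that sheet's cells, and save it. A minimal sketch; write_df_to_sheet() is a hypothetical helper name, and the file and sheet names in the usage line are placeholders:

```r
library(openxlsx)

# Hypothetical helper: write a data frame into one worksheet of an existing
# workbook (starting at A1, with headers), keeping the other sheets as they are.
write_df_to_sheet <- function(df, path, sheet) {
  wb <- loadWorkbook(path)
  if (!(sheet %in% names(wb))) addWorksheet(wb, sheet)
  writeData(wb, sheet = sheet, x = df, startCol = 1, startRow = 1, colNames = TRUE)
  saveWorkbook(wb, path, overwrite = TRUE)
  invisible(path)
}

# Usage (placeholder names):
# ssotdf <- read.xlsx("status.xlsx", sheet = "Sheet1")
# ssotdf$ServerStatus <- "Disposed"
# write_df_to_sheet(ssotdf, "status.xlsx", "Sheet1")
```

writeData() only overwrites the cells it writes, so if the new data frame is smaller than the old data, leftover cells remain. The file presumably also needs to be closed in Excel while it is being saved.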
I think your problem is that you are not changing the size of the range, so you are not going to see your new columns. Try creating a new range that has two extra columns before you insert the data.
I just had the same question and was able to resolve it by converting the data.frame to a matrix in the call to put_Value. I figured this out after playing with the old version in excel-functions.r. Try something like:
retcode = rng$put_Value(as.matrix(ssotdf))
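One caveat: rng comes from CurrentRegion starting at A1 and was read with headers=T, so it presumably includes the header row, while as.matrix(ssotdf) has one row fewer and no header. Also, as.matrix() on a data frame with mixed column types runs numeric columns through format(), which can pad them with spaces. A sketch of a hypothetical helper that converts each column with as.character() and puts the header back on top, so the matrix has the same shape as rng. Every value becomes text, and I have not checked how BERT passes such a matrix to Excel.

```r
# Hypothetical helper: data frame -> character matrix with the column names
# as the first row, matching a range that includes the header row.
df_to_range_matrix <- function(df) {
  body <- do.call(cbind, lapply(df, as.character))  # one column per df column, no padding
  unname(rbind(names(df), body))
}

ssotdf <- data.frame(ServerID = c(-1, -1), ServerStatus = c("Disposed", "Disposed"))
m <- df_to_range_matrix(ssotdf)
m
# In BERT: retcode = rng$put_Value(m)
```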
You may have already solved your problem, but if not, the following stripped-down R function does what I think you need:
testDF <- function(rng1,rng2){
app <- EXCEL$Application
ref1 <- app$get_Range( rng1 ) # get source range reference
data <- ref1$get_Value() # get source range data
#
ref2 <- app$get_Range( rng2 ) # get destination range reference
ref2$put_Value( data ) # put data in destination range
}
I simulated a dataframe by setting the values in range "D4:F6" of the spreadsheet to:
col1 col2 col3
1 2 txt1
7 3 txt2
then ran
testDF("D4:F6","H10:J12")
in the BERT console. The dataframe then appears in range "H10:J12".
I am new to R programming. I was trying to visualize a dataset using googleVis in R, but I was unable to.
The error I got was:
Error: Length of logical index vector must be 1 or 8, got: 14835
Can someone help?
The dataset is here:
https://www.kaggle.com/c/predict-west-nile-virus/data
The code is below:
# Read competition data files:
library(readr)
data_dir <- "C:/Users/Wesley/Desktop/input"
train <- read_csv(file.path(data_dir, "train.csv"))
spray <- read_csv(file.path(data_dir, "spray.csv"))
# Generate output files with write_csv(), plot() or ggplot()
# Any files you write to the current directory get shown as outputs
# Install and read packages
library(lubridate)
library(googleVis)
# Create useful date columns
spray$Date <- as.Date(as.character(spray$Date),format="%Y-%m-%d")
spray$Week <- isoweek(spray$Date)
spray$Year <- year(spray$Date)
# Create a total count of measurements
spray$Total <- 1
for(i in 1:nrow(spray)) {
spray$Total[i] = i
}
# Aggregate data by Year, Week, Trap and order by old-new
spray_agg <- aggregate(cbind(Total)~Year+Week+Latitude+Longitude,data=spray,sum)
spray_agg <- spray[order(spray$Year,spray$Week),]
# Create a misc format for Week for Google Vis Motion Chart
spray_agg$Week_Format <- paste(spray_agg$Year,"W",spray_agg$Week,sep="")
# Function to create a motion chart together with a overview table
# It takes the aggregated data as input as well as a year of choice (2007,2009,2011,2013)
# It filters out "no presence" weeks since they distort the graphical view
# Next to that it creates an overview table of that year
# With gvisMerge you can merge the 3 html outputs into 1
create_motion <- function(data=spray_agg,year=2011){
data_motion <- data[data$Year==year]
motion <- gvisMotionChart(data=data_motion,idvar="Total",timevar="Week_Format",xvar="Longitude",yvar="Latitude"
,sizevar=0.1,colorvar="Blue",options=list(width="600"))
return(motion)
}
# Get the per year motion charts
#motion1 <- create_motion(spray_agg,2007)
#motion2 <- create_motion(spray_agg,2009)
motion3 <- create_motion(spray_agg,2011) # Error: Length of logical index vector must be 1 or 8, got: 14835
motion4 <- create_motion(spray_agg,2013) # Error: Length of logical index vector must be 1 or 8, got: 14835
# Merge them together into 1 dashboard
output <- gvisMerge(gvisMerge(motion1,motion2,horizontal=TRUE),gvisMerge(motion3,motion4,horizontal=TRUE),horizontal=FALSE)
# Plot the output in your browser
plot(output)
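The error most likely comes from two lines. read_csv() returns a tibble, and data[data$Year==year] (no comma) indexes columns, so the tibble complains that a 14835-element logical vector does not match its 8 columns. The vector is 14835 long because spray_agg <- spray[order(...),] replaces the aggregate with the reordered raw spray data (4 original columns plus Week, Year, Total and Week_Format). Below is a sketch of the corrected pieces, keeping the question's names. Dropping sizevar and colorvar is my own change: gvisMotionChart() expects those arguments to name columns of the data, not to hold values like 0.1 or "Blue". Also, motion1 and motion2 are commented out, so the final gvisMerge() can only use motion3 and motion4.

```r
# Fix 1: sort the aggregate itself instead of overwriting it with raw rows
# spray_agg <- spray_agg[order(spray_agg$Year, spray_agg$Week), ]

# Fix 2: the comma makes the logical vector select rows, not columns
create_motion <- function(data = spray_agg, year = 2011) {
  data_motion <- data[data$Year == year, ]  # rows of the chosen year, all columns
  # sizevar/colorvar must name columns of data, so they are left out here
  googleVis::gvisMotionChart(data = data_motion, idvar = "Total",
                             timevar = "Week_Format",
                             xvar = "Longitude", yvar = "Latitude",
                             options = list(width = 600))
}

# Only motion3 and motion4 exist, so merge just those two:
# output <- googleVis::gvisMerge(create_motion(spray_agg, 2011),
#                                create_motion(spray_agg, 2013), horizontal = TRUE)
# plot(output)

# Row selection with the comma, on a toy data frame:
toy <- data.frame(Year = c(2011, 2011, 2013), Week = c(35, 36, 29), Total = 1:3)
toy_2011 <- toy[toy$Year == 2011, ]  # 2 rows, all 3 columns
```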