How to delete specific rows from a specific variable in a dataframe - r

I'm trying to delete specific rows (843:1133) among those where a certain variable has a given value (spcfc_t == "91017181") in my dataframe neuedaten. My syntax seems wrong. I would be grateful for any help. Here is what I tried:
neuedaten3 <- neudaten[which(neudaten$spcfc_t == "91017181",],[c(-843:1133),]
Thank you so much.

The minus sign is in the wrong position. Drop which(), remove the extra comma, and write it like this:
neuedaten3<-neudaten[neudaten$spcfc_t == "91017181",][-c(843:1133),]
EDIT:
If you also want to keep the rows where spcfc_t is not "91017181", and only drop rows 843:1133 of the matching ones, use:
neuedaten3 <- neudaten[-which(neudaten$spcfc_t == "91017181")[843:1133],]
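The difference between the two variants is easier to see on a small made-up data frame. This is only a sketch: toy, id and grp are hypothetical names, and positions 2:4 stand in for 843:1133.

```r
# Toy data (hypothetical): 10 rows, 6 of which belong to group "A"
toy <- data.frame(id  = 1:10,
                  grp = c("A", "B", "A", "A", "B", "A", "A", "B", "A", "B"),
                  stringsAsFactors = FALSE)

# Row numbers, in the full data frame, of the group "A" rows
hits <- which(toy$grp == "A")

# Variant 1: keep only group "A", then drop its 2nd to 4th rows
only_a <- toy[toy$grp == "A", ][-c(2:4), ]

# Variant 2: keep every group, but drop the 2nd to 4th "A" rows
all_groups <- toy[-hits[2:4], ]

print(only_a$id)
print(all_groups$id)
```

In the first variant the positions are counted inside the filtered data frame; in the second, which() translates them back to row numbers of the full data frame before the minus sign removes them.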

Related

How to change data within a column in a dataset in R

I have created a for loop that goes through each row of a certain column. I want to change the information written in that cell depending on certain conditions, so I implemented an if/else statement.
However, the current problem is that the data is printing out one specific outcome: B.
I tried to combat this problem by exporting using write.csv and importing using read.csv.
When I applied the head() function though, I still got Medium for all rows.
Would anyone be able to help with this please?
Walk through the following example step by step. You need to index with the for loop variable correctly. Could you show us the data frame where you are changing values? That would be helpful.
# Create a small example data frame
Df <- data.frame(a=c(1,2,3,4,5), b=c(2,3,5,6,8), c=c(10,4,2,3,7))
# Create the new column up front so each row can be filled in by index
Df$Newcolumn <- NA_character_
for (k in seq_len(nrow(Df))) {
  # k is the row index: test and assign one row at a time
  if (Df$a[k] <= 3) {
    Df$Newcolumn[k] <- "low"
  } else if (Df$a[k] > 3 && Df$a[k] <= 6) {
    Df$Newcolumn[k] <- "medium"
  }
}
You do not need a for loop to create a new column based on conditions. You could simply use this:
cool$b <- cool$a
cool$b[cool$a < 3] <- "low"
cool$b[cool$a >= 3 & cool$a < 4] <- "Medium"
cool$b[cool$a >= 4 & cool$a < 5] <- "Rich"
# >= 5 so that a value of exactly 5 is labelled too
cool$b[cool$a >= 5] <- "Very Rich"
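As an alternative sketch (not part of the answer above), base R's cut() can assign all the labels in one call. The data frame scores and the break points 3, 4 and 5 are assumptions chosen to mirror the thresholds above; right = FALSE makes each interval include its lower bound and exclude its upper bound.

```r
# Hypothetical data standing in for the asker's column
scores <- data.frame(a = c(1, 2.5, 3, 4, 5, 7))

# Bin the values: [-Inf, 3) low, [3, 4) Medium, [4, 5) Rich, [5, Inf) Very Rich
scores$b <- cut(scores$a,
                breaks = c(-Inf, 3, 4, 5, Inf),
                labels = c("low", "Medium", "Rich", "Very Rich"),
                right  = FALSE)

# cut() returns a factor; convert it if plain text is preferred
scores$b <- as.character(scores$b)

print(scores)
```

Because the breaks cover the whole number line, every non-missing value receives exactly one label.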

How to conditionally replace values in r data frame using if/then statement

I'd like to learn how to conditionally replace values in R data frame using if/then statements. Suppose I have a data frame like this one:
df <- data.frame(
customer_id = c(568468,568468,568468,485342,847295,847295),
customer = c('paramount','paramount','paramount','miramax','pixar','pixar'));
I'd like to do something along the lines of,
"if customer in ('paramount','pixar') make customer_id 99. Else do nothing". I'm using this code, but it's not working:
if(df$customer %in% c('paramount','pixar')){
df$customer_id == 99
}else{
df$customer_id == df$customer_id
}
I get a warning message such as the condition has length > 1 and only the first element will be used. And the values aren't replaced.
I'd also like to know how to do this using logical operators to perform something like,
"if customer_id >= 500000, replace customer with 'fox'. Else, do nothing."
Very easy to do in SQL, but can't seem to figure it out in R.
My sense is that I'm missing a bracket somewhere?
How do I conditionally replace values in R data frame using if/then statements?
You can use ifelse, like this:
df$customer_id <- ifelse(df$customer %in% c('paramount', 'pixar'), 99, df$customer_id)
The syntax is simple:
ifelse(condition, result if TRUE, result if FALSE)
This is vectorized, so you can use it on a dataframe column.
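You are using == (comparison) instead of = or <- (assignment) inside the if block, so nothing is ever assigned. I also don't think you need the else block in your example, as you are not going to change the other values. Note that if() expects a single TRUE or FALSE rather than a whole column, so select the matching rows with a logical index instead:
df$customer_id[df$customer %in% c('paramount','pixar')] = 99
The above code will do the job for you.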
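The second part of the question (replace customer with 'fox' where customer_id >= 500000) can be handled the same way. The sketch below reuses the data frame from the question; replaced and via_ifelse are names introduced here for illustration, and stringsAsFactors = FALSE is set explicitly so that customer is a character column in any R version.

```r
df <- data.frame(
  customer_id = c(568468, 568468, 568468, 485342, 847295, 847295),
  customer    = c("paramount", "paramount", "paramount", "miramax", "pixar", "pixar"),
  stringsAsFactors = FALSE
)

# Option 1: logical indexing, only the matching rows are overwritten
replaced <- df
replaced$customer[replaced$customer_id >= 500000] <- "fox"

# Option 2: ifelse(), the "else" branch keeps the existing value
via_ifelse <- ifelse(df$customer_id >= 500000, "fox", df$customer)

print(replaced)
```

Both options work element by element over the whole column, which is why neither needs an explicit loop or an if block.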

Copy rows of a df when NA == TRUE plus upper and lower rows in R

Not sure if the title is clear enough. I have the following dataframe: (ST.final is the name of the df)
n;date;ws;wd
1;2011-11-01 00:00:00;7,15;113,7
2;2011-11-01 00:10:00;7,25;115,7
3;2011-11-01 00:20:00;NA;NA
4;2011-11-01 00:30:00;NA;NA
5;2011-11-01 00:40:00;7,2;100,7
6;2011-11-01 00:50:00;6,95;104,7
And I want to create a new one with the rows containing NAs plus the upper and lower limit rows. The result should be something like this:
n;date;ws;wd
2;2011-11-01 00:10:00;7,25;115,7
3;2011-11-01 00:20:00;NA;NA
4;2011-11-01 00:30:00;NA;NA
5;2011-11-01 00:40:00;7,2;100,7
Maybe I am missing something but I have no clue on how to perform this task. So far I am trying to use this
interp.df <- ST.final[(is.na(ST.final$ws)),]
and, as expected, it just copies every row containing NAs. I searched for a solution on Google but couldn't find anything similar.
Any help is appreciated.
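You could try
idx <- which(!complete.cases(ST.final))  # rows containing an NA
idx <- sort(unique(c(idx-1, idx, idx+1)))  # plus the rows directly above and below
idx <- idx[idx >= 1 & idx <= nrow(ST.final)]  # stay inside the data frame if the first or last row has an NA
ST.final[idx, ]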
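Here is the same idea run on the sample from the question, as a minimal sketch. The date column is left out for brevity, the decimal commas are written as points, and the bounds check on idx is an addition that only matters when the first or last row contains an NA.

```r
# Sample data from the question (date column omitted)
ST.final <- data.frame(
  n  = 1:6,
  ws = c(7.15, 7.25, NA, NA, 7.2, 6.95),
  wd = c(113.7, 115.7, NA, NA, 100.7, 104.7)
)

# Rows with at least one NA
idx <- which(!complete.cases(ST.final))

# Add the neighbouring rows and remove duplicates
idx <- sort(unique(c(idx - 1, idx, idx + 1)))

# Keep only positions that exist in the data frame
idx <- idx[idx >= 1 & idx <= nrow(ST.final)]

interp.df <- ST.final[idx, ]
print(interp.df)
```

The result holds rows 2 to 5, which matches the desired output shown in the question.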

Creating a data frame using a single line code

I need to select data for 3 variables and place them in a new data frame using a single line of code. The data frame I'm pulling from is Dance, the 3 variables are Lindy, Blues and Contra.
I have this:
Dance$new<-subset(Dance$Type==Lindy, Dance$Type==Blues, Dance$Type==Contra)
Can you tell what I'm doing wrong?
There are a number of ways you can do this, but I'd forget the subset part:
danceNew <- Dance[Dance$Type=="Lindy"|Dance$Type=="Blues"|Dance$Type=="Contra",]
If you only want specific columns:
danceNew <- Dance[Dance$Type=="Lindy"|Dance$Type=="Blues"|Dance$Type=="Contra",c("Col1", "Col2")]
Alternatively:
danceNew <- Dance[Dance$Type %in% c("Blues", "Contra", "Lindy"),]
Again, if you only want specific columns, do the same. The advantage of the final option is that you can pass the values in as a variable, thereby making it more dynamic, e.g.
danceNames <- c("Lindy", "Blues", "Contra")
danceNew <- Dance[Dance$Type %in% danceNames,]
You're mixing up the variables and the data frames.
This should do the trick.
If your initial data frame is called "Dance" and the new data frame is called "Dance.new":
Dance.new <- subset(Dance, Dance$Type=="Lindy" | Dance$Type=="Blues" | Dance$Type=="Contra"); row.names(Dance.new) <- NULL
I like adding the "row.names(Dance.new) <- NULL" line so that the rows of the new data frame are renumbered from 1 instead of keeping their original row numbers.
Thanks for your help everyone. This is what ended up working for me.
dancenew<-subset(Dance, Type=="Lindy" | Type== "Blues" | Type=="Contra")
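A quick check on made-up data shows that the | form and the %in% form pick the same rows, while combining the conditions with & picks none, since a single row cannot have all three types at once. The Dance data frame below, including its Score column, is hypothetical.

```r
# Hypothetical stand-in for the asker's Dance data
Dance <- data.frame(
  Type  = c("Lindy", "Salsa", "Blues", "Contra", "Tango"),
  Score = c(8, 6, 7, 9, 5),
  stringsAsFactors = FALSE
)

# "or": a row qualifies if it matches any one of the three types
with_or <- Dance[Dance$Type == "Lindy" | Dance$Type == "Blues" | Dance$Type == "Contra", ]

# %in%: same selection, with the types passed as a vector
with_in <- Dance[Dance$Type %in% c("Lindy", "Blues", "Contra"), ]

# "and": a row would have to be all three types at the same time
with_and <- Dance[Dance$Type == "Lindy" & Dance$Type == "Blues" & Dance$Type == "Contra", ]

print(with_or)
print(nrow(with_and))
```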

Problem with data.table ifelse behavior

I am trying to calculate a simple ratio using data.table. Different files have different tmax values, so that is why I need ifelse. When I debug this, the dt looks good. The tmaxValue is a single value (the first "t=60" encountered in this case), but t0Value is all of the "t=0" values in dt.
summaryDT <- calculate_Ratio(reviewDT[,list(Result, Time), by=key(reviewDT)])
calculate_Ratio <- function(dt){
tmaxValue <- ifelse(grepl("hhep", inFile, ignore.case = TRUE),
dt[which(dt[,Time] == "t=240min"),Result],
ifelse(grepl("hlm",inFile, ignore.case = TRUE),
dt[which(dt[,Time] == "t=60"),Result],
dt[which(dt[,Time] == "t=30"),Result]))
t0Value <- dt[which(dt[,Time] == "t=0"),Result]
return(dt[,Ratio:=tmaxValue/t0Value])
}
What I am getting out is the Result for tmaxValue divided by the Result values for all of the t0Value entries, but what I want is a single ratio for each unique by group.
Thanks for the help.
You didn't provide a reproducible example, but using ifelse to pick between whole vectors based on a single condition is typically the wrong thing to do.
Try using if(...) ... else ... instead.
ifelse(test, yes, no) behaves in a way that often surprises: it produces a result with the attributes and length of test, filled with values from yes or no.
...so in your case, where test is a single TRUE or FALSE, you should get something without attributes and of length one, and that's probably not what you wanted, right?
[UPDATE] ...Hmm or maybe it is since you say that tmaxValue is a single value...
Then the problem isn't in calculating tmaxValue? Note that ifelse is still the wrong tool for the job...
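The length behaviour described above can be reproduced in base R without data.table. In this sketch inFile and results are made-up values standing in for the asker's file name and the Result values at "t=60".

```r
inFile  <- "sample_HLM_run1.csv"  # hypothetical file name
results <- c(10, 20, 30)          # hypothetical Result values at t=60

# ifelse(): the test has length 1, so the result has length 1 as well
from_ifelse <- ifelse(grepl("hlm", inFile, ignore.case = TRUE), results, NA)

# if / else: the chosen branch is returned whole
from_if <- if (grepl("hlm", inFile, ignore.case = TRUE)) results else NA

print(length(from_ifelse))
print(length(from_if))
```

This is consistent with the symptom in the question, where tmaxValue ends up as the first matching value only, while t0Value (computed without ifelse) keeps every "t=0" value.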
