How to change data within a column in a dataset in R

I have created a for loop that goes through each row of a certain column. I want to change the information written in that cell depending on certain conditions, so I implemented an if/else statement.
However, the current problem is that the data is printing out one specific outcome: B.
I tried to combat this problem by exporting using write.csv and importing using read.csv.
When I applied the head() function though, I still got Medium for all rows.
Would anyone be able to help with this please?

Walk through the following example step by step. You need to index with the for loop variable correctly. Could you show us the data frame where you are changing values? That would be helpful.
# Create a new data frame
Df <- data.frame(a=c(1,2,3,4,5),b=c(2,3,5,6,8),c=c(10,4,2,3,7))
# Create the new column up front so every row starts as NA
Df$Newcolumn <- NA_character_
for (k in seq_len(nrow(Df))) {
  # k indexes the current row, so each test and assignment touches one cell only
  if (Df$a[k]<=3) {
    Df$Newcolumn[k] <- "low"
  } else if (Df$a[k]>3 && Df$a[k]<=6) {
    Df$Newcolumn[k] <- "medium"
  }
}

You do not need a for loop to create a new column based on conditions. You could simply use this:
cool$b <- cool$a
cool$b[cool$a < 3] <- "low"
cool$b[cool$a >= 3 & cool$a < 4] <- "Medium"
cool$b[cool$a >= 4 & cool$a < 5] <- "Rich"
cool$b[cool$a >= 5] <- "Very Rich" # >= so that a value of exactly 5 is labelled too
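The same binning can also be written with cut(), which assigns every value to a labelled range in one call. This is only a sketch: the values in cool$a are made up for illustration, and the break points are assumed to match the ranges above.

```r
# Hypothetical data; only the names `cool`, `a` and `b` come from the answer above.
cool <- data.frame(a = c(1, 2.5, 3, 4.2, 5, 7))

# right = FALSE closes each interval on the left:
# [-Inf, 3), [3, 4), [4, 5), [5, Inf)
cool$b <- cut(cool$a,
              breaks = c(-Inf, 3, 4, 5, Inf),
              labels = c("low", "Medium", "Rich", "Very Rich"),
              right = FALSE)

print(cool)
```

Note that cut() returns a factor; wrap it in as.character() if a plain character column is needed.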

Related

How to create a new variable in R that returns 1 if a case has a missing value while another variable has an observed value?

I have two variables containing missing data loon and profstat. For a better overview of the data that are missing and are needed to impute, I wanted to create an additional variable problem in the data frame, that would return for each case 1 if loon is missing and profstat is observed, and 0 if otherwise. I have generated the following code, which only gives me as output x[] = 1. Any solution to this problem?
{
problem <- dim(length(t))
for (i in 1:nrow(dflapopofficial))
{
if (is.na(dflapopofficial$loon[i])==TRUE & is.na(dflapopofficial$profstat[i])==FALSE) {
dflapopofficial$problem[i]=1
} else {
dflapopofficial$problem[i]=0
}
return(problem)
}
There are a few things that could be improved here:
Remember, many operations in R are vectorized. You don't need to loop through each element in a vector when doing logical checks etc.
is.na(some_condition) == TRUE is just the same as is.na(some_condition) and is.na(some_condition) == FALSE is the same as !is.na(some_condition)
If you want to write a new column inside a dataframe, and you are referring to several variables in that dataframe, using within can save you a lot of typing - particularly if your dataframe has a long name
You are returning problem, yet in your loop you are writing to dflapopofficial$problem, which is a different variable.
If you want to write 1s and 0s, you can implicitly convert logical to numeric using +(logical_vector)
Putting all this together, you can replace your whole loop with a single line:
within(dflapopofficial, problem <- +(is.na(loon) & !is.na(profstat)))
Remember to store the result, either back to the data frame or to a copy of it, like
df <- within(dflapopofficial, problem <- +(is.na(loon) & !is.na(profstat)))
so that df is just a copy of dflapopofficial with your extra column.
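To see the one-liner at work, here is a small sketch on made-up rows. Only the names dflapopofficial, loon, profstat and problem come from the question; the values are assumptions chosen to cover all four missing/observed combinations.

```r
# Hypothetical rows covering every combination of missing and observed values
dflapopofficial <- data.frame(
  loon     = c(NA, 2500, NA, 3100),
  profstat = c("employed", "employed", NA, NA)
)

# 1 where loon is missing and profstat is observed, 0 otherwise
df <- within(dflapopofficial, problem <- +(is.na(loon) & !is.na(profstat)))

print(df)
# Count how many cases are flagged
print(table(df$problem))
```

Only the first row is flagged here, because it is the only one where loon is missing while profstat is present.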

Creating a new column conditional on the character values of two other columns

I am only a learner in R and have a fairly basic question.
I have a dataset called edata with two columns relevant to the posted question. These are GazeCue and TargetLocation. I wish to create a new column called CueType that shows as "Cued" or "Uncued" based on the values of the other two columns.
When GazeCue is equal to RightGazePic1.png and TargetLocation is equal to TargetR, the new CueType column should show as "Cued". Similarly when GazeCue is equal to LeftGazePic1.png and TargetLocation is equal to TargetL, the CueType column should again show as "Cued". Any other variation of values should show in CueType as "Uncued".
An example of what I would like is pasted below.
GazeCue TargetLocation CueType
RightGazePic1.png TargetR Cued
LeftGazePic1.png TargetL Cued
RightGazePic1.png TargetL Uncued
LeftGazePic1.png TargetR Uncued
I have been trying to complete this code using ifelse but with no luck. Any advice would be greatly appreciated.
This is pretty basic. One way would be to extract the L and R from both the png name and the target, and compare those using ifelse:
edata$CueType <- ifelse(substr(edata$GazeCue, 1, 1) == substr(edata$TargetLocation, 7, 7),
                        "Cued",
                        "Uncued")
If the names can vary a bit more, take a look at gsub to extract the relevant information from the strings before making the comparison.
You can also use the comparison result to index a two-element lookup vector (FALSE + 1 is 1, TRUE + 1 is 2):
ix <- (substr(edata$GazeCue,1,1) == substring(edata$TargetLocation,7)) + 1
edata$CueType <- c("Uncued","Cued")[ix]
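If you would rather match the full strings than single characters, the two "Cued" conditions from the question can be spelled out directly. A minimal sketch, assuming edata holds exactly the four example rows from the question:

```r
# The four example rows from the question
edata <- data.frame(
  GazeCue        = c("RightGazePic1.png", "LeftGazePic1.png",
                     "RightGazePic1.png", "LeftGazePic1.png"),
  TargetLocation = c("TargetR", "TargetL", "TargetL", "TargetR")
)

# TRUE where the gaze direction and the target side agree
cued <- with(edata,
  (GazeCue == "RightGazePic1.png" & TargetLocation == "TargetR") |
  (GazeCue == "LeftGazePic1.png"  & TargetLocation == "TargetL"))

edata$CueType <- ifelse(cued, "Cued", "Uncued")
print(edata)
```

This is more typing than the substr version, but it does not depend on where the L or R sits inside each string.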
You can try this:
edata$CueType <- NA # add the new column by name
for (i in 1:nrow(edata)) {
  if (edata$GazeCue[i] == 'RightGazePic1.png' && edata$TargetLocation[i] == 'TargetR') {
    edata$CueType[i] <- "Cued"
  } else if (edata$GazeCue[i] == 'LeftGazePic1.png' && edata$TargetLocation[i] == 'TargetL') {
    edata$CueType[i] <- "Cued"
  } else {
    edata$CueType[i] <- "Uncued"
  }
}
Test it; it should work properly!

Creating a data frame using a single line code

I need to select data for 3 variables and place them in a new data frame using a single line of code. The data frame I'm pulling from is Dance, the 3 variables are Lindy, Blues and Contra.
I have this:
Dance$new<-subset(Dance$Type==Lindy, Dance$Type==Blues, Dance$Type==Contra)
Can you tell what I'm doing wrong?
There are a number of ways you can do this, but I'd forget the subset part
danceNew <- Dance[Dance$Type=="Lindy"|Dance$Type=="Blues"|Dance$Type=="Contra",]
If you only want specific columns
danceNew <- Dance[Dance$Type=="Lindy"|Dance$Type=="Blues"|Dance$Type=="Contra",c("Col1", "Col2")]
Alternatively
danceNew <- Dance[Dance$Type %in% c("Blues", "Contra", "Lindy"),]
Again, if you only want specific columns, do the same. The advantage of the final option is that you can pass the values in as a variable, thereby making it more dynamic, e.g.
danceNames <- c("Lindy", "Blues", "Contra")
danceNew <- Dance[Dance$Type %in% danceNames,]
You're mixing up the variables and the data frames.
This should do the trick.
If your initial data frame is called "Dance" and the new data frame is called "Dance.new":
Dance.new <- subset(Dance, Type=="Lindy" | Type=="Blues" | Type=="Contra"); row.names(Dance.new) <- NULL
I like adding the "row.names(Dance.new) <- NULL" line so that the rows of the new data frame are renumbered from 1 instead of keeping their old row names.
Thanks for your help everyone. This is what ended up working for me.
dancenew<-subset(Dance, Type=="Lindy" | Type== "Blues" | Type=="Contra")
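The choice between | and & matters here: a single row's Type holds only one value, so conditions joined with & can never all be true at once. The sketch below uses a made-up Dance data frame (the Count column and the extra dance types are assumptions) to compare the approaches.

```r
# Hypothetical data; only `Dance` and `Type` come from the question
Dance <- data.frame(
  Type  = c("Lindy", "Salsa", "Blues", "Contra", "Tango"),
  Count = c(12, 7, 9, 4, 3)
)
danceNames <- c("Lindy", "Blues", "Contra")

byOr  <- subset(Dance, Type == "Lindy" | Type == "Blues" | Type == "Contra")
byIn  <- Dance[Dance$Type %in% danceNames, ]
byAnd <- subset(Dance, Type == "Lindy" & Type == "Blues" & Type == "Contra")

print(byOr$Type)   # the three requested types
print(byIn$Type)   # same rows via %in%
print(nrow(byAnd)) # no row can be all three types at once
```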

Append row to dataset in R

I'm trying to create a new dataset by deleting some rows from ds2 (through a comparison with a dataset ds1). I wrote a function that should do this:
compare<-function(ds1,ds2){
for(i in 1:length(ds1$long)){
for(j in 1:length(ds2$long)){
if(ds1$long[i]<(ds2$long[j]+500) & ds1$long[i]>(ds2$long[j]-500)){
if(ds1$lat[i]<(ds2$lat[j]+500) & ds1$lat[i]>(ds2$lat[j]-500)){
ds3<-data.frame(merge(ds2[j,],ds3))
}
}
}
}
return(ds3)
}
ds3 is the dataset I want to return, it should be formed by the rows of the original dataset ds2 that satisfy the condition above.
My function gives me an error:
Error in as.data.frame(y) :
argument "y" is not specified and has not a definite value
Is "merge()" the right function for creating such a dataset, appending rows to ds3?
If not, which is the right function to do this?
Thank you all in advance
Edit: I modified the function thanks to your tips, using
ds3<-data.frame()
ds3<-rbind(ds3,ds2[j,])
instead of
ds3<-data.frame(merge(ds2[j,],ds3))
Now I've got this error:
Error in rbind(ds3, ds2[j, ]) :
no method for coercing this S4 class to a vector
If I use rbind(), can I operate with SpatialPoints? (data contained in my dataset are spatial points)
Edit2: I have 2 datasets, one with 330 rows (points on irregular grid, ds1), one with ~150000 rows (points on regular grid, ds2). I want to compute correlation between the variables in the first dataset and the variables in the second one. For making it, I want to "reduce" the second dataset to the dimensions of the first, saving only the points which have the same coordinates (or quasi) in both datasets.
Without a small example this is untested, but if you are happy with the performance of the for loop then this may be what you are attempting:
compare <- function(ds1, ds2) {
  ds3 <- NULL # nothing collected yet
  for (i in 1:length(ds1$long)) {
    # starting at i skips j,i pairs already seen as i,j; start at 1 if ds1 and ds2 hold different points
    for (j in i:length(ds2$long)) {
      if (ds1$long[i] < (ds2$long[j]+500) & ds1$long[i] > (ds2$long[j]-500)) {
        if (ds1$lat[i] < (ds2$lat[j]+500) & ds1$lat[i] > (ds2$lat[j]-500)) {
          if (is.null(ds3)) { # first match: ds3 does not hold any rows yet
            ds3 <- ds2[j, ] # should only get executed once
          } else {
            ds3 <- rbind(ds3, ds2[j, ]) # append row j as the next row
          }
        }
      }
    }
  }
  return(ds3)
}
I tried to avoid the added overhead of retesting for j,i matches where you already had an i,j match. Again, I cannot tell for sure this is appropriate because the problem description still is not exactly clear to me.
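Growing ds3 one row at a time gets slow with ~150000 rows, so another option is to mark the matching rows of ds2 first and subset once at the end. This is only a sketch under stated assumptions: ds1 and ds2 are plain data frames with numeric long and lat columns (not S4 spatial objects), compare_rows and tol are hypothetical names, and each ds2 row is kept at most once even if it is close to several ds1 points.

```r
# Keep the rows of ds2 that lie within `tol` of any point in ds1 (hypothetical helper)
compare_rows <- function(ds1, ds2, tol = 500) {
  keep <- rep(FALSE, nrow(ds2))
  for (i in seq_len(nrow(ds1))) {
    # vectorised over all rows of ds2 for the current ds1 point
    near <- abs(ds2$long - ds1$long[i]) < tol &
            abs(ds2$lat  - ds1$lat[i])  < tol
    keep <- keep | near
  }
  ds2[keep, , drop = FALSE]
}

# Made-up coordinates for illustration
ds1 <- data.frame(long = c(1000, 5000), lat = c(1000, 5000))
ds2 <- data.frame(long  = c(900, 3000, 5200, 9000),
                  lat   = c(1200, 3000, 4800, 9000),
                  value = c(10, 20, 30, 40))

ds3 <- compare_rows(ds1, ds2)
print(ds3)
```

The loop now runs once per ds1 point (330 times in the question) rather than once per pair of points.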

Reading variable name of dataframe into another dataframe using loops

So using a for loop I was able to break my 1.1 million row dataset in r into 110 tables of approximately 10,000 rows each in hopes of getting r to handle the data better. I now want to run another for loop that assigns the values in each of these tables to a different dataframe name.
My table names are:
Pom_1
Pom_2
Pom_3
...
Pom_110
What I want to do is create a for loop like the following:
for (i in 1:110)
{
Pom <- read.table(paste("Pom",i,sep = "_"))
for (j in 1:nrows(Pom))
{do something}
}
So I want to loop through the array and assign the values of each Pom table to "Pom" so that I can then run a for loop on each subsection of Pom. This problem is the read.table function does not seem to be the right one. Any ideas?
Can you give a more specific example of what you want to do within each data frame? You should avoid using the inner loop when possible, and if you really need one, have a look at ?apply.
Use nrow instead of nrows.
This is a generic solution using an example data.frame. The function you're looking for is assign; check its help page:
Pom = data.frame(x = rnorm(30)) # original data.frame
n.tables = 3 # number of new data.frames you want to create
Pom.names = paste("Pom",1:n.tables,sep="") # names of all new data.frames
breaks = nrow(Pom)/n.tables * 0:n.tables # breaks of the original data.frame
for (i in 1:n.tables) {
  rows = (breaks[i]+1):breaks[i+1] # which rows from Pom are going to be assigned to the new data.frame?
  assign(Pom.names[i],Pom[rows, , drop = FALSE]) # create new data.frame (drop = FALSE keeps a one-column result a data.frame)
}
ls()
[1] "breaks" "i" "n.tables" "Pom" "Pom.names" "Pom1"
[7] "Pom2" "Pom3" "rows"
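An alternative to assign is to keep the pieces together in a named list, which is easier to loop over than many separate variables. A minimal sketch using split(); Pom.list and chunk are hypothetical names, and the example data mirrors the 30-row data.frame above.

```r
set.seed(1)                          # only to make the example reproducible
Pom <- data.frame(x = rnorm(30))     # example data.frame, as above
n.tables <- 3

# Rows per piece, then a group number (1, 2, 3, ...) for every row
chunk <- ceiling(nrow(Pom) / n.tables)
group <- ceiling(seq_len(nrow(Pom)) / chunk)

# One data.frame per group, stored in a single list
Pom.list <- split(Pom, group)
names(Pom.list) <- paste0("Pom", seq_along(Pom.list))

# Loop over the pieces without needing get() or assign()
for (name in names(Pom.list)) {
  piece <- Pom.list[[name]]
  cat(name, "has", nrow(piece), "rows\n")
}
```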
I'm willing to bet the problem with your read.table call is that you aren't specifying the file extension (assuming Pom_1 to Pom_110 are files in your working directory, which I think they are since you're using read.table).
You can fix it with the following:
fileExtension <- ".xls" # specify your extension, I assume xls
for (i in 1:110)
{
  tablename <- paste("Pom",i,sep = "_")
  Pom <- read.table(paste(tablename, fileExtension, sep=""))
  for (j in 1:nrow(Pom))
  {
    # do something with row j
  }
}
Of course that's assuming a couple of things about how everything in your problem is set up, but it's my best guess based on your description and code.
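If the pieces really are files on disk, the reads can also be collected into a list with lapply. The sketch below is self-contained only because it first writes three tiny example files to a temporary folder; the .csv extension, the folder and the column names are assumptions, not details from the question.

```r
# Write three small example files so the sketch can run anywhere
folder <- tempdir()
for (i in 1:3) {
  write.csv(data.frame(id = i, x = c(1, 2)),
            file.path(folder, paste0("Pom_", i, ".csv")),
            row.names = FALSE)
}

# Build the file names, check they exist, then read each into a list element
files <- file.path(folder, paste0("Pom_", 1:3, ".csv"))
stopifnot(all(file.exists(files)))
Pom.tables <- lapply(files, read.csv)

# Work on each table in turn, here just counting rows
row.counts <- sapply(Pom.tables, nrow)
print(row.counts)
```

Checking file.exists() before reading gives a clearer failure than read.table's own error when a name or extension is wrong.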
