Removing columns from an index request in a data frame - r

Hi I currently have the code:
matrixed.data <- data.matrix(df[1:row.dim,7:col.dim])
Where the row.dim and col.dim are variables for the size of the whole frame. I would like to remove the column "df$WEATHER" that is included in the col.dim selection but don't know how to word it. I have tried adding - df$WEATHER and !df$WEATHER inside the bracket but fear I'm misinterpreting the scope of these commands.
Is it possible to do this without creating a new col.dim variable; I'm trying to keep the code as limitless as possible as the data frame may increase in size in the future.

Thank you digEmAll! I thought it would be reasonably simple I'm just a bit too green at R to think of something like that. For others what worked for me was:
(df[1:row.dim, setdiff(7:col.dim,which(names(df) == "WEATHER"))])

Related

Assigning a list to column properties

I have a list containing values and I want to assign it to the column properties of a table in spotfire. I am currently using a for loop to do it. Is there a better approach to this, like assigning the entire list in one go?
As mentioned previously I am doing it currently using a for loop which can be seen below:
high=c(5,2,10)
low=c(3,1,0)
for(col in 1:ncol(temp)){
attributes(temp[,col])$SpotfireColumnMetaData$limits.whatif.upper=(high[col])[1]
attributes(temp[,col])$SpotfireColumnMetaData$limits.whatif.lower=(low[col)[1]
}
}
I have also tried just to do
attributes(temp2)$SpotfireColumnData$limits.whatif.upper=high
but that didnt seem to work.
So I want the column for limits.whatif.upper to be 5 for the first row, 2 for the second, and 10 for the third. As I said this code works, but I want to see if there is a faster way of doing it, since it seems that accessing the column property every time and changing it slows down the code a lot.The columns properties already exist so I am not creating new ones with this code.
It seems that python works faster than R with column properties. So if you need to do it faster, it may be better just to transfer the data over to python and do it from there. I dont have as much expierence in R, so it may just be poorly written R code as well.

Not aggregating correctly

My goal of this code is to create a loop that aggregates each company's word frequency by a certain principle vector I created and adds it to a list. The problem is, after I run this, it only prints the 7 principles that I have rather than the word frequencies along side them. The word frequencies being the certain column of the FREQBYPRINC.AG data frame. Individually, running this code without the loop and just testing out a certain column, it works no problem. For some reason, the loop doesn't want to give me the correct data frames for the list. Any suggestions?
list.agg<-vector("list",ncol(FREQBYPRINC.AG)-2)
for (i in 1:14){
attach(FREQBYPRINC.AG)
list.agg[i]<-aggregate(FREQBYPRINC.AG[,i+1],by=list(Type=principle),FUN=sum,na.rm=TRUE)
}
I really wish I could help. After reading your statement, It seems that to you , you feel that the code should be working and it is not. Well maybe there exists a glitch.
Since you had previously specified list. agg as a list, you need to subset it with double square brackets. Try this one out:
list.agg<-vector("list",ncol(FREQBYPRINC.AG)-2)
for (i in 1:14){
list.agg[[i]]<-aggregate(FREQBYPRINC.AG[,i+1],by=list
(Type=principle),FUN=sum,na.rm=TRUE)}

Declaration of mass variables in column headings in R

I cannot figure out how to assign the column headers from my imported xlsx sheet as variables. I have several column headers, for example DAY_CHNG and INPUT_CHG. So far, I can only run gls(DAY_CHG~INPUT_CHG) by first assigning the values as variables by X<-mydata$DAY_CHG. Is there some command to get these variables assigned automatically when I import?
I had horrible problems getting the program up and running, by the way, due to firewalls at the firm for which I'm working, wondering if that's causing some of the issue.
Any help is much appreciated. Thanks!
attach(mydata) will allow you to directly use the variable names. However, attach may cause problems, especially with more complex data/analyses (see Do you use attach() or call variables by name or slicing? for a discussion)
An alternative would be to use with, such as with(mydata, gls(DAY_CHG~INPUT_CHG)
I would suggest using the $ in order to use the headers as variables and still be able to use other data sets. All that needs to be done is assign the data to an object such as your mydata and by putting a $ immediately following, you will be able to refer to your headers as variables.
As an example for your case, instead of creating a new object x, simply take what you assigned x to and put it directly into your command.
gls(mydata$DAY_CHG ~ mydata$INPUT_CHG)
when it becomes more complicated with more data sets this will allow you to have access to all of them still while not limiting yourself to the data set you attach()

How to get R to use a certain dataset for multiple commands without usin attach() or appending data="" to every command

So I'm trying to manipulate a simple Qualtrics CSV, and I want to use colSums on certain columns of data, given a certain filter.
For example: within the .csv file called data, I want to get the sum of a few columns, and print them with certain labels (say choice1, choice2 etc). That is easy enough by itself:
firstqn<-data.frame(choice1=data$Q7_2,choice2=data$Q7_3,choice3=data$Q7_4);
secondqn<-data.frame(choice1=data$Q8_6,choice2=data$Q8_7,choice3=data$Q8_8)
print colSums(firstqn); print colSums(secondqn)
The problem comes when I want to repeat the above steps with different filters, - say, only the rows where gender==2.
The only way I know how is to create a new dataset data2 and replace data$ with data2$ in every line of the above code, such as:
data2<-(data[data$Q2==2,])
firstqn<-data.frame(choice1=data2$Q7_2,choice2=data2$Q7_3,choice3=data2$Q7_4);
however i have 6 choices for each of 5 questions and am planning to apply about 5-10 different filters, and I don't relish the thought of copy/pasting data2 and `data3' etc hundreds of times.
So my question is: Is there any way of getting R to reference data by default without using data$ in front of every variable name?
I can probably use attach() to achieve this, but i really don't want to:
data2<-(data[data$Q2==2,])
attach(data2)
firstqn<-data.frame(choice1=Q7_2,choice2=Q7_3,choice3=Q7_4);
detach(data2)
is there a command like attach() that would allow me to avoid using data$ in front of every variable, for a specified amount of code? Then whenever I wanted to create a new filter, I could just copy/paste the same code and change the first command (defining a new dataset).
I guess I'm looking for some command like with(data2, *insert multiple commands here*)
Alternatively, if anyone has a better way to do the above in an entirely different way please enlighten me - i'm not very proficient at R (yet).

Using ifelse statement to condense variables

New to R, taking a very accelerated class with very minimal instruction. So I apologize in advance if this is a rookie question.
The assignment I have is to take a specific column that has 21 levels from a dataframe, and condense them into 4 levels, using an if, or ifelse statement. I've tried what feels like hundreds of combinations, but this is the code that seemed most promising:
> b2$LANDFORM=ifelse(b2$LANDFORM=="af","af_type",
ifelse(b2$LANDFORM=="aflb","af_type",
ifelse(b2$LANDFORM=="afub","af_type",
ifelse(b2$LANDFORD=="afwb","af_type",
ifelse(b2$LANDFORM=="afws","af_type",
ifelse(b2$LANDFORM=="bfr","bf_type",
ifelse(b2$LANDFORM=="bfrlb","bf_type",
ifelse(b2$LANDFORM=="bfrwb","bf_type",
ifelse(b2$LANDFORM=="bfrwbws","bf_type",
ifelse(b2$LANDFORM=="bfrws","bf_type",
ifelse(b2$LANDFORM=="lb","lb_type",
ifelse(bs$LANDFORM=="lbaf","lb_type",
ifelse(b2$LANDFORM=="lbub","lb_type",
ifelse(b2$LANDFORM=="lbwb","lb_type","ws_type"))))))))))))))
LANDFORM is a factor, but I tried changing it to a character too, and the code still didn't work.
"ws_type" is the catch all for the remaining variables.
the code runs without errors, but when I check it, all I get is:
> unique(b2$LANDFORM)
[1] NA "af_type"
Am I even on the right path? Any suggestions? Should I bite the bullet and make a new column with substr()? Thanks in advance.
If your new levels are just the first two letters of the old ones followed by _type you can easily achieve what you want through:
#prototype of your column
mycol<-factor(sample(c("aflb","afub","afwb","afws","bfrlb","bfrwb","bfrws","lb","lbwb","lbws","wslb","wsub"), replace=TRUE, size=100))
as.factor(paste(sep="",substr(mycol,1,2),"_type"))
After a great deal of experimenting, I consulted a co-worker, and he was able to simplify a huge amount of this. Basically, I should have made a new column composed of the first two letters of the variables in LANDFORM, and then sample from that new column and replace values in LANDFORM, in order to make the ifelse() statement much shorter. The code is:
> b2$index=as.factor(substring(b2$LANDFORM,1,2))
b2$LANDFORM=ifelse(b2$index=="af","af_type",
ifelse(b2$index=="bf","bf_type",
ifelse(b2$index=="lb","lb_type",
ifelse(b2$index=="wb","wb_type",
ifelse(b2$index=="ws","ws_type","ub_type")))))
b2$LANDFORM=as.factor(b2$LANDFORM)
Thanks to everyone who gave me some guidance!

Resources