R - Change value names in data.frame but keeping a distinguishing mark - r

i have the following data.frame:
> goals.names
id name
1 1 Registro NL Widget
2 2 Fidelizado
3 3 Entusiasmado
4 4 Registro Newsletter
How can I change every id value to look like goal1, goal2, goal3, goal4, goalX? I suppose, that I have to first get the id, save it to a variable, and use it again to substitute it with the new key. Something for what "for each" would really be helpfull, but I have not found a substitute for "for each" in R.

You could try mutate() from the dplyr package, like this:
library(dplyr)
goals.names <- mutate(goals.names, id = paste("goal",id,sep=""))

Related

Static variable next to a dynamic variable in R

I posted yesterday another question but I feel I need to clarify it.
Let's say I have this code
md.NAME <- (subset(MyData, HotelName=="ALAMEDA"))
md.NAME.fc <- (subset(md.ALAMEDA, TIPO=="FORECAST"))
md.NAME.fc.bar <- (subset(md.ALAMEDA.fc, Market.Segment=="BAR"))
What I want is that NAME changes according to a variable set before those 3 lines are run,
So NAME is just dynamic in the sense that before these 3 lines I could say, ok, NAME now is equal to JOHN, but then, I could say that NAME is now equal to PATRIC.
So after running those 3 lines, twice (once for John and once for Patric) somehow in the environment I will get something like this:
6 dataframes, 3 for JOHN and 3 for PATRIC
DATAFRAME 1 WILL BE md.JOHN
DATAFRAME 2 WILL BE md.JOHN.fc
DATAFRAME 3 WILL BE md.JOHN.fc.bar
DATAFRAME 1 WILL BE md.PATRIC
DATAFRAME 2 WILL BE md.PATRIC.fc
DATAFRAME 3 WILL BE md.PATRIC.fc.bar
All the answers I had so far would help me only if "md" and "fc" or "fc.bar" are always the same. But I will have several variables like this, which will change a lot as far as the naming goes. So, it is the center part (NAME) the only one that should change.
I could even have something like:
md.test$NAME <- ...

How to deal with non-consecutive (non-daily) dates in R, while looping?

I am trying to write a script that loops through month-end dates and compares associated fields, but I am unable to find a way to way to do this.
I have my data in a flatfile and subset based on 'TheDate'
For instance I have:
date.range <- subset(raw.data, observation_date == theDate)
Say TheDate = 2007-01-31
I want to find the next month included in my data flatfile which is 2007-02-28. How can I reference this in my loop?
I currently have:
date.range.t1 <- subset(raw.data, observation_date == theDate+1)
This doesnt work obviously as my data is not daily.
EDIT:
To make it more clear, my data is like below
ticker observation_date Price
ADB 31/01/2007 1
ALS 31/01/2007 2
ALZ 31/01/2007 3
ADB 28/02/2007 2
ALS 28/02/2007 5
ALZ 28/02/2007 1
I am using a loop so I want to skip from 31/01/2007 to 29/02/2007 by recognising it is the next date, and use that value to subset my data
First get unique values of date like so:
unique_dates<-unique(raw.data$observation_date)
The sort these unique dates:
unique_dates_ordered<-unique_dates[order(as.Date(unique_dates, format="%Y-%m-%d"))]
Now you can subset based on the index of unique_dates_ordered i.e.
subset(raw.data, raw.data$observation_date == unique_dates_ordered[i])
Where i = 1 for the first value, i = 2 for the second value etc.

R Refer to (part of) data frame using string in R

I have a large data set in which I have to search for specific codes depending on what i want. For example, chemotherapy is coded by ~40 codes, that can appear in any of 40 columns called (diag1, diag2, etc).
I am in the process of writing a function that produces plots depending on what I want to show. I thought it would be good to specify what I want to plot in a input data frame. Thus, for example, in case I only want to plot chemotherapy events for patients, I would have a data frame like this:
Dataframe name: Style
Name SearchIn codes PlotAs PlotColour
Chemo data[substr(names(data),1,4)=="diag"] 1,2,3,4,5,6 | red
I already have a function that searches for codes in specific parts of the data frame and flags the events of interest. What i cannot do, and need your help with, is referring to a data frame (Style$SearchIn[1]) using codes in a data frame as above.
> Style$SearchIn[1]
[1] data[substr(names(data),1,4)=="diag"]
Levels: data[substr(names(data),1,4)=="diag"]
I thought perhaps get() would work, but I cant get it to work:
> get(Style$SearchIn[1])
Error in get(vars$SearchIn[1]) : invalid first argument
enter code here
or
> get(as.character(Style$SearchIn[1]))
Error in get(as.character(Style$SearchIn[1])) :
object 'data[substr(names(data),1,5)=="TDIAG"]' not found
Obviously, running data[substr(names(data),1,5)=="TDIAG"] works.
Example:
library(survival)
ex <- data.frame(SearchIn="lung[substr(names(lung),1,2) == 'ph']")
lung[substr(names(lung),1,2) == 'ph'] #works
get(ex$SearchIn[1]) # does not work
It is not a good idea to store R code in strings and then try to eval them when needed; there are nearly always better solutions for dynamic logic, such as lambdas.
I would recommend using a list to store the plot specification, rather than a data.frame. This would allow you to include a function as one of the list's components which could take the input data and return a subset of it for plotting.
For example:
library(survival);
plotFromSpec <- function(data,spec) {
filteredData <- spec$filter(data);
## ... draw a plot from filteredData and other stuff in spec ...
};
spec <- list(
Name='Chemo',
filter=function(data) data[,substr(names(data),1,2)=='ph'],
Codes=c(1,2,3,4,5,6),
PlotAs='|',
PlotColour='red'
);
plotFromSpec(lung,spec);
If you want to store multiple specifications, you could create a list of lists.
Have you tried using quote()
I'm not entirely sure what you want but maybe you could store the things you're trying to get() like
quote(data[substr(names(data),1,4)=="diag"])
and then use eval()
eval(quote(data[substr(names(data),1,4)=="diag"]), list(data=data))
For example,
dat <- data.frame("diag1"=1:10, "diag2"=1:10, "other"=1:10)
Style <- list(SearchIn=c(quote(data[substr(names(data),1,4)=="diag"]), quote("Other stuff")))
> head(eval(Style$SearchIn[[1]], list(data=dat)))
diag1 diag2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6

In R, how do I select a single value from one column, based upon a value in a second column?

thank you for the help. I am attempting to write an equation that uses values selected from an .csv file. It looks something like this, let's call it df.
df<-read.csv("SiteTS.csv", header=TRUE,sep=",")
df
Site TS
1 H4A1 -42.75209
2 H4A2 -43.75101
3 H4A3 -41.75318
4 H4C3 -46.76770
5 N1C1 -42.68940
6 N1C2 -36.95200
7 N1C3 -43.16750
8 N2A2 -38.58040
9 S4C1 -35.32000
10 S4C2 -34.52420
My equation requires the value in the TS column for each site. I am attempting to create a new column called SigmaBS with the results of the equation using TS.
df["SigmaBS"]<-10^(subset(df, Site=="H4A1"/10)
Which is where I am running into issues, as the subset function returns all columns that correlate with the Site column = H4A1
subset(df, Site =="H4A1")
Site TS
1 2411 -42.75209
But again, I only need the value -42.75209.
I apologize if this is a simple question, but I would very much appreciate any help you may be able to offer.
If you insist on using the subset function, it has a select argument:
subset(df, Site=="H4A1", select="TS")
A better option is to use [] notation:
df[df$Site=="H4A1", "TS"]
Or the $ operator:
subset(df, Site=="H4A1")$TS
You can use this simple command:
df$SigmaBS <- 10 ^ (df$TS / 10)
It sounds like you're trying to create a new column called SigmaBS where the values in each row are 10^(value of TS) / 10
If so, this code should work:
SigmaBS <- sapply(df$TS, function(x) 10^(x/10))
df$SigmaBS <- SigmaBS

Simple lookup to insert values in an R data frame

This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:
Case zip market
1 44485 NA
2 44488 NA
3 43210 NA
There are over 3.5 million records.
Then, I have a second data frame, 'zipcodes'.
market zip
1 44485
1 44486
1 44488
... ... (100 zips in market 1)
2 43210
2 43211
... ... (100 zips in market 2, etc.)
I want to find the correct value for alldata$market for each case based on alldata$zip matching the appropriate value in the zipcode data frame. I'm just looking for the right syntax, and assistance is much appreciated, as usual.
Since you don't care about the market column in alldata, you can first strip it off using and merge the columns in alldata and zipcodes based on the zip column using merge:
merge(alldata[, c("Case", "zip")], zipcodes, by="zip")
The by parameter specifies the key criteria, so if you have a compound key, you could do something like by=c("zip", "otherfield").
Another option that worked for me and is very simple:
alldata$market<-with(zipcodes, market[match(alldata$zip, zip)])
With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:
library(qdapTools)
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])
Or
alldata$zip %l% zipcodes[, 2:1]
Here's the dplyr way of doing it:
library(tidyverse)
alldata %>%
select(-market) %>%
left_join(zipcodes, by="zip")
which, on my machine, is roughly the same performance as lookup.
The syntax of match is a bit clumsy. You might find the lookup package easier to use.
alldata <- data.frame(Case=1:3, zip=c(44485,44488,43210), market=c(NA,NA,NA))
zipcodes <- data.frame(market=c(1,1,1,2,2), zip=c(44485,44486,44488,43210,43211))
alldata$market <- lookup(alldata$zip, zipcodes$zip, zipcodes$market)
alldata
## Case zip market
## 1 1 44485 1
## 2 2 44488 1
## 3 3 43210 2

Resources