Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a dataset with monthly returns of various stocks over a certain period, where the months are already encoded as consecutively numbered month IDs. To compare them, I imported a .csv file with the one-month interest rate for each month in that period and saved it as a vector. Now I want to add this vector to my dataset. The problem is the difference in length.
My question is: how can I extend this vector to the length of my data by duplicating the elements such that every rate is correctly assigned to the corresponding month?
Say the stock dataset is called stocks, with a variable called months, and the interest vector is called interest. I assume they are both in the same order and cover the same months.
Then add the months to the interest rates with int_dat <- data.frame(months = unique(stocks$months), interest = interest). Add the interest rates to the stocks data with stocks_new <- merge(stocks, int_dat, by = 'months', all = TRUE).
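As a minimal sketch of the merge described above, here is a toy version with made-up returns and rates. The column names stock and ret and all values are assumptions for illustration; only stocks, months and interest come from the answer. It also assumes interest holds one rate per month in ascending month order.

```r
# Toy data: three stocks observed over three months (hypothetical values).
stocks <- data.frame(
  months = rep(1:3, times = 3),
  stock  = rep(c("A", "B", "C"), each = 3),
  ret    = c(0.01, 0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.01, 0.02)
)

# One interest rate per month, assumed to be in ascending month order.
interest <- c(0.001, 0.002, 0.003)

# Attach the month ID to each rate, then join on the shared key column.
int_dat <- data.frame(months = sort(unique(stocks$months)), interest = interest)
stocks_new <- merge(stocks, int_dat, by = "months", all.x = TRUE)

# Alternative without merge(): look up each row's rate by its month ID.
stocks$interest <- interest[match(stocks$months, int_dat$months)]
```

The match() variant keeps the original row order of stocks, whereas merge() sorts the result by the key column by default.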
Related
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed yesterday.
I have a dataset from which I want to randomly sample up to 6 rows for each animal ID per day. Not every animal ID was sampled each day, and some animals were sampled fewer than 6 times per day. For example, if an animal was only sampled 2 times on a given day, I want to retain both of those samples. I want to retain all the info in each sampled row.
In the image:
New_ID is the animal ID
New_Date_Hour is the hour of the day (this is the variable that I want to randomly sample 6 times per day)
There are 13 different animal IDs
[Image of Example data]
I have tried a few different approaches but without success.
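One possible base R approach, sketched under assumptions: the toy data below is invented, New_Date_Hour is assumed to be a POSIXct timestamp, and the helper column day is hypothetical. The idea is to split the row indices by animal and day, then keep all rows of a group when it has 6 or fewer and sample 6 otherwise.

```r
set.seed(1)  # only to make the random sample reproducible

# Toy data (hypothetical): animal a1 has 10 records on one day, a2 has 2.
dat <- data.frame(
  New_ID = rep(c("a1", "a2"), times = c(10, 2)),
  New_Date_Hour = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") +
    3600 * c(0:9, 3, 7)
)

# Derive the calendar day and build one grouping key per animal and day.
dat$day <- format(dat$New_Date_Hour, "%Y-%m-%d")
key <- paste(dat$New_ID, dat$day)

# For each group, keep every row if there are at most 6, else sample 6.
groups <- split(seq_len(nrow(dat)), key)
keep <- unlist(lapply(groups, function(idx) {
  if (length(idx) <= 6) idx else sample(idx, 6)
}), use.names = FALSE)

# Index the original data frame so all columns of the sampled rows survive.
sampled <- dat[sort(keep), ]
```

Sampling row indices rather than values keeps every column of each selected row, and calling sample() only on groups with more than 6 rows avoids the surprise that sample(n, ...) treats a single number as the range 1:n.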
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
So I have data on CpG sites, and a column which defines their chromosomal position (e.g. 10000).
How would I change these values so that I get a range that depends on the original value? For example, 10000 would be +/- 500 (9500 - 10500).
I'm going to be using the same parameters for each variable regardless of its value.
I have tried
df$upstream <- df$value - 500
df$downstream <- df$value + 500
Which returns the upper and lower values I need, but how do I get this 'range' into a single column (e.g. such that I can search for it in genomebrowser)?
I have worked with this kind of dataset. To do this, I create new columns in my dataset (as mentioned in the comment):
df$upstream = df$position - 500
df$downstream = df$position + 500
Hope it helped
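To get the range into a single column, one option is to format the two bounds into a string. This is only a sketch: the chr column and the chr:start-end layout are assumptions about what the genome browser expects, and the positions are made up.

```r
# Toy data (hypothetical): chromosome name and CpG position.
df <- data.frame(chr = c("chr1", "chr2"), position = c(10000, 25000))

df$upstream   <- df$position - 500
df$downstream <- df$position + 500

# "%d" prints whole numbers in full, so large coordinates are never
# written in scientific notation.
df$range <- sprintf("%d-%d", as.integer(df$upstream), as.integer(df$downstream))

# Assumed browser-style region string: chromosome, colon, start-end.
df$region <- sprintf("%s:%s", df$chr, df$range)
```

sprintf() is used instead of paste0() because converting a large round number to text can yield scientific notation, which a search box is unlikely to accept.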
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have a huge dataframe that contains a column of gene IDs. Each gene ID appears in the column a different number of times.
I want to extract from the dataframe a column that lists every gene ID once, and at the same time I want to keep the data as a dataframe rather than converting it to a list of factors.
example:
GeneID
589034
489034
589034
589034
48999
99449
99449
And I want my output to be:
GeneID
589034
489034
48999
99449
You can use the unique function for this:
dat = c('GeneID', '589034', '489034', '589034', '589034', '48999', '99449', '99449')
unique(dat)
[1] "GeneID" "589034" "489034" "48999" "99449"
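Since the question asks for the result to stay a data frame, here is a small sketch of the same idea applied to a data frame column. The score column and its values are hypothetical and only there to show what happens to the other columns.

```r
# Toy data frame; GeneID values are taken from the question,
# score is a hypothetical extra column.
dat <- data.frame(
  GeneID = c(589034, 489034, 589034, 589034, 48999, 99449, 99449),
  score  = c(1.2, 0.4, 1.2, 3.1, 0.7, 2.2, 2.5)
)

# Single-bracket indexing keeps the result a one-column data frame.
gene_ids <- unique(dat["GeneID"])

# Keep the first full row for each GeneID, including the other columns.
first_rows <- dat[!duplicated(dat$GeneID), ]
```

Using dat["GeneID"] rather than dat$GeneID is what preserves the data frame structure; dat$GeneID would return a plain vector.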
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I currently have multiple data frames (named cont, cont2, ..., cont7) and need to combine them.
Each data frame has 2 columns: date and a mean temperature value (taken from a netCDF file).
The dates are monthly values:
cont = 1951-1 to 1960-12
cont7 = 2011-1 to 2014-12
(basically monthly values split into groups of 10 years, from Jan 1951- Dec 2014)
How can I extend my data frame so all values are in 1 table? I want to make it continuous so as to plot a time series.
Perhaps I do not understand your problem correctly, but would rbind() not do the job?
cont_all <- rbind(cont, cont2, cont3, cont4, cont5, cont6, cont7)
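A small sketch of the rbind() approach on two toy data frames, followed by a date conversion for plotting. The column names date and temp, the temperature values, and the "YYYY-M" text format of the dates are assumptions based on the question's description.

```r
# Two toy decades (hypothetical values); dates are stored as "YYYY-M" text.
cont  <- data.frame(date = c("1951-1", "1951-2"), temp = c(10.1, 10.5))
cont2 <- data.frame(date = c("1961-1", "1961-2"), temp = c(10.8, 11.0))

# Stack any number of data frames that share the same columns.
cont_all <- do.call(rbind, list(cont, cont2))

# Turn the month labels into real dates by assuming the first of the month.
cont_all$date <- as.Date(paste0(cont_all$date, "-1"), format = "%Y-%m-%d")
cont_all <- cont_all[order(cont_all$date), ]

# A line plot of the combined series could then be drawn with:
# plot(cont_all$date, cont_all$temp, type = "l")
```

do.call(rbind, list(...)) behaves like calling rbind() on each data frame in turn, and is convenient when the data frames are already collected in a list.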
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a data frame consisting of tick data of options for the day. I want to find the max trade for each ticker for every minute. I am using a for loop, but as I have 3000+ tickers and 9600000+ trades, it's turning out to be really slow.
Is there any way to fetch row values from the data frame for each ticker using a map/dictionary/hash table?
Again, the objective is:
max trade in every minute for every ticker for that day
optionData --->ticker--->Minute data---> max trade in that minute
Given:
optionData as data frame with columns like date,time,tickerSymbols,TradeVolume,Delta,iVol
Make sure you install and load the necessary packages. If they're already installed, don't worry about running the install.packages line. It's commented out in this example.
#install.packages(c('dplyr', 'lubridate'))
library(dplyr)
library(lubridate)
This should do the trick if your time variable is more granular than a minute (for example, if it includes seconds).
optionData %>%
  mutate(minute = minute(time),
         hour = hour(time)) %>%
  group_by(tickerSymbols, date, hour, minute) %>%
  filter(TradeVolume == max(TradeVolume))
This should do the trick if your time variable is already representative of minute.
optionData %>%
  group_by(tickerSymbols, date, time) %>%
  filter(TradeVolume == max(TradeVolume))
Both snippets assume that TradeVolume is the column holding the max trade that you're looking for.
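For comparison, the same grouping can be sketched in base R without extra packages, using ave() to compute each group's maximum. This is an alternative to the dplyr code above, not a replacement for it. The toy rows are invented, and the sketch assumes time is stored as "HH:MM:SS" text, so the first five characters identify the minute.

```r
# Toy tick data (hypothetical values); column names follow the question.
optionData <- data.frame(
  date          = "2020-01-02",
  time          = c("09:30:05", "09:30:40", "09:31:10", "09:30:20", "09:30:55"),
  tickerSymbols = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  TradeVolume   = c(10, 25, 5, 7, 3)
)

# Assumed "HH:MM:SS" format: keep "HH:MM" as the minute bucket.
optionData$minute <- substr(optionData$time, 1, 5)

# One key per ticker, day and minute; ave() returns the group max per row.
key <- paste(optionData$tickerSymbols, optionData$date, optionData$minute)
group_max <- ave(optionData$TradeVolume, key, FUN = max)

# Keep the rows whose volume equals their group's maximum (ties are kept).
max_trades <- optionData[optionData$TradeVolume == group_max, ]
```

Like the filter() version, this keeps every tied row when several trades in the same minute share the maximum volume.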