Merge 2 columns into one in dataframe [closed] - r

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
This should be simple, but I am struggling with it.
I want to combine two columns of a single data frame into one. I have separate columns for customer ID (20227) and year (2009). I want to create a new column that contains both (2009_20227).

You could use paste
transform(dat, newcol=paste(year, customerID, sep="_"))
Or use interaction
dat$newcol <- as.character(interaction(dat,sep="_"))
data
dat <- data.frame(year=2009:2013, customerID=20227:20231)
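One caveat about the interaction() call above: it combines every column of dat, so on a data frame with additional columns it may be safer to pass only the two columns of interest. A minimal sketch, assuming the example data from this answer plus a hypothetical extra column named region that is not in the original question:

```r
# Example data from the answer, plus a hypothetical extra column `region`
dat <- data.frame(year = 2009:2013, customerID = 20227:20231,
                  region = c("N", "S", "E", "W", "N"))

# Restrict interaction() to the two columns that should be combined
dat$newcol <- as.character(interaction(dat[c("year", "customerID")], sep = "_"))

# paste() on the same two columns should produce the same strings
dat$newcol2 <- paste(dat$year, dat$customerID, sep = "_")
identical(dat$newcol, dat$newcol2)
```

Subsetting with dat[c("year", "customerID")] keeps region out of the combined key.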

Some alternative way with function unite in tidyr:
library(tidyr)
df = data.frame(year=2009:2013, customerID=20227:20231) # using akrun's data
unite(df, newcol, c(year, customerID), remove=FALSE)
#      newcol year customerID
#1 2009_20227 2009      20227
#2 2010_20228 2010      20228
#3 2011_20229 2011      20229
#4 2012_20230 2012      20230
#5 2013_20231 2013      20231

Another alternative (using @akrun's example):
dat <- data.frame(year=2009:2013, customerID=20227:20231)
dat$newcol <- paste(dat$year, dat$customerID, sep="_")


New to R, I'm struggling to change the reference date of one of my variables to that of another date. Any help would be much appreciated [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
New to R, I'm struggling to change the reference date of one of my variables to that of another date. Any help would be much appreciated.
The current reference year is 2002 and I need to change it to 2015. All the values in this column also need to be updated in accordance with the new reference year. Thanks.
tsunami_data
colnames(tsunami_data)[1] <- "years_before"
Currently the years before are relative to 2002 and I need to update them to be relative to 2015, so that the values become larger because of the shift in the reference year. Also, I am working with a large data set at present. Thanks.
e.g.
years_before
7
56
87
45
It is hard to understand what you want. Here I created a new variable and added the difference between 2002 and 2015.
library(dplyr)
df <- tibble(years_before_2002 = c(7,56,87,45)) # tibble() replaces the deprecated data_frame()
df_new <- df %>% mutate(years_before_2015 = years_before_2002 + (2015 - 2002))
#for your example
tsunami_data <- tsunami_data %>% mutate(years_before_2015 = years_before + (2015 - 2002))
#if you would like to keep "years_before", but just adjust for 13 years, run this
tsunami_data <- tsunami_data %>% mutate(years_before = years_before + (2015 - 2002))
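The same shift can be checked by hand in base R. This is only a sketch of the arithmetic on the four example values from the question; the vector names are made up for illustration:

```r
# Example values from the question, measured relative to 2002
years_before_2002 <- c(7, 56, 87, 45)

# Moving the reference year from 2002 to 2015 adds 13 years to every value
shift <- 2015 - 2002
years_before_2015 <- years_before_2002 + shift

years_before_2015
# [1]  20  69 100  58
```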

Selecting column in dataframe returns NULL [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 1 year ago.
I am trying to access a column in my data frame using the dataframe$column format, but it returns NULL. What am I doing wrong? Please help.
As you can see from the output, you don't have a column called Ozone; the column, and the only one, you have is called V1. You will have to split the data in V1 into columns. This can be done using tidyr's separate, like so:
Data:
df <- data.frame(
V1 = c("Ozone,Solar.R,Wind,Temp,Month,Day",
"41,190,7.4,67,5,1")
)
First, get your column names:
col_names <- unlist(strsplit(as.character(df$V1[1]), ",")) # as.character() guards against V1 being a factor
The column names are now stored in a vector:
col_names
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
Now transform df:
library(dplyr)
library(tidyr)
df %>%
  # first rename the col to be transformed:
  rename("Ozone,Solar.R,Wind,Temp,Month,Day" = V1) %>%
  # remove the first row, which is now redundant:
  slice(2:nrow(.)) %>%
  # separate into columns using the `col_names`:
  separate(1, into = col_names, sep = ",")
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
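If the single V1 column came from reading a comma-separated file with the wrong separator, another option is to re-parse the text as CSV in base R. A sketch under that assumption, reusing the example data from this answer (the name parsed is hypothetical):

```r
# Example data from the answer: one column holding comma-separated text
df <- data.frame(
  V1 = c("Ozone,Solar.R,Wind,Temp,Month,Day",
         "41,190,7.4,67,5,1"),
  stringsAsFactors = FALSE
)

# Glue the rows back into one CSV string and let read.csv() parse it;
# the first line becomes the header and the columns get numeric types
parsed <- read.csv(text = paste(df$V1, collapse = "\n"))

parsed$Ozone
```

Unlike separate(), which leaves every column as character, read.csv() also converts the values to numbers.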

R programming- find lowest value [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I've just started learning R. I want to know how to find the lowest value in one column for each unique value in another column. For example, in this case I want to know the lowest average price per year.
I have a data frame with about 7 columns, 2 of them being average price and year. The year values repeat and range from 2000 to 2009. The data also has NAs in various columns.
I have very little idea how to write a loop or anything similar for this.
Thank you :)
my data set looks something like this:
avgprice year
332 2002
NA 2009
5353 2004
1234 NA and so on.
To break down my problem: I want to find the five lowest values for each year from 2000 to 2004.
s<-subset(tx.house.sales,na.rm=TRUE,select=c(avgprice,year)
s2<-subset(s,year==2000)
s3<-arrange(s2)
tail(s2,5)
I know the code fails miserably. I wanted to first subset my data frame to the year and avgprice columns, then sort it for each year from 2000 to 2004, and finally use tail() to print the lowest five. I also want to ignore the NAs.
You could try
aggregate(averageprice~year, df1, FUN=min)
Update
If you need to get 5 lowest "averageprice" per "year"
library(dplyr)
df1 %>%
  group_by(year) %>%
  arrange(averageprice) %>%
  slice(1:5)
Or you could use rank in place of arrange
df1 %>%
  group_by(year) %>%
  filter(rank(averageprice, ties.method='min') %in% 1:5)
This could be also done with aggregate, but the 2nd column will be a list
aggregate(averageprice~year, df1, FUN=function(x)
  head(sort(x),5), na.action=na.pass)
data
set.seed(24)
df1 <- data.frame(year=sample(2002:2008, 50, replace=TRUE),
averageprice=sample(c(NA, 80:160), 50, replace=TRUE))
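For completeness, the five lowest prices per year can also be picked out in base R without dplyr. This is only a sketch, reusing the df1 example data from this answer; the names ok, rank_in_year and lowest5 are made up for illustration:

```r
# Example data from the answer
set.seed(24)
df1 <- data.frame(year = sample(2002:2008, 50, replace = TRUE),
                  averageprice = sample(c(NA, 80:160), 50, replace = TRUE))

# Drop rows with a missing price, then sort by year and price
ok <- df1[!is.na(df1$averageprice), ]
ok <- ok[order(ok$year, ok$averageprice), ]

# ave() with seq_along numbers the rows 1, 2, 3, ... within each year
rank_in_year <- ave(ok$averageprice, ok$year, FUN = seq_along)

# Keep at most the first five (that is, lowest) rows of every year
lowest5 <- ok[rank_in_year <= 5, ]
lowest5
```

Years with fewer than five non-missing prices simply contribute fewer rows.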

How to separate data based on different variable values [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a dataset of around 1.5 lakh (150,000) observations and 2 variables: name and amount. The same name can appear many times; for example, the name ABC can appear 50 times in the dataset.
I want a new data frame with two variables: name and total amount, where each name appears only once and total amount is the sum of all amounts for that name in the original dataset. For example, if ABC appears three times with amount == 1, 2 and 3 respectively in the original dataset, then in the new dataset ABC will appear only once with total amount == 6.
You can use data.table for big datasets:
library(data.table)
res<- setDT(df)[, list(Total_Amount=sum(amount)), by=name]
Or use dplyr
library(dplyr)
df %>%
  group_by(name) %>%
  summarise(Total_Amount=sum(amount))
Or as suggested by @hrbrmstr,
count(df, name, wt=amount)
data
set.seed(24)
df <- data.frame(name=sample(LETTERS[1:5], 25, replace=TRUE),
amount=sample(150,25, replace=TRUE))
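The same per-name total can also be computed in base R with aggregate(). A minimal sketch that reproduces the ABC example from the question; the XYZ rows and their amounts are invented purely for illustration:

```r
# ABC appears three times with amounts 1, 2 and 3, as in the question;
# the XYZ rows are hypothetical filler
df <- data.frame(name = c("ABC", "XYZ", "ABC", "ABC", "XYZ"),
                 amount = c(1, 10, 2, 3, 20))

# Sum amount within each name; the result has one row per unique name
totals <- aggregate(amount ~ name, data = df, FUN = sum)
names(totals)[2] <- "Total_Amount"

# ABC's total is 1 + 2 + 3 = 6
totals$Total_Amount[totals$name == "ABC"]
```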

Differences between two data frames in R [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have two data frames, each with 9 columns, and DF2 is a subset of DF1. I'm trying to create a third data frame that contains only the contents of DF1 that are NOT present in DF2.
What is the most efficient way of doing this? I can write a while loop, but I was wondering if there is another way (besides sqldf as for some reason I cannot upload it into my R Studio) that I can do this?
The following can work (taken directly from "Identify records in data frame A not contained in data frame B"):
fun.12 <- function(x.1,x.2,...){
  x.1p <- do.call("paste", x.1)
  x.2p <- do.call("paste", x.2)
  x.1[! x.1p %in% x.2p, ]
}
DF1 <- data.frame(a=c(1,2,3,4,5), b=c(1,2,3,4,5))
DF2 <- data.frame(a=c(1,1,2,3,4), b=c(1,1,99,3,4))
fun.12(DF1, DF2)
#   a b
# 2 2 2
# 5 5 5
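If dplyr is available, anti_join() expresses the same idea without pasting the columns together. This is a sketch assuming both data frames share the same column names, using the DF1 and DF2 examples from this answer; only_in_DF1 is a made-up name:

```r
library(dplyr)

# Example data from the answer
DF1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 2, 3, 4, 5))
DF2 <- data.frame(a = c(1, 1, 2, 3, 4), b = c(1, 1, 99, 3, 4))

# Keep the rows of DF1 that have no matching (a, b) pair in DF2
only_in_DF1 <- anti_join(DF1, DF2, by = c("a", "b"))
only_in_DF1
```

Matching on the columns directly avoids the rare case where two different rows paste to the same string.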
