Convert columns into multiple rows per entry in R [duplicate] - r

This question already has answers here:
Convert columns to rows keeping the name of the column
(2 answers)
Closed 9 years ago.
I have the following data:
word Jan-2013 Feb-2013 Mar-2013
A 1 2 3
B 5 2 4
I want to convert the multiple date columns into one, named date and add an additional column for the value.
word date value
A Jan-2013 1
A Feb-2013 2
A Mar-2013 3
B Jan-2013 5
B Feb-2013 2
B Mar-2013 4
Can anyone assist?
Thanks

Additional R options
In addition to Metrics's answer, here are two additional options for R (assuming your data.frame is called "mydf"):
cbind(mydf[1], stack(mydf[-1]))
library(reshape)
melt(mydf, id.vars="word")
Excel option
I am not an Excel user, but since this question is tagged "Excel" as well, I would suggest the Tableau Reshaper Excel add-on.
For your example, it's pretty straightforward:
Go to the "Tableau" menu after installing the add-on and activating it.
Select the cells which contain the values you want to unstack. Click on OK.
View the result.

Using reshape from base R (df1 is your dataframe)
reshape(df1,times=names(df1)[-1],timevar="date",varying=names(df1)[-1],v.names="value",new.row.names=1:6,ids=NULL,direction="long")
word date value
1 A Jan.2013 1
2 B Jan.2013 5
3 A Feb.2013 2
4 B Feb.2013 2
5 A Mar.2013 3
6 B Mar.2013 4

Related

how to convert large column using factor [duplicate]

This question already has answers here:
Convert data.frame column format from character to factor
(8 answers)
Closed 3 years ago.
Im writing a machine learning code for my dataset having hotels column.The hotel column contains 300 hotels name.For data preprocessing,I saw we have to use factor.Is there any easy way to covert it as there are so many values for level?
It's simple, use the as.factor() function to convert the column form character to factor.
Here's a sample
# Sample data
data
a b
1 A 1
2 B 2
3 C 3
4 A 4
5 B 5
class(data$a)
[1] "character"
# Converting to factor
data$a <- as.factor(data$a)
# Results
class(data$a)
[1] "factor"
summary(data$a)
A B C
2 2 1
if you are using read.csv option to load the csv data into a dataframe, then column having string values are by default loaded as a factor column.
Anyway you can use factor() function to convert a column to factor:
df$a <- factor(df$a).

Doing something similar to melt to an R dataframe [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
I've got a dataframe like this:
The first column is numeric, and the second column is a comma separated list (character)
id numbers
1 2,4,5
2 1,4,6
3 NA
4 NA
5 5,1,2
And I want to in essence "melt" the dataframe similar to the reshape package. So that the output is a dataframe which looks like this
id numbers
1 2
1 4
1 5
2 1
2 4
2 6
3 NA
4 NA
5 5
5 1
5 2
Except in the reshape2 package each number will have to be each in a column... which takes up too much storage space if there are many numbers... which is why I have opted to set the list of numbers as a comma separated list. But melt no longer works with this setup.
Can you recommend the most efficient way to achieve the transformation from the input dataframe to output dataframe?
The way I would do it for each row, create a data.frame and store them in a list, where df is your initial data.frame.
l = list()
for (j in 1:nrow(df)){
l[[j]] = data.frame(id = df$id[[j]],
numbers = split(df$numbers[[j]], ','))
}
Afterwards, you can stack all list elements into a single data.frame using plyr::ldply with the 'data.frame' option.

Replacing values in data frame column using another data frame [duplicate]

This question already has answers here:
How to do vlookup and fill down (like in Excel) in R?
(9 answers)
Closed 7 years ago.
I have a table of pending bills in the Scottish Parliament. One of the columns (BillTypeID) is populated with numbers that indicate what type of bill each one is (there are seven different types of bills).
I have another table that describes which number corresponds to which bill types ( 1 = "Executive", 2 = "Member's", etc.)
I want to replace the number in my main table with the corresponding string that describes the type for each bill.
Data:
bills <- jsonlite::fromJSON(url("https://data.parliament.scot/api/bills"))
bill_stages <- jsonlite::fromJSON(url("https://data.parliament.scot/api/billstages"))
This is probably a duplicate but I can't find the corresponding answer ...
The easiest way to do this is with merge().
d1 <- data.frame(billtype=c(1,1,3,3),
bill=c("first","second","third","fourth"))
d2 <- data.frame(billtype=c(1,2,3),
billtypename=c("foo","bar","bletch"))
d3 <- merge(d1,d2)
##
## billtype bill billtypename
## 1 1 first foo
## 2 1 second foo
## 3 3 third bletch
## 4 3 fourth bletch
... then drop the billtype column if you don't want it any more. You can probably do it slightly more efficiently with match() (see my answer to the linked question).

filter R data frame with one column - keep data frame format [duplicate]

This question already has an answer here:
Filtering single-column data frames
(1 answer)
Closed 7 years ago.
I am looking for a simple way to display a subset of a one column data frame
Let's assume, I have a a data frame:
> df <- data.frame(a = 1:100)
Now, I only need the first 10 rows. If I subset it by index, I'll get a result vector instead of a data frame:
> df[1:10,]
[1] 1 2 3 4 5 6 7 8 9 10
I tried to use 'subset' but not using the 'subset'-parameter will result in an error (only for one-column-data-frames?):
subset(df[1:10,])
Error in subset.default(df[1:10, ]) :
argument "subset" is missing, with no default
There should be a very easy solution to achive a subset (still a data frame) filtered by row index, no?
I am lookung for a solution with basic R commands (it should not depend on any special library)
you can use drop=FALSE, which prevent from droping the dimensions of the array.
df[1:10, , drop=FALSE]
a
1 1
2 2
3 3
4 4
5 5
...
For subset you need to add a condition.

R, convert multiple text filenames to a column name [duplicate]

This question already has answers here:
Read a list of files with R, each file contains a list of float numbers. what's the proper way to do it?
(2 answers)
Closed 9 years ago.
I've got many text files with named by year i.e. yob1940.txt,yob1941.txt. Each file has 3 colums of data. I'm trying to import the data into R in a single data table, and add the year for each file in a 4th column.
Any help would be much appreciated.
Thanks
Smth like this will work:
rbindlist(lapply(list.files(pattern = "yob[0-9]+\\.txt"),
function(x) data.table(year = sub('.*?([0-9]+).*', '\\1', x),
fread(x)))))
Assuming you have read these files as x1 and x2
df.list<-list(x1,x2)
kk<-do.call(rbind,df.list)
year<-data.frame(rep(c(1940,1941),c(nrow(x1),nrow(x2))))
names(year)<-"year"
mydata<-data.frame(cbind(kk,year))
A sample example:
x1<-data.frame(x=c(1,3),y=c(2,3))
x2<-data.frame(x=c(3,3),y=c(2,2))
df.list<-list(x1,x2)
kk<-do.call(rbind,df.list)
year<-data.frame(rep(c(1940,1941),c(nrow(x1),nrow(x2))))
names(year)<-"year"
mydata<-data.frame(cbind(kk,year))
mydata
x y year
1 1 2 1940
2 3 3 1940
3 3 2 1941
4 3 2 1941

Resources