extract command from data.frame to create same df

Is there any package or command in R that reads a data.frame and produces a command that recreates exactly the same data.frame without loading any data, i.e., with all of the data stored in the command itself?
E.g., given a data.frame like this:
mydata <- data.frame(col1=c(1,2),col2=c(3,4))
I want to get a command such that reading "mydata" yields the expression on the right-hand side.
BR
Fabian

The dput function "Writes an ASCII text representation of an R object to a file or connection" and is as close to the right hand side as you would get. It actually contains more details about the structure of the object, as is seen below:
> dput(mydata)
structure(list(col1 = c(1, 2), col2 = c(3, 4)), .Names = c("col1",
"col2"), row.names = c(NA, -2L), class = "data.frame")

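You could also use enquote, which wraps mydata in a quote() call. The object itself is stored inside that call, so eval returns it unchanged. Note that the printed form omits the data.frame attributes, so unlike the dput output it cannot be pasted back in as a command.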
> ( e <- enquote(mydata) )
# quote(list(col1 = c(1, 2), col2 = c(3, 4)))
> eval(e)
# col1 col2
# 1 1 3
# 2 2 4
> identical(eval(e), mydata)
# [1] TRUE
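
To check that the dput output really recreates the data.frame, here is a minimal round-trip sketch. It assumes the text is captured with capture.output instead of being copied by hand; the names cmd and rebuilt are only illustrative.

```r
mydata <- data.frame(col1 = c(1, 2), col2 = c(3, 4))

# Capture the text that dput() would print and join it into one string
cmd <- paste(capture.output(dput(mydata)), collapse = "\n")

# Parse and evaluate that text to rebuild the object from the command alone
rebuilt <- eval(parse(text = cmd))

# Compare the rebuilt object with the original
identical(rebuilt, mydata)
```

The string in cmd is what you would paste into a script in place of loading the data.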

Related

Pass multiple arguments to function from dataframe with unknown number of columns

I have a function similar to this:
testfun = function(jID,kID,d){
g=paste0(jID,kID)
date = d
bb=data.frame(g,date)
return(bb)
}
Data frame:
x=data.frame(jID = c("a","b"),kID=c("c","d"),date="20170206",stringsAsFactors = FALSE)
I want to pass each row as inputs into the function. The solutions provided here: Passing multiple arguments to a function taken from dataframe are great but in their case, the number of columns was known. How would a solution like this:
vtestfun <- (Vectorize(testfun, SIMPLIFY=FALSE))
vtestfun(x[,1],x[,2],x[,3])
be applied if the number of columns in the dataframe is not known or keeps changing?

If you can match the argument names to the column names like so:
testfun <- function(jID, kID, date){ # 'date', not 'd'
g <- paste0(jID, kID)
bb <- data.frame(g, date)
return(bb)
}
You could do:
purrr::pmap(x, testfun)
Returning:
[[1]]
g date
1 ac 20170206
[[2]]
g date
1 bd 20170206
# Data used:
x <- structure(list(jID = c("a", "b"), kID = c("c", "d"), date = c("20170206", "20170206")), class = "data.frame", row.names = c(NA, -2L))
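
If you would rather avoid the purrr dependency, a base R sketch along the same lines is possible with Map and do.call. This is an alternative I am assuming fits the question, not part of the answer above, and it relies on the same condition: the function's argument names must match the column names.

```r
# testfun with argument names matching the column names of x
testfun <- function(jID, kID, date) {
  data.frame(g = paste0(jID, kID), date = date)
}

x <- data.frame(jID = c("a", "b"), kID = c("c", "d"),
                date = "20170206", stringsAsFactors = FALSE)

# c(f = testfun, x) builds a list holding the function plus every column,
# so Map() receives each column as a named argument, however many there are
res <- do.call(Map, c(f = testfun, x))

# res is a list with one small data frame per row of x
length(res)
```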

Returning specific values within a row

I have 1 row of data and 50 columns in the row from a csv which I've put into a dataframe. The data is arranged across the spreadsheet like this:
"FSEG-DFGS-THDG", "SGDG-SGRE-JJDF", "DIDC-DFGS-LEMS"...
How would I select only the middle part of each element (e.g., "DFGS" in the first, "SGRE" in the second, etc.), count their occurrences and display the results?
I have tried using the strsplit function but couldn't get it to work for the entire row of data. I'm thinking a loop of some kind might be what I need.

You can do unlist(strsplit(x, '-'))[seq(2, length(x)*3, 3)] (assuming your data is consistently of the form A-B-C).
# E.g.
fun <- function(x) unlist(strsplit(x, '-'))[seq(2, length(x)*3, 3)]
fun(c("FSEG-DFGS-THDG", "SGDG-SGRE-JJDF", "DIDC-DFGS-LEMS"))
# [1] "DFGS" "SGRE" "DFGS"
Edit
# Data frame
df <- structure(list(a = "FSEG-DFGS-THDG", b = "SGDG-SGRE-JJDF", c = "DIDC-DFGS-LEMS"),
class = "data.frame", row.names = c(NA, -1L))
fun(t(df[1,]))
# [1] "DFGS" "SGRE" "DFGS"

First we create a function strng() and then we apply() it on every column of df. strsplit() splits a string by "-" and strng() returns the second part.
df = data.frame(a = "ab-bc-ca", b = "gn-bc-ca", c = "kj-ll-mn")
strng = function(x) {
strsplit(x,"-")[[1]][2]
}
# table() outputs frequency of elements in the input
table(apply(df, MARGIN = 2, FUN = strng))
# output:
# bc ll
#  2  1
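
Putting the two steps together on the strings from the question (extract the middle part, then count), a minimal sketch could look like this. It assumes every element has the form A-B-C; the names middle and counts are only illustrative.

```r
x <- c("FSEG-DFGS-THDG", "SGDG-SGRE-JJDF", "DIDC-DFGS-LEMS")

# Split each string on "-" and keep the second piece of every split
middle <- vapply(strsplit(x, "-", fixed = TRUE), `[`, character(1), 2)

# Count how often each middle part occurs
counts <- table(middle)
counts
```

For a one-row data frame, unlist(df[1, ]) gives a character vector that can be used as x.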

converting list variables within data frame to data frame in R

I have read data from a sav (spss) file. Using the following code:
library(foreign)
test <- read.spss(path_to_file, to.data.frame = TRUE)
the resultant data frame is in the following format:
structure(list(srl = c(4096, 15024, 4094), mem_id = c(278812,
2341700, 251337), q1 = c(2, 2, 1)), row.names = c(NA, 3L), class = "data.frame")
While the object test is a data frame, each of the columns is rendered as a list. I tried the following to convert:
dd <- data.frame(srl = unlist(test$srl), mem_id = unlist(test$mem_id), q1 = unlist(test$q1))
Still, the resulting data frame has the same structure as shown in the dput output.

I cannot reproduce your data to check that this works, but why don't you try:
lst <- lst[-c(4,5)]
and then
new_lst <- as.data.frame(lst)
where lst is the name of your list. I suggest removing the 4th and 5th elements because you probably won't need them in a data frame.
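
Before converting anything, it may help to check which columns are actually lists. The sketch below rebuilds the data frame from the dput output in the question and flattens only list columns. The helper name flatten_list_cols is hypothetical, and the code assumes each list element holds exactly one value.

```r
test <- structure(list(srl = c(4096, 15024, 4094),
                       mem_id = c(278812, 2341700, 251337),
                       q1 = c(2, 2, 1)),
                  row.names = c(NA, -3L), class = "data.frame")

# Hypothetical helper: unlist every column that is stored as a list
flatten_list_cols <- function(d) {
  is_list_col <- vapply(d, is.list, logical(1))
  for (nm in names(d)[is_list_col]) {
    d[[nm]] <- unlist(d[[nm]])
  }
  d
}

# Which columns are lists? For the dput shown in the question, none are.
vapply(test, is.list, logical(1))

flat <- flatten_list_cols(test)
```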

Manipulate string in R

I'm looking to manipulate a set of strings in R.
The data I have:
Data Field
Mark Twain 5
I want it to instead be:
Data Field
Twain Mark 5
My idea was to first split the string into two columns and then concatenate. But I'm wondering if there is an easier way.

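You can try this approach (sapply is used rather than lapply so that Data stays a character vector instead of becoming a list column):
> df <- data.frame(Data=c("Mark Twain"), Field=5)
> df$Data <- sapply(strsplit(as.character(df$Data), " "), function(x) paste(rev(x), collapse=" "))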
> df
Data Field
1 Twain Mark 5
This will work even if the number of rows in your data frame is > 1
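
As a quick check of the multi-row case, here is a minimal sketch with a second, made-up row ("Jane Austen" is not from the question). It uses vapply so that the result is guaranteed to be a character vector.

```r
df <- data.frame(Data = c("Mark Twain", "Jane Austen"),
                 Field = c(5, 7), stringsAsFactors = FALSE)

# Split each name on the space, reverse the pieces, and glue them back together
df$Data <- vapply(strsplit(df$Data, " ", fixed = TRUE),
                  function(parts) paste(rev(parts), collapse = " "),
                  character(1))
df
```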

We can use sub to do this:
df1$Data <- sub("(\\S+)\\s+(\\S+)", "\\2 \\1", df1$Data)
df1
# Data Field
#1 Twain Mark 5
data
df1 <- structure(list(Data = "Mark Twain", Field = 5L),
.Names = c("Data", "Field"), class = "data.frame",
row.names = c(NA, -1L))

Weird behaviour by ordering a data frame

I have the following data frame that I want to order by the fifth column ("Distance").
When I try
df.order <- df[order(df[, 5]), ]
I always get the following error message.
Error in order(df[, 5]) : unimplemented type 'list' in 'orderVector1'
I don't know why R considers my data frame a list. Running is.data.frame(df) returns TRUE. I have to admit that is.list(df) also returns TRUE. Is it possible to force my data frame to be only a data frame and not a list?
Thanks for your help.
structure(list(ID = list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Latitude = list(50.7368, 50.7368, 50.7368, 50.7369, 50.7369, 50.737, 50.737, 50.7371, 50.7371, 50.7371),
Longitude = list(6.0873, 6.0873, 6.0873, 6.0872, 6.0872, 6.0872, 6.0872, 6.0872, 6.0872, 6.0872),
Elevation = list(269.26, 268.99, 268.73, 268.69, 268.14, 267.87, 267.61, 267.31, 267.21, 267.02),
Distance = list(119.4396, 119.4396, 119.4396, 121.199, 121.199, 117.5658, 117.5658, 114.9003, 114.9003, 114.9003),
RxPower = list(-52.6695443922406, -52.269130891243, -52.9735258244422, -52.2116571930007, -51.7784534281727, -52.7703448813654, -51.6558862949081, -52.2892907635308, -51.8322993596551, -52.4971436682333)),
.Names = c("ID", "Latitude", "Longitude", "Elevation", "Distance", "RxPower"),
row.names = c(NA, 10L), class = "data.frame")

Your data frame contains lists, not vectors. You can convert this data frame to the "classical" format using as.data.frame and unlist:
df2 <- as.data.frame(lapply(df, unlist))
Now, the new data frame could be sorted in the intended way:
df2[order(df2[, 5]), ]
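
A minimal, self-contained sketch of this fix on a cut-down version of the data (only two of the columns and three of the rows, chosen here for illustration):

```r
# A data frame whose columns are lists, as in the question
df <- structure(list(ID = list(1, 2, 3),
                     Distance = list(121.199, 117.5658, 119.4396)),
                row.names = c(NA, -3L), class = "data.frame")

# Every column reports class "list", which is what order() cannot handle
sapply(df, class)

# Flatten each column to an atomic vector, then rebuild the data frame
df2 <- as.data.frame(lapply(df, unlist))

# Ordering by Distance now works
sorted <- df2[order(df2$Distance), ]
sorted
```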

I've illustrated with a small example what's the problem:
df <- structure(list(ID = c(1, 2, 3, 4),
Latitude = c(50.7368, 50.7368, 50.7368, 50.7369),
Longitude = c(6.0873, 6.0873, 6.0873, 6.0872),
Elevation = c(269.26, 268.99, 268.73, 268.69),
Distance = c(119.4396, 119.4396, 119.4396, 121.199),
RxPower = c(-52.6695443922406, -52.269130891243, -52.9735258244422,
-52.2116571930007)),
.Names = c("ID", "Latitude", "Longitude", "Elevation", "Distance", "RxPower"),
row.names = c(NA, 4L), class = "data.frame")
Notice that list only occurs once. And all the values are wrapped by c(.) and not list(.). This is why doing sapply(df, class) on your data resulted in all columns having class list.
Now,
> sapply(df, class)
# ID Latitude Longitude Elevation Distance RxPower
# "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
Now order works (shown here on the fourth column, Elevation):
> df[order(df[,4]), ]
# ID Latitude Longitude Elevation Distance RxPower
# 4 4 50.7369 6.0872 268.69 121.1990 -52.21166
# 3 3 50.7368 6.0873 268.73 119.4396 -52.97353
# 2 2 50.7368 6.0873 268.99 119.4396 -52.26913
# 1 1 50.7368 6.0873 269.26 119.4396 -52.66954

This turns your data.frame of lists into a matrix:
mat <- sapply(df,unlist)
Now you can order it.
mat[order(mat[,5]),]
If all columns are of one type, e.g., numeric, a matrix often is preferable, because operations on matrices are faster than on data.frames. However, you can transform to a data.frame using as.data.frame(mat).
Btw, a data.frame is a special kind of list and thus is.list returns TRUE for every data.frame.

Ran across this same problem. This worked for me (maybe it might help someone else who is having the same problem and stumbled on this page).
I had a structure like:
lst <- list(row1 = list(col1="A",col2=1,col3="!"), row2 = list(col1="B",col2=2,col3="#"))
> lst
$row1
$row1$col1
[1] "A"
$row1$col2
[1] 1
$row1$col3
[1] "!"
$row2
$row2$col1
[1] "B"
$row2$col2
[1] 2
$row2$col3
[1] "#"
I was doing:
df <- as.data.frame(do.call(rbind, lst))
And I kept getting the same error you were getting when I tried df[order(df$col1), ]. It turns out I had to do:
df <- do.call(rbind.data.frame, lst)
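
The difference between the two calls can be seen by inspecting the column types. This is a minimal sketch; the names bad and good are only illustrative.

```r
lst <- list(row1 = list(col1 = "A", col2 = 1, col3 = "!"),
            row2 = list(col1 = "B", col2 = 2, col3 = "#"))

# rbind() on plain lists builds a matrix of mode list, so every column
# of the resulting data frame is itself a list
bad <- as.data.frame(do.call(rbind, lst))
sapply(bad, is.list)

# rbind.data.frame() combines the rows into ordinary atomic columns
good <- do.call(rbind.data.frame, lst)
sapply(good, is.list)

# Ordering works on the atomic columns
good[order(good$col2, decreasing = TRUE), ]
```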
