I am trying to do a [multimerge][1], but I keep getting an error message:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 119, 75, 78, 71, 74
I thought merge would be able to handle different numbers of rows, which is why I am using merge rather than cbind (and because I need each row of each column to match a specific rat ID).
Here is what I've got:
datalist = list()
modeldf <- d %>%
  select("Dam", "Rat", "Group", "Sex", "CS.NCS")
datalist <- list(modeldf) #adding to list
for (i in colnames(d[c(6:47)])) {
  #STUFF IN FOR LOOP
  resultcolumn <- newdf %>% #final result dataframe
    select("Rat", all_of(i))
  datalist[[i]] <- resultcolumn #adding result dataframe to list
}
resultsdf <- merge(datalist, by="Rat", all.x=TRUE, sort = FALSE)
The datalist is a proper list of data frames, and resultcolumn is a proper data frame (I checked with class()).
What is the problem and how do I fix it?
[1]: https://cran.r-project.org/web/packages/Orcs/vignettes/merge.html
Turns out I'm an idiot. Solved by installing this package and using multimerge.
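For readers who would rather not add a package: base merge() takes exactly two data frames, so a list passed as its first argument is coerced with as.data.frame(), and that coercion fails when the elements have different numbers of rows. Below is a minimal base-R sketch that folds the two-way merge over the list with Reduce(). The toy data frames and the helper name merge_by_rat are made up for illustration and are not part of the original code.

```r
# Toy stand-ins for the question's data (hypothetical values)
modeldf <- data.frame(Rat = c("r1", "r2", "r3", "r4"),
                      Group = c("A", "A", "B", "B"),
                      stringsAsFactors = FALSE)
res1 <- data.frame(Rat = c("r1", "r3"), weight = c(10.5, 12.1),
                   stringsAsFactors = FALSE)
res2 <- data.frame(Rat = c("r2", "r3", "r4"), tail_len = c(3.2, 3.8, 4.0),
                   stringsAsFactors = FALSE)
datalist <- list(modeldf, res1, res2)

# Two-way left join on Rat; rats missing from `right` get NA
merge_by_rat <- function(left, right) {
  merge(left, right, by = "Rat", all.x = TRUE, sort = FALSE)
}

# Fold the two-way merge over the whole list, left to right
resultsdf <- Reduce(merge_by_rat, datalist)
print(resultsdf)
```

Because all.x = TRUE keeps every row of the accumulated left-hand table, the first list element (modeldf here) decides which rats appear in the result.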
I have a CSV file with 141 rows and several columns. I wanted my data to be ordered in ascending order by the first two columns i.e. 'label' and 'index'. Following is my code:
final_data <- read.csv("./features.csv",
header = FALSE,
col.names = c('label','index', 'nr_pix', 'rows_with_1', 'cols_with_1',
'rows_with_3p', 'cols_with_3p', 'aspect_ratio',
'neigh_1', 'no_neigh_above', 'no_neigh_below',
'no_neigh_left', 'no_neigh_right', 'no_neigh_horiz',
'no_neigh_vert', 'connected_areas', 'eyes', 'custom'))
sorted_data_by_label <- final_data[order(label),]
sorted_data_by_index <- sorted_data_by_label[order(index),]
write.table(sorted_data_by_index, file = "./features.csv",
append = FALSE, sep = ',',
row.names = FALSE)
I read from the CSV and use write.table because I need to overwrite the CSV with a version that includes the column names.
Since I added a comma after order(label) and order(index), shouldn't the sorted data still contain all the rows and columns?
After running this code, I only get the first row out of 141 rows. Is there a way to fix this problem?
As @akrun has mentioned briefly, what you need to do is to change
sorted_data_by_label <- final_data[order(label),]
to
sorted_data_by_label <- final_data[order(final_data$label),]
and to change
sorted_data_by_index <- sorted_data_by_label[order(index),]
to
sorted_data_by_index <- sorted_data_by_label[order(sorted_data_by_label$index),]
This is because when you write label, R looks for an object named label in the calling environment (here, the global environment), not inside the final_data data frame.
If you mean the label column of final_data, you need to write final_data$label explicitly. The same applies to index.
Other options
You can use with:
sorted_data_by_label <- with(final_data, final_data[order(label),])
sorted_data_by_index <- with(sorted_data_by_label, sorted_data_by_label[order(index),])
In dplyr you can use
sorted_data_by_label <- final_data %>% arrange(label)
sorted_data_by_index <- sorted_data_by_label %>% arrange(index)
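Since the goal in the question is to order by label and then by index, a single order() call with both keys may be closer to what is wanted than two successive sorts, because the second sort decides the primary order. A minimal sketch on a small made-up data frame (the values are hypothetical):

```r
# Small made-up data frame with the question's first three columns
final_data <- data.frame(label = c("b", "a", "b", "a"),
                         index = c(2, 2, 1, 1),
                         nr_pix = c(10, 20, 30, 40),
                         stringsAsFactors = FALSE)

# Sort by label first; ties in label are broken by index
sorted_data <- final_data[order(final_data$label, final_data$index), ]
print(sorted_data)
```

The dplyr equivalent would be arrange(final_data, label, index).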
I am working in RStudio with 450 JSON files, all of them in my workspace. Most of the files are fine, but with some files, like this one (https://drive.google.com/file/d/1DsezCmN8_8iLNCAsLZiRnxTrwnWu6LkD/view?usp=sharing, a 296 KB JSON), I get this error when I try to convert the tm field to a data frame:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 0, 1
The code that I use is:
JSONList <- rjson::fromJSON(file = "2.json", simplify = F)
DF <- as.data.frame(JSONList$tm)
With the files that are OK, I obtain 1 observation of 5168 variables.
How can I avoid this problem with some files?
Thanks
Another possibility I thought of is to select only the fields that I need:
candidatos = list(
"name",
"score",
"tot_sFieldGoalsMade",
"tot_sFieldGoalsAttempted",
"tot_sTwoPointersMade",
"tot_sTwoPointersAttempted",
"tot_sThreePointersMade",
"tot_sThreePointersAttempted",
"tot_sFreeThrowsMade",
"tot_sFreeThrowsAttempted",
"tot_sReboundsDefensive",
"tot_sReboundsOffensive",
"tot_sReboundsTotal",
"tot_sAssists",
"tot_sBlocks",
"tot_sTurnovers",
"tot_sFoulsPersonal",
"tot_sPointsInThePaint",
"tot_sPointsSecondChance",
"tot_sPointsFromTurnovers",
"tot_sBenchPoints",
"tot_sPointsFastBreak",
"tot_sSteals"
)
ListColum<-map(candidatos, function(x){
as.data.frame(data$tm$"2"$x)
} )
But R gives me a list of 23 data frames with no elements.
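Two things seem to be going on here, sketched below on a toy list rather than the real file. First, as.data.frame() fails with "differing number of rows" as soon as one element has length zero; it is an assumption that the problem files contain such elements (for example an empty JSON array). Second, `$x` looks for an element literally named "x", so inside the function the variable x is never used; `[[x]]` uses its value. The helper fill_empty and the field coachDetails are hypothetical names introduced only for this example.

```r
# Toy stand-in for one parsed file (hypothetical values, not the real JSON)
data <- list(tm = list(`2` = list(
  name = "Team B",
  score = 81,
  tot_sAssists = 17,
  coachDetails = list()   # zero-length element, e.g. from an empty JSON array
)))

# 1. A single zero-length element makes as.data.frame() fail
failed <- inherits(try(as.data.frame(data$tm[["2"]]), silent = TRUE),
                   "try-error")

# Hypothetical helper: recursively replace zero-length elements with NA
fill_empty <- function(x) {
  if (length(x) == 0) return(NA)
  if (is.list(x)) return(lapply(x, fill_empty))
  x
}
DF <- as.data.frame(fill_empty(data$tm[["2"]]))

# 2. `$x` ignores the variable x; `[[x]]` uses the string stored in it
candidatos <- c("name", "score", "tot_sAssists")
x <- "score"
via_dollar <- data$tm$"2"$x           # NULL: no element is named "x"
via_brackets <- data$tm[["2"]][[x]]   # the score element

# Select only the wanted fields, then convert once
selected <- as.data.frame(data$tm[["2"]][candidatos])
print(selected)
```

Selecting the wanted fields first also sidesteps empty elements elsewhere in the file, provided none of the selected fields is itself empty.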
I'm basically trying to call an API to retrieve weather information from a government website.
library(data.table)
library(jsonlite)
library(httr)
base<-"https://api.data.gov.sg/v1/environment/rainfall"
date1<-"2020-01-25"
call1<-paste(base,"?","date","=",date1,sep="")
get_rainfall<-GET(call1)
get_rainfall_text<-content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.frame(get_rainfall_json)
I'm getting an error:
"Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 52, 287, 1"
I'm not sure how to resolve this. I'm trying to convert the retrieved data into a data frame so I can make sense of the readings.
Your "get_rainfall_json" object comes back as a "list". Trying to turn this into a data frame is where you are getting the error. If you specify the "items" object within the list, your error is resolved! (The outcome of this looks like it has some more embedded data within objects... So you'll have to parse through that into a format you're interested in.)
get_rainfall_df <- as.data.frame(get_rainfall_json$items)
Update
To work through the resulting data frame, here is one way you could do it: loop over each row, extract the list stored in that row, turn it into a data frame, and append it to "df". You are then left with one final df with all the data in one place.
library(data.table)
library(jsonlite)
library(httr)
library(dplyr)
base <- "https://api.data.gov.sg/v1/environment/rainfall"
date1 <- "2020-01-25"
call1 <- paste(base, "?", "date", "=", date1, sep = "")
get_rainfall <- GET(call1)
get_rainfall_text <- content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.table(get_rainfall_json$items)
df <- data.frame()
for (row in 1:nrow(get_rainfall_df)) {
  new_date <- get_rainfall_df[row, ]$readings[[1]]
  colnames(new_date) <- c("stationid", "value")
  date <- get_rainfall_df[row, ]$timestamp
  new_date$date <- date
  df <- bind_rows(df, new_date)
}
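An alternative to growing df inside the loop is to build one small data frame per timestamp and bind them all at once. The base-R sketch below runs on a made-up stand-in for get_rainfall_json$items. It assumes, as the loop above implies, that the real object has a timestamp column and a readings list-column of two-column data frames. The helper name add_timestamp and the sample values are hypothetical.

```r
# Made-up stand-in for get_rainfall_json$items (hypothetical values)
items <- data.frame(timestamp = c("2020-01-25T00:05:00+08:00",
                                  "2020-01-25T00:10:00+08:00"),
                    stringsAsFactors = FALSE)
items$readings <- list(
  data.frame(station_id = c("S77", "S109"), value = c(0, 0.2),
             stringsAsFactors = FALSE),
  data.frame(station_id = c("S77", "S109"), value = c(0.4, 0),
             stringsAsFactors = FALSE)
)

# Rename the reading columns and tag every reading with its timestamp
add_timestamp <- function(ts, readings) {
  names(readings) <- c("stationid", "value")
  readings$date <- ts
  readings
}

# One data frame per timestamp, then a single rbind over all of them
df <- do.call(rbind, Map(add_timestamp, items$timestamp, items$readings))
rownames(df) <- NULL
print(df)
```

Binding once at the end avoids copying the accumulated data frame on every iteration, which can matter when there are many timestamps.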
I am using WikipediR to query revision IDs to see if the very next edit is a 'rollback' or an 'undo'.
I am interested in the tag and revision comment to identify whether the edit was undone or rolled back.
My code for a single revision ID is:
library(WikipediR)
wp_diff<- revision_diff("en", "wikipedia", revisions = "883987486", properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
I then convert the output of this to a df using the code
library(dplyr)
library(tibble)
diff <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors=FALSE))
This works great for a single revision id.
I am wondering how I would loop or map over a vector of many revision IDs.
I tried
vec <- c("883987486","911412795")
for (i in 1:length(vec)){
wp_diff[i]<- revision_diff("en", "wikipedia", revisions = i, properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
}
But this creates the error
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
when I try to convert the output list to a data frame.
Does anybody have any suggestions? I am not sure how to proceed.
Thanks.
Try the following code:
# Make a function that fetches one revision and returns it as a data frame
make_diff_df <- function(rev){
  wp_diff <- revision_diff("en", "wikipedia", revisions = rev,
                           properties = c("tags", "comment"),
                           direction = "next", clean_response = TRUE,
                           as_wikitext = TRUE)
  DF <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors=FALSE))
  # Define the names of the DF
  names(DF) <- c("pageid","ns","title","revisions.diff.from",
                 "revisions.diff.to","revisions.diff..",
                 "revisions.comment","revisions..mw.rollback.")
  return(DF)
}
vec <- c("883987486","911412795")
# Use do.call and lapply with the function
do.call("rbind",lapply(vec,make_diff_df))
Note that you have to fix the names of the DF inside the make_diff_df function so that the "rbind" inside do.call can work. The names for the two revisions in the example are very similar.
Hope this can help
Data:
structure(list(`p value` = c(0.00151124736422317, 0.804709799937324,
0.0192537412780042, 0.000467854188597731, 4.80216666553605e-06,
0.0231434946595433), significance = c(TRUE, FALSE, TRUE, TRUE,
TRUE, TRUE)), .Names = c("p value", "significance"), row.names = c("Q5.i",
"Q5.ii", "Q5.iii", "Q5.iv", "Q5.v", "Q5.vi"), class = "data.frame")
Objective:
To create a function that takes a data frame name and a (new) variable name as input.
The function would:
create a new variable that is based on the row names of the data frame
delete the row names
reorder the variables so that the newly created column is the first column
Challenges:
I am stuck at the first step.
I've searched the internet and Stack Overflow for snippets of code that would help, and I've managed to hammer something together, although it doesn't work.
What have I tried:
row2col<-function(df, varname){
eval(parse(text=paste(df, "[[", "'", varname, "'", "]]", "<-row.names(", df, ")", sep="")))
}
row2col<-function(df, varname){
assign(parse(text=paste(df, varname, sep="$")), row.names(df))
}
Results:
nothing happened (not even an error message)
a character vector of row names (rather than a variable within the dataframe) was created
Thanks for your help and attention to this post.
You don't need to use eval, parse, assign - that's in many cases not the right approach. Here's a simple alternative:
row2col <- function(dat, varname) {
  dat[[varname]] <- row.names(dat)
  row.names(dat) <- NULL
  dat[, c(varname, setdiff(names(dat), varname))]
}
And then you can test it:
> row2col(df, "testcol")
# testcol p value significance
#1 Q5.i 1.511247e-03 TRUE
#2 Q5.ii 8.047098e-01 FALSE
#3 Q5.iii 1.925374e-02 TRUE
#4 Q5.iv 4.678542e-04 TRUE
#5 Q5.v 4.802167e-06 TRUE
#6 Q5.vi 2.314349e-02 TRUE
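One point worth spelling out: the function works on a copy of the data frame, so the original is left untouched and the result has to be assigned to a name to be kept. A self-contained sketch, using a shortened version of the question's data (the first three rows, rounded) and the row2col function from above; the name df_new is made up for illustration:

```r
# Shortened, rounded version of the question's data (first three rows)
df <- data.frame(
  `p value` = c(0.0015, 0.8047, 0.0193),
  significance = c(TRUE, FALSE, TRUE),
  row.names = c("Q5.i", "Q5.ii", "Q5.iii"),
  check.names = FALSE   # keep the space in "p value"
)

# row2col as defined in the answer above
row2col <- function(dat, varname) {
  dat[[varname]] <- row.names(dat)
  row.names(dat) <- NULL
  dat[, c(varname, setdiff(names(dat), varname))]
}

# The function returns a modified copy; df itself is not changed
df_new <- row2col(df, "testcol")
print(df_new)
print(rownames(df))   # still the original row names
```

To overwrite the original instead, assign the result back to the same name: df <- row2col(df, "testcol").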
Create a new variable from the row names:
data$new_var <- row.names(data)
Reset the row names:
row.names(data) <- NULL
Reorder the data frame so that the new variable comes first:
data <- data[, c(ncol(data), 1:(ncol(data) - 1))]