Error when doing Panel VAR (panelvar package) in R

I tried to run a panel VAR on a dataset I got from Statistics Sweden, and here is what I get:
df<- read_excel("Inkfördelning per kommun.xlsx")
nujavlar <- pvarfeols(dependent_vars = c("Kvintil-1", "Kvintil-4", "Kvintil-5"),
lags = 1,
transformation = "demean",
data = df,
panel_identifier = c("Kommun", "Year")
)
Error: Can't subset columns that don't exist.
x Column `Kvintil-1` doesn't exist.
I often get this message too:
Warning in xtfrm.data.frame(x) : cannot xtfrm data frames
Error: Can't subset columns that don't exist.
x Location 2 doesn't exist.
ℹ There are only 1 column.
I have made sure that all data is numeric. I have also tried cleaning my workspace and restarting the program. I also tried to convert it into a panel data frame with the plm package, and to convert my entity variable "Kommun" (Municipality) into a factor, and it still doesn't work.
Here's the data if someone wants to give it a go.
https://docs.google.com/spreadsheets/d/16Ak_Z2n6my-5wEw69G29_NLryQKcrYZC/edit?usp=sharing&ouid=113164216369677216623&rtpof=true&sd=true

The columns in your data frame are named Kvintil 1, not Kvintil-1, so the variable you are referring to really does not exist. Be aware that syntactic variable names in R cannot contain hyphens or spaces; such names only work when wrapped in backticks, which is tedious, so it is good practice to avoid them. I have included a reproducible example below.
library(tidyverse)
library(gsheet)
library(panelvar)
url <- 'docs.google.com/spreadsheets/d/16Ak_Z2n6my-5wEw69G29_NLryQKcrYZC'
df <- gsheet2tbl(url) %>%
rename(Kvintil1 = `Kvintil 1`) %>%
rename(Kvintil2 = `Kvintil 2`) %>%
rename(Kvintil3 = `Kvintil 3`) %>%
rename(Kvintil4 = `Kvintil 4`) %>%
rename(Kvintil5 = `Kvintil 5`) %>%
as.data.frame()
nujavlar <- pvarfeols(
dependent_vars = c("Kvintil1", "Kvintil4", "Kvintil5"),
lags = 1,
transformation = "demean",
data = df,
panel_identifier = c("Kommun", "Year"))
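Before calling the model, it can help to check which requested columns are actually present and to clean all the names in one step. The sketch below uses a small made-up data frame (the values and the `wanted` vector are assumptions for illustration, not the Statistics Sweden data) and base R only.

```r
# Toy data with the same kind of non-syntactic names (values are made up).
df <- data.frame(
  Kommun = c("A", "A", "B", "B"),
  Year = c(2019, 2020, 2019, 2020),
  `Kvintil 1` = c(1, 2, 3, 4),
  `Kvintil 4` = c(5, 6, 7, 8),
  check.names = FALSE,        # keep the spaces, as read_excel() would
  stringsAsFactors = FALSE
)

# Which of the names we intend to use are not columns of df?
wanted <- c("Kvintil-1", "Kvintil-4")
missing_cols <- setdiff(wanted, names(df))
print(missing_cols)

# Remove the spaces from every column name at once.
names(df) <- gsub(" ", "", names(df), fixed = TRUE)
print(names(df))
```

Running `setdiff()` first makes a misspelled column name visible immediately, rather than through an error raised inside the modelling function.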

Purrr package add species names to each output name in SSDM

I have a list of species and I am running an ensemble SDM modelling function on the dataset, filtering by each species, to give an ensemble SDM per species.
I have used purrr package to get it running, and the code works fine when there is no naming convention added in. However, when it outputs the Ensemble.SDM for each species, they are all named the same thing "ensemble.sdm", so when I want to stack them, I cannot as they are all named the same thing.
I would like to be able to name each output of the model something different, ideally linked to the species name picked out in the line: data <- Occ_full %>% filter(NAME == .x)
The working code is written below:
list_of_species <- unique(unlist(Occ_full$NAME))
# Return unique values
output <- purrr::map(limit_list_of_species, ~ {
data <- Occ_full %>% filter(NAME == .x)
ensemble_modelling(c('GAM'), data, Env_Vars,
Xcol = 'LONGITUDE', Ycol = 'LATITUDE', rep = 1)
})
The code I tried in order to name each model is below, but it does not work: it names the output with lots of repetitions of the row number.
output <- purrr::map(limit_list_of_species, ~ {
data <- Occ_full %>% filter(NAME == .x)
label <- as.character(data)
ensemble_modelling(c('GAM'), data, Env_Vars,
Xcol = 'LONGITUDE', Ycol = 'LATITUDE', rep = 1, name = label )
})
Could anyone help me please? I simply want each "output" to be named with the species name specified in the filter. Thank you
Try using split with imap -
list_of_species <- split(Occ_full, Occ_full$NAME)
output <- purrr::imap(list_of_species,~{
ensemble_modelling(c('GAM'), .x, Env_Vars,Xcol = 'LONGITUDE',
Ycol = 'LATITUDE', rep = 1, name = .y)
})
split returns a list whose elements are named by species, and imap passes each of those names to the function as .y.
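To see the split/imap pattern without installing SSDM, here is a minimal sketch on made-up occurrence data. `fit_stub()` is a hypothetical stand-in for `ensemble_modelling()`; it only records the name it was given and the number of rows it received.

```r
library(purrr)

# Made-up occurrence records for two species.
occ <- data.frame(
  NAME = c("Species A", "Species B", "Species A"),
  LONGITUDE = c(1, 2, 3),
  LATITUDE = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for ensemble_modelling(): returns the label and row count.
fit_stub <- function(data, name) {
  list(name = name, n = nrow(data))
}

# split() gives a named list with one data frame per species.
by_species <- split(occ, occ$NAME)

# imap() passes each element as .x and its name as .y.
output <- imap(by_species, ~ fit_stub(.x, name = .y))

print(names(output))
```

Because the list is named, `output` keeps the species names too, so individual results can be retrieved with `output[["Species A"]]`.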

How to pipe in dplyr

I am trying to use the pipe function in dplyr and left_join to clean some meta data up. Setting up variables....
library(openxlsx)
library(tidyverse)
mdat <- read.xlsx("https://journals.plos.org/plospathogens/article/file?type=supplementary&id=info:doi/10.1371/journal.ppat.1005511.s011",
startRow = 3, fillMergedCells = TRUE) %>%
mutate(sample=Accession.Number)
dge$samples$sample=
[1] "SRR1346026" "SRR1346027" "SRR1346028" "SRR1346029" "SRR1346030" "SRR1346031" "SRR1346032" "SRR1346033" "SRR1346034"
[10] "SRR1346035" "SRR1346036" "SRR1346037" "SRR1346038" "SRR1346039" "SRR1346040" "SRR1346041" "SRR1346042" "SRR1346043"
[19] "SRR1346044" "SRR1346045" "SRR1346046" "SRR1346047" "SRR1346049" "SRR1346048" "SRR1346050" "SRR1346051" "SRR1346052"
I am trying to pipe in dge$samples$sample, which is a character vector. It needs to become a data frame with one column named sample so that I can left-join mdat to it and drop all the metadata I don't have a sample for. If you run dim(mdat) you will find it is 35 by 15; I want to reduce it to the 19 samples I actually have data for, which are listed in dge$samples$sample. The code below is my progress so far, but I think I am failing to understand how the pipe works.
test = data.frame(dge$samples$sample) %>%
colnames(.) = c("sample") %>%
left_join(
.,
mdat,
by = sample,
copy = FALSE,
suffix = c(".x", ".y"),
keep = FALSE,
na_matches = c("na", "never")
)
Why not just check whether they're in there and filter them:
mdat %>% filter( sample %in% dge$samples$sample )
It's easier to understand and control than a join, and performance shouldn't be an issue.
I think your code can be reduced to
library(dplyr)
test <- data.frame(sample = dge$samples$sample) %>%
left_join(mdat, by = 'sample')
Or an inner join should work as well, using base R :
test <- merge(data.frame(sample = dge$samples$sample), mdat, by = 'sample')
Using collapse
library(collapse)
sbt(mdat, sample %in% dge$samples$sample)
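The approaches above differ in one detail: what happens to a sample ID that has no metadata row. The sketch below uses made-up stand-ins for `mdat` and the sample vector (the names `have` and `tissue` are assumptions for illustration). It also shows why the original pipeline fails: an assignment such as `colnames(.) = ...` is not a pipeline step, whereas naming the column inside `data.frame()` is.

```r
library(dplyr)

# Made-up metadata table and sample IDs.
mdat <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  tissue = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
have <- c("S2", "S4", "S9")  # "S9" has no metadata row

# Filtering keeps only metadata rows whose sample is in `have`.
kept_filter <- mdat %>% filter(sample %in% have)

# A left join from the sample list keeps every sample, filling NA where
# no metadata exists. The column is named when the data frame is created.
kept_left <- data.frame(sample = have, stringsAsFactors = FALSE) %>%
  left_join(mdat, by = "sample")

# An inner join drops samples without metadata, like merge() does by default.
kept_inner <- data.frame(sample = have, stringsAsFactors = FALSE) %>%
  inner_join(mdat, by = "sample")

print(kept_left)
```

If every sample is guaranteed to have metadata, all three give the same rows; otherwise the left join is the only one that keeps the unmatched sample.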

Changing multiple markup errors over multiple columns in R

I am stuck on a piece of code that should let me fix multiple text markup errors, e.g. "�nt" which should be "ent", or "�de" which should be "ide", or "ïn" which should be "in" (without quotation marks), over 47 columns in total.
A quick example of the dataframe
x <- data.frame(Name = c("Pat�nt", "Pat�nt"), Type = c("�de", "�de"), Role = c("ïn", "ïn"))
$x
Name Type Role
Pat�nt �de ïn
Pat�nt �de ïn
(Not sure if the code is correct, but you get the idea of what my data frame looks like.)
Following other posted solutions, especially the one posted here, I ended up with quite a few lines of code. An excerpt:
x <- data.frame(lapply(x, function(y){ gsub("�ne", "ine", y)}))
So for every text markup error there is a line of code like the one above that fixes it. However, going through the lapply/apply family changes my original column classes to factors. And with the stringr package I cannot get this done, since it collapses everything into a single vector. An excerpt:
x <- str_replace(string = x, pattern="�nt", replacement = "ent")
So my question here is: is there another way to replace multiple strings over multiple columns, meanwhile maintaining the original dataframe classes?
EDIT
I slightly edited the code from Calum You to:
mutate_at(.vars = vars(everything()),
.funs = ~ str_replace_all(., pattern = "�nt", replacement = "")) %>%
Such that it replaces all instances within a row, instead of the first.
Next, by using the snippet below before the functions of mutate_at, the code now iterates only over character vectors instead of all vectors, i.e. numeric/factor/date etc.
df2 <- df1[, sapply(df1, class) == 'character'] %>%
So you just want to apply your replace function to every column, because the markup errors can be in any column? Try this approach, which uses mutate_at to apply your function to as many columns as you like (here to all of them)
library(tidyverse)
df <- tibble(
Name = c("Pat�nt", "Pat�nt"),
Type = c("�de", "�de"),
Role = c("ïn", "ïn")
)
df %>%
mutate_at(
.vars = vars(everything()),
.funs = ~ str_replace(., pattern = "�nt", replacement = "ent")
) %>%
mutate_at(
.vars = vars(everything()),
.funs = ~ str_replace(., pattern = "�de", replacement = "ide")
) %>%
mutate_at(
.vars = vars(everything()),
.funs = ~ str_replace(., pattern = "ïn", replacement = "in")
)
# A tibble: 2 x 3
Name Type Role
<chr> <chr> <chr>
1 Patent ide in
2 Patent ide in
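The three mutate_at calls can also be collapsed into one pass, since str_replace_all accepts a named vector of pattern = replacement pairs. The following is only a sketch: it assumes dplyr 1.0 or later (for `across()` and `where()`), writes the garbled characters as Unicode escapes so the script does not depend on file encoding, and adds a made-up integer column `n` to show that non-character columns keep their class.

```r
library(dplyr)
library(stringr)

df <- tibble(
  Name = c("Pat\uFFFDnt", "Pat\uFFFDnt"),
  Type = c("\uFFFDde", "\uFFFDde"),
  Role = c("\u00EFn", "\u00EFn"),
  n = c(1L, 2L)               # made-up integer column, should stay untouched
)

# Replacements as values, broken patterns as names.
fixes <- c("ent", "ide", "in")
names(fixes) <- c("\uFFFDnt", "\uFFFDde", "\u00EFn")

# Apply every replacement to character columns only.
cleaned <- df %>%
  mutate(across(where(is.character), ~ str_replace_all(.x, fixes)))

print(cleaned)
```

Adding another markup error then only means adding one more entry to `fixes`.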

How to join Spatial data with Dataframe so it can be displayed with Tmap?

Short version: when executing the following command qtm(World, "amount") I get the following error message:
Error in $<-.data.frame(*tmp*, "SHAPE_AREAS", value =
c(653989.801201595, : replacement has 177 rows, data has 175
Disclaimer: this is the same problem I used to have in this question, but if I'm not wrong, in that one the problem was that I had one variable on the left dataframe that matched to several variables on the right one, and hence, I needed to group variables on right dataframe. In this case, I am pretty sure that I do not have the same problem, as can be seen from the code below:
library(tmap)
library(dplyr)
# Read tmap's world map.
data("World")
# Load my dataframe.
df = read.csv("https://gist.githubusercontent.com/ccamara/ad106eda807f710a6f331084ea091513/raw/dc9b51bfc73f09610f199a5a3267621874606aec/tmap.sample.dataframe.csv",
na = "")
# Compare the countries in df that do not match with World's
# SpatialPolygons.
df$iso_a3 %in% World$iso_a3
# Return rows which do not match
selected.countries = df$iso_a3[!df$iso_a3 %in% World$iso_a3]
df.f = filter(df, !(iso_a3 %in% selected.countries))
# Verification.
df.f$iso_a3[!df.f$iso_a3 %in% World$iso_a3]
World@data = World@data %>%
left_join(df.f, by = "iso_a3") %>%
mutate(iso_a3 = as.factor(iso_a3)) %>%
filter(complete.cases(iso_a3))
qtm(World, "amount")
My guess is that the clue may be the fact that the column I am using when joining both dataframes has different levels (hence it is converted to string), but I'm ashamed to admit that I still don't understand the error that I am having here. I'm assuming I have something wrong with my dataframe, although I have to admit that it didn't work even with a smaller dataframe:
selected.countries2 = c("USA", "FRA", "ITA", "ESP")
df.f2 = filter(df, iso_a3 %in% selected.countries2)
df.f2$iso_a3 = droplevels(df.f2$iso_a3)
World@data = World@data %>%
left_join(df.f2, by = "iso_a3") %>%
mutate(iso_a3 = as.factor(iso_a3)) %>%
filter(complete.cases(iso_a3))
World$iso_a3 = droplevels(World$iso_a3)
qtm(World, "amount")
Can anyone help me by pointing out what's causing this error? (Providing a solution would also be much appreciated.)
Edited: it is again your data. Check how often each code occurs:
table(df$iso_a3)

Tmap Error - replacement has [x] rows, data has [y]

Short version: when executing the following command qtm(countries, "freq") I get the following error message:
Error in $<-.data.frame(*tmp*, "SHAPE_AREAS", value =
c(652270.070308042, : replacement has 177 rows, data has 210
Disclaimer: I have already checked other answers like this one or this one as well as this explanation that states that usually this error comes from misspelling objects, but could not find an answer to my problem.
Reproducible code:
library(rgdal)
library(dplyr)
library(tmap)
# Load JSON file with countries.
countries = readOGR(dsn = "https://gist.githubusercontent.com/ccamara/fc26d8bb7e777488b446fbaad1e6ea63/raw/a6f69b6c3b4a75b02858e966b9d36c85982cbd32/countries.geojson")
# Load dataframe.
df = read.csv("https://gist.githubusercontent.com/ccamara/fc26d8bb7e777488b446fbaad1e6ea63/raw/754ea37e4aba1b7ed88eaebd2c75fd4afcc54c51/sample-dataframe.csv")
countries@data = left_join(countries@data, df, by = c("iso_a2" = "country_code"))
qtm(countries, "freq")
The error is in your data; the code works fine.
What happens right now is:
1) you attempt a 1:1 match
2) your .csv data contains several rows per id
3) the left join therefore repeats each left-hand row once for every match on the right-hand side
To avoid this issue, aggregate your data first so that each id appears only once, like:
library(dplyr)
df_unique = df %>%
group_by(country_code, country_name) %>%
summarize(total = sum(total), freq = sum(freq))
# After that you should be fine, as long as just adding up the data is okay.
# Join the aggregated table (df_unique), not the original df.
countries@data = left_join(countries@data, df_unique, by = c("iso_a2" = "country_code"))
qtm(countries, "freq")
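The row-count mismatch is easy to reproduce on toy data. In the sketch below, `shapes` stands in for the attribute table of the spatial object and `df` for the csv; both are made up for illustration. Comparing `nrow()` before and after the join is a quick check that the attribute table still has one row per polygon.

```r
library(dplyr)

# Stand-in for the attribute table of the spatial object: one row per polygon.
shapes <- data.frame(iso_a2 = c("ES", "FR", "IT"), stringsAsFactors = FALSE)

# Stand-in for the csv: "ES" occurs twice.
df <- data.frame(
  country_code = c("ES", "ES", "FR"),
  freq = c(1, 2, 5),
  stringsAsFactors = FALSE
)

# Joining the raw table duplicates the "ES" row, so the row count grows.
joined_raw <- left_join(shapes, df, by = c("iso_a2" = "country_code"))

# Aggregating first leaves one row per code, so the row count is preserved.
df_unique <- df %>%
  group_by(country_code) %>%
  summarize(freq = sum(freq))
joined_ok <- left_join(shapes, df_unique, by = c("iso_a2" = "country_code"))

print(c(polygons = nrow(shapes), raw = nrow(joined_raw), ok = nrow(joined_ok)))
```

Countries without any data ("IT" here) stay in the table with NA, which is what the map needs.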
