Applying different functions to different elements in a nested list - r

I have a nested list:
my_list <- list(id1 = list(Overview = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC001CA/overview.json",
Climate = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC001CA/climatic-features.json",
Physiography = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC001CA/physiographic-features.json",
Soil = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC001CA/soil-features.json",
Ecology = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC001CA/ecological-dynamics.json"),
id2 = list(Overview = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC002CA/overview.json",
Climate = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC002CA/climatic-features.json",
Physiography = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC002CA/physiographic-features.json",
Soil = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC002CA/soil-features.json",
Ecology = "https://edit.jornada.nmsu.edu/services/descriptions/esd/022A/F022AC002CA/ecological-dynamics.json"))
I want to apply different functions to different sub-elements of the list. Here are the functions I want to apply:
Fns <- list(
function(x) fromJSON(x)$generalInformation$developmentStage,
function(x) fromJSON(x)$climaticFeatures$narratives$climaticFeatures,
function(x) fromJSON(x)$physiographicFeatures$intervalProperties,
function(x) fromJSON(x)$soilFeatures$texture$texture,
function(x) fromJSON(x)$ecologicalDynamics$narratives$ecologicalDynamics
)
There are 5 sub-elements in each list. The 5 provided functions should be applied in the order provided to those sub-elements.
I have found a couple of useful resources.
This similar question uses map() as a solution. The problem is map() is only applied to a sub-element in a single position and I am unable to determine how to apply multiple functions to multiple positions using map().
The other similar question uses Map(). This question does apply different functions to different positions of a list, but it does not use a nested list.
Any recommendations for applying multiple different functions to multiple different listed elements in a nested list?
Thanks!

We can use lapply to loop over the outer list and, inside it, use Map to pair each function in 'Fns' with the corresponding inner list element and apply it:
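library(jsonlite)  # provides fromJSON(), which the functions in Fns call
out <- lapply(my_list, function(x) {
  # apply the i-th function to the i-th inner element; x[] <- keeps the element names
  x[] <- Map(function(fn, y) fn(y), Fns, x)
  x
})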
Output:
> out
$id1
$id1$Overview
[1] "Approved"
$id1$Climate
[1] "The average annual precipitation ranges from 35 to 55 inches, and falls mostly in the form of snow from November to April. The mean annual air temperature ranges from 34 to 37 degrees Fahrenheit. The frost-free (>32F) season is 25 to 45 days, and the freeze-free (>28F) season is 35 to 60 days. \r\n\r\nMaximum and minimum monthly climate data for this ESD were generated using PRISM data (PRISM Climate Group, Oregon State University, http://prism.oregonstate.edu, created 4 Feb 2004.) and the ArcGIS ESD extract tool."
$id1$Physiography
property unit representativeLow representativeHigh rangeLow rangeHigh
1 Elevation ft 8500 12000
2 Slope % 8 75
$id1$Soil
[1] "Loamy coarse sand" "Coarse sand"
$id1$Ecology
[1] "Abiotic Features: \r\nThis ecological site occurs in the highest elevations of the northern subalpine LRU, typically between 9,000 and 10,000 feet on mountain slopes. Soils are derived from granitic parent material, and are shallow to moderately deep over paralithic granitic bedrock, with a sandy skeletal particle size class. Cold temperatures and a short growing season restrict development of less frost resistant conifers. Coarse soils with very low water holding capacity support a minimal understory. \r\n\r\nEcology-Disturbance Factors: \r\nIndividual whitebark pine trees are very slow growing, and may be up to1500 years old (Millar 2014). Stands are composed of multiple age-class single and multiple stem trees because of ongoing seedling establishment. Caching of whitebark pine seeds by Clark’s nutcracker is the primary mode of seed dispersal, with birds often caching seeds in open areas that are suitable for young seedlings. If all seeds are not consumed, they give rise to dense clusters of genetically similar whitebark pine. These clusters appear to be one tree with many stems, but are more often individual trees (Burns et al. 1990, Tomback et al. 2001a). In the absence of disturbance, ongoing recruitment from seed-caches occurs, leading to an increase in stand density over time. \r\n\r\nFire and avalanche are the primary natural drivers for succession. Fire ignition is frequent on these exposed ridges and mountain peaks, but there is minimal and discontinuous fuel to carry large or hot fires. Small fires may play a minor role in maintaining openings that favor the germination and survival of young whitebark pine seedlings (Burns et al. 1990, Tomback et al. 2001b, Howard 2002). Avalanche is common among the alpine peaks and ridges, and can remove swaths of vegetation in avalanche prone chutes or below wind formed cornices. 
\r\n\r\nWhitebark pine forests are threatened by the non-native Cronartium ribicola, the cause of white pine blister rust (WPBR) and the native mountain pine beetle (Dendroctonus ponderosae) (Cox 2000, Tomback et al. 2001b, Howard 2002). Severe epidemics of WPBR in combination with MPB outbreaks have killed large areas of forest in the Rocky Mountains, but the whitebark pine forests in the Sierra Nevada have not suffered as high mortality. There is a complex interaction between MPB outbreaks, WPBR infection, and climate. Mountain pine beetles prefer larger diameter trees (> 6 inch diameter at breast height), as these are necessary to complete their life cycle,, and attack at the warmer, lower elevation zone of whitebark pine. Mountain pine beetles preferentially attack trees infected by WPBR. White pine blister rust will infest all whitebark pines, regardless of age or elevation (Cluck 2014). \r\n\r\nMountain pine beetles are a native species in North American forests, but warmer temperatures have shifted the thermal zone for mountain pine beetles upslope, subjecting higher elevations of whitebark pine to beetle attacks (Craig 2010, Keane et al. 2012, Keane and al 2013). Severe mountain pine beetle epidemics cause high mortality of overstory trees, while suppressed understory trees may be released (Meyer and Safford 2014). A flush of regeneration may occur due to the reduction in the overstory canopy providing new areas for establishment. However, the decline in seed production due to the loss of large overstory trees will leave fewer seeds to be consumed by Clark’s nutcracker and other animals which leaves fewer seeds available for regeneration, threatening stand sustainability. \r\n\r\nThe non-native WPBR was introduced into North America near Vancouver, British Columbia in approximately 1910, and has been slowly spreading across the western United States and Canada. It currently occurs throughout the Cascades, and north and central Sierra Nevada. 
So far, it has not been detected on whitebark pine in the southern extent of the Sierra Nevada, but has been found on a whitebark pine in Yosemite National Park and in a high Sierra location on the western slope of the Sierra National Forest (Maloney 2011). A survey was conducted in 2009 to determine WPBR presence and effect on whitebark pine survivorship in the Lake Tahoe Basin. Mean incidence of WPBR among whitebark pine populations was 35 percent, with a range of 1 to 65 percent (Maloney et al. 2012). \r\n\r\nIn order for WPBR to infect whitebark pine several synchronous phenological and environmental factors need to occur. For infection to occur in five-needled white pines, relative humidity has to be greater than 90 percent, temperatures have to be between 35.6 and 64.4 degrees F (2 to 18 degrees C), and stomates need to be open to allow WPBR entry (Maloney 2011). The basidiospores, which infect whitebark pine, are released in fall from the alternate host currants (Ribes sp.), or less commonly, lousewort or Indian paintbrush (Pedicularis or Castilleja sp.). These spores do not travel far or last long in the environment, and years with late summer or early fall precipitation are most likely when infection will occur. Whitebark pine may have early onset winter dormancy, so stomates are closed at the time WPBR basidiospores are released (Maloney 2011). The onset of winter dormancy is dependent upon the length of the growing season (temperature), precipitation and soil available water capacity (AWC).\r\n\t\r\nThere appears to be a relationship between soils with higher AWC and higher infection rates or intensity of stem girdling (Maloney et al. 2012). Higher soil moisture could increase WPBR mycelium growth rates and increase basidiospore production, while also allowing for whitebark pine stomates to remain open longer in the season, increasing the probability of infection (Maloney et al. 2012). 
This ecological site occurs on shallow to moderately deep sandy-skeletal soils, with lower AWC than the corresponding volcanic ecological site (R022AC200CA), and is likely less susceptible to WPBR infestation. A 2009 inventory of WPBR showed that the whitebark stands occurring on granitic soils had infestation rates ranging from 1 to 19%, while stands on volcanic soils ranged from -- to 65% (Maloney et al. 2012). \r\n\r\nThe main impact of WPBR on whitebark pine is reduction in stand cone production due to die-back of cone bearing branches from cankers girdling the branches. Mortality rates in older trees are low, and may take decades to occur. Younger trees may be killed quickly if main stem girdling causes disruption of water flow (Maloney et al. 2012). A few studies have been conducted on genetic resistance to WPBR, and results range from no resistance (Maloney, personal communication), to 26 to 47 percent in the Rocky Mountains and the Pacific Northwest (Keane et al. 2012). \r\n\r\nReduced seed production affects the presence and abundance of Clark’s nutcracker, and thus the number and distribution of seed caches (Tomback and Resler 2007, Keane et al. 2012). This can lead to recruitment below the threshold required to sustain populations (McKinney et al. 2009). \r\n\r\nPredictions about climate change suggest that the whitebark pine communities in the Sierra Nevada Mountains may be threatened by rising temperatures and precipitation changes. Recent California based climate models predict a 9 degree F increase in temperature by 2100, and broader models predict a 2 to 4 degree F increase in winter and 4 to 8 degree increase in summer (Safford et al. 2012). Models are more variable for precipitation, but local models for the Sierra Nevada, predict similar to slightly less precipitation. Most models agree that summers will become drier, since more of the precipitation is predicted to come as rain, and snow melt-off will occur earlier in spring (Hayhoe et al. 
2004, Safford et al. 2012). Presently a severe drought is occurring in the Sierra Nevada, with 10 to 30 percent of average precipitation and very little snow accumulation. Whether this is climate driven, and thus will become more of the future normal remains to be seen. \r\n\r\nHigh elevation areas with suitable soils and landforms for the upward migration of whitebark pine will be important for the sustainability of this community. However, in this region of the central Sierra Nevada, whitebark pine already occurs at the upper most elevations of the highest mountains in the area, so has little room to move upslope. The southern Sierra Nevada, with its higher mountain peaks, may prove to be an important refugium for this species. \r\n\r\nThe historic temperature range for this ecological site is between 34 to 37 degrees F. With a 2 to 6 degree warming, species such as Sierra lodgepole pine (Pinus contorta var. murrayana), or mountain hemlock (Tsuga mertensiana) may become dominant in this zone. A 9 degree warming shift over the next 85 years could make conditions favorable for upper montane species to establish. Species such as Jeffrey pine (Pinus jeffreyi) and California red fir (Abies magnifica) could survive with the longer growing season and warmer temperatures for seedling germination and leader growth. If lower elevation conifers establish in the whitebark pine zone, whitebark pine may become a seral species, dependent upon fire for continued regeneration and elimination of competitors. \r\n\r\nThe reference state consists of the most successionally advanced community phase (numbered 1.1) as well as other community phases that result from natural and human disturbances. Community phase 1.1 is deemed the phase representative of the most successionally advanced pre-European plant/animal community including periodic natural surface fires that influenced its composition and production. 
This phase is determined from the oldest modern day remnant forests and/or historic literature. \r\n\r\nAll tabular data listed for a specific community phase within this ecological site description represent a summary of one or more field data collection plots taken in communities within the community phase. Although such data are valuable in understanding the phase (kinds and amounts of ground and surface materials, canopy characteristics, community phase overstory and understory species, production and composition, and growth), it typically does not represent the absolute range of characteristics nor an exhaustive listing of species for all the dynamic communities within each specific community phase."
$id2
$id2$Overview
[1] "Approved"
$id2$Climate
[1] "The average annual precipitation ranges from 35 to 55 inches, and falls mostly in the form of snow from November to April. The mean annual air temperature ranges from 34 to 37 degrees Fahrenheit. The frost-free (>32F) season is 25 to 45 days, and the freeze-free (>28F) season is 35 to 60 days. \r\n\r\nMaximum and minimum monthly climate data for this ESD were generated using PRISM data (PRISM Climate Group, Oregon State University, http://prism.oregonstate.edu, created 4 Feb 2004.) and the ArcGIS ESD extract tool."
$id2$Physiography
list()
$id2$Soil
[1] "Loamy coarse sand" "Coarse sand"
$id2$Ecology
[1] "Abiotic Features: \r\n\r\nThis ecological site occurs in the highest elevations of the northern subalpine LRU, typically between 9,000 and 10,000 feet on north facing mountain slopes. Soils are derived from granitic parent material, and are moderately deep over paralithic granitic bedrock, with a sandy skeletal particle size class. North-facing aspects hold snow for longer into the summer, providing additional moisture that allows mountain hemlock to be co-dominant or dominant over whitebark pine. \r\n\r\nEcological Features: \r\n\r\nThe high elevations in which this site occurs are buried with deep snow from November to June and remain cool for most of the year. Several physiological adaptations allow mountain hemlock and white bark pine to survive in this cold environment. Both species have maximum photosynthetic rates at colder temperatures than lower elevation trees, and close stomata to reduce water loss during dry or cold periods (Smith and Hinckley 1995). The tips of mountain hemlock branches are very flexible, an attribute that reduces snow build-up and stem breakage. Snow burial can be helpful in protecting trees from strong winter winds, desiccation from warm winter winds and sunny winter days, extreme cold, and repeated freezing and thawing (Arno and Hammerly 1984). Snow burial can, however, be detrimental as well. For example, portions of trees exposed above the snow can die back, leaving short multi-stemmed trees. Snow creep can create J shaped tree trunks, and avalanches can destroy swaths of forest. \r\n\r\nTimberline trees are able to withstand extremely cold winter conditions when they are dormant, but need at least a 2 to 3-month frost free growing period in the summer. Leaves, shoots, cones, and new seedlings develop during this short growing season, typically from mid-June through August. As elevations increase, temperatures drop and the growing season is shortened. Growing season length is one of the limiting factors to determine treeline. 
Another is wind. Wind induced treelines can be caused by drought conditions, due to increased evapotranspiration (Tomback et al. 2001). \r\n\r\nWhitebark pine is a long-lived timberline tree species that grows 40 to 60 feet tall in favorable conditions. The cones are indehiscent, meaning they do not open at maturity. Caching of whitebark pine seeds by Clark’s nutcracker is the primary mode of seed dispersal. Seeds are often cached in open areas that are suitable for young seedlings. If all seeds are not consumed, they give rise to dense clusters of genetically similar whitebark pine. These clusters appear to be one tree with many stems, but are more often individual trees (Burns et al. 1990, Tomback et al. 2001a). In the absence of disturbance, ongoing recruitment from seed-caching occurs, leading to an increase in stand density over time.\r\n\r\nWhite bark pine germination and seedling survival is best in canopy openings, such as those created by small fires. This is especially important in areas where whitebark pine develops dense canopies or can be replaced by shade tolerant conifers, as in the northern Cascades and the Rocky Mountains (Arno and Hoff 1990, Tomback et al. 2001, Howard 2002) and the cool, north-facing slopes of this ecological site. The slow growing, shade-tolerant mountain hemlock will gradually gain dominance over whitebark pine in the absence of fire or other disturbance in this ecological site. \r\n\r\nDisturbance features: \r\n\r\nFire and avalanche are the primary natural drivers for succession in this site. Fire ignition is frequent on these exposed ridges and mountain peaks, but there is minimal and discontinuous fuel to carry large or hot fires. Small fires may play a minor role in maintaining openings that favor the germination and survival of young whitebark pine seedlings (Burns et al. 1990, Tomback et al. 2001, Howard 2002). 
Avalanche is common among the alpine peaks and ridges, and can remove swaths of vegetation in avalanche prone chutes or below wind formed cornices. \r\n\r\nNatural fire return intervals for whitebark pine and mountain hemlock forests in the Sierra Nevada are poorly documented. Fire occurrence for mountain hemlock in the Pacific Northwest may range from 400 to 800 years, and is typically stand replacing (Tesky 1992). However, the Pacific Northwest is much wetter and has a different stand structure than mountain hemlock in the Sierra Nevada. The mean fire return intervals for whitebark pine forest across the US range from 29 to 300 years, while moderate severity fires range from 25 to 75 years, and stand replacing fires occur at greater than 140 year intervals (Fryer 2002). These whitebark pine studies are primarily from areas where whitebark pine forms continuous forests, rather than the small, open stands typically found in the Sierra Nevada. \r\n\r\nWhitebark pine forests are threatened by the non-native Cronartium ribicola, the cause of white pine blister rust (WPBR) and the native mountain pine beetle (Dendroctonus ponderosae) (Cox 2000, Tomback et al. 2001b, Howard 2002). Severe epidemics of WPBR in combination with MPB outbreaks have killed large areas of forest in the Rocky Mountains, but the whitebark pine forests in the Sierra Nevada have not suffered as high mortality. There is a complex interaction between MPB outbreaks, WPBR infection, and climate. Mountain pine beetles prefer larger diameter trees (> 6 inch diameter at breast height), as these are necessary to complete their life cycle, and attack at the warmer, lower elevation zone of whitebark pine. Mountain pine beetles preferentially attack trees infected by WPBR. White pine blister rust will infest all whitebark pines, regardless of age or elevation (Cluck 2014). 
\r\n\r\nMountain pine beetles are a native species in North American forests, but warmer temperatures have shifted the thermal zone for mountain pine beetles upslope, subjecting higher elevations of whitebark pine to beetle attacks (Craig 2010, Keane et al. 2012, Keane and al 2013). Severe mountain pine beetle epidemics cause high mortality of overstory trees, while understory suppressed trees may be released (Meyer and Safford 2014). A flush of regeneration may occur due to the reduction in the overstory canopy. However, the decline in seed production due to the loss of large overstory trees will leave fewer seeds available for regeneration, threatening stand sustainability.\r\n\r\nThe non-native WPBR was introduced into North America near Vancouver, British Columbia in approximately 1910, and has been slowly spreading across the western United States and Canada (Maloney 2011). It currently occurs throughout the Cascades, and north and central Sierra Nevada. So far, it has not been detected on whitebark pine in the southern extent of the Sierra Nevada, but has been found on a whitebark pine in Yosemite National Park and in a high Sierra location on the western slope of the Sierra National Forest (Maloney 2011). A survey was conducted in 2009 to determine WPBR presence and affect on whitebark pine survivorship in the Lake Tahoe Basin. Mean incidence of WPBR among whitebark pine populations was 35 percent, with a range of 1 to 65 percent (Maloney et al. 2012). \r\n\r\nIn order for WPBR to infect whitebark pine several synchronous phenological and environmental factors need to occur. For infection to occur in five-needled white pines, relative humidity has to be greater than 90 percent, temperatures have to be between 35.6 and 64.4 degrees F (2 to 18 degrees C), and stomates need to be open to allow WPBR entry (Maloney 2011). 
The basidiospores, which infect whitebark pine, are released in fall from the alternate host currants (Ribes sp.), or less commonly, lousewort or Indian paintbrush (Pedicularis or Castilleja sp.). These spores do not travel far or last long in the environment, and years with late summer or early fall precipitation are most likely when infection will occur. Whitebark pine may have early onset winter dormancy, so stomates are closed at the time WPBR basidiospores are released (Maloney 2011). The onset of winter dormancy is dependent upon the length of the growing season (temperature), precipitation and soil available water capacity (AWC).\r\n\r\nThere appears to be a relationship between soils with higher AWC and higher infection rates or intensity of stem girdling (Maloney et al. 2012). Higher soil moisture could increase WPBR mycelium growth rates and increase basidiospore production, while also allowing for whitebark pine stomates to remain open longer in the season, increasing the probability of infection (Maloney et al. 2012). This ecological site occurs on shallow to moderately deep sandy-skeletal soils, with lower AWC than the corresponding volcanic ecological site (R022AC200CA), and is likely less susceptible to WPBR infestation. A 2009 inventory of WPBR showed that the whitebark stands occurring on granitic soils had infestation rates ranging from 1 to 56% ( 22% average), while stands on volcanic soils ranged from 34 to 65%(with an average of 49%(Maloney et al. 2012). \r\n\r\nThe main impact of WPBR on whitebark pine is reduction in stand cone production due to die-back of cone bearing branches from cankers girdling the branches. Mortality rates in older trees are low, and may take decades to occur. Younger trees may be killed quickly if main stem girdling causes disruption of water flow (Maloney et al. 2012). 
A few studies have been conducted on genetic resistance to WPBR, and results range from no resistance (Maloney, personal communication), to 26 to 47 percent in the Rocky Mountains and the Pacific Northwest (Keane et al. 2012). \r\n\r\nReduced seed production affects the presence and abundance of Clark’s nutcracker, and thus the number and distribution of seed caches (Tomback and Resler 2007, Keane et al. 2012). This can lead to recruitment below the threshold required to sustain populations (McKinney et al. 2009). \r\n\r\nMountain hemlock is not susceptible to WPBR or MPB, but trees over 80 years old are very susceptible to laminated root rot (Phellinus weirii). Laminated root rot can rapidly spread by root contact and kill acres of forests (Tesky 1992). \r\n\r\nReestablishment of mountain hemlock after a fire or other disturbance is often slow, and in some areas growth never regains its tree-like stature (Arno and Hammerly 1984). Mountain hemlock has relatively thick bark, but typically has dense, low branches that make the trees susceptible to canopy fires. Mountain hemlock has higher cone production, seed germination and seedling survival rates during years of higher precipitation. Mountain hemlock can also reproduce by layering. Mountain hemlock seeds are wind dispersed and germinate on the snow or soil surface. Seedlings do best with partial shade from whitebark pine or older mountain hemlocks. \r\n\r\nPredictions about climate change due to global warming suggest that the whitebark pine communities in the Sierra Nevada Mountains may be threatened by rising temperatures and precipitation changes. Recent California based climate models predict a 9 degree F increase in temperature by 2100, and broader models predict a 2 to 4 degree F increase in winter and 4 to 8 degree increase in summer (Safford et al. 2012). Models are more variable for precipitation, but local models for the Sierra Nevada predict similar to slightly less precipitation. 
Most models agree that summers will become drier, since more of the precipitation is predicted to come as rain, and snow melt-off will occur earlier in spring (Hayhoe et al. 2004, Safford et al. 2012). Presently a severe drought is occurring in the Sierra Nevada, with 10 to 30 percent of average precipitation and very little snow accumulation. Whether this is climate driven, and thus will become more of the future normal, remains to be seen. \r\n\r\nHigh elevation areas with suitable soils and landforms for the upward migration of mountain hemlock and whitebark pine will be important for the sustainability of this community. However, in this region of the central Sierra Nevada, whitebark pine already occurs at the uppermost elevations of the highest mountains in the area, so has little room to move upslope. The southern Sierra Nevada, with its higher mountain peaks, may prove to be an important refugium for this species. Mountain hemlock has more room to migrate as it occurs further to the north, and at lower elevations than whitebark pine. The southern Sierra Nevada is typically too dry for extensive mountain hemlock forest. \r\n\r\nThe historic temperature range for this ecological site is between 34 to 37 degrees F. With moderate warming on these northern aspects California red fir (Abies magnifica) is the most likely conifer to move into the area occupied by this ecological site. \r\n\r\nThe reference state consists of the most successionally advanced community phase (numbered 1.1) as well as other community phases that result from natural and human disturbances. Community phase 1.1 is deemed the phase representative of the most successionally advanced pre-European plant/animal community including periodic natural surface fires that influenced its composition and production. This phase is determined from the oldest modern day remnant forests and/or historic literature. 
\r\n\r\nAll tabular data listed for a specific community phase within this ecological site description represent a summary of one or more field data collection plots taken in communities within the community phase. Although such data are valuable in understanding the phase (kinds and amounts of ground and surface materials, canopy characteristics, community phase overstory and understory species, production and composition, and growth), it typically does not represent the absolute range of characteristics nor an exhaustive listing of species for all the dynamic communities within each specific community phase."
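
The same lapply/Map pattern can be tried without any network access. The following is only a minimal sketch: toy_list, toy_fns and toy_out are hypothetical stand-ins for my_list, Fns and out, and the three toy functions replace the fromJSON() extractors from the question.

```r
# Toy nested list; every inner list has its elements in the same order
toy_list <- list(
  id1 = list(a = "3",  b = "hello", c = "1,2,3"),
  id2 = list(a = "10", b = "world", c = "4,5")
)

# One function per inner position (hypothetical, for illustration only)
toy_fns <- list(
  function(x) as.numeric(x) * 2,                           # position 1
  function(x) toupper(x),                                  # position 2
  function(x) length(strsplit(x, ",", fixed = TRUE)[[1]])  # position 3
)

toy_out <- lapply(toy_list, function(x) {
  # guard against a mismatch between functions and inner elements
  stopifnot(length(x) == length(toy_fns))
  # Map pairs the i-th function with the i-th element; x[] <- keeps the names
  x[] <- Map(function(fn, y) fn(y), toy_fns, x)
  x
})

str(toy_out)
```

The length check is an addition that is not in the answer above; it makes the code fail early if an inner list has more or fewer elements than there are functions, instead of silently recycling.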

Related

How to perform a Chi^2 Test with two data tables where x and y have different lengths

So I have the following tables (simplified here):
this is Ost_data
Raumeinheit
Langzeitarbeitslose
Hamburg
22
Koln
45
This is West_data
Raumeinheit
Langzeitarbeitslose
Hamburg
42
Koln
11
Ost_data has 76 rows and West_data has 324 rows.
I am tasked with proving my hypothesis that the variable "Langzeitarbeitslose" is statistically significantly higher in Ost_data than in West_data. Because that variable is not normally distributed, I am trying to use Pearson's chi-square test.
I tried
chisq.test(Ost_data$Langzeitarbeitslose, West_data$Langzeitarbeitslose)
but that just returns an error saying it can't be performed because x and y differ in length.
Is there a way to navigate around that problem and perform the Chi Square test regardless with my two tables which have varying lengths?
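Pearson's chi-square test, called as chisq.test(x, y), cross-tabulates x and y as paired observations, so it only applies when row i of both vectors refers to the same unit. It sounds like your rows are just measuring some quantity on two independent samples, so you should use a t-test, which does not require equal lengths.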
t.test(Ost_data$Langzeitarbeitslose, West_data$Langzeitarbeitslose)
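
Since the hypothesis in the question is directional ("higher in Ost_data"), a one-sided alternative may be what is wanted. Below is a rough sketch on simulated data; ost_sim and west_sim are made-up vectors that only mimic the sample sizes from the question (76 and 324), and the rank-based wilcox.test() is shown as one possible option if normality is a concern, not as the required method.

```r
set.seed(1)
# Simulated stand-ins for the two columns; the values are NOT real data
ost_sim  <- rnorm(76,  mean = 40, sd = 10)
west_sim <- rnorm(324, mean = 35, sd = 10)

# Welch two-sample t-test; the two vectors may have different lengths
res_t <- t.test(ost_sim, west_sim, alternative = "greater")

# Rank-based alternative that does not assume normality
res_w <- wilcox.test(ost_sim, west_sim, alternative = "greater")

res_t$p.value
res_w$p.value
```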
The most important aspect of your variable "Langzeitarbeitslose" (long-term unemployed) is not whether it is normally distributed but its scale level. I assume it is a dichotomous variable (either yes or no).
The t-test needs interval-scaled data.
The Wilcoxon test needs at least ordinal-scaled data.
The chi-square test works for nominal (and therefore also for dichotomous) data.
If you have both the number of long-term unemployed and the number of not long-term unemployed per city, you can compare the probability of being long-term unemployed in the east and in the west.
# Define these four counts from your data first:
# l_west: absolute number of long-term unemployed in the west
# l_ost:  absolute number of long-term unemployed in the east
# n_west: absolute number of observed people (unemployed or not) in the west
# n_ost:  absolute number of observed people (unemployed or not) in the east
N <- n_west + n_ost  # absolute number of observations
chisq.test(c(l_west, l_ost), p = c(n_west / N, n_ost / N))
# this tests whether the relative frequency of unemployment in the east (l_ost / n_ost)
# differs from the equivalent relative frequency in the west (l_west / n_west)
# while considering the absolute number of observations in east and west
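
The same comparison can also be written as a 2 x 2 contingency table, which is another common way to call chisq.test(). This is only a sketch: the four counts below are invented for illustration and do not come from the question, and prop.test() with alternative = "greater" is shown as one possible way to test the directional hypothesis (east higher than west).

```r
# Hypothetical counts, for illustration only
l_ost  <- 90;  n_ost  <- 1000   # long-term unemployed / all observed, east
l_west <- 120; n_west <- 2000   # long-term unemployed / all observed, west

# Rows: region; columns: long-term unemployed vs. everyone else
tab <- matrix(
  c(l_ost,  n_ost  - l_ost,
    l_west, n_west - l_west),
  nrow = 2, byrow = TRUE,
  dimnames = list(region = c("Ost", "West"),
                  status = c("long_term_unemployed", "other"))
)

res_chi  <- chisq.test(tab)   # two-sided test of independence
res_prop <- prop.test(c(l_ost, l_west), c(n_ost, n_west),
                      alternative = "greater")  # H1: proportion east > west

res_chi$p.value
res_prop$p.value
```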
I know the words Ost (east), West, and Langzeitarbeitslose (unemployed). I know that Hamburg and Köln are in the west and not in the east (therefore they should not appear in your "Ost_data"). Somebody who does not know this cannot help you. --> Bear this in mind in the future.
Best,
ajj

Why am I unable to load "Groceries" data set in R?

I am unable to load the Groceries data set in R.
Can anyone help?
> data()
Data sets in package ‘datasets’:
AirPassengers Monthly Airline Passenger Numbers 1949-1960
BJsales Sales Data with Leading Indicator
BJsales.lead (BJsales) Sales Data with Leading Indicator
BOD Biochemical Oxygen Demand
CO2 Carbon Dioxide Uptake in Grass Plants
ChickWeight Weight versus age of chicks on different diets
DNase Elisa assay of DNase
EuStockMarkets Daily Closing Prices of Major European Stock Indices,
1991-1998
Formaldehyde Determination of Formaldehyde
HairEyeColor Hair and Eye Color of Statistics Students
Harman23.cor Harman Example 2.3
Harman74.cor Harman Example 7.4
Indometh Pharmacokinetics of Indomethacin
InsectSprays Effectiveness of Insect Sprays
JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
LakeHuron Level of Lake Huron 1875-1972
LifeCycleSavings Intercountry Life-Cycle Savings Data
Loblolly Growth of Loblolly pine trees
Nile Flow of the River Nile
Orange Growth of Orange Trees
OrchardSprays Potency of Orchard Sprays
PlantGrowth Results from an Experiment on Plant Growth
Puromycin Reaction Velocity of an Enzymatic Reaction
Seatbelts Road Casualties in Great Britain 1969-84
Theoph Pharmacokinetics of Theophylline
Titanic Survival of passengers on the Titanic
ToothGrowth The Effect of Vitamin C on Tooth Growth in Guinea Pigs
UCBAdmissions Student Admissions at UC Berkeley
UKDriverDeaths Road Casualties in Great Britain 1969-84
UKgas UK Quarterly Gas Consumption
USAccDeaths Accidental Deaths in the US 1973-1978
USArrests Violent Crime Rates by US State
USJudgeRatings Lawyers' Ratings of State Judges in the US Superior Court
USPersonalExpenditure Personal Expenditure Data
UScitiesD Distances Between European Cities and Between US Cities
VADeaths Death Rates in Virginia (1940)
WWWusage Internet Usage per Minute
WorldPhones The World's Telephones
ability.cov Ability and Intelligence Tests
airmiles Passenger Miles on Commercial US Airlines, 1937-1960
airquality New York Air Quality Measurements
anscombe Anscombe's Quartet of 'Identical' Simple Linear
Regressions
attenu The Joyner-Boore Attenuation Data
attitude The Chatterjee-Price Attitude Data
austres Quarterly Time Series of the Number of Australian
Residents
beaver1 (beavers) Body Temperature Series of Two Beavers
beaver2 (beavers) Body Temperature Series of Two Beavers
cars Speed and Stopping Distances of Cars
chickwts Chicken Weights by Feed Type
co2 Mauna Loa Atmospheric CO2 Concentration
crimtab Student's 3000 Criminals Data
discoveries Yearly Numbers of Important Discoveries
esoph Smoking, Alcohol and (O)esophageal Cancer
euro Conversion Rates of Euro Currencies
euro.cross (euro) Conversion Rates of Euro Currencies
eurodist Distances Between European Cities and Between US Cities
faithful Old Faithful Geyser Data
fdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
freeny Freeny's Revenue Data
freeny.x (freeny) Freeny's Revenue Data
freeny.y (freeny) Freeny's Revenue Data
infert Infertility after Spontaneous and Induced Abortion
iris Edgar Anderson's Iris Data
iris3 Edgar Anderson's Iris Data
islands Areas of the World's Major Landmasses
ldeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
lh Luteinizing Hormone in Blood Samples
longley Longley's Economic Regression Data
lynx Annual Canadian Lynx trappings 1821-1934
mdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
morley Michelson Speed of Light Data
mtcars Motor Trend Car Road Tests
nhtemp Average Yearly Temperatures in New Haven
nottem Average Monthly Temperatures at Nottingham, 1920-1939
npk Classical N, P, K Factorial Experiment
occupationalStatus Occupational Status of Fathers and their Sons
precip Annual Precipitation in US Cities
presidents Quarterly Approval Ratings of US Presidents
pressure Vapor Pressure of Mercury as a Function of Temperature
quakes Locations of Earthquakes off Fiji
randu Random Numbers from Congruential Generator RANDU
rivers Lengths of Major North American Rivers
rock Measurements on Petroleum Rock Samples
sleep Student's Sleep Data
stack.loss (stackloss) Brownlee's Stack Loss Plant Data
stack.x (stackloss) Brownlee's Stack Loss Plant Data
stackloss Brownlee's Stack Loss Plant Data
state.abb (state) US State Facts and Figures
state.area (state) US State Facts and Figures
state.center (state) US State Facts and Figures
state.division (state) US State Facts and Figures
state.name (state) US State Facts and Figures
state.region (state) US State Facts and Figures
state.x77 (state) US State Facts and Figures
sunspot.month Monthly Sunspot Data, from 1749 to "Present"
sunspot.year Yearly Sunspot Data, 1700-1988
sunspots Monthly Sunspot Numbers, 1749-1983
swiss Swiss Fertility and Socioeconomic Indicators (1888) Data
treering Yearly Treering Data, -6000-1979
trees Diameter, Height and Volume for Black Cherry Trees
uspop Populations Recorded by the US Census
volcano Topographic Information on Auckland's Maunga Whau Volcano
warpbreaks The Number of Breaks in Yarn during Weaving
women Average Heights and Weights for American Women
Use ‘data(package = .packages(all.available = TRUE))’
to list the data sets in all *available* packages.
> head(Groceries)
Error in head(Groceries) : object 'Groceries' not found
> groceries <- data(Groceries)
Warning message:
In data(Groceries) : data set ‘Groceries’ not found
> library(datasets)
> groceries <- data(Groceries)
Warning message:
In data(Groceries) : data set ‘Groceries’ not found
>
Groceries is in the arules package.
install.packages("arules")
library(arules)
data(Groceries)
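One detail in the transcript above may cause confusion later: as far as I know, data() returns the name of the data set as a character string rather than the data itself, so groceries <- data(Groceries) would not store the transactions in groceries. A small sketch, using the built-in mtcars as a stand-in and guarding the arules part in case the package is not installed:

```r
# data() loads the object into the workspace and returns only its name
loaded <- data(mtcars)   # built-in data set from 'datasets', used as a stand-in
print(loaded)            # a character vector holding the name
print(class(mtcars))     # the data itself is available under its own name

# Same pattern for Groceries; skipped if arules is not installed
if (requireNamespace("arules", quietly = TRUE)) {
  data("Groceries", package = "arules")
  print(class(Groceries))
}
```

So after data(Groceries), work with the object Groceries directly instead of the value returned by data().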

How do I extract all the text in a bibliography that is within quotation marks in R?

I need to extract the journal titles from a bibliography list. The titles are all within quotation marks.
So is there a way to ask R to extract all text that is within quotation marks?
I have read the list into R as a text file:
data <- readLines("Publications _ CCDM.txt")
Here are a few lines from the list:
Andronis, C.E., Hane, J., Bringans, S., Hardy, G., Jacques, S., Lipscombe, R., Tan, K-C. (2020). “Gene validation and remodelling using proteogenomics of Phytophthora cinnamomi, the causal agent of Dieback.” bioRxiv. DOI: https://doi.org/10.1101/2020.10.25.354530
Beccari, G., Prodi, A., Senatore, M.T., Balmas, V,. Tini, F., Onofri, A., Pedini, L., Sulyok, M,. Brocca, L., Covarelli, L. (2020). “Cultivation Area Affects the Presence of Fungal Communities and Secondary Metabolites in Italian Durum Wheat Grains.” Toxins https://www.mdpi.com/2072-6651/12/2/97
Corsi, B., Percvial-Alwyn, L., Downie, R.C., Venturini, L., Iagallo, E.M., Campos Mantello, C., McCormick-Barnes, C., See, P.T., Oliver, R.P., Moffat, C.S., Cockram, J. “Genetic analysis of wheat sensitivity to the ToxB fungal effector from Pyrenophora tritici-repentis, the causal agent of tan spot” Theoretical and Applied Genetics. https://doi.org/10.1007/s00122-019-03517-8
Derbyshire, M.C., (2020) Bioinformatic Detection of Positive Selection Pressure in Plant Pathogens: The Neutral Theory of Molecular Sequence Evolution in Action. (2020) Frontiers in Microbiology. https://doi.org/10.3389/fmicb.2020.00644
Dodhia, K.N., Cox, B.A., Oliver, R.P., Lopez-Ruiz, F.J. (2020). “When time really is money: in situ quantification of the strobilurin resistance mutation G143A in the wheat pathogen Blumeria graminis f. sp. tritici.” bioRxiv, doi: https://doi.org/10.1101/2020.08.20.258921
Graham-Taylor, C., Kamphuis, L.G., Derbyshire, M.C. (2020). “A detailed in silico analysis of secondary metabolite biosynthesis clusters in the genome of the broad host range plant pathogenic fungus Sclerotinia sclerotiorum.” BMC Genomics https://doi.org/10.1186/s12864-019-6424-4
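Try something like this, where x is the text you read in (the object called data in your question):
library(stringr)
# unlist() collects the matches from every line into one character vector
str_extract_all(x, "“.*?”") %>% unlist()
If you want to remove the quotation marks from the result, add this step at the end of the pipeline (after another %>%):
str_remove_all("[“”]")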
Output:
[1] "Gene validation and remodelling using proteogenomics of Phytophthora cinnamomi, the causal agent of Dieback."
[2] "Cultivation Area Affects the Presence of Fungal Communities and Secondary Metabolites in Italian Durum Wheat Grains."
[3] "Genetic analysis of wheat sensitivity to the ToxB fungal effector from Pyrenophora tritici-repentis, the causal agent of tan spot"
[4] "When time really is money: in situ quantification of the strobilurin resistance mutation G143A in the wheat pathogen Blumeria graminis f. sp. tritici."
[5] "A detailed in silico analysis of secondary metabolite biosynthesis clusters in the genome of the broad host range plant pathogenic fungus Sclerotinia sclerotiorum."
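For comparison, the same idea can be sketched in base R without stringr. This is only an illustration: the three sample entries are made up, and the curly quotes are written as Unicode escapes so the script does not depend on the file encoding.

```r
# Curly quotation marks as Unicode escapes (left and right double quote)
open_q  <- "\u201c"
close_q <- "\u201d"

# Made-up sample entries standing in for the lines returned by readLines()
x <- c(
  paste0("Author, A. (2020). ", open_q, "First example title.", close_q, " Journal A."),
  paste0("Author, B. ", open_q, "Second example title", close_q, " Journal B."),
  "Author, C. (2020) An entry without quotation marks. Journal C."
)

# Lazy match from an opening quote to the nearest closing quote
pattern <- paste0(open_q, ".*?", close_q)
quoted  <- unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))

# Strip the quotation marks themselves
titles <- gsub(paste0(open_q, "|", close_q), "", quoted)
print(titles)
```

Entries whose title is not wrapped in quotation marks (like the Derbyshire entry in the question) produce no match, which is why the output has fewer elements than the input has lines.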

Using Regular Expression in R

Hi I am trying to extract a single sentence from a paragraph in R
"[report_beginning]
101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center
_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING.
ALLERGIES:  none
SOCIAL HISTORY:  The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound.
PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.  
EMERGENCY DEPARTMENT COURSE:  I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details.
DISPOSITION:  The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication.
And so this is one cell. I have a column full of data like this and I want to extract a single line. "PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air."
How can I do this with Regular expression in R?
I have been using the following code but it doesn't work. It gives me an empty dataset
x=grep("Blood pressure .+ air. ", ed_dia, value = TRUE)
I'm assuming that "[report_beginning]" is not actually in the data file, so opening a text connection to read the text should succeed:
txt <- "101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center
_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING.
ALLERGIES:  none
SOCIAL HISTORY:  The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound.
PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.
EMERGENCY DEPARTMENT COURSE:  I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details.
DISPOSITION:  The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication. "
inp <- readLines( textConnection(txt))
So after data input it only remains to use grep to identify the lines with "PHYSICAL EXAMINATION" in them (I wasn't sure whether the space needed special regex handling, hence the [ ]) and then use "[" to extract them from the multiple lines:
inp[ grep("PHYSICAL[ ]EXAMINATION", inp)]
#[1] "PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress."
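The grep above returns the whole line, while the question asks only for the sentence ending in "room air.". A possible follow-up step, sketched here on a shortened stand-in for inp and assuming every report ends the vital signs with "room air." (an assumption taken from the pattern in the question, not something checked against the full data):

```r
# Shortened stand-in for the lines read from one report
inp <- c(
  "ALLERGIES:  none",
  "PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.",
  "DISPOSITION:  The patient was admitted to the hospital."
)

# Step 1: keep only the lines that mention the physical examination
hits <- grep("PHYSICAL EXAMINATION", inp, value = TRUE, fixed = TRUE)

# Step 2: within those lines, keep the text up to the first "room air."
# (.*? is a lazy match, so it stops at the earliest "room air.")
sentence <- regmatches(hits, regexpr("PHYSICAL EXAMINATION:.*?room air\\.", hits, perl = TRUE))
print(sentence)
```

regmatches() with regexpr() returns one match per line that contains the pattern, so lines without it are dropped rather than returned as empty strings.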

Association between point distribution and a continuous variable using R

I have a data set consisting of the location of trees and measurements of soil C (SOC). All points (trees and SOC) have x(0,50), y(0,50) coordinates. First, I would like to check to see if the proximity of trees (points) influences the amount of SOC (continuous variable). Second, I’d like to do the same analysis using only a subset of the trees (say, only trees with a diameter of >20 cm (dbh)). Can this be done in R using package 'spatstat' or 'ads'? I’ve looked around, but haven’t been able to find any solution to this problem yet. Any pointers would be greatly appreciated!
Example from (Simon et al. 2013): http://postimg.org/image/goks26xr5/
Data:
library(spatstat)
soc<-data.frame(x=c(0,5,5,5,5,5,10,10,10,10,10,10,10,15,15,15,15,15.1,15.9,15,15,15,20,20,20,20,21,20,20,20,25,25,25,25,23,25,25,25,25,30,30,31.5,30,33,30,30,30,30,35,35,35,35,35,35,35,35,35,40,40,40,40,40,40,40,45,45,45,45,45,50),
y=c(25,35,30,25,20,15,40,35.2,30,25,20,15,10,45,40,35,30,25,20,15,10,5,45,40,35,30,25,20,15,10,5,50,45,40,35,30,25,20,15,10,5,0,45,40,35,30,25,20,15,10,5,40,35,30,25,20,15.5,40,35,30,25,20,15,10,35,30,25,20,15,25),
zsoc=c(2,3,4,5,6,1,2,3,4,5,2,3,4,5,3,5,6,3,4,5,3,4,5,6,8,3,4,1,3,2,5,3,2,4,6,2,4,1,1,1,1,1,1,2,3,4,1,2,3,8,1.5,2,3,4,2.3,4,5,3,4,5,6,7,8,2,1,1,1,1,1,2))
tree<-data.frame(x=c(24,18,11,9,7,6,11,11,15,13,15,22,27,29,22,20,27,28,36,34,33,32,33,42,47,47,46,46,46,43,41,35,36,37,35,35,35,34,34,33,34,34,34,33,31,29,30,29,29),
y=c(28.8,31.2,32.0,24.0,18.4,17.6,13.1,11.9,11.1,5.8,3.6,1.5,8.3,13.3,15.7,17.3,19.0,19.1,14.4,10.8,6.1,4.9,2.7,2.7,11.3,11.8,12.3,10.1,19.9,24.4,23.0,25.6,31.0,34.6,36.5,36.9,36.8,38.4,35.6,37.0,39.6,39.5,41.6,41.8,39.7,41.1,35.9,35.8, 35.0),
zdbh=c(15,49,53,53,43,32,34,46,50,32,56,32,48,42,53,52,34,47,39,48,38,36,17,33,25,21,10,11,50,36,47,50,47,12,7,8,6,6,9,16,23,8,8,21,6,10,6,21,11))
soc <- ppp(soc[,1], soc[,2], c(0,50), c(0,50), marks=soc[3], unitname=c("meter"))
tree <- ppp(tree[,1], tree[,2], c(0,50), c(0,50), marks=tree[3], unitname=c("meter"))
Hope this works!
Example reference: Simón, N., Montes, F., Díaz-Pinés, E., Benavides, R., Roig, S., Rubio, A., 2013. Spatial distribution of the soil organic carbon pool in a Holm oak dehesa in Spain. Plant and Soil 366(1-2), 537-549.

Resources