querying_resources.Rmd
The getting started page covers a relatively simple query that returns a single list of results. More complex queries may require a bit more thought.
NB: I won't show some of the actual data outputs, as they contain some hideously long ID strings.
What if we wanted to find projects linked to a specific institution? Let’s (completely randomly) say we’d like to take a look at the University of Exeter. To start with, we need to know the ID for Exeter to specify which institution we’re interested in when we send the request. First we need to know how we can potentially narrow down results to make searching easier.
configuration <- get_configs(resource = "organisation") # because we want details about an organisation
# now we want to see field details for that resource:
configuration$fields
#>          code       description searchable sortable searchedByDefault
#> 1       org.n Organisation Name       TRUE     TRUE              TRUE
#> 2   org.pro.t    Project Titles       TRUE    FALSE             FALSE
#> 3 org.orcidId          ORCID iD       TRUE    FALSE              TRUE
#> 4   org.pro.a Project Abstracts       TRUE    FALSE             FALSE
#> 5       score         Relevance      FALSE     TRUE             FALSE
Great - we can see from the above that the `org.n` field can be searched and that it corresponds to the organisation name. We can also see that `org.n` is searched by default, so we don't need to specify it using the `s_fields` argument.
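The same check can be done programmatically rather than by eye. The sketch below rebuilds the fields table by hand as a stand-in for `configuration$fields`, so it runs without calling the API; the variable names `fields`, `searchable_fields` and `extra_fields` are illustrative only and not part of the package.

```r
# Stand-in for configuration$fields, copied from the printed output
fields <- data.frame(
  code = c("org.n", "org.pro.t", "org.orcidId", "org.pro.a", "score"),
  description = c("Organisation Name", "Project Titles", "ORCID iD",
                  "Project Abstracts", "Relevance"),
  searchable = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  sortable = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  searchedByDefault = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Fields that can be searched at all
searchable_fields <- fields$code[fields$searchable]

# Searchable fields that are NOT searched by default
extra_fields <- fields$code[fields$searchable & !fields$searchedByDefault]
```

Here `extra_fields` holds `org.pro.t` and `org.pro.a`, which are presumably the fields that would have to be named via `s_fields` to be included in a search.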
data <- get_resources(
  resource = "organisation", # because we're looking for an organisation
  search_term = "Exeter" # to try and narrow down the field of results without being too specific
  # no more arguments as we don't want to narrow down the results any further
)
Luckily, here, the University is in the first few results. But it might not have been, and we got more than 1,200 rows of data. This could have been narrowed down by exploring the fields, e.g. using `dplyr::filter()` to search for 'University' in the `name` field. The important thing is that the `id` column gives us what we're after.
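As a minimal sketch of that narrowing step, the example below filters a small, hand-made stand-in for the search results. Only the University of Exeter ID comes from this page; the other rows and IDs are made up for illustration, and the real results will have more columns than `name` and `id`. Base R is used so the example runs without extra packages.

```r
# Hypothetical stand-in for the search results; only the University of Exeter
# ID is real, the other rows and IDs are made up for illustration
results <- data.frame(
  name = c("Exeter City Council", "University of Exeter", "Exeter Cathedral"),
  id = c("MADE-UP-ID-1", "961756BF-E31F-4A13-836F-0A09BA02385C", "MADE-UP-ID-2"),
  stringsAsFactors = FALSE
)

# Keep rows whose name mentions 'University'
# (dplyr equivalent: dplyr::filter(results, grepl("University", name)))
universities <- results[grepl("University", results$name, fixed = TRUE), ]

# The id column is what later requests need
exeter_id <- universities$id[universities$name == "University of Exeter"]
```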
To find projects we need to specify an `output` as well as a resource (if you're only interested in a single resource/type of entity, you don't need to specify the `output` or `resource_id` arguments).
# there are a LOT of projects related to Exeter, so this query could take a while to run
data <- get_resources(
  resource = "organisation", # because the University of Exeter is an organisation
  output = "project", # because we want to find projects associated with UoE
  resource_id = "961756BF-E31F-4A13-836F-0A09BA02385C", # we found this ID in the 'getting started' example
  page_nums = 1, # only specifying one page number so as to keep the number of results small
  size = 10 # as above
  # we won't worry about other arguments as we're happy to leave them blank on this occasion
)
And you have your results!
There are limitations to the above process. For example, you may have your list of projects, but Exeter is involved in a lot. If you’re only interested in a specific thing, this might be a lot of unnecessary data.
Of course, you could use other functions to filter/subset the dataframe you get to land on the bits you're interested in. But alternatively, you could use your results to run the projects back through `get_resources()`, this time applying a project-specific search term.
In some instances you may have no option but to take this approach, specifically if you want to see the outcomes of a set of projects. You can't directly look at 'outcomes from UoE projects'; you have to find the projects first, then get their outcomes in a separate step.
# get the project ids you're interested in
project_ids <- data$id
# now run them back through `get_resources()`
# we're using `purrr::safely()` as some projects may not have publications,
# in which case the url we're trying doesn't exist
# this way we will 'catch' the error rather than see the function fail
publications <- purrr::map(
  .x = project_ids,
  .f = purrr::safely(
    function(x) get_resources(resource = "project", output = "publication", resource_id = x)
  )
) |>
  purrr::map_df("result") # keep only the successful results, row-bound into one data frame
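Extracting only `"result"` keeps the successful requests, so the projects whose request failed disappear from the output. If it matters which ones those were, the same catch-and-continue idea can be sketched in base R with `tryCatch()`. In the example below, `fetch_publications()` and the project IDs are hypothetical stand-ins for the `get_resources()` call and real IDs, so that the code runs without network access.

```r
# Hypothetical stand-in for the get_resources() call: fails for one ID
fetch_publications <- function(project_id) {
  if (project_id == "no-pubs") stop("resource not found")
  data.frame(
    project_id = project_id,
    title = paste("Paper for", project_id),
    stringsAsFactors = FALSE
  )
}

project_ids <- c("proj-a", "no-pubs", "proj-b") # made-up IDs

# Same idea as purrr::safely(): return list(result, error) instead of failing
attempts <- lapply(project_ids, function(id) {
  tryCatch(
    list(result = fetch_publications(id), error = NULL),
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
})
names(attempts) <- project_ids

# IDs whose request failed
failed <- vapply(attempts, function(a) is.null(a$result), logical(1))
failed_ids <- names(attempts)[failed]

# Row-bind the successful results into one data frame
publications <- do.call(rbind, lapply(attempts[!failed], `[[`, "result"))
```

Keeping `failed_ids` around makes it possible to retry those projects later, or to confirm that they simply have no publications.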