Written by Hilary Kraus and Zahra Premji
Why is proximity searching valuable?
In systematic searching, there is an inherent tension between sensitivity and precision. According to the Cochrane Handbook, "Searches for systematic reviews aim to be as extensive as possible in order to ensure that as many of the relevant studies as possible are included in the review. It is, however, necessary to strike a balance between striving for comprehensiveness and maintaining relevance when developing a search strategy." (Chapter 4, Section 4.4.3: Sensitivity versus precision)
One strategy for achieving this balance is the use of proximity operators. As explained in the Cochrane Handbook's Technical Supplement to Chapter 4, "Use of proximity operators helps to ensure that searches are more sensitive than would be the case with direct adjacency or phrase searching, and can also facilitate ease of searching where there are multiple possible variations of a phrase which would otherwise need to be typed in full." (Section 3.5: Proximity operators (NEAR, NEXT, and ADJ), pages 68-69)
The corollary to Cochrane’s point about proximity increasing sensitivity is that it can also increase precision. Requiring search strings to be near one another is less precise than phrase searching but more precise than simply combining the search strings with AND. Thus proximity can contribute in multiple ways to effective systematic searches.
Proximity searching 101
What does a proximity operator do?
Proximity operators require a permissible maximum distance (a number of words, defined as x in this post) between two search strings (in this post, string A and string B). Instead of a phrase ("mobile applications") or a Boolean AND (mobile AND applications), proximity operators indicate that mobile and applications may be near, but not necessarily next to, one another. Most database platforms offer some form of proximity searching.
What operators are available and how are they executed?
A platform may have multiple operators that execute different types of proximity. Two common examples are fixed proximity (string A only before string B) and flexible proximity (string A before string B or string B before string A). EBSCOhost, for example, uses Wx for fixed proximity (mobile W2 applications) and Nx (mobile N2 applications) for flexible proximity.
Platforms may offer the same proximity operator functionality but use different syntax to execute it. In the Cochrane Library, NEAR/x (mobile NEAR/3 applications) is used for flexible proximity, while in Scopus the syntax used is W/x (mobile W/3 applications), but the two operators execute the same function. Platforms may lack certain options altogether. Ovid does not offer a fixed proximity operator that is exactly equivalent in functionality to EBSCOhost's Wx.
How is the distance between strings indicated?
In most, but not all, cases, operators are used in conjunction with an integer, represented in our examples by x, that defines the maximum number of words between strings.
In most databases covered in this post, x is equivalent to that number. In EBSCOhost (both classic and the new user interface), Nx is used for strings near one another in flexible order. So mobile N3 applications finds mobile next to applications, or with one word in between, with two words in between, or with three words in between; applications can be either before mobile or after it, as long as there are no more than three words between them. Thus N3 = a maximum distance of three words between strings.
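The N3 behavior described above can be illustrated with a toy matcher. This is only a sketch of the counting logic, not how EBSCOhost or any other platform is implemented; the function name `proximity_match` and its parameters are made up for illustration, and real platforms apply their own tokenization rules.

```python
def proximity_match(text, a, b, max_between, ordered=False):
    """Return True if words a and b occur with at most max_between words between them.

    ordered=False mimics a flexible operator such as Nx (either order is fine);
    ordered=True mimics a fixed operator such as Wx (a must come before b).
    Every word counts toward the distance, including stop words.
    """
    words = text.lower().split()
    positions_a = [i for i, word in enumerate(words) if word == a]
    positions_b = [i for i, word in enumerate(words) if word == b]
    for i in positions_a:
        for j in positions_b:
            if i == j:
                continue  # same token; only possible when a == b
            if ordered and j < i:
                continue  # fixed order: b must follow a
            if abs(j - i) - 1 <= max_between:
                return True
    return False


# "mobile health tracking applications" has two words between the strings,
# so a maximum distance of 3 matches and a maximum distance of 1 does not.
print(proximity_match("mobile health tracking applications", "mobile", "applications", 3))
print(proximity_match("mobile health tracking applications", "mobile", "applications", 1))
```

Passing `ordered=True` to the same call shows the difference between flexible and fixed proximity: "applications for mobile devices" matches a flexible search for mobile and applications, but not a fixed one.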
There are three outliers in this post: the Cochrane Library (Wiley), Embase.com, and Ovid. In these databases, the x is implemented as x-1: an operator value of x allows a maximum of x-1 words between strings. In Ovid, ADJx is used for strings near one another in flexible order. mobile ADJ4 applications performs the same function as EBSCOhost's N3, but the integer x must be one number greater.
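When translating a search, the off-by-one difference is easy to get wrong, so it can help to write it down once. The sketch below turns "maximum number of words between strings" into the flexible-order operator for each platform. The table and the function name `flexible_operator` are our own illustration; the operators and offsets simply restate what this post describes, so check them against each platform's documentation before relying on them.

```python
# Flexible-order operator templates, taken from the examples in this post.
# The second value is what must be added to the maximum number of
# intervening words to get the integer the platform expects.
FLEXIBLE_OPERATORS = {
    "EBSCOhost": ("N{x}", 0),
    "ProQuest": ("NEAR/{x}", 0),
    "Scopus": ("W/{x}", 0),
    "Web of Science": ("NEAR/{x}", 0),
    "Cochrane Library": ("NEAR/{x}", 1),
    "Embase.com": ("NEAR/{x}", 1),
    "Ovid": ("ADJ{x}", 1),
}


def flexible_operator(platform, max_words_between):
    """Return the flexible proximity operator allowing max_words_between words."""
    template, offset = FLEXIBLE_OPERATORS[platform]
    return template.format(x=max_words_between + offset)


for name in FLEXIBLE_OPERATORS:
    print(name, "->", "mobile", flexible_operator(name, 3), "applications")
```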
Some databases have a default value if x is not indicated (the default number varies), while others treat a proximity operator without a stated x value as a simple text word search.
Where do the proximity operators go?
Proximity operators, like Boolean operators, are typically placed between search strings. In Web of Science, the operator goes between the words. A search of the title field for mobile and applications, with a flexible proximity of a maximum of 5 words, would be structured as: TI=(mobile NEAR/5 applications)
The less common placement is one in which the proximity operator comes after the search strings. PubMed is one of the few databases that uses this method. For an equivalent proximity search to the example above, the PubMed search would look like this: "mobile applications"[ti:~5]
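The two placements can be compared side by side with a pair of small string builders. This is a sketch for illustration only; the function names are hypothetical, and the output formats are copied from the Web of Science and PubMed examples above.

```python
def infix_proximity(field, a, b, x):
    """Web of Science style: the operator sits between the two strings."""
    return f"{field}=({a} NEAR/{x} {b})"


def postfix_proximity(field, words, n):
    """PubMed style: all words share one quoted string and the operator follows."""
    joined = " ".join(words)
    return f'"{joined}"[{field}:~{n}]'


print(infix_proximity("TI", "mobile", "applications", 5))
print(postfix_proximity("ti", ["mobile", "applications"], 5))
```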
Can we get down to the nitty gritty now?
Definitely! Following are examples of how to execute proximity searches in several commonly used databases. Each database name links to its relevant search documentation.
Two important notes:
- Unless otherwise stated, we use x to represent the numerical value applied to the proximity operator.
- In most databases, the operators are not case sensitive, so uppercase ADJ and lowercase adj are equivalent; we have used uppercase letters (e.g., ADJ) to make the operator stand out better. The only database that requires proximity operators to be in uppercase is Scopus.
Cochrane Library
NEAR/x - string order is flexible; default distance is within 6 words, so NEAR with no /x = NEAR/6; NEAR/0 is an invalid search.
mobile NEAR/4 applications
When an x of one – NEAR/1 – is used, the two strings must be next to one another with zero words in between, but order remains flexible.
NEXT - string order is fixed. No /x is permitted, and zero words may appear in between strings. Thus it can only perform a search for string A string B. However, using NEXT allows the use of truncation or wildcards, which Cochrane’s traditional phrase searching ("mobile applications") does not.
Both NEXT and NEAR can be used with truncation, with phrases, and in nested constructions:
mobile NEAR/3 (phone* OR device*)
(mobile OR "smart phone") NEXT (application* OR device*)
Database documentation does not clarify whether stop words are counted when determining distance between strings in proximity searches.
EBSCOhost
Nx - N stands for Near; string order is flexible.
When an x of zero – N0 – is used, the two strings must be next to one another with zero words in between, but order remains flexible.
Wx - W stands for Within; string order is fixed.
When an x of zero – W0 – is used, the two strings must be next to one another with zero words in between, and string A and string B must appear in fixed order. This is useful in cases where one or both strings contain multiple terms connected by OR.
iphone W0 (application* OR software OR "app store")
Both Nx and Wx can be used with truncation, with phrases, and in nested constructions:
mobile N3 (phone* OR device*)
(mobile OR "smart phone") W3 (application* OR device*)
Stop words are counted when determining distance between strings in proximity searches.
Embase.com
NEAR/x - string order is flexible; x must be at least 1. If x is not included (e.g., NEAR), the operator will be searched as a text word.
mobile NEAR/4 applications
When NEAR/1 is used, the two strings must be next to one another with zero words in between, but order remains flexible.
NEXT/x - string order is fixed; x must be at least 1.
mobile NEXT/4 applications
When NEXT/1 is used, the two strings must be next to one another with zero words in between, still in fixed order.
Both NEAR/x and NEXT/x can be used with truncation, with phrases, and in nested constructions:
mobile NEAR/4 (phone* OR device*)
(mobile OR "smart phone") NEXT/4 (application* OR device*)
Database documentation does not clarify whether stop words are counted when determining distance between strings in proximity searches.
Lens.org
We include Lens.org here because it's the only open multidisciplinary source that offers a proximity operator.
Lens is one of the two platforms in our list that places the proximity operator at the end, rather than between strings. (PubMed is the other.) As such, it offers less flexibility, since it removes the possibility of phrase searching. All text within the proximal search is treated as a set of individual words.
The syntax for proximity searches is: "search terms"~x
A detailed breakdown of the syntax follows:
"search terms" - order of words enclosed in quotation marks may be fixed or flexible depending on the value of x; two or more words are permitted; no phrase searching is possible; Lens automatically stems at the root of a word (unless stemming is turned off in the 'Query tools' tab).
x - maximum number of words appearing between search terms.
The minimum value of x is zero. When ~0 is used, the words enclosed in quotation marks must be next to one another with zero words in between, and the order is fixed. ~0 operates like a phrase search and is equivalent to using no proximity operator. Thus, these two searches are equivalent in Lens:
~ is a flexible proximity operator only if x ≥ 2+2(n-2), where n is the number of words enclosed in quotation marks. For example, both of these expressions have flexible proximity:
"mobile health applications"~4
"smart phone health applications"~6
~ is a fixed proximity operator if x<2+2(n-2), where n is the number of words enclosed in quotation marks. These expressions have fixed proximity:
"mobile health applications"~3
"smart phone health applications"~5
Stop words are ignored when determining distance between strings in proximity searches.
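The flexible-versus-fixed rule for ~ comes down to a little arithmetic, which can be checked with a short helper. This is a sketch that only restates the threshold x ≥ 2+2(n-2) given above; the function names are ours, and it does not query Lens or reproduce any of its internals.

```python
def lens_flexible_threshold(n_words):
    """Smallest x for which ~x is flexible, per the rule x >= 2 + 2(n - 2)."""
    if n_words < 2:
        raise ValueError("a proximity search needs at least two words")
    return 2 + 2 * (n_words - 2)


def lens_order_is_flexible(search_terms, x):
    """True if "search_terms"~x falls on the flexible side of the rule."""
    n_words = len(search_terms.split())
    return x >= lens_flexible_threshold(n_words)


for terms, x in [
    ("mobile health applications", 4),
    ("mobile health applications", 3),
    ("smart phone health applications", 6),
    ("smart phone health applications", 5),
]:
    kind = "flexible" if lens_order_is_flexible(terms, x) else "fixed"
    print(f'"{terms}"~{x} -> {kind}')
```

By the same rule, a two-word search needs an x of at least 2 before word order becomes flexible.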
Special considerations:
We have concerns about the consistency of the flexible proximity operator in the Lens. We saw differences in retrieval when searching for the same two words in reverse order, despite applying the correct x value. For this reason, we urge caution when using flexible proximity here.
Ovid
Please note: while ADJx and ADJ are syntactically similar, they are explicitly defined as two different proximity operators by Ovid's documentation, so we have followed suit.
ADJx - ADJ stands for Adjacency, in this case what Ovid calls "defined adjacency"; string order is flexible; x must be at least 1.
When ADJ1 is used, the two strings must be next to one another with zero words in between, but order remains flexible.
Ovid does not offer a fixed order proximity operator that allows a distance of 1 or more words.
ADJ - ADJ stands for Adjacency; string order is fixed. No x is permitted, and zero words may appear in between strings. Thus it can only perform a search for string A string B. Ovid doesn't require "quotation marks" for phrases, as it automatically inserts an understood ADJ within a string, but ADJ can be used to execute proximity between nested strings.
mobile ADJ applications is equivalent to mobile applications
Because an x value in Ovid corresponds to x-1 on most other platforms, the following two proximity constructions are equivalent:
Ovid: mobile ADJ applications
EBSCOhost: "mobile" W0 "applications"
Both ADJx and ADJ can be used with truncation, with phrases, and in nested constructions:
mobile ADJ4 (phone* OR device*)
(mobile OR "smart phone") ADJ (application* OR device*)
ProQuest
NEAR/x - string order is flexible; default distance is within 4 words, so NEAR or NEAR/ with no x = NEAR/4. N/x also works, but requires an x value; N and N/ are both invalid.
mobile NEAR/3 applications
PRE/x - string order is fixed; default distance is within 4 words, so PRE or PRE/ with no x = PRE/4. P/x also works, but requires an x value; P and P/ are both invalid.
mobile PRE/3 applications
To search for near or pre as text word rather than as a proximity operator, enclose it in quotation marks: "near" or "pre"
Both NEAR (or NEAR/ or N/x) and PRE (or PRE/ or P/x) can be used with truncation, with phrases, and in nested constructions:
mobile NEAR/3 (phone* OR device*)
(mobile OR "smart phone") PRE/3 (application* OR device*)
Database documentation states, "There are no stop words within the ProQuest platform. However, the natural language processing used by the search engine will naturally filter out certain 'overabundant' words as being irrelevant." Based on that, it seems unlikely that stop words will impact proximity searches.
PubMed
PubMed is one of the two platforms in our list that places the proximity operator at the end, rather than between strings. (Lens is the other.) As such, it offers less flexibility, since it removes the possibility of phrase searching. All text within the proximal search is treated as a set of individual words.
The syntax for proximity searches in PubMed is, per their documentation: "search terms"[field:~N]
Using the conventions from the rest of this post, the equivalent would be something like "strings"[fieldcode:~x]. In this case we'll stick with PubMed's terminology for clarity's sake: the syntax is such an outlier that searchers will benefit from being able to consult the documentation without translating between the two notations.
A detailed breakdown of PubMed's syntax follows:
"search terms" - order of words enclosed in quotation marks is flexible; two or more words are permitted; no phrase searching is possible; no wildcards or truncation may be used. There is no fixed proximity search in PubMed.
N - maximum number of words appearing between search terms.
"mobile applications"[ti:~3]
When an N of zero – ~0 – is used, the two strings must be next to one another with zero words in between, but order remains flexible.
"mobile applications"[ti:~0]
This use of ~0 has an added benefit: it works around the inadequacies of phrase searching in PubMed (see Kate Saylor's Tip #24: PubMed's Phrase Index). In cases where a phrase is not yet in the phrase index, ~0 can be used to find the search terms directly next to one another.
In the [Affiliation] field only, an N of 1000 or less searches for the strings in the same affiliation, rather than all affiliations in the record.
"brown providence warren"[ad:~20]
Stop words within the quotation marks are treated as text words.
"mobile apps for exercise"[tiab:~4]
Stop words are also counted as part of N, so a search intended to find cancer of the bladder must have an N of at least 2. (Thanks to Tracy Shields for this tip and the example!)
"cancer bladder"[ti:~2]
Special considerations:
Because wildcards and truncation are not permitted in PubMed's proximity searching, a thorough search could require a lot of individual constructions. Paije Wilson has developed a technique to automate this process, which reduces the difficulty and tedium of trying to build such a search manually. You can use this Google Sheet, based on her work, which uses a formula to combine two lists of keywords, one each on the x and y axes, to autofill combinations of all the desired terms. The instructions for then combining the terms into a single string are for Notepad++ (PC), but should be adaptable to something like BBEdit (Mac).
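For searchers who would rather script this step, the same idea (pair every term in one list with every term in another) can be sketched in a few lines of Python. This is not the Google Sheet's formula or Paije Wilson's own method, just an analogous approach; the function name and the sample keyword lists are made up for illustration.

```python
from itertools import product


def pubmed_proximity_block(list_a, list_b, field, n):
    """Combine every term in list_a with every term in list_b.

    Each pair becomes one "a b"[field:~n] construction, and the
    constructions are joined with OR into a single search string.
    """
    clauses = [f'"{a} {b}"[{field}:~{n}]' for a, b in product(list_a, list_b)]
    return " OR ".join(clauses)


# Variants must be spelled out in full because truncation is not allowed.
devices = ["mobile", "smartphone"]
software = ["app", "apps", "application", "applications"]
print(pubmed_proximity_block(devices, software, "tiab", 3))
```

Two lists of 2 and 4 terms produce 8 constructions, which shows how quickly the manual version grows.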
Scopus
W/x - W stands for Within; string order is flexible; an x value is required.
When an x of zero – W/0 – is used, the two strings must be next to one another with zero words in between, and string order remains flexible.
PRE/x - PRE stands for Preceding; string order is fixed; an x value is required.
mobile PRE/3 applications
When an x of zero – PRE/0 – is used, the two strings must be next to one another with zero words in between, and string A and string B must appear in fixed order.
Both W/x and PRE/x can be used with truncation and wildcards, and in nested constructions.
mobile W/3 (phone* OR device*)
(mobile OR "smart phone") PRE/3 (application* OR device*)
Database documentation does not clarify whether stop words are counted when determining distance between strings in proximity searches.
Special considerations:
Scopus has some additional detailed and complicated rules about proximity.
Scopus differentiates between an expression, or individual nested combination of strings, and a search, which may include multiple expressions. Remember that as you consider the additional rules below.
*An expression can only contain a single type of operator with matching x values. For example, TITLE-ABS-KEY(mobile W/3 health W/3 application*) is valid. An invalid expression combining different operators would be: TITLE-ABS-KEY(mobile PRE/3 health W/3 application*) An invalid expression using the same operator but different x values would be: TITLE-ABS-KEY(mobile W/3 health W/6 application*)
Neither W/x nor PRE/x can be combined in an expression containing AND or AND NOT. For example, the expression TITLE-ABS-KEY(mobile W/3 (applications AND health)) is invalid.
However, expressions with different proximity operators or x values can appear in the same search. For example, TITLE-ABS-KEY((mobile PRE/3 applications) AND (exercis* W/5 track*)) is valid.
*Finally, Scopus searches differently for strings with no quotation marks (mobile applications AND health), with quotation marks ("mobile applications" AND health), and with curly brackets ({mobile applications} AND {health}). That's a whole other post on our blog: Kate Saylor's Tip #9: Scopus - Loose vs. Exact Phrases! The important factor when it comes to proximity searching is that Scopus's documentation says you can't mix strings that use {curly brackets} in the same expression as strings that use either "quotation marks" or no quotation marks. Thus, TITLE-ABS-KEY({mobile applications} W/25 {health}) is valid, but TITLE-ABS-KEY("mobile applications" W/25 {health}) is invalid.
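The documented rules about operator types, x values, and AND lend themselves to a quick pre-flight check. The sketch below is our own illustration, not an Elsevier tool: the function name is hypothetical, it expects one expression at a time rather than a whole search, it assumes Boolean AND is typed in uppercase, and it does not cover the curly bracket rule.

```python
import re

# Scopus requires proximity operators in uppercase, so matching is case sensitive.
PROXIMITY = re.compile(r"\b(W|PRE)/(\d+)")
# Assumption for this sketch: AND is written in uppercase. This also matches AND NOT.
BOOLEAN_AND = re.compile(r"\bAND\b")


def scopus_expression_problems(expression):
    """List the documented proximity rules that a single expression breaks."""
    problems = []
    operators = {(name, int(x)) for name, x in PROXIMITY.findall(expression)}
    if len({name for name, _ in operators}) > 1:
        problems.append("mixes W/x and PRE/x")
    if len({x for _, x in operators}) > 1:
        problems.append("uses different x values")
    if operators and BOOLEAN_AND.search(expression):
        problems.append("combines proximity with AND or AND NOT")
    return problems


for expression in [
    "mobile W/3 health W/3 application*",
    "mobile PRE/3 health W/3 application*",
    "mobile W/3 health W/6 application*",
    "mobile W/3 (applications AND health)",
]:
    print(expression, "->", scopus_expression_problems(expression) or "no documented problems")
```

A search that joins two expressions with AND, such as (mobile PRE/3 applications) AND (exercis* W/5 track*), should be checked one expression at a time, since the rules apply within an expression rather than across the whole search.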
Extra special considerations:
*Two of the above special considerations for Scopus have been starred. This is because test searches found that at least one expression documented as having invalid parameters would, in fact, run. We reached out to Elsevier and received the following response:
"The Scopus search engine is designed to be robust and may attempt to interpret or process more complex queries rather than returning errors. However, only the behaviors and results described in the official documentation are tested, supported, and guaranteed. For critical or systematic searches, we recommend following the documented search syntax to ensure your results are consistent and reproducible."
Web of Science
NEAR/x - string order is flexible; default distance is within 15 words, so NEAR with no /x = NEAR/15; a search field must be specified (cannot be used in an "All Fields" search).
mobile NEAR/3 applications
NEAR cannot be combined in a query with AND. For example, the search TI=(mobile NEAR (applications AND health)) is invalid.
NEAR can be used with truncation, with phrases, and in nested constructions:
mobile NEAR/3 (phone* OR device*)
SAME - used in the Address field (AD) only; returns results in which all strings appear in the same address.
AD=(brown SAME providence SAME warren)
To search the word near or same as a text word rather than as a proximity operator, enclose it in quotation marks: "near" or "same"
Final Thoughts
Translating proximity is tricky for a number of reasons. x may not mean the same thing on two different platforms. Proximity operators may appear in the middle or at the end of an expression. Fixed adjacency may or may not be supported. A systematic searcher can only do their best to make a search on one platform as close as possible to the one they execute on another platform and clearly document the differences.
This is so much information. Can you just lay it out in a table?
References
Lefebvre C, Glanville J, Briscoe S, Featherstone R, Littlewood A, Metzendorf M-I, Noel-Storr A, Paynter R, Rader T, Thomas J, Wieland LS. Chapter 4: Searching for and selecting studies [last updated March 2025]. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5.1. Cochrane, 2025. Available from cochrane.org/handbook.
Lefebvre C, Glanville J, Briscoe S, Featherstone R, Littlewood A, Metzendorf M-I, Noel-Storr A, Paynter R, Rader T, Thomas J, Wieland LS. Technical Supplement to Chapter 4: Searching for and selecting studies [last updated September 2024]. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from cochrane.org/handbook.