What the OpenURL data are
The OpenURL data consist of user logs. They record what content a user requested within a single session. They do not contain any information that may identify an individual, and the openly available data contain no information that may identify an institution.
Data about requests not outcomes
All OpenURL requests made by end users via the Router at openurl.ac.uk are logged. In many cases they indicate which article(s) the user was attempting to find via their local OpenURL resolver. (This depends on which metadata the referring services include in their requests.)
The OpenURL Router redirects the user's request to the appropriate local resolver. This means that the OpenURL Router logs only requests – the outcome (i.e. whether the end user obtained a copy of the article, and from where) is unknown to the Router.
The OpenURL Router supports various types of requests other than links direct to local resolvers. These include the "lookup" requests (registry searches) and requests for the preferred button image to be used for each resolver. These requests are all logged, but they are not OpenURL requests and do not contain bibliographic metadata.
The OpenURL Router Data are thus data about traffic flowing through the UK OpenURL Router, and are sometimes known as activity data. The data are made publicly available so that other service providers may use them.
What data are made available
Before being made available in any form, the data are anonymised to remove data that may identify an individual institution or individual person. Then the data file is made available 'as is'. We refer to this file as the Level 1 data file. It includes resolver redirect requests and those "lookup" requests where no institution is identified. It excludes the button requests as these identify an institution. It may be used by anybody for any purpose that they believe will be useful, such as for analysis or to create services for UK Higher and Further Education.
The Level 2 data file contains data that have been processed further, i.e. all extraneous data have been removed leaving only redirection data. EDINA uses these data as the basis of a prototype recommender service for UK HE and makes them available for others to use.
The following table indicates the levels of data file generated and whether or not these are openly available.
Level | What's this? | What has been processed? | Is it available? |
---|---|---|---|
0 | Original log file Data | No processing (contains identifiable IP addresses and institutions) | No |
1 | Anonymised Data | IP addresses are encrypted using an algorithm and institutional identifiers are anonymised | Yes |
2 | Anonymised Redirect Data | A subset of the Level 1 data, containing only entries that redirect to a resolver | Yes |
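The step from Level 1 to Level 2 can be pictured as a simple filter. The snippet below is only a sketch under stated assumptions, not the processing EDINA actually runs: it assumes each log entry has already been parsed into a dictionary keyed by field name, and that a redirect entry is one with a non-empty routerRedirectIdentifier. The function name and the sample values are invented for illustration.

```python
def select_redirect_entries(entries):
    """Keep only entries that look like redirects to a resolver.

    Assumption: an entry is a redirect when its routerRedirectIdentifier
    field is present and non-empty. The real criterion may differ.
    """
    return [entry for entry in entries if entry.get("routerRedirectIdentifier")]


# Invented Level 1 style entries, for illustration only.
level1_entries = [
    {"logDate": "2011-04-01", "routerRedirectIdentifier": "r1", "issn": "1234-5678"},
    {"logDate": "2011-04-01", "routerRedirectIdentifier": ""},  # e.g. a "lookup" request
]

level2_entries = select_redirect_entries(level1_entries)
```

Anyone working from the Level 1 file alone could apply a filter of this kind to approximate the Level 2 subset, once the real marker of a redirect entry has been confirmed against the data.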
Specific data captured
The data captured vary from request to request, since different users enter different information into requests. In some cases very little is captured.
Log-specific request data (based on OpenURL Router log entries):
- logDate (Date the record was logged)
- logTime (Time the record was logged in format HH:MM:SS)
- encryptedUserIP (Anonymised IP address/session identifier)
- institutionResolverID (Anonymised institutional identifier)
- routerRedirectIdentifier (Redirect identifier passed as part of the URL)
Request-specific data (based on the OpenURL standard):
- aulast (Last author)
- aufirst (First author)
- auinit (First author's first and middle initials)
- auinit1 (First author's first initial)
- auinitm (First author's middle initial)
- au (Full name of a single author)
- aucorp (Organization or corporation that is the author or creator of the document)
- atitle (Article title)
- title (Journal title, for compatibility with version 0.1)
- jtitle (Journal title)
- stitle (Short journal title)
- date (Date of publication)
- ssn (Season (chronology). Legitimate values are spring, summer, fall, winter)
- quarter (Quarter (chronology). Legitimate values are 1, 2, 3, 4.)
- volume (Volume designation, usually expressed as a number but could be roman numerals or non-numeric)
- part (Part can be a special subdivision of a volume or it can be the highest level division of the journal. Parts are often designated with letters or names)
- issue (Designation of the published issue of a journal. While usually numeric, it could be non-numeric)
- spage (First page number. Pages are not always numeric)
- epage (Second (ending) page number)
- pages (Start and end pages, e.g. 53-58)
- artnum (Article number assigned by the publisher)
- issn (International Standard Serial Number)
- eissn (ISSN for electronic version of the journal)
- isbn (International Standard Book Number)
- coden (Alphanumeric bibliographic code)
- sici (Serial Item Contribution Identifier)
- genre (Genre of the requested item, e.g. article, book, journal)
- btitle (The title of the book. This can also be expressed as title, for compatibility with version 0.1)
- place (Place of publication)
- pub (Publisher name)
- edition (Statement of the edition of the book)
- tpages (Total pages)
- series (The title of a series in which the book or document was issued)
- doi (Digital Object Identifier)
- sid (Service ID, identifying the provider of the item (journal, article, etc.))
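As a rough illustration of how the fields listed above might be handled, the sketch below reads log lines into dictionaries and groups them by session. It rests on assumptions that are not stated here and should be checked against the actual files: that each line is tab-delimited, that columns appear in the order listed above, and that there is no header row. The function names and sample values are invented.

```python
import csv
import io
from collections import defaultdict

# Field names as listed above. The column order is an assumption.
LOG_FIELDS = [
    "logDate", "logTime", "encryptedUserIP",
    "institutionResolverID", "routerRedirectIdentifier",
]
REQUEST_FIELDS = [
    "aulast", "aufirst", "auinit", "auinit1", "auinitm", "au", "aucorp",
    "atitle", "title", "jtitle", "stitle", "date", "ssn", "quarter",
    "volume", "part", "issue", "spage", "epage", "pages", "artnum",
    "issn", "eissn", "isbn", "coden", "sici", "genre", "btitle", "place",
    "pub", "edition", "tpages", "series", "doi", "sid",
]
FIELDS = LOG_FIELDS + REQUEST_FIELDS


def read_entries(stream):
    """Yield one dict per log line, omitting fields that are empty."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so every field name has a value to pair with.
        row = row + [""] * (len(FIELDS) - len(row))
        yield {name: value for name, value in zip(FIELDS, row) if value}


def group_by_session(entries):
    """Group entries by the anonymised IP address/session identifier."""
    sessions = defaultdict(list)
    for entry in entries:
        sessions[entry.get("encryptedUserIP", "")].append(entry)
    return dict(sessions)


# Invented sample line: five log fields followed by aulast only.
sample = "\t".join(["2011-04-01", "09:15:02", "u1", "42", "r1", "Smith"]) + "\n"
entries = list(read_entries(io.StringIO(sample)))
sessions = group_by_session(entries)
```

Dropping empty fields reflects the point that the data captured vary from request to request, so code that consumes these records should not expect any particular request-specific field to be present.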
As the OpenURL Router Data files are large, sample files are available containing a subset of the data for initial analysis.