Records¶
In addition to the Python modules documented below, note that the directory hepdata/modules/records/static/js
contains the JavaScript code that renders the tables and plots in a web browser using the
D3.js library.
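hepdata.modules.records.api – API for HEPData-Records.
hepdata.modules.records.ext – Jinja utilities for Invenio.
hepdata.modules.records.views – Blueprint for HEPData-Records.
hepdata.modules.records.subscribers.api – HEPData Subscribers API.
hepdata.modules.records.subscribers.models – HEPData Subscribers Model.
hepdata.modules.records.utils.records_update_utils – Update INSPIRE publication information.
hepdata.modules.records.utils.yaml_utils – YAML Processing Utils.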
hepdata.modules.records.api¶
API for HEPData-Records.
- hepdata.modules.records.api.format_submission(recid, record, version, version_count, hepdata_submission, data_table=None, observer_view=None)[source]¶
Performs all the processing of the record to be displayed.
- Parameters:
recid
record
version
version_count
hepdata_submission
data_table
observer_view
- Returns:
- hepdata.modules.records.api.format_tables(ctx, data_record_query, data_table, recid)[source]¶
Finds all the tables related to a submission and formats them for display in the UI or as JSON.
- Returns:
- hepdata.modules.records.api.format_resource(resource, contents, content_url)[source]¶
Gets info about a resource ready to be displayed on the resource’s landing page
- Parameters:
resource – DataResource object to be displayed
contents – Resource file contents
- Returns:
context dictionary ready for the template
- hepdata.modules.records.api.should_send_json_ld(request)[source]¶
Determine whether to send JSON-LD instead of HTML for this request
- hepdata.modules.records.api.get_commit_message(ctx, recid)[source]¶
Returns a commit message for the current version if present. Will return the highest ID of a version-recid pairing.
- Parameters:
ctx
recid
- hepdata.modules.records.api.create_breadcrumb_text(authors, ctx, record)[source]¶
Creates the breadcrumb text for a submission.
- hepdata.modules.records.api.submission_has_resources(hepsubmission)[source]¶
Returns whether the submission has resources attached.
- Parameters:
hepsubmission – HEPSubmission object
- Returns:
bool
- hepdata.modules.records.api.render_record(recid, record, version, output_format, light_mode=False, observer_key=None)[source]¶
- hepdata.modules.records.api.create_new_version(recid, user, notify_uploader=True, uploader_message=None)[source]¶
- hepdata.modules.records.api.process_payload(recid, file, redirect_url, synchronous=False)[source]¶
Process an uploaded file
- Parameters:
recid – int The id of the record to update
file – file The file to process
redirect_url – string Redirect URL to record, for use if the upload fails or in synchronous mode
synchronous – bool Whether to process asynchronously via celery (default) or immediately (only recommended for tests)
- Returns:
JSONResponse either containing ‘url’ (for success cases) or ‘message’ (for error cases, which will give a 400 error).
- hepdata.modules.records.api.check_and_convert_from_oldhepdata(input_directory, id, timestamp)[source]¶
Check whether the input directory contains a .oldhepdata file and, if so, convert it to YAML.
- hepdata.modules.records.api.assign_or_create_review_status(data_table_metadata, publication_recid, version)[source]¶
If a review already exists, it will be attached to the current data record. If a review does not exist for a data table, it will be created.
- Parameters:
data_table_metadata – the metadata describing the main table.
publication_recid – publication record id
version
- hepdata.modules.records.api.process_data_tables(ctx, data_record_query, first_data_id, data_table=None)[source]¶
- hepdata.modules.records.api.get_all_ids(index=None, id_field='recid', last_updated=None, latest_first=False)[source]¶
Get all record or inspire ids of publications in the search index
- Parameters:
index – name of index to use.
id_field – id type to return. Should be ‘recid’ or ‘inspire_id’
- Returns:
list of integer ids
Queries the database for all HEPSubmission objects contained in this object’s related record ID list. (All submissions this one is relating to)
- Returns:
[list] A list of HEPSubmission objects
Queries the database for all records in the RelatedRecid table that have THIS record’s id as a related record, then returns the HEPSubmission object marked in the RelatedRecid table. Returns only submissions marked as ‘finished’.
- Returns:
[list] List containing related records.
Queries the database for all DataSubmission objects contained in this object’s related DOI list. (All submissions this one is relating to)
- Parameters:
data_submission – The datasubmission object to find related data for.
- Returns:
[list] A list of DataSubmission objects
Get the DataSubmission Objects with a RelatedTable entry where this doi is referred to in related_doi. Only returns where associated HEPSubmission object is finished, OR where it is within the same HEPSubmission
- Parameters:
data_submission – The datasubmission to find the related entries for.
- Returns:
[List] List of DataSubmission objects.
- hepdata.modules.records.api.get_record_data_list(record, data_type)[source]¶
Generates a dictionary (title/recid) from a list of record IDs. This must be done as the record contents are not stored within the hepsubmission object.
- Parameters:
record – The record used for the query.
data_type – Either the related, or related to this data.
- Returns:
[list] A list of dictionary objects containing record ID and title pairs
- hepdata.modules.records.api.get_table_data_list(table, data_type)[source]¶
Generates a list of general information (name, doi, desc) dictionaries of related DataSubmission objects. Will either use the related data list (get_related_data_submissions) OR the related to this list (generated by get_related_to_this_datasubmissions)
- Parameters:
table – The DataSubmission object used for querying.
data_type – The flag to decide which relation data to use.
- Returns:
[list] A list of dictionaries with the name, doi and description of the object.
hepdata.modules.records.ext¶
Jinja utilities for Invenio.
hepdata.modules.records.views¶
Blueprint for HEPData-Records.
- hepdata.modules.records.views.metadata(recid)[source]¶
Queries and returns a data record.
- Parameters:
recid – the record id being queried
- Returns:
renders the record template
- hepdata.modules.records.views.get_latest()[source]¶
Returns the N latest records from the database.
- Parameters:
n
- Returns:
- hepdata.modules.records.views.get_table_data(data_recid, version)[source]¶
Gets the table data only for a specific recid/version.
- Parameters:
data_recid – The data recid used for retrieval
version – The data version to retrieve
- Returns:
- hepdata.modules.records.views.get_table_details(recid, data_recid, version, load_all=1)[source]¶
Get the table details of a given datasubmission.
- Parameters:
recid
data_recid
version
load_all – Whether to perform the filesize check or not when loading (1 will always load the file)
- Returns:
- hepdata.modules.records.views.get_coordinator_view(recid)[source]¶
Returns the coordinator view for a record.
- Parameters:
recid
- hepdata.modules.records.views.get_observer_data(recid, as_url=None)[source]¶
Returns the observer url for a record, if it exists, and the user has permission.
- Parameters:
recid – The publication recid for requested observer key
as_url – Default: None - Whether to return as url (when set to 1), or just key
- Returns:
JSON object with observer url and recid/status, or failure message.
- hepdata.modules.records.views.get_data_reviews_for_record()[source]¶
Get the data reviews for a record.
- Returns:
json response with reviews (or a json with an error key if not)
- hepdata.modules.records.views.get_data_review_status()[source]¶
Get the data review status and any associated messages for a data record, given a recid and optional version.
- hepdata.modules.records.views.add_data_review_messsage(publication_recid, data_recid)[source]¶
Adds a new review message for a data submission.
- Parameters:
publication_recid
data_recid
- hepdata.modules.records.views.get_all_review_messages(publication_recid)[source]¶
Gets the review messages for a publication id.
- Parameters:
publication_recid
- Returns:
- hepdata.modules.records.views.get_resources(recid, version)[source]¶
Gets a list of resources for a publication, relevant to all data records.
- Parameters:
recid
- Returns:
json
- hepdata.modules.records.views.process_resource(reference)[source]¶
For a submission resource, create the link to the location, or the image file if an image.
- Parameters:
reference
- Returns:
dict
- hepdata.modules.records.views.get_resource(resource_id)[source]¶
Attempts to find any HTML resources to be displayed for a record in the event that it does not have proper data records included.
- Parameters:
resource_id – Resource id
- Returns:
json dictionary containing any HTML files to show.
- hepdata.modules.records.views.cli_upload()[source]¶
Used by the hepdata-cli tool to upload a submission.
- Returns:
- hepdata.modules.records.views.revise_submission(recid)[source]¶
This method creates a new version of a submission.
- Parameters:
recid – record id to attach the data to
- Returns:
For POST requests, returns JSONResponse either containing ‘url’ (for success cases) or ‘message’ (for error cases, which will give a 400 error). For GET requests, redirects to the record.
- hepdata.modules.records.views.consume_data_payload(recid)[source]¶
This method persists, then presents the loaded data back to the user.
- Parameters:
recid – record id to attach the data to
- Returns:
For POST requests, returns JSONResponse either containing ‘url’ (for success cases) or ‘message’ (for error cases, which will give a 400 error). For GET requests, redirects to the record.
- hepdata.modules.records.views.attach_information_to_record(recid)[source]¶
Given an INSPIRE data representation, this will process the data, and update information for a given record id with the contents.
- Returns:
- hepdata.modules.records.views.consume_sandbox_payload()[source]¶
Creates a new sandbox submission with a new file upload.
- Parameters:
recid
hepdata.modules.records.importer.api¶
- hepdata.modules.records.importer.api.import_records(inspire_ids, synchronous=False, update_existing=False, allow_old_schema=True, base_url='https://hepdata.net', send_email=False, coordinator_id=1, files_url=None)[source]¶
Import records from HEPData or another configured source.
- Parameters:
inspire_ids – array of inspire ids to load (in the format insXXX).
synchronous – if should be run immediately rather than via celery
update_existing – whether to update records that already exist
allow_old_schema – whether to allow validation against old schema
base_url – override default base URL
send_email – whether to send emails on finalising submissions
coordinator_id – user ID to assign as Coordinator (defaults to 1)
files_url – if given, download files from this URL using pattern {files_url}/ins{inspire_id}.tar.gz instead of the default HEPData download endpoint
- Returns:
None
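The files_url pattern above can be illustrated with a small helper. This is only a sketch: `archive_url_sketch` is a hypothetical name, not part of HEPData, and it assumes that the `ins` prefix of an ID such as `ins12345` should appear exactly once in the resulting URL.

```python
def archive_url_sketch(files_url, inspire_id):
    """Build {files_url}/ins{inspire_id}.tar.gz for one record.

    Hypothetical helper for illustration only. Accepts either a bare
    number or an 'insXXX' string (assumption) and avoids a double slash.
    """
    text = str(inspire_id)
    if text.startswith("ins"):
        text = text[3:]  # keep only the numeric part
    return f"{files_url.rstrip('/')}/ins{text}.tar.gz"
```

For example, `archive_url_sketch("https://example.com/files", "ins12345")` gives `https://example.com/files/ins12345.tar.gz`.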
- hepdata.modules.records.importer.api.get_inspire_ids(base_url='https://hepdata.net', last_updated=None, n_latest=None, ids_url=None)[source]¶
Get inspire IDs from hepdata.net or from an alternate URL
- Parameters:
last_updated – get IDs of records updated on/after this date (ignored when ids_url is provided)
n_latest – get the n most recently updated IDs
base_url – override default base URL (ignored when ids_url is provided)
ids_url – explicit URL of a JSON file containing INSPIRE IDs (e.g. https://example.com/hepdata/inspire.json). When provided, base_url and last_updated are ignored.
- Returns:
list of integer IDs, or False in the case of errors
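Because the documented return value is either a list or False, a caller probably wants to tell an error apart from an empty result. The sketch below shows one way to do that; `describe_id_result` is a hypothetical helper and the message strings are placeholders.

```python
def describe_id_result(ids):
    """Classify a return value shaped like the one documented above.

    Hypothetical helper: 'ids' is either a list of integers or False.
    """
    if ids is False:
        # An explicit identity check keeps errors distinct from [].
        return "error"
    if not ids:
        return "no records"
    return f"{len(ids)} record(s)"
```

A plain `if not ids:` test would treat an empty list and an error the same way, which is why the sketch checks `ids is False` first.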
hepdata.modules.records.subscribers.api¶
HEPData Subscribers API.
hepdata.modules.records.subscribers.models¶
HEPData Subscribers Model.
- class hepdata.modules.records.subscribers.models.Subscribers(**kwargs)[source]¶
WatchList is the main model for storing the query to be made for a watched query and the user who is watching it.
- publication_recid¶
- subscribers¶
- query: t.ClassVar[Query]¶
A SQLAlchemy query for a model. Equivalent to db.session.query(Model). Can be customized per-model by overriding query_class.
Warning: The query interface is considered legacy in SQLAlchemy. Prefer using session.execute(select()) instead.
hepdata.modules.records.subscribers.rest¶
hepdata.modules.records.utils.analyses¶
hepdata.modules.records.utils.common¶
- hepdata.modules.records.utils.common.find_file_in_directory(directory, file_predicate)[source]¶
Finds a file in a directory. Useful, for example, when the submission.yaml file is not at the top level of the unzipped archive but one or more levels below.
- Parameters:
directory
file_predicate – a lambda that checks if it’s the file you’re looking for
- Returns:
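A predicate-based search of this kind can be sketched with os.walk. The function below is not the HEPData implementation; the name `find_file_sketch` and the choice to return a full path (or None when nothing matches) are assumptions made for illustration.

```python
import os


def find_file_sketch(directory, file_predicate):
    """Return the path of the first file whose name satisfies the predicate.

    Illustrative only: walks the tree top-down and returns None if no
    file matches. The real function's return shape may differ.
    """
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):  # sorted for a deterministic result
            if file_predicate(name):
                return os.path.join(root, name)
    return None
```

A typical predicate would be `lambda name: name == "submission.yaml"`.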
- hepdata.modules.records.utils.common.truncate_string(string, max_words=None, max_chars=None)[source]¶
- hepdata.modules.records.utils.common.get_record_contents(recid, status=None)[source]¶
Tries to get record from OpenSearch first. Failing that, it tries from the database.
- Parameters:
recid – Record ID to get.
status – Status of submission. If provided and not ‘finished’, will not check opensearch first.
- Returns:
a dictionary containing the record contents if the recid exists, None otherwise.
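The lookup order described above (search index first, database as a fallback, and the index skipped for unfinished submissions) can be sketched as follows. `search_lookup` and `db_lookup` are hypothetical callables standing in for the OpenSearch and database queries; they are not HEPData functions.

```python
def get_contents_sketch(recid, search_lookup, db_lookup, status=None):
    """Return record contents, preferring the search index when allowed.

    Illustrative only. Both lookups are assumed to return a dict, or
    None when the record is unknown.
    """
    record = None
    if status is None or status == "finished":
        record = search_lookup(recid)  # try the index first
    if record is None:
        record = db_lookup(recid)  # fall back to the database
    return record
```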
- hepdata.modules.records.utils.common.load_table_data(recid, version)[source]¶
Loads a specific data file’s YAML data.
- Parameters:
recid – The recid used for the query
version – The data version to select
- Return table_contents:
A dict containing the table data
- hepdata.modules.records.utils.common.file_size_check(file_location, load_all)[source]¶
Decides if a file breaks the maximum size threshold for immediate loading on the records page.
- Parameters:
file_location – Location of the data file on disk
load_all – If the check should be run
- Return bool:
Pass or fail
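A size gate of this kind might look like the sketch below. The threshold value and the name `file_size_check_sketch` are assumptions, and the treatment of load_all follows the note under get_table_details that a value of 1 always loads the file.

```python
import os


def file_size_check_sketch(file_location, load_all, max_bytes=1_000_000):
    """Return True when the file may be loaded immediately.

    Illustrative only: max_bytes is a placeholder, not HEPData's
    configured threshold.
    """
    if load_all == 1:
        return True  # caller asked to skip the size check
    return os.path.getsize(file_location) <= max_bytes
```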
hepdata.modules.records.utils.data_processing_utils¶
- hepdata.modules.records.utils.data_processing_utils.pad_independent_variables(table_contents)[source]¶
Pads out the independent variable column in the event that nothing exists.
- Parameters:
table_contents
- Returns:
- hepdata.modules.records.utils.data_processing_utils.fix_nan_inf(value)[source]¶
Converts NaN, +inf, and -inf values to strings.
- Parameters:
value
- Returns:
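The conversion described for fix_nan_inf can be sketched in a few lines. This is not the HEPData source: the exact strings produced and the handling of non-float input are assumptions. One plausible motivation is that standard JSON has no representation for NaN or infinity.

```python
import math


def fix_nan_inf_sketch(value):
    """Return a string for NaN/inf floats; return anything else unchanged.

    Illustrative only: uses Python's own spellings 'nan', 'inf', '-inf'.
    """
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return str(value)
    return value
```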
- hepdata.modules.records.utils.data_processing_utils.process_independent_variables(table_contents, x_axes, independent_variable_headers)[source]¶
- hepdata.modules.records.utils.data_processing_utils.process_dependent_variables(group_count, record, table_contents, tmp_values, independent_variables, dependent_variable_headers)[source]¶
- hepdata.modules.records.utils.data_processing_utils.generate_table_data(table_contents)[source]¶
Creates a renderable data table structure.
- Parameters:
table_contents
- Returns:
A dictionary containing the table headers/values
hepdata.modules.records.utils.doi_minter¶
- class hepdata.modules.records.utils.doi_minter.LicenseData(name, url, description)[source]¶
Simple class to hold license data for template rendering
- hepdata.modules.records.utils.doi_minter.get_license_for_datacite(license_id)[source]¶
Get license data for DataCite XML generation. Returns a LicenseData object with either the specified license or default CC0.
- Parameters:
license_id – License ID or None
- Returns:
LicenseData object
- hepdata.modules.records.utils.doi_minter.reserve_doi_for_hepsubmission(hepsubmission, update=False)[source]¶
- hepdata.modules.records.utils.doi_minter.reserve_dois_for_data_submissions(*args, **kwargs)[source]¶
Reserves a DOI for a data submission and saves to the datasubmission object.
- Parameters:
data_submission – DataSubmission object representing a data table.
- Returns:
- hepdata.modules.records.utils.doi_minter.reserve_dois_for_resources(publication_recid, version, resources=None)[source]¶
Reserves a DOI for a data submission and saves to the datasubmission object.
- Parameters:
resources – list of DataResource objects
- Returns:
hepdata.modules.records.utils.old_hepdata¶
hepdata.modules.records.utils.records_update_utils¶
Update INSPIRE publication information.
hepdata.modules.records.utils.submission¶
- hepdata.modules.records.utils.submission.remove_submission(record_id, version=1)[source]¶
Removes the database entries and data files related to a record.
- Parameters:
record_id
version
- Returns:
True if Successful, False if the record does not exist.
- hepdata.modules.records.utils.submission.cleanup_submission(recid, version, to_keep)[source]¶
Removes old datasubmission records from the database. This ensures that when users replace a submission, previous records are not left behind in the database.
- Parameters:
recid – publication recid of parent
version – version number of record
to_keep – an array of names to keep in the submission
- Returns:
- hepdata.modules.records.utils.submission.cleanup_data_resources(data_submission)[source]¶
Removes additional resources for a datasubmission from the database to avoid duplications. This ensures that when users replace a submission, old resources are not left behind in the database.
- Parameters:
data_submission – DataSubmission object to be cleaned
- Returns:
- hepdata.modules.records.utils.submission.cleanup_data_keywords(data_submission)[source]¶
Removes keywords from the database to avoid duplications. This ensures that when users replace a submission, old keywords are not left behind in the database.
- Parameters:
data_submission – DataSubmission object to be cleaned
- Returns:
Deletes all related record ID entries of a HEPSubmission object of a given recid
- Parameters:
recid – The record ID of the HEPSubmission object to be cleaned
- Returns:
- hepdata.modules.records.utils.submission.process_data_file(recid, version, basepath, data_obj, datasubmission, main_file_path, tablenum, overall_status)[source]¶
Takes a data file and any supplementary files and persists their metadata to the database whilst recording their upload path.
- Parameters:
recid – the record id
version – version of the resource to be stored
basepath – the path the submission has been loaded to
data_obj – Object representation of loaded YAML file
datasubmission – the DataSubmission object representing this file in the DB
main_file_path – the data file path
tablenum – This table’s number in the submission.
overall_status – Overall status of submission to use for sandbox filtering.
- Returns:
- hepdata.modules.records.utils.submission.process_general_submission_info(basepath, submission_info_document, recid)[source]¶
Processes the top level information about a submission, extracting the information about the data abstract, additional resources for the submission (files, links, and html inserts) and historical modification information.
- Parameters:
basepath – the path the submission has been loaded to
submission_info_document – the data document
recid
- Returns:
- hepdata.modules.records.utils.submission.parse_additional_resources(basepath, recid, yaml_document)[source]¶
Parses out the additional resource section for a full submission.
- Parameters:
basepath – the path the submission has been loaded to
recid
yaml_document
- Returns:
- hepdata.modules.records.utils.submission.parse_modifications(hepsubmission, recid, submission_info_document)[source]¶
- hepdata.modules.records.utils.submission.process_submission_directory(basepath, submission_file_path, recid, update=False, old_schema=False)[source]¶
Goes through an entire submission directory and processes the files within to create DataSubmissions with the files and related material attached as DataResources.
- Parameters:
basepath
submission_file_path
recid
update
old_schema – whether to use old (v0) submission and data schemas (should only be used when importing old records)
- Returns:
- hepdata.modules.records.utils.submission.package_submission(basepath, recid, hep_submission_obj)[source]¶
Zips up a submission directory. This is in advance of its download for example by users.
- Parameters:
basepath – path of directory containing all submission files
recid – the publication record ID
hep_submission_obj – the HEPSubmission object representing the overall position
- hepdata.modules.records.utils.submission.clean_error_message_for_display(error_message, dir)[source]¶
- hepdata.modules.records.utils.submission.get_or_create_hepsubmission(recid, coordinator=1, status='todo')[source]¶
Gets or creates a new HEPSubmission record.
- Parameters:
recid – the publication record id
coordinator – the user id of the user who owns this record
status – e.g. todo, finished.
- Returns:
the newly created HEPSubmission object
- hepdata.modules.records.utils.submission.create_data_review(data_recid, publication_recid, version=1)[source]¶
Creates a new data review given a data record id and a publication record id.
- Parameters:
data_recid
publication_recid
version
- Returns:
- hepdata.modules.records.utils.submission.do_finalise(recid, publication_record=None, force_finalise=False, commit_message=None, send_tweet=False, update=False, convert=True, send_email=True)[source]¶
Creates record SIP for each data record with a link to the associated publication.
- Parameters:
recid (int) – publication_recid of HEPSubmission to finalise
publication_record (HEPSubmission) – HEPSubmission object to finalise
force_finalise (bool) – Whether to force finalisation. If False, will only finalise if logged-in user is the submission coordinator. Should only be set to True for admin tasks/testing.
commit_message (str) – Version information for updated versions of a submission.
send_tweet (bool) – Whether to tweet about the new submission.
update (bool) – Whether to update the existing data records rather than create new ones (only used for admin/test purposes)
convert (bool) – Whether to convert to (and store) other data formats using hepdata_converter
send_email (bool) – Whether to email the submission participants and coordinator to inform them that the submission is complete
- Returns:
JSON string with keys: success, recid, (on success) data_count, generated_records, (on failure) errors.
- Return type:
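A caller could unpack the JSON string returned by do_finalise along these lines. The helper name is hypothetical, and the value types of the keys are not specified in this page, so the sketch simply passes them through.

```python
import json


def read_finalise_result(result_json):
    """Split a do_finalise-style JSON string into (ok, details).

    Illustrative only: picks the success or failure keys listed above
    and leaves missing keys as None.
    """
    result = json.loads(result_json)
    ok = bool(result.get("success"))
    keys = ("recid", "data_count", "generated_records") if ok else ("recid", "errors")
    return ok, {key: result.get(key) for key in keys}
```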
hepdata.modules.records.utils.users¶
hepdata.modules.records.utils.workflow¶
- hepdata.modules.records.utils.workflow.create_data_structure(ctx)[source]¶
The data structures need to be normalised before being stored in the database. This is performed here.
- Parameters:
ctx – record information as a dictionary
- Returns:
a cleaned up representation.
- hepdata.modules.records.utils.workflow.update_record(recid, ctx)[source]¶
Updates a record given a new dictionary.
- Parameters:
recid
ctx
- Returns:
hepdata.modules.records.utils.yaml_utils¶
YAML Processing Utils.
- hepdata.modules.records.utils.yaml_utils.write_submission_yaml_block(document, submission_yaml, type='info')[source]¶
- hepdata.modules.records.utils.yaml_utils.split_files(file_location, output_location)[source]¶
- Parameters:
file_location – input yaml file location
output_location – output directory path
- hepdata.modules.records.utils.yaml_utils.cleanup_data_yaml(yaml)[source]¶
Casts strings to numbers where possible.
- Parameters:
yaml
- Returns:
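Casting strings to numbers "where possible" can be sketched as a recursive walk over the parsed YAML. Which fields HEPData actually casts is not stated here, so the function below (a hypothetical `cast_numbers_sketch`) converts every string it can, which is an assumption.

```python
def cast_numbers_sketch(node):
    """Recursively replace numeric-looking strings with int or float.

    Illustrative only: tries int first, then float, and leaves the
    string untouched if both conversions fail.
    """
    if isinstance(node, dict):
        return {key: cast_numbers_sketch(value) for key, value in node.items()}
    if isinstance(node, list):
        return [cast_numbers_sketch(item) for item in node]
    if isinstance(node, str):
        for cast in (int, float):
            try:
                return cast(node)
            except ValueError:
                continue
    return node
```

Note that Python's float() also accepts strings such as "nan" and "inf", so a real implementation may need to exclude those.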