Installation¶

Options for installing

Running services locally

Running via docker-compose

Options for installing¶

There are two ways to get HEPData running locally: either install and run all the services on your local machine, or run it via Docker Compose.

Using docker-compose is the quickest way to get up-and-running. However, it has some disadvantages:

It requires more resources on your local machine as it runs several Docker containers.
It can be slightly trickier to run commands and debug.
The tests take longer to run, particularly the end-to-end tests.

Running services locally¶

Prerequisites¶

HEPData runs with Python 3.14. It also uses several services, which you will need to install before running HEPData.

These services can be installed using the relevant package manager for your system, for example, using yum or apt-get for Linux or brew for macOS:

PostgreSQL (version 16) database server

Redis for caching

OpenSearch (version 3.4.0) for indexing and information retrieval. See below for further instructions.

Node.js (version 18) JavaScript run-time environment and its package manager npm.

OpenSearch v3.4.0¶

We are currently using OpenSearch v3.4.0. Here, you can find the download instructions.

There are some examples below:

MacOS

Install the latest version (currently, v3.6.0) with brew install opensearch. Alternatively, to install a specific version like v3.4.0 via Homebrew (if the latest version is newer), run:

$ brew tap homebrew/core --force
$ brew tap-new opensearch/tap
$ brew extract --version=3.4.0 opensearch opensearch/tap
$ brew install opensearch/tap/opensearch@3.4.0
$ brew services restart opensearch/tap/opensearch@3.4.0

Linux

You can see the tarball instructions on the OpenSearch installation webpage.

To execute, run these commands within the extracted folder.

$ export OPENSEARCH_INITIAL_ADMIN_PASSWORD=<custom-admin-password>
$ ./opensearch-tar-install.sh -E "plugins.security.disabled=true"

The custom admin password is required for OpenSearch 2.12 or greater. The requirements for <custom-admin-password> are “a minimum 8 character password and must contain at least one uppercase letter, one lowercase letter, one digit, and one special character that is strong”.

Docker

Alternatively, run OpenSearch after installing Docker with:

$ docker pull opensearchproject/opensearch:3.4.0
$ docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "plugins.security.disabled=true" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<custom-admin-password>" opensearchproject/opensearch:3.4.0

You can test that the container is running with:

$ curl http://localhost:9200 -ku admin:<custom-admin-password>

Installation¶

Python¶

The HEPData code is only compatible with Python 3.14 (not Python 2 or other 3.x versions).

There are instructions for installing with pip or uv below to set up a Python 3.14 virtual environment and install the required Python packages.

Using pip

First install all requirements in a Python virtual environment. The instructions below use the Python module venv directly with a target directory also called venv (change it if you prefer).

$ git clone https://github.com/HEPData/hepdata.git
$ cd hepdata
$ python3.14 -m venv venv
$ source venv/bin/activate
(venv)$ pip install --upgrade pip
(venv)$ pip install -e ".[all]" --upgrade -r requirements.txt

Using uv

If you prefer to use uv, first install it by following their installation instructions (if required). Then follow the instructions below to clone the repository, create a virtual environment with Python 3.14, and then install the required packages.

You may change the target directory for the virtual environment (currently venv) if you prefer. For example: uv venv virtualenv.

$ git clone https://github.com/HEPData/hepdata.git
$ cd hepdata
$ uv venv venv --python 3.14
$ source venv/bin/activate
$ uv pip install -e ".[all]" --upgrade -r requirements.txt

Check PyYAML¶

Check that PyYAML has been installed with LibYAML bindings:

(venv)$ python -c "from yaml import CSafeLoader"

If LibYAML is already installed (e.g. brew install libyaml) but CSafeLoader cannot be imported, you may need to reinstall PyYAML to ensure it’s built with LibYAML bindings, e.g. on an M1 MacBook:

(venv)$ LDFLAGS="-L$(brew --prefix)/lib" CFLAGS="-I$(brew --prefix)/include" pip install --global-option="--with-libyaml" --force pyyaml==5.4.1

Environment variables¶

The next lines set environment variables to switch Flask to run in development mode, and turns on RemovedIn20Warning deprecation warnings for SQLAlchemy 1.4. You may want to set these automatically in your bash or zsh profile.

(venv)$ export FLASK_ENV=development
(venv)$ export FLASK_DEBUG=1
(venv)$ export SQLALCHEMY_WARN_20=1

Use of config_local.py¶

The hepdata/config.py contains default configuration options, which often need to be overridden in a local instance. For example, DOI minting should be switched off in a non-production instance, otherwise finalising a new record will give an error message due to a lack of DataCite authorisation credentials. Rather than edit hepdata/config.py, it is more convenient to define custom options in a separate file hepdata/config_local.py that will be ignored by Git. For example, to switch off email, DOI minting, Twitter, use a local converter URL, and specify custom temporary and data directories:

SERVER_NAME = "localhost:5000"
SITE_URL = "http://" + SERVER_NAME
TESTING = True
NO_DOI_MINTING = True
USE_TWITTER = False
CFG_CONVERTER_URL = 'http://localhost:5500'
CFG_TMPDIR = '/Users/watt/tmp/hepdata/tmp'
CFG_DATADIR = '/Users/watt/tmp/hepdata/data'

An example file hepdata/config_local.local.py is provided, which can be copied to hepdata/config_local.py. Replace the CFG_TMPDIR and CFG_DATADIR directory values with a suitable path for your system.

With TESTING=True emails will be output to the terminal, but links are suppressed preventing some functionality such as clicking on confirmation links when a new user is created (see HEPData/hepdata#493). With TESTING=False you will need to configure an SMTP server to send emails such as SMTP2GO that offers a free plan with a limit of 1000 emails/month. An alternative is to install Mailpit (e.g. brew install mailpit followed by brew services start mailpit on macOS) where you just need to add these lines to hepdata/config_local.py:

MAIL_SERVER = '127.0.0.1'
MAIL_PORT = 1025

The Mailpit web UI can then be accessed from http://localhost:8025 .

JavaScript¶

Next, build assets using webpack (via invenio-assets).

(hepdata)$ ./scripts/clean_assets.sh

Celery¶

Run Celery and ensure the redis-server service is running (-B runs celery beat):

(hepdata)$ celery -A hepdata.celery worker -l info -E -B -Q celery,priority,datacite

On macOS, you might also need an option -P solo to use a solo pool and avoid errors related to forking. An alternative is to add export no_proxy=* to your .zshrc file or os.environ["no_proxy"] = "*" to your hepdata/config_local.py file.

PostgreSQL¶

See YUM Installation and First steps. On Linux you might need sudo su - postgres before executing the steps below. On macOS you can install with brew install postgresql@16.

$ createuser hepdata --createdb --pwprompt
Enter password for new role: hepdata
Enter it again: hepdata
$ createdb hepdata -O hepdata
$ createdb hepdata_test -O hepdata

Next, create the database and database tables. Also create a user and populate the database with some records. Make sure that Celery is running before proceeding further. Pass an email address and any password as an argument to the script:

(hepdata)$ ./scripts/initialise_db.sh your@email.com password

Inspect the hepdata database from the command line as the hepdata user and add email confirmation:

$ psql hepdata -U hepdata -h localhost
Password for user hepdata: hepdata

hepdata=> select publication_recid, inspire_id, last_updated from hepsubmission order by publication_recid;
 publication_recid | inspire_id |    last_updated
-------------------+------------+---------------------
                 1 | 1245023    | 2013-12-17 10:35:06
                 2 | 1283842    | 2014-08-11 17:25:55
                 3 | 1311487    | 2016-02-12 18:45:16
                58 | 1299143    | 2014-08-05 17:55:54
(4 rows)

Set email confirmation for the test user within the database.

hepdata=> update accounts_user set confirmed_at=NOW() where id=1;
UPDATE 1

If you’re having problems with access permissions to the database (on Linux), a simple solution is to edit the PostgreSQL Client Authentication Configuration File (e.g. /var/lib/pgsql/16/data/pg_hba.conf) to trust local and IPv4/IPv6 connections (instead of peer or ident), then restart the PostgreSQL server (e.g. sudo systemctl restart postgresql-16).

Recreate the OpenSearch index¶

You may need to recreate the OpenSearch data, for example, after switching to a new OpenSearch instance.

(hepdata) $ hepdata utils reindex -rc True

Run a local development server¶

Now start the HEPData web application in debug mode:

(hepdata)$ hepdata run --debugger --reload

Then open your preferred web browser (Chrome, Firefox, Safari, etc.) at http://localhost:5000/ .

On macOS Monterey (and later) you might find that ControlCenter is already listening to port 5000 (check with lsof -i -P | grep 5000). If this is the case, turn off AirPlay Receiver.

Running the tests¶

Some of the tests run using Selenium on Sauce Labs. Note that some of the end-to-end tests currently fail when run individually rather than all together. The end-to-end tests may not work on macOS (issue #929). If you have a local development server running, shut it down before running the tests.

To run the tests locally you have several options:

Run a Sauce Connect tunnel (recommended). This is used by GitHub Actions CI.
1. Create a Sauce Labs account, or ask for the HEPData account details.
2. Log into Sauce Labs, and go to the “Tunnel Proxies” page.
3. Follow the instructions there to install Sauce Connect.
4. Create the variables SAUCE_USERNAME and SAUCE_ACCESS_KEY in your local environment (and add them to your bash or zsh profile). Also set SAUCE_REGION=eu-central, SAUCE_TUNNEL_NAME=${SAUCE_USERNAME}_tunnel_name and SAUCE_PROXY_LOCALHOST=direct.
5. Start a tunnel with the command sc run and wait for the message “Sauce Connect is up, you may start your tests”.
Run Selenium locally using ChromeDriver. (Some tests are currently failing with this method.)
1. Install ChromeDriver (matched to your version of Chrome).
2. Include RUN_SELENIUM_LOCALLY = True in your hepdata/config_local.py file.
3. You might need to close Chrome before running the end-to-end tests.
Omit the end-to-end tests when running locally, by running pytest tests -k 'not e2e' instead of run-tests.sh.

Once you have set up Selenium or Sauce Labs, you can run the tests using:

(venv)$ ./run-tests.sh

Note that the end-to-end tests require the converter (specified by CFG_CONVERTER_URL) to be running.

NOTE: To test changes to ci.yml locally, you can use act. A .secrets file should be created in the project root directory with the variables SAUCE_USERNAME and SAUCE_ACCESS_KEY set in order to run the end-to-end tests. Only one matrix configuration will be used to avoid problem with conflicting ports. Running act -n is useful for dryrun mode.

Building the docs¶

If any changes were to be made to the installation docs, to check docs can be locally built use:

(venv)$ cd docs
(venv)$ make html
(venv)$ open _build/html/index.html

Docker for hepdata-converter-ws¶

To get the file conversion working from the web application (such as automatic conversion from .oldhepdata format), you can use the default CFG_CONVERTER_URL = https://converter.hepdata.net even outside the CERN network. Alternatively, after installing Docker, you can run a local Docker container:

docker pull hepdata/hepdata-converter-ws
docker run --restart=always -d --name=hepdata_converter -p 0.0.0.0:5500:5000 hepdata/hepdata-converter-ws hepdata-converter-ws

then specify CFG_CONVERTER_URL = 'http://localhost:5500' in hepdata/config_local.py (see above).

Running via docker-compose¶

The Dockerfile is used by GitHub Actions CI to build a Docker image and push to DockerHub ready for deployment in production on the Kubernetes cluster at CERN.

For local development you can use the docker-compose.yml file to run the HEPData Docker image and its required services.

First, ensure you have installed Docker and Docker Compose.

Copy the file config_local.docker_compose.py to config_local.py.

In order to run the tests via Sauce Labs, ensure you have the variables $SAUCE_USERNAME and $SAUCE_ACCESS_KEY set in your environment (see Running the tests) before starting the containers.

Add export OPENSEARCH_INITIAL_ADMIN_PASSWORD=<custom-admin-password> to your bash or zsh profile for OpenSearch.

If using an M1 MacBook, also add export SAUCE_OS=linux.aarch64 to your bash or zsh profile. This is necessary to download the correct Sauce Connect Proxy client.

Start the containers:

$ docker-compose up

(This starts containers for all the 6 necessary services. See Tips if you only want to run some containers.)

In another terminal, initialise the database:

$ docker-compose exec web bash -c "hepdata utils reindex -rc True"  # ignore error "hepsubmission" does not exist
$ docker-compose exec web bash -c "mkdir -p /code/tmp; ./scripts/initialise_db.sh your@email.com password"
$ docker-compose exec db bash -c "psql hepdata -U hepdata -c 'update accounts_user set confirmed_at=NOW() where id=1;'"

Now open http://localhost:5000/ and HEPData should be up and running. (It may take a few minutes for Celery to process the sample records.)

To run the tests:

$ docker-compose exec web bash -c "/usr/local/var/sauce-connect-5.5.1_${SAUCE_OS:-linux.x86_64}/sc -u $SAUCE_USERNAME -k $SAUCE_ACCESS_KEY --region eu-central -i ${SAUCE_USERNAME}_tunnel_name --proxy-localhost direct & ./run-tests.sh"

Tips¶

If you see errors about ports already being allocated, ensure you’re not running any of the services another way (e.g. hepdata-converter via Docker).
If you want to run just some of the containers, specify their names in the docker-compose command. For example, to just run the web server, database and OpenSearch, run:
```
$ docker-compose up web db os
```
See docker-compose.yml for the names of each service. Running a subset of containers could be useful in the following cases:
- You want to use the live converter service, i.e. CFG_CONVERTER_URL = 'https://converter.hepdata.net' instead of running the converter locally.
- You want to run the container for the web service by pulling an image from Docker Hub instead of building an image locally.
- You want to run containers for all services apart from web (and maybe converter) then use a non-Docker web service.
If using Docker Desktop, you need to use host.docker.internal instead of localhost when connecting from a container to a service on the host.
To run the containers in the background, run:
```
$ docker-compose up -d
```
To see the logs you can then run:
```
$ docker-compose logs
```
To run a command on a container, run the following (replacing <container_name> with the name of the container as in docker-compose.yml, e.g. web):
```
$ docker-compose exec <container_name> bash -c "<command>"
```
If you need to run several commands, run the following to get a bash shell on the container:
```
$ docker-compose exec <container_name> bash
```
If you switch between using docker-compose and individual services, you may get an error when running the tests about an import file mismatch. To resolve this, run:
```
$ find . -name '*.pyc' -delete
```

Installation¶

Options for installing¶

Running services locally¶

Prerequisites¶

OpenSearch v3.4.0¶

Installation¶

Python¶

Check PyYAML¶

Environment variables¶

Use of config_local.py¶

JavaScript¶

Celery¶

PostgreSQL¶

Recreate the OpenSearch index¶

Run a local development server¶

Running the tests¶

Building the docs¶

Docker for hepdata-converter-ws¶

Running via docker-compose¶

Tips¶

HEPData

Navigation

Related Topics