Installation
pip
Install using pip:
pip install cirrocumulus
Launch via the command line:
cirro launch <path_to_dataset>
You can see the full list of command line options by typing cirro launch --help.
Server Mode
Cirrocumulus can also be run in server mode in order to serve multiple users and datasets securely. The cirrocumulus server can be deployed on a cloud VM, an on-premise machine, or on Google App Engine.
Install cirrocumulus using pip or docker
Optional additional setup to enable authentication and authorization via Okta or Google:
Install additional libraries:
Google: pip install google-auth
Okta: pip install okta-jwt-verifier
Set environment variables:
CIRRO_AUTH_CLIENT_ID: to your Okta or Google client id
CIRRO_AUTH_PROVIDER: to either okta or Google.
CIRRO_AUTH_ISSUER (okta only). The URL of the authorization server that will perform authentication. All Developer Accounts have a “default” authorization server. The issuer is a combination of your Org URL (found in the upper right of the console home page) and /oauth2/default. For example, https://dev-1234.oktapreview.com/oauth2/default.
See Okta documentation for creating custom app integrations with Okta.
Visit the Google OAuth 2.0 documentation to obtain OAuth 2.0 credentials for Google.
Please note that HTTPS is required if using Okta or Google authentication.
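Before starting the server, it can help to confirm that the authentication variables listed above are actually set. The following is a minimal sketch: the variable names come from the list above, but the function `missing_auth_settings` and its rules are hypothetical and only reflect one reading of that list, not the checks cirrocumulus itself performs.

```python
import os


def missing_auth_settings(env):
    """Return the names of auth variables that appear to be unset.

    Hypothetical helper: the rules mirror the documented list, not
    cirrocumulus's own startup validation.
    """
    required = ("CIRRO_AUTH_CLIENT_ID", "CIRRO_AUTH_PROVIDER")
    missing = [name for name in required if not env.get(name)]
    # CIRRO_AUTH_ISSUER is documented as needed for Okta only.
    provider = env.get("CIRRO_AUTH_PROVIDER", "").lower()
    if provider == "okta" and not env.get("CIRRO_AUTH_ISSUER"):
        missing.append("CIRRO_AUTH_ISSUER")
    return missing


if __name__ == "__main__":
    problems = missing_auth_settings(os.environ)
    print("missing:", ", ".join(problems) if problems else "none")
```

Running this in the same shell that will launch cirro serve shows which of the variables still need to be exported.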
Additional libraries needed for cloud storage:
Amazon S3: pip install s3fs
Google Cloud Storage: pip install gcsfs
Microsoft Azure: pip install adlfs
Install MongoDB and start the MongoDB server. Note that MongoDB compatible databases such as DocumentDB can also be used.
Start the server via the command line:
cirro serve
You can see the full list of command line options by typing cirro serve --help.
Use the prepare_data command to freeze an h5ad, loom, or Seurat file in cirrocumulus format. The cirrocumulus format allows efficient partial dataset retrieval over a network (e.g., from a Google bucket) using limited memory.
Add a dataset and optionally share the dataset with collaborators. If you enabled authentication, then by default no users are allowed to add datasets to cirrocumulus. Set the property “importer” to true on an entry in the users collection to enable that user to import datasets. For example, the following screenshot in MongoDB Compass shows that the user with the email address me@gmail.com is allowed to add datasets to cirrocumulus:

You can programmatically add a dataset by posting to the /api/dataset endpoint:
curl http://localhost:5000/api/dataset -X POST -F 'name=my_name' -F 'url=data/my_dataset_path' -F 'description=my_desc' -F 'species=Mus musculus'
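The same request can be sketched in Python using only the standard library. The endpoint, host, and field names are taken from the curl command above; the helper `build_dataset_request` is hypothetical, and the sketch assumes a server without authentication (a protected server would presumably also need credentials, which are not shown here).

```python
import urllib.request
import uuid


def build_dataset_request(base_url, fields):
    """Build a multipart/form-data POST, mirroring what curl -F sends."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/dataset",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


request = build_dataset_request(
    "http://localhost:5000",
    {
        "name": "my_name",
        "url": "data/my_dataset_path",
        "description": "my_desc",
        "species": "Mus musculus",
    },
)
# Uncomment to send the request to a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

Building the request separately from sending it makes it easy to inspect the body before anything reaches the server.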
Additional customization via environment variables:
CIRRO_MOUNT: For mounting a bucket locally. Comma-separated string of bucket:local_path pairs. Example: s3://foo/bar:/fsx
CIRRO_SPECIES: Path to JSON file for species list when adding new dataset
CIRRO_MIXPANEL: Mixpanel project token for event tracking. Currently, only the open dataset event is supported.
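To make the CIRRO_MOUNT format concrete, here is a minimal sketch of how a comma-separated bucket:local_path string could be split into pairs. The function `parse_mounts` is hypothetical and is not cirrocumulus's own parser; it assumes that local paths contain no colon, so the last colon in each entry separates the bucket from the path.

```python
def parse_mounts(value):
    """Split 'bucket:local_path,bucket:local_path' into a dict."""
    mounts = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the '://' in the bucket URL is preserved.
        bucket, local_path = entry.rsplit(":", 1)
        mounts[bucket] = local_path
    return mounts


print(parse_mounts("s3://foo/bar:/fsx"))  # {'s3://foo/bar': '/fsx'}
```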
Google App Engine
Install the Google Cloud SDK if necessary. If this is your first time using the Google Cloud SDK, type the following in your terminal:
gcloud init
Clone the cirrocumulus app engine repository:
git clone https://github.com/klarman-cell-observatory/cirrocumulus-app-engine.git
Change your current working directory to cirrocumulus-app-engine:
cd cirrocumulus-app-engine
Create or use an existing GCP project in your Google Console.
Please remember to replace <PROJECT> with your GCP project id in the following instructions.
Create an App Engine application by navigating to App Engine > Dashboard. Select the Python Standard Environment and choose the region where your application is hosted. You can also create an application from the command line:
gcloud app create --project=<PROJECT>
Enable the Cloud Build API.
Optionally set up OAuth 2.0.
Follow the instructions at Google OAuth 2.0 documentation. Add your server URL to the list of “Authorized domains”. Your server URL is https://<PROJECT>.appspot.com.
Replace CIRRO_AUTH_CLIENT_ID in app.yaml with your OAuth client id.
Optionally edit app.yaml to further customize your application settings.
Deploy the application using the following command (your project will then be available at https://<PROJECT>.appspot.com):
gcloud app deploy app.yaml --project=<PROJECT>
Use the prepare_data command to freeze an h5ad, loom, or Seurat file in cirrocumulus format. The cirrocumulus format allows efficient partial dataset retrieval over a network (e.g., from a Google bucket) using limited memory.
If you enabled OAuth 2.0, no one is allowed to add datasets to your application by default. To allow yourself to add datasets:
Go to https://<PROJECT>.appspot.com in your web browser and log in.
In Google Console, navigate to Datastore > Entities and click on your email address. Add the property importer of type boolean and set it to true.
Go back to https://<PROJECT>.appspot.com and start adding datasets.
Read more about App Engine in the App Engine documentation.
Static Website
Clone the cirrocumulus repository:
git clone https://github.com/klarman-cell-observatory/cirrocumulus.git
Change to cirrocumulus directory:
cd cirrocumulus
Install typescript:
yarn global add typescript
Install JavaScript dependencies:
yarn install
Prepare dataset(s) in jsonl format:
cirro prepare_data pbmc3k.h5ad --format jsonl
Build JavaScript:
REACT_APP_STATIC=true yarn build
Create the file datasets.json in the build directory:
[ { "id": "pbmc3k", "name": "pbmc3k", "url": "pbmc3k/pbmc3k.jsonl" } ]
Move your dataset files to build:
mv pbmc3k build
Test locally:
cd build ; npx http-server .
Host the build directory on your static website hosting service (e.g. Amazon S3, Google Cloud Storage)
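When several datasets are hosted, the datasets.json step above could be scripted rather than written by hand. This is a minimal sketch: the helpers `dataset_entries` and `write_datasets_json` are hypothetical, and they assume every dataset follows the same layout as the pbmc3k example (a directory named after the dataset id containing <id>.jsonl).

```python
import json
from pathlib import Path


def dataset_entries(dataset_ids):
    """Return one datasets.json entry per id, following the pbmc3k layout."""
    return [
        {
            "id": dataset_id,
            "name": dataset_id,
            "url": f"{dataset_id}/{dataset_id}.jsonl",
        }
        for dataset_id in dataset_ids
    ]


def write_datasets_json(build_dir, dataset_ids):
    """Write datasets.json into the build directory and return its path."""
    path = Path(build_dir) / "datasets.json"
    path.write_text(json.dumps(dataset_entries(dataset_ids), indent=2))
    return path


# Example: write_datasets_json("build", ["pbmc3k"])
```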
Terra Cloud Environment
Click Open Terminal to connect to your running VM.
Install cirrocumulus via pip if it was not installed in your docker image.
Download your dataset to your running VM using gsutil as in the example below. Alternatively, you can use gcsfuse to mount your Google cloud bucket.
gsutil -m cp gs://fc-000/test.h5ad .
Launch cirrocumulus via the command line in the background:
cirro launch test.h5ad &
Install ngrok:
wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip \
  && unzip ngrok-stable-linux-amd64.zip \
  && rm -f ngrok-stable-linux-amd64.zip
Use ngrok to expose cirrocumulus publicly:
./ngrok http 5000
After you start ngrok, it will display a UI in your terminal with the public URL of your tunnel:

Navigate to your public URL in your browser (https://383bc396cc0b.ngrok.io in the previous example).
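If the terminal UI is not visible, for example because ngrok was started in the background, the public URL can usually also be read from the ngrok agent's local API. The sketch below assumes the agent exposes its tunnel listing at http://127.0.0.1:4040/api/tunnels, which is ngrok's default but is not stated in the steps above; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: default address of the ngrok agent's local API.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def public_urls(payload):
    """Extract the public URLs from a tunnels listing."""
    return [tunnel["public_url"] for tunnel in payload.get("tunnels", [])]


def fetch_public_urls(api_url=NGROK_API):
    """Query the local ngrok agent and return its public URLs."""
    with urllib.request.urlopen(api_url) as response:
        return public_urls(json.load(response))


# Example, with ngrok running on the same VM:
# print(fetch_public_urls())
```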
Developer Instructions
Create a new conda environment:
conda create --name cirrocumulus-dev
Clone the cirrocumulus repository:
git clone https://github.com/klarman-cell-observatory/cirrocumulus.git
Change to cirrocumulus directory:
cd cirrocumulus
Install:
pip install --upgrade pip
pip install -e .[dev,test]
pre-commit install
yarn global add typescript
yarn install
Install additional optional Python dependencies:
pip install s3fs
Create an example h5ad file in ./data/pbmc3k_processed.h5ad:
import scanpy as sc
sc.datasets.pbmc3k_processed()
Launch cirrocumulus with the --no-open flag:
cirro launch ./data/pbmc3k_processed.h5ad --no-open
Alternatively, launch the cirrocumulus server (use cirro prepare_data to convert the h5ad file to cirrocumulus format for server mode):
cirro serve
Run JavaScript server in development mode:
yarn start
Navigate to http://localhost:3000
To run the end-to-end tests (yarn e2e), please install GraphicsMagick (brew install graphicsmagick on Mac).
Testing:
yarn e2e
yarn test
pytest
Build JavaScript front-end for deployment:
yarn build