Installation

pip

  • Install using pip:

    pip install cirrocumulus
    
  • Launch via the command line:

    cirro launch <path_to_dataset>
    

You can see the full list of command line options by typing cirro launch –help.

Server Mode

Cirrocumulus can also be run in server mode in order to serve multiple users and datasets securely. The cirrocumulus server can be deployed on a cloud VM, an on-premise machine, or on Google App Engine.

  • Install cirrocumulus using pip or docker

  • Optional additional setup to enable authentication and authorization via Okta or Google:

    • Install additional libraries:

      • Google: pip install google-auth

      • Okta: pip install okta-jwt-verifier

    • Set environment variables:

      • CIRRO_AUTH_CLIENT_ID: to your Okta or Google client id

      • CIRRO_AUTH_PROVIDER: to either okta or Google.

      • CIRRO_AUTH_ISSUER (okta only). The URL of the authorization server that will perform authentication. All Developer Accounts have a “default” authorization server. The issuer is a combination of your Org URL (found in the upper right of the console home page) and /oauth2/default. For example, https://dev-1234.oktapreview.com/oauth2/default.

    • See Okta documentation for creating custom app integrations with Okta.

    • Visit the Google OAuth 2.0 documentation to obtain OAuth 2.0 credentials for Google.

    • Please note that https is required if using Okta or Google authentication

  • Additional libraries needed for cloud storage:

    • Amazon S3: pip install s3fs

    • Google Cloud Storage: pip install gcsfs

    • Microsoft Azure: pip install adlfs

  • Install MongoDB and start the MongoDB server. Note that MongoDB compatible databases such as DocumentDB can also be used.

  • Start the server via the command line:

    cirro serve
    

You can see the full list of command line options by typing cirro serve –help.

  • Use the prepare_data command to freeze an h5ad, loom, or Seurat file in cirrocumulus format. The cirrocumulus format allows efficient partial dataset retrieval over a network (e.g Google bucket) using limited memory.

  • Add a dataset and optionally share with dataset with collaborators. If you enabled authentication, then no users are allowed to add datasets to cirrocumulus. Set the property “importer” to true on an entry in the users collection to enable that user to import datasets. For example, the following screenshot in MongoDB Compass shows that the user with the email address me@gmail.com, is allowed to add datasets to cirrocumulus:

_images/mongodb.png
  • You can programmatically add a dataset by posting to the /api/dataset endpoint:

    curl http://localhost:5000/api/dataset -X POST -F 'name=my_name' -F 'url=data/my_dataset_path' -F 'description=my_desc'  -F 'species=Mus musculus'
    
  • Additional customization via environment variables:

    • CIRRO_MOUNT: For mounting a bucket locally. Comma separated string of bucket:local_path. Example s3://foo/bar:/fsx

    • CIRRO_SPECIES: Path to JSON file for species list when adding new dataset

    • CIRRO_MIXPANEL: Mixpanel project token for event tracking. Currently, only the open dataset event is supported.

Google App Engine

  • Install the Google Cloud SDK if necessary. Type gcloud init in your terminal if this is your first time using the Google Cloud SDK.

  • Clone the cirrocumulus app engine repository:

    git clone https://github.com/klarman-cell-observatory/cirrocumulus-app-engine.git
    
  • Change your current working directory to cirrocumulus-app-engine:

    cd cirrocumulus-app-engine
    
  • Create or use an existing GCP project in your Google Console.

  • Please remember to replace <PROJECT> with your GCP project id in the following instructions.

  • Create an App Engine application by navigating to App Engine > Dashboard. Select the Python Standard Environment and choose the region where your application is hosted. You can also create an application from the command line:

    gcloud app create --project=<PROJECT>
    
  • Enable the Cloud Build API.

  • Optionally set up OAuth 2.0.

    • Follow the instructions at Google OAuth 2.0 documentation. Add your server URL to the list of “Authorized domains”. Your server URL is https://<PROJECT>.appspot.com.

    • Replace CIRRO_AUTH_CLIENT_ID in app.yaml with your OAuth client id.

  • Optionally edit app.yaml to further customize your application settings.

  • Deploy the application using the command below. Your project is available at https://<PROJECT>.appspot.com.:

    gcloud app deploy app.yaml --project=<PROJECT>
    
  • Use the prepare_data command to freeze an h5ad, loom, or Seurat file in cirrocumulus format. The cirrocumulus format allows efficient partial dataset retrieval over a network (e.g Google bucket) using limited memory.

  • If you enabled OAuth 2.0, no one is allowed to add datasets to your application
    • Go to https://<PROJECT>.appspot.com in your web browser and login

    • In Google Console, navigate to Datastore > Entities and click on your email address. Add the property importer of type boolean and set it to true.

    • Go back to https://<PROJECT>.appspot.com and start adding datasets.

  • Read more about App Engine in the App Engine documentation.

Static Website

  • Clone the cirrocumulus repository:

    git clone https://github.com/klarman-cell-observatory/cirrocumulus.git
    
  • Change to cirrocumulus directory:

    cd cirrocumulus
    
  • Install typescript:

    yarn global add typescript
    
  • Install JavaScript dependencies:

    yarn install
    
  • Prepare dataset(s) in jsonl format:

    cirro prepare_data pbmc3k.h5ad --format jsonl
    
  • Build JavaScript:

    REACT_APP_STATIC=true yarn build
    
  • Create the file datasets.json in the build directory:

    [
        {
            "id": "pbmc3k",
            "name": "pbmc3k",
            "url": "pbmc3k/pbmc3k.jsonl"
        }
    ]
    
  • Move your dataset files to build:

    mv pbmc3k build
    
  • Test locally:

    cd build ; npx http-server .
    
  • Host the build directory on your static website hosting service (e.g. Amazon S3, Google Cloud Storage)

Terra Cloud Environment

  • Click Open Terminal to connect to your running VM

  • Install cirrocumulus via pip if it was not installed in your docker image

  • Download your dataset to your running VM using gsutil as in the example below. Alternatively, you can use gcsfuse to mount your Google cloud bucket.

    gsutil -m cp gs://fc-000/test.h5ad .

  • Launch cirrocumulus via the command line in the background:

    cirro launch test.h5ad &
    
  • Install ngrok:

    wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip \
    && unzip ngrok-stable-linux-amd64.zip \
    && rm -f ngrok-stable-linux-amd64.zip
    
  • Use ngrok to expose cirrocumulus publicly:

    ./ngrok http 5000
    

After you start ngrok, it will display a UI in your terminal with the public URL of your tunnel:

_images/ngrok.png
  • Navigate to your public URL in your browser (https://383bc396cc0b.ngrok.io in previous example)

Developer Instructions

  • Create a new conda environment:

    conda create --name cirrocumulus-dev
    
  • Clone the cirrocumulus repository:

    git clone https://github.com/klarman-cell-observatory/cirrocumulus.git
    
  • Change to cirrocumulus directory:

    cd cirrocumulus
    
  • Install:

    pip install --upgrade pip
    pip install -e .[dev,test]
    pre-commit install
    yarn global add typescript
    yarn install
    
  • Install additional optional Python dependencies:

    pip install s3fs
    
  • Create an example h5ad file in ./data/pbmc3k_processed.h5ad:

    import scanpy as sc
    sc.datasets.pbmc3k_processed()
    
  • Launch cirrocumulus with the –no-open flag:

    cirro launch ./data/pbmc3k_processed.h5ad --no-open
    
  • Alternatively, launch the cirrocumulus server (use cirro prepare_data to convert the h5ad file to cirrocumulus format for server mode):

    cirro serve
    
  • Run JavaScript server in development mode:

    yarn start
    
  • Navigate to http://localhost:3000

  • In order to run End to End tests (yarn e2e), please install GraphicsMagick (brew install graphicsmagick on Mac)

  • Testing:

    yarn e2e
    yarn test
    pytest
    
  • Build JavaScript front-end for deployment:

    yarn build