This page contains all necessary information for you to install and use a KNIME workflow that first trains and later uses a cascaded random forest to classify stained membrane and membrane crossings in given 2D input images.
This is actually a central step in the workflow described here.
1. System Requirements
- MacOS X or Linux
- KNIME (installation instructions below)
- More than 4 GB RAM recommended
2. Installing KNIME + Extensions
2.1 Installing KNIME
- Download KNIME from here. The minimal version without the free extensions is sufficient for our purposes. If this is your first time working with KNIME, you can use their QuickStart guide to familiarize yourself with the interface.
- The Mac version comes with a .dmg installer that offers to put KNIME into your /Applications folder. You may choose any other location if you prefer, and you can have multiple KNIME installations as well.
- The first time you start KNIME, it will ask you for a knime-workspace location. You can pick any existing KNIME workspace or point KNIME to a (preferably empty) folder of your choice. This is where all your KNIME workflows are stored, including the one we will install in the next section. Remember the location of this folder; you will need it in Section 3!
- To ensure that KNIME has enough resources, go to the KNIME installation folder and edit the file ‘KNIME.ini’.
- Set the Xmx property to a higher value. We suggest setting -Xmx6g if you have 8 GB of RAM on your machine; otherwise -Xmx4g should suffice.
- Set the -XX:MaxPermSize=256m line to read like this: -XX:MaxPermSize=512m
- See also: https://tech.knime.org/faq#q4_2
- Congratulations, you installed KNIME… now we need some extensions…
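The two ‘KNIME.ini’ changes from Section 2.1 can also be applied programmatically. The following is a minimal Python sketch, not part of the original workflow: the function name `raise_knime_memory` and its defaults are our own choices for illustration, and it only rewrites lines that already exist in the file (it does not add missing ones).

```python
import re


def raise_knime_memory(ini_text: str, xmx: str = "6g", max_perm: str = "512m") -> str:
    """Return the ini text with -Xmx and -XX:MaxPermSize set to the given values.

    Hypothetical helper: lines that are not present in the input are NOT added.
    """
    lines = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"-Xmx\d+[kKmMgG]?", stripped):
            # Replace the maximum heap size, e.g. -Xmx2048m -> -Xmx6g
            line = f"-Xmx{xmx}"
        elif stripped.startswith("-XX:MaxPermSize="):
            # Replace the permanent generation size, e.g. 256m -> 512m
            line = f"-XX:MaxPermSize={max_perm}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

To use it, read the file from your KNIME installation folder, pass its content through the function, and write the result back. Keep a copy of the original file, and pass xmx="4g" on machines with less than 8 GB of RAM.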
2.2 Installing Required Extensions
- In KNIME launch ‘Help – Install New Software…’ via the menu.
- Here you can pick from various predefined update sites and install various existing extensions. For installing the workflow you have to add an update site that is not preconfigured.
- Click on the link ‘Available Software Sites’ at the top right of the installer window. Yet another dialog will open.
- Click on ‘Add…’ and enter ‘MPI-CBG’ as the new update site’s name, and ‘https://community.knime.org/download/de.mpicbg.knime.ip.update’ as its location.
- Confirm by hitting ‘Ok’.
- IMPORTANT: in case your KNIME lacks the default update sites, please ensure that your list of update sites contains at least the following ones:
- Once you have selected the right update sites, confirm by hitting ‘Ok’.
- Now select ‘All Available Update Sites’ from the dropdown menu and install at least the following items (which you will find distributed all over the place):
- NOTE: ‘External Tool Support (Labs)’ is NOT the same as ‘External Tool Support’.
- NOTE: ‘KNIME Nodes to create KNIME Quick Forms’ is no longer available as of KNIME 3.0. Instead, please install:
- ‘KNIME Quick Forms’
- ‘KNIME Quick Forms (legacy)’
- TIP: uncheck ‘Group items by category’ to be able to search through an alphabetically ordered list of all tools. This makes it easy to find the required tools.
- Start the installation, accept all licenses, and confirm any question that may pop up along the way.
- Restart KNIME as suggested once all packages have been installed.
3. Downloading and Importing the Workflow
There are two workflows available on this page:
1. A workflow to perform pixelwise classification of membrane-stained tissue slices, with the option of running the workflow in headless mode on Mac (tested on Mac OS X 10.11) or Linux (tested on CentOS Linux release 7.2.1511).
2. A workflow capable of training a new classifier given a set of membrane and vertex labeled training data, as well as making predictions with that classifier (Headed mode only. Mac OS X 10.11 only.)
Both workflows come with a pretrained classifier, and can provide the images necessary for use with the rest of the pipeline described here.
- Please download the workflow below that matches your version of KNIME (see also the release notes):
- KNIME version: 2.1 [deprecated]
- KNIME version: 3.1
- KNIME version: 3.1 — Prediction Only (Linux and Batch mode enabled.)
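For the prediction-only workflow, a headless run is typically started through KNIME's batch application. The page itself does not give the exact invocation, so the sketch below is an assumption on our part: it builds a command line from the commonly documented batch options (`-reset`, `-nosave`, `-workflowDir`), and both paths in the usage note are hypothetical. Please verify the options against the documentation of your KNIME version before relying on them.

```python
from typing import List


def knime_batch_command(knime_executable: str, workflow_dir: str) -> List[str]:
    """Build an argument list for running a workflow in KNIME batch mode.

    Assumption: the options below follow KNIME's batch application as
    commonly documented; they are not taken from this page.
    """
    return [
        knime_executable,
        "-nosplash",    # do not show the splash screen
        "-consoleLog",  # print log messages to the console
        "-reset",       # reset all nodes before execution
        "-nosave",      # do not save the workflow after execution
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]
```

The resulting list can be passed to `subprocess.run(...)`, for example with a KNIME executable at ‘/opt/knime/knime’ and the workflow folder inside your knime-workspace.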
Normally you would start KNIME and import a workflow via ‘File – Import KNIME Workflow…’.
Warning: this does not work with this workflow (although it usually does)! Instead, you must unzip the downloaded file yourself and copy the resulting folder ‘MS-ECS-2D_2.0’ into your knime-workspace (this is the folder you selected when first starting KNIME).
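If you prefer to script the manual unzip-and-copy step, the following Python sketch shows one way to do it. It assumes that the downloaded archive contains the folder ‘MS-ECS-2D_2.0’ at its top level; the function name `install_workflow` is our own and purely illustrative.

```python
import zipfile
from pathlib import Path


def install_workflow(zip_path: str, workspace: str,
                     folder_name: str = "MS-ECS-2D_2.0") -> Path:
    """Extract the workflow archive directly into the KNIME workspace.

    Assumption: the archive contains `folder_name` as its top-level folder.
    """
    workspace_dir = Path(workspace).expanduser()
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(workspace_dir)
    workflow_dir = workspace_dir / folder_name
    if not workflow_dir.is_dir():
        # The archive layout differs from what this sketch assumes.
        raise FileNotFoundError(f"expected folder not found: {workflow_dir}")
    return workflow_dir
```

Call it with the path of the downloaded zip file and the location of your knime-workspace, then restart KNIME (or refresh the KNIME Explorer) so that the new workflow shows up.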
- You should now have a workflow called ‘MS-ECS-2D_2.0’ listed in your KNIME Explorer on the left under the tab ‘LOCAL (Local Workspace)’.
- Once you open this workflow you will see something very close to this:
4. Running the Workflow
Example data can be downloaded from here.
This data is sufficient for running all parts of the workflow (training as well as creating probability maps). Simply follow the instructions below.
Step 1.1 – loading the training data
- Note: we ship this workflow so that the prediction steps 2.x can be performed even without ever performing steps 1.1 and 1.2. In that case a set of pre-trained random forests is used. If you perform a training run, they will be replaced.
- In order to train the cascaded random forest you need to provide two sequences of images:
- Gray scale images, as they come from the scope (see folder ‘exemplary-data/train/grayscale/’ in the example data downloadable from above).
- Composite label images containing pixels with values 0, 1, and 2 that label background, membrane, and membrane crossing points (vertex points) respectively (see folder ‘exemplary-data/train/labels/composite/’ in the example data downloadable from above).
- Configure the ‘Image Reader’ nodes by double clicking on them and pointing them to the corresponding files. (This requires you to first remove the files that are currently configured in the config dialog of the ‘Image Reader’ node: click on the button to remove all, then add the files you want to use.)
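Before training on your own data, it can help to check that a composite label image really contains only the values 0, 1, and 2. The sketch below is a hypothetical helper, not part of the workflow; it works on an image that has already been loaded as rows of integer pixel values (how you load the image is up to you).

```python
from collections import Counter
from typing import Dict, Sequence

# Label convention of the composite label images described above.
LABEL_NAMES = {0: "background", 1: "membrane", 2: "vertex"}


def summarize_labels(image: Sequence[Sequence[int]]) -> Dict[str, int]:
    """Count pixels per label and reject values outside {0, 1, 2}."""
    counts = Counter(value for row in image for value in row)
    unexpected = sorted(set(counts) - set(LABEL_NAMES))
    if unexpected:
        raise ValueError(f"unexpected label values: {unexpected}")
    return {name: counts.get(value, 0) for value, name in LABEL_NAMES.items()}
```

An image without any vertex pixels is still accepted (the count is simply 0), but any other pixel value raises an error.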
Step 1.2 – train cascaded random forest
- Click on the meta-node ‘Train cascaded RF’ and execute it, e.g. by hitting first F8, then F7. (Hitting F8 is unfortunately necessary and can currently not be performed automatically.)
- By default it is configured to randomly look at 30% of the training samples. This usually takes several hours, but only has to be done once.
- You can change this fraction by configuring this meta-node. Right click on ‘Train cascaded RF’ and choose ‘Configure’. Then enter a different fraction, e.g. ‘0.01’, which is good enough for test purposes and takes roughly 20 to 40 minutes depending on your computer. Once the training is done, a green check mark will appear in the center of the node.
- By default the input data is down-sampled to 66% of the original size to keep you from running out of memory. If you want to change this because you have less (or more) memory, double click on the ‘Train cascaded RF’ meta-node. This will open the interior of the meta-node in a new tab. In there, you will find two nodes called ‘Image Resizer’. Double click on them and change the values ‘X’, ‘Y’ and ‘Z’ to something smaller than .66 for more down-sampling, or larger than .66 (at most 1) for less down-sampling.
- Heads up: If you change the amount of down-sampling in ‘Train cascaded RF’, you have to do the same thing in ‘Run cascaded RF’! (see below)
- NOTE: you can also configure the training fraction to be higher than 0.3. This will give better results, but will also increase the training time significantly. 0.3 works well for us.
- Optional: if you want to see what the training is currently doing, have a look at the KNIME log file (open it, e.g., via the menu ‘View – Open KNIME Log’).
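To get a feeling for how the two settings from Step 1.2 interact, the following sketch estimates how many pixels end up as training samples. It rests on two simplifying assumptions of ours: the down-sampling factor is applied to each image axis, and the training fraction is applied to the pixels of the down-sampled images. The function name is hypothetical, and the actual sampling inside the workflow may differ in detail.

```python
def estimated_training_pixels(width: int, height: int, n_images: int,
                              scale: float = 0.66, fraction: float = 0.3) -> int:
    """Rough number of pixels sampled for training.

    Assumptions (ours): `scale` shrinks each axis, and `fraction` is the
    share of down-sampled pixels that is randomly selected.
    """
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    scaled_width = round(width * scale)
    scaled_height = round(height * scale)
    return round(scaled_width * scaled_height * n_images * fraction)
```

For a single 1000 × 1000 image, this gives 130680 sampled pixels with the default settings and 4356 with a fraction of 0.01. Because the down-sampling factor applies to both axes, lowering it reduces the sample count quadratically.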
Step 2.1 – loading the data to apply the trained RF on
- Configure the Image Reader to open the microscopy images you want to apply the trained random forest to. (If you are testing this workflow with the exemplary data downloaded from above you can choose the two files from the folder ‘exemplary-data/predict/grayscale/’.)
Step 2.2 + 2.3 – run trained RF and check/save results
- Click on one of the nodes in Step 2.2 and hit first F8 (to fully reset it), then F7 (to execute it).
- Once this node is done you can either visually inspect the outcome using the Table Cell Viewer of Step 2.3 or configure and run the Image Writer to store the results on disk.
- These stored files can then be opened in Fiji and used for cell segmentation using the Pathfinder tool from the MS-ECS-2D package.
- Remember: If you have changed the amount of down-sampling in ‘Train cascaded RF’, you have to do the same thing in ‘Run cascaded RF’! Double-click the ‘Run cascaded RF’ meta-node to open its interior, then configure the ‘Image Resizer’ node in there with the same parameters that you used in ‘Train cascaded RF’. There is only one ‘Image Resizer’ node to be configured here, as opposed to two in ‘Train cascaded RF’.
Good to know (Expert Tips):
- Training can take a really long time. If you don’t want your computer to go to sleep during training, set your system preferences accordingly. Also, if you train on a lot of data on a slow computer, training might need more than 3 full days to complete. This can, and usually will, cause problems when saving the workflow after training, because your operating system ‘cleans up’ old temporary files in the meantime. This is usually useful, but a problem in our case. You can easily fix it by directing KNIME to store its temporary data in a user-defined folder:
- In KNIME open via menu: KNIME – Preferences
- Then choose ‘KNIME’ on the left side, and
- enter a folder name of your choice in the text field with the label ‘Directory for temporary files’.
- Restart KNIME – done!
- After training the RF (that is, after Step 1.2) you might want to save the workflow in KNIME (File – Save). This way you will not have to wait many hours again for the cascaded RF to be trained, and you can start directly at Step 2.1 next time (unless your images look very different from the ones you trained the RF on).
- Do you want to save yourself the time to retrain the cascaded random forest but still switch back and forth between different trained forests? You can copy the trained RF files to a place of your choice and copy them back later on.
- You find these files in your knime-workspace (where the workflow is also stored) under ‘knime-workspace/MS-ECS-2D_2.0/ExternalToolRoot/data/train/Results’.
- Copy this entire folder somewhere so you can recover it later on.
- To activate this trained forest later on, simply open a workflow, copy the Results-Folder from the previous step into the same place you got it from, and start directly with ‘Step 2.1’! 🙂
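Switching between trained forests as described above can be scripted. The following Python sketch copies the Results folder out of the workspace and back in again; the function names are our own, and the relative path is the one given above. Close the workflow in KNIME before restoring, since the sketch replaces the existing Results folder.

```python
import shutil
from pathlib import Path

# Location of the trained forests relative to the knime-workspace (see above).
RESULTS_SUBPATH = Path("MS-ECS-2D_2.0/ExternalToolRoot/data/train/Results")


def backup_forest(workspace, backup_dir) -> Path:
    """Copy the trained forest files to `backup_dir` (which must not exist yet)."""
    source = Path(workspace) / RESULTS_SUBPATH
    target = Path(backup_dir)
    shutil.copytree(source, target)
    return target


def restore_forest(workspace, backup_dir) -> Path:
    """Replace the workspace's Results folder with a previously saved copy."""
    target = Path(workspace) / RESULTS_SUBPATH
    if target.exists():
        shutil.rmtree(target)  # discard the currently active forest
    shutil.copytree(Path(backup_dir), target)
    return target
```

Using one backup folder per trained forest (for example, named after the data set it was trained on) makes it easy to tell them apart later.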
5. Troubleshooting
Problem: When I import and open the workflow, I get an error message telling me that there were errors during loading.
Solution: This can have multiple reasons. Most likely you forgot to install a KNIME package that is needed in this workflow. Be sure to carefully perform all steps of installing KNIME packages from various update sites as explained in Section 2.2 of this page.
Problem: Step 1.2 fails. If I look inside, I see that the node called ‘Fiji Trainable Segmentation Features 2D’ reports ‘Execution Failed: Java Heap Space’.
Solution: Go to the installation folder of KNIME and change the content of ‘KNIME.ini’ according to this post (see also the -Xmx setting in Section 2.1).
Problem: Step 1.2 fails. If I look inside, I see that the Image Writer node(s) fail(s) mentioning that they are configured not to overwrite existing files.
Solution: Delete temporary files created in a previous execution of the workflow by re-evaluating all nodes in the central (gray) area.
Problem: Step 1.2 takes virtually forever!
Solution: Training is indeed slow. If you are only interested in testing the workflow, you can reduce the fraction of pixels used for training from 0.3 to, e.g., 0.01 (as described in Step 1.2). This makes training faster, but the results will be less accurate.
Problem: Step 2.2 was run on say 1 image, but the results contain the output of other images as well!
Solution: You did not hit the F8 key on the ‘Prediction’ metanode in 2.2 before starting it (e.g. by hitting F7). Unfortunately this is currently required and cannot be automated.