Tutorial

Please note that because Wegan continues to evolve, many of the screenshots may be outdated. Users should therefore not follow the illustrated steps verbatim, and should instead focus on the analysis concepts and the workflow for ecological data analysis using this tool.

Data Upload

The first step is to upload community abundance data. This tutorial demonstrates the tool's functions using example datasets such as Dune, Aravo, and Ursine Aquatic Prey. Data should be uploaded as a tab-delimited text file (.txt) or a comma-separated values file (.csv). Select the orientation of the samples (sites in rows or in columns) and whether to use column names and row names. For community abundance data, feature names are expected to be species, but they can be any taxonomic level or any other variable appropriate for your analysis.
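As an illustration of the two accepted layouts, here is a minimal Python sketch that reads a delimited abundance table with samples in either rows or columns. It is not part of Wegan; the function name `read_abundance_table`, its parameters, and the species labels `sp_a` and `sp_b` are hypothetical, and the sketch assumes the first row and first column contain labels.

```python
import csv
import io


def read_abundance_table(text, delimiter=",", samples_in_rows=True):
    """Parse a delimited table into {sample: {species: abundance}}.

    Assumes the first row and the first column hold labels.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0][1:], rows[1:]
    table = {}
    for row in body:
        row_label, values = row[0], row[1:]
        for col_label, value in zip(header, values):
            if samples_in_rows:
                sample, species = row_label, col_label
            else:
                sample, species = col_label, row_label
            table.setdefault(sample, {})[species] = float(value)
    return table


# Made-up values: the same data as .csv (sites in rows)
# and as tab-delimited .txt (sites in columns).
example_csv = "sample,sp_a,sp_b\nsite1,1,0\nsite2,3,2\n"
example_tsv = "species\tsite1\tsite2\nsp_a\t1\t3\nsp_b\t0\t2\n"

table = read_abundance_table(example_csv)
same_table = read_abundance_table(example_tsv, delimiter="\t", samples_in_rows=False)
```

Both calls produce the same nested dictionary, which is why the orientation must be declared at upload time: the file alone does not say which axis holds the sites.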

1.1 Upload your data

Follow the instructions below to upload custom data. Refer to the user interface screenshot (Figure 1.1.1) for uploading the custom data.

Figure 1.1.1: Web page for data uploading in the modules.

  1. Upload Data: Click the "Choose File" button in the "Data Table" column to upload your custom data (.csv or .txt files). For detailed instructions, hover over the question mark icons next to the input fields.

  2. Specify Data Format: Choose the format of the uploaded community abundance data from the dropdown labeled "Data Format".

  3. Specify Data Labels: Ensure accurate representation by selecting appropriate data labels from the dropdown labeled "Data Labels".

  4. Press the “Submit” button to proceed to the following steps.

The taxonomy module requires an additional file containing taxonomy data (Figure 1.1.2).

Figure 1.1.2: Web page for data uploading in the taxonomy module.

1.2 Try our test data

Alternatively, users can use the example data provided within the Clustering and Classification module.

In the panel labeled "Try Our Test Data" (Figure 1.2.1), three example datasets are available: BCI Environmental Data (Environmental data for tree species counts from Barro Colorado Island), Dune (Grass species counts in Dutch dune meadows), and Aravo (Distribution of 82 species of Alpine plants). Users can select their preferred dataset by checking the radio button next to it and proceed to the next steps. Detailed documentation regarding the data format and labels is provided beside the radio buttons. Additionally, users can view and download the dataset by clicking on the hyperlink following the description.

  1. Choose the desired example data: Click one radio button for the respective data.

  2. Press the “Submit” button to proceed to the following steps.

Figure 1.2.1: A panel displaying example data on the data uploading page for the Clustering and Classification module.

Data Processing

2.1 Data Integrity Check of Clustering and Classification Data

After the data are uploaded, integrity checks are run on them. In the navigation tree, 'Data Check' is highlighted under 'Processing'. If the data meet the integrity criteria (Figure 2.1.1), users can proceed; if not, detailed errors are shown in the result panel, prompting users to correct and re-upload their data.

The data overview tab shows the data integrity checks and exploratory data plots. The data integrity check shows whether the data meet the integrity criteria. The exploratory data plots show the data in graphical form. Two plotting options, PCA and heatmap, can be used to assess and explore the data before analysis.

The data editor tab can be used to check the status of individual variables, change the type of data, edit the metadata, or delete variables (Figure 2.1.2).

Figure 2.1.1: Web Page for Clustering and Classification Data Integrity Check. The upper panel (green) displays data integrity criteria (presence of missing values and checking sample and variable labels), while the lower panel (blue) shows dynamic results from uploaded data.

Figure 2.1.2: Web Page for Clustering and Classification Data Editor. The blue panel lists the data you have uploaded: the name of each species, the type of data it contains (Continuous, Categorical, Ordinal), whether or not the species passes the integrity check, and buttons to either edit individual entries or remove the species.
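The criteria named in Figure 2.1.1 (missing values, and sample and variable labels) can be approximated with a short Python sketch. This is an assumption about what such a check might look like, not Wegan's actual implementation; the function `check_integrity` and the set of tokens treated as missing are hypothetical.

```python
def check_integrity(sample_labels, variable_labels, rows, missing_tokens=("", "NA")):
    """Return a list of human-readable problems; an empty list means the data pass."""
    problems = []
    # Labels must be non-empty and unique on both axes.
    for kind, labels in (("sample", sample_labels), ("variable", variable_labels)):
        if any(not label.strip() for label in labels):
            problems.append(f"empty {kind} label")
        if len(set(labels)) != len(labels):
            problems.append(f"duplicate {kind} labels")
    # Count cells that are None or match one of the (assumed) missing tokens.
    missing = sum(
        1
        for row in rows
        for value in row
        if value is None or str(value).strip() in missing_tokens
    )
    if missing:
        problems.append(f"{missing} missing value(s)")
    # Every row should have one value per variable.
    if any(len(row) != len(variable_labels) for row in rows):
        problems.append("row length does not match number of variables")
    return problems


report_missing = check_integrity(["s1", "s2"], ["sp_a", "sp_b"], [[1, 2], [3, "NA"]])
report_labels = check_integrity(["s1", "s1"], ["sp_a", "sp_b"], [[1, 2], [3, 4]])
report_clean = check_integrity(["s1", "s2"], ["sp_a", "sp_b"], [[1, 2], [3, 4]])
```

Returning a list of problems rather than a single pass/fail flag mirrors the result panel described above, which reports detailed errors so the data can be fixed before re-uploading.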

2.A Missing Values in Data

If missing data are detected, the link to the Missing Value Estimation page becomes enabled. This page provides two methods for dealing with missing values. Method 1 removes species for which more than a given percentage of the data is missing. This percentage is configurable, allowing users to control which species are removed (Figure 2.A.1).

Method 2 replaces the remaining missing values using a process selected with the radio buttons and menus (Figure 2.A.1). For option 2, the menu may be set to mean or to median. For option 3, the menu may be set to Probabilistic Principal Component Analysis (PPCA), Bayesian Principal Component Analysis (BPCA), or Singular Value Decomposition (SVD).
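The two methods can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Wegan's code: the function names and the 50% default threshold are hypothetical, missing values are represented as `None`, and only the mean and median replacements are shown (the PPCA, BPCA, and SVD options are not covered).

```python
from statistics import mean, median


def drop_sparse_species(columns, max_missing_fraction=0.5):
    """Method 1: keep only species whose missing fraction is at most the threshold.

    columns maps a species name to its list of values (None = missing).
    """
    kept = {}
    for name, values in columns.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing_fraction:
            kept[name] = values
    return kept


def impute(columns, method="mean"):
    """Method 2: replace remaining missing values with the species mean or median."""
    statistic = {"mean": mean, "median": median}[method]
    filled = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        fill_value = statistic(observed)  # requires at least one observed value
        filled[name] = [fill_value if v is None else v for v in values]
    return filled


# Made-up example: sp_a has 25% missing, sp_b has 75% missing.
columns = {"sp_a": [1, None, 3, 4], "sp_b": [None, None, None, 2]}
kept = drop_sparse_species(columns, max_missing_fraction=0.5)
filled = impute(kept, method="median")
```

Running the removal step first means the replacement step only has to fill gaps in species that still have enough observed values to estimate a sensible mean or median.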

2.2 Filtering of Data

Rare species can be defined by their prevalence (the proportion of samples in which they are present), using a cutoff on the percentage of samples in which a species must occur.

Species occurring at low prevalence may be the result of sampling effects (random chance or errors) and may therefore represent noise rather than meaningful ecological patterns. However, rare species can be of interest for some analyses.

Users are encouraged to explore different thresholds based on their analysis goals (Figure 2.2.1).

Figure 2.2.1: Web page for filtering the uploaded data.
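A prevalence filter of this kind can be sketched in a few lines of Python. The sketch assumes that prevalence means the fraction of samples in which a species has a non-zero abundance; the function names and the cutoff value are hypothetical, and the exact rule Wegan applies may differ.

```python
def prevalence(values):
    """Fraction of samples in which the species is present (abundance > 0)."""
    return sum(v > 0 for v in values) / len(values)


def filter_by_prevalence(columns, min_prevalence=0.2):
    """Keep species whose prevalence is at least min_prevalence."""
    return {
        name: values
        for name, values in columns.items()
        if prevalence(values) >= min_prevalence
    }


# Made-up abundances across five samples.
columns = {"common_sp": [5, 2, 0, 1, 3], "rare_sp": [0, 0, 0, 0, 1]}
filtered = filter_by_prevalence(columns, min_prevalence=0.25)
```

Here `rare_sp` occurs in one of five samples (prevalence 0.2) and is dropped at a cutoff of 0.25, while a cutoff of 0.2 would keep it, which shows how sensitive the retained species list is to the chosen threshold.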

2.3 Normalization of Data

Normalization procedures manipulate data so that samples and variables are more comparable. Choose "Automatic Normalization" to obtain the optimal normalization based on Pearson Mode Skewness. Choose "Manual Normalization" to use your selected method of normalization. These options will bring up plots to view the effect of your selections. The green panel presents various method options for normalization and scaling (Figure 2.3.1).

Figure 2.3.1: Web page for Normalization processing with automatic normalization.
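For readers unfamiliar with Pearson mode skewness, it is defined as (mean - mode) / standard deviation, with values near zero indicating a roughly symmetric distribution. The Python sketch below computes it and compares a few candidate transformations by the absolute value of that score. How Wegan actually selects the optimal normalization is not described here, so the candidate list, the function names, and the "closest to zero" selection rule are all assumptions made for illustration.

```python
import math
from statistics import mean, mode, pstdev


def pearson_mode_skewness(values):
    """Pearson's first skewness coefficient: (mean - mode) / standard deviation."""
    sd = pstdev(values)
    if sd == 0:
        return 0.0  # constant data: treat as not skewed
    return (mean(values) - mode(values)) / sd


def score_transformations(values):
    """Score hypothetical candidate transformations by their mode skewness."""
    candidates = {
        "none": lambda v: v,
        "log1p": math.log1p,  # log(1 + x), defined for zero counts
        "sqrt": math.sqrt,
    }
    return {
        name: pearson_mode_skewness([transform(v) for v in values])
        for name, transform in candidates.items()
    }


# Made-up count data for one species.
counts = [0, 0, 0, 1, 1, 2, 3, 8, 20]
scores = score_transformations(counts)
best = min(scores, key=lambda name: abs(scores[name]))  # assumed selection rule
```

The mode is only well defined for discrete values such as counts; for continuous measurements a different estimate of the mode (or a median-based skewness measure) would be needed.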

Upon normalization, users can visualize the data distribution before and after the process. When the "Automatic Normalization" or "Manual Normalization" button is active, a dialog window titled "Normalization Result" appears. The boxplots show at most 50 variables/samples due to space limitation; the density plots are based on all data. Users can proceed to the next step by clicking "Proceed" if satisfied or return to the method menu page by clicking "Go Back" if unsatisfied (Figure 2.3.3).

Figure 2.3.3: Dialog window for Normalization results.

In the "Normalization Result" dialog window, users can explore data distribution plots for variables and samples by selecting the "Variable Overview" and "Sample Overview" tabs, respectively. To download the distribution plots, click the "Export" button. This opens a dialog box titled "Graphics Center" in which the "Format," "Resolution," and "Size" of the plots can be adjusted. After these parameters are confirmed, clicking "Submit" generates a hyperlink for download (Figure 2.3.4). Users can press “Proceed” to finish data normalization and continue to the main page of the desired module.

Figure 2.3.4: Dialog box for Graphics Center.
