Data used to quantify the complexity of the workflow on biodiversity-ecosystem functioning
The dataset holds the data derived directly from the workflow. Each row represents one port in the workflow. Because most actors have several ports, one actor is usually represented by multiple rows. The rows belonging to one actor are set apart from those of the next actor by rows of NA. The dataset contains information about the purpose of the actor, a description of that purpose, the position of the actor in the workflow, the lines of R code in the actor, the count of R functions used, and which additional R packages have been used. It also gives the port name, whether the port has been used, whether it is an input or output port, and the overall count of input and output ports of the actor. In addition, it records the variable each port handles (the header from the original dataset) and the dataset the data come from. Finally, it gives the structure and length of the input for each port, as well as a lifecycle of the variable (how often it has been used).
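As a minimal sketch of how the NA separator rows could be used when reading the data, the Python snippet below splits port-level rows into one block per actor. The actor names, port names and the three-column layout are hypothetical and chosen only for illustration; the real dataset has more columns, and a row of None values stands in for a row of NA.

```python
from itertools import groupby

# Hypothetical port-level rows: (actor, port_name, direction).
# A row made up entirely of None stands in for an NA separator row.
rows = [
    ("actor_a", "in_1", "input"),
    ("actor_a", "out_1", "output"),
    (None, None, None),
    ("actor_b", "in_1", "input"),
    ("actor_b", "in_2", "input"),
    ("actor_b", "out_1", "output"),
]


def is_separator(row):
    """Return True when every field in the row is missing (an NA row)."""
    return all(value is None for value in row)


def split_by_actor(rows):
    """Return one list of port rows per actor, dropping the NA separators."""
    return [
        list(block)
        for separator, block in groupby(rows, key=is_separator)
        if not separator
    ]


actor_blocks = split_by_actor(rows)
```

Splitting on the separator rows rather than on the actor name is only one option; grouping by an actor identifier column would work equally well if that column is filled in on every row.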
Summary dataset:
The dataset is an aggregate by actor derived from the full dataset, in which each row represents a port. In this dataset each row represents an actor. It contains information such as the ratio of output to input ports of an actor, a count of input and output ports, the actor purpose and the R functions used in the actor. It also summarizes the datasets an actor deals with, identified by their ids (domain_ids). It provides the lines of code and the percent contribution of each actor to the total lines of code. It also holds information about the R packages used, the total input and output port count, the actor position, a count of R packages used, a total count of datasets (domains), the total of R functions used and the values for complexity (absolute and relative).
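The aggregation from port rows to actor rows can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used to build the summary dataset: the field names (actor, direction, lines_of_code) are hypothetical, and the sketch assumes that the lines of code of an actor are repeated on each of its port rows. The complexity values are left out because their definition is not given here.

```python
from collections import defaultdict

# Hypothetical port-level records. Assumption: lines_of_code is an
# actor-level value repeated on every port row of that actor.
ports = [
    {"actor": "actor_a", "direction": "input", "lines_of_code": 30},
    {"actor": "actor_a", "direction": "output", "lines_of_code": 30},
    {"actor": "actor_a", "direction": "output", "lines_of_code": 30},
    {"actor": "actor_b", "direction": "input", "lines_of_code": 10},
    {"actor": "actor_b", "direction": "output", "lines_of_code": 10},
]


def summarise(ports):
    """Aggregate port rows into one summary record per actor."""
    counts = defaultdict(lambda: {"input": 0, "output": 0, "lines_of_code": 0})
    for port in ports:
        entry = counts[port["actor"]]
        entry[port["direction"]] += 1  # count input and output ports
        entry["lines_of_code"] = port["lines_of_code"]  # actor-level value

    total_loc = sum(entry["lines_of_code"] for entry in counts.values())

    summary = {}
    for actor, entry in counts.items():
        summary[actor] = {
            "input_ports": entry["input"],
            "output_ports": entry["output"],
            # Ratio of output to input ports; undefined without input ports.
            "output_input_ratio": (
                entry["output"] / entry["input"] if entry["input"] else None
            ),
            # Share of this actor in the total lines of code, in percent.
            "loc_percent": 100 * entry["lines_of_code"] / total_loc,
        }
    return summary


summary = summarise(ports)
```

With these made-up records, actor_a has one input and two output ports and contributes 30 of the 40 total lines of code.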