CropGrowth.jl

CropGrowth.jl is a Julia package for modelling crop growth curves using the generalised logistic function:

Quickstart

using Pkg
Pkg.add(CropGrowth)
using CropGrowth
df = simulate()
df_out = fitgrowthmodels(df, verbose=true)

Model

\[y(t) = {A + {{K-A} \over {C + (Qe^{-Bt})^{1/v}}}}\]

where:

  • $y(t)$: biomass at time $t$
  • $A$: lower asymptote (initial or minimum biomass)
  • $K$: positively affects the upper asymptote. This is the final or maximum biomass if:
    • $C = 1.00$, since:
    • $y_{max} = A + (K-A)/C^{1/v}$, then
    • $y_{max} = A + K - A$, therefore:
    • $y_{max} = K$
  • $C$: negatively affects the final or maximum biomass
  • $Q$: negatively affects initial or minimum biomass
  • $e$: Euler's number (~2.71828)
  • $B$: growth rate
  • $v$: asymmetry parameter ($v ≥ 0$; small values: fast growth early; large values: fast growth later)

To solve for $t$ at specific $y$:

\[t(y) = -{{1} \over {B}} \log {\left( { {{1} \over {Q}} \left( \left( {K - A} \over {y - A} \right)^v - C \right) } \right) }\]

Installation

Please install the latest version of Julia (we aim to update CropGrowth.jl for the latest Julia release):

curl -fsSL https://install.julialang.org | sh
type -a julia

Install the CropGrowth.jl library in Julia:

  • Stable version:
using Pkg
Pkg.add(CropGrowth)
  • Development version:
using Pkg
Pkg.add("https://github.com/jeffersonfparil/CropGrowth.jl")

Test the installation in Julia:

using CropGrowth
df = simulate()
df_out = fitgrowthmodels(df, verbose=true)

Implementation details

The main function, fitgrowthmodels fits generalized logistic growth models to the data provided in the input DataFrame and returns a DataFrame containing the fitted parameters, fit statistics, and time to reach specified fraction of the final value. The complete function signature is:

fitgrowthmodels(
  df::DataFrame;
  A = Dict(
      :init=>minimum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
      :lower=>0.0,
      :upper=>maximum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
  ),
  K = Dict(
      :init=>maximum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
      :lower=>0.0,
      :upper=>2*maximum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
  ),
  C = Dict(:init=>1.0, :lower=>1.0, :upper=>1.0),
  Q = Dict(:init=>1.0, :lower=>1.0, :upper=>1.0),
  B = Dict(:init=>1.0, :lower=>0.0, :upper=>10.0),
  v = Dict(:init=>1.0, :lower=>1e-5, :upper=>10.0),
  min_t::Int64 = 3,
  frac_of_final::Vector{Float64} = [0.5, 0.9],
  fit_statistic::String = "R²",
  maxiters::Int64 = 10_000,
  seed::Int64 = 42,
  show_plots::Bool = false,
  verbose::Bool = false,
)::Tuple{DataFrame, Vector{String}}

Arguments

  • df::DataFrame: Input data containing the required columns specified in REQUIRED_COLUMNS = ["entries", "sites", "replications", "growing_periods", "time_points"] and at least one trait column.
  • A::Dict: Search space for the parameter A (lower asymptote). Contains :init, :lower, and :upper keys. Defaults to the minimum and maximum of the trait column with init=minimum.
  • K::Dict: Search space for the parameter K (upper asymptote). Contains :init, :lower, and :upper keys. Defaults to the minimum and 2×maximum of the trait column with init=maximum.
  • C::Dict: Search space for the parameter C. Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=1.0, upper=1.0.
  • Q::Dict: Search space for the parameter Q. Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=1.0, upper=1.0.
  • B::Dict: Search space for the parameter B (growth rate). Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=0.0, upper=10.0.
  • v::Dict: Search space for the parameter v (asymmetry parameter). Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=1e-5, upper=10.0.
  • min_t::Int64: Minimum number of time points required to fit the growth model for a specific combination of entry, site, replication, and growing period. Defaults to 3.
  • frac_of_final::Vector{Float64}: Percentages of the final value for which the time to reach these fraction will be calculated. Defaults to [0.5, 0.9].
  • fit_statistic::String: The fit statistic to be used for evaluating the model. Must be one of ["R²", "RMSE", "MSE", "MAE", "ρ"]. Defaults to "R²".
  • maxiters::Int64: Maximum number of iterations allowed for the optimization process. Defaults to 10_000.
  • seed::Int64: Random seed for reproducibility. Defaults to 42.
  • show_plots::Bool: Whether to show fitted growth curve plots. Defaults to false.
  • verbose::Bool: Whether to display progress and additional information during the fitting process. Defaults to false.

Returns

Tuple{DataFrame, Vector{String}}:

  • The first element is a DataFrame containing the fitted parameters (A, K, C, Q, B, v), fit statistics, value of the growth models at $t=0$ (y_t0), maximum value of the growth model (y_max), and time to reach specified fraction of the final value for each combination of entry, site, replication, and growing period.
  • The second element is a Vector{String} containing the combinations that were skipped due to insufficient data points.

Notes

  • The input DataFrame must contain the required columns specified in the global variable REQUIRED_COLUMNS = ["entries", "sites", "replications", "growing_periods", "time_points"], as well as at least one additional trait column.
  • If the DataFrame contains more than one trait column, only the first trait column will be used.
  • Combinations with fewer than min_t time points will be skipped.
  • The function uses a progress bar to indicate the fitting process if verbose=true.
  • The optimisation is performed using the BBO_adaptive_de_rand_1_bin_radiuslimited() algorithm (details of the optimisation algorithm).
  • The optimisation algorithm minimises the mean squared error between the observed data y and the generalised logistic model.

Examples

using CropGrowth, StatsBase
df = simulate(n_entries=5, n_sites=1, n_replications=1, n_growing_periods=1, n_time_points_per_growing_period=5, seed=123)
df_out_0, skipped_combinations_0 = fitgrowthmodels(df, show_plots=true)
df_out_C, skipped_combinations_C = fitgrowthmodels(df, C=Dict(:init=>1.0, :lower=>0.0, :upper=>100.0), show_plots=true)
df_out_Q, skipped_combinations_Q = fitgrowthmodels(df, Q=Dict(:init=>1.0, :lower=>0.0, :upper=>100.0), show_plots=true)
df_out_CQ, skipped_combinations_CQ = fitgrowthmodels(
    df,
    C=Dict(:init=>1.0, :lower=>0.0, :upper=>5.0),
    Q=Dict(:init=>1.0, :lower=>0.0, :upper=>5.0),
    show_plots=true
)
mean(df_out_0.R²)
mean(df_out_C.R²)
mean(df_out_Q.R²)
mean(df_out_CQ.R²)

Comparison with nplr

TLDR: CropGrowth.jl fits a more general logistic function than nplr.

The R package nplr: n-parameter logistic regressions fits a variant of the logistic model. It fits 5 parameters instead of the 6 parameters in CropGrowth.jl's formulation above. The model is defined as:

\[y(x) = { B + { {T - B} \over {\left [ 1 + 10^{b(x_{mid} - x)} \right ]^s} } }\]

where:

  • $B$: lower asymptote
  • $T$: upper asymptote
  • $b$: Hill slope (related to B or growth rate)
  • $x_{mid}$: x-coordinate at the inflexion point,
  • $s$ asymetry coefficient (related to v)

Fitting log10(dose)-response curve with nplr:

library(nplr)
library(txtplot)
path <- system.file("extdata", "pc3.txt", package="nplr")
pc3 <- read.delim(path)
txtplot::txtplot(log10(pc3$CONC), pc3$GIPROP) # nplr default does log10(x)
np1 <- nplr::nplr(x=pc3$CONC, y=pc3$GIPROP, useLog=TRUE)
y = pc3$GIPROP
y_hat = np1@yFit
print("Fitted parameters:")
print(paste0("K = ", np1@pars$bottom))
print(paste0("A = ", np1@pars$top))
print(paste0("xmid = ", np1@pars$xmid))
print(paste0("B_ish = ", np1@pars$scal))
print(paste0("v_ish = ", np1@pars$scal))
print("Fit statistics:")
print(paste0("mae = ", mean(abs(y - y_hat))))
print(paste0("mse = ", mean((y - y_hat)^2)))
print(paste0("rmse = ", sqrt(mean((y - y_hat)^2))))
print(paste0("cor = ", cor(y, y_hat)))
print(paste0("r2 = ", 1 - (var((y - y_hat)) / var(y))))
# [1] "Fitted parameters:"
# [1] "K = 0.000182956669824204"
# [1] "A = 0.996490622312222"
# [1] "xmid = -6.18254538738321"
# [1] "B_ish = -1.42800873142546"
# [1] "v_ish = -1.42800873142546"
# [1] "Fit statistics:"
# [1] "mae = 0.0186073781104083"
# [1] "mse = 0.000651821232462819"
# [1] "rmse = 0.0255307898910868"
# [1] "cor = 0.997250250732751"
# [1] "r2 = 0.994444954867448"

CropGrowth.jl achieves very similar but more generalised fit:

using DataFrames, CSV, CropGrowth
df = CSV.read(expanduser("~/.conda/envs/GenomicBreeding/lib/R/library/nplr/extdata/pc3.txt"), DataFrame)
growth_model = modelgrowth(
    t = log10.(df.CONC),
    y = df.GIPROP,
    maxiters = 100_000,
    seed = 42,
    verbose = true,
)
# Fitted parameters:
# A = 0.991559343580949
# K = 0.0380737475678395
# C = 1.0
# Q = 1.5219825671137292e-5
# B = 1.5916146320430173
# v = 0.15864685998073122
# y_t0 = 0.03816521536021922
# y_max = 0.038073747568793115
# Fit statistics:
# mse = 0.0006662392521457019
# rmse = 0.0258116108010659
# R² = 0.9943191952067459
# ρ = 0.9971555521616207
# mae = 0.020947684621310015

API

CropGrowth.GrowthModelType
mutable struct GrowthModel
    A::Float64
    K::Float64
    C::Float64
    Q::Float64
    B::Float64
    v::Float64
    y_t0::Float64
    y_max::Float64
    fit_statistics::Dict{String,Float64}

Holds parameters for the generalised logistic growth model:

\[y(t) = A + \frac{K-A}{C + (Qe^{-Bt})^{1/v}}\]

where:

  • $y(t)$: biomass at time $t$ (not part of the struct)
  • $A$: lower asymptote (initial or minimum biomass)
  • $K$: positively affects the upper asymptote (can be the final or maximum biomass if $C = 1.00$, since $y_{max} = A + (K-A)/C^(1/v)$, then $y_{max} = A + K - A$, therefore: $y_{max} = K$)
  • $C$: negatively affects the final or maximum biomass
  • $Q$: negatively affects initial or minimum biomass
  • $e$: Euler's number (~2.71828)
  • $B$: growth rate
  • $v$: asymmetry parameter ($v ≥ 0$; small values: fast growth early; large values: fast growth later)

Additional information are provided in the struct:

  • y_t0: value of the growth model at time $t = 0$
  • y_max: maximum value of the growth model ($y_{max} = A + (K-A)/C^(1/v)$)
  • fit_statistics: a dictionary containing fit statistics such as R², RMSE, MSE, MAE, and Pearson's correlation coefficient (ρ)

Constructor

GrowthModel(;
    A::Float64,
    K::Float64,
    C::Float64,
    Q::Float64,
    B::Float64,
    v::Float64,
    fit_statistics::Dict{String,Float64} = Dict(""=>NaN),
    ϵ::Float64 = 1e-12,
)

Notes:

  • The struct includes a constructor that automatically calculates y_max based on the provided parameters.
  • The fit_statistics field is optional and defaults to a dictionary with a single entry of an empty string key and NaN value.
  • A small epsilon value (ϵ) is added to the calculation of y_max to prevent division by zero errors.

Example

julia> growth_model = GrowthModel(A=0.0, K=10.0, C=1.0, Q=1.0, B=0.75, v=0.1);

julia> (growth_model.y_max - growth_model.K) < 1e-9
true

julia> (growth_model.y_max - 10.0) < 1e-9
true
source
CropGrowth.fitgrowthmodelsMethod
fitgrowthmodels(
    df::DataFrame;
    A = Dict(
        :init=>minimum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
        :lower=>0.0,
        :upper=>maximum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
    ),
    K = Dict(
        :init=>maximum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
        :lower=>0.0,
        :upper=>2*maximum(select(df, Not(REQUIRED_COLUMNS))[:, 1]),
    ),
    C = Dict(:init=>1.0, :lower=>1.0, :upper=>1.0),
    Q = Dict(:init=>1.0, :lower=>1.0, :upper=>1.0),
    B = Dict(:init=>1.0, :lower=>0.0, :upper=>10.0),
    v = Dict(:init=>1.0, :lower=>1e-5, :upper=>10.0),
    min_t::Int64 = 3,
    frac_of_final::Vector{Float64} = [0.5, 0.9],
    fit_statistic::String = "R²",
    maxiters::Int64 = 10_000,
    seed::Int64 = 42,
    show_plots::Bool = false,
    verbose::Bool = false,
)::Tuple{DataFrame, Vector{String}}

Fits generalised logistic growth models to the data provided in the input DataFrame and returns a DataFrame containing the fitted parameters, fit statistics, and time to reach specified fractions of the final value.

Arguments

  • df::DataFrame: Input data containing the required columns specified in REQUIRED_COLUMNS = ["entries", "sites", "replications", "growing_periods", "time_points"] and at least one trait column.
  • A::Dict: Search space for the parameter A (lower asymptote). Contains :init, :lower, and :upper keys. Defaults to the minimum and maximum of the trait column with init=minimum.
  • K::Dict: Search space for the parameter K (upper asymptote). Contains :init, :lower, and :upper keys. Defaults to the minimum and 2×maximum of the trait column with init=maximum.
  • C::Dict: Search space for the parameter C. Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=1.0, upper=1.0.
  • Q::Dict: Search space for the parameter Q. Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=1.0, upper=1.0.
  • B::Dict: Search space for the parameter B (growth rate). Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=0.0, upper=10.0.
  • v::Dict: Search space for the parameter v (asymmetry parameter). Contains :init, :lower, and :upper keys. Defaults to init=1.0, lower=1e-5, upper=10.0.
  • min_t::Int64: Minimum number of time points required to fit the growth model for a specific combination of entry, site, replication, and growing period. Defaults to 3.
  • frac_of_final::Vector{Float64}: Percentages of the final value for which the time to reach these fractions will be calculated. Defaults to [0.5, 0.9].
  • fit_statistic::String: The fit statistic to be used for evaluating the model. Must be one of ["R²", "RMSE", "MSE", "MAE", "ρ"]. Defaults to "R²".
  • maxiters::Int64: Maximum number of iterations allowed for the optimisation process. Defaults to 10_000.
  • seed::Int64: Random seed for reproducibility. Defaults to 42.
  • show_plots::Bool: Whether to show fitted growth curve plots. Defaults to false.
  • verbose::Bool: Whether to display progress and additional information during the fitting process. Defaults to false.

Returns

  • Tuple{DataFrame, Vector{String}}:
    • The first element is a DataFrame containing the fitted parameters (A, K, C, Q, B, v), fit statistics, value of the growth models at $t=0$ (y_t0), maximum value of the growth model (y_max), time to reach specified fractions of the final value (time_to_*p), and number of time-points (number_of_time_points) for each combination of entry, site, replication, and growing period.
    • The second element is a Vector{String} containing the combinations that were skipped due to insufficient data points.

Notes

  • The input DataFrame must contain the required columns specified in the global variable REQUIRED_COLUMNS = ["entries", "sites", "replications", "growing_periods", "time_points"], as well as at least one additional trait column.
  • If the DataFrame contains more than one trait column, only the first trait column will be used.
  • Combinations with fewer than min_t time points will be skipped.
  • The function uses a progress bar to indicate the fitting process if verbose=true.
  • The optimisation is performed using the BBO_adaptive_de_rand_1_bin_radiuslimited() algorithm (details of the optimisation algorithm).
  • The optimisation algorithm minimises the mean squared error between the observed data y and the generalised logistic model.
  • Parallel curve fitting:
    • The per-combination fits are executed in parallel using Threads.@threads over the prepared list of filter combinations. To take advantage of this, start Julia with multiple threads (for example julia -t 4 or set JULIA_NUM_THREADS).
    • Access to the shared fitted_parameters dictionary is protected with a ReentrantLock and @lock to ensure thread-safety when appending results. This avoids race conditions but introduces some serialization for the final writes.

Details

Fits a generalised logistic growth model to the data using the following equation:

\[y(t) = A + \frac{K-A}{C + (Qe^{-Bt})^{1/v}}\]

where:

  • $y(t)$: biomass at time $t$
  • $A$: lower asymptote (initial or minimum biomass)
  • $K$: positively affects the upper asymptote (can be the final or maximum biomass if $C = 1.00$, since $y_{max} = A + (K-A)/C^(1/v)$, then $y_{max} = A + K - A$, therefore: $y_{max} = K$)
  • $C$: negatively affects the final or maximum biomass
  • $Q$: negatively affects initial or minimum biomass
  • $e$: Euler's number (~2.71828)
  • $B$: growth rate
  • $v$: asymmetry parameter ($v ≥ 0$; small values: fast growth early; large values: fast growth later)

Additional information are provided in the output DataFrame:

  • y_t0 is the value of the growth model at time $t = 0$
  • y_max is the maximum value of the growth model ($y_{max} = A + (K-A)/C^(1/v)$)
  • One of the following fit statistic such as R² (default), RMSE, MSE, MAE, and Pearson's correlation coefficient (ρ)

Example

julia> df = simulate(n_entries=5, seed=42);

julia> df_out, skipped_combinations = fitgrowthmodels(df);

julia> length(unique(df.entries)) == length(unique(df_out.entries))
true

julia> length(unique(df.sites)) == length(unique(df_out.sites))
true

julia> length(unique(df.replications)) == length(unique(df_out.replications))
true

julia> length(unique(df.growing_periods)) == length(unique(df_out.growing_periods))
true

julia> all(x -> x ∈ names(df_out), ["A", "K", "C", "Q", "B", "v", "y_max", "R²", "time_to_50p", "time_to_90p"])
true

julia> (var(df_out.A) > 0.0) && (var(df_out.K) > 0.0) && (var(df_out.C) == 0.0) && (var(df_out.Q) == 0.0) && (var(df_out.B) > 0.0) && (var(df_out.v) > 0.0)
true

julia> (var(df_out.y_max) > 0.0) && (var(df_out."R²") > 0.0) && (var(df_out.time_to_50p) > 0.0) && (var(df_out.time_to_90p) > 0.0)
true
source
CropGrowth.fitstatisticsMethod
fitstatistics(; y::Vector{Float64}, ŷ::Vector{Float64})::Dict{String, Float64}

Compute various fit statistics for a given set of observed and predicted values.

Arguments

  • y::Vector{Float64}: Observed data points.
  • ŷ::Vector{Float64}: Predicted data points.

Returns

A dictionary containing the following fit statistics:

  • "R²": Coefficient of determination, a measure of how well the predicted values explain the variance in the observed data.
  • "rmse": Root mean squared error, a measure of the average magnitude of the residuals.
  • "mse": Mean squared error, the average of the squared residuals.
  • "mae": Mean absolute error, the average of the absolute residuals.
  • "ρ": Pearson correlation coefficient between the observed and predicted values.

Notes

  • Ensure that the y and vectors are of the same length and correspond to the same observations.
  • The function assumes that both y and are non-empty vectors.

Example

julia> y1::Vector{Float64} = rand(10); y2::Vector{Float64} = rand(10);

julia> stats0 = fitstatistics(y=y1, ŷ=y1);

julia> stats1 = fitstatistics(y=y1, ŷ=y2);

julia> stats0["R²"] > stats1["R²"]
true

julia> stats0["rmse"] < stats1["rmse"]
true

julia> stats0["mse"] < stats1["mse"]
true

julia> stats0["mae"] < stats1["mae"]
true

julia> stats0["ρ"] > stats1["ρ"]
true
source
CropGrowth.generalisedlogisticMethod
generalisedlogistic(growth_model::GrowthModel; t::Vector{Float64})::Vector{Float64}

Computes the generalised logistic function for a given vector of time points t using the parameters defined in the GrowthModel.

Arguments

  • growth_model::GrowthModel: A struct containing the parameters for the generalised logistic function.
  • t::Vector{Float64}: A vector of time points at which to evaluate the function.

Returns

  • Vector{Float64}: A vector of values representing the generalised logistic function evaluated at each time point in t.

Returns

  • Vector{Float64}: A vector of values representing the generalised logistic function evaluated at each time point in t.

Example

julia> t = collect(0.0:1.0:10.0);

julia> growth_model = GrowthModel(A=0.0, K=10.0, C=1.0, Q=1.0, B=0.75, v=0.1);

julia> y = generalisedlogistic(growth_model, t=t);

julia> (y[1] == minimum(y)) && (y[end] == maximum(y))
true
source
CropGrowth.generalisedlogisticMethod
generalisedlogistic(t::Vector{Float64}; A=0.0, K=1.0, C=1.0, Q=1.0, B=1.0, v=1.0)::Vector{Float64}

Computes the generalised logistic function for a given vector of time points t.

Arguments

  • t::Vector{Float64}: A vector of time points at which to evaluate the function.

Keyword Arguments

  • A::Float64 (default: 0.0): lower asymptote (initial or minimum biomass)
  • K::Float64 (default: 1.0): upper asymptote (can be the final or maximum biomass if $C = 1.00$, since $y_{max} = A + (K-A)/C^(1/v)$, then $y_{max} = A + K - A$, therefore: $y_{max} = K$)
  • C::Float64 (default: 1.0): negatively affects the final or maximum biomass
  • Q::Float64 (default: 1.0): negatively affects initial or minimum biomass
  • B::Float64 (default: 1.0): growth rate
  • v::Float64 (default: 1.0): asymmetry parameter ($v ≥ 0$; small values: fast growth early; large values: fast growth later)

Returns

  • Vector{Float64}: A vector of values representing the generalised logistic function evaluated at each time point in t.

Example

julia> t = collect(0.0:1.0:10.0);

julia> y = generalisedlogistic(t; A=0.0, K=10.0, C=1.0, Q=1.0, B=0.75, v=0.1);

julia> (y[1] == minimum(y)) && (y[end] == maximum(y))
true
source
CropGrowth.modelgrowthMethod
modelgrowth(; 
    y::Vector{Float64}, 
    t::Vector{Float64}, 
    θ_search_space::Dict{String, Dict{Symbol, Float64}}, 
    maxiters::Int64=10_000, 
    seed::Int64=42, 
    verbose::Bool=false
)::GrowthModel

Fit a generalised logistic growth model to the given data y over time t.

Arguments

  • y::Vector{Float64}: The observed data points representing the growth values.
  • t::Vector{Float64}: The corresponding time points for the observed data.
  • θ_search_space::Dict{String, Dict{Symbol, Float64}}: A dictionary defining the search space for each parameter of the generalised logistic model. Each key corresponds to a parameter name ("A", "K", "C", "Q", "B", "v"), and the value is another dictionary with keys :init, :lower, and :upper specifying the initial value, lower bound, and upper bound for that parameter.
  • maxiters::Int64=10_000: The maximum number of iterations for the optimisation algorithm. Defaults to 10,000.
  • seed::Int64=42: The random seed for reproducibility. Defaults to 42.
  • verbose::Bool=false: If true, prints the fitted parameters, fit statistics, and displays a plot of the fitted model. Defaults to false.

Returns

  • GrowthModel: A structure containing the fitted parameters of the generalised logistic growth model and fit statistics.

Details

  • The function uses an optimisation algorithm to minimise the mean squared error between the observed data y and the generalised logistic model.
  • The model parameters are constrained within bounds specified in the θ_search_space dictionary.
  • The optimisation is performed using the BBO_adaptive_de_rand_1_bin_radiuslimited() algorithm (details of the optimisation algorithm).
  • The function also computes fit statistics, which are included in the returned GrowthModel structure.
  • If verbose is set to true, the function prints the fitted parameters, fit statistics, and displays a scatter plot of the observed data along with the fitted curve.

Notes

  • y_max may be unexpectedly high if C is fitted freely which may indicate a simple linear growth rather than a logistic growth pattern.
  • If y_max is desired to be equal to K, set C = 1.0 when defining the θ_search_space, i.e. In θ_search_space define "C" => Dict(:init=>1.0, :lower=>1.0, :upper=>1.0).

Example

julia> t = collect(1.0:10.0);

julia> growth_model_1 = modelgrowth(t=t, y=sort(vcat([0.0, 0.0, 1.0, 1.0], rand(6))));

julia> growth_model_2 = modelgrowth(t=t, y=rand(10));

julia> growth_model_1.fit_statistics["R²"] > growth_model_2.fit_statistics["R²"]
true
source
CropGrowth.readdelimitedMethod
readdelimited(; fname::String, delim::Union{Char,String} = "	", trait_name::String = "biomass")::DataFrame

Reads a delimited file into a DataFrame and validates its structure.

Arguments

  • fname::String: The path to the file to be read.
  • delim::Union{Char,String}: The delimiter used in the file. Defaults to tab (" ").
  • trait_name::String: The name of the trait column to validate. Defaults to "biomass".

Returns

  • A DataFrame containing the data from the file with the following required columns:
    • entries
    • sites
    • replications
    • growing_periods
    • time_points
    • The column specified by trait_name

Throws

  • ArgumentError: If the file does not exist.
  • ArgumentError: If any of the REQUIRED_COLUMNS = ["entries", "sites", "replications", "growing_periods", "time_points"], including trait_name, are missing.

Notes

  • The function expects the file to contain specific REQUIRED_COLUMNS = ["entries", "sites", "replications", "growing_periods", "time_points"], defined in REQUIRED_COLUMNS, along with the column specified by trait_name.

Example

julia> df = simulate();

julia> CSV.write("test.tsv", df, delim="	");

julia> df_read = readdelimited(fname="test.tsv", delim="	", trait_name="biomass");

julia> isequal(df, df_read)
true
source
CropGrowth.simulateMethod
simulate(; 
    n_entries::Int64 = 100, 
    n_sites::Int64 = 2, 
    n_replications::Int64 = 3, 
    n_growing_periods::Int64 = 4, 
    n_time_points_per_growing_period::Int64 = 10, 
    seed::Int64 = 42
)::DataFrame

Simulates crop growth data and returns it as a DataFrame. The function generates synthetic data for multiple entries, sites, replications, growing periods, and time points, with random biomass values.

Keyword Arguments

  • n_entries::Int64: Number of crop entries to simulate (default: 100).
  • n_sites::Int64: Number of sites to simulate (default: 2).
  • n_replications::Int64: Number of replications per site (default: 3).
  • n_growing_periods::Int64: Number of growing periods per replication (default: 4).
  • n_time_points_per_growing_period::Int64: Number of time points per growing period (default: 10).
  • seed::Int64: Random seed for reproducibility (default: 42).

Returns

A DataFrame with the following columns:

  • entries: Crop entry identifiers.
  • sites: Site identifiers.
  • replications: Replication identifiers.
  • growing_periods: Growing period identifiers.
  • time_points: Time points within each growing period.
  • biomass: Simulated biomass values.

Example

julia> df = simulate(n_entries=5, n_sites=1, n_replications=1, n_growing_periods=1, n_time_points_per_growing_period=5, trait_name="some_trait_name", seed=123);

julia> size(df)
(25, 6)

julia> var(df.some_trait_name) > 0.0
true
source
CropGrowth.timetomaxpercMethod
timetomaxperc(growth_model::GrowthModel; p::Vector{Float64} = [0.5])::Vector{Float64}

Calculate the time required for a growth model to reach a proportion p of its carrying capacity K.

Arguments

  • growth_model::GrowthModel: An instance of the GrowthModel type containing the parameters of the growth model.
  • p::Vector{Float64}: A vector of proportions (default is [0.5]) representing the fraction of the carrying capacity ($y_{max}$) to compute the time for.

Returns

  • Vector{Float64}: A vector of times corresponding to each proportion in p.

Details

  • The function computes the maximum value of the growth model ($y_{max} = A + (K-A)/C^(1/v)$) based on its parameters.
  • For each proportion in $p$, the corresponding value of $y$ is calculated as $p * y_{max}$.
  • The time required to reach each $y$ is computed using the generalised logistic growth model formula, which involves logarithmic and complex arithmetic operations.
  • The result is the real part of the computed times.
  • The formula is:

\[t = -\frac{1}{B} \cdot \ln \left( \frac{\left( \left( \frac{K - A}{y - A} \right)^v - C \right)}{Q} \right)\]

Notes

  • The function assumes that the GrowthModel instance contains the parameters A, K, C, Q, B, and v.
  • The computation may involve complex numbers, but only the real part of the result is returned.

Example

julia> growth_model = modelgrowth(t=collect(1.0:10.0), y=sort(vcat([0.0, 0.0, 1.0, 1.0], rand(6))));

julia> p = [0.5, 0.95, 0.99];

julia> time_to_Kp = timetomaxperc(growth_model; p=p);

julia> ŷ = generalisedlogistic(growth_model, t=time_to_Kp);

julia> all( abs.(ŷ .- (p .* growth_model.y_max)) .< 1e-5 )
true
source
CropGrowth.writedelimitedMethod
writedelimited(df::DataFrame; fname::String, delim::Union{Char,String} = "	", overwrite::Bool = false)::Nothing

Writes the contents of a DataFrame to a delimited text file.

Arguments

  • df::DataFrame: The data frame to be written to the file. Must contain the required columns:
    • entries
    • sites
    • replications
    • growing_periods
    • time_points
    • 1 additional column representing the trait data
  • fname::String: The name of the output file, including its path if necessary.
  • delim::Union{Char,String}: The delimiter to use in the output file. Defaults to a tab character (" ").
  • overwrite::Bool: Whether to overwrite the file if it already exists. Defaults to false.

Returns

  • Nothing: This function does not return a value.

Example

julia> df = simulate();

julia> fname = writedelimited(df, fname="test.tsv", delim="	", overwrite=true);

julia> isfile(fname)
true

julia> df_read = readdelimited(fname="test.tsv", delim="	", trait_name="biomass");

julia> isequal(df, df_read)
true
source