Variable creation for Data types #11

Open
JulStraus opened this issue Feb 1, 2024 · 2 comments


JulStraus commented Feb 1, 2024

So far, the only way to add variables for Data types has been to dispatch on the model type. That may lead to method ambiguities. Hence, a new approach is needed.

Introducing variables for Data types should meet the following three requirements:

  1. they should only be introduced for the nodes that require them,
  2. they should be indexed over the node and not the data, and
  3. they should not require changes to the core code.

Point 2 is the reason we cannot use the same approach as for Node types: in this case, we would index over the data and not the node.

A potential solution for this problem, still utilizing the core structure developed for Node types, is outlined below. It requires adding a call to a new function variables_data(m, 𝒩, 𝒯, 𝒫, modeltype) to the function create_model. The new function is given by:

function variables_data(m, 𝒩::Vector{<:Node}, 𝒯, 𝒫, modeltype::EnergyModel)

    # Extract all Data types within all Nodes
    𝒟 = reduce(vcat, [node_data(n) for n ∈ 𝒩])
    if isempty(𝒟) return end
    # Vector of the unique data types in 𝒟.
    data_composite_types = unique(typeof.(𝒟))
    # Get all `Data`-types in the type-hierarchy that the data in 𝒟 represent.
    data_types = collect_types(data_composite_types)
    # Sort the `Data`-types such that a supertype will always come before its subtypes.
    data_types = sort_types(data_types)

    for data_type ∈ data_types
        # All data of the given subtype.
        𝒟ˢᵘᵇ = filter(data -> isa(data, data_type), 𝒟)
        # Convert to a Vector of a common type instead of Any.
        𝒟ˢᵘᵇ = convert(Vector{data_type}, 𝒟ˢᵘᵇ)
        try
            variables_data(m, 𝒟ˢᵘᵇ, 𝒩, 𝒯, 𝒫, modeltype)
        catch e
            # Parts of the exception message we are looking for.
            pre1 = "An object of name"
            pre2 = "is already attached to this model."
            if isa(e, ErrorException)
                if occursin(pre1, e.msg) && occursin(pre2, e.msg)
                    # 𝒟ˢᵘᵇ was already registered by a call to a supertype, so just continue.
                    continue
                end
            end
            # If we make it to this point, some other error occurred. This should
            # not be ignored, so rethrow it with its original backtrace.
            rethrow()
        end
    end

    # Add the variables for the required nodes.
    for n ∈ 𝒩, data ∈ node_data(n)
        variables_data(m, data, n, 𝒯, 𝒫, modeltype)
    end
end
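
The helpers collect_types and sort_types are used above but not shown. The following is only a minimal sketch of what such helpers could do, not the actual implementation: it walks up the type hierarchy with supertype and sorts by depth below Data. All names ending in _sketch, as well as the stand-in types, are hypothetical.

```julia
# Hypothetical stand-ins for the abstract supertype and two composite types.
abstract type Data end
abstract type ExampleData <: Data end
struct ExampleDataA <: ExampleData end
struct ExampleDataB <: ExampleData end

# Collect each given type together with all its supertypes up to and including `Data`.
function collect_types_sketch(types)
    collected = Set{Type}()
    for T in types
        while T <: Data
            push!(collected, T)
            T === Data && break
            T = supertype(T)
        end
    end
    return collect(collected)
end

# Depth of a type below `Data`. A supertype always has a smaller depth than its subtypes.
function depth_sketch(T::Type)
    d = 0
    while T !== Data
        T = supertype(T)
        d += 1
    end
    return d
end

# Sort the types such that a supertype always comes before its subtypes.
sort_types_sketch(types) = sort(types; by = depth_sketch)

𝒟 = Data[ExampleDataA(), ExampleDataB(), ExampleDataA()]
data_types = sort_types_sketch(collect_types_sketch(unique(typeof.(𝒟))))
```

With this ordering, the container method for Data is called first, then the one for ExampleData, and only then the ones for the composite types.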

In addition, we have to create two further methods:

function variables_data(m, 𝒟ˢᵘᵇ::Vector{<:Data}, 𝒩::Vector{<:Node}, 𝒯, 𝒫, modeltype::EnergyModel)
end

function variables_data(m, data::Data, n::Node, 𝒯, 𝒫, modeltype::EnergyModel)
end

The first function creates empty variable containers using SparseVariables, while the second only inserts the variables for the nodes that have the corresponding data.

Another required change is to define the following function.

node_data(n::Availability) = Data[]

This function may be dangerous in general, as it requires that anyone defining a new composite subtype of Availability also defines the function for this type. However, the thought process is that Availability nodes should not include the data field.

The code runs with these functions added, but I have not tested it for additional variable creation. The loop through the types does work, however. I have not added SparseVariables either, as it is not yet registered.
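
One way to read this caveat is sketched below with stand-in types. It assumes a generic fallback node_data(n::Node) = n.data, which is not shown in this issue, and the composite types PlainAvailability and DataAvailability are hypothetical. A subtype of Availability that does carry data would silently report none unless it gets its own method.

```julia
# Stand-in type hierarchy; only the names `Node`, `Availability` and `Data`
# come from the discussion, everything else is hypothetical.
abstract type Node end
abstract type Availability <: Node end
abstract type Data end
struct EmptyData <: Data end

# Assumed generic fallback: read the data from the field `data`.
node_data(n::Node) = n.data
# Proposed method: availability nodes carry no data.
node_data(n::Availability) = Data[]

struct PlainAvailability <: Availability
    id
end
struct DataAvailability <: Availability
    id
    data::Vector{Data}
end
# Without this extra method, the data of `DataAvailability` would be ignored.
node_data(n::DataAvailability) = n.data

plain = node_data(PlainAvailability(1))
with_data = node_data(DataAvailability(2, Data[EmptyData()]))
```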

@JulStraus JulStraus added the enhancement New feature or request label Feb 1, 2024
@hellemo

hellemo commented May 10, 2024

I think this is a promising approach.

  1. Data is a very general term. Would it be easier to understand if we called it something that explains that it is for extension packages, e.g. ExtensionData?
  2. I was hoping we could avoid the manual type checking and just dispatch on the nodes/links directly. I don't quite understand why we need to sort the types; I guess a subtype may add more or fewer variables/constraints depending on the use case.

Could dispatching on the type itself for generating the (empty) containers using SparseVariables be a solution? That would allow someone writing an extension package to only exploit knowledge about which subtypes use the same variables/constraints.

We would still need to check that the variable has not already been added, as multiple subtypes may define the same variables. I can't think of a way to avoid it completely, but at least we only need to do it once for each type.

See stylized example below:

	# Simplified node structure for illustration
	abstract type GenericNode end
	struct ANode <: GenericNode
		id
	end
	struct BNode <: GenericNode
		id
		val
	end

	# Methods to create (empty) containers based on the node (or link) types
	function variables_extension(m, nt::Type{<:GenericNode}, N, T, P, modeltype)
		return "add variable containers with SparseVariables"
	end
	function variables_extension(m, nt::Type{BNode}, N, T, P, modeltype)
		return "add variable containers for `BNode` with SparseVariables"
	end

	# Methods to create variables for actual nodes
	function variables_extension(m, n::GenericNode, N, T, P, modeltype)
		return "create variables for generic nodes"
	end
	function variables_extension(m, n::BNode, N, T, P, modeltype)
		return "create variables for `BNode`s"
	end

	# Simplified test with two node types
	N = [ANode(1), ANode(2), BNode(3,9), BNode(4,3)]
	m = nothing
	T = 1:2; P = 1:2; modeltype = nothing
	
	# First create containers
	for nt ∈ unique(typeof.(N))
		variables_extension(m, nt, N, T, P, modeltype)
	end
	
	# Then create actual variables where needed
	for n ∈ N
		variables_extension(m, n, N, T, P, modeltype)
	end

An alternative could be that, instead of adding the variables (containers) directly, we have methods reporting which variables they need for each type, and then generate only the unique ones. Or would that be too complicated? E.g. report the name it wants to add for the type as a symbol along with the desired indices, and then create them together at a later point?
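
A rough sketch of this reporting idea, reusing the simplified node types from the example above. The function names required_variables and collect_required_variables, the variable names, and the representation of a request as a (name, index sets) tuple are all assumptions made for illustration.

```julia
# Simplified node structure, as in the stylized example.
abstract type GenericNode end
struct ANode <: GenericNode
    id
end
struct BNode <: GenericNode
    id
    val
end

# Each method reports the variables a node type needs as (name, index sets) tuples,
# without creating anything yet. Both names below are hypothetical.
required_variables(::Type{<:GenericNode}) = [(:ext_var, (:N, :T))]
required_variables(::Type{BNode}) = [(:ext_var, (:N, :T)), (:b_var, (:N, :T, :P))]

# Gather the requests of all node types present and keep each request only once.
function collect_required_variables(N)
    specs = Tuple{Symbol,Tuple}[]
    for nt in unique(typeof.(N))
        append!(specs, required_variables(nt))
    end
    return unique(specs)
end

N = [ANode(1), ANode(2), BNode(3, 9), BNode(4, 3)]
specs = collect_required_variables(N)
```

The containers could then be created once per entry in specs, so no check for already registered names would be needed at creation time.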

@JulStraus

Regarding your two points:

  1. I am fine with renaming it. As far as I can see, changing the name is however a breaking change, as we change the abstract type for all extension packages. So far, that only affects EnergyModelsInvestments, but we also have internal packages where we utilize the abstract supertype Data. In addition, constructors in all packages have to be changed.
  2. The outline you mention is the reason for the loop for Nodes, but I do not see a reason why it should be necessary for the extension data, without having thought through all potential implications. In this case, we use SparseVariables, so we do not need to have the type structure deduced as long as all supertypes are called. Sorting is not necessary. The try-catch remains necessary unless we add the logic for checking whether a variable is already created within variables_extension. I am not the biggest fan of this approach, mostly as

However, your comment brings up a problem in my initial code that I had not considered much. Consider the following case in which we have an intermediate supertype:

# Intermediate supertype
abstract type ExampleData <: Data end

# First composite type
struct ExampleDataA <: ExampleData
end

# Second composite type
struct ExampleDataB <: ExampleData
end

If we want to define variables for all ExampleData, and additional variables for ExampleDataA, but not for ExampleDataB, then we have two possibilities:

  1. Add additional data to both ExampleDataA and ExampleDataB through the direct function. In this case, we would also need to check whether a variable is already declared and have to repeat the variable addition to the sparse containers.
  2. Go through all supertypes of the declared ExampleDataA to add the variable.

Personally, I would prefer the second approach. This way, we remove one source of error for users of the functionality, as they do not have to handle the logic for checking whether a variable name is already registered.
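
To illustrate the second approach, here is a small sketch under stated assumptions: the function names declared_variables, supertype_chain and variables_for and the variable names are hypothetical, and plain symbols stand in for the actual variable containers.

```julia
abstract type Data end
# Intermediate supertype and the two composite types from the example.
abstract type ExampleData <: Data end
struct ExampleDataA <: ExampleData end
struct ExampleDataB <: ExampleData end

# Variables declared at each level of the hierarchy. The fallback declares nothing.
declared_variables(::Type{<:Data}) = Symbol[]
declared_variables(::Type{ExampleData}) = [:shared_var]
declared_variables(::Type{ExampleDataA}) = [:extra_var_a]

# Chain of types from `Data` down to `T`, supertypes first.
function supertype_chain(T::Type)
    chain = Type[T]
    while T !== Data
        T = supertype(T)
        pushfirst!(chain, T)
    end
    return chain
end

# All variables a composite type ends up with when every supertype is visited.
variables_for(T::Type) = reduce(vcat, declared_variables.(supertype_chain(T)))

vars_a = variables_for(ExampleDataA)
vars_b = variables_for(ExampleDataB)
```

Each variable is declared at exactly one level, so ExampleDataB picks up the shared variable without repeating its declaration.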
