Implement Automaton __iter__ for enumerating accepted language? #79
Comments
@caleb531 For DFAs/NFAs, I think there are algorithms for this based on treating the transitions as a graph and doing graph search (or similar algorithms). I'll have to look through some books to see if they explicitly describe something like this, but there's a Stack Overflow post describing something similar. As for the other automata, I'm not sure about PDAs, but for TMs even checking whether a TM accepts any string (let alone enumerating all of them) is undecidable (equivalent to the halting problem). EDIT: Now that I think about it, this problem boils down to traversing edges of a graph in lexicographic order and outputting the string corresponding to the current path. Then on each yield, compute the next such string (like BFS, but with no need for a visited set).
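The BFS idea from the edit above could be sketched as follows. This is a standalone illustration, not the library's API: `enumerate_language` and its parameters are hypothetical names, and the DFA is assumed to be given as plain nested dicts.

```python
from collections import deque

def enumerate_language(transitions, initial_state, final_states, input_symbols, limit):
    """Yield up to `limit` accepted strings in shortest-first (then
    lexicographic) order via BFS over the transition graph.

    Each queue entry is a distinct path (i.e. a distinct string), so no
    visited set is needed.  Caveat: for a complete DFA the queue never
    empties, so if the language has fewer than `limit` words this loops
    forever -- a real implementation would prune dead states first.
    """
    queue = deque([("", initial_state)])
    count = 0
    while queue and count < limit:
        word, state = queue.popleft()
        if state in final_states:
            yield word
            count += 1
        # Expand symbols in sorted order to get lexicographic ties.
        for symbol in sorted(input_symbols):
            queue.append((word + symbol, transitions[state][symbol]))
```

For example, on a two-state DFA accepting strings over `{a, b}` that end in `a`, the first three yielded words are `a`, `aa`, `ba`.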
I think I have some code for this already written. I'll take a look tomorrow.
Decided to look at it before I go to bed.

```python
def count_words_of_length(self, k):
    """
    Counts the number of words of length k in the language represented by the DFA
    """
    return self._count_words_of_length(self.initial_state, k)

def _count_words_of_length(self, state, k):
    """
    Counts words of length k assuming state is the initial state
    """
    if k == 0:
        return 1 if state in self.final_states else 0
    return sum(
        self._count_words_of_length(self.transitions[state][symbol], k - 1)
        for symbol in self.input_symbols
    )

def words_of_length(self, k):
    """
    Generates all words of length k in the language represented by the DFA
    """
    for word in self._words_of_length(self.initial_state, k):
        yield word

def _words_of_length(self, state, k):
    """
    Generator for accepted words of length k assuming state is the initial state
    """
    if k == 0:
        if state in self.final_states:
            yield ''
        return
    for symbol in self.input_symbols:
        for word in self._words_of_length(self.transitions[state][symbol], k - 1):
            yield symbol + word
```

Should we implement `__len__` as the cardinality of the language?
@Tagl Great that you already have a starting implementation for this! The only issue we might run into is efficiency, since recursion in Python is very slow. Since this algorithm needs to store the results anyway, do you think it could be done iteratively with a dictionary of results? I think making `len` the cardinality of the language is a cool idea, though it may be better to raise a specialized exception in the infinite case. What do you think @caleb531?
I also dig the idea of using `len()` for the cardinality of the language.
Yes, converting it to an iterative version is no problem at all, since the transitions of the DP are simple. It was one of the optimizations I had in mind.
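The iterative conversion discussed here could look roughly like the following. This is a sketch under the same assumptions as before (a dict-based DFA, standalone function names chosen for illustration); the dictionary of counts plays the role of the memo table the comment above asks about.

```python
def count_words_of_length(transitions, initial_state, final_states, input_symbols, k):
    """Iterative DP: counts[state] = number of accepted words of length i
    readable starting from `state`.  We iterate i from 0 up to k, so no
    recursion (and no Python recursion limit) is involved.
    """
    # Base case (i = 0): only the empty word, accepted exactly at final states.
    counts = {state: (1 if state in final_states else 0) for state in transitions}
    for _ in range(k):
        # One DP step: a word of length i+1 is a symbol followed by a
        # word of length i from the successor state.
        counts = {
            state: sum(counts[transitions[state][symbol]] for symbol in input_symbols)
            for state in transitions
        }
    return counts[initial_state]
```

On the "ends in `a`" DFA over `{a, b}`, this counts 2 words of length 2 (`aa`, `ba`) and 4 of length 3, matching the recursive version.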
Thoughts on adding
@Tagl I think it would make more sense for
I personally prefer doing
@Tagl I agree, I think that
@Tagl Oh, I see now. Because this whole conversation is about
@eliotwrobson @Tagl But it will be important that
Please take a look at the implementation. Changed to iterative, calculates the entire level for length
DPDA we should be able to do. Non-determinism makes things a bit trickier. I can implement the same kind of logic for NFAs, but the generation will essentially involve partial or full tracking of powersets, like
@Tagl If the NFA string generation involves tracking powersets, I'm not really sure it's worth the effort to do for NFAs directly, since users can just determinize and use the existing DFA functions. It seems pretty impractical overall.
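For concreteness, "tracking powersets" here could mean a BFS whose nodes carry the *set* of NFA states reachable on the word so far, i.e. the subset construction done lazily during enumeration. The sketch below assumes an epsilon-free NFA given as nested dicts of state sets; all names are illustrative, not the library's API.

```python
from collections import deque

def enumerate_nfa_language(transitions, initial_states, final_states, input_symbols, limit):
    """Yield up to `limit` accepted strings of an epsilon-free NFA in
    shortest-first (then lexicographic) order.

    Each BFS node is (word, frozenset of reachable states): the subset
    construction performed on the fly.  Paths whose state set becomes
    empty are pruned.  Like the DFA version, this can loop forever when
    fewer than `limit` accepted words exist but paths remain extendable.
    """
    queue = deque([("", frozenset(initial_states))])
    count = 0
    while queue and count < limit:
        word, states = queue.popleft()
        if states & final_states:  # some reachable state is accepting
            yield word
            count += 1
        for symbol in sorted(input_symbols):
            successors = frozenset(
                t for s in states for t in transitions.get(s, {}).get(symbol, ())
            )
            if successors:  # prune dead branches
                queue.append((word + symbol, successors))
```

On a three-state NFA accepting strings over `{a, b}` containing `ab`, the enumeration begins `ab`, `aab`, `aba`, illustrating how the state sets stay small in practice even though the worst case is exponential.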
@eliotwrobson So with all the recent discussion around automata ultimately being representations of finite languages, it got me thinking: what if we implemented an `__iter__` method that produced each string in that automaton's accepted language? My thought is we could generate one string at a time via `yield`, and leave it up to the user if they want to convert the automaton to a `list` or `set`. Ideally, we could start with DFAs/NFAs, and then explore whether a similar algorithm is possible for PDAs and TMs.

How difficult do you think this would be to implement from an algorithmic perspective? Initially, my thought is to generate all permutations of the machine's input symbols, then filter that set down to only those accepted via `accepts_input`. The number of iterations could be enormous, though, so maybe we could filter that set further by analyzing transition symbols from the initial state (possible first characters) and final states (possible last characters). But I'm curious if there is an approach that isn't so brute-force.
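The brute-force baseline described above (enumerate every string over the alphabet, keep only the accepted ones) can be sketched in a few lines. `accepts_input` is passed in as a predicate here, mirroring the method named in the post; the function name and bound on length are assumptions for illustration, and the exponential blow-up is exactly the drawback noted.

```python
from itertools import product

def brute_force_language(input_symbols, accepts_input, max_length):
    """Generate every string over the alphabet up to max_length and keep
    those the automaton accepts.  O(|alphabet|**max_length) strings are
    tried, so this is an illustration of the naive approach only."""
    for length in range(max_length + 1):
        for chars in product(sorted(input_symbols), repeat=length):
            word = "".join(chars)
            if accepts_input(word):
                yield word
```

For instance, with a predicate accepting strings that end in `a` and `max_length=2`, this yields `a`, `aa`, `ba` after trying all 7 candidate strings.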