
Implement Automaton __iter__ for enumerating accepted language? #79

Closed
caleb531 opened this issue Oct 23, 2022 · 17 comments

@caleb531
Owner

caleb531 commented Oct 23, 2022

@eliotwrobson So with all the recent discussion around automata ultimately being representations of finite languages, it got me thinking:

What if we implemented an __iter__ method that produced each string in that automaton's accepted language? My thought is that we could generate one string at a time via yield, and leave it up to the user whether they want to convert the automaton to a list or set. Ideally, we could start with DFAs/NFAs, and then explore whether a similar algorithm is possible with PDAs and TMs.

How difficult do you think this would be to implement from an algorithmic perspective? Initially, my thought is to generate all strings over the machine's input symbols, then filter that set down to only those accepted via accepts_input.

The number of iterations could be enormous, though, so maybe we could narrow the search by analyzing transition symbols from the initial state (possible first characters) and into final states (possible last characters). But I'm curious if there is an approach that isn't so brute-force.

@eliotwrobson
Collaborator

eliotwrobson commented Oct 23, 2022

@caleb531 for DFA/NFAs, I think there are algorithms for this based on treating the transitions as a graph and doing graph search (or similar algorithms). I'll have to look through some books to see if they explicitly describe something like this, but here's a stackoverflow post describing something similar:
https://stackoverflow.com/questions/33058142/optimal-algorithm-to-count-the-number-of-strings-a-dfa-accepts

As for the other automata, I'm not sure about PDAs, but for TMs even checking whether a TM accepts any string (let alone enumerating all of them) is undecidable (equivalent to the halting problem).

EDIT: Now that I think about it, this problem boils down to traversing edges of a graph in lexicographic order and outputting the string corresponding to the current path. Then on the next yield statement, compute the next such string (like BFS but no need for a visited set).
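The graph-search idea described above could look roughly like the following sketch. This is not the library's API; the DFA is represented here as plain dicts, and the function name is made up for illustration:

```python
from collections import deque
import itertools

def enumerate_language(transitions, initial_state, final_states, input_symbols):
    """Lazily yield accepted strings shortest-first (lexicographic within each
    length) via BFS over (string, state) pairs. No visited set is needed,
    since each queue entry corresponds to a distinct string."""
    queue = deque([("", initial_state)])
    while queue:
        word, state = queue.popleft()
        if state in final_states:
            yield word
        for symbol in sorted(input_symbols):
            queue.append((word + symbol, transitions[state][symbol]))

# Toy DFA over {'a', 'b'} accepting strings with an even number of 'a's
transitions = {
    "even": {"a": "odd", "b": "even"},
    "odd": {"a": "even", "b": "odd"},
}
words = enumerate_language(transitions, "even", {"even"}, {"a", "b"})
print(list(itertools.islice(words, 6)))  # → ['', 'b', 'aa', 'bb', 'aab', 'aba']
```

Note that for an infinite language this generator never terminates, and the queue grows by a factor of the alphabet size per level, so callers have to bound consumption themselves (e.g. with itertools.islice).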

@Tagl
Contributor

Tagl commented Oct 24, 2022

I think I have some code for this already written. I'll take a look tomorrow.

@Tagl
Contributor

Tagl commented Oct 24, 2022

Decided to look at it before I go to bed.
So this is essentially just a dynamic programming problem.
We can define recursive functions and then we just have to memoize the results to cut the time complexity down to $\mathcal{O}(nk)$, raising the space complexity to $\mathcal{O}(nk)$, where $n$ is the number of states and $k$ is the length of the words.
Of course, for generating the words instead of just counting them, we have to account for constructing each string as well.
This will add at least a factor of $k$ to the time complexity and a factor of $k$ to the space complexity, but the state space remains $nk$.
I'll finish the code and optimize after work tomorrow but here's essentially the logic:

    def count_words_of_length(self, k):
        """
        Counts the number of words of length k in the language represented by the DFA
        """
        return self._count_words_of_length(self.initial_state, k)

    def _count_words_of_length(self, state, k):
        """
        Counts words of length k assuming state is the initial state
        """
        if k == 0:
            return 1 if state in self.final_states else 0
        return sum(self._count_words_of_length(self.transitions[state][symbol], k-1) for symbol in self.input_symbols)

    def words_of_length(self, k):
        """
        Generates all words of length k in the language represented by the DFA
        """
        yield from self._words_of_length(self.initial_state, k)

    def _words_of_length(self, state, k):
        """
        Generator for accepted words of length k assuming state is the initial state
        """
        if k == 0:
            if state in self.final_states:
                yield ''
            return
        for symbol in self.input_symbols:
            for word in self._words_of_length(self.transitions[state][symbol], k-1):
                yield symbol + word

For __iter__ we will need to calculate the maximum word length and iterate until we reach it. A minimum word length to match would be nice to have too, but it's not necessary.
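Such an __iter__ could be sketched roughly like this, assuming a hypothetical maximum_word_length() helper that returns None for infinite languages (none of the names here are the library's actual API):

```python
import itertools

class LanguageIterMixin:
    """Sketch of __iter__ built on top of a words_of_length() generator.
    maximum_word_length() is assumed to return None for infinite languages."""

    def __iter__(self):
        max_len = self.maximum_word_length()  # assumed helper, not real API
        lengths = itertools.count() if max_len is None else range(max_len + 1)
        for k in lengths:
            yield from self.words_of_length(k)

# Toy demo: the "language" containing one word of each length 0..2
class ToyLang(LanguageIterMixin):
    def maximum_word_length(self):
        return 2

    def words_of_length(self, k):
        yield "a" * k

print(list(ToyLang()))  # → ['', 'a', 'aa']
```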

Should we implement __len__ for the cardinality of the language and raise a ValueError for infinite languages? It is only allowed to return non-negative integers.

@eliotwrobson
Collaborator

@Tagl Great that you already have a starting implementation for this! The only issue we might run into is efficiency, since recursion in Python is very slow. Since this algorithm needs to store the results anyway, do you think that this could be done iteratively with a dictionary of results?

I think doing the len as the cardinality of the language is a cool idea. I think it may be better to raise a specialized exception in that case. What do you think @caleb531 ?

@caleb531
Owner Author

I also dig the idea of using __len__ for the cardinality. And I agree with @eliotwrobson's point that a specialized exception should be raised instead of a ValueError.
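As a rough sketch of that semantics (the exception name and helper function here are hypothetical, not the library's API):

```python
class InfiniteLanguageException(Exception):
    """Hypothetical specialized exception, raised where __len__ can't return."""

def language_cardinality(counts_by_length, is_finite):
    """Sketch of the proposed __len__ semantics: __len__ must return a
    non-negative int, so an infinite language raises instead of returning."""
    if not is_finite:
        raise InfiniteLanguageException("language contains infinitely many words")
    return sum(counts_by_length)

print(language_cardinality([1, 2, 0, 3], True))  # → 6
```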

@Tagl
Contributor

Tagl commented Oct 24, 2022

Yes, converting it to be iterative is no problem at all, since the transitions of the DP are simple. It was one of the optimizations I had in mind.
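One way the counting half of the DP might be done iteratively, keeping only a dictionary of per-state counts for the current length (again a sketch over plain dicts, not the library's API):

```python
def count_words_of_length(transitions, initial_state, final_states,
                          input_symbols, k):
    """Iterative DP: after j rounds, counts[state] is the number of accepted
    words of length j starting from `state`. Built bottom-up, no recursion."""
    # Length 0: only the empty word, accepted exactly from final states.
    counts = {state: (1 if state in final_states else 0) for state in transitions}
    for _ in range(k):
        counts = {
            state: sum(counts[transitions[state][symbol]]
                       for symbol in input_symbols)
            for state in transitions
        }
    return counts[initial_state]

# Toy DFA accepting words over {'a', 'b'} with an even number of 'a's:
# of the 8 words of length 3, exactly 4 are accepted.
transitions = {
    "even": {"a": "odd", "b": "even"},
    "odd": {"a": "even", "b": "odd"},
}
print(count_words_of_length(transitions, "even", {"even"}, {"a", "b"}, 3))  # → 4
```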

@Tagl
Contributor

Tagl commented Oct 24, 2022

Thoughts on adding __contains__ to support usages of 'text' in dfa ?

@caleb531
Owner Author

@Tagl I think it would make more sense for __contains__ to look for the same kind of value that's being yielded by __iter__. But I'm not sure how useful that would be.

@Tagl
Contributor

Tagl commented Oct 24, 2022

I personally prefer doing word in language rather than language.accepts_input(word)

@eliotwrobson
Collaborator

@Tagl I agree, I think that word in dfa is a nice shorthand for accepts input (even if a little bit awkward semantically).

@caleb531
Owner Author

@Tagl Oh, I see now. Because this whole conversation is about __iter__ yielding/producing those values anyway. So yeah, that would make a lot of sense, actually.

@caleb531
Owner Author

@eliotwrobson @Tagl But it will be important that __contains__ makes a direct call to accepts_input rather than enumerating the iterator of words in the accepted language (because the latter could be substantially slower).
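The delegation being described is tiny; as a sketch (with a toy stand-in for accepts_input, since the real one lives on the automaton classes):

```python
class ContainsMixin:
    """Sketch: `word in dfa` delegates directly to accepts_input rather than
    scanning __iter__, which could be exponentially slower."""

    def __contains__(self, word):
        return self.accepts_input(word)

class ToyDFA(ContainsMixin):
    # Hypothetical stand-in: "accepts" words of even length.
    def accepts_input(self, word):
        return len(word) % 2 == 0

dfa = ToyDFA()
print("ab" in dfa, "a" in dfa)  # → True False
```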

@Tagl
Contributor

Tagl commented Oct 25, 2022

Please take a look at the implementation. I changed it to be iterative; it calculates the entire level for length $k$ before yielding the first word of that length.
There's some repeated code between the max word length calculation and the finiteness check. I think I'll put the logic in the max word length function and have isfinite just call it and check whether the return value is inf.

@Tagl
Contributor

Tagl commented Nov 1, 2022

> Ideally, we could start with DFAs/NFAs, and then explore if a similar algorithm is possible with PDAs and TMs.

DPDA we should be able to do. Non-determinism makes things a bit trickier.

I can implement the same kind of logic for NFAs, but the generation will essentially involve partial or full tracking of powersets, like DFA.from_nfa does.

@eliotwrobson
Collaborator

@Tagl If the NFA string generation involves tracking powersets, I'm not really sure that it's worth the effort to do for NFAs directly, if users can just determinize and use the existing DFA functions. It seems overall pretty impractical.

@eliotwrobson
Collaborator

@Tagl since this issue has been resolved for FAs but we still want to do this with PDAs, would you be ok with closing this in favor of #102? I don't think this will get dropped and it makes sense to minimize the number of open issues because of this overlap.

@Tagl
Contributor

Tagl commented Apr 13, 2023

> @Tagl since this issue has been resolved for FAs but we still want to do this with PDAs, would you be ok with closing this in favor of #102? I don't think this will get dropped and it makes sense to minimize the number of open issues because of this overlap.

Sounds good to me.
