Minimal DFA
For each regular language, there also exists a minimal automaton that accepts it, that is, a DFA with a minimum number of states and this DFA is unique (except that states can be given different names). The minimal DFA ensures minimal computational cost for tasks such as pattern matching. There are two classes of states that can be removed or merged from the original DFA without affecting the language it accepts. * Unreachable states are the states that are not reachable from the initial state of the DFA, for any input string. These states can be removed. * Dead states are the states from which no final state is reachable. These states can be removed unless the automaton is required to be complete. * Nondistinguishable states are those that cannot be distinguished from one another for any input string. These states can be merged. DFA minimization is usually done in three steps: # remove dead and unreachable states (this will accelerate the following step), # merge nondistinguishable states, # optionally, re-create a single dead state ("sink" state) if the resulting DFA is required to be complete.Unreachable states
The state of a deterministic finite automaton is unreachable if no string in exists for which . In this definition, is the set of states, is the set of input symbols, is the transition function (mapping a state and an input symbol to a set of states), is its extension to strings (also known as extended transition function), is the initial state, and is the set of accepting (also known as final) states. Reachable states can be obtained with the following algorithm:new_states
) and operations on them (such as adding a state or checking whether it is present), this algorithm can be implemented with time complexity , where is the number of states and is the number of transitions of the input automaton.
Unreachable states can be removed from the DFA without affecting the language that it accepts.
Nondistinguishable states
The following algorithms present various approaches to merging nondistinguishable states.Hopcroft's algorithm
One algorithm for merging the nondistinguishable states of a DFA, due to , is based on partition refinement, partitioning the DFA states into groups by their behavior. These groups representif
statement (if Y is in W
) is to patch up ''W'', the set of distinguishers. We see in the previous statement in the algorithm that ''Y'' has just been split. If ''Y'' is in ''W'', it has just become obsolete as a means to split classes in future iterations. So ''Y'' must be replaced by both splits because of the Observation above. If ''Y'' is not in ''W'', however, only one of the two splits, not both, needs to be added to ''W'' because of the Lemma above. Choosing the smaller of the two splits guarantees that the new addition to ''W'' is no more than half the size of ''Y''; this is the core of the Hopcroft algorithm: how it gets its speed, as explained in the next paragraph.
The worst case running time of this algorithm is , where is the number of states and is the size of the alphabet. This bound follows from the fact that, for each of the transitions of the automaton, the sets drawn from that contain the target state of the transition have sizes that decrease relative to each other by a factor of two or more, so each transition participates in of the splitting steps in the algorithm. The partition refinement data structure allows each splitting step to be performed in time proportional to the number of transitions that participate in it. This remains the most efficient algorithm known for solving the problem, and for certain distributions of inputs its Moore's algorithm
Moore's algorithm for DFA minimization is due to . Like Hopcroft's algorithm, it maintains a partition that starts off separating the accepting from the rejecting states, and repeatedly refines the partition until no more refinements can be made. At each step, it replaces the current partition with the coarsest common refinement of partitions, one of which is the current one and the rest of which are the preimages of the current partition under the transition functions for each of the input symbols. The algorithm terminates when this replacement does not change the current partition. Its worst-case time complexity is : each step of the algorithm may be performed in time using a variant of radix sort to reorder the states so that states in the same set of the new partition are consecutive in the ordering, and there are at most steps since each one but the last increases the number of sets in the partition. The instances of the DFA minimization problem that cause the worst-case behavior are the same as for Hopcroft's algorithm. The number of steps that the algorithm performs can be much smaller than , so on average (for constant ) its performance is or even depending on theBrzozowski's algorithm
Reversing the transitions of a non-deterministic finite automaton (NFA) and switching initial and final statesIn case there are several final states in ''M'', we either must allow multiple initial states in the reversal of ''M''; or add an extra state with ε-transitions to all the initial states, and make only this new state initial. produces an NFA for the reversal of the original language. Converting this NFA to a DFA using the standard powerset construction (keeping only the reachable states of the converted DFA) leads to a DFA for the same reversed language. As observed, repeating this reversal and determinization a second time, again keeping only reachable states, produces the minimal DFA for the original language. The intuition behind the algorithm is this: determinizing the reverse automaton merges states that are nondistinguishable in the original automaton, but may merge also states that should ''not'' be merged (i.e., are not merged in the minimal DFA). In such case, after we reverse the automaton for the second time, it may not be deterministic. That is why we need to determinize it again, obtaining the minimal DFA.Proof of correctness
After we determinize to obtain , we reverse this to obtain . Now recognises the same language as , but there's one important difference: there are no two states in from which we can accept the same word. This follows from being deterministic, viz. there are no two states in that we can reach from the initial state through the same word. The determinization of then creates powerstates (sets of states of ), where every two powerstates differ ‒ naturally ‒ in at least one state of . Assume and ; then contributes at least one wordRecall there are no dead states in ''M''Complexity
The worst-case complexity of Brzozowski's algorithm is exponential in the number of states of the input automaton. This holds regardless of whether the input is a NFA or a DFA. In the case of DFA, the exponential explosion can happen during determinization of the reversal of the input automaton;For instance, the language of binary strings whose th symbol is a one requires only states, but its reversal requires states. provides a ternary -state DFA whose reversal requires states, the maximum possible. For additional examples and the observation of the connection between these examples and the worst-case analysis of Brzozowski's algorithm, see . in the case of NFA, it can also happen during the initial determinization of the input automaton.Exponential explosion will happen at most once, not in both determinizations. That is, the algorithm is at worst exponential, not doubly-exponential. However, the algorithm frequently performs better than this worst case would suggest.NFA minimization
While the above procedures work for DFAs, the method of partitioning does not work for non-deterministic finite automata (NFAs)., Section 4.4, Figure labeled "Minimizing the States of an NFA", p. 163. While an exhaustive search may minimize an NFA, there is noSee also
* State encoding for low powerNotes
References
*. * *. *. *. *. See also preliminary versionExternal links