\subsection{ALU Execution Pipeline}\label{sec:alu-exe-pip}
The ALU execution pipeline executes ALU and branch instructions (including \inst{CSRRW} instructions).
To sustain peak throughput for ALU instructions with back-to-back dependencies, the pipeline contains bypass logic and wakes up dependent instructions aggressively.
\subsubsection{Interface}
\begin{figure}
\begin{lstlisting}[caption={}]
interface AluExeInput;
method RegsReady sbCons_lazyLookup(PhyRegs r);
method Data rf_rd1(PhyRIndx rindx);
method Data rf_rd2(PhyRIndx rindx);
method Data csrf_rd(CSR csr);
method Addr rob_getPC(InstTag t);
method Addr rob_getPredPC(InstTag t);
method Action rob_setExecuted(InstTag t, Maybe#(Data) csrData, ControlFlow cf);
method Action fetch_train_predictors(FetchTrainBP train);
method Action setRegReadyAggr(PhyRIndx dst);
interface Vector#(2, SendBypass) sendBypass;
method Action writeRegFile(PhyRIndx dst, Data data);
method Action redirect(Addr new_pc, SpecTag spec_tag, InstTag inst_tag);
method Action correctSpec(SpecTag t);
method Bool doStats;
endinterface
interface AluExePipeline;
interface Vector#(TMul#(2, AluExeNum), RecvBypass) recvBypass;
interface ReservationStationAlu rsAluIfc;
interface SpeculationUpdate specUpdate;
method Data getPerf(ExeStagePerfType t);
endinterface
module mkAluExePipeline#(AluExeInput inIfc)(AluExePipeline);
// implementation
endmodule
\end{lstlisting}
\caption{Interface of ALU execution pipeline}\label{fig:alu-exe-pipe-ifc}
\end{figure}
Figure~\ref{fig:alu-exe-pipe-ifc} shows the interface of the ALU execution pipeline.
Interface \code{AluExeInput} is the input argument to the module.
It contains the following fields:
\begin{itemize}
\item Methods \code{sbCons\_lazyLookup}, \code{rf\_rd1} and \code{rf\_rd2}: read the conservative scoreboard and physical register file.
\item Method \code{csrf\_rd}: reads the CSR register file.
The module calls it when executing a \inst{CSRRW} instruction.
\item Methods \code{rob\_getPC} and \code{rob\_getPredPC}: retrieve PC and predicted PC (by the fetch pipeline), respectively, from the corresponding ROB entry.
\item Method \code{rob\_setExecuted}: sets the ROB entry as executed, so the instruction can be committed if it is the oldest in ROB.
\item Method \code{fetch\_train\_predictors}: trains the branch predictors in the fetch pipeline.
\item Method \code{setRegReadyAggr}: wakes up dependent instructions in all the reservation stations, and sets the aggressive scoreboard.
\item Subinterface \code{sendBypass}: is called by the module to send bypassed data to the other execution pipelines and to itself.
\item Method \code{writeRegFile}: writes the physical register file.
\item Method \code{redirect}: redirects the fetch pipeline, increments the epoch in the epoch manager, and calls the global speculation updater to kill wrong-path instructions (i.e., it calls the \code{incorrectSpeculation} method of every module).
The ALU execution pipeline calls this method when a branch is mispredicted.
\item Method \code{correctSpec}: calls the global speculation updater to release a speculation tag (i.e., it calls the \code{correctSpeculation} method of every module).
The ALU execution pipeline calls this method when a branch is predicted correctly.
\item Method \code{doStats}: returns whether performance counters should be incremented or not.
\end{itemize}
The module interface \code{AluExePipeline} contains the following fields:
\begin{itemize}
\item Subinterface \code{recvBypass}: receives the bypassed data sent by the other execution pipelines and by this pipeline itself.
That is, the \code{sendBypass} interface in the module argument calls this interface.
\item Subinterface \code{rsAluIfc}: exposes the reservation station inside the pipeline.
\item Subinterface \code{specUpdate}: manipulates speculative states (Section~\ref{sec:specupdate}).
\item Method \code{getPerf}: is for querying performance counters.
\end{itemize}
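As a rough illustration of how \code{sendBypass} and \code{recvBypass} could be tied together outside the module, the following sketch builds a sender that forwards each bypassed value to a vector of receivers.
The one-method shapes of \code{SendBypass} and \code{RecvBypass}, the type widths, and the helper \code{broadcastTo} are assumptions made for this example; they are not definitions taken from the source code.
\begin{lstlisting}
import Vector::*;

// Assumed types; the real widths are set elsewhere in the design.
typedef Bit#(7)  PhyRIndx;
typedef Bit#(64) Data;

// Assumed one-method shapes for the two bypass interfaces.
interface SendBypass;
    method Action send(PhyRIndx dst, Data data);
endinterface

interface RecvBypass;
    method Action recv(PhyRIndx dst, Data data);
endinterface

// Hypothetical helper: a sender that forwards every value to all receivers.
function SendBypass broadcastTo(Vector#(n, RecvBypass) rx);
    return (interface SendBypass;
        method Action send(PhyRIndx dst, Data data);
            // The loop is unrolled at elaboration time; all receivers
            // see the value in the same cycle.
            for (Integer i = 0; i < valueOf(n); i = i + 1) begin
                rx[i].recv(dst, data);
            end
        endmethod
    endinterface);
endfunction
\end{lstlisting}
Under these assumptions, the parent module would pass the result of \code{broadcastTo} as one element of \code{sendBypass} when instantiating \code{mkAluExePipeline}.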
\subsubsection{Implementation}
\begin{figure}
\centering
\includegraphics[width=\columnwidth]{fig/alu_exe_crop.pdf}
\caption{Internal implementation of ALU execution pipeline}\label{fig:alu-exe-pipe-impl}
\end{figure}
Figure~\ref{fig:alu-exe-pipe-impl} shows the internal implementation of the ALU execution pipeline.
All the pipeline stages are connected by speculation FIFOs (Section~\ref{sec:specfifo}).
Bypassed data is sent and received through \emph{wires}.
The pipeline contains the following four internal rules:
\begin{itemize}
\item Rule \code{doDispatchAlu}: retrieves a ready instruction from the reservation station.
\item Rule \code{doRegReadAlu}: first checks the data bypass, and falls back to the conservative scoreboard and the physical register file if no forwarded value is available.
The rule will not fire if a source register is not available.
\item Rule \code{doExeAlu}: executes the instruction by performing the ALU operation or calculating the next PC.
The result for the destination register is sent to a wire which will be exported as the \code{sendBypass} interface.
\item Rule \code{doFinishAlu}: marks the ROB entry as executed, and sends data bypassing to a wire.
In case of a branch, either \code{correctSpec} or \code{redirect} will be called.
To avoid conflict with other rules, this rule is split into two based on whether a branch is mispredicted or not.
The common case without mispredictions will not conflict with other common rules.
\end{itemize}
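The split of the finish stage into two rules can be pictured with the minimal sketch below.
It is not the actual implementation: the struct \code{ExeResult}, the rule names, the method \code{enqResult}, and the use of an ordinary \code{mkFIFO} in place of a speculation FIFO are all assumptions, and the \code{\$display} calls stand in for the calls to \code{rob\_setExecuted}, \code{correctSpec} and \code{redirect}.
\begin{lstlisting}
import FIFO::*;

// Hypothetical result passed from the execute stage to the finish stage.
typedef struct {
    Bool     mispredicted;
    Bit#(64) nextPc;
} ExeResult deriving (Bits, Eq, FShow);

interface FinishSketch;
    method Action enqResult(ExeResult r);
endinterface

module mkFinishSketch(FinishSketch);
    // Stand-in for the speculation FIFO between execute and finish.
    FIFO#(ExeResult) exeToFinish <- mkFIFO;

    // Common case: no misprediction, so no redirect is needed.
    rule doFinish_noMispred (!exeToFinish.first.mispredicted);
        exeToFinish.deq;
        $display("mark executed; release speculation tag");
    endrule

    // Rare case: only this rule touches the redirect path.
    rule doFinish_mispred (exeToFinish.first.mispredicted);
        exeToFinish.deq;
        $display("mark executed; redirect to %h", exeToFinish.first.nextPc);
    endrule

    method Action enqResult(ExeResult r);
        exeToFinish.enq(r);
    endmethod
endmodule
\end{lstlisting}
Because the two guards are mutually exclusive, at most one of the rules fires per cycle, and only the misprediction rule needs to be ordered against whatever else uses the redirect path.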
The use of wires for bypassing does not affect correctness, because the \code{doRegReadAlu} rule will not fire until the source-register values are available.
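The operand selection described above can be sketched as a pair of pure functions, assuming that the values on the bypass wires are presented as a vector of optional (register, data) pairs.
The struct \code{BypassVal}, the functions \code{lookupBypass} and \code{readOperand}, and the type widths are hypothetical names introduced for this example.
\begin{lstlisting}
import Vector::*;

// Assumed types; the real widths are set elsewhere in the design.
typedef Bit#(7)  PhyRIndx;
typedef Bit#(64) Data;

// Hypothetical payload carried by one bypass wire in a given cycle.
typedef struct {
    PhyRIndx dst;
    Data     data;
} BypassVal deriving (Bits, Eq, FShow);

// Return the forwarded value for src if some wire carries it this cycle.
function Maybe#(Data) lookupBypass(PhyRIndx src,
                                   Vector#(n, Maybe#(BypassVal)) wires);
    Maybe#(Data) res = tagged Invalid;
    for (Integer i = 0; i < valueOf(n); i = i + 1) begin
        if (wires[i] matches tagged Valid .b &&& b.dst == src) begin
            res = tagged Valid b.data;
        end
    end
    return res;
endfunction

// Prefer the bypass; otherwise use the register file value, but only
// if the conservative scoreboard reports the register as ready.
function Maybe#(Data) readOperand(PhyRIndx src,
                                  Vector#(n, Maybe#(BypassVal)) wires,
                                  Bool sbReady, Data rfData);
    Maybe#(Data) fwd = lookupBypass(src, wires);
    if (isValid(fwd))
        return fwd;
    else if (sbReady)
        return tagged Valid rfData;
    else
        return tagged Invalid;
endfunction
\end{lstlisting}
In this sketch, a register-read rule would be guarded on \code{readOperand} returning a valid value for every source operand, so a value that is missed on the wires only delays the instruction and never produces a wrong operand.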
\subsubsection{Future Improvement}
We should remove the wires used for data bypassing.
One alternative is to add a bypass method to every pipeline-stage FIFO, so that sending bypassed data amounts to directly calling this method on all the relevant pipeline-stage FIFOs.
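One possible shape for such a FIFO is sketched below as a single-entry buffer whose stored instruction captures a bypassed value when the register index matches.
This is only an illustration of the idea: the interface \code{BypassFifo}, the struct \code{PendingOp} with a single source operand, and the module \code{mkBypassFifo} are hypothetical, and how \code{bypass} should be scheduled relative to \code{enq} and \code{deq} is left open.
\begin{lstlisting}
// Assumed types; the real widths are set elsewhere in the design.
typedef Bit#(7)  PhyRIndx;
typedef Bit#(64) Data;

// Hypothetical FIFO entry with a single source operand.
typedef struct {
    PhyRIndx     src;
    Maybe#(Data) val;  // Valid once the operand value has been captured
} PendingOp deriving (Bits, Eq, FShow);

interface BypassFifo;
    method Action enq(PendingOp x);
    method Action deq;
    method PendingOp first;
    // Called by every producer that finishes an instruction.
    method Action bypass(PhyRIndx dst, Data data);
endinterface

module mkBypassFifo(BypassFifo);
    Reg#(Maybe#(PendingOp)) slot <- mkReg(tagged Invalid);

    method Action enq(PendingOp x) if (!isValid(slot));
        slot <= tagged Valid x;
    endmethod

    method Action deq if (isValid(slot));
        slot <= tagged Invalid;
    endmethod

    method PendingOp first if (isValid(slot));
        return validValue(slot);
    endmethod

    method Action bypass(PhyRIndx dst, Data data);
        // Capture the value only if the stored entry still waits for it.
        if (slot matches tagged Valid .e
                &&& e.src == dst &&& !isValid(e.val)) begin
            slot <= tagged Valid (PendingOp { src: e.src,
                                              val: tagged Valid data });
        end
    endmethod
endmodule
\end{lstlisting}
Since \code{enq}, \code{deq} and \code{bypass} all write the same register here, a real design would have to decide how these methods are ordered within a cycle.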
\subsubsection{Source Code}
See module \code{mkAluExePipeline} in \code{//procs/RV64G\_OOO/AluExePipeline.bsv}.