30f35717 | 14-Apr-2025 |
cz4e <[email protected]> |
refactor(DFT): refactor `DFT` IO (#4530) |
8795ffc0 | 10-Apr-2025 |
Sam Castleberry <[email protected]> |
feat: move frontend SRAM read-write conflict handling to SRAMTemplate (#4445)
Hello, this change set is to remove the SRAM read-write conflict handling logic in the frontend, after OpenXiangShan/Utility#110 has been merged, which adds this logic to the SRAMTemplate. See that pull request and also #4242 for more context.
After this change, I see microbench IPC change 1.397 -> 1.413 and coremark IPC change 2.136 -> 2.147. The branch mispredictions also decreased slightly in both.
This probably cannot be merged automatically, since the utility submodule should point to the new revision after merging instead of the revision in my branch.
Thanks, Sam
|
1dca281c | 03-Apr-2025 |
Haojin Tang <[email protected]> |
feat(Mbist): use ClockMux module for clock multiplexing |
602aa9f1 | 02-Apr-2025 |
cz4e <[email protected]> |
feat(Sram): add `SRAM_CTL` interface (#4474)
* add `SRAM_CTL` interface for SRAMTemplate
* use `SRAM_WITH_CTL` to enable, e.g. `make sim-verilog CONFIG=KunminghuV2Config RELEASE=1 SRAM_WITH_CTL=1`
|
d7ff1926 | 12-Mar-2025 |
zhou tao <[email protected]> |
feat(ITTage,Tage): split ITTage SRAM and Tage SRAM (#4376) |
4b2c87ba | 27-Feb-2025 |
梁森 Liang Sen <[email protected]> |
feat(dfx): integrate dfx components (#4312) |
1eb8dd22 | 24-Feb-2025 |
Kunlin You <[email protected]> |
submodule(utility), XSDebug: support collecting missing XSDebug (#4251)
Previously, in PR#3982, we added support for collecting XSLogs to LogPerfEndpoint.
However, with --enable-log, we should also collect some missing XSDebug.
This change moves these missing XSDebug outside WhenContext, and adds
WireInit to LogUtils' apply, to enable probing some subaccessed data,
like a vec elem with dynamic index.
|
4ba1d457 | 26-Jan-2025 |
Kunlin You <[email protected]> |
submodule(utility): introduce XSPerfLevel for performance counter (#4238)
This change introduces XSPerfLevel, with levels `VERBOSE`/`NORMAL`/`CRITICAL`. Only counters with a level greater than or equal to the threshold will be instantiated, which reduces utilization and compile time on Palladium.
The PerfLevel threshold can be set on the command line and defaults to `VERBOSE`, which instantiates all counters. Example usage: SIM_ARGS="--perf-level CRITICAL" or PLDM_ARGS="--perf-level CRITICAL" PLDM=1
The PerfLevel param of each counter also defaults to `VERBOSE`, which means all existing counters will be dropped if the threshold is higher than that. Users can explicitly set the param to keep important counters instantiated, as follows: XSPerfAccumulate(xx, yy, perfLevel = XSPerfLevel.CRITICAL)
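The thresholding rule described in this commit can be sketched in plain Python. This is a simplified model, not the Utility implementation: the `PerfLevel` enum, the `Counter` record, the `instantiated` helper, and the counter names are all hypothetical, and the ordering VERBOSE < NORMAL < CRITICAL is assumed from the description.

```python
from dataclasses import dataclass
from enum import IntEnum


class PerfLevel(IntEnum):
    # Assumed ordering: a higher value marks a more important counter.
    VERBOSE = 0
    NORMAL = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Counter:
    name: str
    level: PerfLevel = PerfLevel.VERBOSE  # counters default to VERBOSE


def instantiated(counters, threshold=PerfLevel.VERBOSE):
    """Return the names of counters whose level is at or above the threshold."""
    return [c.name for c in counters if c.level >= threshold]


# Hypothetical counters: one left at the default level, one marked CRITICAL.
counters = [
    Counter("debug_events"),
    Counter("commit_instr", PerfLevel.CRITICAL),
]
```

With the default threshold both counters survive; raising the threshold to `PerfLevel.CRITICAL` keeps only the one explicitly marked as critical.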
|
cee2c096 | 10-Jan-2025 |
pengxiao <[email protected]> |
feat(SRAMTemplate): add param separateGateClock for independent RCLK and WCLK gating (#4125)
In some cases, a dual-port SRAM has only a single clock interface.
Without separate clock interfaces for RCLK and WCLK,
clock gating can only be driven by the logical OR `ren || wen`.
Changes:
* Add the `separateGateClock` param to allow independent gating
of RCLK and WCLK.
* When `separateGateClock` is set to true, read operations are gated
using `ren`, and write operations are gated using `wen`.
* When `separateGateClock` is set to false, both RCLK and WCLK are gated
together using `ren || wen`, and RCLK is kept equal to WCLK.
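The two gating modes can be summarized as a small truth function. This is a behavioral sketch in Python rather than the Chisel code; the function name `gate_enables` and its argument names are hypothetical.

```python
def gate_enables(ren: bool, wen: bool, separate_gate_clock: bool):
    """Return (rclk_enable, wclk_enable) for one cycle."""
    if separate_gate_clock:
        # Independent gating: each clock follows its own enable.
        return ren, wen
    # Shared gating: both clocks run whenever either port is active,
    # so RCLK stays equal to WCLK.
    shared = ren or wen
    return shared, shared
```

For a read-only cycle, `gate_enables(True, False, True)` leaves the write clock gated off, while the shared mode enables both clocks.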
|
a035c20d | 02-Jan-2025 |
Yanqin Li <[email protected]> |
fix(LQUncache): fix a potential deadlock when enqueue (#4096)
**Old design**: Enqueuing follows the order of ldu0-1, i.e. ldu0 is allocated first.
**Bug scenario:** LQUncacheBuffer is small. The enqueue `robIdx` of ldu0-1 is [57, 56, 55]; [57, 56] can enqueue, and [55] cannot because the buffer is full. 57/56 send the `NC` request after enqueuing. 55 is rolled back. In principle, 57 and 56 need to be flushed. But to preserve the correspondence between uncache requests and responses, 57 is flushed when getting the uncache response. So when the same sequence [57, 56, 55] arrives again, there is still no space to allocate 55, so it is rolled back again, and a deadlock results. This bug was triggered after cutting `LoadUncacheBufferSize` from 20 to 4.
**One way to fix**: Enqueue in the order of `robIdx`, i.e. the oldest is allocated first.
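The difference between the two allocation orders can be illustrated with a toy model. The sketch below is not the hardware logic: `enqueue` is a hypothetical helper, the buffer is reduced to a count of free slots, and a smaller `robIdx` is assumed to be older (the real `robIdx` is a circular pointer, so age comparison in hardware is more involved).

```python
def enqueue(requests, free_slots, oldest_first):
    """Allocate buffer slots; return (accepted, rolled_back) robIdx lists.

    Simplifying assumption: a smaller robIdx is older.
    """
    # Old design: keep the load-unit port order. Fix: sort by age.
    order = sorted(requests) if oldest_first else list(requests)
    return order[:free_slots], order[free_slots:]


# The sequence from the bug description, with room for two entries.
old_design = enqueue([57, 56, 55], free_slots=2, oldest_first=False)
fixed = enqueue([57, 56, 55], free_slots=2, oldest_first=True)
```

In the port-order case the oldest request, 55, is the one rejected every time the sequence replays; with age order it is allocated first and a younger request is rolled back instead.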
|
12c5a998 | 13-Dec-2024 |
klin02 <[email protected]> |
submodule(utility), transforms: collect XSLogs to SimTop.LogPerfEndpoint
XSLog depends on LogPerfCtrl, which is declared at the top module. Previously we annotated this signal as dontTouch and accessed it through a hierarchical name like SimTop.xx via a dummy LogPerfHelper.
However, XSLog is called in many places in the DUT that are not visible to each other, especially some inside WhenContext, so XS generates thousands of LogPerfHelper instances to get the same LogPerfCtrl. So many module instantiations greatly slow down compilation, especially on Palladium (more than 5 times slower than the same DUT without Log).
This change collects all XSLogs to SimTop.LogPerfEndpoint, with LogPerfCtrl passed directly by IO. Some tips follow:
1. Do not call XSLog inside WhenContext. To collect XSLogs, we must access Cond and Data from another module, but data in WhenContext is not accessible even through tap. Use XSLog(cond, pable) instead of when(cond) {XSLog(pable)}. We also add the Chisel internal API currentWhen to check that.
2. Generate the hierarchical module path through FIRRTL transforms. Sometimes we want to append the module path for better debugging. XSCompatibility adds a hacky way to use a Chisel internal API to get the tag of the current module. We then replace these tags with the path during ChiselStage. Note that the path can only be accessed after circuit elaboration.
3. Register and invoke the caller of XSPerf and related objects. XSPerf depends on LogPerfCtrl signals such as dump, so we should defer apply() until collection. We register the collect() method when XSLog is first applied; XSLog then automatically calls XSPerf.collect() during collection. Note that the deferred apply is called in another module, so the original module tag must be recorded for path generation.
4. Concatenate XSLogs with the same condition. Too many fwrites in the same module cause a UNOPTTHREADS warning with 16-thread Verilator. Since many XSLogs share the same condition (especially XSPerfs), we reuse the condition and concatenate their printables to reduce fwrites. Note that we also limit the size of each concatenation to 1000 to avoid a segmentation fault caused by an overly long printf.
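The concatenation in tip 4 can be pictured as grouping log entries by condition and emitting one write per group, split into bounded chunks. The Python sketch below is only an illustration: `concat_logs` and the sample entries are hypothetical, conditions and printables are modeled as strings, and treating the limit of 1000 as a count of printables per write is an assumption, since the commit message does not state the unit.

```python
from collections import defaultdict


def concat_logs(logs, limit=1000):
    """Group (condition, text) pairs by condition and join their text.

    `limit` caps how many printables go into one write (assumed unit).
    """
    grouped = defaultdict(list)
    for cond, text in logs:
        grouped[cond].append(text)

    writes = []
    for cond, texts in grouped.items():
        # Split long groups so no single write grows without bound.
        for start in range(0, len(texts), limit):
            writes.append((cond, "".join(texts[start:start + limit])))
    return writes


# Hypothetical entries: two share the "dump" condition.
logs = [("dump", "a=1;"), ("en", "x;"), ("dump", "b=2;")]
```

Here three log calls collapse into two writes, one per distinct condition; lowering `limit` splits a group back into several writes.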
|
d84c5151 | 10-Dec-2024 |
Yuandongliang <[email protected]> |
fix(tage): avoid read/write to the same address in the tage bt table. (#4002) |
e9e6cd09 | 27-Nov-2024 |
Yanqin Li <[email protected]> |
perf(uncache): mmio and nc share LQUncache; nc data can writeback to ldu1-2 |
39d55402 | 19-Nov-2024 |
pengxiao <[email protected]> |
feat(frontend): add ClockGate at frontend SRAMTemplate (#3889)
* Add param `withClockGate` at SRAMTemplate
* when SRAM is single-port, use maskedClock for both array.read() and
array.write() to ensure single-port SRAM access.
* when SRAM is multi-port, the read and write ports of the multi-port
SRAM are gated using different clocks.
|
85a8d7ca | 01-Nov-2024 |
Zehao Liu <[email protected]> |
feat(dbltrp) : add support for critical error (#3793) |
44467224 | 26-Sep-2024 |
Zhaoyang You <[email protected]> |
fix(csr): intermediate data should be stored when output not fire (#3634)
* Normal CSR instructions could fire within one cycle, but IMSIC is now supported.
* IMSIC and CSR have different clocks.
* Therefore, CSR interacts with IMSIC through asynchronous reading.
* Implemented by an FSM whose states are idle, waitIMSIC, and finish.
* Output can fire when NewCSR requests an IMSIC response, and the
intermediate data should be stored.
---------
Co-authored-by: lewislzh <[email protected]>
|
44f2941b | 24-Sep-2024 |
Jiru Sun <[email protected]> |
refactor(HPM): move HPMs from utils to utility repo (#3631)
Because HPMs will be used in Coupled L2 as well, delete
`PerfCounterUtils.scala` in XiangShan and create
`HardwarePerfMonitor.scala` in Utility.
See also [Pull Request in CoupledL2](https://github.com/OpenXiangShan/CoupledL2/pull/251#discussion_r1770738535).
|
478bf92c | 23-Sep-2024 |
Yuandongliang <[email protected]> |
fix(tage): tage bt sram read and write the same addr at the same time (#3606) |
7ff4ebdc | 19-Sep-2024 |
Tang Haojin <[email protected]> |
feat(Synchronizer): use unified AsyncResetSynchronizerShiftReg (#3609) |
b4b02e56 | 03-Sep-2024 |
xiaofeibao-xjtu <[email protected]> |
submodule(utility): bump utility (#3479) |
2f9ea954 | 06-Aug-2024 |
Tang Haojin <[email protected]> |
XSNoCTop, StandAloneDevice: add async signal handling (#3321) |
bb2f3f51 | 12-Jul-2024 |
Tang Haojin <[email protected]> |
perf: use perfUtils in `Utility` (#3190)
Currently, log and perf utilities such as `XSPerfAccumulate` are
implemented in many repositories like XiangShan, CoupledL2 and HuanCun.
This PR unifies them and puts them in the Utility repository.
|
8f9f96d0 | 04-Jul-2024 |
Tang Haojin <[email protected]> |
ClockGate: use `VERILATOR_LEGACY` for verilator version < 5 (#3133) |
5adc4829 | 16-Jun-2024 |
Yanqin Li <[email protected]> |
memblock: add rest clockgate of reg (#3017)
Co-authored-by: cai luoshan <[email protected]>
Co-authored-by: Cai Luoshan <[email protected]>
Co-authored-by: good-circle <[email protected]>
Co-authored-by: Ma-YX <[email protected]>
Co-authored-by: Ma-YX <[email protected]>
Co-authored-by: CharlieLiu <[email protected]>
|
d855ea69 | 07-Jun-2024 |
xiaofeibao <[email protected]> |
bump utility: fix bug of QPtrMatchMatrix |