#
1e7e38e2 |
| 22-Apr-2025 |
Anzo <[email protected]> |
chore(Parameters): remove the incorrect parameter description (#4391)
This is a misrepresentation; currently, there is only one item in the fofbuffer.
|
#
a25f1ac9 |
| 15-Apr-2025 |
Guanghui Cheng <[email protected]> |
fix(trace): fix parameters of trace (#4561)
|
#
4a02bbda |
| 15-Apr-2025 |
Anzo <[email protected]> |
fix(LSU): misalign writeback aligned raw rollback (#4476)
By convention, we need to make `rollback` and `writeback` happen at the same time, and not make `writeback` earlier than `rollback`.
Currently, the `rollback` generated by a RAW violation occurs at `s4`. A normal store takes an extra N beats after the end of `s3` (N depends on the number of RAWQueue entries and is currently 1 beat), which is equivalent to `writeback` at `s4`. A misaligned store, by contrast, would `writeback` at `s2`, then `writeback` after switching to the `s_wb` state, which is equivalent to `writeback` at `s3`, one stage earlier than the `rollback`.
---
This PR adjusts the misaligned `writeback` logic to align with the `StoreUnit`. It also unifies the way the number of beats is calculated.
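To make the timing argument concrete, here is a minimal Python model of the stage at which each path writes back. It assumes, as the message states, that a RAW `rollback` fires N beats after the end of `s3` (N = 1 today) and that the pre-fix misaligned path effectively writes back at `s3`; all function names are illustrative only.

```python
RAW_QUEUE_BEATS = 1  # per the commit message, the RAWQueue currently adds 1 beat


def rollback_stage(raw_beats: int = RAW_QUEUE_BEATS) -> int:
    """Stage of a RAW-triggered rollback: end of s3 plus N beats."""
    return 3 + raw_beats


def writeback_stage(misaligned: bool, after_fix: bool,
                    raw_beats: int = RAW_QUEUE_BEATS) -> int:
    """Stage at which the store writes back, under this simplified model."""
    if misaligned and not after_fix:
        # Old misaligned path: s2, then one more beat in s_wb, i.e. effectively s3.
        return 3
    # Normal stores (and misaligned stores after this PR) wait the same N beats.
    return 3 + raw_beats


def respects_convention(misaligned: bool, after_fix: bool) -> bool:
    """The convention: writeback must not happen earlier than rollback."""
    return writeback_stage(misaligned, after_fix) >= rollback_stage()


print(respects_convention(misaligned=True, after_fix=False))  # False
print(respects_convention(misaligned=True, after_fix=True))   # True
```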
|
#
30f35717 |
| 14-Apr-2025 |
cz4e <[email protected]> |
refactor(DFT): refactor `DFT` IO (#4530)
|
#
4c0658ae |
| 04-Apr-2025 |
Tang Haojin <[email protected]> |
feat(backend): make wfi timeout configurable (#4491)
|
#
602aa9f1 |
| 02-Apr-2025 |
cz4e <[email protected]> |
feat(Sram): add `SRAM_CTL` interface (#4474)
* add `SRAM_CTL` interface for SRAMTemplate
* use `SRAM_WITH_CTL` to enable, e.g. `make sim-verilog CONFIG=KunminghuV2Config RELEASE=1 SRAM_WITH_CTL=1`
|
#
eaf14747 |
| 06-Mar-2025 |
cz4e <[email protected]> |
fix(LoadUnit): enable EnableAccurateLoadError (#4363)
|
#
4b2c87ba |
| 27-Feb-2025 |
梁森 Liang Sen <[email protected]> |
feat(dfx): integrate dfx components (#4312)
|
#
a7904e27 |
| 24-Feb-2025 |
Anzo <[email protected]> |
fix(StoreQueue): fix threshold condition for force write sbuffer (#4306)
Previously, the `ForceWrite` thresholds were hard-coded (60, 55), which no longer fit after we adjusted `StoreQueueSize`.
---
Now a more reasonable parameterized setting is used. However, the conditions for optimal performance still need to be tested.
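A parameterized threshold might look like the Python sketch below. It assumes the old pair (60, 55) acted as an upper and a lower watermark with hysteresis. The percentages, the `ForceWriteCtrl` name, and the example size of 56 are hypothetical, not the values chosen in this commit.

```python
from dataclasses import dataclass


@dataclass
class ForceWriteCtrl:
    """Hysteresis: start forcing at `high`, stop once occupancy drops below `low`."""
    high: int
    low: int
    active: bool = False

    @classmethod
    def from_size(cls, sq_size: int, high_pct: int = 90,
                  low_pct: int = 85) -> "ForceWriteCtrl":
        # Hypothetical percentages: derive both thresholds from StoreQueueSize
        # instead of hard-coding them.
        high = sq_size * high_pct // 100
        low = sq_size * low_pct // 100
        if not 0 < low < high <= sq_size:
            raise ValueError("thresholds must satisfy 0 < low < high <= size")
        return cls(high, low)

    def step(self, occupancy: int) -> bool:
        """Update with the current number of valid entries; return force-write."""
        if occupancy >= self.high:
            self.active = True
        elif occupancy < self.low:
            self.active = False
        return self.active


ctrl = ForceWriteCtrl.from_size(56)
print(ctrl.high, ctrl.low)                       # 50 47
print([ctrl.step(n) for n in (45, 50, 48, 46)])  # [False, True, True, False]
```

The gap between the two thresholds keeps force-write from toggling every cycle when occupancy hovers near a single limit.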
|
#
8882eb68 |
| 21-Feb-2025 |
Xin Tian <[email protected]> |
feat(bitmap/memenc): support memory isolation by bitmap checking and memory encryption using SM4-XTS (#3980)
- Add bitmap module in MMU for memory isolation
- Add memory encryption module based on the AXI protocol
- These modules can be disabled by setting the options `HasMEMencryption` and `HasBitmapCheck` to false
|
#
914bbc86 |
| 20-Feb-2025 |
xiaofeibao-xjtu <[email protected]> |
chore(dispatch): remove useless code and files (#4288)
|
#
b1d76493 |
| 27-Jan-2025 |
Tang Haojin <[email protected]> |
chore(Parameters): add zawrs, zihintntl and ziccamoa into isa string (#4219)
|
#
4ba1d457 |
| 26-Jan-2025 |
Kunlin You <[email protected]> |
submodule(utility): introduce XSPerfLevel for performance counter (#4238)
This change introduces XSPerfLevel, with the levels `VERBOSE`/`NORMAL`/`CRITICAL`. Only counters whose level is greater than or equal to the threshold are instantiated, which reduces utilization and compile time on Palladium.
The PerfLevel threshold can be set on the command line and defaults to `VERBOSE`, which instantiates all counters. Example usage: `SIM_ARGS="--perf-level CRITICAL"` or `PLDM_ARGS="--perf-level CRITICAL" PLDM=1`
The `perfLevel` parameter of each counter also defaults to `VERBOSE`, which means that, for now, every counter is dropped if the threshold is set above `VERBOSE`. Users can explicitly set the parameter to keep important counters instantiated, e.g. `XSPerfAccumulate(xx, yy, perfLevel = XSPerfLevel.CRITICAL)`
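The filtering rule can be sketched in Python. `XSPerfLevel` and its three levels come from the message; the numeric values, the `instantiated` helper, and the counter names are hypothetical.

```python
from enum import IntEnum


class XSPerfLevel(IntEnum):
    # Ordering follows the commit message; the numeric values are illustrative.
    VERBOSE = 0
    NORMAL = 1
    CRITICAL = 2


def instantiated(counters, threshold=XSPerfLevel.VERBOSE):
    """Keep only counters whose level is >= the threshold (default: keep all)."""
    return [name for name, level in counters if level >= threshold]


counters = [
    ("hypothetical_counter_a", XSPerfLevel.VERBOSE),   # default perfLevel
    ("hypothetical_counter_b", XSPerfLevel.CRITICAL),  # explicitly raised
]
print(instantiated(counters))                        # both counters
print(instantiated(counters, XSPerfLevel.CRITICAL))  # only counter_b
```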
|
#
74050fc0 |
| 26-Jan-2025 |
Yanqin Li <[email protected]> |
perf(Uncache): add merge policy when entering (#4154)
# Background
## Problem
How to design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`?
* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.
## Merge Scenarios
‼️ A new request can be merged into an existing one only if both are `NC`.
1. **New Store Request:**
   1. **Existing Store:** Merge (the new store is younger).
   2. **Existing Load:** Reject.
2. **New Load Request:**
   1. **Existing Load:** Merge (the new load may be younger or older. Both are ok to merge).
   2. **Existing Store:** Reject.
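The merge scenarios reduce to a small predicate. In this Python sketch, `can_merge` (a hypothetical name) returns `True` only for the two "Merge" cases above; every other pair is either a listed "Reject" case or, when a request is not `NC`, not a merge candidate.

```python
KINDS = ("load", "store")


def can_merge(new_kind: str, old_kind: str, new_is_nc: bool, old_is_nc: bool) -> bool:
    """Whether a new request may merge into an existing same-address entry."""
    if new_kind not in KINDS or old_kind not in KINDS:
        raise ValueError("kind must be 'load' or 'store'")
    if not (new_is_nc and old_is_nc):
        return False  # merging requires both requests to be NC
    # store -> store: merge (the new store is younger)
    # load  -> load : merge (either order is fine)
    # load <-> store: reject
    return new_kind == old_kind
```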
# What does this PR do?
## 1. Entry Actions
1. **Allocate** a new entry and mark it as `valid`:
   1. When there is no matching address.
2. **Allocate** a new entry and mark it as `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is either selected to issue or issued.
3. **Merge** into an existing entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is **not** selected to issue or issued.
4. **Reject** the new request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are **different**.
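The four entry actions can be read as a priority decision. The Python sketch below is one interpretation with hypothetical names; it checks fullness first, and the message does not say whether a merge is still possible when the `ubuffer` is full.

```python
from enum import Enum, auto


class Action(Enum):
    ALLOCATE = auto()            # new entry marked valid
    ALLOCATE_WAIT_SAME = auto()  # new entry marked valid and waitSame
    MERGE = auto()               # merge into the existing entry
    REJECT = auto()              # request is not accepted


def entry_action(full: bool, addr_match: bool, same_vaddr_and_attr: bool,
                 older_selected_or_issued: bool) -> Action:
    """Classify a request entering the ubuffer (assumed priority order)."""
    if full:
        return Action.REJECT              # rule 4.1
    if not addr_match:
        return Action.ALLOCATE            # rule 1
    if not same_vaddr_and_attr:
        return Action.REJECT              # rule 4.2
    if older_selected_or_issued:
        return Action.ALLOCATE_WAIT_SAME  # rule 2
    return Action.MERGE                   # rule 3


print(entry_action(False, True, True, False))  # Action.MERGE
```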
**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must be continuous and naturally aligned, and the `addr` must correspond to the mask. Therefore, the "**same attributes**" here introduces a new condition: the merged `mask` must meet the requirements of being continuous and naturally aligned (function `continueAndAlign`). During merging, the block offset of addr must be synchronously updated in `UncacheEntry.update`.
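The `continueAndAlign` condition can be sketched as follows. This Python version assumes an 8-byte mask (one bit per byte) and also returns the block offset that `addr` would need after the merge; `continue_and_align` and `merge_masks` are illustrative stand-ins, not the Chisel code.

```python
def continue_and_align(mask: int, width: int = 8) -> bool:
    """True if `mask` is one contiguous run of set bits whose length is a
    power of two and whose start is a multiple of that length."""
    if mask <= 0 or mask >> width:
        return False
    low = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    run = mask >> low
    size = run.bit_length()
    contiguous = run == (1 << size) - 1
    power_of_two = (size & (size - 1)) == 0
    return contiguous and power_of_two and low % size == 0


def merge_masks(old_mask: int, new_mask: int, width: int = 8):
    """Return (merged_mask, block_offset) if the merge is legal, else None."""
    merged = old_mask | new_mask
    if not continue_and_align(merged, width):
        return None
    offset = (merged & -merged).bit_length() - 1  # addr must follow the mask
    return merged, offset


print(merge_masks(0b0001, 0b0010))  # (3, 0): two bytes at offset 0
print(merge_masks(0b0010, 0b0100))  # None: bytes 1-2 are not naturally aligned
```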
## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)`
> `mid`: master id
>
> `sid`: slave id
**Old Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req`, records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.
**New Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}`.
- `M` matches it with the **`mid`** and updates its record with the received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.
**Benefit:** The new design allows `S` to merge requests when a new request enters.
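The benefit of the `sid` round trip shows up in a toy model. Because the `resp` carries only the `sid`, several `mid`s that `S` merged into one entry can all be completed by a single response. The classes below are hypothetical Python stand-ins for `LoadQueueUncache (M)` and `Uncache (S)`, not the hardware interface.

```python
class UncacheSlave:
    """Toy `S`: allocates an sid per entry and may merge requests into it."""

    def __init__(self):
        self.entries = {}  # sid -> list of mids merged into that entry
        self.next_sid = 0

    def accept(self, mid, merge_into=None):
        """Receive a req carrying `mid`; reply with (mid, sid)."""
        if merge_into is not None:
            sid = merge_into  # merge: reuse an existing entry
        else:
            sid = self.next_sid  # allocate a fresh entry
            self.next_sid += 1
        self.entries.setdefault(sid, []).append(mid)
        return mid, sid

    def respond(self, sid):
        """Send the resp for one entry, tagged only with its sid."""
        self.entries.pop(sid)
        return sid


class LoadQueueUncacheMaster:
    """Toy `M`: records which sid each of its mids was bound to."""

    def __init__(self):
        self.sid_of = {}  # mid -> sid

    def on_ack(self, mid, sid):
        self.sid_of[mid] = sid  # match by mid, then record the sid

    def on_resp(self, sid):
        """Return every mid completed by this resp (several if merged)."""
        done = [m for m, s in self.sid_of.items() if s == sid]
        for m in done:
            del self.sid_of[m]
        return done


s, m = UncacheSlave(), LoadQueueUncacheMaster()
m.on_ack(*s.accept(mid=5))
m.on_ack(*s.accept(mid=9, merge_into=0))  # second request merged into sid 0
print(m.on_resp(s.respond(0)))            # [5, 9]
```

Under the old design the `resp` echoed a single `mid`, so one `S` entry could answer only one `M` entry; matching on `sid` removes that restriction.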
## 3. Forwarding Mechanism
**Old Design:** Each address in the `ubuffer` is **unique**, so forwarding is straightforward based on a match.
**New Design:**
* A single address may match up to two entries in the `ubuffer`.
* If it matches two entries, one must be marked `inflight` and the other `waitSame`. In this case, the forwarded data comes from the merged data of the two entries, with the `inflight` entry being the older one.
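The two-entry forwarding case amounts to a byte-wise merge in which the younger `waitSame` entry wins. The sketch assumes each entry is given as `(data, mask)` with one mask bit per byte; `forward_bytes` is a hypothetical helper.

```python
def forward_bytes(inflight, wait_same):
    """Merge the data of the (up to) two entries matching one address.

    Each argument is a (data: bytes, mask: int) pair or None. The `inflight`
    entry is older, so where both masks are set, the younger `waitSame`
    entry's byte wins. Returns (data, mask); unmasked bytes are left as 0.
    """
    present = [e for e in (inflight, wait_same) if e is not None]
    if not present:
        return b"", 0
    width = len(present[0][0])
    out = bytearray(width)
    out_mask = 0
    for data, mask in present:  # older entry first, younger entry last
        for i in range(width):
            if (mask >> i) & 1:
                out[i] = data[i]
                out_mask |= 1 << i
    return bytes(out), out_mask


data, mask = forward_bytes((bytes([1, 2, 3, 4]), 0b0011),
                           (bytes([9, 9, 9, 9]), 0b0110))
print(list(data), bin(mask))  # [1, 9, 9, 0] 0b111
```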
## 4. Bug Fixes
1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`, because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "**able to send requests**" (previously `s_refill_req`, now `s_inflight`) should not be triggered by `reqFire` but rather by `acquireFire`.
<img width="747" alt="image" src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef" />
# Evaluation
- ✅ timing
- ✅ performance
| Type | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO | 51026 | 1.00 | 208149 | 1.00 |
| NC | 42343 | 1.21 | 169248 | 1.23 |
| NC+OT | 20379 | 2.50 | 160101 | 1.30 |
| NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 |
| cache | 1298 | 39.31 | 4410 | 47.20 |
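The speedup columns are the `IO` figure divided by each row's figure, which suggests the raw numbers are times (lower is better; the unit is not stated). A quick recomputation from the table:

```python
# Raw numbers from the table above: (4B*1000, 1B*4096).
results = {
    "IO":             (51026, 208149),
    "NC":             (42343, 169248),
    "NC+OT":          (20379, 160101),
    "NC+OT+mergeOpt": (16308, 126369),
    "cache":          (1298, 4410),
}

base_4b, base_1b = results["IO"]
for name, (t_4b, t_1b) in results.items():
    # Speedup relative to IO = IO figure / this configuration's figure.
    print(f"{name:15s} {base_4b / t_4b:6.2f} {base_1b / t_1b:6.2f}")
```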
|
#
5bd65c56 |
| 14-Jan-2025 |
Tang Haojin <[email protected]> |
feat(Config): add yaml parser for complicated parametrization (#4147)
This commit enables complicated parameterization by yaml parsing. We use circe to do this.
In this commit, we implement 6 configurations:
- PmemRanges: physical memory ranges
- PMAConfigs
- CHIAsyncBridge: set depth to 0 to disable it
- L2CacheConfig
- L3CacheConfig
- DebugModuleBaseAddr
For better human readability, this commit renames `WithNKBL2/3` to `L2/3CacheConfig`, turns them into case classes, and makes the first parameter accept only human-readable size configurations such as `0.5 MB` or `256kB`.
This commit also changes PMAConfigs and PmemRanges into lists of case classes.
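The human-readable sizes could be handled along the lines of the sketch below. The real parser is Scala code using circe; this Python version only illustrates turning strings such as `0.5 MB` or `256kB` into bytes, and it assumes binary multiples (1 kB = 1024 B), which the commit does not state. `parse_size` is a hypothetical name.

```python
import re

# Assumed binary multiples; the commit does not say which convention is used.
_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30}
_SIZE_RE = re.compile(r"\s*(\d*\.?\d+)\s*([kKmMgG]?[bB])\s*")


def parse_size(text: str) -> int:
    """Parse a human-readable size such as "0.5 MB" or "256kB" into bytes."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    value = float(match.group(1)) * _UNITS[match.group(2).upper()]
    if not value.is_integer():
        raise ValueError(f"size is not a whole number of bytes: {text!r}")
    return int(value)


print(parse_size("0.5 MB"), parse_size("256kB"))  # 524288 262144
```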
|
#
552d2d4e |
| 06-Jan-2025 |
Tang Haojin <[email protected]> |
chore(Parameters): add svnapot extension string (#4133)
|
#
6c106319 |
| 30-Dec-2024 |
xu_zh <[email protected]> |
feat(ICache): ECC error injection (#4044)
This PR is part of the *RAS (Reliability, Availability, Serviceability)* error recovery features.
- Add a series of MMIO-mapped CSRs to control the ICache ECC check & ECC inject features
- Implement ICache ECC injection
- M-mode software can write `eccctrl` to trigger error injection into the meta/dataArray; the next read can then trigger auto-recovery (implemented in #3899)
- Remove custom CSR `Sfetchctl`
# Details
## CSR
The base address of the added MMIO-mapped CSRs is `0x38022080`, and the registers are defined as follows:
```
                 64       10       7         4         2        1        0
0x00 eccctrl     | WARL   | ierror | istatus | itarget | inject | enable |

                 64       PAddrBits-1                                    0
0x08 ecciaddr    | WARL   | paddr                                        |
```

| CSR | field | description |
| --- | --- | --- |
| eccctrl | enable | ECC check enable |
| eccctrl | inject | ECC inject enable (write 1 to trigger injection; always reads 0) |
| eccctrl | itarget | ECC inject target<br>0: metaArray<br>1: rsvd<br>2: dataArray<br>3: rsvd |
| eccctrl | istatus | ECC inject status (read-only)<br>0: idle: inject controller idle; goes to working when it receives an inject request (i.e. a write of 1 to `eccctrl.inject`)<br>1: working: inject controller working; goes to injected when finished, or to error when failed<br>2: injected: goes to idle after read<br>3: rsvd<br>4: rsvd<br>5: rsvd<br>6: rsvd<br>7: error: inject failed (check `eccctrl.ierror` for the reason); goes to idle after read |
| eccctrl | ierror | ECC error reason (read-only, valid only if `eccctrl.istatus==error`)<br>0: ECC check is not enabled (i.e. `!eccctrl.enable`)<br>1: inject target invalid (i.e. `eccctrl.itarget==rsvd`)<br>2: inject addr (i.e. `ecciaddr.paddr`) not in ICache<br>3: rsvd<br>4: rsvd<br>5: rsvd<br>6: rsvd<br>7: rsvd |
| ecciaddr | paddr | Physical address of the inject target |
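For host-side tooling or tests, the `eccctrl` layout can be captured in a few constants. This Python sketch packs the value written to `eccctrl` and decodes a read-back; the helper names are hypothetical, while the bit positions and encodings follow the register description.

```python
# Bit positions from the eccctrl layout.
ENABLE_BIT, INJECT_BIT = 0, 1
ITARGET_SHIFT, ITARGET_MASK = 2, 0b11
ISTATUS_SHIFT, ISTATUS_MASK = 4, 0b111
IERROR_SHIFT, IERROR_MASK = 7, 0b111

TARGETS = {"metaArray": 0, "dataArray": 2}
ISTATUS = {0: "idle", 1: "working", 2: "injected", 7: "error"}
IERROR = {0: "ECC check not enabled", 1: "inject target invalid",
          2: "inject addr not in ICache"}


def eccctrl_inject_value(target: str) -> int:
    """Value to write to eccctrl: itarget, inject = 1, enable = 1."""
    return (TARGETS[target] << ITARGET_SHIFT) | (1 << INJECT_BIT) | (1 << ENABLE_BIT)


def decode_eccctrl(value: int) -> dict:
    """Decode a value read back from eccctrl (inject always reads as 0)."""
    status = ISTATUS.get((value >> ISTATUS_SHIFT) & ISTATUS_MASK, "rsvd")
    error = None
    if status == "error":
        error = IERROR.get((value >> IERROR_SHIFT) & IERROR_MASK, "rsvd")
    return {
        "enable": bool((value >> ENABLE_BIT) & 1),
        "itarget": (value >> ITARGET_SHIFT) & ITARGET_MASK,
        "istatus": status,
        "ierror": error,
    }


print(hex(eccctrl_inject_value("dataArray")))  # 0xb
```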
## Inject method
```asm
$INJECT_ADDR:
  # maybe do something else
  ret

test:
  la   t0, $BASE_ADDR            # load ICache control base addr
  la   t1, $INJECT_ADDR          # load inject addr
  jalr ra, 0(t1)                 # jump to inject addr to load it into the ICache
  sd   t1, 8(t0)                 # set inject addr (ecciaddr)
  li   t2, (target << 2 | 1 << 1 | 1 << 0)  # inject target & inject enable & ECC enable
  sd   t2, 0(t0)                 # set inject enable & ECC enable (eccctrl)
loop:
  ld   t3, 0(t0)                 # get ECC control state
  andi t3, t3, (0b11 << (4+1))   # keep high bits of inject state, istatus[2:1]
  beqz t3, loop                  # if idle or working, loop

  srli t3, t3, (4+1)             # t3 = istatus[2:1]
  addi t3, t3, -1                # t3 = istatus[2:1] - 1
  bnez t3, error                 # if not injected: error or rsvd

  jalr ra, 0(t1)                 # jump to inject addr again to trigger the error
  j    finish

error:
  # handle error
finish:
  # finish
```
Alternatively, check out https://github.com/OpenXiangShan/nexus-am/pull/48
|
#
17386530 |
| 25-Dec-2024 |
Anzo <[email protected]> |
fix(LoadQueueRAR): aligning the size of `RARSize` to `VLQSize` (#4086)
For vector loads, if the size of the `RAR` is not equal to that of the `VLQ`, a deadlock can occur. This is because:
A vector `uop` is split into multiple load operations and occupies multiple `VLQ` entries. An `RAR` entry is released only when its `VLQ` entry is dequeued, whereas a vector `VLQ` entry can be dequeued only after all load operations of its `uop` have been written back to the `MergeBuffer`.
Therefore, the following can happen: the `VLQ` dequeue pointer waits for a vector `uop` to write back all of its loads, but the `RAR` is already full. The remaining loads split from that `uop` cannot enter the `RAR`, so they wait in the `ReplayQueue` for the `RAR` to become non-full. Meanwhile, the `RAR` cannot release any entry because the `VLQ` cannot dequeue, which leads to a deadlock.
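Under the assumptions the message describes (each load holds at most one `RAR` entry, and an `RAR` entry is freed only when its `VLQ` entry dequeues), a short counting argument shows why equal sizes suffice. A load that is waiting to enter the `RAR` already occupies a `VLQ` entry but holds no `RAR` entry, so:

```math
N_{\mathrm{RAR}}^{\mathrm{used}} \;\le\; N_{\mathrm{VLQ}}^{\mathrm{occupied}} - 1 \;\le\; \mathrm{VLQSize} - 1 \;<\; \mathrm{RARSize} \qquad \text{whenever } \mathrm{RARSize} \ge \mathrm{VLQSize}
```

Hence, with `RARSize` = `VLQSize`, a free `RAR` entry always exists for the waiting load and the circular wait cannot form. With `RARSize` < `VLQSize`, the last inequality no longer holds in general, so a free entry is not guaranteed.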
|
#
452b5843 |
| 19-Dec-2024 |
Huijin Li <[email protected]> |
power(MemBlock): power optimization in MemBlock (#4059)
Power optimizations: (1) use `withClockGate` instead of `ClockGate` in DCache; (2) reduce the number of LSQ entries.
|
#
4e7f9e52 |
| 16-Dec-2024 |
xiaofeibao <[email protected]> |
fix(dispatch): fix bug with indexed vld instructions: each uop can be an indexed vld instruction
|
#
49f2b250 |
| 10-Dec-2024 |
xiaofeibao <[email protected]> |
timing(backend): each IQ has at least two simple entries
|
#
c22ffc80 |
| 29-Nov-2024 |
xiaofeibao <[email protected]> |
area(backend): reduce a vfcvt for better area
|
#
0d50d631 |
| 29-Nov-2024 |
xiaofeibao <[email protected]> |
Revert "area(Backend): reduce VfScheduler iq num from 3 to 2 and remove a vfcvt fu"
This reverts commit 10b44fa68ead2a8d79ce215b6bb116912f72f3a4.
|
#
11cc8561 |
| 26-Nov-2024 |
xiaofeibao <[email protected]> |
area(Backend): reduce VfScheduler iq num from 3 to 2 and remove a vfcvt fu
|
#
8c6ac5eb |
| 22-Nov-2024 |
xiaofeibao <[email protected]> |
area(backend): reduce 4 fexu to 3 fexu
|