At DAC this year I had a lot of fun doing a live experiment to demonstrate some of the benefits and issues with concurrent design flows. I was at the Cadence Theatre doing a presentation called 'Controlling the costs of SoC integration' and I decided to make the presentation more interactive by creating a design team and seeing some of the effects of getting this team to work concurrently. We demonstrated how a little 'twist' caused a big upset for the team's deliveries!
The topic I introduced first was how system design flows are now highly concurrent. In the production of a system within a very tight timescale, it would be normal to have architecture definition, software development, virtual prototype development, RTL design and verification all happening at the same time, be it IP, sub-system or SoC level design. I represented this as a set of rotating, interacting cogs.
Having the teams work concurrently means that the product can be delivered in a compressed timescale. However, there are some downsides to this process. If any of the cogs lock, the whole process is disrupted. This implies high levels of dependency, and many things are on the critical path. In the animation it was clear that information was flowing around and across the cogs, and this is where I highlighted a major weakness: if the information flow is not fully automated (that is, it relies on manual processes), this can have severe consequences for the design flow. I decided to conduct an experiment to prove this point.
The goal of the experiment was to mimic a concurrent chip development. I wanted to get a system development team from the audience:
- An architect
- A hardware design engineer
- A verification engineer
- A virtual prototype engineer
- An embedded software developer
I managed to get 5 people (with the promise of a 16GB memory stick) and I volunteered as the 6th member of the team - the project manager. The experiment was to complete a specific HW/SW implementation and integration task across the different design teams. The focus was on the HW/SW interface as it is common to all of these teams. I brought along 4 copies of an ARM UART PrimeCell specification. The architect needed to publish (hand out) the specifications to the other team members. These team members had to independently implement a single piece of information in these specifications, then come together and agree that all implementations were aligned. I highlighted the piece of the specification to be implemented and where to find it in the specification. The 'implementation' was simply to write down this single piece of information, which was as follows:
The reset value of the UARTCR register in Table 3.1 of Chapter 3.2. Presented as follows:
Now for the fun bit: As project manager I gave the team a schedule to complete the HW/SW integration as follows:
- The Architect had 5 seconds to hand out all the specs
- Each of the implementation teams had 10 seconds to 'implement' the specs (Write down the value)
- The teams then had 10 seconds to agree alignment
- As project manager, I added 5 seconds contingency
... which I represented visually as follows:
This was a total of 30 seconds. I got good buy-in from my teams and was ready to start the clock. First though, I had to separate two team members who were sitting close to each other in order to simulate geographically dispersed teams :) I then gave the teams a countdown, shouted 'GO!' and started the stopwatch.
Releasing the specifications. The first thing to do was for the architect to hand out the specifications. And here he is, releasing the specs:
The implementation seemed to be delayed slightly as each team member started looking for the correct piece of information. I called out the chapter, table and register name (I also had the register circled on the specs).
You could see the benefit of working concurrently: these 4 teams were working independently, so I shouldn't have expected to wait long. However, it took 14 seconds to finish the implementation, at which point I was complaining that my project had a 10%-15% slip and I wasn't happy. I asked the teams to get together quickly and agree that their implementations were aligned - and to hurry up as the project was already critically late.
After about 8 seconds of the integration phase, I could tell something was up. There was a lot of shaking of heads and a lot of finger pointing. I heard someone say that there were incorrect values in the software, whilst someone else was pointing at the RTL design. Some of the team went back to their chairs and returned with the specs to prove their point. Time passed and at this stage, as project manager, I was getting exasperated with the schedule slip and 'demanded' to know what was wrong!
"It seems as if someone has a different version of the spec", I was told by the Virtual Prototype engineer.
'Really?', I asked sarcastically, shaking my head to the audience as if nothing like this had ever happened in projects before. My team looked on not knowing what to do. I said, quite curtly, 'Well use the latest version of the spec!' They looked at version numbers and dates and finally they aligned on the correct values for the register. I thanked the team for their input and sent them back to their seats.
The team's observation was correct: there were two different versions of the UART specification in play (PL010 and PL011). In one specification the reset value was 0x000 and in the other it was 0x300. The effect on my project was devastating: from spec to alignment it took 136 seconds instead of the predicted 30 seconds, more than 4x the planned schedule. I presented an example slippage and asked the audience to consider that the timescale was days, not seconds, and this seemed to show the gravity of misaligned teams working concurrently. (Slides here show a slippage of 18 whereas it was really 106)
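For anyone who wants to check the arithmetic, here is a small sketch that adds up the planned schedule and compares it with the measured time. The phase names and variable names are my own labels; the numbers are the ones from the experiment.

```python
# Planned schedule from the experiment, in seconds (phase names are just labels).
planned = {
    "release specs": 5,
    "implement": 10,
    "agree alignment": 10,
    "contingency": 5,
}

planned_total = sum(planned.values())
actual_total = 136  # measured on the stopwatch, spec release to alignment

slip_seconds = actual_total - planned_total
slip_ratio = actual_total / planned_total

print(f"planned {planned_total}s, actual {actual_total}s")
print(f"slip of {slip_seconds}s, i.e. {slip_ratio:.1f}x the planned schedule")
```

Swap seconds for days and the same ratio turns a 30-day integration plan into roughly four and a half months.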
At this stage I introduced SID, the 'insidious' bug that can be very prevalent around manual processes and that can actually very quickly contaminate these types of concurrent design flows.
In this experiment SID was lurking behind team misalignments. There weren't any real implementation bugs, but when it came to the misalignments, implementation bugs were raised (e.g. the RTL implementation was deemed to be wrong). For document-driven design processes I showed the types of bugs that contaminate concurrent design flows and affect design quality:
- Specification bugs: Bugs contained in the specifications themselves
- Interpretation bugs: Bugs introduced into an implementation by misinterpreting the spec
- Translation bugs: Bugs introduced in translating from a specification to a specific implementation
- Synchronization bugs: Bugs where teams are misaligned (for example, working from different versions of a spec)
All of these impact quality and, as seen in the experiment, can cause serious problems for integration schedules and costs.
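A synchronization bug like the one in the experiment is easy to catch mechanically once each team records which spec version it worked from. The sketch below is purely illustrative: the team-to-spec assignments are invented, and which UART version carried which reset value is my assumption here, not something the experiment write-up depends on.

```python
from collections import defaultdict

# Hypothetical record of what each team wrote down for the UARTCR reset value.
# Team-to-spec assignments and the version/value pairing are assumptions.
implementations = {
    "hardware design": {"spec": "PL011", "uartcr_reset": 0x300},
    "verification": {"spec": "PL011", "uartcr_reset": 0x300},
    "virtual prototype": {"spec": "PL010", "uartcr_reset": 0x000},
    "embedded software": {"spec": "PL011", "uartcr_reset": 0x300},
}


def group_by_spec_and_value(impls):
    """Group teams by (spec version, value); more than one group means misalignment."""
    groups = defaultdict(list)
    for team, record in impls.items():
        groups[(record["spec"], record["uartcr_reset"])].append(team)
    return dict(groups)


groups = group_by_spec_and_value(implementations)
aligned = len(groups) == 1

for (spec, value), teams in sorted(groups.items()):
    print(f"{spec} reset=0x{value:03X}: {', '.join(teams)}")
print("aligned" if aligned else "MISALIGNED: teams are on different spec versions")
```

Note that nobody in this data made an implementation mistake; every team faithfully copied its own spec. The check flags the version split, which is exactly what took the live team so long to discover by hand.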
The proposed solution is ultimately more automation in the front-end of the design flow. The main focus is on the transformation of paper-based specifications into machine-readable or executable specifications, and the automated generation of the different implementations from these specifications. This essentially eliminates the aforementioned types of bugs.
This executable specification not only improves quality and synchronization but provides immediate turn-around-time for spec changes thus increasing productivity.
I gave an example of Duolog's Socrates-Bitwise, which can be considered an executable specification of HW/SW interface registers. With Socrates-Bitwise, a user inputs information in a GUI (or imports txt/xml formats). Coherency checks are run on the specification to ensure all data is coherent. From this specification many different formats can be generated automatically, including documentation, RTL, UVM SystemVerilog, SystemC and C API.
So does this make a difference? Absolutely - for something like the creation of a UART PrimeCell IP, the BLUE portion of the graph below shows the percentage of design collateral that is consumed by register implementation.
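To make the single-source idea concrete, here is a toy sketch of one register description feeding a coherency check and two generated views. This is not the Socrates-Bitwise data model, file format or API; the `Register` class, the function names, and the offset and width values are all illustrative assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    """Hypothetical single-source description of one HW/SW interface register."""
    name: str
    offset: int
    width: int
    reset: int


def check(reg):
    """Toy coherency check: the reset value must fit in the register width."""
    if reg.reset >= (1 << reg.width):
        raise ValueError(f"{reg.name}: reset 0x{reg.reset:X} exceeds {reg.width} bits")


def to_c_header(reg):
    """Software view: C preprocessor constants."""
    digits = (reg.width + 3) // 4
    return (f"#define {reg.name}_OFFSET 0x{reg.offset:03X}u\n"
            f"#define {reg.name}_RESET  0x{reg.reset:0{digits}X}u")


def to_sv_param(reg):
    """Hardware view: a SystemVerilog localparam for the reset value."""
    digits = (reg.width + 3) // 4
    return (f"localparam logic [{reg.width - 1}:0] {reg.name}_RESET = "
            f"{reg.width}'h{reg.reset:0{digits}X};")


# Illustrative values only; offset and width are assumptions for this sketch.
uartcr = Register(name="UARTCR", offset=0x030, width=16, reset=0x0300)
check(uartcr)
c_text = to_c_header(uartcr)
sv_text = to_sv_param(uartcr)
print(c_text)
print(sv_text)
```

Because both views are generated from the same `uartcr` object, a change to the reset value reaches the software and hardware teams at the same time, and there is no second copy of the spec to drift out of date.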
By automating this blue portion from an executable specification like Socrates-Bitwise we see huge productivity gains, immediate turn-around times for incremental flows and ZERO bugs as these implementations are all aligned to a single source. Automating this at IP and SoC levels has a significant impact on the overall costs of SoC development.
- Thanks to the audience members, my team for 3 minutes, who helped me on the day!
- Thanks to Joseph Hupcey III of Cadence, who allowed me to use his photos in this blog