Debugging a Memory blow-up with SystemVerilog
Srinivasan Venkataramanan, CVC Pvt. Ltd.
Ramanathan S, Cisco
We, like many other readers of this blog, code for a living. We create, maintain, and debug complex SystemVerilog-based verification environments for various design blocks. At CVC, one of our specializations is to listen to tough problems and work with the end user to arrive at a solution. We do not always find immediate or magic solutions, and we do not claim to have a solution for every problem, but we thrive on such challenges. Debug in particular fascinates us more than anything else! We listen to horror stories of tough debug problems, suggest ways of isolating the issues, and suggest better debug hooks for the future. We are passionate about emerging technologies and better ways of debugging as they evolve. Here is one such story, in which VCS’s Aspect Oriented Extensions (AOE) to SystemVerilog can help debug a memory blow-up in SV-TB code.
There was a memory blow-up in one of the recently created (read: “not-well-debugged-yet”) verification environments. It is a typical first-generation SystemVerilog TB: the plain language is used, without the bells and whistles of VMM and the like. There are several internal customers for this environment and, as with any other customer, every now and then they have a new requirement. The original requirement was for a typical constrained-random environment, and that’s what was developed (no excuses intended, just plain fact). A new customer started using it for stress testing by sending 10K+ transactions, and there was a memory blow-up during the simulation. Then came the SOS to the “TB Developer”: hey, that’s me :(
A quick background on the environment: it had a complex mix of SystemC reference models, a SystemVerilog generator, monitor and checkers, and some custom C code integrated as well. The moment the team heard “MemLeak”, the usual suspects were suggested: the C/SystemC/HDL boundary, dangling pointers, etc. I tried DVE-CBug (the C debugger, the first suggestion from Srini) and other available tools. Not so lucky, my boy :( More debug was on the way, and it was SNUG India time. I didn’t want to miss it, but had to deliver on “Karma first”!
After several hours of trials, the SV-TB code was isolated as the culprit. With some common sense and thought, suspicion fell on dynamic memory consumption. With some more debug, a few mailboxes on the monitor side were identified as the cause of the blow-up. Two of them had been added recently (for enhanced checking), while their consumers (i.e. the checkers) were still maturing under version control and hence unavailable to the end customer. So these mailboxes ended up without any sinks/consumers. The typical random tests did not expose the issue, but in stress tests with large packets this became the bottleneck!
The Mailboxes were constructed in default fashion as:
[cpp]class new_mon;
  mailbox out_mbx; // monitor's output mailbox (untyped)

  function new();
    this.out_mbx = new(); // no bound argument given
  endfunction : new
  // ..
endclass : new_mon
[/cpp]
Spot anything wrong above? The mailbox is UNSIZED (the default bound of 0 means unbounded), so put() never blocks. With no “consumer”, the monitor tirelessly pushed packets into this mailbox, leading to an eventual memory blow-up.
Gobsmacked! The fix wasn’t that hard, was it? But since the original code didn’t use any base class for the mailbox, we had to struggle hard to isolate the issue, and this ate up our SNUG India time :(
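The unbounded behaviour is easy to reproduce in isolation. The following is a minimal sketch, not the original testbench: the module name and the loop count are made up for illustration, and the loop merely stands in for a monitor with no consumer attached.

[cpp]module unsized_mbx_demo;
  // Unsized mailbox: new() with no argument uses a bound of 0, i.e. unbounded.
  mailbox mbx = new();

  initial begin
    // Producer with no consumer (stand-in for the monitor).
    for (int i = 0; i < 10_000; i++) begin
      mbx.put(i); // never blocks on an unbounded mailbox
    end
    // Every item put so far is still held in memory.
    $display("items queued with no consumer: %0d", mbx.num());
    $finish;
  end
endmodule
[/cpp]

Since put() never blocks here, num() simply keeps climbing. Printing or checking num() at intervals is a cheap first debug hook for this class of problem.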
As seasoned verification professionals, we brainstormed on potential improvements and on shortening debug cycles in the future (given that a wholesale change is not likely any time soon). One option is to use SIZED mailboxes, if applicable, i.e. use:
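[cpp]// MBOX_SIZE must be declared at class scope, e.g.:
//   localparam int MBOX_SIZE = 100;
function new();
  // Bounded mailbox: put() blocks once MBOX_SIZE items are queued.
  this.out_mbx = new(MBOX_SIZE);
endfunction : new
[/cpp]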
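One caveat with a sized mailbox: a blocking put() on a full mailbox with no consumer stalls the producer silently. The sketch below shows how the non-blocking try_put() makes that condition visible instead. It assumes a hypothetical standalone module and a dropped counter, neither of which is part of the original environment.

[cpp]module sized_mbx_demo;
  localparam int MBOX_SIZE = 100; // illustrative bound
  mailbox mbx = new(MBOX_SIZE);
  int dropped = 0;                // hypothetical counter for rejected items

  initial begin
    for (int i = 0; i < 10_000; i++) begin
      // try_put() returns 0 when the mailbox is full instead of blocking.
      if (!mbx.try_put(i)) dropped++;
    end
    // num() can never exceed MBOX_SIZE, so memory use stays bounded.
    $display("queued=%0d dropped=%0d", mbx.num(), dropped);
    $finish;
  end
endmodule
[/cpp]

Whether to drop, block, or flag an error when the mailbox is full is a design choice for each environment. The point is only that a bound turns silent memory growth into something observable.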
A better, scalable option is to use a wrapper class around plain SV Mailbox. That would then allow AOE to aid in debug around the “put”.
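A rough sketch of such a wrapper is shown below. Everything in it (the class name dbg_mailbox, the warn_level threshold, the message text) is an assumption made for illustration and not code from the original environment or from VMM. It only shows the idea of funnelling every put() through one place where a debug hook can live.

[cpp]// Hypothetical debug wrapper around a plain SV mailbox.
class dbg_mailbox #(type T = int);
  local mailbox #(T) mbx;
  local string       name;
  local int          warn_level; // occupancy at which to warn
  local int          high_water; // largest occupancy seen so far
  local bit          warned;     // ensures the warning is printed once

  function new(string name, int warn_level = 1000, int bound = 0);
    this.name       = name;
    this.warn_level = warn_level;
    this.mbx        = new(bound); // bound 0 keeps the mailbox unsized
  endfunction : new

  // Single choke point for all producers: the place to add debug code.
  task put(T item);
    mbx.put(item);
    if (mbx.num() > high_water) high_water = mbx.num();
    if (!warned && high_water >= warn_level) begin
      warned = 1;
      $display("[%0t] WARNING: mailbox '%s' holds %0d items; is a consumer attached?",
               $time, name, high_water);
    end
  endtask : put

  task get(output T item);
    mbx.get(item);
  endtask : get

  function int num();
    return mbx.num();
  endfunction : num
endclass : dbg_mailbox

// Possible use inside the monitor, assuming a transaction class "packet":
//   dbg_mailbox #(packet) out_mbx = new("new_mon.out_mbx");
[/cpp]

With a wrapper like this in place, the check could also live outside the class, for example in an aspect attached around put(), so that the production code stays untouched.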
Interestingly, we addressed this exact topic in our VMM adoption book (http://www.systemverilog.us/vmm_adoption), in Chapter 8, Advanced Topics (Section 8.3). Let’s do a reality check: first and foremost, this is bad code; let us admit it to ourselves. It is obvious in hindsight. We should have done one of the following:
· Used sized mailboxes
· Used a wrapper class around plain SV Mailbox to aid in debug around the “put”
· Even better, used a VMM_Channel, with tons of built-in features to aid in such cases (we would at least have been able to debug the problem, if not get it correct by construction).
Refer to Chapter 2: http://www.systemverilog.us/vmm_adoption/vmm_isbn_0970539495.pdf
· Added an “artificial sink” for every such monitor (something that we at CVC recommend during our regular SV/VMM training and consulting).
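As an illustration of the last option, an artificial sink can be as small as the sketch below. The class name mbx_sink, the drained counter and the demo module are hypothetical. The sketch also assumes an untyped mailbox as in new_mon, with the type parameter T set to whatever type the monitor actually puts.

[cpp]// Hypothetical artificial sink: drains a mailbox that has no real consumer yet.
class mbx_sink #(type T = int);
  local mailbox mbx; // untyped, matching the monitor's declaration
  int drained;       // number of items discarded so far

  function new(mailbox mbx);
    this.mbx = mbx;
  endfunction : new

  task run();
    T item;
    forever begin
      mbx.get(item); // blocks until an item is available; T must match the put type
      drained++;     // item is discarded here
    end
  endtask : run
endclass : mbx_sink

module sink_demo;
  mailbox mbx = new(); // unsized, as in the original monitor

  initial begin
    mbx_sink #(int) sink = new(mbx);
    fork
      sink.run();      // sink runs in the background
    join_none
    for (int i = 0; i < 10_000; i++) begin
      mbx.put(i);
      #1;              // advance time so that the sink gets to run
    end
    $display("pending=%0d drained=%0d", mbx.num(), sink.drained);
    $finish;
  end
endmodule
[/cpp]

Once the real checker is available, it simply replaces the sink as the consumer. Until then the mailbox occupancy stays small even under stress tests.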
Also, with VMM 1.2, there are enhanced built-in features to navigate around the whole environment, locate all channels, and probe them at will, all without touching the existing code! It will be fun to debug such a scenario again in the near future with such sophistication.
More about it in a future blog entry!