A Case Study of a logic simulation environment using pcieVHost and Third Party PCIe IP: Part 1

Introduction

As I write, the first lines of the pcieVHost model were laid down almost 20 years ago. In the article I wrote on the internal architecture of the PCIe C model, I summarised the origins of pcieVHost as a means to teach myself the PCIe specification, having been tasked to implement a 16-lane Endpoint design (amongst my other objectives) because “the available 3rd party implementations had unacceptably high latencies”. We were designing supercomputer clusters at the time, so communications between nodes had to be bleeding-edge fast. I can’t speak for others, but to learn complex concepts I need to implement things and experiment to reinforce what I’m learning, so I started to write C code, as a home side project, to construct the bit patterns for all the different types of traffic that can occur on a PCIe link, initially sending these to a file. To verify that the patterns I’d created were correct, I wrote more code to read the data back from the files, decode it, and give a rough display of the decoded patterns.

The EP design I was implementing was originally meant to be verified against commercial sign-off quality VIP, but it became clear, due to licencing costs, that this would not be available until the last stages of development. I therefore needed a means to drive the IP to verify its implementation against what I believed was correct behaviour (it would be verified against the specification using the VIP down the line, before committing to tape out). By this time, my VProc virtual processor co-simulation element (conceived a few years earlier) was already being used in verifying sub-systems, substituting for multiple “thread” processors, and it occurred to me to use this to adapt my crude PCIe C code to drive and decode signals on a PCIe link in a logic simulation. I needed to adapt the way VProc was used to do this, and I’ve written about this, and more generally about modelling arbitrary protocols, in another article. Thus was pcieVHost born.

In fact, beyond the PCIe Endpoint test environment, the software team took it and, along with using Verilator as a free alternative to the commercially licenced simulator we were using, connected this up to QEMU running their Linux drivers and drove a network simulation, through the PCIe interface, to do true co-development of application software and RTL. The diagram below summarises this system, though it is from memory, as I do not have access to any of the documentation I made at that time.

[Figure: block diagram of the QEMU, Verilator and pcieVHost based co-development system, drawn from memory]

So, the pcieVHost model has been used to verify commercial PCIe designs (even if just my own implementations) which, themselves, have been through sign-off testing for PCIe compliance.

Purpose and Goals

Unfortunately, as for the documentation, I do not have access to any of the test environments and IP from the EP development time, and so, over these two articles, I want to explore using pcieVHost to drive third-party commercially available IP in a way that reflects a real development test environment. Instead of verifying the PCIe implementation (likely signed-off for compliance already), I want to look, instead, at driving the PCIe Hard Macro IP provided in some commercial FPGAs, which would then drive the application IP being developed that will be using the supplied vendor PCIe implementation. This will serve as a case study, and a template, of what is required to put together such a test environment based around the pcieVHost co-simulation model.

For this case study, the general plan is to:

  • Use the FPGA vendor’s tools to generate the simulation source code
  • Configure the vendor IP suitably for our needs
  • Create a wrapper to abstract away the details of the IP’s instantiation and create a simplified external interface
  • Connect the pcieVHost model to the IP’s PCIe link
  • Use the Memory Model co-simulation module to stand in for the developer’s application RTL
  • Go through resetting the IP and bringing the link PHY to the powered-up state (L0)
  • Initialise Virtual Channel 0 (VC0) data link layer flow control
  • Do a basic configuration space exploration and enumeration
  • Transfer, through the PCIe IP, memory data to the memory model and read it back with verification.

This, at first glance, may not look too ambitious as we end up doing a basic couple of memory reads and writes but, due to the nature and complexity of PCIe, an awful lot has to happen to make this possible, and once this point is reached we are communicating with the application IP that’s being developed and have arrived at a starting point for it to be exercised to any level of complexity required. For this article, so as not to obfuscate documenting the process of constructing such a test bench environment, I want to keep everything to a bare minimum consistent with reaching the above stated goals. Hopefully, as we proceed, it will be clear what might be required to add more features or complexity, to meet a particular project’s needs—i.e. the usual textbook cop out of “that’s left as an exercise for the student”.

The article is split over two parts, and in this first article the focus is on the third-party IP—with a step-by-step process of choosing the IP to use, configuring the IP for requirements with the EDA tools, generating the source code for simulation and creating wrapper logic to configure and abstract away the PCIe IP details, suitable for easy instantiation in a logic simulation environment. It is in the second article that the test bench will be constructed to use this wrapped IP and to run a test from power up to sending memory read and write transactions, through the PCIe block, to attached logic being developed as the application. So it won’t be until the second article that we meet pcieVHost, but this first article documents the important process of obtaining the PCIe IP to be used and preparing it for ease of integration into the simulation verification environment.

What’s discussed over the two articles is all available in the pcieVHost repository in the vhdl/testaltpcie directory. This test bench is for VHDL, as there is an extra step in instantiating the Verilog PCIe IP top level in VHDL for a mixed language simulation, but this can just as easily be done in Verilog or SystemVerilog, and there are versions of pcieVHost and mem_model in Verilog for use in those languages.

Choosing 3rd Party PCIe IP

For this case study, there are two obvious sources of FPGA IP that suggest themselves, and potentially other sources from other vendors as well. I have chosen to use the Altera option, but only because I am most familiar with their Quartus tool chain. The general concepts, if not the details, will be the same for Xilinx and Vivado and even for an ASIC development using 3rd Party PCIe IP for their PCIe interface.

It may be, in the future, I will repeat this process for Xilinx devices and their tool flow, or maybe someone can volunteer to do this and contribute to the advancement of the pcieVHost project. Feel free to contact me directly (simon@anita-simulators.org.uk).

In the rest of this article, for brevity, I will have to assume some basic knowledge of PCI Express. If you’re new to this subject, then you can read my PCIe Primer first.

Altera Cyclone V Hard IP for PCI Express

Altera supplies PCIe IP for many of their family of devices, including Agilex, Arria and Stratix, but we will choose a Cyclone V device as being readily available, low cost and power efficient. Not all variants of the Cyclone V have this Hard IP, but the GX, GT, SX and ST devices have GEN1 and GEN2 implementations.

This IP can be configured for Root Complex or Endpoint configuration, but we’ll stick to an endpoint interface. At the other end, the interface can be either an Avalon streaming (Avalon-ST) or Avalon memory mapped (Avalon-MM) interface, and we need the latter. These are the two main interfaces of the IP, but there’s a lot that needs to be done to get them configured as required.

For the purposes of this case study, it doesn’t really matter which Cyclone V device we use as the IP is the same in all device variants that support it. But we will choose, arbitrarily, a 5CGXFC7D6F31C7 device. We can tell this device has PCIe Hard IP as the character after 5CGX is ‘F’ instead of a ‘B’. We will need this later when we generate the IP simulation model.

EDA Tools

Another reason for choosing the Altera flow (and this applies to Xilinx as well) is that, despite being commercial and licenced EDA packages, there are versions available that are free to download which have everything that’s required to generate and use the PCIe hard IP. Many people reading this article are likely to be using commercial EDA tools in their work environments, and this case study covers using such an EDA flow, even if using free versions.

The Altera tool chain is the Quartus Prime tool. There is a licence-free Lite version that only supports the low-end devices, such as Cyclone, and is missing some advanced features such as partial reconfiguration and formal verification, but these restrictions need not concern us. Optionally supplied with Quartus is a free edition of the Siemens Questa logic simulator, as the Altera Starter FPGA Edition. This does require a licence, but it is available at no cost, though it must be renewed each year. Registering an account is required, which can be done via the FPGA Licensing Support Center by clicking the “sign in“ icon in the top bar. The EDA tools can be downloaded from the FPGA Software Download Center, where you can choose between Windows and Linux operating systems. When running the installer, make sure that, under Questa-Altera FPGA and Starter Editions, the Starter Edition box is ticked. As a minimum, in the Devices section, Cyclone V device support must be ticked. To save disk space you may untick any devices you don’t want and any add-ons (they can be installed at a later date if required).

Generation of Simulation Model

Before we start on this topic it should be noted that the pcieVHost repository has pre-compiled Questa libraries for the Altera PCIe IP simulation model, and so the simulation can be run without compiling the Altera source code (the default situation). This step is documented here for completeness, to go through the process. If it is desired to generate and compile the Altera code, removing or moving the vhdl/testaltpcie/libraries directory will cause the source code to be generated and compiled when ‘make’ is next run.

Before we can generate the code for the PCIe EP we need to decide what we want, and so a refined specification from our original goals is required. A stated goal was to keep things to a minimum, and so we want the simplest valid specification:

  • PCIe GEN 1 specification
  • An Endpoint
  • Completer only (no ability to generate requests)
  • Single Lane PCIe link
  • PIPE interface (no PHY level modelling)
  • A single configured Avalon memory mapped master interface
  • 32-bit address
  • 64-bit data

Some things to note here. Altera provide an Avalon-ST or Avalon-MM interface option, and we will use the latter for this exercise so we can do basic reads and writes to addresses in memory space. The Avalon interface has a 32-bit address, but the choice for data width is 64 or 128 bits. We will choose the 64-bit data width and do something about this in the logic to give a 32-bit data bus in the design.

Platform Designer and IP Generation

The Quartus tool suite has a graphical sub-program called Platform Designer, which is a GUI for putting together, configuring and connecting blocks of design, both for the supplied IP and for user designs. In this article it would take too long to put together a system completely in Platform Designer, but we will use it to gain access to the PCIe IP and generate the source code. You can gain access to Platform Designer by firing up Quartus and selecting Tools->Platform Designer. You can also fire up Platform Designer from a terminal with the qsys-edit command. From Quartus there is also an IP Catalog window and, under the Interfaces section, there is a PCI Express entry with “Cyclone Hard IP for PCI Express” as an option. However, I could only get access to the Avalon streaming version of the IP and not the memory mapped version. This isn’t the case with Platform Designer, hence using this method.

When Platform Designer is running there is a single clock block and an IP Catalog box at the top left. There are various selections under Library, including Interface Protocols which is where a PCI Express tab is found and this has two selections to choose from for the Cyclone V devices. We want the Avalon-MM Cyclone V Hard IP for PCI Express, and double clicking this adds the block to the System Contents window and opens up a new configuration window, which we need to configure. Firstly, we want the basic settings for both the PCIe interface and the mappings to the Avalon interface.

[Figure: Platform Designer basic settings for the PCIe IP and Avalon interface mappings]

The above diagram shows that a single lane is selected (changed from a default of 4), Gen1 is selected and it is configured as a native endpoint. It doesn’t really matter, but the RX Buffer credit allocation is changed from Low to Balanced. This is a trade-off between performance and resource usage. The reference clock value defaults to 100MHz but can be selected to be 125MHz. 125MHz is going to be the frequency of the Avalon interface whichever reference clock is being used, so my thinking was that using 125MHz is likely to take less simulation time in generating the interface clock from it. Of course, in a real design there are other factors that may affect this choice. A check box to use a half rate interface (application) clock is available if the user logic needs a slower clock, but we will not need this. We could also select being able to configure the FPGA logic over the PCIe interface if we want, but we will skip this option.

The base address registers (BARs), in the Configuration Space PCI registers, define the sections of relocatable mapped memory blocks a device can have, up to 6. For each BAR we configure there will be a separate Avalon interface. All the BARs are disabled by default, but at least one must be configured and we will configure just BAR0. We only have to choose between 64-bit prefetchable memory and 32-bit non-prefetchable memory, and we configure for the simpler 32-bit option. All the other BARs are left at their default Disabled configuration. After the BARs we have the Device Identification Registers.

[Figure: BAR settings and Device Identification Register settings]

For the test bench that’s being constructed, these could actually be left at their default values but, so we can verify that we can read these registers in the tests, the values as shown above have been set, with the Vendor ID set for Quadrics (see here) and a class code set for a network controller (0x0002), sub-class “other” (0x80) and programming interface “network controller” (0x01). The next section to configure is the PCI Express/PCI capabilities:

[Figure: PCI Express/PCI capabilities settings, Device tab]

This section has multiple tabs, but we only need to change a few from the defaults. The Device tab shown above uses the default values, with the smallest setting for a maximum payload size of 128 bytes. The Completion timeout range is not relevant for devices that don’t issue requests, and the Implement completion timeout disable is mandatory for endpoints at GEN2 and higher, so we will leave it even for GEN1. All the other tabs are left at default settings except the Link tab, where the Slot clock configuration check-box is unticked. It doesn’t really matter, and this just says it is using its own reference clock and not one supplied from the root complex. The next section configures the Avalon-MM interface:

[Figure: Avalon-MM system settings]

This is largely left at default values, but we change the Peripheral mode to Completer-Only and uncheck the Control Register access port, which is an additional Avalon-MM slave port to allow configuration space register updates via memory reads and writes. In the continued goal of minimalist configuration, none of the other optional features are selected. The final section to configure is the Avalon to PCIe Address Translation Settings.

[Figure: Avalon to PCIe Address Translation Settings]

A minimalist approach is taken here as well, with the number of (consecutive) pages left at two, and their size set to the smallest value of 4Kbytes.

Simulation Source Generation

With the configuration done and the Finish button pressed the main System Contents window will show the added PCIe block. There will also be an error saying the block needs a clock. For our purposes I don’t know if it’s important, but I connected the refclk pin to the clk output of the clock source block. The Platform Designer should now look like the diagram below.

[Figure: Platform Designer System Contents window with the PCIe block added and refclk connected]

Generating HDL is now just a matter of clicking the Generate HDL… button. This brings up a window to select the type of output (Verilog or VHDL) for synthesis and for simulation, where you want it saved and the name you want to give it (e.g. pcie1_ep_avmm).

The simulation model output is what we’re interested in, and it doesn’t really matter if it’s Verilog or VHDL. The top level of the IP will be Verilog anyway, and just a file instantiating the model and a component definition is provided when selecting VHDL. We don’t really need this, and I came across issues where the types of the component generics were different from those of the underlying module, such as a string where the module had an integer for the corresponding parameter. This was also true if the instantiating module was Verilog; it didn’t give compilation issues, though it did appear to cause issues when simulated. Therefore, I bypassed these optional files and created a wrapper for the Altera IP in SystemVerilog, and it is this that will be instantiated in the VHDL environment.

Wrapping the Altera IP

The top level module for the Altera IP (altpcie_cv_hip_avmm_hwtcl) has 257 parameters and 359 ports. That’s quite intimidating, to say the least. For this exercise and the configuration that was set up for the IP, most of these parameters can be left at their default settings and most of the ports will be unused. It is prudent to abstract away the details of the parameter settings and to expose only the signals that we are going to need in the test bench. The wrapper will be in SystemVerilog and will require a much, much less complicated component definition to instantiate in VHDL than the Altera IP directly. This wrapper module is called pcie1epavmm, and the source code can be found in the vhdl/testaltpcie/svlog/pcie1epavmm.sv file.
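To give a flavour of the simplification, below is a minimal sketch of the sort of interface such a wrapper exposes. This is illustrative only, not the actual repository source: only the signal groups discussed later in this article are shown, and the port names (beyond those named in the text, such as DISABLE_INT_RESET_SEQ) are assumptions.

```systemverilog
// Illustrative sketch of a simplified wrapper interface. Port names are
// assumptions for this example; see svlog/pcie1epavmm.sv for the real code.
module pcie1epavmm
#(parameter DISABLE_INT_RESET_SEQ = 0)      // allow external PIPE reset driving
(
  // Clocks and reset
  input         refclk,                     // 125MHz reference clock
  input         nreset,                     // combined reset for npor/pin_perst
  output        coreclkout,                 // application/Avalon clock (125MHz)

  // PIPE interface for lane 0 (simulation-only PIPE mode, GEN1 pclk = 250MHz)
  input         pclk,
  input   [7:0] rxdata0,                    // received byte for lane 0
  input         rxdatak0,                   // received K (control) flag
  input         rxvalid0,
  output  [7:0] txdata0,                    // transmit byte for lane 0
  output        txdatak0,

  // Simplified 32-bit Avalon-MM master for BAR0 (down-sized from 64 bits)
  output        avm_write,
  output        avm_read,
  output [31:0] avm_address,
  output [31:0] avm_writedata,
  output  [3:0] avm_byteenable,
  input  [31:0] avm_readdata,
  input         avm_readdatavalid,
  input         avm_waitrequest
);
  // The Altera IP instantiation, parameter overrides, 64- to 32-bit Avalon
  // mapping logic and the PHY reset sequencer all live inside the wrapper.
endmodule
```

A component declaration for a module of this size is straightforward to write in VHDL, which is the point of the exercise.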

Setting of Altera IP Parameters

As mentioned above, many of the parameters can be left at their defaults and so the wrapper does not need to set a value for all the possible 257 parameters. The parameters that are set are shown in the code fragment below:

[Code fragment: the wrapper’s parameter settings for the Altera IP instantiation]

In the above code fragment, the parameters have been grouped by function, and the comments on the right hand side give the default values where the settings are changed. There are some settings that remain at default values but are included for completeness and documentation of the group they belong to. Note the bar0_size_mask_hwtcl setting, where the default value is a string, but it is set to an integer to give the number of bits. Using a string in the code failed to set the correct configuration, and I had to reverse engineer some code to find what was required. This is an example of where the auto-generated code did not entirely match the required parameter type.

Most of these parameters should be familiar from the settings made in Platform Designer. Only BAR0 parameters are set, mostly using the default values, except for the aforementioned mask setting to give 4Kbytes.

The VC0 parameters weren’t set directly in Platform Designer, but by the minimum/low/balanced selection, though these can be changed here to suit specific requirements.

The single_rx_detect parameter specifies the number of receiver detect blocks required and should match the number of lanes. Leaving this at the default of 0 or specifying more than the active lanes seems to cause issues. The last two parameters simply reflect the address translation settings from the Platform Designer configuration. All the other 230 parameters remain at their default settings and thus the configuration has been narrowed down to just a handful of required changes. A similar exercise can be done for the ports.
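As a heavily hedged illustration of the shape of such overrides: bar0_size_mask_hwtcl and single_rx_detect are real parameters named above, but the other parameter names below are invented stand-ins for this sketch, and the generated Altera source should be consulted for the actual names and defaults.

```systemverilog
// Illustrative fragment only. Real parameters from the text:
//   bar0_size_mask_hwtcl - set as an integer bit count, not the default string
//   single_rx_detect     - must match the number of active lanes
// The *_example_hwtcl names are invented and do not exist in the Altera IP.
altera_pcie_hip
#(
  .lanes_example_hwtcl  (1),    // invented name: single lane link
  .gen_example_hwtcl    (1),    // invented name: GEN1 operation
  .bar0_size_mask_hwtcl (12),   // 4Kbyte BAR0, given as a number of bits
  .single_rx_detect     (1)     // one receiver detect block for one lane
) pcie_ep_i ( /* port connections as discussed in the next section */ );
```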

Extraction of Required Signals

Of the 359 ports of the Altera IP module, most are duplicates for each of the possible different lanes in the PCIe link or for Avalon interfaces for each of the six BARs. Since we have one lane and one Avalon interface, we can leave most of these unconnected—with a few exceptions which I’ll return to. Apart from the two main interfaces, there are just a few additional ports for clocks and reset, and some simulation model control and status outputs. The code fragment below shows the port connections used:

[Code fragment: the wrapper’s port connections for the Altera IP instantiation]

The reset inputs perform almost the same function, but the npor input can also have any local reset signalling combined with the signal driven into pin_perst. For this exercise, we can use the same reset signal for both. For the clock there is the reference clock input, which is configured for 125MHz operation in this case, and a coreclkout clock that the application logic must use for connection to the Avalon bus, derived from the reference clock and also 125MHz.

The next block of signals is for simulation only and is not present for synthesis. These are provided by Altera to speed up simulation by not modelling the PHY layer. As the objective here is not to test the Altera PCIe implementation, but to set up a test environment that connects to it, we can take advantage of this facility. By setting the simu_pipe_mode input to 1’b1, the interface will be the standardised PIPE interface (PHY Interface for PCI Express) between MAC and PHY logic. The pcieVHost model supports the PIPE interface data signals, but also supports 8b10b encoded output, and has serialiser wrappers for bitstreams on each lane, so it is possible to use the model with the IP configured in full serial mode. If using the PIPE mode, a PIPE clock input is required and, for GEN1, this is 250MHz. A couple of useful outputs are also provided to give the current rate the IP is in (GEN1, GEN2 etc.) and the LTSSM state. For the LTSSM, Questa can be configured with a radix to decode this output by adding a radix definition to a .do file as part of the waveform setup; a wave.do file with these settings can be found in the vhdl/testaltpcie directory.

[Code fragment: Questa radix definition for the LTSSM state output, as found in the wave.do file]
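As a sketch of what such a radix definition looks like (the state encodings below are illustrative examples only; the actual values should be taken from the wave.do file in the repository or the Cyclone V Hard IP user guide):

```tcl
# Illustrative only: the Questa 'radix define' command maps values on the
# ltssm output to LTSSM state names for waveform display. The encodings
# shown are examples, not a complete or verified list.
radix define ltssm_states {
    5'h00 "Detect.Quiet",
    5'h01 "Detect.Active",
    5'h02 "Polling.Active",
    5'h03 "Polling.Compliance",
    5'h04 "Polling.Configuration",
    5'h0F "L0"
} -default hex
```

Once defined, the radix can be applied to the signal in the wave window, making the link training progress immediately readable.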

The next set of signals are the inputs to the PIPE interface. The phystatus0, rxelecidle0 and rxstatus0 signals are involved in resetting the PHY (more shortly), and these are followed by the received 8-bit data and K bits and a valid input, set when the receiver is stable and receiving code inputs for the lane.

The PIPE output signals give a set of status signals which I’ve routed to the wrapper module’s ports. I won’t go through each of these as they are documented in Part 1 of my PCIe Primer. The outputs also include the 8-bit transmit data and K signals for sending upwards on the PCIe link.

The final signals are for the Avalon Memory Mapped master interface associated with BAR0. Since the data is 64 bits wide, the wrapper contains some simple logic to map this to 32-bit wide data and (for writes) 4-bit byte enables. This is designed to allow single data transactions of a word, half-word or byte only. More sophisticated logic is needed for larger transactions, or the wider interface can be connected directly to the application logic. For the minimalist ethos taken here, it’s envisaged that this BAR interface is an example for reading and writing 32-bit registers. If a data channel is needed, then a new BAR can be configured for larger transfers, in which case a wider data bus is probably needed to keep logic clock frequencies down.
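As a sketch of the down-sizing idea (signal names here are assumptions, not the actual wrapper code), selecting the active half of the 64-bit bus from the byte enables might look like this:

```systemverilog
// Illustrative 64- to 32-bit Avalon-MM down-sizing for single word,
// half-word or byte transactions. Signal names are assumptions; see
// svlog/pcie1epavmm.sv for the actual wrapper logic.
logic [63:0] rxm_writedata;      // 64-bit write data from the Altera IP
logic  [7:0] rxm_byteenable;     // 8-bit byte enables from the IP
logic [31:0] avm_writedata;      // 32-bit data to the application
logic  [3:0] avm_byteenable;     // 4-bit byte enables to the application

// A valid single transaction only ever enables bytes in one half of the
// 64-bit bus, so the byte enables select which half to forward.
always_comb begin
  if (rxm_byteenable[3:0] != 4'h0) begin
    avm_writedata  = rxm_writedata[31:0];    // lower word active
    avm_byteenable = rxm_byteenable[3:0];
  end else begin
    avm_writedata  = rxm_writedata[63:32];   // upper word active
    avm_byteenable = rxm_byteenable[7:4];
  end
end
```

Read data would be mapped back in the opposite direction with similar half-selection logic.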

In addition to the above-mentioned ports, some of the inputs to the unused Avalon bus ports for the other BARs had to be tied off to 0, to avoid X propagation to the used BAR0 Avalon interface port. In particular, the RxmWaitRequest_<num>_i and RxmReadDataValid_<num>_i inputs. This was not true for the unused PIPE inputs.

PHY Reset Sequencer

In the Cyclone V Hard IP for PCI Express User Guide, the required RX transceiver reset sequence is detailed, and the timing diagram from this document is shown below.

[Figure: RX transceiver reset sequence timing diagram, from the Cyclone V Hard IP for PCI Express User Guide]

The wrapper logic contains a simple open-loop sequencer to perform this reset signalling. The sequencer drives the phystatus0, rxelecidle0 and rxstatus0 signals as shown, with the key settings to assert phystatus0 for a cycle with rxstatus0 at 3, then de-assert with rxstatus0 at 0. The sequencer drives these PIPE inputs directly to the Altera IP module, disconnecting them from the wrapper’s ports. However, a parameter (DISABLE_INT_RESET_SEQ) is provided to disable this internal sequencer and allow the PIPE signals to be driven externally.
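A minimal sketch of such an open-loop sequence follows. The counter values are placeholders and the signal timings illustrative; the real delays and ordering follow the user guide’s timing diagram and the actual wrapper source.

```systemverilog
// Illustrative open-loop PHY reset sequencing after power-up. Key events:
// assert phystatus0 for one cycle with rxstatus0 at 3, then de-assert
// with rxstatus0 at 0. Count values are placeholders for this sketch.
logic       phystatus0  = 1'b0;
logic       rxelecidle0 = 1'b1;   // receiver starts in electrical idle
logic [2:0] rxstatus0   = 3'd0;
logic [7:0] count       = 8'h00;

always @(posedge pclk) begin
  if (count != 8'hff)
    count <= count + 8'h01;       // free-running until the sequence completes

  case (count)
    8'h08: rxelecidle0 <= 1'b0;                             // leave electrical idle
    8'h10: begin phystatus0 <= 1'b1; rxstatus0 <= 3'd3; end // assert, rxstatus0 = 3
    8'h11: begin phystatus0 <= 1'b0; rxstatus0 <= 3'd0; end // de-assert, rxstatus0 = 0
    default: ;                                              // hold state otherwise
  endcase
end
```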

Compiling the IP Source

As mentioned earlier, the test bench supplies pre-compiled libraries for the Altera PCIe model, but it also supplies a pre-configured Platform Designer .qsys file from which the Altera PCIe IP can be generated. As part of the output from this, a TCL script to compile the source is also produced. As well as firing up the Platform Designer GUI and hitting the “Generate HDL…” button, generation can be done from a terminal command, qsys-generate, with just a few arguments. For example, from the vhdl/testaltpcie directory:

qsys-generate --simulation=VERILOG             \
              --output-directory=pcie1_ep_avmm \
              ./qsys/pcie1_ep_avmm.qsys

This will generate a set of files in a folder pcie1_ep_avmm. Under this directory is a sub-directory called simulation. The source code for the model is in the submodules directory, and there is another sub-directory called mentor that contains a TCL script to compile the source code, msim_setup.tcl. To use this script to compile the code, again from the vhdl/testaltpcie directory, the following command can be used:

vsim -c -do "set QSYS_SIMDIR [pwd]/pcie1_ep_avmm/simulation;        \
             source pcie1_ep_avmm/simulation/mentor/msim_setup.tcl; \
             dev_com;                                               \
             com;                                                   \
             quit"

After this command is run, a libraries directory will be created/updated containing the pcie_cv_hip_avmm_0 library that has the compiled code. When running the simulation, we can map this library location to a logical name using vmap:

vmap pcie_cv_hip_avmm_0 libraries/pcie_cv_hip_avmm_0        

This mapping should be done whether compiling from scratch or using the pre-compiled libraries. The logical name can then be used when compiling the wrapper source code, and during elaboration when running the simulation, to find the Altera PCIe model library. The wrapper SystemVerilog, then, is compiled as:

vlog -sv -L pcie_cv_hip_avmm_0 svlog/pcie1epavmm.sv -work work

From this point we can compile all the other HDL source for the test bench and compile the C and C++ for the pcieVHost and VProc. This is all conveniently gathered up into a makefile for compiling with make, including the steps above to compile the Altera PCIe library from scratch or using the pre-compiled libraries.

Conclusions

In this first of a pair of articles, to start, a PCIe IP implementation was chosen after selection of an FPGA vendor, a target FPGA device, and the ‘flavour’ of PCIe interface on offer for that device. From this point, a step-by-step process was given for using the vendor’s EDA tools to configure the PCIe block to the stated requirements and to generate source code suitable for simulation.

The top level IP block was found to have a large parameter count and an even larger port count, but with most of these unchanged from defaults or able to be left unconnected. So, a wrapper module was constructed to instantiate the IP, configuring only the parameters that required setting, and routing to external ports only the signals that were required. A sequencer process was also added to perform receiver reset after power on. This abstraction made instantiating the PCIe block much simpler, especially as it was to be instantiated in VHDL as a mixed language component.

Once this wrapper logic was constructed, the steps were explored to use the provided TCL script to compile the Altera PCIe IP for the Siemens Questa logic simulator bundled with the Altera Quartus tools. The next step was given for generating a logical library mapping from the resultant compiled library, and the final step of compiling the wrapper SystemVerilog source code.

At this point, then, we have everything we need to use this IP in a pcieVHost based logic simulation environment. This is the subject of the next article, where a test bench is constructed around the pcieVHost VIP, along with mem_model as a stand in for application logic to be developed using the Altera PCIe IP.

Further Reading

Below are gathered all the links made in this article for ease of reference.

Co-simulation VIP References

Articles

External References

