Monday, August 22, 2011

Soft RoCE, an alternative to Soft iWarp

Introduction

The Soft RoCE distribution is available as a specially patched OFED-1.5.2 distribution, known as OFED-1.5.2-rxe. Users familiar with installing and configuring OFED software will find it easy to use. It is supported by System Fabric Works. Please refer to the official Soft RoCE website for further details.

Features: 

Provide Infiniband-like performance and efficiency to ubiquitous Ethernet infrastructure.
  • Utilize the same transport and network layers from the IB stack and swap the link layer for Ethernet.
  • Implement IB verbs over Ethernet.
  • Not quite IB strength, but it’s getting close.
  • As of OFED 1.5.1, code written for OFED RDMA auto-magically works with RoCE.
Performance:
(from "Implementation & Comparison of RDMA over Ethernet")


  • RoCE is capable of providing near-Infiniband QDR performance for:
    • Latency-critical applications at message sizes from 128B to 8KB.
    • Bandwidth-intensive applications for messages <1KB.
  • Soft RoCE is comparable to hardware RoCE at message sizes above 65KB.
  • Soft RoCE can improve performance where RoCE-enabled hardware is unavailable.

Installation 

The Soft RoCE distribution contains the entire OFED-1.5.2 distribution, with the addition of the Soft RoCE code.

Download link : http://www.systemfabricworks.com/downloads/roce

Installation of the OFED-1.5.2-rxe distribution works exactly the same as a “standard” OFED distribution.  Installation can be accomplished interactively, via the “install.pl” program, or automatically via “install.pl -c ofed.conf”.  The required new components are “librxe” and “ofa-kernel”. The latter is not new, but in our "rxe" version of the OFED distribution it includes the rxe/Soft RoCE kernel module.
 

Usage

After installing OFED-1.5.2-rxe, you can use the rxe_cfg command to configure Soft-RoCE. Below are a few of the most useful commands.
# rxe_cfg -h
Usage:
rxe_cfg [options] start|stop|status|persistent|devinfo
rxe_cfg debug on|off| (Must be compiled in for this to work)
rxe_cfg crc enable|disable
rxe_cfg mtu [rxe0]  (set ethernet mtu for one or all rxe transports)
rxe_cfg [-n] add eth0
rxe_cfg [-n] remove rxe1|eth2
Options:
 -n: do not make the configuration action persistent
 -v: print additional debug output
 -l: in status display, show only interfaces with link up
 -h: print this usage information
 -p 0x8916: (start command only) - use specified (non-default) eth_proto_id
1. Enable the Soft-RoCE module
rxe_cfg start
# rxe_cfg start

Name  Link  Driver   Speed   MTU   IPv4_addr        S-RoCE  RMTU
eth0  yes   bnx2             1500  198.124.220.136
eth1  yes   bnx2             1500
eth2  yes   iw_nes           9000  198.124.220.196
eth3  yes   mlx4_en  10GigE  1500  192.168.2.3
rxe eth_proto_id: 0x8915

2. Disable the Soft-RoCE module
rxe_cfg stop

3. Add an Ethernet interface to the Soft-RoCE module
rxe_cfg add [ethx]

4. Remove an Ethernet interface from the Soft-RoCE module
rxe_cfg remove [ethx|rxex] 
 
Tuning for performance:

1) MTU size
The Soft-RoCE interface supports only four MTU sizes: 512, 1024, 2048 and 4096. To maximize performance, choose 4096.
Commands: ifconfig [ethx] mtu 9000 // enable jumbo frames on the original Ethernet interface.
rxe_cfg mtu [rxex] 4096 // set the maximum MTU on the corresponding rxe interface.


Note: you also need to enable jumbo frame support on your switch.
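
As a rough illustration of the MTU choice above, the following shell sketch picks the largest of the four supported Soft-RoCE MTU sizes that fits inside a given Ethernet MTU. The function name pick_rxe_mtu and the default 80-byte header allowance are assumptions made for this example; they are not taken from the rxe_cfg documentation.

```shell
#!/bin/sh
# Hypothetical helper: choose the largest Soft-RoCE MTU (512, 1024, 2048, 4096)
# that fits in an Ethernet MTU once a header allowance is subtracted.
# The default allowance of 80 bytes is an assumption for illustration only.
pick_rxe_mtu() {
    eth_mtu=$1
    overhead=${2:-80}
    payload=$((eth_mtu - overhead))
    best=0
    # Walk the supported sizes in ascending order and keep the last one that fits.
    for size in 512 1024 2048 4096; do
        if [ "$size" -le "$payload" ]; then
            best=$size
        fi
    done
    if [ "$best" -eq 0 ]; then
        echo "Ethernet MTU $eth_mtu is too small for Soft-RoCE" >&2
        return 1
    fi
    echo "$best"
}
```

Under these assumptions, "pick_rxe_mtu 9000" selects 4096 and "pick_rxe_mtu 1500" selects 1024, which is why jumbo frames are needed to reach the largest Soft-RoCE MTU.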

2) CRC checking
To maximize performance, disable CRC checking.
Commands: rxe_cfg crc disable

3) Ethernet tx queue length
Also, set the txqueuelen parameter of the original Ethernet interface to a large value.
Commands: ifconfig [ethx] txqueuelen 10000
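
The three tuning steps can be collected in one place. The sketch below only prints the commands from this section for a given Ethernet interface and its rxe interface, so they can be reviewed before being run as root. The function name print_tuning_commands is hypothetical, and the interface names in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the tuning commands from this post.
#   $1 = Ethernet interface, $2 = matching rxe interface
print_tuning_commands() {
    if [ $# -ne 2 ]; then
        echo "usage: print_tuning_commands <ethX> <rxeX>" >&2
        return 1
    fi
    eth_if=$1
    rxe_if=$2
    echo "ifconfig $eth_if mtu 9000"          # jumbo frames on the Ethernet side
    echo "rxe_cfg mtu $rxe_if 4096"           # largest Soft-RoCE MTU
    echo "rxe_cfg crc disable"                # skip CRC checking
    echo "ifconfig $eth_if txqueuelen 10000"  # longer transmit queue
}
```

For example, "print_tuning_commands eth3 rxe0" lists the four commands for that pair of interfaces in the order they appear in this section.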

11 comments :

  1. Is there an equivalent perftest for Soft RoCE ? These tests look for an IB device and quit.

  2. In C code you can use :
    "ibv_get_device_list" to get the device list
    "ibv_query_device" to get the device info

    Or in shell if you installed the utility tools (see post on soft-iwarp) :
    ibv_devinfo
    ibv_devices

    These tools will give you info on the IB devices present.
    Then you just have to wrap that around with shell commands (grep / sed etc.) to test for whatever device you are looking for.

  3. Hi

    Correct me if I am wrong, but this means that we could use PCs available on the market with ordinary NICs to create an OFED cluster, correct?

  4. Yes, you can use SoftIwarp or SoftRoCE to create an OFED cluster. However, be warned that the performance will be much lower than with the HW version.
    It is a nice low-cost alternative for dev/testing.

  5. hi,
    how about the performance of soft RoCE over 1 gigabit ethernet? does it give better performance than 1GbE?

  6. Hi mny, I am not sure I understand your question. Do you mean softIwarp over 1 GbE vs softRoCE over 1 GbE?

    softRoCE tends to provide better performance as there are fewer software layers to go through. However, SoftIwarp is making good progress in terms of performance, and you are not limited to your local LAN: you can route the packets over the internet since it runs over TCP/IP.

  7. This comment has been removed by the author.

  8. hi,
    thanks for replying.
    I am an M.Tech student and I have chosen this topic for my thesis.
    My idea is to first measure the performance of 1 gigabit ethernet and softRoCE separately using the OMB and IMB benchmarks,
    and then compare the OMB and IMB results to see which one gives the better performance.

    I have a few doubts.
    Is it possible to do this?
    Or is there another meaning to this line: "performance evaluation of Soft RoCE over 1 gigabit ethernet"?
    If it is possible, does soft RoCE perform well over ethernet in terms of bandwidth and latency?
    I am a bit confused, as I am new to this area. Please help and guide me on whether I have chosen the right path.
    I will wait for your reply.

  9. Yes, it is possible. You will be running the MPI program/benchmark and comparing standard TCP sockets over 1GbE vs RDMA with softRoCE over 1GbE.

    Note that I would suggest running a pure network benchmark first in order to compare the performance and also identify potential limitations early.

  10. hi Benoit,
    thanks for replying. If you can share some more information about soft RoCE, it will be very helpful for me.
