STUART F. OBERMAN
stuart@oberman.net
www.oberman.net
EDUCATION
- PhD Electrical Engineering, Stanford University, Stanford, California, December, 1996
Dissertation: Design Issues in High Performance Floating Point Arithmetic Units 
- MS Electrical Engineering, Stanford University, Stanford, California, March, 1994
- BSE with Honors and with High Distinction, Electrical Engineering, University of Iowa, Iowa City, Iowa, 1992
 
EMPLOYMENT
October 2002 – Present
Principal Engineer, NVIDIA, Santa Clara, CA.
September 1999 – October 2002
Manager, VLSI Design, Nishan Systems, San Jose, CA.
- Architect and Team Leader of 64G switch fabric chipset; Led team of 5 engineers in design and verification of switch fabric; Wrote architectural specification for 32G and 64G versions of the chipset; Coded several RTL modules; Participated in chipset verification; Implemented physical design of both chipsets in Virtex-II FPGAs, including all logic synthesis, floorplanning, I/O selection and placement, and individual module placement and routing; Successfully validated 32G chipset in multiprotocol storage switch platform, including signal integrity analysis and full chip-level and system-level diagnostics.
- Architect and Team Leader of Traffic Managers for high density storage / LAN switch; Led team of 10 engineers in design and verification of traffic manager ASICs and FPGAs; Co-wrote architectural specification for entire chassis switch; Wrote micro-architectural specification for 10G Traffic Manager ASIC; Wrote micro-architectural specification for 4G Traffic Manager FPGA; Designed new algorithms for packet scheduling, congestion avoidance, and buffer management; Coded several RTL modules for 4G Traffic Manager; Led physical design of 4G Traffic Manager; Evaluated many 3rd party high density switch fabric chipsets supporting 10G and 40G line cards; Evaluated many 3rd party 10G and 40G network processor chips.
- Lead designer of switch engine ASIC for medium density storage / LAN switch; Wrote micro-architectural specification; Coded several RTL modules; Participated in physical design; Led verification team.
October 1999 – April 2000  
Architecture Consultant, SiByte / Broadcom, Santa Clara, CA.
Consulted on the design and implementation of a system-on-a-chip targeted to set-top boxes and networking applications. 
November 1995 - September 1999 
Senior Member of the Technical Staff, 
AMD, Sunnyvale, CA.
- Architect of multimedia unit; developed 2D, 3D, and video algorithms and hardware for integration into next-generation microprocessor
- Architect of the Athlon/K7 floating point unit; performed algorithm, RTL and logic design for the Athlon FPU
- Architect of the AMD 3DNow! instruction set; co-developed set of single-precision vector FP instructions to enhance 3D graphics and audio; performed algorithm, RTL, and logic design for the K6-2 implementation of these instructions
- Logic designer; improved the algorithms, logic and circuits of the K6 FPU
November 1996 – August 2000  
Consulting Assistant Professor, 
Electrical Engineering Department, Stanford University 
Performed research in algorithms and implementations for high-performance floating-point arithmetic and computer architecture in the Stanford Architecture and Arithmetic Group. 
September 1992 - December 1996 
Research Assistant, 
Electrical Engineering Department, Stanford University 
Performed research in algorithms and implementations for high-performance computer architecture and floating-point arithmetic in the Stanford Architecture and Arithmetic Group under the direction of Professor Michael J. Flynn. 
November 1995 - January 1996  
Floating-Point Consultant, 
Toshiba America, San Jose, CA.
Consulted on the design and implementation of a next-generation high performance floating-point unit. 
April - July 1995  
Logic Design Consultant, Integrated Information Technology (8x8, Inc), Santa Clara, CA.
Performed logic design, control logic synthesis, and critical path design for a high-performance microprocessor. 
April - October 1994  
Floating-Point Consultant, Vertex Semiconductor, San Jose, CA.
Performed logic design, control logic synthesis, and critical path design for a high-performance floating-point adder and multiplier. 
Summer 1993  
Research Intern, DEC Western Research Laboratory, Palo Alto, CA
Designed and implemented a time and event-driven inline simulator for a low power, low cost Alpha processor. Analyzed system performance and power for various datapath and memory system configurations. 
ENGINEERING SERVICE
- Program Committee Member, ARITH-16 IEEE Symposium on Computer Arithmetic, 2003; ARITH-15 IEEE Symposium on Computer Arithmetic, 2001; ARITH-14 IEEE Symposium on Computer Arithmetic, 1999; SCAN-98 International Symposium on Computer Arithmetic and Validated Numerics, 1998
- Reviewer, IEEE Transactions on Computers; IEEE Transactions on VLSI Systems; IEEE Journal of Solid-State Circuits; International Symposium on Computer Arithmetic; International Symposium on Computer Architecture; International Conference on Architectural Support for Programming Languages and Operating Systems; International Symposium on Microarchitecture
PATENTS
- 5,918,062 Microprocessor including an efficient implementation of an accumulate instruction 
- 6,026,483 Method and apparatus for simultaneously performing arithmetic on two or more pairs of operands 
- 6,029,244 Microprocessor including an efficient implementation of extreme value instructions 
- 6,038,583 Method and apparatus for simultaneously multiplying two or more independent pairs of operands and calculating a rounded product 
- 6,085,208 Leading one prediction unit for normalizing close path subtraction results within a floating point unit
- 6,085,212 Efficient method for performing close path subtraction in a floating point arithmetic unit
- 6,085,213 Method and apparatus for simultaneously multiplying two or more independent pairs of operands and summing the products
- 6,088,715 Close path selection unit for performing effective subtraction within a floating point arithmetic unit
- 6,094,668 Floating point arithmetic unit including an efficient close data path
- 6,115,732 Method and apparatus for compressing intermediate products
- 6,115,733 Method and apparatus for calculating reciprocals and reciprocal square roots
- 6,131,104 Floating point addition pipeline configured to perform floating point-to-integer and integer-to-floating point conversion operations
- 6,134,574 Method and apparatus for achieving higher frequencies of exactly rounded results
- 6,144,980 Method and apparatus for performing multiple types of multiplication including signed and unsigned multiplication
- 6,175,911 Method and apparatus for concurrently executing multiplication and iterative operations
- 6,223,192 Bipartite look-up table with output values having minimized absolute error 
- 6,223,198 Method and apparatus for multi-function arithmetic 
- 6,256,653 Multi-function bipartite look-up table
- 6,269,384 Method and apparatus for rounding and normalizing results within a multiplier
- 6,298,367 Floating point addition pipeline including extreme value, comparison and accumulate functions
- 6,370,637 Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria
- 6,374,345 Apparatus and method for handling tiny numbers using a super sticky bit in a microprocessor 
- 6,381,625 Method and apparatus for calculating a power of an operand
- 6,393,554 Method and apparatus for performing vector and scalar multiplication and calculating rounded products
- 6,393,555 Rapid execution of FCMOV following FCOMI by storing comparison result in temporary register in floating point unit
- 6,397,238 Method and apparatus for rounding in a multiplier
- 6,397,239 Floating point addition pipeline including extreme value, comparison and accumulate functions
- 6,408,379 Apparatus and method for executing floating-point store instructions in a microprocessor
- 6,425,074 Method and apparatus for rapid execution of FCOM and FSTSW
- Several patents filed in the areas of packet scheduling, traffic management, and switch fabrics for storage and IP switches
 
PUBLICATIONS
Book
Journals
Reducing the Mean Latency of Floating Point Addition by Stuart F. Oberman and Michael J. Flynn, Theoretical Computer Science, vol. 196, no. 1, pages 201-214, April 1998. 
Minimizing the Complexity of SRT Tables by Stuart F. Oberman and Michael J. Flynn, IEEE Transactions on VLSI Systems, vol. 6, no. 1, pages 141-149, March 1998. 
Division Algorithms and Implementations by Stuart F. Oberman and Michael J. Flynn, IEEE Transactions on Computers, vol. 46, no. 8, pages 833-854, August 1997.
Advances in High Performance Floating Point Unit Design by Stuart F. Oberman, Hesham Al-Twaijry, and Michael J. Flynn, in Proceedings of the 15th IMACS World Congress on Scientific Computation, Modelling, and Applied Mathematics, August 1997. 
Design Issues in Division and Other Floating-Point Operations by Stuart F. Oberman and Michael J. Flynn, IEEE Transactions on Computers, vol. 46, no. 2, pages 154-161, February 1997. 
Reducing Division Latency with Reciprocal Caches by Stuart F. Oberman and Michael J. Flynn, Reliable Computing, vol. 2, no. 2, pages 147-153, April 1996.
Conferences
Tyrant: A High Performance Storage over IP Switch Engine by Stuart Oberman, Kamran Malik, Rodney Mullendore, Anil Mehta, Keith Schakel, and Michael Ogrinc, in Proceedings of Hot Chips 13, August 2001.
Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor by Stuart F. Oberman, in Proceedings of the 14th IEEE Symposium on Computer Arithmetic, pages 106-115, April 1999. 
AMD 3DNow! Technology: Architecture and Implementations by Stuart Oberman, Greg Favor, and Fred Weber, in IEEE Micro, vol. 19, no. 2, pages 37-48, March 1999. 
An Out-of-Order Three-Way Superscalar Floating Point and Multimedia Processor by A. Scherer, M. Golden, N. Juffa, S. Meier, S. Oberman, H. Partovi, and F. Weber, in Digest of Technical Papers, IEEE International Solid-State Circuits Conference, February 1999. 
AMD 3DNow! Technology and the K6-2 Microprocessor by Stuart Oberman, Fred Weber, Norbert Juffa, and Greg Favor, in Proceedings of Hot Chips 10, pages 245-254, August 1998. 
A 0.25um x86 Microprocessor with a 100MHz Socket 7 Interface by R. Khanna, A. Ben-Meir, L. DiGregorio, D. Draper, R. Krishna, R. Maley, A. Mehta, S. Oberman, L. Tsai, and T. Williams, in Digest of Technical Papers, IEEE International Solid-State Circuits Conference, February 1998.
The SNAP Project: Design of Floating Point Arithmetic Units by Stuart F. Oberman, Hesham Al-Twaijry, and Michael J. Flynn, in Proceedings of the 13th IEEE Symposium on Computer Arithmetic, pages 156-165, July 1997. 
SRT Division Architectures and Implementations by David L. Harris, Stuart F. Oberman, and Mark A. Horowitz, in Proceedings of the 13th IEEE Symposium on Computer Arithmetic, pages 18-25, July 1997.
A Variable Latency Pipelined Floating-Point Adder by Stuart F. Oberman and Michael J. Flynn, Proceedings of Euro-Par'96, Springer LNCS vol. 1124, pages 183-192, August 1996.
The SNAP Project: Towards Sub-Nanosecond Arithmetic by M. J. Flynn, S. Oberman, S. Fu, H. Altwaijry, K. Nowka, G. Bewick, E. Schwarz, and N. Quach, presented at the NSF/MIPS Conference on Experimental Research on Computer Systems, June 1996.
Implementing Division and Other Floating-Point Operations: A System Perspective by Stuart F. Oberman and Michael J. Flynn, in Proceedings of SCAN-95, International Symposium on Scientific Computing, Computer Arithmetic, and Validated Numerics, pages 18-24, September 1995.
Technical Reports
Fast IEEE Rounding for Division by Functional Iteration by Stuart F. Oberman and Michael J. Flynn, Technical Report No. CSL-TR-96-700, Computer Systems Laboratory, Stanford University, July 1996. 
A Variable Latency Pipelined Floating-Point Adder by Stuart F. Oberman and Michael J. Flynn, Technical Report No. CSL-TR-96-689, Computer Systems Laboratory, Stanford University, February 1996. 
Measuring the Complexity of SRT Tables by Stuart F. Oberman and Michael J. Flynn, Technical Report No. CSL-TR-95-679, Computer Systems Laboratory, Stanford University, November 1995. 
An Analysis of Division Algorithms and Implementations by Stuart F. Oberman and Michael J. Flynn, Technical Report No. CSL-TR-95-675, Computer Systems Laboratory, Stanford University, July 1995. 
On Division and Reciprocal Caches by Stuart F. Oberman and Michael J. Flynn, Technical Report No. CSL-TR-95-666, Computer Systems Laboratory, Stanford University, April 1995. 
Design Issues in Floating-Point Division by Stuart F. Oberman and Michael J. Flynn, Technical Report No. CSL-TR-94-647, Computer Systems Laboratory, Stanford University, December 1994. 
The Design and Implementation of a High-Performance Floating-Point Divider by Stuart Oberman, Nhon Quach, and Michael Flynn, Technical Report No. CSL-TR-94-599, Computer Systems Laboratory, Stanford University, January 1994.