Upgraded SSE (64bit audio + more), v3

Suggest new features, components or other changes to the software

Moderator: electrogear

Postby infuzion on Wed Sep 28, 2011 12:55 am

OK, it has been nearly 5 years since the last request, so I'll do this one more time, with a bit more clarity. Everyone has access to CPUs with SSE3 (including Atom), and most have SSE4 (AMD K10, Intel Core and later, but not Atom, at the time of this posting).

Why are more SSE opcodes needed?
A 64bit audio signal path is expected by some for mastering.
Some filters need 64bit audio to stay stable, because it reduces rounding errors (RBJ, scilab's s-z transform).
Better optimizations (SSE3 allows faster FFT, packed int for fast Sine).
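
A minimal sketch of the rounding-error point above, assuming the RBJ cookbook low-pass numerator term b1 = 1 - cos(w0) and a 20 Hz cutoff at 44.1 kHz (both values chosen only for illustration). The helper names are hypothetical, and single precision is emulated with the struct module rather than with real SSE registers:

```python
import math
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def lowpass_b1(freq_hz, rate_hz, single_precision):
    """b1 = 1 - cos(w0), the numerator term of the RBJ cookbook low-pass."""
    w0 = 2.0 * math.pi * freq_hz / rate_hz
    c = math.cos(w0)
    if single_precision:
        # cos(w0) is very close to 1.0 here, so storing it in 32 bits
        # throws away most of the digits that 1 - cos(w0) depends on.
        return to_f32(1.0 - to_f32(c))
    return 1.0 - c

b1_64 = lowpass_b1(20.0, 44100.0, single_precision=False)
b1_32 = lowpass_b1(20.0, 44100.0, single_precision=True)
rel_err = abs(b1_32 - b1_64) / b1_64
print(f"double: {b1_64:.6e}  single: {b1_32:.6e}  relative error: {rel_err:.2e}")
```

The single-precision coefficient ends up wrong in roughly the third significant digit, while the double-precision one is good to about ten more digits. The lower the cutoff relative to the sample rate, the worse this gets.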

What is needed to do this?
SM2.0 already has connectors for double-float (Stream, Poly, & Mono, but not Dual-Packed Mono AFAIK), plus the Stream 2 Double & Double 2 Stream primitives. Here is a priority list for adding more SSE support to SM:
  1. SSE2 & SSE3 instructions opened for the ASM primitive (see below)
  2. + 64bit/Double-Stream connectors for Assembly primitive.
  3. + Double-Stream Mono VST in/out
  4. + converters to & from Poly/Pack4, Mono, & Double-Stream Pack2 (64bit stereo)
  5. + Double-Stream input to primitives (starting with Code, Wave Reader, Wave Table Read)
  6. allow Float Array to handle Double-Stream floats, including Preset Parameter Array
Commands needed for Double-Stream audio:
addpd //Packed Double-FP Add
andnpd //Bit-Wise Logical And Not (for Double-FP)
andpd //Bit-Wise Logical And (for Double-FP)
cmppd //Packed Double-FP Compare
cvtdq2pd //Packed Signed INT32 to Packed Double-FP Conversion
cvtpd2pi //Packed Double-FP to Packed INT32 Conversion
cvtpd2ps //Packed Double-FP to Packed Single-FP Conversion
cvtpi2pd //Packed Signed INT32 to Packed Double-FP Conversion
cvtps2pd //Packed Single-FP to Packed Double-FP Conversion
divpd //Packed Double-FP Divide
maxpd //Packed Double-FP Maximum
minpd //Packed Double-FP Minimum
movapd //Move Aligned Packed Double-FP
mulpd //Packed Double-FP Multiply
orpd //Bit-Wise Logical OR for Double-FP Data
shufpd //Shuffle Double-FP
sqrtpd //Packed Double-FP Square Root
subpd //Packed Double-FP Subtract
xorpd //Bit-wise Logical XOR for Double-FP Data
I/O:
s64in/out, s64intin/out, s64boolin/out,
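
A rough Python model of how the compare and bit-wise opcodes in this list are typically combined into a branch-free select, here clipping a pair of doubles to a ceiling. This is a sketch only: a register is modelled as a tuple of two 64-bit patterns (lane 0 first), the function names mirror the opcodes but are hypothetical helpers, and nothing here is SM's assembler syntax.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def pack2(lo, hi):
    """Two doubles -> tuple of two 64-bit patterns (lane 0 = low half)."""
    return tuple(struct.unpack("<Q", struct.pack("<d", v))[0] for v in (lo, hi))

def unpack2(reg):
    """Tuple of two 64-bit patterns -> two doubles."""
    return tuple(struct.unpack("<d", struct.pack("<Q", b))[0] for b in reg)

def cmpltpd(dst, src):
    """cmppd with the less-than predicate: all-ones per lane where dst < src."""
    d, s = unpack2(dst), unpack2(src)
    return tuple(MASK64 if x < y else 0 for x, y in zip(d, s))

def andpd(dst, src):
    return tuple(x & y for x, y in zip(dst, src))

def andnpd(dst, src):
    """(NOT dst) AND src, matching the operand order of the real opcode."""
    return tuple((~x & MASK64) & y for x, y in zip(dst, src))

def orpd(dst, src):
    return tuple(x | y for x, y in zip(dst, src))

def clip_to_ceiling(samples, ceiling):
    x = pack2(*samples)
    limit = pack2(ceiling, ceiling)
    mask = cmpltpd(x, limit)       # lanes still below the ceiling
    keep = andpd(mask, x)          # pass those lanes through
    replace = andnpd(mask, limit)  # the other lanes get the ceiling
    return unpack2(orpd(keep, replace))

clipped = clip_to_ceiling((0.25, 1.75), 1.0)
print(clipped)  # (0.25, 1.0)
```

For this particular job minpd alone would do; the mask pattern matters when the two alternatives are arbitrary values rather than a simple minimum.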


Extras we hope will be added too; each usually saves an opcode or 2:
movhpd //Move High Packed Double-FP
movhlps //Move High Single-FP to Low {faster AMD vs Intel}
movhps //Move High Packed Single-FP
movlpd //Move Low Packed Double-FP
movlps //Move Low Packed Single-FP
rsqrtps
unpckhpd //Unpack High Packed Double-FP Data shuffle
unpcklpd //Unpack Low Packed Double-FP Data shuffle
unpcklps //useful for merging 2 mono streams into 1
xorps
SSE3:
addsubpd - Subtracts the bottom doubles and adds the top doubles of the two operands, useful for stereo balance
addsubps - Subtracts the even-numbered singles (0 and 2) and adds the odd-numbered singles (1 and 3), useful for stereo balance
haddpd - Bottom double is the sum of the first operand's top and bottom, top double is the sum of the second operand's top and bottom.
haddps - Horizontal addition of single-precision values.
hsubpd - Horizontal subtraction of double-precision values.
hsubps - Horizontal subtraction of single-precision values.
movshdup - Duplicates the odd-numbered singles (1 and 3) into singles 0-1 and 2-3.
movsldup - Duplicates the even-numbered singles (0 and 2) into singles 0-1 and 2-3.
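
The lane order of these SSE3 opcodes is easy to get backwards, so here is a small Python model of what each one computes. It is only a sketch: registers are plain tuples with lane 0 (the bottom element) first, `dst` is the first operand and `src` the second, and the mid/side example is an illustration rather than anything SM provides.

```python
def addsubpd(dst, src):
    """Subtract in the bottom lane, add in the top lane."""
    return (dst[0] - src[0], dst[1] + src[1])

def addsubps(dst, src):
    """Subtract in lanes 0 and 2, add in lanes 1 and 3."""
    return (dst[0] - src[0], dst[1] + src[1], dst[2] - src[2], dst[3] + src[3])

def haddpd(dst, src):
    """Bottom lane: sum of dst's pair. Top lane: sum of src's pair."""
    return (dst[0] + dst[1], src[0] + src[1])

def hsubpd(dst, src):
    """Bottom lane: dst low minus dst high. Top lane: src low minus src high."""
    return (dst[0] - dst[1], src[0] - src[1])

def haddps(dst, src):
    return (dst[0] + dst[1], dst[2] + dst[3], src[0] + src[1], src[2] + src[3])

def movsldup(src):
    """Copy the even-numbered singles into each pair."""
    return (src[0], src[0], src[2], src[2])

def movshdup(src):
    """Copy the odd-numbered singles into each pair."""
    return (src[1], src[1], src[3], src[3])

# One stereo frame as a packed pair of doubles: (left, right).
frame = (0.5, 0.25)
mid = haddpd(frame, frame)    # left + right in both lanes
side = hsubpd(frame, frame)   # left - right in both lanes
print(mid, side)              # (0.75, 0.75) (0.25, 0.25)
```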

* rounding controlled by MXCSR register (bits 13 and 14)
ldmxcsr //Load MXCSR register
stmxcsr //Save MXCSR register state
(SM default of rounded off?)
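
For reference, a sketch of how the rounding-control field in bits 13 and 14 is read and changed once stmxcsr has stored the register as a 32-bit integer. The value 0x1F80 is the processor's reset default, which selects round to nearest; whether SM changes it before running a schematic is not something this thread establishes. The function names are hypothetical.

```python
ROUNDING_MODES = {
    0b00: "round to nearest (even)",
    0b01: "round down (toward -infinity)",
    0b10: "round up (toward +infinity)",
    0b11: "round toward zero (truncate)",
}

MXCSR_RESET_DEFAULT = 0x1F80  # all exceptions masked, rounding field = 00

def rounding_control(mxcsr):
    """Extract the 2-bit rounding-control field (bits 13 and 14)."""
    return (mxcsr >> 13) & 0b11

def with_rounding_control(mxcsr, rc):
    """Return a copy of mxcsr with the rounding-control field replaced."""
    return (mxcsr & ~(0b11 << 13)) | ((rc & 0b11) << 13)

print(ROUNDING_MODES[rounding_control(MXCSR_RESET_DEFAULT)])
truncating = with_rounding_control(MXCSR_RESET_DEFAULT, 0b11)
print(hex(truncating))  # 0x7f80, the value to hand to ldmxcsr for truncation
```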


I'd prefer replies to this post to focus on which opcodes & other 64bit audio improvements are needed.
No whining!
Last edited by infuzion on Sat Nov 12, 2011 2:54 am, edited 4 times in total.
Need help? First search the forum & WiKi, then post in the help forum with a clear topic, request, & OSM. Then please WiKi the correct solution. If you want my personal assistance, I charge by the hour or for an exchange of services.
infuzion
smstar
Posts: 6169
Joined: Wed May 04, 2005 8:02 pm
Location: Earth, USA, CO, Denver

Re: Upgraded SSE (64bit audio + more), v3

Postby trogluddite on Wed Sep 28, 2011 5:13 am

I realise that this is neither 64bit, nor an opcode - but while tinkering with the ASM parser, PLEASE can we have variable name checking (as in the Code module)? This would make debugging ASM 1000% easier!
Feel free to use any schematics and algorithms I post on the forum in your own designs - a credit is appreciated (but not a requirement).
Don't stagnate, mutate to create. Without randomness and serendipity the earth would be just another barren rock.
trogluddite
smychopath
 
Posts: 3033
Joined: Mon Oct 20, 2008 3:52 pm
Location: Yorkshire, UK

Re: Upgraded SSE (64bit audio + more), v3

Postby mwvdlee on Thu Sep 29, 2011 4:17 pm

Also please include some of the missing SSE instructions: RSQRTPS, CMPPS (and pseudo-ops CMPEQPS, CMPLTPS, CMPLEPS, CMPUNORDPS, CMPNEQPS, CMPNLTPS, CMPNLEPS and CMPORDPS), UNPCKHPS, UNPCKLPS and at least either ANDNPS or preferably XORPS.
Also, I'd like CPUID and a way to test bits in EDX/ECX so we can test for SSE support ourselves.
Perhaps some nice SSE4.x instructions: DPPS, BLENDPS, ROUNDPS (and less importantly perhaps INSERTPS and EXTRACTPS)
And a larger set of x87 and x86 instructions like JMP, XOR, FLDZ.
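
A sketch of the bit tests this would involve, assuming the ASM primitive could hand back the EDX and ECX values that CPUID returns for leaf 1 (EAX = 1). The bit positions are the documented feature flags; the register values in the example are made up, and `sse_features` is a hypothetical helper.

```python
# Feature flags reported by CPUID leaf 1: (register, bit position).
FEATURE_BITS = {
    "SSE": ("edx", 25),
    "SSE2": ("edx", 26),
    "SSE3": ("ecx", 0),
    "SSSE3": ("ecx", 9),
    "SSE4.1": ("ecx", 19),
    "SSE4.2": ("ecx", 20),
}

def sse_features(edx, ecx):
    """Decode the SSE-family flags from the two CPUID leaf 1 registers."""
    regs = {"edx": edx, "ecx": ecx}
    return {
        name: bool((regs[reg] >> bit) & 1)
        for name, (reg, bit) in FEATURE_BITS.items()
    }

# Made-up register values for a CPU that stops at SSSE3.
features = sse_features(edx=0x06000000, ecx=0x00000201)
for name, present in features.items():
    print(f"{name}: {'yes' if present else 'no'}")
```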

The SSE and x87/x86 shouldn't be hard; they require no backwards compatibility with older architectures.

Alternatively, clip out the ASM parser code, hand it to a few trusted forum members under a heavy NDA and sit back and wait while they do your work for you. Companies care about adding new marketable features, users care about improving the boring stuff. This way you can have both.
My current top SynthMaker bug:
    1. MIDI Input issue (showstopper, no workaround)
    2. All my previous bugs in SM1.7, because bug 1 makes SM2 worse than SM1.7
mwvdlee
smanatic
 
Posts: 552
Joined: Thu Dec 03, 2009 8:42 am
Location: NL

Re: Upgraded SSE (64bit audio + more), v3

Postby infuzion on Fri Sep 30, 2011 3:48 am

mwvdlee wrote:Also please include some of the missing SSE instructions: RSQRTPS, CMPPS (and pseudo-ops ...), UNPCKHPS, UNPCKLPS and at least either ANDNPS or preferably XORPS.
Also, I'd like CPUID and a way to test bits in EDX/ECX so we can test for SSE support ourselves.
Perhaps some nice SSE4.x instructions:...
And a larger set of x87 and x86 instructions like JMP, XOR, FLDZ.
You know mwvdlee, you are beginning to frustrate me. You have obviously not read my other post to you about CMPPS & JMP, nor checked the WiKi. FLDZ is a waste since XOR eax,eax is 3 times faster to do the same thing. Those compare pseudo-ops are rhetorical & should not be added due to extra overhead needed. You missed my listing for XORPS.

RSQRTPS is interesting; seems to be calculated much faster than SQRTPS & as fast as RCPPS. I do not know of a formula that needs it though?
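
One common use is scaling by 1/sqrt(x), for example normalizing a vector or deriving a gain from a mean-square level, where a single multiply replaces a square root plus a divide. The sketch below assumes the documented worst-case relative error of the hardware estimate (1.5 x 2^-12) and follows it with one Newton-Raphson step; `rough_rsqrt` is a stand-in for the opcode, not a model of what any particular CPU returns.

```python
import math

RSQRTPS_MAX_REL_ERROR = 1.5 * 2.0 ** -12  # documented bound for the estimate

def rough_rsqrt(x):
    """Stand-in for rsqrtps: the exact value skewed by the worst-case error."""
    return (1.0 / math.sqrt(x)) * (1.0 + RSQRTPS_MAX_REL_ERROR)

def refine(x, y):
    """One Newton-Raphson step for y ~ 1/sqrt(x): two multiplies and a subtract more."""
    return y * (1.5 - 0.5 * x * y * y)

def normalize(v):
    """Scale v to unit length using the refined reciprocal square root."""
    length_sq = sum(c * c for c in v)
    inv_len = refine(length_sq, rough_rsqrt(length_sq))
    return tuple(c * inv_len for c in v)

unit = normalize((3.0, 4.0))
print(unit)  # close to (0.6, 0.8)
```

The raw estimate is only good to about 11 or 12 bits; the single refinement step roughly doubles that, which is usually enough for single-precision audio work.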

I am unsure if we should try for support for anything above SSE3 yet, due to the lack of support in Atom & older CPUs. Would be nice, perhaps at a later version.

I forgot to list testing CPUID; thanks for the reminder.

trogluddite wrote:I realise that this is neither 64bit, nor an opcode - but while tinkering with the ASM parser, PLEASE can we have variable name checking (as in the Code module). This would make de-bugging ASM, 1000% easier!
Pro-tip: "New Topic" button works just as well as "Reply". ;)

Re: Upgraded SSE (64bit audio + more), v3

Postby trogluddite on Sat Oct 01, 2011 12:20 pm

infuzion wrote:Pro-tip: "New Topic" button works just as well as "Reply".

He he, I know you know I know this - just seemed silly to ask Malc to go and change his assembler code to add opcodes, release a new version, and then find another request in his inbox for yet more changes to the same thing. Easier for him to add all his parsing changes in one go.

Re: Upgraded SSE (64bit audio + more), v3

Postby infuzion on Sat Oct 01, 2011 4:42 pm

trogluddite wrote:
infuzion wrote:Pro-tip: "New Topic" button works just as well as "Reply".
He he, I know you know I know this - just seemed silly to ask Malc to go and change his assembler code to add opcodes, release a new version, and then find another request in his inbox for yet more changes to the same thing. Easier for him to add all his parsing changes in one go.
That is true, but your request will be lost anyway; I doubt he will even read past the first post. IMHO it is best to start a new topic for off-topic items, so your title can be found when browsing. Then if people want to comment on your idea, they can do so without polluting the main thread, like we are now.

Re: Upgraded SSE (64bit audio + more), v3

Postby infuzion on Mon Nov 07, 2011 5:25 am
