Hi.

I'm Wojciech Wojcikiewicz from the Machine Learning Group at Technical University of Berlin, and I will present to you our recent work on stationary common spatial patterns.

This is joint work with Carmen Vidaurre and Motoaki Kawanabe.

So here is an overview. I will start with an introduction and tell you something about the common spatial patterns method. Then I will present the stationary common spatial patterns method, show some results, and conclude with a summary.

Our target application is brain-computer interfacing. A brain-computer interface system aims to translate the intent of a subject, measured for example from brain activity, in this case by EEG, into a control command for a computer application. So in this case you measure EEG and you want to control a game, here a pinball game, but you can also think of other applications like controlling a wheelchair or a neuroprosthesis.

A very popular paradigm for BCI is motor imagery. In motor imagery the subject imagines movements, for example of the right hand, the left hand, or the feet, and these different imagined movements lead to different patterns in the EEG. If your system is able to extract and classify these different patterns, then you can convert them into a computer command and control an application like this one.

There are still some challenges. For example, the EEG signal is usually high-dimensional, it has a low spatial resolution, which means you have a volume conduction effect, and it is noisy and non-stationary. By non-stationary I mean that the signal properties change over time. So what people in BCI usually do is apply some spatial filtering method, for example CSP, in order to reduce the dimensionality. The goal is to combine electrodes, that is, to project the signal to a subspace, in order to increase the spatial resolution and hopefully the signal-to-noise ratio, and to simplify the learning problem.

But the problem with CSP is that it is prone to overfitting and can be negatively affected by artifacts. It also does not tackle the non-stationarity issue. That means if you compute features by applying CSP, the features may still change quite a bit, yet your classifier usually assumes stable distributions. In machine learning we usually assume that the training data and the test data come from the same distribution, and if your data distribution changes too much, then the classifier will not work optimally.

Therefore we extend the CSP method to extract more stationary features. Note that the non-stationarities, i.e. the changes of the signal properties over time, may have very different sources and time scales.

For example, you may have changes in the electrode impedance, e.g. when an electrode gets loose or the gel between the scalp and the electrode dries out. You may also have muscular activity and eye movements, which lead to artifacts in the data. And you usually also have changes in task involvement, e.g. when subjects get tired, or differences between sessions: for example, there is no feedback condition in the calibration session, whereas in the feedback session you provide feedback.

Basically, all these non-stationarities are bad for you because they negatively affect your classifier. There are two ways to deal with this. One way is to extract better features, to make your features more robust and more invariant to these changes; this is the way we propose in our paper. The other way is to do adaptation, so you can adapt the classifier to follow the change.

Okay, so the common spatial patterns method. It is a method which is very popular in brain-computer interfacing, and it maximizes the variance in one class while minimizing the variance in the other class. Say you have two conditions, the imagined movement of the right hand and of the left hand. You see that these two filters down here maximize the variance of the projected signal in the right-hand condition while minimizing it in the left-hand condition, and the two filters up there do exactly the opposite: they maximize the variance in the left-hand condition but minimize it in the right-hand condition.

Why do we want to do this? In BCI our goal is to discriminate between mental states, and we know that the variance of a band-pass filtered signal is equal to the band power in its frequency band. So you can discriminate mental states by looking at the power in specific frequency bands. In other words, you can easily detect changes between the conditions, because after projection you are effectively looking at the band power in one specific frequency band.

CSP can be solved as a generalized eigenvalue problem, because you can formulate it as a Rayleigh coefficient: you want to maximize the projected variance of one condition while minimizing the total variance; equivalently, you can also write it so that you minimize the projected variance of the other condition, sigma minus. So this can be solved very easily.
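As an illustration of this step, here is a minimal sketch of maximizing such a Rayleigh coefficient via a generalized eigendecomposition. This is my own sketch, not the authors' implementation; the variable names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(sigma_plus, sigma_minus, n_filters=1):
    """Solve the generalized eigenvalue problem
    sigma_plus w = lambda (sigma_plus + sigma_minus) w.
    Filters with the largest eigenvalues maximize the projected variance
    of the '+' condition; those with the smallest eigenvalues maximize
    the variance of the '-' condition."""
    evals, evecs = eigh(sigma_plus, sigma_plus + sigma_minus)  # ascending order
    # take n_filters from each end of the spectrum
    return np.hstack([evecs[:, :n_filters], evecs[:, -n_filters:]])
```

SciPy's `eigh` handles the symmetric-definite generalized problem directly, which is why the unpenalized CSP objective is so easy to optimize.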

However, this alone may not work well under non-stationarities. Our idea is that we do not only want a projection which has these variance properties; we also want the projection to provide stationary features, so we want to penalize non-stationary projection directions. We therefore introduce a penalty term P(w) in the denominator of the Rayleigh coefficient. The final goal is then to maximize the projected variance of one condition while minimizing the variance in the other condition and minimizing this penalty term P(w).

The penalty term measures, in a sense, the non-stationarities. We want to measure the deviation from the average case: sigma_c is the average covariance matrix of all trials from condition c, and sigma_k,c is the covariance matrix of the k-th chunk, where a chunk may consist of one trial or of several trials from the same class. So you want to minimize the deviation of each chunk from the average case. This is done for each class separately, i.e. you get one such term per class, because you want the features to be stationary within each class.
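To make this concrete, here is a small sketch of such a per-class penalty, assuming the chunk covariance matrices and the class-average covariance matrix are already given. The function name and shapes are my own assumptions, used only for illustration.

```python
import numpy as np

def stationarity_penalty(w, chunk_covs, avg_cov):
    """Sum over chunks of the absolute deviation of the projected chunk
    variance w' S_k w from the projected average variance w' S w."""
    return sum(abs(w @ (S_k - avg_cov) @ w) for S_k in chunk_covs)
```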

The problem is that if you add this quantity to the denominator, then you do not get the desired form anymore, because you cannot take the w outside the sum due to the absolute value function here. So you are no longer able to solve it as a generalized eigenvalue problem.

So what do we do about this? We add a related quantity instead. We take the w vector outside the sum, but introduce an operator F to make this difference matrix positive definite, because we are only interested in the variation: we want to treat deviations in both directions in the same way. For example, here we do not care whether this term is bigger or that term is bigger; we are only interested in the difference after projection. Here we do essentially the same, but before projecting rather than after, because we take the w outside the sum. And we can show that this quantity gives an upper bound on the quantity we actually want to minimize, so it makes sense to use it.

So we put this term into the Rayleigh coefficient of our objective function.
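A minimal sketch of such an operator F, flipping the negative eigenvalues of the symmetric difference matrix; this is my own illustration, assuming a symmetric input matrix.

```python
import numpy as np

def make_positive_definite(D):
    """Flip the sign of the negative eigenvalues of a symmetric matrix D,
    so that w' F(D) w >= |w' D w| holds for every w."""
    evals, evecs = np.linalg.eigh(D)
    return evecs @ np.diag(np.abs(evals)) @ evecs.T
```

With F applied inside the sum, the penalty takes the form w' (sum over k of F(S_k - S)) w, so w can be pulled out of the sum and the generalized eigenvalue formulation is recovered.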

Now to our dataset. We compare CSP and stationary CSP on a dataset of 80 subjects performing motor imagery; they were new to BCI, so they did it for the first time. We selected for each user the best binary task combination and the best parameters on the calibration data, and we tested on a test session with feedback, with 300 trials. We recorded EEG from 68 selected electrodes, used log-variance features and an LDA classifier, and used the error rate to measure performance. We used a fixed number of filters per class and selected the trade-off parameter with cross-validation; we also tried different chunk sizes and selected the best one by cross-validation on the calibration data.
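As an illustration of the feature-extraction step just mentioned (log-variance of the spatially filtered trials, which would then be fed to an LDA classifier), here is a sketch under assumed array shapes; it is not the authors' code.

```python
import numpy as np

def log_variance_features(trials, W):
    """trials: (n_trials, n_channels, n_samples); W: (n_channels, n_filters).
    Project each trial through the spatial filters and take the log of the
    variance over time, i.e. the log band power of the filtered signal."""
    projected = np.einsum('ck,tcs->tks', W, trials)
    return np.log(projected.var(axis=2))
```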

Here are some performance results. You see the scatter plots when using three CSP directions per class, or when using one CSP direction per class. On the x-axis is the error rate of CSP, and on the y-axis is the error rate of our approach. You can see that especially the subjects who fail when using CSP, like these here, become really better with our method, and the same can be seen here. We computed a test statistic and the changes are significant; our method works better especially for the subjects which have an error rate larger than thirty percent. So we can improve in those cases which fail when using CSP. This is somewhat to be expected, because if CSP works well, then your patterns are probably really good and the signal-to-noise ratio is good, so you do not have a lot of room to improve.

So the question is: why does stationary CSP perform better? Basically, we know that CSP may fail to extract the correct patterns when affected by artifacts, and, as you saw, stationary CSP is more robust to artifacts because it treats artifacts as non-stationarities and reduces these non-stationarities in the features. CSP is also known to overfit, whereas stationary CSP overfits less and produces fewer changes in the features.

For example, here you see the result of a subject performing left- and right-hand motor imagery. You see that both methods are able to extract the correct left-hand pattern, i.e. activity on the right hemisphere; this is the pattern for left-hand motor imagery. But in the case of the right hand, the CSP method fails, probably because in these electrodes there is an artifact or some noise, i.e. the signal is kind of non-stationary. Stationary CSP is also a bit affected by this artifact at this electrode, but it is able to extract a more or less correct pattern for the right hand.

You also see this here when you look at the distributions of training features and test features: the training features are the triangles and the test features are the circles. You see that the training distribution of CSP looks like this here, but it changes a lot when you go to the test phase; the distribution is completely different in the test case. But when we use stationary CSP, we extract more stable, more stationary features, so the distribution between training and test phase stays more or less the same, and you can classify a lot better in this case. Here is the decision boundary, and you see that in the CSP case you really fail to classify correctly here.

Okay, so in summary: we extended the popular CSP method to extract stationary features. Stationary CSP significantly increases the classification accuracy, especially for subjects who perform badly with CSP. And unlike other methods such as invariant CSP, we are completely data-driven: we do not require additional recordings or models of the expected changes. We also showed, though it was not presented in this paper, that the combination of stationary features and unsupervised adaptation can further improve classification performance.

So, I want to thank you for your attention.

We have time for questions.

Can you explain in more detail that function in your penalty term?

You mean this function here? Yeah.

So this function F, as I said, is kind of a heuristic, because it makes your difference matrix positive definite. That means it flips the sign of all the negative eigenvalues. Why do you want to do this? Because you want to sum up only positive values: for example, here you sum over k only positive deviations, and you want to do essentially the same here. So you make the difference matrix positive definite, and then we can show that this is an upper bound on the other quantity.

So for this operation that flips the sign, you compute a full eigendecomposition, right?

Yes, we compute the difference matrix, then we do an eigendecomposition and flip the sign of all negative eigenvalues.

Okay, so you keep the positive ones unchanged?

Yeah.

And the eigenvectors, i.e. the directions, are they also flipped?

When you have an eigenvector with a negative eigenvalue and you flip the sign, you do not change much; you only flip it, because you are only interested in positive contributions.

Okay, thanks.

How do you divide the data into the chunks? Is there a particular rule, or do you use clustering to find similarities?

No, you can simply use a chunk size of one, which means that each trial enters as its own chunk; you can do this trial-wise. Or you can put subsequent trials from the same class together in one chunk. So we do not apply any clustering, we only group subsequent trials together; by default we do it for each trial separately.

A question about your test data: were they recorded over multiple sessions?

No, this was only one test session.

A question about the chunk sizes: if you use a chunk size which is larger than one, wouldn't you average out part of the non-stationarity?

Yes, and that was exactly the idea of using chunk sizes. If you use a chunk size of one, then you detect the changes on a small time scale. If you take bigger chunk sizes, then the time scale will also be bigger, because you average out the changes which only occur, for example, in one trial. So we tried different chunk sizes and selected the best one using cross-validation.
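To illustrate the chunking described in this answer, here is a small sketch that groups subsequent trials of one class into chunks and computes one covariance matrix per chunk. The shapes and names are my own assumptions, not the authors' code.

```python
import numpy as np

def chunk_covariances(trials, chunk_size=1):
    """trials: (n_trials, n_channels, n_samples), all from the same class.
    Concatenate every `chunk_size` subsequent trials along time and
    return one covariance matrix per chunk."""
    covs = []
    for start in range(0, len(trials) - chunk_size + 1, chunk_size):
        # (n_channels, chunk_size * n_samples)
        X = np.hstack(list(trials[start:start + chunk_size]))
        covs.append(np.cov(X))
    return covs
```

With `chunk_size=1` each trial forms its own chunk, capturing changes on the smallest time scale, exactly as discussed above.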