From 0802d1489a2db3d7a9ea9fb904c76e2b40f869b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Simon=20Ryg=C3=A5rd?=
Date: Sun, 9 Jun 2024 17:08:10 +0200
Subject: [PATCH] blog: 240311 dynamic time warping

---
 .../240311_dynamic_time_warping.md            |  59 ++++++++++++++++++
 .../dtw_time-length.drawio.png                | Bin 0 -> 11149 bytes
 .../dtw_time-shift.drawio.png                 | Bin 0 -> 12027 bytes
 3 files changed, 59 insertions(+)
 create mode 100644 docs/blog/240311_dynamic_time_warping/240311_dynamic_time_warping.md
 create mode 100644 docs/blog/240311_dynamic_time_warping/dtw_time-length.drawio.png
 create mode 100644 docs/blog/240311_dynamic_time_warping/dtw_time-shift.drawio.png

diff --git a/docs/blog/240311_dynamic_time_warping/240311_dynamic_time_warping.md b/docs/blog/240311_dynamic_time_warping/240311_dynamic_time_warping.md
new file mode 100644
index 0000000..4b7468e
--- /dev/null
+++ b/docs/blog/240311_dynamic_time_warping/240311_dynamic_time_warping.md
@@ -0,0 +1,59 @@
+# Dynamic Time Warping
+
+Dynamic Time Warping (DTW) is an algorithm for measuring the similarity between two sequences that may vary in time or speed. It is used in a variety of fields such as speech recognition, data mining, and bioinformatics. In this post, I will discuss a variation of DTW that can be useful for particular types of data.
+
+The standard DTW algorithm is a dynamic programming algorithm that finds the optimal alignment between two sequences: the alignment that minimizes the total distance between the aligned elements.
+
+The standard DTW algorithm is defined as follows:
+
+Given two sequences X and Y, where X = {x1, x2, ..., xn} and Y = {y1, y2, ..., ym}, the DTW distance between X and Y is defined as the minimum total distance over all possible alignments of X and Y. An alignment is a monotonic path that starts at (x1, y1), ends at (xn, ym), and pairs every element of each sequence with at least one element of the other. The distance between two elements xi and yj is defined as the Euclidean distance between xi and yj.
+
+The standard DTW algorithm has a time complexity of O(nm), where n and m are the lengths of the two sequences. This makes it impractical for long sequences. However, several variations of DTW have been developed to address this issue.
+
+## Sakoe-Chiba band algorithm
+
+One such variation is the Sakoe-Chiba band algorithm. This algorithm restricts the search space for the optimal alignment, which reduces the time complexity. The Sakoe-Chiba band algorithm is defined as follows:
+
+Given two sequences X and Y, where X = {x1, x2, ..., xn} and Y = {y1, y2, ..., ym}, the DTW distance between X and Y is defined as the minimum total distance over all possible alignments of X and Y, subject to the constraint that the alignment must lie within a band of width w around the diagonal. The distance between two elements xi and yj is defined as the Euclidean distance between xi and yj.
+
+The Sakoe-Chiba band algorithm has a time complexity of O(nw), where n is the length of the longer sequence and w is the width of the band. This makes it more practical for long sequences.
+
+## Fast DTW
+
+Another variation of DTW that has been developed to address the time complexity issue is the Fast DTW algorithm. This algorithm is based on the idea that the optimal alignment between two sequences can be approximated by finding the optimal alignment between down-sampled versions of the sequences. The Fast DTW algorithm is defined as follows:
+
+Given two sequences X and Y, where X = {x1, x2, ..., xn} and Y = {y1, y2, ..., ym}, the Fast DTW algorithm first finds the minimum-distance alignment between a down-sampled version of X and a down-sampled version of Y. That alignment is then projected onto the next finer resolution and refined within a small radius around the projected path, and this is repeated until the original resolution is reached. The result is an approximation of the DTW distance. The distance between two elements xi and yj is defined as the Euclidean distance between xi and yj.
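
The standard algorithm and the Sakoe-Chiba band described above can be sketched in a few lines of Python. This is a minimal illustration, not the custom variation discussed below and not Fast DTW: it assumes one-dimensional numeric sequences (so the Euclidean distance reduces to an absolute difference), and the function name `dtw` and the `window` parameter are made up for this example. The band is taken as the cells where the two indices differ by at most `window`, and it is widened to the length difference so that the end point stays reachable.

```python
import math


def dtw(x, y, window=None):
    """Return (total_distance, path) for the DTW alignment of x and y.

    x, y   -- non-empty sequences of numbers (assumed 1-D for this sketch)
    window -- hypothetical band half-width: cell (i, j) is allowed only if
              |i - j| <= window. None means no band (full O(n*m) search).
    path   -- list of (index_in_x, index_in_y) pairs, i.e. the mapping.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    if window is None:
        window = max(n, m)  # band covers the whole matrix
    # A band narrower than the length difference could never reach (n, m).
    window = max(window, abs(n - m))

    # acc[i][j] = minimal total distance for aligning x[:i] with y[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j - 1],  # advance in both sequences
                acc[i - 1][j],      # advance in x only
                acc[i][j - 1],      # advance in y only
            )

    # Walk back from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


if __name__ == "__main__":
    a = [0, 1, 2, 3, 2, 1, 0]
    b = [0, 0, 1, 2, 3, 2, 1, 0]
    print(dtw(a, b))            # unconstrained search
    print(dtw(a, b, window=1))  # search restricted to a narrow band
```

Returning the path alongside the distance is a deliberate choice here, since the mapping between the sequences is often what is actually needed.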
+
+## Custom DTW
+
+In some cases, it may be necessary to develop a custom variation of DTW to address specific requirements. Such is the case with the data that I am working with. I am currently working on a project that involves measuring the similarity between two time series and, most importantly, creating the mapping between them. The sequences are of different lengths and may vary in time or speed. The standard DTW algorithm is not well suited to this type of data, so I have developed a custom variation of DTW that is tailored to the specific requirements of the project.
+
+### Assumptions
+
+The custom variation of DTW that I have developed is based on the assumption that both signals are measurements of the same phenomenon/physical process.
+
+1. The sequences are of different lengths.
+1. The sequences may be shifted in time.
+1. The sequences may be scaled (compressed or stretched) in time.
+    1. The scale factor is constant.
+    1. The scale factor is not known.
+
+#### 1. The sequences are of different lengths
+
+The sequences may have different numbers of data points. In the figure below, all three sequences describe the same series with the same start and end points, but they have different numbers of data points (blue: 5, red: 4, green: 3).
+This can, for example, be due to different sampling rates.
+
+![Assumption: Lengths](./dtw_time-length.drawio.png)
+
+> ℹ️ Note that the points are offset for better visibility.
+
+#### 2. The sequences may be shifted in time
+
+The sequences' data points may be offset in time relative to each other. In the figure below, all three sequences describe the same series, but they are shifted (red: -1 time unit, green: +2 time units, both relative to blue).
+This can, for example, be due to unsynchronized clocks.
+
+![Assumption: Shift](./dtw_time-shift.drawio.png)
+
+#### 3. The sequences may be scaled in time
+
+The sequences may be scaled (compressed or stretched) in time. This means that the sequences' data points may be
\ No newline at end of file
diff --git a/docs/blog/240311_dynamic_time_warping/dtw_time-length.drawio.png b/docs/blog/240311_dynamic_time_warping/dtw_time-length.drawio.png
new file mode 100644
index 0000000000000000000000000000000000000000..87de46a8bba879e36299e24dde29fedef3a46c82
Binary files /dev/null and b/docs/blog/240311_dynamic_time_warping/dtw_time-length.drawio.png differ
diff --git a/docs/blog/240311_dynamic_time_warping/dtw_time-shift.drawio.png b/docs/blog/240311_dynamic_time_warping/dtw_time-shift.drawio.png
new file mode 100644
Binary files /dev/null and b/docs/blog/240311_dynamic_time_warping/dtw_time-shift.drawio.png differ