[AUDITORY] Hannah: dense audio-visual person annotation in "Hannah and her sisters" (Ozerov Alexey)


Subject: [AUDITORY] Hannah: dense audio-visual person annotation in "Hannah and her sisters"
From:    Ozerov Alexey  <Alexey.Ozerov@xxxxxxxx>
Date:    Mon, 14 Oct 2013 17:30:27 +0200
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear list,

[Sorry for cross-posting]

We have created and made publicly available a dense audio-visual, person-oriented ground-truth annotation of a feature film (100 minutes long): "Hannah and her sisters" by Woody Allen.

The annotation includes

* Face tracks in video (densely annotated, i.e., in each frame, and person-labeled)
* Speech segments in audio (person-labeled)
* Shot boundaries in video

The annotation can be useful for evaluating

* Person-oriented video-based tasks (e.g., face tracking, automatic character naming)
* Person-oriented audio-based tasks (e.g., speaker diarization or recognition)
* Person-oriented multimodal tasks (e.g., audio-visual character naming)

Details on the Hannah dataset and access to it are available here:
https://research.technicolor.com/rennes/hannah-home/
https://research.technicolor.com/rennes/hannah-download/

Acknowledgments:
This work is supported by the AXES EU project: http://www.axes-project.eu/

Best regards,

Alexey Ozerov, Jean-Ronan Vigouroux, Louis Chevallier and Patrick Pérez

Alexey Ozerov
Technicolor Research & Innovation
Alexey.Ozerov@xxxxxxxx


This message came from the mail archive
/var/www/postings/2013/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University