Research Papers

A Cloud Service Framework for Virtual Try-On of Footwear in Augmented Reality

Author and Article Information
Chih-Hsing Chu

Department of Industrial Engineering and
Engineering Management,
National Tsing Hua University,
Hsinchu 30013, Taiwan
e-mail: chchu@ie.nthu.edu.tw

Chih-Hung Cheng, Han-Sheng Wu

Department of Industrial Engineering and
Engineering Management,
National Tsing Hua University,
Hsinchu 30013, Taiwan

Chia-Chen Kuo

National Center for
High-Performance Computing,
Hsinchu 30076, Taiwan

Contributed by the Computers and Information Division of ASME for publication in the JOURNAL OF COMPUTING AND INFORMATION SCIENCE IN ENGINEERING. Manuscript received March 20, 2018; final manuscript received November 26, 2018; published online February 4, 2019. Assoc. Editor: Monica Bordegoni.

J. Comput. Inf. Sci. Eng. 19(2), 021002 (Feb 04, 2019) (7 pages) Paper No: JCISE-18-1067; doi: 10.1115/1.4042102 History: Received March 20, 2018; Revised November 26, 2018

This paper presents an experimental cloud service framework for design evaluation of personalized footwear in augmented reality (AR) via networks. The service allows users to ubiquitously perceive themselves trying on three-dimensional (3D) shoe models in a video stream. Users upload a clip of foot motion recorded with a commercial depth camera to the cloud. A new clip is generated to display the try-on process and is made available to specified receivers via video streaming on a mobile device. The framework design emphasizes making the most of open-source software and commercially available off-the-shelf technologies. A prototype cloud system implementing the framework demonstrates the practical value of virtual footwear try-on as AR as a service (ARaaS). This experimental study realizes the idea of human-centric design evaluation in modern e-commerce. The cloud framework may also serve as a feasible example of improving the usability of real-time AR applications.


In addition to functional requirements, modern customers have begun to consider a product's emotional appeal invoked by the affective attributes of its design. This trend is particularly evident in the apparel and fashion industries [1]. Consumers prefer design elements that reveal individual taste and allow them to distinguish themselves from their peers. For most fashion products, it is critical to appraise whether, and how well, a design fits its user [2]. Traditional design tools may not offer sufficient support for this need. Although computer-aided design (CAD) technologies have accelerated the product design process by automating the construction of product models and subsequent manufacturing tasks, most existing CAD tools were developed for engineers or people with a technical background. A product's end customers normally have difficulty accessing or using these tools. Their feedback on the product design, and their personalization information, is therefore not available to designers in the early stages. Not surprisingly, personalized design has not been effectively implemented in practice and remains a difficult task for most companies.

Augmented reality (AR), which combines virtual information with real scenes, is considered a more suitable interfacing technology for product evaluation and for facing customers. AR is particularly useful in evaluating the design of fashion products, such as apparel, footwear, and wearable items, which must be assessed on a human body wearing the products [3]. Multimedia data of various formats can be integrated in an AR environment. Equipped with sensing technology, modern AR applications provide highly interactive functions that respond intelligently to their users in real time. The idea of design automation has been realized for free-form products using CAD techniques [4]. This progress helps realize the personalized design of products related to the human body [5,6]. Implementing design personalization in an AR environment provides better interactivity than CAD-based solutions. This advantage is further strengthened by recent progress in commercial depth cameras. The precise geometry of a real object can be quickly estimated by a single depth camera, or even by a smart phone containing such a camera. Novel AR applications in product display and marketing have recently been developed and deployed in practice, with a focus on online shopping and e-commerce [7,8]. Among them, virtual try-on technology for wearable items has received much attention [3,9]. This technology enables users to see themselves wearing different clothes, eyeglasses, and shoes without physically putting them on.

Most applications developed to virtually try on garments have used video streams to demonstrate the design result [3,9–12]. The value of virtual try-on technologies for new garment design and sales has been confirmed by Sesana and Canepa [13]. A commercial product implementing this idea is already available in the market. Vitali and Rizzi [14] developed a virtual environment to simulate a tailor's work in designing and making garments. The environment was constructed using open-source libraries, commercial sensors, and a VR goggle. Fewer studies have looked into virtual try-on of footwear. Antonio et al. [11] developed a high-quality stereoscopic vision system that allows users to try on shoe models while looking at a mirror. They reported that footwear customization in the AR environment improved product quality and increased consumer satisfaction. Eisert et al. [15] adopted a similar idea to visualize customized shoes on a large display screen showing the input of a camera capturing the legs and shoes. A motion tracker was applied to estimate the three-dimensional (3D) positions of both shoes based on silhouette information from a single camera view. Greci et al. [16] developed a haptic device that simulates the internal volume of a shoe through a set of mechatronic mechanisms. This device allows the user to try on the shoe with a tactile sense and thus find the best fitting one.

The current technology for virtual footwear try-on still has limitations in practical use. First, most previous augmented reality applications adopted specialized equipment such as a magic mirror or a large display device to demonstrate the try-on process. Not only is such equipment expensive, but it is also inaccessible to most customers, particularly to the ever-increasing number of online shoppers. Second, real-time system performance is another problem yet to be fully resolved [17]. Identifying human feet in a video stream and precisely estimating their 3D positions involve heavy computations. Our previous study reported that a noticeable delay may occur during fast foot motion with a single commercial RGB-D camera [18], deteriorating the user experience.

To overcome these limitations, a feasible solution is to implement the virtual try-on functions using cloud technology. The cloud performs most of the computations required to create the try-on process. It notifies the users once the computations are completed, so that they are not tied up in the meantime. Displaying the try-on process on a smart hand-held device in an asynchronous manner helps popularize this idea as a service on demand. The simplified framework shown in Fig. 1 presents this idea, which may realize the concept of AR as a service (ARaaS) recently emerging in industry [18]. This paper presents a cloud service framework that supports virtual footwear try-on in augmented reality via networks. The framework enables users to evaluate a footwear design in a video of foot motion uploaded to the cloud. The video contains a series of image frames that provide the depth information of the real scene captured by a commercial RGB-D camera. A set of cloud computing services accelerated by graphics processing units (GPUs) performs most of the heavy computations involved in positioning shoe models on the moving feet. The rendering result of the try-on process is made available for downloading and for display on a mobile device as streaming video on demand. A use scenario demonstrates how the proposed framework accomplishes online design evaluation of footwear. This work provides end consumers in modern e-commerce with a new method to ubiquitously evaluate products related to the human body. It may also provide a feasible solution for improving the usability of real-time applications in augmented reality.

A virtual try-on user sees himself/herself wearing shoe models in a video stream. The models stay on the user's moving feet during the try-on process. Automatic foot recognition and tracking in each image frame is key to accomplishing this function. Tracking with markers is poorly suited to this use scenario. Markers usually do not offer sufficient positioning accuracy or tracking robustness when lying on curved objects, and the presence of special patterns in the scene deteriorates the visualization quality of the rendering result. Markerless tracking of the human body is therefore highly desirable in 3D virtual try-on.

The main problem in estimating the 3D location of a human foot without markers is that a foot does not contain sufficient feature information for automatic object recognition from a color image [11]. Recent advances in depth sensing technology provide an effective solution to this problem. However, virtual try-on of footwear using commercial depth cameras still has to overcome several technical difficulties. Current cameras have a limited view angle, allowing only partial data to be captured from a moving foot. The depth image acquired by the time-of-flight principle often contains noise and/or missing data. A typical pattern recognition approach in 3D registration is to search for the region of interest in the depth image with a predefined depth template. It is highly difficult, if not impossible, to obtain the foot geometry of an individual user before the try-on process. Identifying a particular foot from the depth image of a real scene without such a reference is therefore a challenging task.

We made the following assumptions to simplify the aforementioned foot recognition problem. First, the try-on process must be conducted in a controlled lighting environment. Only one user moves his/her own feet in front of a depth camera, without other people or their feet in the background. No pants or other garment parts block the user's feet during the try-on process. The feet move at a regular walking speed within a distance range of 90–120 cm from the camera. The goal is to precisely position a shoe model with respect to the 3D position of the foot identified from the depth image. The geometric shape of a human foot is clearly different from that of a shoe model, so directly aligning those two distinct shapes is not recommended. We instead adopt a template foot model as a tracking reference (see Fig. 2). Both models are assumed to remain rigid bodies during the try-on process. An affine transformation matrix M, predetermined prior to the process, defines the relative position between the reference model R and the shoe model S. It controls the allowance between R and S that accounts for the free movement of the foot within the shoe. For a single depth image, the foot tracking problem can be described as (see Fig. 3)

(1)   $\min_{\Lambda}\ \sum_{i=1}^{n}\left\| \Lambda\, p_i - r_i^{*} \right\|^{2}, \qquad p_i \in P,\ r_i^{*} \in R^{*}$

where the depth data P captured by the depth camera consist of n points, and R* is the reference model trimmed with respect to the instant view angle of the camera. A best match Ω between P and R* can be determined using 3D registration techniques. Λ denotes the 3D coordinate transformation matrix determined by Ω and is decomposed into a rotation matrix Ψ and a translation matrix Γ. A point p_i in P corresponds to a point r_i* in R* after the transformation specified by Λ is applied. Ideally, 3D rigid registration methods [19] such as the iterative closest point and robust point matching algorithms can be used to solve for the best match matrix Λ. A number of additional functions are incorporated to ensure the visual quality of the displayed try-on result. They include preprocessing of the depth image, foot tracking, and occlusion processing.
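
As a concrete illustration of Eq. (1) before turning to those functions, the following sketch (ours, not part of the reported implementation) computes the closed-form least-squares rigid transform once the correspondences Ω are fixed, using the standard SVD-based (Kabsch) solution. The function name and array shapes are assumptions made for this example.

    import numpy as np

    def best_rigid_transform(P, R_star):
        """Closed-form least-squares rigid transform (Kabsch/SVD).

        P, R_star : (n, 3) arrays of corresponding points p_i and r_i*.
        Returns the rotation Psi (3x3) and translation Gamma (3,) minimizing
        sum_i || Psi @ p_i + Gamma - r_i* ||^2, i.e., Eq. (1) with Omega fixed.
        """
        cp, cr = P.mean(axis=0), R_star.mean(axis=0)      # centroids
        H = (P - cp).T @ (R_star - cr)                    # cross-covariance matrix
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
        Psi = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T         # proper rotation (det = +1)
        Gamma = cr - Psi @ cp
        return Psi, Gamma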

  • Preprocessing of depth image

    A regular depth image obtained by the Kinect v2 contains 512 × 424 pixels. Identifying the foot region directly from such an image without prior conditions normally requires lengthy computation. A preprocessing step is therefore developed to avoid this problem. The real-scene data captured during the try-on process consist of three parts: the floor, the user's feet, and the wall. The wall belongs to the background and can be identified by the condition that its depth value is much greater than those of the other two parts; the background pixels lie beyond a certain distance from the suggested moving range. The pixels corresponding to the floor must lie on a plane, and we separate the depth data of the feet from the floor using this condition. After removing these two groups of pixels, the output is a set of 3D points K = {k_i, i = 1, …, k} containing the foot region to be identified. The geometric center of K is denoted as k_c. Three mutually orthogonal axes exist in a human foot that reflect the maximum geometric variations in 3D space. Principal component analysis [20] is then performed on the vectors connecting the center to each point, k_i − k_c, i = 1, …, k. The result consists of three orthogonal base vectors v_1, v_2, v_3 corresponding to three principal component values σ_1² > σ_2² > σ_3². We impose the following conditions to guarantee a unique solution: (1) v_1 × v_2 = v_3, (2) v_1 is along the toe direction, and (3) v_3 points toward the depth camera.

  • Foot tracking

    The next step is to position the reference template onto the foot region identified in the first step using 3D registration algorithms. The template foot model R is used as a matching target to search for the region that best matches the template geometrically. We apply the iterative closest point (ICP) algorithm [21] to optimally superimpose the template model on the points generated in the previous step. This method continually adjusts the position of a first point cloud by minimizing its positional deviations with respect to the second one. Because of the limited view angle of the depth camera, the matching target needs to be dynamically trimmed according to the instant view angle. The trimmed model R_t contains l points. The two models to be matched can differ in point number; assume l is greater than k. The ICP algorithm proceeds as follows (a minimal code sketch of this loop appears after the bulleted list):

    1. For each k_i in K, find the closest point k_i* in R_t. The set of all closest points is denoted as K*.
    2. Compute a rotation matrix Ψ and a translation matrix Γ that minimize the sum of the mean square errors over the point pairs generated from K and R_t.
    3. Apply the best match transformation Ω to R_t.
    4. If the termination conditions are not satisfied, go to step 1; otherwise, the algorithm stops.

    The transformation Ω places R_t as close as possible to K. In this study, the ICP algorithm is terminated under two conditions: (1) the number of iterations reaches a given limit, or (2) the sum of the mean square errors falls below a threshold.
  • Occlusion processing

    The success of VR/AR applications largely depends on the quality of the user experience in practical use [17]. People immerse themselves in and interact with a virtual environment more easily when it comprises realistic rendering models. Seamless integration of virtual information into a real scene is critical to a human's adaptation to augmented reality. In computer graphics, occlusion culling is the process of determining which models, and which parts of models, are not visible from a certain viewpoint. In augmented reality, occlusions occur not only among virtual models but also between virtual models and real objects in the scene. Correct occlusion handling produces a visualization result that resembles natural human perception and enhances spatial reasoning. The result of occlusion culling between the user's ankle and the shoe model most strongly influences the user experience of the virtual try-on function [22]. The left image in Fig. 10 shows spacing between the user's ankle (real object) and the shoe model (virtual model) that is not supposed to appear. The right image demonstrates the case in which a portion of the foot is incorrectly blocked by the shoe. The occlusion processing result in both cases is unsatisfactory, confusing viewers with unnatural visualization content. The user may be distracted by those unnatural scenes instead of focusing on shoe evaluation, and thus receive an unpleasant experience from the try-on process.
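
Referring back to the foot tracking step above, the following minimal sketch shows one way the ICP loop of this section could be written. It assumes the segmented foot points K and the trimmed template R_t are given as NumPy arrays, reuses the best_rigid_transform helper sketched after Eq. (1), and uses a SciPy k-d tree for the closest-point queries that the cloud implementation parallelizes on GPUs.

    import numpy as np
    from scipy.spatial import cKDTree
    # best_rigid_transform: see the sketch following Eq. (1)

    def icp_track_foot(K, R_t, max_iter=50, mse_threshold=1e-4):
        """Align the trimmed template R_t (l x 3) to the foot points K (k x 3).
        Returns the accumulated rotation Psi and translation Gamma (i.e., Omega)."""
        src = R_t.copy()                          # template points, moved each iteration
        Psi_total, Gamma_total = np.eye(3), np.zeros(3)
        for _ in range(max_iter):
            dist, idx = cKDTree(src).query(K)     # step 1: closest template point for each k_i
            Psi, Gamma = best_rigid_transform(src[idx], K)    # step 2: minimize the MSE
            src = src @ Psi.T + Gamma             # step 3: apply the transform to R_t
            Psi_total = Psi @ Psi_total           # accumulate the overall transformation
            Gamma_total = Psi @ Gamma_total + Gamma
            if np.mean(dist ** 2) < mse_threshold:            # step 4: termination test
                break
        return Psi_total, Gamma_total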

Solving the occlusion problem requires a precise estimation of the ankle geometry in 3D space. Unfortunately, the depth camera used may not always satisfy this requirement because of frequent noise and/or missing data, particularly around the boundary of a moving object. This limitation prevents us from directly using the depth data instantly captured during the try-on process in the Z-buffering method. Z-buffering is a simple but effective method in computer graphics for deciding which elements of a rendered scene are visible and which are hidden. Pixels with a wrong or missing depth value may lead to wrong visibility decisions. The following procedure is proposed to approximate the ankle shape from those pixels under such circumstances.

The first step is to determine the instant ankle orientation. The Kinect v2 automatically extracts the skeleton of a human user from the depth image, which comprises 25 joints of the human body (see Fig. 4). In the skeleton, the ankle axis connects the foot point to the ankle point for each foot. However, the ankle axis thus generated is too imprecise to serve our purpose. As shown in Fig. 5, moving a distance from the ankle point p_a (ANKLE_RIGHT or ANKLE_LEFT) along the ankle axis t generates a point c_s. The Kinect development kit provides functions that generate both p_a and t. The distance controls the range in space from which the depth data are used to reconstruct the ankle shape. A sphere s is constructed with center c_s and radius equal to the length of segment p_a c_s. A screening step is performed to identify the depth data that lie within this sphere. The step quickly removes outliers and noise in the depth data. The remaining points P_r normally belong to the user's ankle or a portion of the lower leg.

As shown in Fig. 6, the ankle geometry is approximated with a number of linked truncated circular cones sharing the same axis, which determines a more precise ankle axis t*. The next step is to calculate the sizes of the cones and the axis based on the points retained by the screening step. Observing the distribution of those points in 3D space, we note that t* lies in the direction corresponding to the greatest variance of P_r, namely their first principal direction. Principal component analysis provides an effective way to determine this direction. Each point in P_r is then projected onto t* to determine in which cone it is located and its projection distance. The lower and upper radii of each truncated cone are chosen as the minimum and maximum projection distance, respectively, within the height of the cone along t*.
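
A minimal sketch of this approximation is given below, assuming the screened points P_r are available as an (m, 3) NumPy array; the function name and the number of cones are illustrative choices, not values from the paper.

    import numpy as np

    def ankle_axis_and_cones(Pr, n_cones=4):
        """Estimate the refined ankle axis t* and per-cone radii from P_r."""
        c = Pr.mean(axis=0)
        Q = Pr - c
        # First principal direction = eigenvector of the covariance matrix with
        # the largest eigenvalue, i.e., the direction of greatest variance of P_r.
        w, V = np.linalg.eigh(Q.T @ Q)
        t_star = V[:, np.argmax(w)]

        h = Q @ t_star                                        # position along t*
        r = np.linalg.norm(Q - np.outer(h, t_star), axis=1)   # radial distance to the axis
        edges = np.linspace(h.min(), h.max(), n_cones + 1)    # cone segments along t*
        cones = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (h >= lo) & (h <= hi)
            if mask.any():
                cones.append((r[mask].min(), r[mask].max()))  # lower and upper radii
        return t_star, cones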

We propose a cloud service framework to realize the idea of virtual shoe try-on in augmented reality (see Fig. 7). A prototype system implementing the framework allows users to access the various system functions as an on-demand network service. The framework design philosophy is to make the most of open-source software resources and commercially available off-the-shelf technologies. Django, a high-level Python Web development framework, enables rapid development and deployment of the service. The Django server consists of two parts: file I/O and video processing. The file I/O part is responsible for receiving and sending data and for communicating with clients. The video processing part provides the computations required to generate the virtual try-on process. It contains four major functions deployed as C++ applications in the cloud, namely preprocessing of the depth image, foot tracking, occlusion culling, and shoe rendering, together with a shoe model database. The foot tracking application recognizes the user's foot in an image frame and computes the location at which to place the shoe model. The most time-consuming computation in the video processing part is the ICP algorithm used to perform the recognition. A group of GPUs installed at the backend accelerates this computation by parallel processing of the first step of the algorithm described in Sec. 2. Parallel processing is applicable here because finding the closest point in the target set for a given point is independent of that for any other point in the source.
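
As an illustration of the file I/O part, the sketch below shows how an upload endpoint might look in Django; the URL, field names, and the enqueue_video_processing helper are hypothetical and stand in for the actual hand-off to the C++ video-processing applications.

    # views.py -- a minimal sketch of the file I/O part of the Django server.
    import uuid
    from django.http import JsonResponse
    from django.views.decorators.csrf import csrf_exempt
    from django.core.files.storage import default_storage

    @csrf_exempt
    def upload_tryon_clip(request):
        """Receive the compressed color + depth clip and queue the try-on job."""
        if request.method != "POST" or "clip" not in request.FILES:
            return JsonResponse({"error": "expected POST with a 'clip' file"}, status=400)

        job_id = uuid.uuid4().hex
        path = default_storage.save(f"uploads/{job_id}.zip", request.FILES["clip"])

        # Hand the clip to the video-processing part (the C++ applications);
        # enqueue_video_processing is a hypothetical job-queue helper.
        enqueue_video_processing(job_id, path, shoe_model=request.POST.get("shoe_id"))
        return JsonResponse({"job_id": job_id, "status": "queued"})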

Several open-source libraries are adopted to facilitate quick implementation of the aforementioned C++ applications. OpenCV provides a C++ library of computer vision functions for processing the data received from and sent back to the client. It also supports some of the image processing steps involved in the preprocessing of depth data and in foot tracking. In addition, the OpenGL library offers common computer graphics functions that complete the rendering of the shoe models in each image frame. The Z-buffer function of the library supports implementation of the proposed occlusion culling procedure.
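
The per-pixel visibility decision behind the Z-buffer-based occlusion culling can be illustrated with the sketch below. It operates on plain NumPy arrays rather than the actual OpenGL pipeline, and the reconstructed ankle/leg depth stands in for the noisy sensor depth as described in Sec. 2.

    import numpy as np

    def composite_with_occlusion(color_real, depth_real, color_shoe, depth_shoe):
        """Keep a rendered shoe pixel only where the virtual shoe is closer to
        the camera than the (reconstructed) real-scene surface.

        color_real : (H, W, 3) RGB frame from the camera
        depth_real : (H, W) real-scene depth, e.g., the approximated ankle shape
        color_shoe, depth_shoe : rendered shoe color and depth (np.inf where empty)
        """
        visible = depth_shoe < depth_real          # shoe in front of the real surface
        out = color_real.copy()
        out[visible] = color_shoe[visible]         # overwrite only the visible shoe pixels
        return out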

On the client side, users have to accomplish two major tasks to access the try-on service. A client application program developed using OpenCV is available for download to serve this purpose. The first task is to record a clip of foot motion according to specific instructions using a Kinect v2. The data thus captured contain both color and depth images. Synchronized frame by frame, they are uploaded to the cloud as JPEG and text files, respectively. The hypertext transfer protocol (HTTP) serves as the communication protocol between the client program and the cloud. A user management module keeps track of each user's profile and historical usage data. The shoe rendering application produces the try-on process as an AVI clip, which is then converted into MP4 format using ffmpeg. Upon completion of the conversion, the cloud notifies a receiver by sending a text message via HTTP. The message prompts the receiver to download the virtual try-on clip or to watch it online via video streaming on a smart phone.
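
The two data-handling conveniences described above, uploading the recorded clip over HTTP and converting the rendered AVI into MP4 with ffmpeg, can be sketched as follows. The sketch is written in Python for brevity (the actual client is a C++/OpenCV program), and the server URL and field names are assumptions.

    import subprocess
    import requests

    def upload_clip(archive_path, server="http://cloud.example.org/api/upload/"):
        """Client side: send the compressed color + depth clip to the cloud over HTTP."""
        with open(archive_path, "rb") as f:
            resp = requests.post(server, files={"clip": f}, timeout=300)
        resp.raise_for_status()
        return resp.json()                       # e.g., a job identifier and status

    def convert_avi_to_mp4(avi_path, mp4_path):
        """Cloud side: convert the rendered AVI clip into MP4 using ffmpeg."""
        subprocess.run(["ffmpeg", "-y", "-i", avi_path, "-c:v", "libx264", mp4_path],
                       check=True)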

This section demonstrates a user scenario of the implemented cloud system. As shown in Fig. 8(a), a user first logs onto the website of an online shoe store. The user browses a product catalog that displays highly realistic shoe models and provides model zooming, rotating, and positioning in real time. After browsing through different models, the user becomes interested in examining a particular model to see whether it fits. The website instructs the first-time user to download the client program that helps prepare and upload the clip of foot motion for try-on. The user must use a Kinect v2 to film the foot motion within a specified distance range from the camera (see Fig. 8(b)). The user has to place his/her feet close to two target points marked in the video by the application program. It is recommended not to wear loose pants that may cover the ankle or foot. The user can be barefoot or wear socks, but must not wear shoes. The feet should move at regular walking speeds during the try-on process. The duration of a typical try-on motion is about 20 s, although this is not strictly limited.

The client program automatically compresses the clip, which contains a series of color and depth images, into one file. The user then uploads the compressed file to the cloud from the same website, as shown in Fig. 8(c). The receiving program in the cloud posts a confirmation message to the user once the upload has completed properly, and the user can log out of the website at this point. Meanwhile, the application programs deployed in the cloud start to uncompress the uploaded file, perform the data preprocessing, and recognize the foot locations frame by frame (see Fig. 8(d)). The occlusion culling application then calculates which part of the shoe model needs to be obscured using the Z-buffer technique. The visible part of the model is positioned onto the foot region recognized in the color image. Finally, the rendering application displays the result in OpenGL with a default lighting condition and the Kinect camera viewport. A new clip in MP4 format is generated by sequentially arranging the processed frames; in it, the user appears to be wearing the shoes during his/her motion. Next, the cloud automatically updates the status of the try-on service on the website. It also sends a notice to a receiver specified before starting the service. The receiver may click on the link embedded in the notice and start watching the try-on process via video streaming (see Fig. 8(e)). The user may also forward the link to peers on social networks so that they can access the clip and give their opinions on the try-on result, as shown in Fig. 8(f).
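
The frame-by-frame pipeline run in the cloud for this scenario can be summarized by the structural sketch below; the callables passed in stand for the C++ applications (preprocessing, foot tracking, shoe rendering, ankle-depth reconstruction) and are hypothetical placeholders rather than actual interfaces.

    def generate_tryon_frames(color_frames, depth_frames, steps):
        """Assemble the try-on clip frame by frame from per-step callables.

        steps.preprocess(depth)   -> segmented foot points K
        steps.track(K)            -> shoe pose for this frame (via ICP and matrix M)
        steps.render(pose)        -> (shoe_color, shoe_depth) images from OpenGL
        steps.ankle_depth(depth)  -> reconstructed real-scene depth near the ankle
        """
        frames = []
        for color, depth in zip(color_frames, depth_frames):
            K = steps.preprocess(depth)
            pose = steps.track(K)
            shoe_color, shoe_depth = steps.render(pose)
            frames.append(composite_with_occlusion(        # Z-buffer-style culling (see above)
                color, steps.ankle_depth(depth), shoe_color, shoe_depth))
        return frames                                      # subsequently encoded to MP4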

Figure 9 shows several snapshots of the virtual try-on process. The occlusions between the shoe model and the human body in the real scene have been properly processed in these images. The precision of occlusion processing influences the visualization quality of virtual shoe try-on. However, explicitly defining this precision is challenging, as it involves several uncertain factors, and quantitatively estimating it with a single criterion may not be feasible. For example, the try-on process normally lasts a period of time and consists of a series of images; the precision of a single image does not reflect the quality of the whole process. The best occlusion result also depends on the shoe model being tried on and the predetermined matrix M. Moreover, we have to distinguish between two different cases, unexpected spacing and incorrect blocking, as shown in Fig. 10, and each case requires a different way of evaluating the occlusion result. Manual labeling may be the only method to determine the correct "answer" for occlusion processing, i.e., a person views a processed image and identifies the incorrect regions pixel by pixel. This has to be conducted for every image of a try-on process, and the labeling result may vary from one person to another.

The implemented cloud framework utilizes a Microsoft Kinect v2 to instantly capture the real-scene data. Other commercial depth cameras provide similar functions and could serve the same purpose as the Kinect, e.g., Intel RealSense™ and Texas Instruments OPT™. Please refer to Ref. [23] for a survey and comparison of commercial depth sensors. A major difference among those devices for implementing the virtual try-on service is the software development kit each provides, which may lead to different development complexity. The image resolution of each camera also varies, resulting in different visualization quality or tracking precision. Some of them could replace the Kinect sensor used in this research.

Most software tools used to design fashion products have been constructed from the perspective of designers rather than consumers. Those tools offer limited capability for engaging end users to participate in the design process. Enabling users to express their design ideas and instantly interact with product prototypes is desirable in modern product development. A feasible approach is to ubiquitously perform design evaluation in an augmented reality environment. This paper has presented a cloud service framework that realizes virtual footwear try-on in augmented reality via networks. Users select a shoe model they want to try on and upload a clip of foot motion to the cloud. A new clip generated in the cloud displays the try-on process, in which the users see themselves wearing the shoe model while moving. Generating the virtual try-on process involves three major computations: preprocessing of the depth image, foot tracking, and occlusion processing. The motivation of the framework design is to make the most of open-source software and commercially available off-the-shelf technologies. Django, a free Python Web framework, enables rapid development and deployment of the service. A file I/O module handles user management, data transmission, and connection with clients. A video processing module completes the three computations and shoe rendering as application services. A group of GPUs installed at the backend accelerates the ICP algorithm that performs foot recognition in a depth frame. OpenCV provides computer vision functions for data transmission, preprocessing of depth data, and foot tracking. The OpenGL library offers computer graphics functions for rendering the shoe models and implementing occlusion culling with the Z-buffer technique. A client application program was developed using OpenCV to help users record a clip of foot motion with a Kinect v2 and upload it to the cloud. HTTP serves as the communication protocol between the client program and the cloud. The try-on clip is sent back to the client side in MP4 format as a streaming video over the network, and the receiver can download the clip or watch it online using a mobile device. A use scenario demonstrates the practical value of a prototype cloud service implementing the proposed framework.

In the current implementation, a Windows application program was developed for users to send the video clip with depth information captured by the Kinect to the cloud. Note that a depth camera is probably not a device every customer wants to own solely for the sake of e-commerce. However, depth sensing (or 3D sensing) is a crucial function that major companies like Apple and Google have decided to include in their next-generation smart phones; Apple's iPhone X, for instance, includes a depth-sensing camera serving a purpose similar to the Kinect sensor. Capturing videos with depth information using smart phones is likely to become routine for most people in the near future, at which point the proposed AR framework would become highly feasible. Future studies can extend this work by including functional evaluation of shoe designs, such as estimating wear comfort or simulating the deformation of shoe models subject to various foot motions.

This work was financially supported by the Ministry of Science and Technology of Taiwan under Grant No. MOST-103-2622-E-007-019-CC3.

Copyright © 2019 by ASME

References

Tseng, M. M. , and Piller, F. T. , 2003, The Customer Centric Enterprise, Advances in Mass Customization and Personalization, Springer-Verlag, Berlin.
Lo, C. H. , and Chu, C. H. , 2014, “ An Investigation of the Social-Affective Effects Invoked by Appearance-Related Products,” Hum Factors Ergonom. Manuf. Serv. Ind., 24(1), pp. 71–85. [CrossRef]
Huang, S. H. , Yang, Y. I. , and Chu, C. H. , 2012, “ Human-Centric Design Personalization of 3D Glasses Frame in Markerless Augmented Reality,” Adv. Eng. Inf., 16, pp. 35–45. https://www.sciencedirect.com/science/article/pii/S1474034611000565
Wang, C. C. L. , Hui, K. C. , and Tong, K. M. , 2007, “ Volume Parameterization for Design Automation of Customized Free-Form Products,” IEEE Trans. Autom. Sci. Eng., 4(1), pp. 11–21. [CrossRef]
Chu, C. H. , Wang, I. J. , Wang, J. B. , and Luh, Y. P. , 2017, “ 3D Parametric Human Face Modeling for Personalized Product Design: Eyeglasses Frame Design Case,” Adv. Eng. Inf., 32, pp. 202–223. [CrossRef]
Huang, S. H. , Yang, C. K. , Tseng, C. Y. , and Chu, C. H. , 2015, “ Design Customization of Respiratory Mask Based on 3D Face Anthropometric Data,” Int. J. Precis. Eng. Manuf., 16(3), pp. 487–494. https://link.springer.com/article/10.1007/s12541-015-0066-5
Cheng, K. , Nakazawa, M. , and Masuko, S. , 2017, “ MR-Shoppingu: Physical Interaction With Augmented Retail Products Using Continuous Context Awareness,” International Conference on Entertainment Computing, Tsukuba City, Japan, Sept. 18–21, pp. 452–455.
Speicher, M. , Cucerca, S. , and Krüger, A. , 2017, “ VRShop: A Mobile Interactive Virtual Reality Shopping Environment Combining the Benefits of On-and Offline Shopping,” Proc. ACM Interact., Mobile, Wearable Ubiquitous Technol., 1(3).
Yang, Y. I. , Yang, C. K. , and Chu, C. H. , 2014, “ A Virtual Try-On System in Augmented Reality Using RGB-D Cameras for Footwear Personalization,” J. Manuf. Syst., 33(4), pp. 690–698. [CrossRef]
Hauswiesner, S. , Straka, M. , and Reitmayr, G. , 2013, “ Virtual Try-On Through Image-Based Rendering,” IEEE Trans. Visualization Comput. Graph., 19(9), pp. 1552–1565. [CrossRef]
Antonio, J. M. , Jose Luis, S. R. , and Faustino, S. P. , 2013, “ Augmented and Virtual Reality Techniques for Footwear,” Comput. Ind., 64(9), pp. 1371–1382. https://www.sciencedirect.com/science/article/pii/S016636151300122X
Yuan, X. , Tang, D. , Liu, Y. , Ling, Q. , and Fang, L. , 2017, “ Magic Glasses: From 2D to 3D,” IEEE Trans. Circuits Syst. Video Technol., 27(4), pp. 843–854. [CrossRef]
Sesana, M. , and Canepa, A. , 2017, “ Virtual Showroom for New Textile Design and Sales Decrease Textile Collection Costs by Virtual Reality Samples,” International Conference on Engineering, Technology and Innovation (ICE/ITMC), Madeira Island, Portugal, June 27–29, pp. 1534–1537.
Vitali, A. , and Rizzi, C. , 2017, “ A Virtual Environment to Emulate Tailor's Work,” Comput.-Aided Des. Appl., 14(5), pp. 671–679. [CrossRef]
Eisert, P. , Fechteler, P. , and Rurainsky, J. , 2008, “ 3-D Tracking of Shoes for Virtual Mirror Applications,” IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, June 23–28.
Greci, L. , Sacco, M. , Cau, N. , and Buonanno, F. , 2012, “ FootGlove: A Haptic Device Supporting the Customer in the Choice of the Best Fitting Shoes,” Haptics: Perception, Devices, Mobility, and Communication, Springer, Berlin, pp. 148–159.
Shen, J. , Su, P. C. , Cheung, S. C. S. , and Zhao, J. , 2013, “ Virtual Mirror Rendering With Stationary RGB-D Cameras and Stored 3D Background,” IEEE Trans. Image Process., 22(9), pp. 3433–3448. [CrossRef] [PubMed]
Rızvanoğlu, K. , and Çetin, G. , 2013, Research and Design Innovations for Mobile User Experience, IGI Global, Hershey, PA.
Gold, S. , Rangarajan, A. , Lu, C. P. , Suguna, P. , and Mjolsness, E. , 1998, “ New Algorithms for 2D and 3D Point Matching: Pose Estimation and Correspondence,” Pattern Recognit., 38(8), pp. 1019–1031. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.7722
Jolliffe, I. T. , 2002, Principal Component Analysis, 2nd ed., Springer-Verlag, New York.
Besl, P. , and McKay, N. , 1992, “ A Method for Registration of 3-D Shapes,” IEEE Trans. Pattern Anal. Mach. Intell., 14(2), pp. 239–256. [CrossRef]
Yuan, M. , Khan, I. R. , Farbiz, F. , Yao, S. , Niswar, A. , and Foo, M. H. , 2013, “ A Mixed Reality Virtual Clothes Try-On System,” IEEE Trans. Multimedia, 15(8), pp. 1958–1968. [CrossRef]
Horaud, R. , Hansard, M. , Evangelidis, G. , and Ménier, C. , 2016, “ An Overview of Depth Cameras and Range Scanners Based on Time-of-Flight Technologies,” Mach. Vision Appl., 27(7), pp. 1005–1020. [CrossRef]

Figures

Fig. 1   A simplified framework presenting the cloud-based try-on service

Fig. 2   Predefining the relationship between the template foot and shoe models

Fig. 3   Problem description of virtual shoe try-on

Fig. 4   Skeleton of a human user extracted from Kinect

Fig. 5   Estimating an ankle axis of high precision

Fig. 6   Approximating the ankle with linked truncated circular cones

Fig. 7   The proposed cloud service framework

Fig. 8   A use scenario of the cloud service: (a) select a shoe model to try on, (b) record a clip of the foot motion using Kinect v2, (c) upload the clip to the cloud, (d) foot recognition in a depth frame, (e) completion notification to the receiver, and (f) watch the virtual try-on process on a smart device

Fig. 9   Snapshots of the virtual try-on process

Fig. 10  Poor occlusion culling results (left: unexpected spacing; right: incorrect blocking)
