Cloud Data Acquisition from Shared-Use Facilities in a University-Scale Laboratory Information Management System

Effective data management across large-scale shared-use materials-science facility centers is crucial, as the high-end facilities available at such centers are indispensable for cutting-edge research in modern materials science. However, establishing network connectivity for these facilities is often not practically feasible due to their uniqueness, forcing users to resort to traditional manual methods of data acquisition and transfer, such as DVDs. In this paper, we introduce an IoT device designed to facilitate direct data transfer from these non-networked, shared-use facilities to a cloud system. We propose a novel data transfer algorithm that integrates user management with the data transfer process to safely assign each user's data to that user's private storage space on the cloud. The proposed technology serves as a fundamental building block in shaping a smarter laboratory information management system for the advancement of materials research and development.


INTRODUCTION
With the rapid advancement and spread of Internet technologies, various digital devices (not only personal computers (PCs) and mobile devices but also automobiles, home appliances, and other sensor devices) have been interconnected, enhancing their data utilization. Recently, the application of such Internet of Things (IoT) technologies in the field of materials research and development has attracted increasing attention. It aims to automatically collect the large volume of data generated by material-related facilities, such as electron microscopes and synchrotron radiation facilities, and store them in remote storage or databases, such as clouds, for structuring, analyzing, and publishing. Systems of this kind are referred to as Laboratory Information Management Systems (LIMS) [1-4, 8, 12, 15, 17-19, 22, 25, 26].
Experimental facilities for materials consist of the actual instruments and the control computers that manage them. Data obtained from an instrument, e.g., image data and sensor data, are output to its control computer, where the data are either analyzed in real time or extracted for further analysis. For networking such experimental facilities, a typical existing solution is to establish a Local Area Network (LAN) environment among the control computers via a standard Ethernet interface [11,15,19,22]. If the network or each facility is logically isolated from the external network by a firewall, unauthorized access can be blocked, ensuring safe data transfer from the control computers to a cloud.
However, in practice, the use of the aforementioned Ethernet-based solution is significantly limited, especially in shared-use large-scale laboratories, such as the National Institute of Standards and Technology Electron Microscopy Nexus (NIST EM Nexus) [22], the Micro and Nano Technology Lab and Material Research Lab at the University of Illinois at Urbana-Champaign (MNTL-, MRL-UIUC) [15,18], and the center of the University of Tokyo for the Advanced Research Infrastructure for Material and data hub (UTokyo-ARIM) [23]. In these laboratories, highly sophisticated and expensive facilities, such as atomic-resolution electron microscopes, are installed and shared by a range of academic and industrial users under the management of universities or institutions. Even though the control computers of the facilities run a standard OS, they are unlike conventional PCs: because of their uniqueness, they often operate for more than a decade after installation. Many of these computers have hardware interfaces, drivers, and software that are specifically tailored and optimized for particular setups, making it practically infeasible to establish a simple network environment across all control computers via Ethernet, as shown in Figure 1.

Traditional Data Extraction for Shared-Use Facility
For example, at the hardware level, the types of interfaces that allow external connections are significantly restricted on the control computers, with many permitting external connections only via USB, even when they are capable of Ethernet connectivity. On the software side, many control computers run potentially vulnerable legacy OSs, such as Windows XP or 7, due to the specific requirements of each instrument. Introducing additional software, updating the OS, or applying security patches to these control computers is often prohibitively complex or outright impossible. This not only increases the risk of external cyber-attacks but also carries the possibility that the control computer itself is already infected with very old viruses, such as so-called Trojan horses, which could compromise other computers within the network. As a result, even basic network connections via Ethernet can pose substantial security risks, even behind a firewall.
For these reasons, most experimental facilities are still managed in a non-networked, standalone environment. Data are extracted manually through removable storage media such as USB flash drives or DVDs. Such operation effectively eliminates the risk of data leakage and cyber-attacks, and its simplicity minimizes the risk of data confusion among shared users. However, these naive solutions make it difficult to handle large volumes of data from many users efficiently. A cloud-based system is therefore strongly needed.
Therefore, this work investigates an IoT solution that facilitates direct data transfer from these non-networked shared-use facilities to a cloud system. Key contributions of this work are summarized as follows:
• Unified Facility Connection via USB-based Interface: At present, data are extracted from almost all experimental facilities via the USB interface (DVDs or USB memory), which provides a reliable and universally applicable method for data acquisition. The developed IoT device is connected to each facility via USB to facilitate direct data transfer to cloud storage. We propose a novel data transfer mechanism that integrates the USB gadget mode connection, which operates on the IoT device side, with user authentication and management. This enables each user to independently and privately transfer data to their own cloud storage space. It also offers robust fault tolerance, which is indispensable for IoT devices, where the network and/or power supply may be unexpectedly disrupted.
• University-Scale System Deployment at Ease: We have implemented and deployed the system in the laboratory center, which includes over 30 shared-use facilities and up to several hundred shared users. Since the IoT device requires only the USB disk driver and no setup on the facility side, installation was achieved in nearly all facilities at negligible cost. The preliminary system evaluation shows that our system can provide efficient and secure data transfer to the cloud storage system at a university scale.
The rest of this paper is organized as follows: Section 2 provides background and related work. Section 3 presents an overview of our laboratory information management system. Section 4 presents the design of our IoT device. Section 5 discusses the proposed data transfer method running on the IoT device. Section 6 presents the performance evaluation of the system. Finally, we conclude the paper in Section 7.

LABORATORY INFORMATION MANAGEMENT SYSTEMS
In this section, we review Laboratory Information Management Systems (LIMS) for material-related facilities as the background and related work of this paper. LIMS for material-related facilities aim to streamline laboratory operations and information management, accelerating materials research and development. A large number of systems have been developed, and different works focus on different goals. One of the primary goals of LIMS is to provide a common method for structuring the data stored from facilities and to make retrieving and cataloging the collected data effective [1,3,17,25]. Effective data structuring promotes the fulfillment of the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for material data [24]. The key efforts involve the development of standard open software to process different vendor-specific data formats [6], the design of common software interfaces to store, retrieve, and publish material data [1,25], and the design of universal metadata, such as user information, generation time, machine setups, sample conditions, and so forth [3,17].
Other lines of work on LIMS focus on establishing a holistic system that integrates everything from data collection to data analysis [2,15,18,19,22]. The key efforts involve the establishment of an integrated system for shared-use facilities [22], a system framework for real-time data acquisition and analysis [15,18], and a collaborative research notebook for laboratories [2].
Unlike these existing efforts, our work focuses on the lower, system-side issues that arise when deploying such high-level systems. Specifically, our work aims to solve more restricted facility scenarios in which additional software and setups (e.g., network-attached storage, or networking in general) are forbidden on the control computer. Thus, our proposal can serve as the foundation for building these high-level systems and gives an alternative solution for facilities that existing Ethernet-based networking cannot cover [11,15,18,19,22].

OVERVIEW OF OUR LABORATORY INFORMATION MANAGEMENT SYSTEM
In this section, we briefly describe our LIMS. Figure 2 shows the overview. The system consists of two parts: a shared-use laboratory and an academic cloud. The shared-use laboratory is the center of materials research and development, where over 30 shared-use facilities are installed, e.g., electron microscopes, X-ray diffractometers, and physical property measurement equipment. These facilities are shared by several hundred academic and industrial users. The majority of the facilities are managed in a non-networked, standalone environment, where each user currently extracts data via DVDs or secure USB flash drives. Each facility is located on campus, and thus we use the campus networks (both wireless and wired) to connect the IoT devices to the academic cloud. We construct a virtual LAN among the connected IoT devices to establish the network safely.
The academic cloud is a general-purpose public cloud service for science, such as Jetstream 2 [9], mdx [21], and EGI-ACE [7], on which we deploy the user management system and the data management system on top of a virtual machine (VM) environment. The cloud system includes a peta-scale storage system, in which we store the data generated by each facility. The generated data are stored and managed in a cloud-storage manner, as in Dropbox, Google Drive, or iCloud. Each user has their own private directory, and the data from the facility are automatically collected from the IoT device. Once the data are stored in the storage system, each user can flexibly manage their own data through a web interface, as with Dropbox or other cloud storage services.
The two systems are connected through a high-speed academic network, such as Internet2 [10] and SINET [13], so that the data from the facilities do not pass through the global Internet. Data access from the Internet to the peta-scale storage is permitted only via the academic cloud, where we can control system-level access by using the firewall service of the cloud. Moreover, thanks to this high-speed academic network, the system can establish a reliable network connection to a world-class supercomputer, such as Fugaku [20] and Frontier [14], for further large-scale data analysis.

IOT DEVICE DESIGN FOR CLOUD DATA ACQUISITION
In this section, we discuss the design of the IoT device shown in Figure 3. The IoT device is introduced for three purposes:
• Data Acquisition from Control Computer: As most of the facilities are non-networked, the IoT device acquires data locally from the control computer.
• Data Transfer to the Cloud: The acquired data must be transferred to the cloud immediately.
• Identification of the Current Facility User: The user and data management system on the cloud side cannot identify the facility user who is physically present in the room, so the IoT device takes responsibility for identifying which user is currently in the room.
Thus, the IoT device possesses three interfaces to communicate with the other components, as shown in Figure 4.
• USB OTG Interface: The USB OTG interface is used for data acquisition from the control computer of the facility. The IoT device is connected to the control computer via OTG-supported USB. We use the Mass Storage Gadget (MSG) [16] through the USB Gadget API in the Linux kernel [5] to make the IoT device act as a USB flash drive.
• Network Interface: The network interface is used for data transfer from the IoT device to the cloud. We mainly use a wireless network, as it is relatively easy to apply to many devices. Only in special cases, such as a room without a wireless network or a setup requiring high-performance data transfer, do we use a wired connection.
• Short-Range Communication Interface: A short-range communication interface, such as a display, NFC, or Bluetooth, is used to send messages to nearby facility users.
In our implementation, we use displays attached to IoT devices to identify which users are currently working in the facility room, as shown in Figure 3.
Based on the IoT device, we develop a data transfer mechanism as discussed in the next section.
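To make the USB OTG interface more concrete, the following is a minimal sketch of how a Linux device could expose a disk image to the control computer as a flash drive. It assumes the legacy `g_mass_storage` gadget module with its `file`, `removable`, and `ro` parameters; the image path is a hypothetical placeholder, and the paper does not state how the gadget is actually loaded in the deployed devices.

```python
import shlex
from typing import List


def mass_storage_gadget_cmd(image_path: str, read_only: bool = False) -> List[str]:
    """Build a modprobe command that exposes `image_path` as a USB flash drive.

    Assumes the legacy g_mass_storage gadget module; the backing image path
    is a hypothetical example, not a path taken from the paper.
    """
    return [
        "modprobe",
        "g_mass_storage",
        f"file={image_path}",
        "removable=1",                   # present the LUN as removable media
        f"ro={1 if read_only else 0}",   # host-side write protection
    ]


def detach_gadget_cmd() -> List[str]:
    """Build the command that unloads the gadget (the host sees an unplug)."""
    return ["modprobe", "-r", "g_mass_storage"]


if __name__ == "__main__":
    # Print the commands instead of running them; running requires root
    # and a board whose USB controller supports device (OTG) mode.
    print(shlex.join(mass_storage_gadget_cmd("/var/lib/iot/usb.img")))
    print(shlex.join(detach_gadget_cmd()))
```

Because the control computer only ever sees a standard mass-storage device, nothing beyond its stock USB disk driver is involved on the facility side.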

DATA TRANSFER ALGORITHM ON IOT DEVICE
In this section, we propose a data transfer algorithm that integrates data transfer with user management while ensuring fault tolerance.
The specific goals of this algorithm are summarized as follows:
• Per-User Data Assignment by Integrating User Management System with Data Transfer: Although the USB driver that we use on the control-computer side is common to all facility control computers, it provides only simple mounting and unmounting mechanisms, and installing additional software is not permitted. Thus, integration with the user management system is required on the IoT device side so that each user's data are transferred to that user's private space in the cloud.
• Fault-Tolerant Data Transfer under Unreliable Network Connection and Power Supply: As with IoT devices in general, the network connection and power supply may be unexpectedly disrupted. Data must be transferred safely while preserving data completeness, and data mix-ups among users must be prevented even when the data transfer is shut down unexpectedly.
Figure 5 shows the overall usage flow of the system.We will discuss how the system achieves these goals.

Fault-tolerant and Per-User Data Transfer
First, we discuss the principle of fault-tolerant data transfer in a per-user manner. Suppose that only the USB driver is installed on the control computer side, and that the source of a data communication, i.e., the control computer or the IoT device, never discards the data until the transmission is completed and the data are stored on persistent storage (i.e., disk) at the destination. This ensures that the complete data are always stored on persistent storage at either the source or the destination, leading to the following observation.
Observation 1. Consider a data transmission scenario where data are transferred by a responsible user to a given private directory on the cloud storage. Then, in order to recover the data transmission from a disruption at any point, the user ID and the directory information are necessary.
From this observation, the algorithm must have the following properties.
Remark 1. To make the above data transmission scenario fault-tolerant, the IoT device needs to keep the user ID and directory information locally on its persistent storage before the data transfer to the cloud storage begins. Also, for safe data deletion that prevents data mix-ups among different users, the data need to be deleted before the user ID and directory information.
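The ordering constraints in Remark 1 can be illustrated with a small sketch. It is not the authors' implementation: the JSON state file, the function names, and the `upload` callback are all assumptions made for illustration. The session record is written durably before any transfer, each data file is deleted only after its upload returns, and the session record is deleted last.

```python
import json
import os
from pathlib import Path
from typing import Optional


def save_session(state_file: Path, user_id: str, directory: str) -> None:
    """Durably record who owns the data before any transfer starts."""
    tmp = state_file.with_suffix(".tmp")
    with open(tmp, "w") as f:
        json.dump({"user_id": user_id, "directory": directory}, f)
        f.flush()
        os.fsync(f.fileno())      # force the record onto the disk
    os.replace(tmp, state_file)   # atomic rename: old or new, never partial


def load_session(state_file: Path) -> Optional[dict]:
    """Return the saved session, or None if no transfer was in progress."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text())


def finalize(state_file: Path, data_dir: Path, upload) -> None:
    """Upload leftovers, delete data first, and delete the session last."""
    session = load_session(state_file)
    if session is None:
        return
    for item in sorted(data_dir.iterdir()):
        # `upload` is a hypothetical callback that raises on failure.
        upload(item, session["user_id"], session["directory"])
        item.unlink()             # data is removed only after its upload
    state_file.unlink()           # the owner record is always removed last
```

If power or network fails partway through `finalize`, the surviving session record still names the owner of whatever data remain, so a later call can resume without assigning them to a different user.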

Integration with User Management
Second, we discuss the integration of the IoT device with the user management system on the cloud side.
The goal of the integration is to safely manage and synchronize the user ID and directory information between the cloud side and the IoT device side. User authentication is performed on the cloud side. To deliver the user and directory information, the authenticated user information is first updated on the cloud side. Then, the IoT device pulls the updated user information and stores it in its local persistent storage. Such pull-based communication is more scalable than a simple push-based approach for sending messages from a central reliable server to many unreliable IoT devices.
The overall flow of the integrated system is shown in Figure 5. First, the user obtains the temporal URL code of the login system from the IoT device. The URL code is periodically updated so that the user management system can tell which user is currently accessing it from the facility room. Only if the temporal URL is the latest one and the user authentication succeeds does the user management system permit the user to access the IoT device located in the room.
Second, the user management system sends the ID of the logged-in user together with the target directory information. When the IoT device receives this information, it starts the initialization process to establish the connection to the target directory in the cloud. Specifically, it stores the user ID and directory information on its local persistent storage before activating the USB OTG gadget mode in the kernel, so that the control computer recognizes the IoT device as a USB flash drive.
Third, the facility user starts the actual data transfer from the control computer. The data are immediately transferred to the cloud. When the user finishes the data transfer, the user accesses the user management system again to stop the data transfer and log out of the system.
Finally, when the user finalizes the data transfer, the user management system requests the IoT device to finalize the data transfer. The IoT device transfers all the remaining data before deleting the data and the user ID. As discussed in the previous subsection, the data are deleted just before the user ID so that the data transfer can be recovered even if the IoT device is unexpectedly disrupted. Then, after the data transfer and the deletion process are finished, the IoT device disconnects its USB connection from the control computer.

Complete Algorithm
Finally, we discuss the algorithm that runs on the IoT device to realize fault-tolerant, per-user data transfer. The algorithm provides both the data transfer itself and a mechanism for recovering from unexpected disruptions.
As shown in Figure 6, the algorithm executes two processes: (1) temporal URL generation and (2) data transfer.

5.3.1 Temporal URL Generation Process. In the temporal URL generation process, the time information and device information (i.e., the facility name) are generated and encrypted on the IoT device side by using a reversible hash function. The URL is sent to the user via the short-range interface (e.g., display, NFC, Bluetooth). In our implementation, the IoT device shows a QR code on its display, through which each user accesses the user management system from the web browser on their mobile device. When the user accesses the user management system, it decrypts the time and device information by using the same hash function as the IoT device. If the time information is out of date, the user management system forbids the access. The URL code is periodically updated to prevent incorrect logins by earlier facility users.
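A time-limited login code of this kind can be sketched as follows. The text describes the code as encrypted with a reversible hash function; this sketch deliberately substitutes a different, standard construction, an HMAC-signed timestamp, which authenticates the facility name and issue time but does not hide them. The shared secret, the validity window, and the host name are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote

SECRET = b"shared-secret-for-illustration"   # hypothetical shared key
VALID_SECONDS = 60                           # hypothetical refresh period


def make_token(facility: str, now: Optional[float] = None) -> str:
    """Device side: bind the facility name to the current time and sign it."""
    issued = int(time.time() if now is None else now)
    payload = f"{facility}|{issued}".encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + tag).decode()


def login_url(facility: str, now: Optional[float] = None) -> str:
    """URL that the device would render as a QR code (host is a placeholder)."""
    return "https://lims.example.org/login?code=" + quote(make_token(facility, now))


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Server side: return the facility name if the code is authentic and fresh."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:
        return None
    payload, tag = raw[:-32], raw[-32:]      # SHA-256 tags are 32 bytes long
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None                          # forged or corrupted code
    facility, _, issued = payload.decode().rpartition("|")
    age = (time.time() if now is None else now) - int(issued)
    if age < 0 or age > VALID_SECONDS:
        return None                          # stale code from an earlier session
    return facility
```

Regenerating the code every `VALID_SECONDS` on the device means that a QR code photographed by an earlier user stops being accepted shortly after that user leaves the room.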

Data Transfer Process. The data transfer process begins by checking for locally stored user ID and target directory information. If they exist, it means that the IoT device was unexpectedly disrupted during the data transfer from the IoT device to the cloud. In that case, the IoT device finalizes the data transfer, i.e., it executes the data transfer and the data deletion before deleting the user ID and directory information.
Then, the data transfer process remotely obtains the user ID and directory information from the user management system. If there is no user ID and directory information, it means that the user has just logged out or that no user is currently using the facility. Then, the IoT device disconnects the USB connection from the control computer (if connected) before executing the finalization in the same way as above (i.e., it transfers the remaining data and deletes them together with the user ID and directory information).
Otherwise, if a user ID and its directory information are registered in the user management system, there are three further branches: (1) no local user is registered, (2) the remote information is exactly the same as the local information, and (3) the remote user is different from the local user.
Case (1) means that the remote user is newly logging in to the system. Thus, the IoT device immediately stores the new login information (user and directory) and establishes the USB connection to the control computer via USB OTG.
Case (2) means that a user is currently connected. The IoT device checks the file image exposed to the control computer and transfers any data it contains. Since the usual USB OTG mounting does not support real-time concurrent access, updates to files in the image are recognized only when the image is actually mounted. The IoT device therefore periodically mounts and unmounts the image until there is no more access to it from the control computer via USB OTG.
Case (3) means that another user is accessing the user management system before the previous user has successfully logged out. In this case, before establishing USB OTG connection,
By executing the above computational flow, the IoT device achieves fault-tolerant, per-user data transfer from the control computer of each facility to the cloud.
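The branching described in this subsection can be summarized as a small decision function. This is a sketch under stated assumptions rather than the authors' code: the action names are illustrative labels, the `recovering` flag stands for the check made when the process starts, and the handling of case (3) is an assumption (finalize the previous user's session before admitting the new one), because the description of that branch is incomplete in the text.

```python
from typing import List, NamedTuple, Optional


class Session(NamedTuple):
    user_id: str
    directory: str


def plan_cycle(local: Optional[Session], remote: Optional[Session],
               recovering: bool = False) -> List[str]:
    """Return the ordered actions for one polling cycle of the device."""
    finalize = ["transfer_remaining", "delete_data", "delete_session"]
    if recovering and local is not None:
        # A session record survived a crash: finish that user's transfer first.
        return finalize
    if remote is None:
        # Nobody is logged in: detach from the control computer, then clean up.
        return ["disconnect_usb"] + (finalize if local is not None else [])
    if local is None:
        # Case (1): new login. Persist the session before exposing the drive.
        return ["store_session", "connect_usb"]
    if local == remote:
        # Case (2): same user is still working; forward any new files.
        return ["transfer_new_files"]
    # Case (3): a different user logged in. ASSUMPTION: the previous session
    # is closed out first; the source text for this branch is incomplete.
    return ["disconnect_usb"] + finalize + ["store_session", "connect_usb"]
```

In every branch that ends a session, `delete_data` precedes `delete_session`, matching the deletion order required by Remark 1.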

EVALUATION
In this section, we evaluate our system. We have implemented the data transfer method of the IoT device on a Raspberry Pi 4 Model B running a Raspberry Pi Linux OS (kernel 6.1.21-v8+). The devices are connected via USB 2.0 to each of the facility computers, as the legacy facility computers support only USB 2.0. For the data management system on the cloud side, we use Nextcloud 20.6.1 on top of Red Hat Enterprise Linux release 8.5 with two Intel(R) Xeon(R) Gold 6348 CPUs (28 cores × 2 sockets) and 64 GB of memory. About 3 petabytes of DDN EXAScaler storage (Lustre) are installed for storage. For the basic data transmission, we use the WebDAV protocol through the Rclone client (v1.50.2). Since the typical data size from facilities ranges from a few MBytes to 1 GByte, we conduct the evaluation using 10 MByte, 100 MByte, and 1 GByte data files.
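As a rough illustration of how an upload through Rclone might be driven from the device, the sketch below builds an `rclone copy` command line. The remote name and paths are hypothetical placeholders, the remote is assumed to have been configured beforehand as a WebDAV remote with `rclone config`, and the exact options used in the evaluation are not given in the text.

```python
from typing import List


def rclone_copy_cmd(local_dir: str, remote: str, remote_dir: str,
                    transfers: int = 4) -> List[str]:
    """Build an `rclone copy` call that uploads a directory to a remote.

    `remote` is the name of a remote configured beforehand with `rclone config`
    (a WebDAV remote in this setting); all names here are placeholders.
    """
    if transfers < 1:
        raise ValueError("transfers must be at least 1")
    return [
        "rclone", "copy",
        local_dir,
        f"{remote}:{remote_dir}",
        "--transfers", str(transfers),   # number of files copied in parallel
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; a non-zero exit status then raises an exception, which lets the caller keep the local data and retry later.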
The summary of this evaluation is as follows:
Figure 7 shows the performance of the USB data transfer from the facility computer to the IoT devices. Since in many cases the workspace is located at a distance from the actual facility computer, extended USB cables are necessary. Thus, we evaluate the performance effect of using a USB extension. As shown in Figure 7, the performance degradation caused by the USB extension is very limited, and the transfer performance is stable across the different file sizes. Figure 8 shows the performance under low CPU power on the Raspberry Pi. The CPU power of the Raspberry Pi (and of typical IoT devices) often becomes unstable due to heat and/or power consumption. We simulate this situation by stressing CPU utilization to 400% (100% × 4 CPUs) and evaluate its performance effects. As shown in Figure 8, the performance degradation is very limited even when all CPUs are occupied.

Data Transfer from IoT Device to Cloud
Figure 9 shows the performance in the actual Wi-Fi environment of the campus, where 100 Mbps to 1 Gbps networks are installed. We measured the performance at 10 facilities and plot the average and min/max values. As shown in Figure 9, the throughput ranges from around 3 MByte/s to 8 MByte/s, which are typical values in practical use of a 100 Mbps network.
Next, we further evaluate the performance characteristics under more capable network environments. We connect the IoT device to a wired LAN environment and emulate network bandwidths of 10 Mbps, 100 Mbps, and 1 Gbps. Figure 10 shows the results of a single data transfer. As the bandwidth increases, the throughput increases, and the maximum value reaches around 31 MByte/s. Note that this maximum value exceeds the USB transfer performance shown in Figure 7. Thus, with a 1 Gbit/s network, the system reaches its peak operational performance, since the USB interface cannot be improved or replaced.
Figure 11 shows the results of multiple data transfers, where we vary the number of files from 1 to 16. On the 1 Gbit/s network (e.g., realized by wired LAN or high-end Wi-Fi such as IEEE 802.11ac, i.e., Wi-Fi 5), further improvement is achieved by using concurrent data transfer. However, the data transfer method (i.e., Rclone and WebDAV) still cannot fully utilize the 1 Gbit/s network bandwidth, because in this scenario the CPU power needed for data encryption and TLS communication, rather than the network bandwidth, is the main performance bottleneck. The effective exploitation of this high bandwidth will be addressed in our future work.
Finally, we evaluate the overall performance when deploying multiple IoT devices in the system. Based on the aforementioned performance characterization, we simulate a realistic scenario by emulating a 100 Mbps network and deploying 1 to 64 clients in a virtual environment. As shown in Figure 12, the performance drops between 8 and 64 clients when each client sends four 100 MByte files. However, the degradation of the system's performance is only around 55% when the number of clients increases from 1 to 64, i.e., a 64-fold increase in client count.
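For orientation, the link rates and throughputs quoted in this section can be related with simple unit arithmetic. The helper names below are illustrative, and 1 GByte is taken as 1000 MByte for simplicity; the only measured figure used is the roughly 31 MByte/s peak reported above.

```python
def mbps_to_mbyte_per_s(mbps: float) -> float:
    """Convert a link rate in megabits per second to megabytes per second."""
    return mbps / 8.0


def transfer_seconds(size_mbyte: float, throughput_mbyte_per_s: float) -> float:
    """Time to move a file at a sustained throughput."""
    return size_mbyte / throughput_mbyte_per_s


if __name__ == "__main__":
    for link in (10, 100, 1000):
        print(f"{link} Mbps link: at most {mbps_to_mbyte_per_s(link):.2f} MByte/s")
    # 31 MByte/s is the peak reported above; 1 GByte is taken as 1000 MByte.
    for size in (10, 100, 1000):
        print(f"{size} MByte at 31 MByte/s: {transfer_seconds(size, 31):.1f} s")
```

A 100 Mbps link caps raw throughput at 12.5 MByte/s before protocol overhead, so measurements of 3 to 8 MByte/s on campus Wi-Fi sit below that ceiling, while a 1 Gbps link (125 MByte/s) leaves ample headroom above the 31 MByte/s peak.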

CONCLUSION
In this paper, we have presented a novel IoT-based solution to enable direct and seamless data transfer from non-networked, shared-use facilities to a cloud-based system.This solution addresses a significant gap in large-scale shared-use laboratories, where network connectivity for data transfer is often not practically implementable.Our innovative data transfer algorithm not only facilitates efficient and secure data transfer but also integrates user management, ensuring that each user's data is safely assigned to its designated private storage space on the cloud.Our evaluation has demonstrated the capability of these IoT devices to transfer data to cloud storage.This represents a significant step towards modernizing data management in material research facilities.As such, the technology proposed in this paper serves as a fundamental building block towards establishing a smarter and more efficient laboratory

Figure 1: Comparison of Data Extraction from Shared-Use Facility for Materials Science

Figure 2: Overview of Our Laboratory Information Management System

Figure 5: Overall Usage Flow of the System

Figure 6: Flow of Proposed Data Transfer Algorithm

Figure 7: Data Transfer from Facility to IoT Devices via USB

Figure 11: Performance of Concurrent File Transfer

• Enabling Direct Data Transfer for Software-Inflexible Facilities: To ensure continuous, stable operation of facilities, installing additional software is often infeasible. The IoT device, connected to the control computers via USB gadget mode, acts as a USB flash drive and establishes the connection using only a USB disk driver. This achieves a direct data transfer mechanism without the need to introduce additional software or setup.
• Integrating Data Transfer with User Management for Shared Use: Facilities are shared among various users from academia and industry, but the USB disk driver handles only the mounting/unmounting of disk images and hence cannot, as it stands, separate access among different users.
• Stable USB Data Transfer from Facility Computer to IoT Device: Thanks to the legacy yet reliable USB-based technology, the data transfer from facility computers to the IoT device is highly stable. The performance is hardly affected by the condition of the cables and IoT devices.
• Acceptable Data Transfer Time When Using High-End Networks: Utilizing high-speed networks enables the system to reach its peak operational performance. The data transfer rate is ultimately constrained by the limitations of USB 2.0, establishing a point beyond which further enhancement is infeasible at this time.
• Acceptable System Scalability for Campus-wide Environment: In a campus-wide environment, the system achieves a sufficiently high data transfer performance for practical use. Even when simultaneous data transfers are conducted from all shared-use facilities within the campus, the performance degradation is kept at a low level.