Table of contents
- Abstract
- 1. Introduction
- 2. Big Data analytics
  - 2.1. Big Data definition
  - 2.2. Big Data methodology
    - 2.2.1. Apache Hadoop
    - 2.2.2. MongoDB
    - 2.2.3. Cassandra
  - 2.3. Big Data processing
    - 2.3.1. R
    - 2.3.2. Datameer
    - 2.3.3. BigSheets
- 3. Big data in upstream oil and gas industry
  - 3.1. Big Data in exploration
  - 3.2. Big Data in drilling
  - 3.3. Big Data in reservoir engineering
  - 3.4. Big Data in production engineering
- 4. Big data in downstream oil and gas industry
  - 4.1. Big Data in refining
  - 4.2. Big Data in oil and gas transportation
  - 4.3. Big Data in Health and Safety Executive (HSE)
- 5. Big Data challenges
- 6. Conclusions
- References
Abstract
This paper reviews the use of Big Data analytics, an emerging trend, in the upstream and downstream oil and gas industry. Big Data, or Big Data analytics, refers to a new technology for handling large datasets that exhibit six main characteristics: volume, variety, velocity, veracity, value, and complexity. With the recent advent of data-recording sensors in exploration, drilling, and production operations, the oil and gas industry has become a massively data-intensive industry. Applications of Big Data in the oil and gas industry include analyzing seismic and micro-seismic data, improving reservoir characterization and simulation, reducing drilling time and increasing drilling safety, optimizing the performance of production pumps, improving petrochemical asset management, improving shipping and transportation, and improving occupational safety. Although the industry has recently become more interested in Big Data analytics, challenges remain, mainly because of a lack of business support and of awareness of Big Data within the industry. Furthermore, data quality and understanding the complexity of the problem are also among the challenges facing the application of Big Data.
1. Introduction
Recent technological improvements have resulted in the daily generation of massive datasets in the oil and gas exploration and production industries. Managing these datasets has been reported to be a major concern among oil and gas companies. A report by Brule [1] stated that petroleum engineers and geoscientists spend over half of their time searching for and assembling data. Big Data refers to the new technologies for handling and processing these massive datasets. The datasets are recorded in a variety of formats and generated in large volumes in various operations of the upstream and downstream oil and gas industry [2–10]. Moreover, in most cases, if processed efficiently, they can reveal the important governing equations underlying sophisticated engineering problems. Mehta [11] reported that, in a survey of executives conducted by General Electric and Accenture, 81% considered Big Data to be among the top three priorities of oil and gas companies for 2018. According to that paper, the main reason behind this popularity is the need to improve the efficiency of oil and gas exploration and production. This outlook among executives for 2018 becomes more interesting when compared with the findings reported by Feblowitz [12] in 2013: in a 2012 survey by IDC Energy, 70% of the participants from U.S. oil and gas companies were not familiar with Big Data and its applications in petroleum engineering. This shows how interest in Big Data among oil and gas industry executives changed between 2012 and 2018.
This paper presents an extensive review of recent papers on the application of Big Data analytics in both the upstream and downstream oil and gas industry. In the first part of the paper, Big Data is defined and the processing tools are introduced. In the second part, the utilization of Big Data in the oil and gas industry is presented. In the last part, the major challenges facing Big Data analytics in the oil and gas industry are addressed.
2. Big Data analytics
2.1. Big Data definition
Big Data includes unstructured data (not organized and text-heavy) and multi-structured data (different data formats resulting from people/machine interactions) [13]. The term Big Data (also called Big Data analytics or business analytics) points to the first characteristic of this method, namely the size of the available dataset. Other characteristics of the data also make it suitable for Big Data tools. IBM named these characteristics the three Vs: volume, variety, and velocity [14]. More recent articles have added two more Vs, veracity and value, to give a better definition of Big Data [15].
Volume refers to the quantity of data or information. These data can come from any sensor or data recording tool. Such a vast quantity of data is challenging to handle because of storage, sustainability, and analysis issues [13]. Many companies hold huge volumes of data in their archives but do not have the capability to process them. The main application of Big Data is to provide processing and analysis tools for the increasing amounts of data [15].
This characteristic of Big Data can be seen in various sectors of the oil and gas industry, such as exploration, drilling, and production. During oil and gas exploration, seismic data acquisition generates a large amount of data used to develop 2D and 3D images of the subsurface layers. In offshore seismic studies, narrow-azimuth towed streamer (NATS) acquisition uses the gathered data to develop images of the underlying geology. Wide azimuth (WAZ) acquisition is a more recent innovation that captures more data and produces higher-quality images. All these tools and innovations generate more data, which require processing and analysis.
Recent innovations in drilling tools are also generating large amounts of data during drilling operations. Tools such as logging while drilling (LWD) and measurement while drilling (MWD) transmit various data to the surface in real time.
Optical fibers combined with various sensors are now being used in well tubulars to record parameters such as fluid pressure, temperature, and composition during oil and gas production [12].
The term velocity, as a characteristic of Big Data, refers to the speed of data transmission and processing. It also refers to the fast pace of data generation. The challenge with the velocity component is the limited number of available processing units compared to the volume of data. The current rate of data generation is enormous: 5 exabytes of data are generated in just two days, which is equivalent to the total amount of data created by humans until 2003 [16].
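To put the quoted figure in perspective, the arithmetic can be sketched as follows. This is only an illustration: it assumes the decimal definition of an exabyte (10^18 bytes), which the source does not specify, and the function name is hypothetical.

```python
# Convert "5 exabytes in two days" into an average sustained data rate.
# Assumption: 1 exabyte = 10**18 bytes (decimal SI definition).
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_bytes_per_second(total_bytes, days):
    """Return the average generation rate in bytes per second."""
    return total_bytes / (days * SECONDS_PER_DAY)


rate = average_rate_bytes_per_second(5 * EXABYTE, 2)
print(f"{rate / 10**12:.1f} terabytes per second")
```

Under this assumption the average works out to roughly 29 terabytes per second, which illustrates why the number of available processing units becomes the limiting factor.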
The velocity characteristic is even more prominent in the oil and gas industry because of the complex nature of many petroleum engineering problems. It is impossible for an individual to process the large amount of data generated for a complex problem; attempting to do so results in significant delay and uncertainty. In many cases, fast, real-time processing of data is crucial in the oil and gas industry. For example, fast processing of well data during drilling can help to identify kicks and prevent destructive blowouts efficiently [12].
Variety refers to the various types of data that are generated, stored, and analyzed. Data recording devices and sensors differ in type, and as a result the generated data can differ in size and format. The generated data can be text, image, audio, or video. More technically, the data can be classified as structured, semi-structured, or unstructured [16]. It is reported that, in general, 90% of generated data is unstructured [15]. However, the majority of the data generated in the oil and gas industry, from SCADA systems, surface and subsurface facilities, drilling, and production, are structured. These can be time series data recorded over a certain period of time. Other sources of structured data include asset, risk, and project management reports. There are also external structured data sources, such as market prices and weather data, which can be used for forecasting. The sources of unstructured data in the oil and gas industry include well logs, daily written drilling reports, and CAD drawings. The sources of semi-structured data include processed data resulting from modeling and simulation. Various experimental and computer simulation practices in the oil and gas industry generate data for further analysis. These data can be categorized as semi-structured and later used with Big Data tools [12].
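The practical difference between the three classes can be illustrated with a minimal sketch. All records, field names, and values below are hypothetical and serve only to show how each class is typically read: structured data by column name, semi-structured data by navigating self-describing fields, and unstructured data by text processing.

```python
import csv
import io
import json
import re

# Structured: fixed columns, e.g. a time series exported from a SCADA system.
scada_csv = (
    "timestamp,wellhead_pressure_psi\n"
    "2020-01-01T00:00,1510\n"
    "2020-01-01T00:01,1512\n"
)
rows = list(csv.DictReader(io.StringIO(scada_csv)))
pressures = [float(r["wellhead_pressure_psi"]) for r in rows]

# Semi-structured: self-describing fields that may vary between records,
# e.g. the output of a simulation run.
sim_json = '{"run_id": 7, "results": {"recovery_factor": 0.31}, "notes": "base case"}'
recovery = json.loads(sim_json)["results"]["recovery_factor"]

# Unstructured: free text, e.g. a daily drilling report. Information has to be
# extracted with text processing rather than read from a known field.
report = "Drilled 8.5 in hole to 2450 m. Observed tight spot at 2310 m."
depths_m = [int(d) for d in re.findall(r"(\d+) m\b", report)]

print(pressures, recovery, depths_m)
```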
Veracity refers to the quality and usefulness of the available data for analysis and decision making. It is about distinguishing between clean and dirty data. This is very important because dirty data can significantly affect the speed and accuracy of data analysis. The generated data should be professionally and efficiently processed and filtered before being used for data analysis; otherwise, the results will not be reliable. Data veracity is challenging in the oil and gas industry specifically because of the nature of the data, which come mainly from subsurface facilities and may include uncertainty. Another challenge comes from data collected by conventional manual recording, which is done by human operators.
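A minimal sketch of separating clean from dirty readings is given below. The plausible pressure range, the function name, and the sample values are assumptions made for illustration; real quality-control rules depend on the sensor and the field.

```python
import math

# Hypothetical plausible range for a downhole pressure gauge, in psi.
PRESSURE_RANGE_PSI = (0.0, 20000.0)


def clean_readings(readings):
    """Keep only readings that are present, finite, and physically plausible."""
    low, high = PRESSURE_RANGE_PSI
    cleaned = []
    for value in readings:
        if value is None:
            continue  # missing value, e.g. a skipped manual entry
        if not math.isfinite(value):
            continue  # NaN or infinity from a faulty sensor
        if not low <= value <= high:
            continue  # outside the plausible physical range
        cleaned.append(value)
    return cleaned


raw = [3120.5, None, 3121.0, float("nan"), -999.25, 3122.4, 1.0e9]
cleaned = clean_readings(raw)
print(cleaned)
```

Only the three plausible readings survive; the missing entry, the NaN, the placeholder value, and the out-of-range spike are discarded before any analysis.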
Value is a very significant characteristic of Big Data. The return on investment in Big Data infrastructure is of great importance. Big Data analytics examines huge datasets to reveal the underlying trends and help engineers forecast potential issues. Knowing the future performance of the equipment used during operations and identifying failures before they happen can give a company a competitive advantage and bring it value.
The literature also states that, besides these five Vs, there is another important characteristic that should be considered when applying Big Data: the complexity of the problem for which the data are gathered [17]. Dealing with large datasets that come from a complex computing problem is difficult, and finding the underlying trend can be challenging. For such problems, Big Data tools can be very helpful.
Fig. 1 summarizes the above-mentioned characteristics of Big Data.
2.2. Big Data methodology
Because Big Data involves huge datasets and, in some cases, complicated problems, it is very important to have access to innovative and powerful technologies. These technologies must process data quickly and accurately. This section lists and introduces the tools and technologies available for Big Data analytics.
2.2.1. Apache Hadoop
Apache Hadoop is an open-source framework created by Doug Cutting and Mike Cafarella in 2005 and named after a toy elephant [15]. Hadoop is written in Java [14] and uses distributed processing across enormous clusters of computers [15]. It is capable of processing huge datasets in parallel, which results in scalable computing. Apache Hadoop comprises two major layers: the Hadoop distributed file system (HDFS) and MapReduce. In fact, Apache Hadoop is a framework for implementing the MapReduce programming model [18]. Tasks are handled in two major phases. The first phase, storing data, is done in the HDFS layer, which has a master/slave architecture consisting of a master server called the NameNode and clusters of slaves called DataNodes. Fig. 2 shows the architecture of the HDFS layer.
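The division of labour between the NameNode and the DataNodes can be illustrated with a toy model: a file is split into fixed-size blocks, each block is copied to several DataNodes, and the NameNode only keeps the map of which block lives where. The sketch below is not Hadoop code. The function name, the sizes, and the round-robin placement are assumptions made for illustration, and the real HDFS placement policy is more elaborate.

```python
def place_blocks(file_size_mb, block_size_mb, datanodes, replication):
    """Return NameNode-style metadata: block index -> DataNodes holding a copy."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    n_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    # Simplified round-robin placement (an assumption, not the HDFS policy).
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


block_map = place_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"], replication=3)
for block, nodes in block_map.items():
    print(block, nodes)
```

In this toy layout, losing any single DataNode still leaves two copies of every block, which is the idea behind the fault tolerance attributed to Hadoop.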
The second phase of handling tasks, which includes tracking and executing jobs, takes place in the MapReduce layer. The master node for MapReduce is called the JobTracker and the slave node is called the TaskTracker [18]. In other words, data processing and analysis in Hadoop are conducted in two phases, called the Map phase and the Reduce phase. MapReduce can handle large datasets in parallel by using multiple clusters. These clusters are scalable, flexible, and fault-tolerant [18]. In the Map phase, the data are divided into keys and values; here, the key is the node ID and the value is the property of the node. MapReduce thus takes the input data as key-value pairs, and the JobTracker assigns tasks to the TaskTrackers, which carry out further processing of the data. The output of the Map phase is then sorted and stored in a local file system during an intermediate phase. In the next step, the sorted data are passed to the Reduce phase, where they are combined [20]. Fig. 3 shows the architecture of MapReduce.
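The Map, intermediate sort, and Reduce steps described above can be sketched in a single Python process. This is a conceptual illustration only: it does not use the Hadoop API, nothing is distributed, and the record layout (well ID, event type) and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair for every record of interest."""
    for well_id, event in records:
        if event == "kick":
            yield well_id, 1


def shuffle_and_sort(pairs):
    """Group values by key and sort by key, as in the intermediate phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped}


records = [
    ("W-1", "kick"), ("W-2", "normal"), ("W-1", "kick"),
    ("W-3", "kick"), ("W-2", "kick"),
]
counts = reduce_phase(shuffle_and_sort(map_phase(records)))
print(counts)
```

In a real cluster the map and reduce functions run on many TaskTrackers at once; the key-value contract between the phases is what makes that parallelism possible.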
2.2.2. MongoDB
MongoDB is a NoSQL (non-relational) database technology that is document-oriented, based on JSON, and written in C++. JSON is a data format based on JavaScript and is built on a collection of name/value pairs or an ordered list of values. NoSQL database technology can handle unstructured data such as documents, multimedia, and social media. Moreover, MongoDB provides a dynamic and flexible structure that can be customized to fit the requirements of various users [13,21–24].
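The two JSON building blocks mentioned above, name/value pairs and ordered lists of values, can be seen in a small example. The sketch uses only Python's standard `json` module rather than a database driver, and the document contents (field names and values) are hypothetical.

```python
import json

# A hypothetical well record as a JSON-style document: name/value pairs,
# a nested document, and an ordered list of values.
well_document = {
    "well_id": "W-1",
    "operator": "ExampleCo",
    "location": {"lat": 57.1, "lon": 1.9},
    "formation_tops_m": [850, 1420, 2310],
}

encoded = json.dumps(well_document)   # serialize to JSON text
decoded = json.loads(encoded)         # parse it back into Python objects

print(encoded)
print(decoded["formation_tops_m"][1])
```

Because each document carries its own field names, two records in the same collection need not share the same fields, which is the flexibility the text attributes to document-oriented stores.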
2.2.3. Cassandra
Cassandra is another NoSQL database technology, which is key- and column-oriented. Cassandra began as a Facebook project and was open-sourced a few years later. It is especially worthwhile when it is possible to invest time in learning a complex system that provides a lot of power and flexibility [23].
2.3. Big Data processing
The large datasets that are collected need to be analyzed to extract the valuable underlying information. Various processing tools translate large datasets into meaningful results and outcomes. The following is a list of common processing tools for Big Data.
2.3.1. R
R is a modern, functional programming language, initially created by Robert Gentleman and Ross Ihaka, that allows for rapid development of ideas and offers object-oriented features for rigorous software development. Its powerful set of built-in functions makes it ideal for high-volume analysis or statistical simulations. It also supports a packaging system, which means that code written by others can easily be shared. Finally, it generates high-quality graphical output, so that all stages of a study, from modeling and analysis to publication, can be undertaken within R [25].
R can be regarded as a specialized language that includes various modules and toolboxes, mainly to facilitate statistical computations. It helps with loading data, conducting complicated computations, and visualizing the results and outputs. However, from a data processing point of view, R's major drawback is that it is limited to datasets that fit within a single machine's memory [23].
2.3.2. Datameer
Datameer is an easy-to-use programming platform that uses Hadoop to improve its data processing. It comes with user-friendly data importing and output visualization tools. It is expected to gain more interest because it offers a user-friendly interface for conducting various data processing tasks [23].
2.3.3. BigSheets
IBM offers a web application called BigSheets, which helps non-expert and non-technical users gather unstructured data from various online and internal sources, conduct a data analysis, and present the results with simple visualization tools. BigSheets also utilizes Hadoop to process massive datasets. It employs additional tools, such as OpenCalais, to facilitate the extraction of structured data from a pool of unstructured data. The tool is intended for individual data analysis and is easier to use for those familiar with spreadsheet applications [23].
3. Big data in upstream oil and gas industry
The application of Big Data now extends beyond database, marketing, and business techniques. Many engineering disciplines are utilizing Big Data analytics for various applications. Recently, the upstream oil and gas industry has also been impacted by the versatility of Big Data. The application of Big Data has become prominent as the amount of data generated and recorded in the oil and gas industry has increased significantly. Improvements in seismic acquisition devices, channel counts, fluid front monitoring geophones, carbon capture and sequestration sites, and LWD and MWD tools have provided vast amounts of data to be processed and analyzed [26]. Anand [27] presents an informative description of why and how Big Data can now reveal a great deal of hidden information from the vast amount of data available in the oil and gas industry. He used a 3D plot to show the relationship between data, science, technology, engineering, and mathematics (STEM) tools, and pattern recognition. As shown in Fig. 4, if a limited amount of data is used with basic STEM tools, the result reveals limited patterns, which may lack thorough insight and may carry significant uncertainty. However, if a large dataset is available and used with more sophisticated STEM tools, more promising patterns can be recognized, which may be much closer to the true values [27].
3.1. Big Data in exploration
Interpreting seismic data requires sophisticated processing computers with powerful visualization capabilities. With the recent improvements in seismic devices, the amount of generated data has increased significantly. The detailed interpretation of these new datasets needs to go beyond conventional methods. In fact, one of the most important applications of Big Data in the oil and gas industry is analyzing seismic data [28]. Machine learning tools can reveal the relationships within the recorded data more efficiently, especially when dealing with huge datasets. In a study by Roden [29], the author combined principal component analysis (PCA) with self-organizing maps (SOM) to carry out multi-component seismic analysis. The analysis proceeded in five stages. In the first stage, the geological issue was clearly defined. In the second stage, PCA was run to identify the key attributes related to the defined problem. In the third stage, SOM was run, employing machine learning tools to train a prediction tool. In the fourth stage, the outcomes of the SOM analysis were further analyzed with 2D maps to identify the important geological features. Finally, in the fifth stage, a sensitivity analysis was conducted to refine the results by considering various attributes and different training scenarios [29].
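The second and third stages (PCA to pick key attributes, then SOM training) can be sketched in plain Python on a tiny synthetic dataset. This is an illustration of the general technique, not the workflow or software used in [29]. The eight samples, the loading threshold of 0.3, the two-node map, and the learning schedule are all assumptions chosen to keep the example small.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    return [[r[j] - means[j] for j in range(n_cols)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, m = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(m)]
            for i in range(m)]


def first_component(cov, iterations=100):
    """Dominant eigenvector of cov by power iteration (first principal axis)."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


def best_matching_unit(nodes, sample):
    """Index of the SOM node closest to the sample (squared distance)."""
    dists = [sum((w - x) ** 2 for w, x in zip(node, sample)) for node in nodes]
    return dists.index(min(dists))


def train_som(samples, n_nodes=2, epochs=50):
    """Train a tiny one-dimensional SOM; nodes start at evenly spaced samples."""
    step = (len(samples) - 1) // (n_nodes - 1)
    nodes = [list(samples[i * step]) for i in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        rate, sigma = 0.5 * decay, 1.0 * decay + 0.01
        for sample in samples:
            bmu = best_matching_unit(nodes, sample)
            for k, node in enumerate(nodes):
                # Neighbours of the winning node are pulled along, less strongly.
                influence = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                for j in range(len(node)):
                    node[j] += rate * influence * (sample[j] - node[j])
    return nodes


# Synthetic samples with three attributes; two groups differ in attributes 0 and 1.
rows = [
    (0.0, 0.1, 0.5), (0.1, 0.0, 0.4), (0.2, 0.2, 0.6), (0.1, 0.2, 0.5),
    (1.0, 0.9, 0.5), (0.9, 1.0, 0.6), (0.8, 0.8, 0.4), (0.9, 0.8, 0.5),
]
centered = center(rows)
pc1 = first_component(covariance(centered))
key_attributes = [j for j, w in enumerate(pc1) if abs(w) > 0.3]
reduced = [[r[j] for j in key_attributes] for r in centered]
nodes = train_som(reduced)
labels = [best_matching_unit(nodes, s) for s in reduced]
print(key_attributes, labels)
```

The first principal component loads almost entirely on the first two attributes, so only those are passed to the SOM, which then assigns the two groups of samples to different map nodes.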
In another study by Joshi et al. [30], Big Data was utilized to analyze micro-seismic datasets in order to model fracture propagation maps during hydraulic fracturing. The authors used the Hadoop platform instead of conventional tools to manage the massive datasets generated by micro-seismic tools. They used various datasets from exploration, drilling, and production operations to characterize the reservoir. Furthermore, the success ratio was improved by detecting potential anomalies based on previous failed jobs [30].
In a study by Olneva et al., Big Data was used to cluster 1D, 2D, and 3D geological maps of the West Siberian Petroleum Basin with seismic data. They followed two different approaches, which the authors called "from general to particulars" and "from particulars to general". For the first approach, they used drilling data and regional maps for 5000 wells. For the second approach, they used seismic and geological patterns covering more than 40,000 km² [31].
3.2. Big Data in drilling
There are various sources of data in the drilling industry, mainly data generated at the digital rig site and data entered manually by human operators. These data, gathered from different operations during drilling, can be used for analyses ranging from scheduling to the drilling operation itself. The invention and application of new data recording tools and data formats have made Big Data tools even more applicable to drilling operations. More than 60 different sensors now record various parameters throughout drilling operations [32]. In a work by Duffy et al. [33], drilling rig efficiency was improved by implementing best-safe-practices initiatives identified by an automated drilling state detection monitoring service. In their case study on pad drilling in the Bakken, they focused on weight-to-weight (W2W) connection time during drilling operations. Their results showed savings of more than 11.75 days on a single pad of nine wells drilled by the same rig. They also found that the total non-drilling time improved by 45%. In another study by Maidla et al. [34], drilling performance was improved by applying Big Data analytics that included drilling and formation parameters. Data from the morning report, the electronic drilling recorder (EDR), and cross-plots of weight on bit (WOB) and differential pressure were used to optimize drilling performance. The authors emphasized that data filtering, quality control, and knowledge of the basic physics behind the problem under study are critical factors that should be considered in order to find a reliable optimized outcome. Otherwise, the findings can be misleading, resulting in a loss of time and resources.
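As a rough illustration of how a connection-time metric might be derived from rig sensor data, the sketch below measures how long the weight on bit stays below a threshold between two drilling intervals. This is a simplified proxy and not the state detection service used in [33]. It assumes that W2W time runs from the moment weight comes off the bit to the moment weight is back on the bit; the threshold, units, and sample values are hypothetical.

```python
def connection_times(samples, wob_threshold=2.0):
    """Return durations (seconds) of intervals where WOB stays below the threshold.

    samples: list of (time_seconds, weight_on_bit) tuples in time order.
    """
    durations = []
    start = None
    for t, wob in samples:
        if wob < wob_threshold and start is None:
            start = t                    # weight comes off the bit
        elif wob >= wob_threshold and start is not None:
            durations.append(t - start)  # weight is back on the bit
            start = None
    return durations


samples = [
    (0, 25.0), (60, 24.0), (120, 0.5), (180, 0.0), (420, 0.3),
    (480, 22.0), (540, 23.5), (600, 0.2), (900, 21.0),
]
w2w = connection_times(samples)
print(w2w)
```

Aggregating such durations per crew, rig, or well is one way the kind of comparison reported in the case study could be set up.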
In another study by Yin et al. [35], Big Data was used to find the invisible non-production time (INPT) from collected real-time logging data. The authors improved drilling operations by optimizing the INPT using mathematical statistics, the artificial experience, and cloud computing.
In a study by Johnston and Guichard [36], Big Data was employed to reduce the risks associated with drilling operations. They used drilling data, well logging data, and geological formation tops for about 350 oil and gas wells in the UK North Sea. They dealt with different file formats, such as .txt, .xls, .pdf, and .las. They reported that the challenging part of the project was the data gathering and processing step.
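Of the formats listed, .las (Log ASCII Standard) is the one specific to well logging. The sketch below reads curve names and data rows from a minimal LAS 2.0-style text. It is a simplified reader written for illustration, not a full implementation of the standard and not the tooling used in [36]; the sample file content is hypothetical.

```python
LAS_TEXT = """~Version Information
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
~Well Information
 NULL.   -999.25 : NULL VALUE
~Curve Information
 DEPT.M   : Depth
 GR  .GAPI : Gamma ray
~A  DEPT   GR
 1000.0  85.2
 1000.5  -999.25
 1001.0  90.1
"""


def parse_las(text):
    """Parse curve names and data rows from a minimal LAS 2.0 string."""
    section, null_value, curves, rows = None, None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line.startswith("~"):
            section = line[1].upper()     # V, W, C, A, ...
            continue
        if section == "A":
            values = [float(tok) for tok in line.split()]
            # Replace the declared null value with None.
            rows.append([None if v == null_value else v for v in values])
        elif section in ("W", "C"):
            mnemonic, rest = line.split(".", 1)
            head = rest.rsplit(":", 1)[0]
            unit, _, data = head.partition(" ")
            if section == "C":
                curves.append(mnemonic.strip())
            elif mnemonic.strip() == "NULL":
                null_value = float(data.strip())
    return curves, rows


curves, rows = parse_las(LAS_TEXT)
print(curves)
print(rows)
```

Converting every source format into a common in-memory representation like this is typically the first step before the datasets from different wells can be analyzed together.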
In a study by Hutchinson et al. [37], data from downhole vibration sensors were utilized to characterize drill string dynamics. They combined the actual data with simulation data to develop a drilling automation application. Their model reduced the risk of drilling failures and also lowered drilling development costs.
3.3. Big Data in reservoir engineering
The advent of distributed downhole sensors such as distributed temperature sensors (DTS), discrete distributed temperature sensors (DDTS), distributed acoustic sensors (DAS), single-point permanent downhole gauges (PDG), and discrete distributed strain sensors (DDSS) has resulted in the generation of huge amounts of data in the field of reservoir characterization. Bello et al. [38] used these data to develop a reservoir management application based on Big Data analytics. The four major components of their application were a visualizer, downhole data filtering, a model builder, and a model application. The visualizer helped with data viewing and analysis, while the filtering component was used to eliminate outliers and unreliable data. For the model builder, machine learning tools were used for training, model development, and validation. They used the Apache Spark machine learning library (MLlib) to conduct the Big Data analytics. They also showed that transferring the developed model to a web-based platform can facilitate user/system interactions [38].
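The role of the filtering component can be illustrated with a common robust outlier rule, the modified z-score based on the median absolute deviation (MAD). The source does not state which filter was used in [38], so this technique, the threshold of 3.5, and the sample temperatures are assumptions made for illustration.

```python
import statistics


def remove_outliers(values, threshold=3.5):
    """Drop points whose MAD-based modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return list(values)  # no spread to judge outliers against
    return [v for v in values if 0.6745 * abs(v - median) / mad <= threshold]


# Hypothetical downhole temperature readings with two spurious spikes.
temperatures = [85.1, 85.3, 85.2, 120.0, 85.4, 85.2, 12.0, 85.3]
filtered = remove_outliers(temperatures)
print(filtered)
```

The median and MAD are used instead of the mean and standard deviation because the spikes themselves would otherwise distort the statistics used to detect them.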
Recently, a new generation of reservoir simulation techniques has become more popular. This new technique combines artificial intelligence and data mining technologies with Closed-Loop Reservoir Management (CLRM) and Integrated Asset Modeling (IAM). The result is an innovative, information-oriented reservoir modeling approach. In fact, data-driven methods can improve the modeling by predicting the influential parameters that theory-based equations of state cannot capture [1,39].
In a study by Haghighat et al. [40], Big Data and data-driven methods were utilized to improve CO₂ sequestration by predicting the possibility of CO₂ leakage. For this purpose, two permanent downhole gauges (PDG) were installed in an observation well to collect pressure data. Different CO₂ leakage scenarios were modeled using a simulated reservoir model of the field of interest (Citronelle Dome, Alabama). Machine learning tools were used to analyze the high-volume, high-frequency pressure data. Finally, the authors were able to develop a real-time, long-term CO₂ leakage detection system.
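The idea of real-time detection from a pressure stream can be illustrated with a one-sided CUSUM change detector. This is a standard statistical technique substituted here for illustration; it is not the machine learning method used in [40]. The sketch assumes that a leak appears as a sustained pressure drop relative to a known baseline, and the baseline, slack, limit, and readings are hypothetical.

```python
def cusum_alarm(pressures, baseline, slack, limit):
    """One-sided CUSUM: index of the first alarm for a sustained drop, else None."""
    cumulative = 0.0
    for i, pressure in enumerate(pressures):
        # Accumulate drops below the baseline that exceed the allowed slack.
        cumulative = max(0.0, cumulative + (baseline - pressure) - slack)
        if cumulative > limit:
            return i
    return None


# Hypothetical gauge readings in psi: stable at first, then drifting downward.
readings = [4500.2, 4499.8, 4500.1, 4499.9, 4498.0, 4497.5, 4497.0, 4496.5]
alarm_index = cusum_alarm(readings, baseline=4500.0, slack=0.5, limit=5.0)
print(alarm_index)
```

Small fluctuations around the baseline are absorbed by the slack term, while a persistent drift accumulates until it crosses the limit, which keeps false alarms low on noisy high-frequency data.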
In a study carried out by Popa et al. [41], Big Data was used to optimize heavy oil reservoirs under steam-assisted gravity drainage (SAGD) and cyclic steam operations. The authors focused on Chevron's San Joaquin heavy oil reservoirs, which together included more than 14,200 wells. This number of wells provided a vast amount of structured and unstructured, static and dynamic data, including various logs, temperature, steam, core data, fluid saturation, well completion data, geological features, steam injection rate and pressure, and flow-line and wellhead temperature. Their workflow comprised three steps: (1) data acquisition, (2) data transfer to the business domain, and (3) data storage [41].
Big Data has also been used for reservoir modeling of unconventional oil and gas resources [42,43]. Lin [42] combined physics-based and analytics-based solutions to carry out reservoir modeling with Big Data.
Udegbe et al. [44] used Big Data to improve the modeling of hydraulically fractured reservoirs by analyzing production data. They generated the required data by developing a dual-permeability model and varying the fracture parameters. They then applied a pattern recognition methodology, similar to face detection technology, to the generated data to reveal its underlying trends.
Big Data has also been used to optimize the selection and application of costly enhanced oil recovery (EOR) methods. In a study by Xiao and Sun [45], the researchers employed Big Data analytics to optimize the application of EOR projects through an improved hydrodynamic reservoir simulation.
3.4 Big Data in production engineering
Seemann et al. [46] from Saudi Aramco developed a smart forecast and flow method to conduct automated decline analysis. Their goal was to identify the underlying pattern in production data and to forecast production performance.
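To make the idea of automated decline analysis concrete, the following is a minimal sketch that fits an exponential decline curve, q(t) = qi * exp(-D * t), to production rates and extrapolates it. The exponential form, the function names, and the synthetic data are illustrative assumptions; the source does not state which decline model Seemann et al. used.

```python
import math


def fit_exponential_decline(times, rates):
    """Least-squares fit of ln(q) = ln(qi) - D * t.

    Returns (qi, D). Assumes all rates are strictly positive.
    """
    logs = [math.log(q) for q in rates]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    qi = math.exp(y_mean - slope * t_mean)
    return qi, -slope


def forecast_rate(qi, decline, t):
    """Extrapolate the fitted curve to time t."""
    return qi * math.exp(-decline * t)


# Synthetic monthly rates (hypothetical values, not field data).
months = list(range(12))
observed = [1000.0 * math.exp(-0.05 * t) for t in months]

qi, decline = fit_exponential_decline(months, observed)
rate_at_24_months = forecast_rate(qi, decline, 24)
```

Fitting in log space keeps the problem linear, which is convenient when the same fit has to be repeated automatically over many wells; a production workflow would also need to handle shut-in periods and noisy rates.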
Rollins et al. [47] conducted a study for Devon Energy to develop a production allocation technique using Big Data. In the first task, they used publicly available data from IHS to develop an allocation methodology. In the next step, Big Data was used as a platform to run the allocation procedure for the users. The Big Data processing tool in their study was Hadoop. Finally, they developed a user-friendly, map-based visual output for the allocated production data.
Moreover, Big Data has been successfully used to optimize the performance of electric submersible pumps (ESPs) [48,49]. Sarapulov and Khabibullin [48] used Big Data to evaluate the performance of ESPs by identifying emergency situations such as overheating and unsuccessful start-ups. For their study, about 200 million log records were gathered from 1649 wells over one year. The raw data came in various formats, so the authors first converted all of it to CSV format.
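Once the logs are in CSV form, a first screening pass for an emergency condition such as overheating can be as simple as the sketch below. The column names (`well_id`, `timestamp`, `motor_temp_c`) and the 120 °C limit are hypothetical placeholders chosen for illustration; the source does not describe the actual log schema or thresholds.

```python
import csv
import io


def flag_overheating(csv_text, limit_c=120.0):
    """Return (well_id, timestamp) pairs whose motor temperature exceeds limit_c.

    Column names and the default limit are illustrative assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["well_id"], row["timestamp"])
        for row in reader
        if float(row["motor_temp_c"]) > limit_c
    ]


# Tiny hypothetical log extract.
SAMPLE_LOG = """well_id,timestamp,motor_temp_c
W-001,2017-01-01T00:00,95.0
W-001,2017-01-01T01:00,131.5
W-002,2017-01-01T00:00,88.2
"""

alerts = flag_overheating(SAMPLE_LOG)
```

At the scale reported in the study (hundreds of millions of records), the same row-wise rule would typically run on a distributed engine rather than in a single process.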
In a study by Palmer and Turland [50], Big Data was used to optimize the performance of rod pump wells through a three-step workflow. The first step was data acquisition, covering well test data, well equipment data, and supervisory control and data acquisition (SCADA) data. The second step consisted of automated workflows that performed the calculations required to develop the model. The third step was interactive data visualization, which provided a user-friendly interface for extracting the results [50].
Shale operators are also using Big Data to improve hydraulic fracturing projects. In a project by the shale operator Southwestern Energy, field and simulation data revealed that proppant loading and the spacing between fracturing stages significantly affect the productivity index [51].
In another study, conducted by Ockree et al. [52], Big Data was used to develop AI-based production type curves that were combined with economic analysis to guide field development. The first step of their work was an extensive data processing pipeline that included gathering raw data (from structured and unstructured databases), filtering the data, joining the filtered data, and transferring the result to the machine learning pipeline. The authors used a robust Mahalanobis technique to remove outliers from the gathered data.
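The outlier filter mentioned above can be illustrated with a small sketch. Note the simplification: this version uses the classical sample mean and covariance for two features, whereas a robust variant (as named in the study) would replace them with robust estimates. The function name, the cutoff of 3.0, and the sample points are assumptions made for illustration.

```python
def mahalanobis_filter(points, threshold=3.0):
    """Keep 2-D points whose Mahalanobis distance from the sample mean
    is at most `threshold`.

    Uses the classical (non-robust) mean and covariance; a robust
    estimator would be substituted in a robust Mahalanobis method.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    kept = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Squared distance using the closed-form inverse of a 2x2 covariance.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 <= threshold ** 2:
            kept.append((x, y))
    return kept


# Hypothetical data: a regular cluster plus one far-away point.
levels = (-1.5, -0.5, 0.5, 1.5)
cluster = [(x, y) for x in levels for y in levels]
samples = cluster + [(10.0, 10.0)]

kept = mahalanobis_filter(samples)
```

Because a single extreme point inflates the classical covariance, large outliers can partly mask themselves; that masking effect is the usual motivation for robust estimators.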
4. Big data in downstream oil and gas industry
4.1. Big Data in refining
In a study by Plate [53], the application of Big Data in refining is reviewed. In this case study, historical data were analyzed and processed in a three-step procedure to improve petrochemical asset management. The equipment of interest was a four-stage cracked gas compressor (CGC). The analysis started by predicting the performance of the CGC from current and historical operating data. In the next phase, the performance prediction was further tuned based on the device's end-of-life criteria and failure conditions. Finally, the estimated performance of the CGC was presented in a user-friendly visual report to support management decisions [53]. Predictive reports of this kind, developed through data analysis, can significantly reduce downtime and maintenance costs.
In a recent project by Repsol SA, Big Data analytics is being used to optimize the management of one of the company's integrated refineries in Spain. For this project, Google Cloud is to provide Repsol with data analytics products and consultation as well as Google Cloud machine learning services [54].
In a study by Khvostichenko and Makarychev-Mikhailov [17], Big Data was used to develop a workflow for investigating the effects of completion parameters on well productivity. They gathered data from 4500 wells that had received slickwater treatments and investigated the effects of two different chemicals, namely linear guar gels and surfactant-based flowback aids. They also gathered monthly production data from the IHS Energy database. The statistical approach used to analyze the data was the t-test.
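As an illustration of the statistical comparison mentioned above, the sketch below computes a two-sample t statistic for two groups of wells. It uses Welch's unequal-variance form, which is an assumption; the source says only that a t-test was used. The group values are made-up numbers, not data from the study.

```python
import math
import statistics


def welch_t_test(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical productivity values for wells without / with an additive.
without_additive = [1.0, 2.0, 3.0, 4.0, 5.0]
with_additive = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t_test(without_additive, with_additive)
```

The resulting statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value; that lookup is left out here to keep the sketch dependency-free.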
4.2. Big Data in oil and gas transportation
Anagnostopoulos [55] conducted research on applying Big Data analytics to improve shipping performance. The study aimed to predict propulsion power in order to improve ship performance and, consequently, to lower greenhouse gas emissions. The data were collected over a period of three months from sensors installed throughout an LCTC (Large Car Truck Carrier) M/V. In the next step, eXtreme Gradient Boosting (XGBoost) and Multi-Layer Perceptron (MLP) neural networks were used to analyze the data.
4.3. Big Data in Health and Safety Executive (HSE)
In a study by Park et al. [56], Big Data was used to develop an energy efficiency model based on operational data gathered during ship operations. In their study, an indicator called the energy efficiency operational indicator (EEOI) was estimated from publicly available automatic identification system data and marine environment data. Energy efficiency was defined as the ship's fuel consumption, derived from engine power, relative to the transported weight and the distance traveled. To implement Big Data, the authors used the Hadoop framework, with Apache Spark for the machine learning tasks.
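The ratio described above can be written out as a short calculation. The sketch follows the commonly used form of the indicator, CO2 emitted per unit of transport work, summed over voyages; this exact form, the dictionary keys, and the default carbon factor of 3.114 t CO2 per tonne of fuel (a value often quoted for heavy fuel oil) are assumptions for illustration and are not taken from the study.

```python
def eeoi(voyages, carbon_factor=3.114):
    """Energy efficiency operational indicator in t CO2 per (tonne * nautical mile).

    Each voyage is a dict with hypothetical keys:
      fuel_tonnes  - fuel consumed on the voyage
      cargo_tonnes - cargo mass carried
      distance_nm  - distance sailed in nautical miles
    carbon_factor is an assumed fuel-to-CO2 conversion factor.
    """
    total_co2 = sum(v["fuel_tonnes"] * carbon_factor for v in voyages)
    transport_work = sum(v["cargo_tonnes"] * v["distance_nm"] for v in voyages)
    if transport_work == 0:
        raise ValueError("transport work is zero; EEOI is undefined")
    return total_co2 / transport_work


# Single made-up voyage: 100 t of fuel, 10,000 t of cargo, 1,000 nm.
example_voyages = [
    {"fuel_tonnes": 100.0, "cargo_tonnes": 10000.0, "distance_nm": 1000.0},
]
example_eeoi = eeoi(example_voyages)
```

In the study's setting, fuel consumption is not measured directly but estimated from engine power, so the `fuel_tonnes` input would itself come from a model fed by AIS and marine environment data.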
In a study conducted by Tarrahi and Shadravan [57], Big Data analytics was used to improve occupational safety in oil and gas by managing risk. The study was based on a Bureau of Labor Statistics (BLS) data set that included 846 sources of injury from 1278 industries between 2011 and 2014. The first step was data collection and processing. For this purpose, the authors filtered the raw data based on the quality of the records and eliminated outliers from the data sets using the relative standard error. They then produced structured data through format conversion and data decoding. In the next step, they conducted data clustering and mapping to identify hidden trends. Finally, to present an easily understandable outcome, they used multi-dimensional statistical analysis [57,58].
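A filter based on relative standard error (RSE) can be sketched as follows, using the usual definition RSE = 100 * standard error / |estimate|. The record layout and the 30% cutoff are hypothetical; the source does not give the threshold the authors applied.

```python
def relative_standard_error(estimate, standard_error):
    """RSE in percent: 100 * standard_error / |estimate|."""
    if estimate == 0:
        raise ValueError("RSE is undefined for a zero estimate")
    return 100.0 * standard_error / abs(estimate)


def filter_by_rse(records, max_rse=30.0):
    """Keep records whose RSE does not exceed max_rse (an assumed cutoff)."""
    return [
        r for r in records
        if relative_standard_error(r["estimate"], r["standard_error"]) <= max_rse
    ]


# Made-up injury-count estimates with their standard errors.
injury_records = [
    {"source": "A", "estimate": 100.0, "standard_error": 10.0},  # RSE 10%
    {"source": "B", "estimate": 20.0, "standard_error": 10.0},   # RSE 50%
]
reliable_records = filter_by_rse(injury_records)
```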
Pettinger [59] reports that data gathered from safety inspections can be used to develop predictive safety analytics. It is crucial to gather safety indicator data within the company continuously and to incorporate them into predictive analytics. The safety indicators that provide the required data include assessments of behavior and assessments of compliance.
Cadei et al. [60] employed Big Data to develop prediction software that forecasts hazard events and operational upsets during oil and gas production operations. The indicator they used as the hazard event to be predicted was H2S concentration. They gathered data from various sources, including real-time series, historical data, maintenance reports, operator data, and chemical analyses. Their workflow consisted of data collection, problem definition, data processing, modeling (using artificial neural networks (ANN) and random forests), and finally model validation.
5. Big Data challenges
One of the major challenges of applying Big Data in any industry, including the oil and gas industry, is the cost of managing data recording, storage, and analysis. With recent technological improvements, fog computing, cloud computing, and the Internet of Things (IoT) have become available to address data storage and computation issues [22,61]. Costly and limited cloud computing facilities are not suitable options for non-fixed-location or latency-sensitive applications. Fog computing, on the other hand, provides storage and computing facilities closer to the data generation sources, which resolves these challenges to some extent. IoT, in turn, is a newer technology that is more mobile and also addresses the latency issues [62].
In a study by Cameron [63], the author notes that the challenges of using Big Data for oilfield service companies include the knowledge of personnel in oil companies and data ownership issues. He mentions that Big Data can be used for seismic analysis, reservoir modeling, drilling services, and production reporting [63]. Furthermore, he defined nine factors for the successful application of Big Data in the oil and gas industry, including accurately defining the business problem, combining Big Data methods with physics-based data analysis, using an interdisciplinary team of computer scientists and petroleum engineers, delivering the results through a user-friendly interface, being need-driven, and addressing exactly how the solved problem relates to the whole picture [63].
The emergence of Big Data in the oil and gas industry has become more prominent with the evolution of digital oilfields, where various sensors and recording devices generate millions of data points each day. One of the critical challenges in digital oilfields is transferring data from the field to data processing facilities, which depends on the type of data, the amount of data, and the data protocols [64,65].
In a survey conducted by IDC Energy [12], the biggest challenge in using Big Data in the oil and gas industry was found to be a lack of awareness and business support. Other challenges identified in that survey were deciding which data are relevant, a lack of skilled personnel, and the cost of Big Data infrastructure. Therefore, familiarizing staff and executives with the technology and its applications will significantly facilitate the implementation of Big Data in the oil and gas industry.
In a more recent study, Maidla et al. [34] listed further technical challenges facing the application of Big Data. According to their research, the technical issues were mainly related to the limitations of the data recording sensors. Other issues were the frequency of data recording and the quality of the recorded data. Finally, an important challenge is a thorough understanding of the physics of the problem. Expert petroleum engineers should collaborate with data scientists to apply Big Data tools correctly to the various problems in petroleum engineering.
Preveral et al. [66] recommend that each company develop its own Big Data tools, including data recording and storage facilities as well as data analytics tools. This would reduce the cost of software ownership and optimize the value of the recorded data.
6. Conclusions
In this paper, a comprehensive review was conducted of the application of Big Data analytics in the oil and gas industry. The term Big Data (also called Big Data analytics or business analytics) reflects the first characteristic of this approach, which is the volume (size) of the available data set. The other characteristics of Big Data are velocity, variety, veracity, value, and complexity. Because of recent improvements in data recording technologies and the need for efficient exploration and production operations, Big Data has gained interest and significance in the oil and gas industry. In exploration, recent improvements in seismic devices have significantly increased the amount of generated data. It has been reported that methods such as principal component analysis (PCA) and platforms such as Hadoop can be used to interpret seismic and micro-seismic data. In a case study in drilling engineering, data obtained through an automated drilling state detection and monitoring service were analyzed to reduce drilling time and improve drilling safety. Furthermore, analyzing the data from DTS, DDTS, DAS, PDG, and DDSS sensors has improved reservoir characterization and simulation. Big Data has been successfully used in production engineering in areas such as optimizing the performance of electric submersible pumps and production allocation techniques. Big Data has also been successfully used in the downstream oil and gas industry in areas such as oil refining, oil and gas transportation, and HSE. Although Big Data is gaining interest among E&P companies, some major challenges still need to be addressed in order to apply it efficiently. These challenges mainly include the lack of business support for and awareness of Big Data within the industry, the quality of the data, and understanding the complexity of the problem.
References
[1] M.R. BruléGroup IBMS, The Data Reservoir : How Big Data Technologies Advance
Data Management and Analytics in E & P Introduction – General Data Reservoir
Concepts Data, Reservoir for E & P, 2015.
[2] B.C. Wipro, K.K. Wipro, Smart Decision Making Needs Automated Analysis " Making
Sense Out of Big Data in Real-time, (2014).
[3] W. Wu, X. Lu, B. Cox, G. Li, L. Lin, Q. Yang, et al., Retrieving Information and
Discovering Knowledge from Unstructured Data Using Big Data Mining Technique:
Heavy Oil Fields Example, (2014).
[4] A Bin Mahfoodh, M. Ibrahim, M. Hawi, K. Hakami, S. Aramco, Introducing a Big
Data System for Maintaining Well Data Quality and Integrity in a World of
Heterogeneous Environment Methedology, (2017).
[5] R.K. Perrons, J. Jensen, The Unfinished Revolution : what Is Missing from the E & P
Industry ’ S Move to “ Big Data ”, (2014).
[6] R.K. Perrons, J.W. Jensen, I. Corporation, Data as an Asset : what the Upstream Oil
& Gas Industry Can Learn about “ Big Data ” from Companies like Social Media what
Has Made Big Data Possible ? (2014).
[7] M. Akoum, A. Mahjoub, SPE 167410 a unified Framework for Implementing
Business Intelligence , Real-time Operational Intelligence and Big data Analytics for
Upstream, (2013), pp. 1–15.
[8] C.J.N. Sousa, I.H.F. Santos, V.T. Almeida, A.R. Almeida, G.M. Silva, A.E. Ciarlini,
et al., Applying Big Data Analytics to Logistics Processes of Oil and Gas Exploration
and Production through a Hybrid Modeling and Simulation, (2015).
[9] K. Hilgefort, Big data analysis using bayesian network modeling: a case study with
WG-ICDA of a gas storage field, Nace Int. (2018) 1–13.
[10] A. Sukapradja, J. Clark, H. Hermawan, S. Tjiptowiyono, E. Total, P. Indonesie, Sisi
nubi Dashboard : implementation of business intelligence in reservoir modelling &
Synthesis : managing Big data and streamline the decision making process, Field
General. 1–14 (2017).
[11] A. Mehta, Tapping the Value from Big Data Analytics, (2018) 2016–7.
[12] J. FeblowitzInsights IDCE, Analytics in Oil and Gas: the Big Deal about Big Data,
(2013), pp. 5–7.
[13] Trifu MR, Ivan ML. Big Data: Present and Future n.d.:32–41.
[14] H.E. Pence, What is Big Data and Why is it Important ? vol. 43, (2015), pp.
159–171, https://doi.org/10.2190/ET.43.2.d.
[15] J. Ishwarappa, J. Anuradha, A Brief Introduction on Big Data 5Vs Characteristics
and Hadoop Technology vol. 48, (2015), pp. 319–324, https://doi.org/10.1016/j.procs.2015.04.188.
[16] M.S. Sumbal, E. Tsui, E.W.K. See-to,
Interrelationship between Big Data and Knowledge Management: an Exploratory
Study in the Oil and Gas Sector, (2017), https://doi.org/10.1108/JKM-07-2016-0262.
[17] D. Khvostichenko, S. Makarychev-Mikhailov, Effect of fracturing chemicals on well
productivity: avoiding pitfalls in Big data analysis, SPE Int. Conf. Exhib. Form.
Damage Control, Lafayette: Society of Petroleum Engineers, 2018, https://doi.org/10.2118/189551-MS.
[18] M. Rehan, D. Gangodkar, Hadoop, MapReduce and HDFS: a developers perspective,
Procedia Comput Sci 48 (2015) 45–50, https://doi.org/10.1016/j.procs.2015.04.108.
[19] Borthakur D. HDFS Design n.d. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
(Accessed August 7, 2018).
[20] A4ACADEMICS. MapReduce Architecture n.d.
http://a4academics.com/tutorials/83-hadoop/840-map-reduce-architecture (Accessed August 7, 2018).
[21] C. Győrödi, R. Győrödi, G. Pecherle, A. Olah, A comparative study: MongoDB vs.
MySQL, (2015), pp. 0–5.
[22] N. Mounir, Y. Guo, Y. Panchal, I.M. Mohamed, A.W. Management, Integrating Big
Data: Simulation , Predictive Analytics , Real Time Monitoring , and Data
Warehousing in a Single Cloud Application, (2018).
[23] P. Warden, Big Data Glossary, O’REILLY, Sebastopol, CA, USA, 2011.
[24] T. Kudo, M. Ishino, K. Saotome, N. Kataoka, A proposal of transaction processing
method for MongoDB, Procedia Comput Sci 96 (2016) 801–810,
https://doi.org/10.1016/j.procs.2016.08.251.
[25] S.J. Eglen, A Quick Guide to Teaching R Programming to Computational Biology
Students 5 (2009) 8–11, https://doi.org/10.1371/journal.pcbi.1000482.
[26] J. Spath, S.P.E. President, Big Data !, (2015).
[27] P. Anand, S. Resources, Big Data Is a Big Deal, (2013).
[28] A. Alfaleh, S. Aramco, Y. Wang, A. Texas, B. Yan, Topological Data Analysis to Solve
Big Data Problem in Reservoir Engineering : Application to Inverted 4D Seismic
Data, (2015).
[29] R. Roden, G. Insights, Seismic Interpretation in the Age of Big Data, (2016), pp.
4911–4915.
[30] P. Joshi, R. Thapliyal, A.A. Chittambakkam, R. Ghosh, S. Bhowmick, S.N. Khan,
OTC-28381-MS Big Data Analytics for Micro-seismic Monitoring, (2018), pp.
20–23.
[31] T. Olneva, D. Kuzmin, S. Rasskazova, A. Timirgalin, G. Ntc, Big data approach for
geological study of the Big region West Siberia, SPE Annu. Tech. Conf. Exhib, SPE,
Dallas, 2018.
[32] N. Rossi, J. Michelez, F. Concina, Big Data for Advanced Well Engineering Holds
Strong Potential to Optimize Drilling Costs, (2018).
[33] W. Duffy, J. Rigg, E. Maidla, T.D.E. Petroleum, D. Solutions, Efficiency
Improvement in the Bakken Realized through Drilling Data Processing Automation
and the Recognition and Standardization of Best Safe Practices, (2017).
[34] E. Maidla, W. Maidla, J. Rigg, M. Crumrine, P. Wolf-zoellner, Drilling analysis using
Big data has been misused and abused, IADC/SPE Drill. Conf. Exhib., Fort Worth,
2018 https://doi.org/10.2118/189583-MS.
[35] Q. Yin, J. Yang, B. Zhou, M. Jiang, X. Chen, C. Fu, et al., Improve the Drilling
Operations Efficiency by the Big Data Mining of Real-time Logging, SPE/IADC-
189330-MS, 2018.
[36] J. Johnston, A. Guichard, New findings in drilling and wells using Big data analytics,
Offshore Technol. Conf, SPE, Houston, 2015.
[37] M. Hutchinson, L.D. International, B. Thornton, P. Theys, Optimizing drilling by
simulation and automation with Big data, SPE Annu. Tech. Conf. Exhib, Society of
Petroleum Engineers, Dallas, 2018, https://doi.org/10.2118/191427-MS.
[38] O. Bello, D. Yang, S. Lazarus, X.S. Wang, T. Denney, B.H. Incorporated, Next
Generation Downhole Big Data Platform for Dynamic Data-driven Well and
Reservoir Management, (2017).
[39] M.R. Brulé, I.B.M.S. Group, Big Data in E & P: Real-time Adaptive Analytics and
Data-flow Architecture, (2013), pp. 5–7.
[40] S.A. Haghighat, S.D. Mohaghegh, V. Gholami, A. Shahkarami, D. Moreno,
W. Virginia, Using Big Data and Smart Field Technology for Detecting Leakage in a
CO2 Storage Project, (2013), pp. 1–7.
[41] A.S. Popa, E. Grijalva, S. Cassidy, J. Medel, A. Cover, C. North, et al., SPE-174912-MS
Intelligent Use of Big Data for Heavy Oil Reservoir Management, (2015).
[42] A. Lin, Principles of Big Data Algorithms and Application for Unconventional Oil
Introduction : Insufficient Resources, (ISR) Computing and BD, 2014.
[43] C. Chelmis, J. Zhao, V. Sorathia, S. Agarwal, V. Prasanna, M. Hsieh, Semiautomatic ,
semantic assistance to manual curation of data in smart oil fields, SPE West. Reg.
Meet, SPE, Bakersfield, CA, USA, 2012, pp. 1–18.
[44] E. Udegbe, E. Morgan, S. Srinivasan, T. Pennsylvania, SPE-187328-MS from Face
Detection to Fractured Reservoir Characterization : Big Data Analytics for
Restimulation Candidate Selection, (2017).
[45] J. Xiao, X. Sun, Big Data Analytics Drive EOR Projects, (2017), pp. 5–8.
[46] D. Seemann, M.W. Spe, S. Hasan, S. Aramco, SPE 167482 Improving Reservoir
Management through Big Data Technologies, (2013), pp. 28–30.
[47] B.T. Rollins, A. Broussard, B. Cummins, A. Smiley, N. Dobbs, Continental production
allocation and analysis through Big data, Unconv. Resour. Technol. Conf,
Society of Petroleum Engineers, Austin, 2017, https://doi.org/10.15530/urtec-2017-2678296.
[48] N.P. Sarapulov, R.A. Khabibullin, SPE-187738-MS Application of Big Data Tools for
Unstructured Data Analysis to Improve ESP Operation Efficiency, (2017).
[49] S. Gupta, L. Saputelli, F. Corporation, M. Nikolaou, Big Data Analytics Workflow to
Safeguard ESP Operations in Real-time, (2016), pp. 25–27.
[50] T. Palmer, M. Turland, SPE-181216-MS Proactive Rod Pump Optimization:
Leveraging Big Data to Accelerate and Improve Operations, (2016).
[51] J. Betz, J.P.T.S. Writer, Low oil prices increase value of Big Data in fracturing, J.
Petrol. Technol. 67 (2015) 60–61, https://doi.org/10.2118/0415-0060-JPT.
[52] M. Ockree, K.G. Brown, J. Frantz, M. Deasy, R. Resources-appalachia, Integrating
Big data analytics into development planning optimization, SPE/AAPG East. Reg.
Meet, Society of Petroleum Engineers, Pittsburgh, 2018, https://doi.org/10.2118/191796-18ERM-MS.
[53] M Von Plate, C. Ag, SPE-181037-MS Big Data Analytics for Prognostic Foresight
New Dimension of Petroleum Asset Management, (2016), pp. 6–8.
[54] R. Brelsford, Repsol launches Big data, AI project at tarragona refinery, Oil Gas J.
116 (2018).
[55] A. Anagnostopoulos, Big Data Techniques for Ship Performance Study, (2018), pp.
887–893.
[56] S. Park, M. Roh, M. Oh, S. Kim, W. Lee, I. Kim, et al., Estimation model of energy
efficiency operational indicator using public data based on Big data technology,
28th Int. Ocean Polar Eng. Conf., Sapporo, International Society of Offshore and
Polar Engineers, 2018, pp. 894–897.
[57] M. Tarrahi, A. Shadravan, R. Llc, Advanced Big Data Analytics Improves HSE
Management, (2016).
[58] M. Tarrahi, A. Shadravan, R. Llc, Intelligent HSE Big Data Analytics Platform
Promotes Occupational Safety Fatal Occupational Injuries Data Base, (2016), pp.
26–28.
[59] C.B. Pettinger, Leading indicators, culture and Big Data: using your data to eliminate
death, ASSE Prof. Dev. Conf. Expo, American Society of Safety Engineers,
Orlando, 2014.
[60] L. Cadei, M. Montini, F. Landi, F. Porcelli, V. Michetti, E.S. Upstream, et al., Big data
advanced analytics to forecast operational upsets in upstream production system,
Abu Dhabi Int. Pet. Exhib. Conf, Society of Petroleum Engineers, Abu Dhabi, 2018,
https://doi.org/10.2118/193190-MS.
[61] R. Beckwith, Managing Big Data: cloud computing, J Pet Technol 63 (2011) 42–45
https://doi.org/10.2118/1011-0042-JPT.
[62] S. Konovalov, R. Irons-mclean, Addressing O & G Big Data Challenges at the Remote
Edge Fog Computing and Key Use Cases, (2015), pp. 3–5.
[63] D. Cameron, S. As, Big Data in Exploration and Production : Silicon Snake-oil ,
Magic Bullet , or Useful Tool? (2014).
[64] P. Neri, Big Data in the Digital Oilfield Requires Data Transfer Standards to Perform
Industry Environment a Trend towards More Collaboration Standardization the
Critical Role of Metadata, (2018), pp. 1–6.
[65] Y. Gidh, N. Deeks, L.O. Grovik, D. Johnson, J. Schey, J. Hollingsworth, Paving the
Way for Big Data Analytics through Improved Data Assurance and Data
Organization, (2016).
[66] A. Preveral, A. Trihoreau, N. Petit, Geographically-distributed Databases : a Big data
technology for production analysis in the oil & gas industry, SPE Intell. Energy Conf.
Exhib, Society of Petroleum Engineers, Utrecht, 2014, https://doi.org/10.2118/167844-MS.