当前位置: 代码迷 >> python >> 如何解析,展平并将大JSON文件转换为CSV文件?
  详细解决方案

如何解析,展平并将大JSON文件转换为CSV文件?

热度:113   发布时间:2023-06-13 15:16:53.0

本质上,我有一个大文件(?1 GB),其中每行都是一个JSON对象,其中包含嵌套的属性,其中一些值可能是对象数组。

对象的示例:

{
  "business_id": "b9WZJp5L1RZr4F1nxclOoQ",
  "full_address": "1073 Washington Ave\nCarnegie, PA 15106",
  "hours": {
    "Monday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Tuesday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Friday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Wednesday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Thursday": {
      "close": "14:30",
      "open": "06:00"
    },
    "Sunday": {
      "close": "12:30",
      "open": "07:00"
    },
    "Saturday": {
      "close": "12:30",
      "open": "06:00"
    }
  },
  "open": true,
  "categories": ["Breakfast & Brunch", "Restaurants"],
  "city": "Carnegie",
  "review_count": 38,
  "name": "Gab & Eat",
  "neighborhoods": [],
  "longitude": -80.084799799999999,
  "state": "PA",
  "stars": 4.5,
  "latitude": 40.396744099999999,
  "attributes": {
    "Alcohol": "none",
    "Noise Level": "average",
    "Has TV": true,
    "Attire": "casual",
    "Ambience": {
      "romantic": false,
      "intimate": false,
      "classy": false,
      "hipster": false,
      "divey": true,
      "touristy": false,
      "trendy": false,
      "upscale": false,
      "casual": true
    },
    "Good for Kids": true,
    "Wheelchair Accessible": false,
    "Delivery": false,
    "Caters": true,
    "BYOB": false,
    "Corkage": false,
    "Accepts Credit Cards": false,
    "BYOB/Corkage": "yes_free",
    "Take-out": true,
    "Price Range": 1,
    "Outdoor Seating": false,
    "Takes Reservations": false,
    "Waiter Service": true,
    "Wi-Fi": "no",
    "Order at Counter": true,
    "Good For": {
      "dessert": false,
      "latenight": false,
      "lunch": false,
      "dinner": false,
      "brunch": false,
      "breakfast": true
    },
    "Parking": {
      "garage": false,
      "street": false,
      "validated": false,
      "lot": true,
      "valet": false
    },
    "Good For Kids": true,
    "Good For Groups": false
  },
  "type": "business"
},

如何将其展平并将其转换为CSV,以便有一个具有属性(列)的单个对象,例如business_idhours.Monday.closeattributes.Ambience.hipster等?

问题在于,并非所有对象都具有所有属性,因此我需要扫描整个文件以获取所有可能的平面属性的列表。 本质上,我试图模仿的功能,但对于数组值属性,我没有将其拆分为多列,而是将整个数组字符串作为值存储在CSV中。

如何使用Python或.NET完成此操作?

这可行。 可能需要进一步扩展以钻取数组等。使用Newtonsoft的JSON库,并假定JSON字符串是一个对象 ,而不是数组或基元(或其他任何东西)

void Main()
{
    var obj = JsonConvert.DeserializeObject(jsonStr) as JObject;    
    var props = GetPropPaths(string.Empty, obj);
    props.Dump();
}

private IEnumerable<Tuple<string, string>> GetPropPaths(string currPath, JObject obj)
{
    foreach(var prop in obj.Properties())
    {
        var propPath = string.IsNullOrWhiteSpace(currPath) ? prop.Name : currPath + "." + prop.Name;

        if (prop.Value.Type == JTokenType.Object)
        {   
            foreach(var subProp in GetPropPaths(propPath, prop.Value as JObject))
                yield return subProp;
        } else {
            yield return new Tuple<string, string>(propPath, prop.Value.ToString());
        }
    }
}

对于您上面的json,它提供了以下内容:

  相关解决方案