2016-09-21 246 views

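PySpark, GraphFrames, exception Caused by: java.lang.ClassNotFoundException: com.typesafe.scalalogging.slf4j.LazyLogging

I am trying to run the code below, which uses GraphFrames, and I get an error that, to the best of my knowledge and after several hours of Googling, I cannot resolve. It looks like a class cannot be loaded, but I really don't know what I should do about it.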

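Could someone please take a look at the code and the error below? I followed the instructions here, and if you want to try it quickly, you can find my dataset here.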

""" 
Program: RUNNING GRAPH ANALYTICS WITH SPARK GRAPH-FRAMES: 
Author:  Dr. C. Hadjinikolis 
Date:  14/09/2016 
Description: This is the application's core module from where everything is executed. 
       The module is responsible for: 
       1. Loading Spark 
       2. Loading GraphFrames 
       3. Running analytics by leveraging other modules in the package. 
""" 
# IMPORT OTHER LIBS -------------------------------------------------------------------------------# 
import os 
import sys 
import pandas as pd 

# IMPORT SPARK ------------------------------------------------------------------------------------# 
# Path to Spark source folder 
USER_FILE_PATH = "/Users/christoshadjinikolis" 
SPARK_PATH = "/PycharmProjects/GenesAssociation" 
SPARK_FILE = "/spark-2.0.0-bin-hadoop2.7" 
SPARK_HOME = USER_FILE_PATH + SPARK_PATH + SPARK_FILE 
os.environ['SPARK_HOME'] = SPARK_HOME 

# Append pySpark to Python Path 
sys.path.append(SPARK_HOME + "/python") 
sys.path.append(SPARK_HOME + "/python" + "/lib/py4j-0.10.1-src.zip") 

try: 
    from pyspark import SparkContext 
    from pyspark import SparkConf 
    from pyspark.sql import SQLContext 
    from graphframes import * 

except ImportError as ex: 
    print "Can not import Spark Modules", ex 
    sys.exit(1) 

# GLOBAL VARIABLES --------------------------------------------------------------------------------# 
# Configure spark properties 
CONF = (SparkConf() 
     .setMaster("local") 
     .setAppName("My app") 
     .set("spark.executor.memory", "10g") 
     .set("spark.executor.instances", "4")) 

# Instantiate SparkContext object 
SC = SparkContext(conf=CONF) 

# Instantiate SQL_SparkContext object 
SQL_CONTEXT = SQLContext(SC) 

# MAIN CODE ---------------------------------------------------------------------------------------# 
if __name__ == "__main__": 

    # Main Path to CSV files 
    DATA_PATH = '/PycharmProjects/GenesAssociation/data/' 
    FILE_NAME = 'gene_gene_associations_50k.csv' 

    # LOAD DATA CSV USING PANDAS -----------------------------------------------------------------# 
    print "STEP 1: Loading Gene Nodes -------------------------------------------------------------" 
    # Read csv file and load as df 
    GENES = pd.read_csv(USER_FILE_PATH + DATA_PATH + FILE_NAME, 
         usecols=['OFFICIAL_SYMBOL_A'], 
         low_memory=True, 
         iterator=True, 
         chunksize=1000) 

    # Concatenate chunks into list & convert to dataFrame 
    GENES_DF = pd.DataFrame(pd.concat(list(GENES), ignore_index=True)) 

    # Remove duplicates 
    GENES_DF_CLEAN = GENES_DF.drop_duplicates(keep='first') 

    # Name Columns 
    GENES_DF_CLEAN.columns = ['id'] 

    # Output dataFrame 
    print GENES_DF_CLEAN 

    # Create vertices 
    VERTICES = SQL_CONTEXT.createDataFrame(GENES_DF_CLEAN) 

    # Show some vertices 
    print VERTICES.take(5) 

    print "STEP 2: Loading Gene Edges -------------------------------------------------------------" 
    # Read csv file and load as df 
    EDGES = pd.read_csv(USER_FILE_PATH + DATA_PATH + FILE_NAME, 
         usecols=['OFFICIAL_SYMBOL_A', 'OFFICIAL_SYMBOL_B', 'EXPERIMENTAL_SYSTEM'], 
         low_memory=True, 
         iterator=True, 
         chunksize=1000) 

    # Concatenate chunks into list & convert to dataFrame 
    EDGES_DF = pd.DataFrame(pd.concat(list(EDGES), ignore_index=True)) 

    # Name Columns 
    EDGES_DF.columns = ["src", "dst", "rel_type"] 

    # Output dataFrame 
    print EDGES_DF 

    # Create edges
    EDGES = SQL_CONTEXT.createDataFrame(EDGES_DF) 

    # Show some edges 
    print EDGES.take(5) 

    print "STEP 3: Generating the Graph -----------------------------------------------------------" 

    GENES_GRAPH = GraphFrame(VERTICES, EDGES) 

    print "STEP 4: Running Various Basic Analytics ------------------------------------------------" 
    print "Vertex in-Degree -----------------------------------------------------------------------" 
    GENES_GRAPH.inDegrees.sort('inDegree', ascending=False).show() 
    print "Vertex out-Degree ----------------------------------------------------------------------" 
    GENES_GRAPH.outDegrees.sort('outDegree', ascending=False).show() 
    print "Vertex degree --------------------------------------------------------------------------" 
    GENES_GRAPH.degrees.sort('degree', ascending=False).show() 
    print "Triangle Count -------------------------------------------------------------------------" 
    RESULTS = GENES_GRAPH.triangleCount() 
    RESULTS.select("id", "count").show() 
    print "Label Propagation ----------------------------------------------------------------------" 
    GENES_GRAPH.labelPropagation(maxIter=10).show()  # Convergence is not guaranteed 
    print "PageRank -------------------------------------------------------------------------------" 
    GENES_GRAPH.pageRank(resetProbability=0.15, tol=0.01)\ 
     .vertices.sort('pagerank', ascending=False).show() 

    print "STEP 5: Find Shortest Paths w.r.t. Landmarks -------------------------------------------" 
    # Shortest paths 
    SHORTEST_PATH = GENES_GRAPH.shortestPaths(landmarks=["ARF3", "MAP2K4"]) 
    SHORTEST_PATH.select("id", "distances").show() 

    print "STEP 6: Save Vertices and Edges --------------------------------------------------------" 
    # Save vertices and edges as Parquet to some location. 
    # Note: You can't overwrite existing vertices and edges directories. 
    GENES_GRAPH.vertices.write.parquet("vertices") 
    GENES_GRAPH.edges.write.parquet("edges") 

    print "STEP 7: Load " 
    # Load the vertices and edges back. 
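    # (read through the SQLContext; a GraphFrame has no `read` attribute)
    SAME_VERTICES = SQL_CONTEXT.read.parquet("vertices")
    SAME_EDGES = SQL_CONTEXT.read.parquet("edges")

    # Create an identical GraphFrame.
    # (GraphFrame is already in scope via `from graphframes import *`; `GF` is undefined)
    SAME_GENES_GRAPH = GraphFrame(SAME_VERTICES, SAME_EDGES)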

# END OF FILE -------------------------------------------------------------------------------------# 
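
The script above never states how the GraphFrames package is requested, although the Ivy log in the output shows `graphframes#graphframes;0.2.0-spark2.0-s_2.11` being resolved. When a PySpark script is launched from an IDE rather than through `spark-submit`, one common approach is to set the `PYSPARK_SUBMIT_ARGS` environment variable before the `SparkContext` is created. The following is a minimal sketch under that assumption; `build_submit_args` is a hypothetical helper that is not part of the original script, and it is not claimed that this alone resolves the missing class.

```python
import os


def build_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value requesting the given Maven coordinates.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Multiple coordinates are passed to --packages as a comma-separated list.
    # The trailing "pyspark-shell" token is expected when PySpark starts the JVM itself.
    return "--packages {0} pyspark-shell".format(",".join(packages))


# Coordinates taken from the Ivy resolution log shown in this post.
GRAPHFRAMES_PACKAGE = "graphframes:graphframes:0.2.0-spark2.0-s_2.11"

# This has to happen before SparkContext(conf=CONF) is instantiated,
# because the JVM reads the submit arguments only once, at start-up.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args([GRAPHFRAMES_PACKAGE])
```

In the script above, these lines would sit just after `os.environ['SPARK_HOME'] = SPARK_HOME`.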

This is the output:

Spark shell:

Ivy Default Cache set to: /Users/username/.ivy2/cache 
The jars for the packages stored in: /Users/username/.ivy2/jars 
:: loading settings :: url = jar:file:/Users/username/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml 
graphframes#graphframes added as a dependency 
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0 
    confs: [default] 
    found graphframes#graphframes;0.2.0-spark2.0-s_2.11 in list 
    found com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 in list 
    found com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 in list 
    found org.scala-lang#scala-reflect;2.11.0 in list 
    [2.11.0] org.scala-lang#scala-reflect;2.11.0 
    found org.slf4j#slf4j-api;1.7.7 in list 
:: resolution report :: resolve 391ms :: artifacts dl 14ms 
    :: modules in use: 
    com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 from list in [default] 
    com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 from list in [default] 
    graphframes#graphframes;0.2.0-spark2.0-s_2.11 from list in [default] 
    org.scala-lang#scala-reflect;2.11.0 from list in [default] 
    org.slf4j#slf4j-api;1.7.7 from list in [default] 
    --------------------------------------------------------------------- 
    |     |   modules   || artifacts | 
    |  conf  | number| search|dwnlded|evicted|| number|dwnlded| 
    --------------------------------------------------------------------- 
    |  default  | 5 | 0 | 0 | 0 || 5 | 0 | 
    --------------------------------------------------------------------- 
:: retrieving :: org.apache.spark#spark-submit-parent 
    confs: [default] 
    0 artifacts copied, 5 already retrieved (0kB/11ms) 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
Setting default log level to "WARN". 
To adjust logging level use sc.setLogLevel(newLevel). 
16/09/20 11:00:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
OK1 
Traceback (most recent call last): 
    File "/Users/username/PycharmProjects/GenesAssociation/main.py", line 32, in <module> 
    g = GraphFrame(v, e) 
    File "/Users/tjhunter/work/graphframes/python/graphframes/graphframe.py", line 62, in __init__ 
    File "/Users/tjhunter/work/graphframes/python/graphframes/graphframe.py", line 34, in _java_api 
    File "/Users/christoshadjinikolis/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__ 
    File "/Users/christoshadjinikolis/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco 
    return f(*a, **kw) 
    File "/Users/username/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/lib" \ 
     "/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling o53.newInstance. 
: java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging 
    at java.lang.ClassLoader.defineClass1(Native Method) 
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763) 
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) 
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    at java.lang.ClassLoader.defineClass1(Native Method) 
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763) 
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) 
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    at org.graphframes.GraphFrame$.<init>(GraphFrame.scala:677) 
    at org.graphframes.GraphFrame$.<clinit>(GraphFrame.scala) 
    at org.graphframes.GraphFramePythonAPI.<init>(GraphFramePythonAPI.scala:11) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
    at java.lang.Class.newInstance(Class.java:442) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
    at py4j.Gateway.invoke(Gateway.java:280) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:211) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.ClassNotFoundException: com.typesafe.scalalogging.slf4j.LazyLogging 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    ... 43 more 


Process finished with exit code 1 
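
The log reports that `scala-logging-slf4j_2.11;2.1.2` was found, yet the JVM still cannot load `com.typesafe.scalalogging.slf4j.LazyLogging`. One way to narrow this down is to check which of the retrieved jars actually contain that class. The following is a diagnostic sketch only; both function names are hypothetical, and it assumes the jars sit in the directory named by the Ivy log (`~/.ivy2/jars`).

```python
import os
import zipfile

# Class named in the ClassNotFoundException above.
TARGET_CLASS = "com.typesafe.scalalogging.slf4j.LazyLogging"


def jar_contains_class(jar_path, class_name):
    """Return True if the jar holds a .class entry for the fully qualified class name."""
    # A jar is a zip archive; com.foo.Bar is stored as com/foo/Bar.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def find_jars_with_class(jar_dir, class_name):
    """Return the sorted paths of all jars in jar_dir that contain class_name."""
    hits = []
    for name in sorted(os.listdir(jar_dir)):
        if name.endswith(".jar"):
            path = os.path.join(jar_dir, name)
            if jar_contains_class(path, class_name):
                hits.append(path)
    return hits
```

Calling `find_jars_with_class(os.path.expanduser("~/.ivy2/jars"), TARGET_CLASS)` lists the jars that provide the class. An empty list would suggest the jar is missing or incomplete. A non-empty list would suggest the jar exists but is not on the driver's classpath.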

I am getting a very similar error. Have you solved this problem? – dvreed77


I haven't solved this error yet, but you may want to follow my blog post here: http://www.datareply.co.uk/blog/2016/9/20/running-graph-analytics-with-spark-graphframes-a-simple-example

Answers