2013-02-11

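Million-user fan-out on Google App Engine using JPA

I'm trying to figure out the million-user fan-out problem on GAE using JPA. If I've understood things correctly, for something like Twitter (just an example) I should have the following entities: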

public class User {
    @Id Key id;
    String name;
    String displayName;
    List<Key> subscribers; // users
}

public class Tweet {
    @Id Key id;
    User tweetMaker;
    String message;
}

public class TweetIndex {
    @Id Key id;
    Key tweetMaker;  // user
    List<Key> subscribers; // users
}

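When a tweet is made, the Tweet object is saved, and a TweetIndex is saved in which tweetMaker is the user who sent the tweet and the subscribers are copied from the User object into the TweetIndex. I would then query on the subscribers in TweetIndex to get the messages for a particular subscriber.

  1. So do I have this right? Where things get fuzzy for me is that I'd expect the subscribers to be stored in a multi-valued property. Since a multi-valued property can only have 5000 entries, I figure the TweetIndex should be repeated for every 5000 subscriber IDs.
  2. What controls the splitting of the multi-valued property into groups of 5000? Do I need to manage the repeated saves in code?
  3. And how would I store the original list of subscribers? It seems to me that the list of subscribers in the User object would also be restricted by the same 5000 limit.

Thanks for any answers/insights/suggestions!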


Sharded subscription entity (used by the sample tweet service in the answer, which wires everything together):

/**
* USE CASE : Retrieve tweets one user subscribed to
*
* Same goes for User subscription
*/
public class TweetSubscriptionShardedEntity {
    /** unused */
    @Id Key shardKey;
    /** INDEXED : Tweet reference */
    Key tweetId;
    /** INDEXED : Users reference */
    List<Key> userKeys = new ArrayList<Key>();
    /** INDEXED : subscriber count, to retrieve shards that are actually under the limitation of 20K */
    int subscribersCount = 0;

    /**
    * Add a subscriber and increment the subscribersCount
    */
    public void addSubscriber(Key userId) {
        userKeys.add(userId);
        subscribersCount++;
    }
}

First, where did you get the size limit for multi-valued properties? Then, to make sure I get your use case right: can people subscribe to tweets from someone they have not subscribed to yet? I mean, can a user be in 'TweetIndex' but not in 'User'? – 2014-03-11 16:42:06


OK, I get it: the size of a multi-valued property is limited when it is indexed (it looks more like 20K though): http://stackoverflow.com/questions/20200307/what-is-maximum-size-limitation-of-listproperty-for-google-app-engine-datastore – 2014-03-11 17:16:23

Answers


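1) So do I have this right? -> Kind of. The size of a multi-valued property list is limited to around 20K when it is indexed (which is your case, since you will run queries against the subscriber IDs); see "What is maximum size/limitation of ListProperty for Google App Engine datastore?". To summarize, the limits you will run into in this use case are:
- the size of an indexed multi-valued property (20K)
- the entity size (1MB); this should be OK unless you store blobs in there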

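2) The breakdown will have to be handled manually, as I don't know of any persistence framework that does it for you. Objectify is the only persistence framework specialized enough for the GAE datastore to have this kind of feature, but I haven't used it, so I don't know.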
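As a rough illustration of the manual breakdown in point 2, the subscriber list can be cut into chunks before each chunk is saved as its own index entity. This is only a sketch: `SubscriberChunker`, `chunk` and `MAX_INDEXED_VALUES` are hypothetical names, not part of JPA or the GAE SDK, plain `Long` ids stand in for datastore `Key` objects, and the 20000 cap is simply the approximate figure quoted in point 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of JPA or the GAE SDK. */
class SubscriberChunker {

    /** Assumed cap on an indexed multi-valued property (approximate figure from the answer). */
    static final int MAX_INDEXED_VALUES = 20000;

    /** Splits ids into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<List<T>>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            // Copy the window so each chunk can be stored independently.
            chunks.add(new ArrayList<T>(ids.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> subscriberIds = new ArrayList<Long>();
        for (long id = 0; id < 45000; id++) {
            subscriberIds.add(id);
        }
        List<List<Long>> chunks = chunk(subscriberIds, MAX_INDEXED_VALUES);
        System.out.println(chunks.size() + " index entities needed");
    }
}
```

Each resulting chunk would become the `subscribers` list of one index entity, so the code that saves a tweet decides how many index entities to write.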

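3) You need to clearly understand the constraints that drive how you model your use case on the GAE datastore. It seems to me that you are still heavily influenced by relational database modeling:

Since you are planning for millions of users, you are building your application for scale and performance. These "joins" are exactly what you must avoid, which is why you are not using an RDBMS in the first place. The point is: DUPLICATE! Denormalize so that your data matches your use cases.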

public class UserEntity { 

    @Id Key id; 
    String name; 

    /** INDEXED : to retrieve a user by display name */ 
    String displayName; 

    /** For the sake of the example below */ 
    int tweetCount; 

    /** 
    * USE CASE : See a user's followers from his "profile" page. 
    * 
    * Easily get subscribers data from your user entity. 
    * Duplicate UserEntity (this object) 's data in the UserSubscriberEntity. 
    * You just need to run an ancestor query on UserSubscriberEntity using the User id. 
    */ 
    List<UserSubscriberChildEntity> subscribers; 

} 

/** Duplicate user data in this entity, retrieved easily with an ancestor query */ 
public class UserSubscriberChildEntity { 
    /** The id of this entity */ 
    @Id Key subscriberId; 
    /** Duplicate your User Entity data */ 
    String name; 
    String displayName; 
    /** The id from the UserEntity referenced */ 
    String userId; 
} 

public class TweetEntity { 
    @Id Key id; 

    /** 
    * The actual text message 
    */ 
    String tweetContent; 

    /** 
    * USE CASE : display the tweet maker name alongside the tweet content. 
    * 
    * Duplicate user data to prevent an expensive join when not needed. 
    * You will always need to display this along with the tweet content ! 
    * Model your entity based on what you want to see when you display them 
    */ 
    String tweetMakerName; 
    String tweetMakerDisplayName; 
    /** 
    * USE CASE 
    * 1) to retrieve tweets MADE by a given user 
    * 2) In case you actually need to access the User entity 
    * (for example, if you remove this tweet and want to decrease the user tweet counter) 
    * 
    * INDEXED 
    */ 
    Key tweetMakerId; 

    /** 
    * USE CASE : display tweet subscribers from the "tweet page" 
    * 
    * Same as "UserSubscriberChildEntity", retrieve data fast by duplicating 
    */ 
    List<TweetSubscriberChildEntity> subscribers; 
} 

Now the core question: how do you retrieve "all the tweets a user subscribed to"?

Shard your subscriptions across entities:

/**
* Pseudo code: datastoreService and its query strings are placeholders, not a real API
*/
public class TweetService {

    public List<TweetEntity> getTweetsSubscribed(Key userId) {
        List<TweetEntity> tweetsFollowed = new ArrayList<TweetEntity>();
        // Get all the subscription shards that contain the user
        List<TweetSubscriptionShardedEntity> shards = datastoreService.find(
            "FROM TweetSubscriptionShardedEntity WHERE userKeys contains (userId)");
        // Iterate over each shard to retrieve the complete Tweet
        for (TweetSubscriptionShardedEntity shard : shards) {
            TweetEntity tweet = datastoreService.get(TweetEntity.class, shard.tweetId);
            tweetsFollowed.add(tweet);
        }
        return tweetsFollowed;
    }

    public void subscribeToTweet(Key subscriberId, Key tweetId) {
        TweetSubscriptionShardedEntity shardToUse = null;
        // Only get the first shard of this tweet with under 20000 subscribers
        TweetSubscriptionShardedEntity shardNotFull = datastoreService.find(
            "FROM TweetSubscriptionShardedEntity"
            + " WHERE tweetId == tweetId"
            + " AND subscribersCount < 20000"
            + " LIMIT 1");
        if (shardNotFull == null) {
            // If no shard with free room exists, create one
            shardToUse = new TweetSubscriptionShardedEntity();
        }
        else {
            shardToUse = shardNotFull;
        }
        // Link user and tweet, keeping subscribersCount in sync
        shardToUse.tweetId = tweetId;
        shardToUse.addSubscriber(subscriberId);
        // Save shard
        datastoreService.put(shardToUse);
    }

    /**
    * Hard to put in a transaction with so many entities updated !
    * See cross entity group docs for more info.
    */
    public void createTweet(UserEntity creator, TweetEntity newTweet) {

        creator.tweetCount++;
        newTweet.tweetMakerName = creator.name;
        newTweet.tweetMakerDisplayName = creator.displayName;
        newTweet.tweetMakerId = creator.id;

        // Duplicate User subscribers to Tweet
        for (UserSubscriberChildEntity userSubscriber : creator.subscribers) {
            // Create a Tweet child entity
            TweetSubscriberChildEntity tweetSubscriber = new TweetSubscriberChildEntity();
            tweetSubscriber.name = userSubscriber.name;
            // ... (duplicate all data)
            newTweet.subscribers.add(tweetSubscriber);

            // Create a shard with the previous method !!
            subscribeToTweet(userSubscriber.subscriberId, newTweet.id);
        }
        // Update the user (tweet count)
        datastoreService.put(creator);
        // Create the new tweet and child entities (duplicated subscribers data)
        datastoreService.put(newTweet);
    }

}
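To make the shard-selection logic of the service above easier to try out, here is a minimal in-memory sketch. It is not datastore code: `ShardedSubscriptions`, `Shard`, `subscribe`, `tweetsFor` and `shardCount` are hypothetical names, plain `String` ids replace `Key`, a Java list replaces the datastore, and the shard capacity is a constructor parameter so the behaviour can be checked with small numbers instead of 20000.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** In-memory stand-in for the sharded subscription entities; all names are hypothetical. */
class ShardedSubscriptions {

    /** One shard: a tweet id plus a bounded set of subscriber ids. */
    static final class Shard {
        final String tweetId;
        final Set<String> userIds = new HashSet<String>();

        Shard(String tweetId) {
            this.tweetId = tweetId;
        }
    }

    private final int shardCapacity;
    private final List<Shard> shards = new ArrayList<Shard>();

    ShardedSubscriptions(int shardCapacity) {
        this.shardCapacity = shardCapacity;
    }

    /** Adds the user to the first non-full shard of the tweet, creating a shard if needed. */
    void subscribe(String userId, String tweetId) {
        Shard target = null;
        for (Shard shard : shards) {
            if (!shard.tweetId.equals(tweetId)) {
                continue;
            }
            if (shard.userIds.contains(userId)) {
                return; // already subscribed, nothing to do
            }
            if (target == null && shard.userIds.size() < shardCapacity) {
                target = shard;
            }
        }
        if (target == null) {
            target = new Shard(tweetId);
            shards.add(target);
        }
        target.userIds.add(userId);
    }

    /** Returns the ids of all tweets that have a shard containing the user. */
    List<String> tweetsFor(String userId) {
        List<String> tweetIds = new ArrayList<String>();
        for (Shard shard : shards) {
            if (shard.userIds.contains(userId)) {
                tweetIds.add(shard.tweetId);
            }
        }
        return tweetIds;
    }

    /** Counts how many shards currently exist for the tweet. */
    int shardCount(String tweetId) {
        int count = 0;
        for (Shard shard : shards) {
            if (shard.tweetId.equals(tweetId)) {
                count++;
            }
        }
        return count;
    }
}
```

With a capacity of 2, subscribing five different users to the same tweet leaves three shards for that tweet. On the real datastore the two loops would be indexed queries, and the read-then-write in `subscribe` would need the transactional care mentioned in the `createTweet` comment.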

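Note: the only reason TweetSubscriptionShardedEntity exists is the number of subscriptions you have. If you had no more than 20K subscriptions per tweet, you could put the subscribers property (a list of subscribers) directly on the tweet. – 2014-03-11 18:25:19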
